The RTX 3090 release date is September 24, following hot on the heels of the far more reasonably priced card. This is the new ultra-enthusiast class GPU that Nvidia is pitching as effectively the new, more widely available Titan card.

The final piece of the initial jigsaw is the RTX 3070, with a far vaguer release date of sometime in October. This is arguably the most intriguing Ampere card for gamers because it offers performance higher than the RTX 2080 Ti for less than half the price.

Those are the stated release dates for the Founders Edition reference cards from Nvidia, but what we don't know is how quickly the add-in board partners (AIBs) are going to get their own myriad designs out of the factories and into the shops. Normally you'd expect a month's delay on the AIB cards, but I've got a feeling we'll see at least the reference clock cards just after the main launch. The factory-overclocked versions might be a little later, so they may not arrive until a month after the standard cards. But you can expect to see some shiny, innovative cooling solutions dropped on the new Ampere GPUs pretty rapidly.

Nvidia RTX 30-Series confirmed specs

Whatever the fine details, we'll know more of everything when the whitepapers and architecture deep dives have been published. For now though, suffice to say the RTX 3090 is rocking 10,496 CUDA cores, the RTX 3080 has 8,704, and the RTX 3070 has 5,888. Compare that last with the RTX 2080 Ti and its miserly 4,352 CUDA cores and you can see why the new $499 card is beating Turing's top $1,200 GPU.

The GPU itself is built on a slightly different production process to the professional Ampere chips too, with the 7nm TSMC design being ditched for the second-gen RTX cards in favour of Samsung's 8nm node, codenamed 8N.
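To put the RTX 3070 versus RTX 2080 Ti comparison above into numbers, here is a minimal sketch using only the core counts and prices quoted in this article. The dictionary names and the `ratio` helper are illustrative assumptions, and raw CUDA core counts are not directly comparable across architectures, so treat the core ratio as a rough indicator rather than a performance figure.

```python
# Figures as quoted in the article; the names below are illustrative only.
CUDA_CORES = {
    "RTX 3090": 10496,
    "RTX 3080": 8704,
    "RTX 3070": 5888,
    "RTX 2080 Ti": 4352,
}
PRICE_USD = {"RTX 3070": 499, "RTX 2080 Ti": 1200}


def ratio(new, old):
    """Return new / old as a plain float."""
    return new / old


core_ratio = ratio(CUDA_CORES["RTX 3070"], CUDA_CORES["RTX 2080 Ti"])
price_ratio = ratio(PRICE_USD["RTX 3070"], PRICE_USD["RTX 2080 Ti"])

print(f"RTX 3070 has {core_ratio:.2f}x the CUDA cores of the RTX 2080 Ti")
print(f"RTX 3070 costs {price_ratio:.0%} of the RTX 2080 Ti's price")
```

On those figures the RTX 3070 carries roughly a third more CUDA cores at well under half the price.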
The new GeForce RTX Ampere GPUs come with an astounding core count, double what was originally rumoured. With twice the FP32 units, the RTX 3090 has 10,496 CUDA cores, the RTX 3080 has 8,704, and the RTX 3070 has 5,888. The memory capacity is also relatively chunky, with 24GB and 10GB of GDDR6X on the RTX 3090 and RTX 3080, and 8GB of GDDR6 on the RTX 3070.

The green team is touting a 1.9x performance-per-Watt improvement over Turing, which means that to hit the same 60fps you'll need roughly half the power. But who wants to stand still? The RTX 3090 is reportedly 1.5x faster than the Titan RTX, the RTX 3080 is up to twice as fast as the RTX 2080, and the RTX 3070 can outperform the RTX 2080 Ti.

At the top end of the stack the RTX 3090 is a massive $1,499 for the reference card. The other cards are more reasonable, with the RTX 3080 coming in at $699 and the impressive-looking RTX 3070 at $499.

The RTX 3080 release date is September 17, making it the first of the new Nvidia Ampere GeForce GPUs to hit the shelves. It's the card that Jen-Hsun Huang referred to as "our new flagship GPU" and so it's the one being put front and centre with the new second-gen RTX launch.
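As a rough check on the 1.9x performance-per-Watt claim above, the sketch below works out how much power would be needed to hold the same frame rate. It assumes the quoted gain applies uniformly at the target frame rate, which is a simplification; the function name is hypothetical and nothing here comes from Nvidia's own material beyond the 1.9x figure.

```python
def relative_power_at_same_fps(perf_per_watt_gain):
    """Fraction of the old card's power needed for the same performance.

    Assumes performance-per-Watt improves by a constant factor, so
    power = performance / (performance per Watt).
    """
    if perf_per_watt_gain <= 0:
        raise ValueError("gain must be positive")
    return 1.0 / perf_per_watt_gain


power_fraction = relative_power_at_same_fps(1.9)
print(f"Power needed for the same frame rate: {power_fraction:.1%}")
```

Under that assumption the answer is a little over half, which is where the "almost half the power" shorthand comes from.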
The flagship Nvidia RTX 3080 is being launched first, on September 17, with the Titan-esque RTX 3090 coming later on September 24. The most affordable of the three new Ampere GPUs, the RTX 3070, is slated for launch in October. Though, as the rumour mill is wont to do, future cards are being bandied around in the form of the 16GB RTX 3070 Ti, a potential stop-gap card in between the RTX 3080 and RTX 3070.

FMA on a narrow floating-point type isn’t trivially emulated by FMA on a wider floating-point type, at least not if you need to get denormal results correct, and as far as I know NVIDIA provides FP16 with denormal support. So emulation of FP16 via FP32 seems unlikely, and any microcode implementation seems very unlikely since GPUs typically do not have the machinery for that. Using a scalar FP16 unit to emulate FP16x2 SIMD via a simple state machine would certainly be possible; the emulation of wider SIMD via narrower SIMD has been used extensively in x86 processors, particularly for first-generation implementations.

The most straightforward hypothesis is that the low throughput is simply due to the use of a tiny “native FP16x2” unit that is provided for architectural compatibility, in the same way this is done for double-precision units. The motivation for this approach would be the same as in the case of double precision: differentiate parts by target market, offering small-die, low-power, low-cost GPUs for the mass market, and big-die, higher-power, more fully featured, high-cost GPUs for specialized markets. From what I understand, high FP16x2 throughput is needed primarily for the training phase, for which the relevant major customers presumably buy high-end GPUs, and it would be beneficial to NVIDIA to keep it that way.
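The remark that FMA on a narrow floating-point type isn't trivially emulated by FMA on a wider one can be illustrated with a small host-side sketch. It shows one failure mode only, double rounding, and is not a model of any actual GPU datapath: the operand values are hand-picked for illustration, and the helpers rely on Python's `struct` formats `"e"` (binary16) and `"f"` (binary32) to do the rounding. Note that the addend is the smallest FP16 denormal, so dropping denormal support would change the answer as well.

```python
import struct
from fractions import Fraction


def round_to_fp16(x):
    """Round a Python float (binary64) to the nearest binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Hand-picked FP16 operands (illustrative assumption, not from the source).
a = 1.5
b = 683 / 1024
c = 2.0 ** -24  # smallest positive FP16 denormal

# All three inputs are exactly representable in FP16.
for value in (a, b, c):
    assert round_to_fp16(value) == value

# a*b + c needs 25 significant bits, so binary64 holds it exactly.
exact = Fraction(a) * Fraction(b) + Fraction(c)
assert Fraction(a * b + c) == exact

# A true FP16 FMA rounds the exact result once.
fused_fp16 = round_to_fp16(a * b + c)

# Emulation through an FP32 FMA rounds twice: to FP32, then to FP16.
via_fp32 = round_to_fp16(round_to_fp32(a * b + c))

print("single rounding to FP16:", fused_fp16)
print("FP32 FMA then FP16     :", via_fp32)
```

Here the exact result sits just above the midpoint between two adjacent FP16 values. Rounding to FP32 first lands exactly on that midpoint, and the second rounding then breaks the tie downwards, so the two paths disagree by one FP16 unit in the last place. An emulation would need extra handling to avoid this, which is consistent with the argument that a small native unit is the simpler explanation.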