First of all, take this with great skepticism. Although the leaker was correct about the specifications of the RTX 3000 series and the NVIDIA A100, it is also possible that NVIDIA has fed him misinformation in order to protect its plans for the next generation of NVIDIA GeForce. In any case, we will report the leaked specifications and comment on the changes NVIDIA would have to make to reach them.
Remember that, according to rumors, Lovelace is NVIDIA's next architecture, to be released in 2022 at the earliest on a 5 nm node. We do not know whether that node will come from TSMC or Samsung, but everything points to the former. Be skeptical even about this, because nobody expected the RTX 3000 to use Samsung's 8 nm node.
NVIDIA's possible spectacular leap from Ampere to Lovelace
According to the insider Kopite7kimi on Twitter, the AD102 chip of the GeForce Lovelace architecture will have a 12 * 6 structure instead of the 7 * 6 of the GA102. If this means nothing to you, it is easy to explain using the diagram of the GA102, the GPU used in the RTX 3080, the RTX 3090 and all variants in between.
The NVIDIA GA102 has 7 GPCs, each GPC has 6 units called TPCs, and each TPC includes 2 SMs. What does Kopite's information refer to? It means we would have a total of 12 GPCs with 6 TPCs each. This would indicate that, for the next generation, NVIDIA would seek to fit the largest possible number of units into the chip area, but there are elements that make us skeptical.
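The arithmetic behind the leaked layout can be sketched as follows. This is a minimal illustration, not an official figure: the GA102 numbers come from the text above, while the assumption that AD102 keeps 2 SMs per TPC is ours, and the function name `total_sms` is purely illustrative.

```python
# Hierarchy described in the text: GPC -> TPC -> SM.
SMS_PER_TPC = 2  # stated for GA102; ASSUMED unchanged for the rumored AD102


def total_sms(gpcs, tpcs_per_gpc, sms_per_tpc=SMS_PER_TPC):
    """Return the SM count of a full chip with the given GPC/TPC layout."""
    return gpcs * tpcs_per_gpc * sms_per_tpc


ga102 = total_sms(7, 6)    # 7 * 6 layout of the full GA102
ad102 = total_sms(12, 6)   # rumored 12 * 6 layout of AD102

print(ga102)                     # 84
print(ad102)                     # 144
print(round(ad102 / ga102, 2))   # 1.71, i.e. about 71% more SMs
```

If NVIDIA changed the number of SMs per TPC in Lovelace, the second result would change accordingly.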
The main reason is that NVIDIA does not usually try to fit as many elements as possible into the GPU; what it seeks is to increase the capability of each one of them. The company is currently following a roadmap toward full Ray Tracing support and, despite the power of the RTX 3000, we still have not seen support for things like Ray Tracing coherence or even acceleration units for data frames.
NVIDIA Lovelace is going to need a new interconnect fabric
Kopite's specs imply a very large improvement in the interconnect that links the different GPCs with each other and with the elements they are connected to, such as the L2 cache. One of the reasons the number of cores in a processor is kept limited is the amount of power required as the number of interconnections grows, which is why fewer but more powerful cores are usually preferred.
We do not know whether NVIDIA has found a way for the interconnect that joins the GPCs with the L2 cache and with each other to allow a leap from 84 SMs to 144 SMs in a single generation. It would be the biggest jump NVIDIA has ever made in this regard.
The other possibility is that the 5 nm node does not allow clock speed increases as large as expected, which makes it necessary to increase the number of SM units in the GPU. That, however, would require NVIDIA to rebuild the entire interconnect structure inside the GPU and overcome the current limitations in the energy consumed when transmitting data within a processor.
In any case, we are skeptical. We can believe that NVIDIA will make profound changes to the SMs, and there is a list of things it could do, but such an increase in the number of SMs seems to us today too large a jump without changes to the rest of the GPU.
NVIDIA Lovelace, not so monolithic?
A year ago NVIDIA introduced an experimental chip called RC-18. Its most notable feature is GRS, or Ground-Referenced Signaling, a type of vertical interconnection that NVIDIA used to link several chiplets within an MCM.
The idea is that each chip has 4 complete communication channels (North, East, South, West), each with its own transmitter and receiver, so we would be talking about a NoC-type configuration in which each element can communicate directly with the 4 around it.
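The four-channel arrangement described above corresponds to a 2D mesh, which can be sketched as below. This is only an illustration of the topology: the 4 x 4 grid size and the name `mesh_neighbors` are hypothetical and are not taken from the RC-18 design or from the leak.

```python
def mesh_neighbors(row, col, rows, cols):
    """Return the North/East/South/West neighbors of a node in a 2D mesh.

    Nodes on the edge of the mesh have fewer than 4 neighbors.
    """
    candidates = {
        "north": (row - 1, col),
        "east": (row, col + 1),
        "south": (row + 1, col),
        "west": (row, col - 1),
    }
    return {
        direction: (r, c)
        for direction, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }


# Hypothetical 4 x 4 mesh, chosen only for illustration.
print(mesh_neighbors(1, 1, 4, 4))  # interior node: all 4 channels in use
print(mesh_neighbors(0, 0, 4, 4))  # corner node: only east and south
```

Each node only needs links to its immediate neighbors, so the number of channels per chip stays fixed at 4 regardless of how many chips the mesh contains.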
In the NVIDIA example several chiplets were used, but this does not mean that it cannot be done with an apparently monolithic GPU. Since the GRS links communicate through an interposer underneath, the chip would look monolithic from the outside, but it would actually be a 3DIC composition in which the interconnect structure sits on a chip placed at the bottom.
VRAM beyond GDDR6X
What we cannot forget about these specifications is that this monster of a GPU is going to require a type of VRAM with very high bandwidth in order to keep it fed at a constant rate, and we do not see even GDDR6X feeding a monster of these characteristics. Are we going to see NVIDIA bring its FG-DRAM memory to market for the first time with this GPU?
The problem with FG-DRAM is that it is a type of memory designed to work like HBM, so it would be very expensive. Another possibility would be that, as with RTX IO, on-the-fly data decompression is supported with RAM, but we are talking about bandwidths 50 times higher, and this decompression speed may not be achievable in real time.
That is why NVIDIA Lovelace could come with a new type of VRAM. Whether it will be FG-DRAM or another type we are not sure, but GDDR6X falls short for such a monster if the leaked technical specifications are true.