NVIDIA introduced the Turing RT Cores alongside their first-generation Ray Tracing, or RTX as they say. These small nuclei are vital for ray tracing, so we have decided to address their operation and concept, both in Turing (1st generation) and Ampere (2nd generation).
They debuted with the RTX 2000 and are essential on any NVIDIA graphics card to be able to play using Ray Tracing. After the introduction of the Ampere architecture, NVIDIA showed the 2nd generation of the RT Cores, which came with certain new features for performance purposes. We are going to analyze what they are and what they are used for, as well as see the main differences with Ampere.
What are RT Cores
The Ray Tracing in real-time lands for consumers in 2018 with the Turing range, but there is a key to all this: the RT Cores. After 10 years of intense work at NVIDIA to bring Ray Tracing to light in real-time, RT Cores are the essential pillar for 3 reasons:
- Hybrid rendering pipeline.
- Algorithms to eliminate noise.
- Efficient BVH algorithms.
Simply explained, RT Cores are a piece of hardware that NVIDIA implements in its RTX GPUs to accelerate Ray Tracing in real-time. The RT Core adds a dedicated pipeline (ASIC) to the SM to calculate the intersection of rays and triangles, being able to access the BVH (an algorithm). BVH is used to preprocess a scene in order to speed up the process of finding the closest triangle to light (sun, light bulb, etc.).
It is thanks to BVH that the number of triangles that must be crossed to calculate the closest intersection of each ray is reduced. This makes sense when a scene has a lot of light, and can only be pre-processed once.
This unburdens much to the SM, preventing spend thousands of instructional spaces by lightning. An RT core is made up of 2 specialized units:
- One performs the bounding box tests.
- The second performs tests of the intersection of rays and triangles to report whether the results are correct to the SM.
We find it useful to know how ray tracing works in a concise way:
- A test ray probe (Ray Probe) is launched.
- A box is searched for, decoded, and intersection tests are made.
- Next, a ray intersection test is done.
- The shock is returned.
- Proceed to shade it.
What is the function of the RT Core? Specifically, functions 2, 3 and 4 of the process listed above. Thanks to the RT Cores, the result is returned and allows the shader to apply it. On the other hand, it manages groupings and scheduling of memory operations to try to maximize memory performance in relation to rays.
The RT Cores work like a conventional offload IP block, as the instructions directed to the RT Cores are routed out of the sub-cores. An RT Core receives a beam probe from the SM and autonomously begins to traverse the BVH to perform beam intersection tests.
Within Ray Tracing, memory bandwidth is a bottleneck issue, something that NVIDIA has studied extensively. The problem with the workload derived from Ray Tracing is that it results in irregular or random access to memory, why? Because of the incoherent rays, and that thousands of calculations are made. This is where the API (DXR) comes in, which manages the Ray Tracing application.
Therefore, the idea of NVIDIA was to implement a hardware with a special architecture for special calculations / algorithms in order to achieve more FPS in Ray Tracing. Therefore, there are 2 algorithms to speed up Ray Tracing: BVH (Bounding Volume Hierachy) and Ray Packet Tracing.
Finally, NVIDIA calculates the power or performance of the RT Cores using the " Giga Rays per second " ratio.
Ampere vs Turing, what's new in these RT cores?
In the previous section, the mission was to explain roughly how Ray Tracing works by relating it to the RT Cores, the SM, the BVH algorithm, etc. Now, it is time to see what the 2nd generation RT Cores that equip the RTX 3000 improve. Some of you may ask, why? For Ray Tracing performance.
In our review of the RTX 3060, we saw how it lagged behind the RTX 2070 and 2080 in Ray Tracing at 1080p in some games; however, we saw the 3060 Ti outperform the RTX 2080 Super. Let's see which RT Cores each have and then dig deeper:
RTX 3060 | RTX 2070 | RTX 3060 Ti | RTX 2070 SUPER | RTX 2080 SUPER | |
RT Cores | 28 | 36 | 38 | 40 | 48 |
Actually, the RTX 3060 Ti outperforms the RTX 2070 and 2080 SUPER having fewer RT Cores, is it because the 2nd generation of these cores is more powerful? NVIDIA doesn't go into much detail on how it calculates RT TFLOPs, but Ampere's RT Cores are twice as fast as Turing's.
One of the reasons why the performance of the Ampere RT Cores would have been improved would be by adding additional computational units. In their presentation, NVIDIA discussed an improved MIMD (execution unit) and new triangle interpolation routines, which help with motion blur.
In addition, the Ampere architecture adds support for concurrent RT graphics workloads (RT + compute), which enables improved Ray Tracing performance overall. This would be one of the reasons why the RTX 3060 Ti can outperform the Turing RTX 2070 and 2080 by having fewer RT Cores.
However, that "double performance" of the 2nd generation RT Cores is not seen in the same way in the RTX 3060, even though they have 12 GB GDDR6. Well, you have to take into account the amount of TMUs and ROPs that these graphics cards have, not to mention the bandwidth.
RTX 3060 | RTX 2070 | RTX 3060 Ti | RTX 2070 SUPER | RTX 2080 SUPER | |
TMUs | 112 | 144 | 152 | 160 | 192 |
ROPs | 48 | 64 | 80 | 64 | 64 |
Bandwidth | 192-bit | 256-bit | 256-bit | 256-bit | 256-bit |
We hope this information has been helpful to you. If you have any questions, comment below and we will answer your questions.
Did you know the differences in architectural performance? Did you know how the RT Cores worked?
0 Comments