NVIDIA GeForce RTX 30 series notebook GPU analysis: Stronger performance, stronger AI


At CES 2021, NVIDIA announced the GeForce RTX 3060, its new mainstream sweet-spot card, but the key products of the show were actually the GeForce RTX 30 series notebook GPUs. The desktop GeForce RTX 30 series has already shown what the new NVIDIA Ampere architecture can do: performance roughly doubles compared with the GeForce RTX 20 series. GeForce RTX 30 series notebooks now promise a similar across-the-board speedup.

NVIDIA's first GeForce RTX 30 series notebook GPUs are the GeForce RTX 3080, RTX 3070 and RTX 3060. At CES 2021, NVIDIA stated that the GeForce RTX 3080 and RTX 3070 target 1440p gaming: the former delivers 100+ FPS with ray tracing turned on, while the latter delivers 90 FPS at the highest image quality settings. Gaming notebooks equipped with these two GPUs are already on sale. The GeForce RTX 3060 notebook GPU is aimed at 1080p gaming, where it averages 90 FPS at the highest image quality settings.

Comparison of Specifications of GeForce RTX 30 Laptop Series

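| Model | RTX 3080 Laptop | RTX 3070 Laptop | RTX 3060 Laptop |
|---|---|---|---|
| GPU architecture | Ampere | Ampere | Ampere |
| CUDA Cores | 6144 | 5120 | 3840 |
| Tensor Cores | Third generation | Third generation | Third generation |
| RT Cores | Second generation | Second generation | Second generation |
| Boost frequency (MHz) | 1245-1710 | 1290-1620 | 1283-1703 |
| TGP (W) | 80-150+ | 80-125 | 60-115 |
| Video memory | 16GB GDDR6 / 8GB GDDR6 | 8GB GDDR6 | 6GB GDDR6 |
| Video memory bus width (bit) | 256 | 256 | 192 |
| PCI-E | 4.0 | 4.0 | 4.0 |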

At present, the NVIDIA GeForce RTX 3080 and RTX 3070 Laptop GPUs are the best choice for 1440p gaming on a laptop, delivering smooth frame rates at the highest image quality settings. The features that accompany the NVIDIA Ampere architecture also come to laptops, including NVIDIA Reflex, NVIDIA Broadcast and NVIDIA Studio, along with the newly added third-generation Max-Q technology and Resizable BAR.

NVIDIA Ampere architecture analysis

The GeForce RTX 3080 and RTX 3070 notebook GPUs use the GA104 core, while the GeForce RTX 3060 notebook GPU uses GA106. Details of GA106 have not yet been announced. GA104 is the same GPU used by the desktop GeForce RTX 3070 and RTX 3060 Ti: it has a die area of 392.5 mm², contains 17.4 billion transistors, and is manufactured on Samsung's 8nm process customized for NVIDIA.

The GeForce RTX 3080 laptop GPU uses the full GA104 core. It has 6 GPCs with 4 TPCs each, for a total of 24 TPCs. Each TPC contains 2 SMs (Streaming Multiprocessors), giving 48 SMs in total, and each SM has 128 CUDA cores, for a total of 6144 CUDA cores. Eight 32-bit memory controllers form a 256-bit memory bus.

The GeForce RTX 3070 notebook GPU enables only 40 SMs (Streaming Multiprocessors), for a total of 5120 CUDA cores, but still retains the 256-bit memory bus.
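The unit counts above multiply out as follows. This is a minimal sketch: the class and field names are made up for illustration, and only the numbers come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GA104Layout:
    """Unit counts for the full GA104 core, as quoted in the article."""
    gpcs: int = 6
    tpcs_per_gpc: int = 4
    sms_per_tpc: int = 2
    cuda_cores_per_sm: int = 128
    memory_controllers: int = 8
    bits_per_controller: int = 32

    @property
    def total_sms(self) -> int:
        return self.gpcs * self.tpcs_per_gpc * self.sms_per_tpc

    def cuda_cores(self, enabled_sms=None) -> int:
        # A cut-down part enables fewer SMs than the full die contains.
        sms = self.total_sms if enabled_sms is None else enabled_sms
        if not 0 < sms <= self.total_sms:
            raise ValueError("enabled_sms must be between 1 and total_sms")
        return sms * self.cuda_cores_per_sm

    @property
    def bus_width_bits(self) -> int:
        return self.memory_controllers * self.bits_per_controller


layout = GA104Layout()
print(layout.total_sms)       # 48 SMs on the full die
print(layout.cuda_cores())    # 6144 CUDA cores with every SM enabled
print(layout.cuda_cores(40))  # 5120 CUDA cores with 40 SMs enabled
print(layout.bus_width_bits)  # 256-bit memory bus
```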

Second-generation RTX SM (Streaming Multiprocessor): twice the performance

In August 2018, NVIDIA introduced the RTX concept with the launch of the RTX 20 series graphics cards, bringing real-time ray tracing and AI computing to the GPU, and the SM (Streaming Multiprocessor) changed dramatically as a result. With the NVIDIA Ampere architecture, NVIDIA focused on raising the performance of the whole SM. Although its structure has not changed much, its performance is in a different class. There are three main improvements: the number of FP32 units for traditional graphics computing is doubled, and the second-generation RT Core and the third-generation Tensor Core are introduced.

The NVIDIA Ampere architecture SM (Streaming Multiprocessor) has twice the performance of the Turing architecture SM

GA100 (left) vs. GA102 (right)

Double FP32 unit, double happiness

With the NVIDIA Turing architecture, NVIDIA began splitting calculations by data type. Integer (INT32) and single-precision floating-point (FP32) operations were handed to two separate sets of ALUs, which greatly improved the parallel computing efficiency of the SM (Streaming Multiprocessor). However, modern games mostly use FP32, so the INT32 ALUs are used less than the FP32 ALUs. To improve efficiency, NVIDIA replaced the INT32-only ALUs with new ALUs that support both INT32 and FP32. In other words, there are now two different datapaths: one can handle either integer or single-precision floating-point calculations, and the other can handle only single-precision floating-point calculations.

As before, the SM (Streaming Multiprocessor) is divided into four smaller blocks, each with its own scheduler and register file. On Turing, each block can schedule 16 INT32 ALUs and 16 FP32 ALUs, so the entire SM can process 64 INT32 instructions and 64 FP32 instructions at the same time. On the NVIDIA Ampere architecture, this becomes either 128 FP32 instructions, or 64 INT32 instructions and 64 FP32 instructions. For FP32-heavy graphics workloads, computational throughput can therefore reach twice that of the previous generation.
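The effect of the shared datapath depends on the instruction mix. The sketch below is an idealized per-SM model, not a description of the real scheduler: it assumes 64 lanes per datapath per clock, as quoted above, and ignores latency, dependencies and memory stalls. The function names are hypothetical.

```python
import math


def turing_cycles(fp32_ops: int, int32_ops: int, lanes: int = 64) -> int:
    """Clocks needed when FP32 and INT32 each have a dedicated 64-lane path."""
    return max(math.ceil(fp32_ops / lanes), math.ceil(int32_ops / lanes))


def ampere_cycles(fp32_ops: int, int32_ops: int, lanes: int = 64) -> int:
    """Clocks needed with one FP32-only path and one FP32-or-INT32 path.

    INT32 work can only use the shared path, so it needs at least
    int32_ops / lanes clocks. All work together can use both paths,
    so it needs at least (fp32_ops + int32_ops) / (2 * lanes) clocks.
    """
    int_bound = math.ceil(int32_ops / lanes)
    total_bound = math.ceil((fp32_ops + int32_ops) / (2 * lanes))
    return max(int_bound, total_bound)


# Pure FP32 work: the Ampere-style SM needs half the clocks.
print(turing_cycles(1280, 0), ampere_cycles(1280, 0))      # 20 10
# Even FP32/INT32 mix: no gain, both designs are fully busy.
print(turing_cycles(640, 640), ampere_cycles(640, 640))    # 10 10
# FP32-heavy mix: the gain falls between the two extremes.
print(turing_cycles(960, 320), ampere_cycles(960, 320))    # 15 10
```

In this model the doubling only appears when the workload is almost entirely FP32, which matches the article's wording that throughput "can" reach twice the previous level.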

In addition, NVIDIA has updated how CUDA cores are counted: each FP32 ALU now counts as one CUDA core. As a result, the number of CUDA cores per SM (Streaming Multiprocessor) on the NVIDIA Ampere architecture has doubled to 128.
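Since each CUDA core is one FP32 ALU, a theoretical peak FP32 rate can be estimated from the core count and clock. The sketch below assumes the usual convention of two floating-point operations per core per clock (one fused multiply-add) and uses the core counts and boost clock ranges from the specification table. The function name is hypothetical, and real throughput depends on each notebook's TGP and cooling.

```python
def peak_fp32_tflops(cuda_cores: int, boost_mhz: float) -> float:
    """Theoretical peak: cores x 2 FLOPs (one FMA) x clock, in TFLOPS."""
    return cuda_cores * 2 * boost_mhz * 1e6 / 1e12


# (CUDA cores, lowest boost MHz, highest boost MHz) from the table
LAPTOP_GPUS = {
    "RTX 3080 Laptop": (6144, 1245, 1710),
    "RTX 3070 Laptop": (5120, 1290, 1620),
    "RTX 3060 Laptop": (3840, 1283, 1703),
}

for name, (cores, low_mhz, high_mhz) in LAPTOP_GPUS.items():
    low = peak_fp32_tflops(cores, low_mhz)
    high = peak_fp32_tflops(cores, high_mhz)
    print(f"{name}: {low:.2f} to {high:.2f} TFLOPS")
```

For the RTX 3080 Laptop this works out to roughly 15.3 to 21.0 TFLOPS across the boost range, which shows how much the configured power limit matters for the same chip.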

To keep pace with the larger number of compute units, NVIDIA has also improved the cache system of each SM (Streaming Multiprocessor). The shared memory/L1 data cache of the NVIDIA Ampere architecture SM has grown from 96KB to 128KB, and its bandwidth has doubled.

The second-generation RT Core significantly improves ray tracing efficiency

With the NVIDIA Turing architecture, NVIDIA introduced the RT Core, which accelerates real-time ray tracing. For ray tracing calculations such as finding the intersection points between rays and objects, modern SIMD-based CUDA cores are too inefficient; dedicated compute modules based on a MIMD architecture are more efficient. NVIDIA's RT Core is such a dedicated hardware unit, designed to accelerate real-time ray tracing calculations.

The RT Core on NVIDIA Ampere architecture GPUs mainly adds hardware acceleration for motion blur. Without ray tracing, motion blur is often just a post-processing filter applied to the image, and the effect is not realistic. With real-time ray tracing, motion blur is produced by calculating in real time how moving objects interact with light. This calculation is very complex, and even the RT Core on Turing struggles with it. On the NVIDIA Ampere architecture, the second-generation RT Core adds an interpolation algorithm designed by NVIDIA, which improves real-time ray tracing efficiency in this case while preserving the accuracy of the motion blur. NVIDIA says it can be up to 8 times as fast as the previous generation. In addition, the new-generation RT Core can be twice as fast in basic BVH calculations.
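The article does not describe NVIDIA's interpolation algorithm, so the following is only a sketch of the general idea behind ray-traced motion blur. It assumes each ray carries a timestamp within the frame, linearly interpolates the triangle's vertices to that time, and then runs a standard Möller-Trumbore ray-triangle test. All function names are hypothetical, and this is not a model of the hardware.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def triangle_at(tri_start, tri_end, t):
    """Linearly interpolate each vertex between frame start (t=0) and end (t=1)."""
    return tuple(
        tuple(s + (e - s) * t for s, e in zip(v_start, v_end))
        for v_start, v_end in zip(tri_start, tri_end)
    )


def ray_hits_triangle(origin, direction, tri, eps=1e-9):
    """Moller-Trumbore test: return the hit distance, or None on a miss."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:          # ray is parallel to the triangle plane
        return None
    inv = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    dist = dot(e2, q) * inv
    return dist if dist > eps else None


def blur_coverage(origin, direction, tri_start, tri_end, samples=8):
    """Fraction of time samples in which the ray hits the moving triangle."""
    hits = 0
    for i in range(samples):
        t = (i + 0.5) / samples
        if ray_hits_triangle(origin, direction, triangle_at(tri_start, tri_end, t)) is not None:
            hits += 1
    return hits / samples


# A triangle that slides 2 units along x during the frame.
start = ((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0))
end = ((2.0, 0.0, 5.0), (3.0, 0.0, 5.0), (2.0, 1.0, 5.0))
ray_origin, ray_dir = (0.25, 0.25, 0.0), (0.0, 0.0, 1.0)

print(ray_hits_triangle(ray_origin, ray_dir, triangle_at(start, end, 0.0)))  # hit at distance 5.0
print(ray_hits_triangle(ray_origin, ray_dir, triangle_at(start, end, 1.0)))  # None: triangle has moved away
print(blur_coverage(ray_origin, ray_dir, start, end))                        # 0.125
```

A pixel whose rays hit the object in only a fraction of the time samples ends up partially blended with the background, which is where the blur comes from. Doing the interpolation per ray is what makes the effect expensive in software.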

The third-generation Tensor Core brings a leap in AI performance

Starting with the NVIDIA Volta architecture, NVIDIA has included Tensor Cores optimized for AI computing in the SM (Streaming Multiprocessor). These tensor computing units improve the efficiency of graphics cards in machine learning workloads. On the NVIDIA Ampere architecture, the Tensor Core has evolved to its third generation, which can provide 4 times the performance of the second-generation Tensor Core. However, the Tensor Cores on gaming cards have been trimmed down somewhat: their FP16 FMA throughput is only half that of the Tensor Cores in the GA100 core.


Beyond the raw performance improvement, the third-generation Tensor Core also adds support for sparse matrix operations. For a detailed introduction, see our previous analysis of the compute-oriented NVIDIA Ampere architecture: "A simple interpretation of the NVIDIA new-generation Ampere architecture: An improved and revolutionary structure upgrade." Overall, even though the gaming-oriented NVIDIA Ampere architecture reduces the number of Tensor Cores per SM (Streaming Multiprocessor) from 8 to 4, overall Tensor performance is still greatly improved.
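The article does not spell out the sparsity format. The sketch below assumes the 2:4 fine-grained structured pattern that NVIDIA documents for Ampere, in which at most two of every four consecutive weights are non-zero. It prunes by magnitude and then computes a dot product using only the kept values. The helper names are hypothetical, and this illustrates the data layout only, not Tensor Core behavior.

```python
def prune_2_of_4(row):
    """Zero the two smallest-magnitude values in every group of four."""
    if len(row) % 4 != 0:
        raise ValueError("row length must be a multiple of 4")
    pruned = []
    for start in range(0, len(row), 4):
        group = row[start:start + 4]
        # Indices of the two largest magnitudes (stable sort keeps ties in order).
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned


def compress(pruned):
    """Store only the kept values plus their positions (two per group of four)."""
    values, indices = [], []
    for start in range(0, len(pruned), 4):
        kept = [j for j in range(4) if pruned[start + j] != 0.0][:2]
        for j in kept:
            values.append(pruned[start + j])
            indices.append(start + j)
    return values, indices


def sparse_dot(values, indices, x):
    """Dot product that touches only the stored non-zero weights."""
    return sum(v * x[i] for v, i in zip(values, indices))


weights = [0.1, -0.9, 0.4, 0.05, 2.0, 0.0, -3.0, 1.0]
x = [1, 2, 3, 4, 5, 6, 7, 8]

pruned = prune_2_of_4(weights)
values, indices = compress(pruned)
print(pruned)    # [0.0, -0.9, 0.4, 0.0, 2.0, 0.0, -3.0, 0.0]
print(indices)   # [1, 2, 4, 6]
print(sparse_dot(values, indices, x))
```

Half of the multiplications are skipped, which is where the potential speedup for sparse networks comes from. Whether accuracy survives the pruning depends on the network and on retraining.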

DLSS 2.0

The more powerful AI computing power of the new Tensor Cores benefits DLSS. In 2020, NVIDIA began to promote DLSS 2.0 across the board. Compared with the original DLSS, DLSS 2.0 improves both image quality and rendering efficiency, so it is no longer the marginal feature it was once considered to be. It can significantly improve game performance and keep games smooth at 1440p. With DLSS turned on, the rendering load on the GPU is lower, which can effectively reduce GPU power consumption while gaming and thereby extend battery life.

First, DLSS 2.0 greatly improves efficiency and processing speed. NVIDIA claims it runs twice as fast as the original version, so in actual games the frame rate is higher at the same settings.

Next is better super-sampling image quality. DLSS 2.0 extends the upscaling factor and supports 4x resolution scaling: a frame rendered at 1080p can be upscaled to 4K by DLSS 2.0, which greatly saves GPU resources and delivers higher frame rates.

Most importantly, DLSS 2.0 no longer requires a model to be trained for each individual game. All games now use a single model, which greatly lowers the barrier for game developers to adopt DLSS. In the future, integrating DLSS will be a very simple matter.
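The "4x" figure refers to pixel count rather than to width or height. A quick check, using the standard pixel dimensions for each resolution name (the helper names are made up for illustration):

```python
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
}


def pixel_count(name: str) -> int:
    width, height = RESOLUTIONS[name]
    return width * height


def upscale_factor(render: str, output: str) -> float:
    """How many output pixels are produced per rendered pixel."""
    return pixel_count(output) / pixel_count(render)


print(pixel_count("1080p"))           # 2073600 pixels shaded per frame
print(pixel_count("4K"))              # 8294400 pixels displayed per frame
print(upscale_factor("1080p", "4K"))  # 4.0
```

Each axis is only doubled, but the GPU shades a quarter of the pixels it would at native 4K, which is where the saved GPU resources come from.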

More parallel rendering pipeline

Handing different types of calculations to different units is an approach NVIDIA has used since the Volta architecture. The Tensor Core introduced at that time offloaded many AI-related operations, and the RT Core introduced afterward offloaded real-time ray tracing calculations. Can these units execute in parallel? Yes, but not all operations can be executed in parallel.

As shown in the figure above, when a Turing GPU has real-time ray tracing and DLSS turned on, its RT Core and Tensor Core do not work in parallel. The Tensor Core is invoked near the end of the rendering process and does not run at the same time as the RT Core.

On the NVIDIA Ampere architecture, NVIDIA has improved the parallelism between the units inside the GPU. The three major units (the traditional compute units, the RT Core and the Tensor Core) can now work at the same time, further shortening frame rendering time.

The third generation of Max-Q technology

Max-Q is a system-level technology that provides excellent performance for thin and light gaming laptops. Every part of the notebook, from the chip, software and PCB design to power delivery and the cooling system, is specially optimized for power and performance. The third-generation Max-Q technology uses AI and new system optimizations to introduce WhisperMode 2.0 and Dynamic Boost 2.0, allowing high-performance gaming laptops to perform far beyond previous generations.

Dynamic Boost 2.0

With the GeForce RTX 30 series notebook GPUs, NVIDIA also introduced Dynamic Boost 2.0 for gaming notebooks. In most mainstream gaming notebooks, the GPU and CPU share the cooling system and the power budget, so power may be allocated poorly between them. Some games are more GPU-bound, while others lean more on the CPU in certain scenes, yet notebooks generally use a fixed power split: no matter which side has the higher demand, it receives no additional power.

NVIDIA's Dynamic Boost 2.0 uses AI to analyze how a game is running, based on the performance requirements of different games and different scenes, and automatically adjusts the power split between the GPU and CPU, as well as between the GPU and GPU memory, to maximize the operating efficiency of each. NVIDIA claims this can bring up to a 16% performance improvement.
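NVIDIA has not published how Dynamic Boost 2.0 decides on its split, so the following is only a toy contrast between a fixed split and a demand-driven one. The budget, the minimum power floors, the demand weights and the function names are all hypothetical values chosen for illustration.

```python
def fixed_split(total_w: float, gpu_share: float = 0.75) -> dict:
    """Static allocation: the same split regardless of workload."""
    gpu_w = total_w * gpu_share
    return {"cpu": total_w - gpu_w, "gpu": gpu_w}


def dynamic_split(total_w: float, cpu_demand: float, gpu_demand: float,
                  cpu_min_w: float = 15.0, gpu_min_w: float = 60.0) -> dict:
    """Give each side a minimum, then share the rest in proportion to demand."""
    spare = total_w - cpu_min_w - gpu_min_w
    if spare < 0:
        raise ValueError("budget is smaller than the combined minimums")
    total_demand = cpu_demand + gpu_demand
    cpu_fraction = 0.5 if total_demand <= 0 else cpu_demand / total_demand
    cpu_w = cpu_min_w + spare * cpu_fraction
    return {"cpu": cpu_w, "gpu": total_w - cpu_w}


budget = 150.0  # hypothetical shared CPU+GPU budget in watts
print(fixed_split(budget))                                   # same answer for every scene
print(dynamic_split(budget, cpu_demand=1, gpu_demand=3))     # GPU-heavy scene
print(dynamic_split(budget, cpu_demand=3, gpu_demand=1))     # CPU-heavy scene
```

The point of the contrast is that the total never exceeds the shared budget; only the division between the two chips follows the workload.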

WhisperMode 2.0

WhisperMode 2.0 takes noise control on gaming laptops to a new level. WhisperMode has been completely redesigned and is custom-tuned at the system level for each notebook. However, not all notebooks equipped with RTX 30 series GPUs will offer this feature; the decision is left to the manufacturer. After you choose the noise level you want, the AI-driven algorithm of WhisperMode 2.0 manages the CPU, GPU, system temperature and fan speed to deliver that noise level while still maintaining excellent performance.

In the past, the silent mode of gaming laptops was mainly achieved by reducing the fan speed, which would limit system performance or cause temperature rise. WhisperMode 2.0 is a more complex system-level controller that maximizes performance within the noise level selected by the user. The AI-driven algorithm can dynamically manage CPU power, GPU power, system temperature, and system fan speed to provide the best experience at the selected noise level.

In addition to the noise level, users can also adjust the minimum frame rate target to ensure a smooth gaming experience, providing users with an ultra-efficient mode that makes the laptop quieter during gaming and creation.

Resizable BAR

To take advantage of the high-speed PCI-E interface, NVIDIA also announced Resizable BAR support with this RTX 30 series GPU update. It allows the CPU to access the GPU's entire video memory directly over PCI-E, rather than through a small fixed-size window, so data exchange between the two is more direct. As game assets keep growing, transfers in the traditional access mode can easily end up queued. Resizable BAR lets the CPU and GPU work together more efficiently in games.

However, this technology does not depend on GPU hardware alone; it also requires optimization and cooperation from notebook manufacturers in motherboard design and from game developers. New gaming notebooks and games are expected to gain Resizable BAR support through patches within this year.

NVIDIA Reflex

Alongside the GeForce RTX 30 series desktop GPUs, NVIDIA released something very important for esports games, or more precisely for esports players: NVIDIA Reflex. This technology now comes to notebooks as well. So what exactly is NVIDIA Reflex? It consists of two parts, one hardware and one software.

The hardware part is the Reflex Latency Analyzer, which is very similar to the LDAT tool we have used and can be regarded as an advanced version of it. It is built directly into the monitor and measures the time between the player's mouse click and the resulting change on screen, that is, the total latency of the entire system.

The software part is the NVIDIA Reflex SDK, which both reduces and measures rendering latency; developers can integrate it directly into their games. When its low-latency mode is turned on, the CPU and GPU are kept in sync, which greatly shortens the render queue and thereby reduces rendering latency.
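To see why a shorter render queue lowers latency, consider a deliberately simplified model: a frame submitted by the CPU must wait for every frame already queued ahead of it before the GPU renders it. The function name and the numbers below are hypothetical; this is not the Reflex SDK API and not a measurement.

```python
def render_latency_ms(fps: float, queued_frames: int) -> float:
    """Time from CPU submission to the end of GPU rendering, in milliseconds.

    Assumes the GPU is the bottleneck and each frame takes 1000 / fps ms,
    so a new frame waits for `queued_frames` frames and then renders itself.
    """
    frame_time_ms = 1000.0 / fps
    return (queued_frames + 1) * frame_time_ms


print(render_latency_ms(fps=100, queued_frames=2))  # 30.0 ms with two frames queued
print(render_latency_ms(fps=100, queued_frames=0))  # 10.0 ms with an empty queue
```

In this model the frame rate is identical in both cases; only the waiting time changes, which is why latency can drop without any gain in FPS.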

NVIDIA Broadcast

NVIDIA Broadcast is aimed at live streamers. It uses the AI capabilities of the RTX GPU to remove or replace the streamer's background, reframe the camera image automatically, and remove background noise from the microphone.

Once installed, the NVIDIA Broadcast software sits between the camera, headset and streaming software, allowing these external devices to use the AI capabilities of the RTX GPU for AI-enhanced effects. Headphones and microphones gain noise reduction: the AI distinguishes the main audio from background noise and removes the noise, so that the streamer and the audience hear clear, useful sound.

The camera gains automatic reframing and background processing. The image captured by the camera can be processed by Broadcast and then passed on to streaming software such as OBS, which makes the streamer's background more vivid and flexible while also reducing the cost of setting up a streaming scene.

RTX Studio

NVIDIA proposed the RTX Studio concept in 2019, in the belief that GeForce RTX series GPUs are not limited to gaming. With hardware demand in the content creation market growing in recent years, NVIDIA hopes RTX series graphics cards can also benefit content creators, and RTX Studio notebooks are aimed at the creative needs of individual creators and studio users. The NVIDIA Ampere architecture significantly improves all three main parts of the GPU: the SM (Streaming Multiprocessor) for general computing, the RT Core for ray tracing and the Tensor Core for AI. Mainstream creative applications can be accelerated further as a result.

Among the creative applications supported by RTX Studio, many mainstream programs already use these three capabilities of RTX series GPUs. For example, the video editing software Premiere Pro supports CUDA-based Mercury hardware acceleration, the 3D animation software Blender can use the RT Core to speed up rendering, and DaVinci Resolve, Photoshop and Lightroom use the Tensor Core for faster and more accurate AI features.

The three major speedups of the RTX 30 series GPU therefore help creative applications that use these capabilities run faster, and some applications also gain new features. For example, Blender supports the motion blur acceleration of the second-generation RT Core, so it copes better with motion blur when rendering 3D animations with fast-moving scenes. DLSS, the AI-based super-sampling technology, can now also be applied to creative work: the interior design and rendering software D5 Render is the first 3D renderer to support DLSS, which greatly increases the frame rate of its real-time preview.

Finally, the RTX 30 series GPU upgrades the built-in NVDEC to the fifth generation, which supports hardware AV1 decoding of HDR video at resolutions up to 8K. This is a great help to video post-production workers who need 8K HDR playback. The existing seventh-generation NVENC hardware encoder makes video export up to five times faster and helps reduce the performance cost of live streaming.

The powerful performance of GeForce RTX 30 series notebook GPUs allows gaming notebooks to maintain higher frame rates with the highest image quality and ray tracing enabled. At the same time, Dynamic Boost 2.0 in the third-generation Max-Q technology automatically adjusts GPU and CPU power to maximize the efficiency of each, WhisperMode 2.0 maximizes performance within the noise level selected by the user, and Resizable BAR lets the CPU and GPU work together more efficiently in games. With their more powerful AI performance, RTX 30 series notebook GPUs can bring a qualitative change to the user experience in both games and content creation.
