What kind of GPU is the best choice for AI training?


In 2020, what kind of GPU is the best choice for artificial intelligence training?

To speed up the matrix operations at the heart of deep learning, NVIDIA added dedicated mixed-precision units, called Tensor Cores, to its newer microarchitectures. For AI training, it is therefore best to choose a GPU with Tensor Cores.
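The idea behind mixed precision can be illustrated without a GPU. The sketch below is a software emulation only, not how Tensor Cores are programmed: it uses Python's `struct` module to round values to half precision (FP16) and single precision (FP32), and compares a dot product accumulated in FP16 with one that keeps FP16 inputs but accumulates in FP32. The function names and the test vectors are assumptions made for this illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_single(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_fp16_accumulate(a, b):
    """Dot product with FP16 inputs, FP16 products and an FP16 accumulator."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_half(to_half(x) * to_half(y))
        acc = to_half(acc + product)
    return acc


def dot_fp32_accumulate(a, b):
    """Dot product with FP16 inputs but an FP32 accumulator (mixed precision)."""
    acc = 0.0
    for x, y in zip(a, b):
        product = to_half(x) * to_half(y)  # product of two FP16 values, kept unrounded
        acc = to_single(acc + product)
    return acc


# Illustrative inputs (an assumption): many small, equal terms.
a = [0.01] * 4096
b = [0.01] * 4096

# Reference value computed in double precision from the same FP16 inputs.
reference = sum(to_half(x) * to_half(y) for x, y in zip(a, b))

print("reference      :", reference)
print("FP16 accumulate:", dot_fp16_accumulate(a, b))
print("FP32 accumulate:", dot_fp32_accumulate(a, b))
```

With an FP16 accumulator, small terms stop registering once the running sum grows large relative to them, which is why mixed-precision hardware keeps the inputs in a narrow format but accumulates in a wider one.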

Today's state-of-the-art deep learning models take up a huge amount of memory, and many GPUs that used to be considered powerful now fall short on memory. This article looks at which GPUs can train such models without running into out-of-memory errors, and which are better suited to PCs and small workstations. The core conclusion is that video memory size matters a great deal: it is now the limiting factor in training many deep learning models.

Deep learning has advanced so quickly that the days when 12GB of video memory was enough for almost everything are over. As of February 2020, you need to spend at least $2,500 on Nvidia's latest Titan RTX just to barely run the industry's best-performing models, and it is hard to imagine what the situation will look like by the end of the year.
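To get a feel for why video memory runs out so quickly, here is a rough back-of-the-envelope sketch. It assumes FP32 training with an Adam-style optimizer, which stores the weights, the gradients and two optimizer states per parameter, and it ignores activations, which often need a lot of additional memory. The function name and the two parameter counts are illustrative assumptions, not figures from this article.

```python
def training_memory_gib(num_params, bytes_per_value=4, optimizer_states=2):
    """Lower-bound estimate of training memory in GiB.

    Counts one copy each of weights and gradients plus `optimizer_states`
    extra copies (2 for Adam). Activations are NOT included.
    """
    copies = 1 + 1 + optimizer_states
    return num_params * bytes_per_value * copies / 2**30


# Cards and memory sizes taken from the list in this article.
cards = [("GeForce RTX 2080 Ti", 11), ("Titan RTX", 24)]

# Illustrative model sizes (assumptions for this sketch).
models = [("340M-parameter model", 340_000_000),
          ("1.5B-parameter model", 1_500_000_000)]

for model_name, num_params in models:
    need = training_memory_gib(num_params)
    print(f"{model_name}: at least {need:.2f} GiB before activations")
    for card_name, vram_gb in cards:
        verdict = "may fit" if need < vram_gb else "does not fit"
        print(f"  {card_name} ({vram_gb}GB): {verdict}")
```

Even under these optimistic assumptions, a model with around 1.5 billion parameters leaves almost no headroom on a 24GB card and does not fit on an 11GB card at all.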

Consumer grade
For individual users, Nvidia's consumer-grade GeForce series is the first choice. Cheaper options are:

GeForce RTX 2080 Ti: $1200, 11GB VRAM, Turing microarchitecture (with Tensor Core support)
Titan RTX: $2500, 24GB VRAM, Turing microarchitecture (with Tensor Core support)
Note that these consumer-grade cards are poorly suited to multi-card parallel training, because by default they do not support direct communication between cards. If card 1 needs to send data to card 2, the data is first copied from card 1's video memory back to main memory over the PCI-E bus, and then copied from main memory to card 2's video memory over PCI-E again. This wastes time and slows down communication between cards. The 2080 Ti and Titan RTX have poor support for P2P (Peer-to-Peer) communication over PCI-E, but that does not mean they lack NVLink support: users can build a direct communication channel between cards by purchasing an NVLink bridge. Some people call this limitation a design flaw in the two GPUs, while others believe Nvidia did it on purpose to push users who need multi-card parallel computing toward the Tesla series of GPUs.
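Whether two cards in a given machine can talk to each other directly can be checked in software. The following is a minimal sketch that assumes PyTorch is installed; it uses `torch.cuda.can_device_access_peer` to query every pair of visible GPUs and returns an empty list when PyTorch or CUDA is unavailable. The helper name `peer_access_pairs` is made up for this example.

```python
def peer_access_pairs():
    """Return (src, dst, can_access) for every ordered pair of visible CUDA GPUs.

    Returns an empty list if PyTorch is not installed or no CUDA device is found.
    """
    try:
        import torch  # assumption: PyTorch is the framework in use
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    count = torch.cuda.device_count()
    return [
        (src, dst, torch.cuda.can_device_access_peer(src, dst))
        for src in range(count)
        for dst in range(count)
        if src != dst
    ]


pairs = peer_access_pairs()
if not pairs:
    print("Fewer than two CUDA GPUs visible (or PyTorch not installed).")
for src, dst, ok in pairs:
    route = "direct P2P" if ok else "staged through main memory"
    print(f"GPU {src} -> GPU {dst}: {route}")
```

A pair reported as not peer-accessible will typically fall back to the slower path described above, with data passing through main memory.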

Data center GPU products are more expensive and aimed at enterprise users. They have more video memory and better support for multi-card parallelism.

Quadro RTX 6000: $4000, 24GB VRAM, Turing microarchitecture (with Tensor Core support)
Quadro RTX 8000: $5500, 48GB VRAM, Turing microarchitecture (with Tensor Core support)
Tesla V100: available with 16GB or 32GB of video memory, in PCI-E and NVLink versions, Volta microarchitecture (with Tensor Core support)
Tesla V100S: 32GB VRAM, PCI-E bus, Volta microarchitecture (with Tensor Core support)
Enterprise-grade GPUs generally have to be installed in servers or workstations, which are not cheap themselves; servers that support the Tesla platform in particular cost around 100,000 yuan. This does not include costs such as machine room construction and electricity.
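Since video memory is the main constraint, one simple way to compare the cards listed above is price per GB of video memory. The sketch below uses only the prices and memory sizes quoted in this article; the Tesla cards are left out because no price is given for them here. It ignores compute performance and multi-card support, so it is a rough comparison rather than a buying recommendation.

```python
# (name, price in USD, video memory in GB), as quoted in this article.
CARDS = [
    ("GeForce RTX 2080 Ti", 1200, 11),
    ("Titan RTX", 2500, 24),
    ("Quadro RTX 6000", 4000, 24),
    ("Quadro RTX 8000", 5500, 48),
]


def price_per_gb(price_usd, vram_gb):
    """Dollars paid per GB of video memory."""
    return price_usd / vram_gb


# Sort from cheapest to most expensive per GB.
ranked = sorted(CARDS, key=lambda card: price_per_gb(card[1], card[2]))

for name, price, vram in ranked:
    print(f"{name:22s} ${price_per_gb(price, vram):7.2f} per GB")
```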

In May 2020, at GTC 2020, NVIDIA announced its new-generation Ampere microarchitecture and the Tesla A100 GPU. The A100 offers stronger AI training and inference capabilities, and a single A100 can be partitioned into up to 7 independent GPU instances to handle different computing tasks.

For readers who need to run parallel training across multiple cards, a Tesla series card that supports NVLink is recommended.

For deep learning research, the GeForce RTX 2080 Ti (11GB) is probably the entry-level standard, and the Titan RTX (24GB) is a good option that balances price, memory and computing performance. For enterprise users, cards such as the Quadro RTX 8000 (48GB) and Tesla V100 (32GB) suit researchers working at the cutting edge of deep learning. Nvidia's new computing platform is due to ship in the second half of 2020; the new products will bring more powerful performance and should also push down the prices of existing products.

At a time when physical hardware is expensive, perhaps we should turn our attention to cloud GPUs.
