AI chip characteristics and comparison
At present, the field of intelligent driving mainly uses general-purpose chips suited to parallel computing, such as GPUs and FPGAs, to accelerate deep learning AI algorithms. At the same time, some chip companies have begun to design ASICs (application-specific integrated circuits) dedicated to AI algorithms, such as Google's TPU and Horizon's BPU. Until intelligent driving applications scale up and reach mass deployment, using existing general-purpose chips such as GPUs and FPGAs avoids the high investment and high risk of developing custom chips (ASICs). However, because these general-purpose chips were not originally designed for deep learning, they suffer from problems such as insufficient performance and high power consumption, and these problems will become increasingly prominent as autonomous driving applications expand.
This article introduces AI chips from several perspectives, including chip types, performance, applications, and suppliers, and is intended as a primer for newcomers to the industry.
2. What is an artificial intelligence (AI) chip?
Broadly speaking, chips that can run AI algorithms are called AI chips.
At present, ordinary CPUs, GPUs, FPGAs, and so on can all execute AI algorithms, but their execution efficiency varies greatly.
In a narrow sense, however, AI chips are generally defined as "chips specially designed to accelerate AI algorithms."
At present, AI chips are mainly used in fields where AI algorithms are widespread, such as speech recognition, natural language processing, and image processing, where chip acceleration improves algorithm efficiency. The main task of an AI chip is matrix and vector multiply-add operations, supplemented by a few other operations such as division and exponentiation. In image recognition and related fields, convolutional neural networks (CNNs) are commonly used. A mature AI algorithm consists of large numbers of convolution, residual, and fully connected computations, all of which are essentially multiplications and additions.
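To make the "essentially multiplications and additions" point concrete, here is a minimal pure-Python sketch that writes a small 2-D convolution and a fully connected layer out as explicit multiply-accumulate (MAC) steps and counts them. The function names and the toy inputs are hypothetical, chosen only for illustration; real frameworks and accelerators perform the same arithmetic in bulk.

```python
# Minimal sketch: convolution and fully connected layers reduced to
# multiply-accumulate (MAC) operations. Names and data are illustrative only.

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in most CNN frameworks).

    Returns the output feature map and the number of MACs performed.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out, macs = [], 0
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]  # one multiply-add
                    macs += 1
            row.append(acc)
        out.append(row)
    return out, macs


def fully_connected(x, weights, bias):
    """y = W x + b, computed one multiply-add at a time."""
    y, macs = [], 0
    for w_row, b in zip(weights, bias):
        acc = b
        for w, v in zip(w_row, x):
            acc += w * v  # one multiply-add
            macs += 1
        y.append(acc)
    return y, macs


image = [[1, 2, 3, 0],
         [4, 5, 6, 1],
         [7, 8, 9, 2],
         [1, 0, 1, 3]]
kernel = [[1, 0],
          [0, -1]]
fmap, conv_macs = conv2d_valid(image, kernel)
print(fmap, conv_macs)

y, fc_macs = fully_connected([1, 2, 3], [[1, 0, 1], [0, 1, 0]], [0.5, -1])
print(y, fc_macs)
```

Even this 4x4 toy input with a 2x2 kernel needs 36 MACs; a real camera frame passed through dozens of layers needs billions, which is exactly the workload an AI chip is built to repeat quickly.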
For the automotive industry, AI chips are mainly used to handle the heavy parallel computing load generated by intelligent driving algorithms such as environmental perception, sensor fusion, and path planning.
An AI chip can be thought of as a calculator that performs multiplication and addition quickly, whereas a CPU must decode and execute a very complex instruction set, a much harder task. Neither the CPU nor the GPU (which was designed for graphics processing) is a dedicated AI chip: both contain a great deal of logic for other functions that is useless to current AI algorithms. Many GPU-based solutions have already been developed specifically for AI algorithms, and some companies develop on FPGAs, but the industry will ultimately need chips dedicated to AI algorithms.
3. Why use AI chips?
From a functional point of view, artificial intelligence involves two stages, training and inference, and the same is true of the intelligent driving industry. In the training stage, a complex neural network model is trained on big data; at present, most companies use NVIDIA GPU clusters for this. In the inference stage, the trained model is used to draw conclusions from large amounts of new data. Training therefore places high demands on the chip's computing performance, while inference demands fast execution of simple, repetitive calculations with low latency.
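The asymmetry between the two stages can be seen even in a toy model. The sketch below assumes a one-variable linear model fitted by plain gradient descent as a stand-in for a neural network (a deliberate simplification, not how driving models are actually trained): training loops over the data thousands of times, while inference is a single forward pass.

```python
# Toy contrast between training and inference. Assumption: a linear model
# y = w*x + b fitted by gradient descent stands in for a neural network.

def predict(w, b, x):
    return w * x + b  # inference: one forward pass


def train(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w*x + b by minimising mean squared error with gradient descent.

    Returns the learned parameters and the number of forward passes performed.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    forward_passes = 0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            forward_passes += 1
            grad_w += 2 * err * x / n  # d(MSE)/dw
            grad_b += 2 * err / n      # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, forward_passes


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]  # data generated from y = 2x + 1
w, b, passes = train(xs, ys)
print(f"learned w={w:.3f}, b={b:.3f} after {passes} forward passes")
print("inference on x=10:", predict(w, b, 10.0))  # a single forward pass
```

Training here costs 10,000 forward passes plus gradient updates; inference costs one. Scaled up to real networks and data sets, this is why training goes to GPU clusters while inference needs fast, low-latency repetition of the same forward computation.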
In terms of application scenarios, AI chips are deployed both in the cloud and on the device side; in intelligent driving, these correspond to cloud servers and the various in-vehicle computing platforms or domain controllers. Training deep learning models for intelligent driving requires huge amounts of data and computation that no single processor can handle alone, so training can only be carried out on cloud servers. On the device side, that is, in the vehicle, there are many terminals such as ECUs and DCUs with widely varying demands, and because the vehicle must respond in real time, inference cannot be left to the cloud. The electronic units, hardware computing platforms, and domain controllers in the vehicle must therefore have independent inference capability, and dedicated AI chips are needed to meet these inference demands.
Traditional CPUs and GPUs can both execute AI algorithms, but they are slow and inefficient; CPUs in particular cannot be used commercially in intelligent driving.
For example, autonomous driving requires recognizing roads, pedestrians, traffic lights, and other road and traffic conditions, which are parallel computations within the driving algorithm. If a CPU performed these computations, the car would probably hit the pedestrian before any result was produced; slow parallel computing is an inherent weakness of the CPU. A GPU is much faster, since it was designed for the parallel computing of image processing. However, a GPU consumes too much power for a car's battery to support it for long periods, and it is relatively expensive: if used in mass-produced autonomous vehicles, it would put the cars out of reach of ordinary consumers. In addition, because the GPU is not an ASIC developed specifically for AI algorithms, its speed on AI computations is still short of what is achievable, and there is room for improvement.
In intelligent driving, deep learning applications such as environment perception and object recognition require fast computational responses. Time is life: being even one step too slow can have irreversible consequences. At the same time, while delivering fast performance and high efficiency, the chip must not consume too much power or significantly reduce the vehicle's driving range; in other words, AI chips must have low power consumption. The GPU is therefore not the best AI chip for intelligent driving, and the development of ASICs has become inevitable.
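A small worked calculation shows why computation latency matters. The sketch below computes how far a vehicle travels at constant speed while perception results are still being computed; the 100 km/h speed and the latency values are illustrative assumptions, not measurements of any particular chip.

```python
# Distance travelled while the perception pipeline is still computing.
# Speeds and latencies are illustrative assumptions, not chip measurements.

def distance_during_latency(speed_kmh: float, latency_ms: float) -> float:
    """Distance in metres covered at constant speed during `latency_ms`."""
    speed_mps = speed_kmh * 1000 / 3600  # km/h -> m/s
    return speed_mps * latency_ms / 1000  # ms -> s


for latency in (10, 100, 500):
    d = distance_during_latency(100, latency)
    print(f"100 km/h, {latency:>3} ms latency -> {d:.2f} m before a result is available")
```

At 100 km/h the car covers about 2.8 m for every 100 ms of computation, so a slow chip directly eats into the distance available for braking or steering.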
4. Types of AI chips
The current mainstream AI chips fall into three main categories: GPUs, FPGAs, and ASICs. GPUs and FPGAs are relatively mature architectures that appeared earlier and are general-purpose chips, while ASICs are customized for specific AI scenarios. The industry broadly agrees that CPUs are not suited to AI computing, yet they remain indispensable in AI applications. Some also count brain-like (neuromorphic) chips as a further category, although these are in fact a kind of ASIC.
FPGA (Field Programmable Gate Array) offers both sufficient computing power and sufficient flexibility. FPGAs compute quickly because they are, in essence, instruction-free architectures that need no shared memory. State is kept in the FPGA's registers and on-chip memory (BRAM), which belong to their own control logic, so no unnecessary arbitration or caching is required. This is why an FPGA can be faster than a GPU. An FPGA is also semi-custom hardware: programming defines the configuration of its compute units and how they are connected, which makes it highly flexible. Compared with a GPU, an FPGA can handle both control and computation, but the development cycle is relatively long and complex algorithms are difficult to implement.
ASIC (Application Specific Integrated Circuit) is an integrated circuit specially designed and manufactured for the needs of a product. It can be optimized for specific functions, giving higher processing speed and lower energy consumption. Its disadvantages are high research and development cost and a long up-front development cycle; and because it is customized, it is hard to reuse, so the up-front investment can be amortized and unit costs reduced only when production volumes are large enough.
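The volume argument can be expressed as a simple amortization formula: effective unit cost = NRE / volume + recurring per-unit cost, where NRE is the one-off (non-recurring engineering) development cost. The sketch below compares this with an FPGA that is assumed to carry no NRE; all cost figures are hypothetical placeholders, not real prices of any chip.

```python
# Why ASICs pay off only at volume: one-off development (NRE) cost is spread
# over every unit shipped. All cost figures are hypothetical placeholders.

def unit_cost(nre: float, per_unit: float, volume: int) -> float:
    """Effective cost per chip = amortised NRE + recurring per-unit cost."""
    return nre / volume + per_unit


def break_even_volume(nre_asic: float, unit_asic: float, unit_fpga: float) -> float:
    """Volume above which the ASIC is cheaper per unit than the FPGA
    (assuming, for simplicity, that the FPGA option has no NRE of its own)."""
    if unit_fpga <= unit_asic:
        raise ValueError("the ASIC never breaks even unless its unit cost is lower")
    return nre_asic / (unit_fpga - unit_asic)


NRE_ASIC = 50_000_000.0  # hypothetical one-off design + mask cost
UNIT_ASIC = 20.0         # hypothetical recurring cost per ASIC
UNIT_FPGA = 300.0        # hypothetical price per FPGA

for v in (10_000, 100_000, 1_000_000):
    print(f"{v:>9} units: ASIC {unit_cost(NRE_ASIC, UNIT_ASIC, v):8.2f} vs FPGA {UNIT_FPGA:.2f}")
print("break-even volume:", round(break_even_volume(NRE_ASIC, UNIT_ASIC, UNIT_FPGA)))
```

With these made-up numbers the ASIC is far more expensive per unit at 10,000 or 100,000 units and only becomes cheaper beyond roughly 180,000 units, which mirrors the qualitative point in the text.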
4.1 CPU (Central Processing Unit)
As the computing and control core of a computer system, the central processing unit is the final execution unit for information processing and program execution. The CPU is the core hardware unit that controls and allocates all of the computer's hardware resources (such as memory and input/output units) and performs general-purpose computation.
Advantages: The CPU has a large number of caches and complex logic control units, and is very good at logic control and serial operations
Disadvantages: Not good at complex arithmetic operations and processing parallel repeated operations.
For AI workloads, the CPU has the weakest computing power. Although CPUs have the highest clock frequencies, a single CPU typically has only 8 or 16 cores. At 3.5 GHz per core, 16 cores give 56 G clock cycles per second; once instruction cycles are taken into account, the maximum is about 30 G multiplications per second, and this ceiling is fixed by the hardware.
Manufacturer: Intel, AMD
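The estimate above follows the formula peak throughput = cores x clock frequency x operations completed per cycle. The sketch below reproduces the arithmetic; the per-cycle factor is an assumption backed out from the 30 G figure, and it reflects only this simple scalar model (CPUs with SIMD vector units can complete several multiplications per cycle).

```python
# Back-of-the-envelope peak throughput: cores x clock x ops per cycle.
# The ops-per-cycle factor is an assumption matching the simple scalar model
# in the text; SIMD units would raise it.

def peak_ops_per_second(cores: int, clock_hz: float, ops_per_cycle: float) -> float:
    return cores * clock_hz * ops_per_cycle


cycles = peak_ops_per_second(16, 3.5e9, 1.0)
print(f"16 cores x 3.5 GHz = {cycles / 1e9:.0f} G cycles/s")

# About 30 G multiplications/s corresponds to roughly 0.54 multiplications
# per cycle per core once instruction overhead is counted.
mults_per_cycle = 30e9 / cycles
mults = peak_ops_per_second(16, 3.5e9, mults_per_cycle)
print(f"~{mults / 1e9:.0f} G multiplications/s ({mults_per_cycle:.2f} per cycle per core)")
```

The point of the formula is that the only levers are core count, clock speed, and work per cycle; AI accelerators raise the last factor by orders of magnitude with thousands of parallel multiply-add units.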
4.2 GPU (Graphics Processing Unit)
The graphics processor, also known as the display core, visual processor, or display chip, is a microprocessor specialized for image- and graphics-related computation in personal computers, workstations, game consoles, and some mobile devices (such as tablets and smartphones).
Advantages: It provides the infrastructure for multi-core parallel computing, and the number of cores is very large, which can support parallel computing of large amounts of data, and has higher floating-point computing capabilities.
Disadvantages: weakest management and control capability, highest power consumption.
Manufacturer: AMD, NVIDIA
4.3 FPGA (Field Programmable Gate Array)
FPGA is a further development of programmable devices such as PAL and GAL. It emerged as a semi-custom circuit within the field of application-specific integrated circuits (ASICs), which both addressed the shortcomings of fully custom circuits and overcame the limited gate count of earlier programmable devices.
Advantages: unlimited reprogramming, low latency, both pipeline parallelism and data parallelism (GPUs offer only data parallelism), the strongest real-time performance, and the highest flexibility
Disadvantages: difficult to develop, mainly suited to fixed-point operations, expensive
Manufacturer: Altera (acquired by Intel), Xilinx
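To illustrate what "fixed-point operations" means in practice, the sketch below quantizes two float vectors to signed 8-bit integers with one symmetric scale factor each, computes their dot product with integer multiply-accumulate only, and rescales once at the end. The symmetric int8 scheme is one common choice assumed here for illustration, not the only way FPGA designs handle numbers.

```python
# Fixed-point sketch: symmetric int8 quantisation of a dot product with an
# integer accumulator. The quantisation scheme is an illustrative assumption.

def quantize(values, num_bits=8):
    """Map floats to signed integers sharing one symmetric scale factor."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for int8
    max_abs = max(abs(v) for v in values) or 1.0
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale


def int_dot(qa, qb):
    """Integer multiply-accumulate: no floating-point work inside the loop."""
    acc = 0
    for a, b in zip(qa, qb):
        acc += a * b
    return acc


a = [0.5, -1.25, 2.0, 0.75]
b = [1.0, 0.5, -0.25, 2.0]
qa, sa = quantize(a)
qb, sb = quantize(b)
approx = int_dot(qa, qb) * sa * sb  # rescale the integer result once
exact = sum(x * y for x, y in zip(a, b))
print(exact, approx)
```

The integer result lands within about 1.5% of the floating-point answer here, which is why inference workloads can often run in fixed point on hardware whose arithmetic units favour integers.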
4.4 ASIC (Application Specific Integrated Circuit)
ASIC, that is, an application-specific integrated circuit, is an integrated circuit designed and manufactured for specific user requirements and the needs of a specific electronic system. Using CPLDs (Complex Programmable Logic Devices) and FPGAs (Field Programmable Gate Arrays) for ASIC design is currently one of the most popular approaches.
Advantages: As a product that combines integrated circuit technology with a specific user's complete machine or system technology, an ASIC offers smaller size, lighter weight, lower power consumption, higher reliability, higher performance, stronger confidentiality, and lower cost than general-purpose integrated circuits.
Disadvantages: not flexible enough, more expensive than an FPGA unless production volumes are large
Key performance indicators: power consumption, speed, cost
Manufacturers: Google, Horizon, Cambricon, etc.
4.5 Summary of the Characteristics of the Four Chip Types
The CPU is like an excellent, versatile leader: its strength lies in scheduling, management, and coordination, with computing power secondary. The GPU is like an employee with "a lot of computing power" who is scheduled by the CPU.
As an image processor, the GPU was originally designed to cope with the need for large-scale parallel computing in image processing. Therefore, it has three limitations when applied to deep learning algorithms:
First, the advantages of parallel computing cannot be fully exploited during inference. Deep learning involves two computing stages: training and application (inference). GPUs are very efficient for training deep learning models, but during inference they typically process only one input image at a time, so their parallelism cannot be fully exploited.
Second, the hardware structure is fixed and not programmable. Deep learning algorithms are not yet fully stable; if they change significantly, a GPU cannot reconfigure its hardware as flexibly as an FPGA.
Third, GPUs run deep learning algorithms with far lower energy efficiency than FPGAs. Research in academia and industry has shown that, for the same deep learning performance, a GPU consumes much more power than an FPGA. For example, the FPGA-based AI chip from the Chinese start-up Shenjian Technology (DeePhi Tech) achieved an order-of-magnitude improvement in energy efficiency over GPUs of the same period.
FPGAs were originally designed to serve as semi-custom chips, that is, chips whose hardware structure can be reconfigured flexibly, in real time, as needed.
Research reports show that the FPGA market is currently dominated by Xilinx and Altera, which together hold 85% of the market. Altera was acquired by Intel for $16.7 billion in 2015, while Xilinx chose to partner with IBM. Both moves underline the important position of FPGAs in the AI era.
Although the outlook for FPGAs is bright, and even Baidu Brain and Horizon's AI chips were developed on FPGA platforms, FPGAs were not designed specifically for deep learning algorithms and still have many limitations in practice:
First, the basic units have limited computing power. To make the device reconfigurable, an FPGA contains a large number of extremely fine-grained basic units, but the computing power of each unit (which relies mainly on lookup tables, or LUTs) is far lower than that of the ALUs in CPUs and GPUs.
Second, FPGAs still lag far behind dedicated custom chips (ASICs) in speed and power consumption.
Third, FPGAs are relatively expensive: in large-scale production, a single FPGA costs much more than a dedicated custom chip.
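To see why a LUT-based unit has limited computing power, it helps to recall what a LUT is: a k-input lookup table is a 2^k-entry memory indexed by the input bits, so it can implement any single-bit function of k inputs and nothing more. The sketch below models hypothetical 3-input LUTs "programmed" with the sum and carry bits of a one-bit full adder; it is a software model for illustration, not vendor tooling.

```python
# Software model of FPGA lookup tables (LUTs). A k-input LUT stores one output
# bit for each of the 2**k input patterns. Illustrative model only.

def program_lut(func, k):
    """Precompute func for every k-bit input pattern (the LUT's contents)."""
    table = []
    for index in range(2 ** k):
        bits = [(index >> i) & 1 for i in range(k)]  # bit i = input i
        table.append(func(*bits) & 1)
    return table


def lut_eval(table, *inputs):
    """Evaluate the LUT: combine input bits into an address and read it."""
    index = 0
    for i, bit in enumerate(inputs):
        index |= (bit & 1) << i
    return table[index]


# Two LUTs together implement a one-bit full adder.
sum_lut = program_lut(lambda a, b, cin: a ^ b ^ cin, 3)
carry_lut = program_lut(lambda a, b, cin: (a & b) | (cin & (a ^ b)), 3)
print(sum_lut)  # the 8 stored bits of the sum function
print(lut_eval(sum_lut, 1, 1, 0), lut_eval(carry_lut, 1, 1, 0))
```

Two LUTs yield only a one-bit addition, so a wide multiplier built purely from LUTs consumes a large number of them; this is the fine-grained cost of reconfigurability described in the first limitation.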
Custom AI chips are a major trend; judging by the direction of development, they will be the main path forward for computing chips.