AI designs the graphics cards, graphics cards run the AI! Has Nvidia closed the loop on chip design?

June 21, 2022

A look at Nvidia's research and development has become a regular feature of the company's spring GTC conference each year.

Bill Dally, Nvidia's chief scientist and senior vice president of research, gives an annual overview of the company's R&D organization along with some details on its current priorities.

In 2022, Dally focused primarily on the AI tools Nvidia is developing and using to improve its own products.

Seen one way, this is a clever piece of reverse marketing: Nvidia has gone from a company that makes graphics cards to run AI tools to a company that uses AI tools to make graphics cards.

Yes, Nvidia has started using AI to improve and speed up its own graphics card designs.

In his 2022 GTC presentation, Dally said: "Our team is a group of about 300 people trying to look ahead of where Nvidia's products are today. We're a bit like high beams, trying to illuminate things in the far distance. The team is loosely organized into two halves.

The supply side is responsible for providing technology for the graphics cards themselves. It makes the graphics card better, from circuits and VLSI design methodologies to architecture, networks, programming systems, and the storage systems that feed graphics cards and graphics-card-based software."

"The demand side tries to drive demand for Nvidia products by developing software systems and techniques that need graphics cards to run well.

For example, we have three different graphics research groups, because we keep pushing the boundaries of computer graphics. We also have five different AI groups, because using GPUs to run AI is a huge thing right now, and it is only going to get bigger. We also have groups doing robotics and self-driving cars."

"We also have some geo-focused labs, like our Toronto and Tel Aviv AI labs."

Occasionally, Nvidia pulls people from several of these groups together for a moonshot-style project. One such project produced Nvidia's real-time ray-tracing technology.


Organization chart of the ray-tracing project

As always, the 2022 talk repeated some of what Dally said the previous year, but it also contained new information. The department, for example, has clearly grown a lot from around 175 people in 2019. Nvidia's efforts to develop self-driving systems and robots have undoubtedly intensified as well. Dally didn't say much about CPU design work, which is no doubt also being stepped up.

What follows is a small portion of Dally's presentation on Nvidia's increasing use of AI in designing chips.

Mapping voltage drop

"As experts in artificial intelligence, we naturally want to use AI to design better chips," Dally said.

Nvidia's graphics card design department does this in a few different ways. The first and most obvious way is to take existing computer-aided design tools and incorporate AI models into them.

For example, the design team has a tool in which an AI model takes a map of where power is consumed in an Nvidia GPU and predicts, in real time, how far the voltage on the power grid will drop.

This voltage drop, often called IR drop, is the current multiplied by the resistance. Calculating it with a traditional CAD tool takes three hours. Because chip design is an iterative process, relying on the traditional tool for every iteration makes the workload very heavy for the design team.
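The relationship itself is just Ohm's law, V = I × R. As a rough illustration only (this is not Nvidia's CAD flow), the Python sketch below computes the drop along a single power rail fed from one end. The function name, the segment resistances, and the tap currents are all made up for the example.

```python
def ir_drop_along_rail(segment_resistance_ohm, tap_current_a):
    """Return the voltage drop (in volts) seen at each tap of a rail fed from one end.

    segment_resistance_ohm[i] is the resistance of the wire segment leading to tap i.
    tap_current_a[i] is the current drawn at tap i.
    """
    if len(segment_resistance_ohm) != len(tap_current_a):
        raise ValueError("need exactly one segment resistance per tap")
    drops = []
    cumulative_drop = 0.0
    for i, resistance in enumerate(segment_resistance_ohm):
        # Segment i carries the current of tap i and of every tap beyond it.
        downstream_current = sum(tap_current_a[i:])
        cumulative_drop += downstream_current * resistance  # V = I * R
        drops.append(cumulative_drop)
    return drops


# Hypothetical rail: three taps, 10 milliohms per segment.
print(ir_drop_along_rail([0.01, 0.01, 0.01], [1.0, 2.0, 1.0]))
```

A real power grid is a two-dimensional mesh with millions of nodes, which is why the exact calculation is slow and why a fast approximate predictor is attractive.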


Concept illustration

What Nvidia's design team wants instead is an AI model trained on the same data. The developers train it on a number of designs, after which they can simply feed it a power map. The resulting inference time is only three seconds.

Of course, if you include the time needed for feature extraction from the power map, it is 18 minutes. Either way, that is very fast compared with the traditional tools.
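To put those figures side by side, here is the arithmetic in Python, using only the times quoted above (the constant and function names are mine):

```python
def speedup(baseline_seconds, new_seconds):
    """How many times faster the new method is than the baseline."""
    return baseline_seconds / new_seconds


CAD_TOOL_S = 3 * 60 * 60       # three hours on the traditional CAD tool
INFERENCE_ONLY_S = 3           # AI inference alone
WITH_EXTRACTION_S = 18 * 60    # AI inference plus the extraction step

print(speedup(CAD_TOOL_S, INFERENCE_ONLY_S))   # 3600.0
print(speedup(CAD_TOOL_S, WITH_EXTRACTION_S))  # 10.0
```

So the end-to-end gain is about 10x, and the extraction step, not the neural network, dominates the remaining time.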

In a similar vein, the design team has also tried a graph neural network instead of a convolutional neural network. The goal here is to estimate how often the different nodes in a circuit switch, which is what drives the power input used in the previous example.

Here too, Nvidia's developers get very accurate circuit power estimates in a fraction of the time traditional tools need.
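Why switching activity determines power: the standard first-order model for dynamic power in CMOS logic is P = α · C · V² · f, where α is the average number of power-consuming transitions per clock cycle at a node. The Python sketch below applies that textbook formula to a few nodes. The node names, capacitances, supply voltage, and clock frequency are invented for illustration; in the flow Dally describes, the activity factors would come from the neural network rather than being typed in.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, clock_hz):
    """First-order CMOS dynamic power in watts: P = alpha * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * clock_hz


# Hypothetical nodes: activity factor and load capacitance in farads.
nodes = {
    "clk_buf": {"activity": 1.0, "capacitance_f": 2e-15},
    "adder_sum": {"activity": 0.25, "capacitance_f": 4e-15},
    "idle_flag": {"activity": 0.01, "capacitance_f": 1e-15},
}
VDD_V = 0.8        # assumed supply voltage
CLOCK_HZ = 1.5e9   # assumed clock frequency

power_per_node = {
    name: dynamic_power_w(n["activity"], n["capacitance_f"], VDD_V, CLOCK_HZ)
    for name, n in nodes.items()
}
total_power_w = sum(power_per_node.values())
print(power_per_node)
print(total_power_w)
```

A node that rarely switches contributes almost nothing, so a good per-node activity estimate translates directly into a good power map.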

Predicting parasitics

One of Dally's favorite projects, because he spent considerable time as a circuit designer some years ago, uses a graph neural network to predict the parasitics (circuit elements that were not intended or anticipated at design time) that will be attached to a circuit in the final product.

In the past, circuit design was a very iterative process. The designer would draw a schematic full of transistor symbols, but would not know how the circuit would actually perform.

Only after a layout designer had taken the schematic, produced the layout, and extracted the parasitics could the circuit designer run simulations and discover which parts failed to meet the design specification.

The designer would then go back, revise the schematic, and trouble the layout designer to run the whole process again: a very long, iterative, and punishingly labor-intensive cycle.

What Nvidia's design team can do now is train a neural network to predict the parasitics, without asking a layout designer to produce a layout just to uncover the flaws.

Circuit designers can therefore iterate very quickly, without repeating the manual design-layout-redesign loop.

According to Dally, the AI's predictions of these parasitics are now very accurate compared to the ground truth.
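For readers unfamiliar with graph neural networks, the core operation is message passing: every node updates its feature vector using the features of its neighbors. The Python sketch below performs one such round on a tiny made-up circuit graph. It is not Nvidia's model. The graph, the features, and the identity weight matrices are placeholders; a real predictor would stack several layers with trained weights and end in a regression output such as a capacitance per net.

```python
def message_passing_round(features, edges, weight_self, weight_neigh):
    """One round of mean-aggregation message passing followed by ReLU.

    features: list of per-node feature vectors (lists of floats)
    edges: list of undirected (u, v) node-index pairs
    weight_self, weight_neigh: matrices as lists of rows, shape [out_dim][in_dim]
    """
    node_count = len(features)
    in_dim = len(features[0])
    neighbors = [[] for _ in range(node_count)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    def matvec(matrix, vector):
        return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

    updated = []
    for node in range(node_count):
        if neighbors[node]:
            # Average the neighbors' features, one dimension at a time.
            mean_neigh = [
                sum(features[m][k] for m in neighbors[node]) / len(neighbors[node])
                for k in range(in_dim)
            ]
        else:
            mean_neigh = [0.0] * in_dim
        own = matvec(weight_self, features[node])
        incoming = matvec(weight_neigh, mean_neigh)
        updated.append([max(0.0, a + b) for a, b in zip(own, incoming)])  # ReLU
    return updated


# Hypothetical 3-node graph: node 1 is connected to nodes 0 and 2.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for trained weights
node_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(message_passing_round(node_features, [(0, 1), (1, 2)], IDENTITY, IDENTITY))
```

Because a circuit is naturally a graph of devices and nets, this kind of model can work directly on the schematic, before any layout exists.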

Place/Route Congestion

Nvidia can also predict the congestion that placement and routing will cause, which is critical to laying out a chip.

In the normal flow, the chip designer takes a netlist and runs it through placement and routing. This is usually very time consuming and can take several days.

Only then do designers see the actual congestion and discover that the initial placement is not good enough.

Designers then have to rework it and place the macros differently to avoid the red areas (shown below). A red area is one where too many wires are trying to cross a given region, a kind of traffic jam for bits.


With AI assistance, there is no need to run placement and routing. The team can take the netlist and use a graph neural network to predict roughly where congestion will occur, and the results are quite accurate.

It's not perfect, but it shows which areas have problems. Design teams can then take action in specific areas and iterate quickly without having to repeat global placement and routing over and over again.
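To make the "red areas" concrete, here is a deliberately crude congestion estimate in Python that needs no routing: for every tile of a grid it counts how many net bounding boxes cover that tile, then flags tiles whose count exceeds a capacity. This is a simple bounding-box heuristic, not the neural network Dally describes, and the grid size, pin positions, and capacity are hypothetical.

```python
def congestion_map(nets, grid_width, grid_height):
    """Count, for every tile, how many net bounding boxes cover it.

    nets: list of nets, each a list of (x, y) pin tile coordinates.
    Returns a grid indexed as demand[y][x].
    """
    demand = [[0] * grid_width for _ in range(grid_height)]
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        for y in range(min(ys), max(ys) + 1):
            for x in range(min(xs), max(xs) + 1):
                demand[y][x] += 1
    return demand


def hot_tiles(demand, capacity):
    """Return the (x, y) tiles whose routing demand exceeds the capacity."""
    return [
        (x, y)
        for y, row in enumerate(demand)
        for x, tile_demand in enumerate(row)
        if tile_demand > capacity
    ]


# Four hypothetical nets on a 3x3 grid of tiles.
example_nets = [
    [(0, 0), (2, 0)],
    [(1, 0), (1, 2)],
    [(0, 0), (2, 2)],
    [(1, 1), (2, 1)],
]
demand = congestion_map(example_nets, 3, 3)
print(demand)                 # [[2, 3, 2], [1, 3, 2], [1, 2, 1]]
print(hot_tiles(demand, 2))   # [(1, 0), (1, 1)]
```

The flagged tiles are where a designer would consider moving macros before committing to a full, days-long place-and-route run.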

Chip standard cell design automation

The methods above use AI to evaluate human designs. More exciting still is the prospect of using AI to produce the designs themselves.

Dally cites two examples of how AI can accomplish chip design.

The first system, called NVCell, combines simulated annealing and reinforcement learning to design a library of standard cells for digital integrated circuits.

Every time Nvidia moves to a new process technology, say from 7 nanometers to 5 nanometers, designers have to create a new cell library.

A cell is something like an AND gate, an OR gate, or a full adder.

Right now, Nvidia has many thousands of these cells. Each one has to be redesigned for the new technology, following a complex set of design rules.
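Simulated annealing, one of the two ingredients of NVCell named above, is a general optimization method: propose a random change, always keep it if it helps, and sometimes keep it even if it hurts, with that tolerance shrinking as a "temperature" cools. The Python sketch below is a generic textbook version applied to a toy problem, ordering devices in a single row so that connected devices sit close together. The device names, connections, cost function, and cooling schedule are all assumptions for illustration. It is not NVCell, and it leaves out the reinforcement learning part entirely.

```python
import math
import random


def wirelength(order, connections):
    """Total distance, in row slots, between every connected pair of devices."""
    position = {device: i for i, device in enumerate(order)}
    return sum(abs(position[a] - position[b]) for a, b in connections)


def anneal_row_order(devices, connections, steps=2000, start_temp=2.0,
                     cooling=0.995, seed=0):
    """Search for a low-wirelength ordering with simulated annealing."""
    rng = random.Random(seed)
    current = list(devices)
    current_cost = wirelength(current, connections)
    best, best_cost = list(current), current_cost
    temperature = start_temp
    for _ in range(steps):
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]  # propose a swap
        delta = wirelength(current, connections) - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost += delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
        temperature *= cooling
    return best, best_cost


# Hypothetical devices and connections.
devices = ["M1", "M2", "M3", "M4", "M5", "M6"]
connections = [("M1", "M4"), ("M4", "M2"), ("M2", "M6"), ("M6", "M3"), ("M3", "M5")]
best_order, best_cost = anneal_row_order(devices, connections)
print(wirelength(devices, connections), "->", best_cost, best_order)
```

Real cell layout has far more constraints than a single wirelength number, which is one reason a second technique is needed to clean up the result.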

The second is the use of reinforcement learning models to design transistor layouts.

Designers mostly use reinforcement learning to place the transistors. More importantly, however, once the transistors are placed there are usually a lot of design rule errors, and finding and fixing them is like playing a video game.

This is something reinforcement learning is good at. A well-known example is the application of reinforcement learning to Atari video games.

So it is like an Atari video game, except that this time the game is fixing design rule errors in a standard cell.

With reinforcement learning, Nvidia's designers can work through the design rule errors as they arise and fix them. In this way, the tool can essentially complete the design of a standard cell.
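The "game" has a simple structure: look at the layout, find a design rule violation, make a move that fixes it, and repeat until none remain. The Python sketch below plays that loop on a one-dimensional toy with a single hypothetical rule, a minimum spacing between neighboring shapes, and a fixed greedy policy. It uses no reinforcement learning; in the approach Dally describes, a learned policy would choose the moves instead of the hard-coded one here.

```python
def find_spacing_violations(positions, min_spacing):
    """Return each index i where shape i+1 sits closer than min_spacing to shape i."""
    return [
        i for i in range(len(positions) - 1)
        if positions[i + 1] - positions[i] < min_spacing
    ]


def fix_spacing(positions, min_spacing, max_moves=1000):
    """Greedily push shapes to the right until no spacing violation remains.

    Returns the fixed positions and the number of moves that were needed.
    """
    fixed = sorted(positions)
    moves = 0
    while moves < max_moves:
        violations = find_spacing_violations(fixed, min_spacing)
        if not violations:
            break
        i = violations[0]
        fixed[i + 1] = fixed[i] + min_spacing  # one move: shift the right-hand shape
        moves += 1
    return fixed, moves


# Hypothetical shape positions and a minimum spacing of 3 units.
fixed_positions, move_count = fix_spacing([0, 1, 2, 10], 3)
print(fixed_positions, move_count)  # [0, 3, 6, 10] 2
```

In a real cell, fixing one violation can create others and the rules interact in two dimensions, which is what makes a learned policy more attractive than a fixed rule like this one.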

As shown in the figure below, the tool can complete 92% of the cell library without any design rule errors or circuit errors.

In addition, 12% of the cells came out smaller than the manually designed versions. Overall, in terms of cell complexity, the AI-designed cells are as good as the human-designed ones.


This helps designers accomplish two things.

First, AI saves a great deal of labor. Without AI, it would take a team of 10 people about a year to build the cell library for a new technology. Today, designers can do the job with a few GPUs running for a few days.

After that, humans only need to step in for the remaining 8% of cells that the AI failed to complete.

Second, in most cases the AI-assisted designs turn out better.

So AI not only saves time and effort, it also produces better results.

Is chip design closing the loop on itself?

