Chips and talent are in short supply as companies race to jump on the AI bandwagon. Startup SambaNova says its new processor could help businesses get their own large language model (LLM) up and running in just a few days.
The Palo Alto-based company, which has raised more than $1 billion in venture capital, won’t sell the chip directly to companies. Instead, it sells access to its bespoke technology stack, which includes proprietary hardware and software specifically designed to run the largest AI models.
This technology stack has now received a major upgrade with the launch of the company’s new SN40L processor. Built on Taiwanese chip giant Taiwan Semiconductor Manufacturing Co.’s 5-nanometer process, each device features 102 billion transistors across 1,040 cores and reaches speeds of up to 638 teraflops. It also features a new three-tier memory system designed to cope with the huge data streams associated with AI workloads.
SambaNova claims that a node made up of just eight of these chips can support models with up to 5 trillion parameters, nearly three times the reported size of OpenAI’s GPT-4 LLM. And that’s with a sequence length (a measure of how much input a model can handle) of up to 256,000 tokens. Doing the same thing with industry-standard GPUs would require hundreds of chips, says CEO Rodrigo Liang, putting SambaNova’s total cost of ownership at less than 1/25th that of the industry-standard approach.
“A trillion parameters is actually not a big model if you can run it on eight [chip] sockets,” Liang says. “We’re reducing the cost structure and really reshaping the way people think about this, so that models with trillions of parameters no longer look inaccessible.”
The new chip uses the same dataflow architecture the company has relied on for its previous processors. SambaNova’s underlying thesis is that existing chip designs focus too much on facilitating the flow of instructions, when for most machine-learning applications the efficient movement of data is the bigger bottleneck.
To get around this problem, the company’s chips feature a tiled array of memory and compute units connected by a high-speed switching fabric, which allows the way the units are linked to be dynamically reconfigured depending on the problem at hand. This works in tandem with the company’s SambaFlow software, which can analyze a machine-learning model and determine the optimal way to connect the units so that data flows smoothly and hardware utilization is maximized.
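As a loose illustration of the dataflow idea (a sketch of ours, not SambaNova’s actual toolchain; the graph, tile names, and mapping below are invented), one can picture each operator in a model being pinned to its own compute tile, with activations streaming from tile to tile instead of making a round trip to external memory after every step:

```python
# Toy illustration of dataflow-style execution: each operator is pinned to
# its own "tile," and activations stream from tile to tile instead of making
# a round trip to external memory between steps. All names are invented.

import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((512, 512)).astype(np.float32)
W2 = rng.standard_normal((512, 512)).astype(np.float32)

# A tiny model graph, already "placed": one operator per compute tile,
# listed in the order the switching fabric chains them together.
pipeline = [
    ("tile_0", lambda x: x @ W1),            # matmul tile holding W1
    ("tile_1", lambda x: np.maximum(x, 0)),  # ReLU tile
    ("tile_2", lambda x: x @ W2),            # matmul tile holding W2
]

def run_dataflow(x):
    # Stream the activation through the chained tiles; no intermediate
    # result is ever written back to off-chip memory.
    for _tile, op in pipeline:
        x = op(x)
    return x

out = run_dataflow(rng.standard_normal((1, 512)).astype(np.float32))
print(out.shape)  # (1, 512)
```

On a real dataflow machine, the point of such a chain is that the intermediate results never leave the chip, which is exactly the data-movement saving SambaNova’s thesis targets.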
The main difference between the company’s latest chip and its predecessor, the SN30, besides the move from a 7-nm to a 5-nm process, is the addition of a third memory tier. The previous chip paired 640 MB of on-chip SRAM with a terabyte of external DRAM; the SN40L has 520 MB of on-chip memory, 1.5 TB of external memory, and an additional 64 GB of high-bandwidth memory (HBM).
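Those capacities square with the eight-socket, 5-trillion-parameter claim, at least on a back-of-envelope basis (the arithmetic below is ours, and assumes 16-bit weights; SambaNova hasn’t published its accounting):

```python
# Back-of-envelope check of the eight-socket claim, using figures from the
# article; the 2-bytes-per-parameter (bfloat16) assumption is ours.

params = 5e12            # 5 trillion parameters
bytes_per_param = 2      # bfloat16 weights: 16 bits = 2 bytes
weights_tb = params * bytes_per_param / 1e12
print(f"weights alone: {weights_tb:.0f} TB")              # 10 TB

sockets = 8
dram_per_chip_tb = 1.5   # external DRAM per SN40L
node_dram_tb = sockets * dram_per_chip_tb
print(f"eight-socket node DRAM: {node_dram_tb:.0f} TB")   # 12 TB
```

At roughly 10 TB of weights against 12 TB of node-level DRAM, the model fits, leaving the SRAM and HBM tiers to keep the hot working set close to the compute.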
Memory is increasingly becoming a key differentiator for AI chips, as the growing size of generative AI models means that moving data around often drags on performance more than raw computing power does. This is pushing companies to increase both the quantity and the speed of the memory in their chips. SambaNova is not the first to turn to HBM to combat this so-called memory wall, and its new chip carries less of it than its competitors’: Nvidia’s industry-leading H100 GPU offers 80 GB of HBM, while AMD’s forthcoming MI300X GPU will feature 192 GB. SambaNova wouldn’t disclose bandwidth figures for its memory, so it’s difficult to judge how it compares with other chips.
But even though it leans more heavily on slower external memory, what sets SambaNova’s technology apart, Liang says, is a software compiler that can intelligently divide the load among the three memory tiers. A proprietary interconnect between the company’s chips also allows the compiler to treat an eight-processor configuration as if it were a single system. “The performance in training is going to be fantastic,” says Liang. “Inference will be a real obstacle.”
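SambaNova hasn’t detailed how its compiler makes those placement decisions, but in spirit the problem resembles a tiering pass like the hypothetical one below, which assigns each tensor to the fastest memory tier with room for it (tier capacities are the SN40L’s, from the article; the greedy policy and tensor names are invented for illustration):

```python
# Hypothetical three-tier placement pass: greedily put the most frequently
# accessed tensors in the fastest tier that still has room. The policy is
# our invention for illustration; the capacities are the SN40L's.

TIERS = [("SRAM", 520e6), ("HBM", 64e9), ("DDR", 1.5e12)]  # bytes

def place(tensors):
    """tensors: list of (name, size_bytes, accesses_per_step)."""
    free = {name: cap for name, cap in TIERS}
    plan = {}
    # Hottest tensors first, so they land in the fastest tier available.
    for name, size, heat in sorted(tensors, key=lambda t: -t[2]):
        for tier, _cap in TIERS:
            if free[tier] >= size:
                free[tier] -= size
                plan[name] = tier
                break
    return plan

print(place([
    ("kv_cache",    40e9,   1000),  # reused every decode step -> HBM
    ("activations", 200e6,  800),   # small and hot -> SRAM
    ("weights",     1.2e12, 1),     # huge, streamed once per step -> DDR
]))
```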
SambaNova was also cagey about how it handles another hot topic in AI chips: sparsity. Many weights in LLMs are set to zero, so performing operations on them is computationally wasteful, and finding ways to exploit this sparsity can yield big savings. In its promotional material, SambaNova claims the SN40L “delivers both dense and sparse computing.” This is partly achieved in software, through scheduling and the way data is moved onto the chip, Liang says, but there is also a hardware component that he declined to discuss. “Sparsity is a battleground,” he says, “so we’re not yet ready to disclose exactly how we achieve it.”
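To see why sparsity is worth fighting over (a toy comparison of ours, not a description of SambaNova’s method): a kernel that stores and multiplies only the nonzero weights does a fraction of the work of its dense counterpart.

```python
# Toy dense-vs-sparse comparison: with 90 percent of weights zeroed out,
# a sparse kernel touches only the surviving 10 percent. Illustrative only.

import numpy as np
from scipy.sparse import csr_matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((4096, 4096)).astype(np.float32)
W[rng.random(W.shape) < 0.9] = 0.0          # prune 90% of weights
x = rng.standard_normal((4096, 1)).astype(np.float32)

dense_macs = W.size                          # multiply-accumulates, dense path
W_sparse = csr_matrix(W)                     # store only nonzero entries
sparse_macs = W_sparse.nnz                   # MACs actually needed

print(f"dense MACs:  {dense_macs:,}")
print(f"sparse MACs: {sparse_macs:,} ({sparse_macs / dense_macs:.0%})")
assert np.allclose(W @ x, W_sparse @ x, atol=1e-3)  # same result either way
```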
Another common trick for running large models faster and cheaper on AI chips is to reduce the precision with which parameters are represented. The SN40L uses the bfloat16 number format invented by Google engineers, and it also supports 8-bit precision, but Liang says low-precision computing is not a priority, because the company’s architecture already lets it run models with a much smaller footprint.
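For reference (a generic illustration, not SambaNova-specific code): bfloat16 keeps float32’s 8 exponent bits but only 7 mantissa bits, so a float32 value can be converted by simply truncating the low 16 bits of its bit pattern, halving memory at the cost of precision.

```python
# Generic bfloat16 demonstration: truncate float32 values to their top 16
# bits (same exponent range, fewer mantissa bits). Not vendor code.

import numpy as np

def to_bfloat16_bits(x):
    # Reinterpret the float32 bit patterns and keep the upper 16 bits.
    return (np.asarray(x, dtype=np.float32).view(np.uint32) >> 16).astype(np.uint16)

def from_bfloat16_bits(bits):
    # Pad the low 16 bits with zeros and reinterpret as float32.
    return (bits.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159265, 1e-8, 65504.0], dtype=np.float32)
bf = to_bfloat16_bits(x)
print(from_bfloat16_bits(bf))  # values agree to ~2-3 significant digits
print(x.nbytes, bf.nbytes)     # 12 bytes -> 6 bytes: memory halved
```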
Liang says the company’s tech stack is expressly aimed at running the biggest AI models; its target audience is the world’s 2,000 largest companies. The sales pitch is that these companies are sitting on huge stores of data but have little idea what most of it contains. SambaNova says it can provide all the hardware and software needed to build AI models that unlock this data, without companies having to compete for chips or AI talent. “You’re up and running in days, not months or quarters,” Liang says. “Every company can now have its own GPT model.”
One area where the SN40L is likely to have a significant edge over competing hardware, says Gartner analyst Chirag Dekate, is multimodal AI. The future of generative AI lies in large models that can handle a variety of data types, such as images, video, and text, he says, but these produce widely varying workloads. The fairly rigid architecture of GPUs isn’t well suited to that kind of work, Dekate says, and that’s where SambaNova’s emphasis on reconfigurability could shine. “You can tailor the hardware to meet the demands of the workload,” he explains.
However, custom AI chips like SambaNova’s trade flexibility for performance, Dekate says. GPUs may not be as powerful, but they can run almost any neural network out of the box and are backed by a deep software ecosystem; models must be specially adapted to run on chips like the SN40L. Dekate notes that SambaNova has built a catalog of pre-built models for customers to draw on, but Nvidia’s dominance across every corner of AI development poses a major challenge.
“The architecture is actually superior to conventional GPU architectures,” Dekate explains. “But unless you put these technologies in the hands of customers and enable mass consumerization, I think you’ll probably struggle.”
This will be even more difficult now that Nvidia is also entering the full-stack AI-as-a-service market with its DGX Cloud offering, says Dylan Patel, chief analyst at the consulting firm SemiAnalysis. “The chip is a significant step forward,” he says. “I don’t believe the chip will change the landscape.”