AWS and Nvidia have announced an extension of their strategic collaboration that will see the new Blackwell GPU platform arrive on the AWS platform.
The hyperscaler will offer the Nvidia GB200 Grace Blackwell Superchip and B100 Tensor Core GPUs in a move designed to help customers unlock new generative AI capabilities.
The collaboration integrates Nvidia's newest multi-node systems, featuring the chipmaker's next-generation Blackwell platform and AI software, with AWS's advanced Nitro System security, AWS Key Management Service (AWS KMS), Elastic Fabric Adapter (EFA) petabit-scale networking, and Amazon Elastic Compute Cloud (Amazon EC2) UltraCluster hyperscale clustering.
In the announcement, the companies said this combination of technologies will allow customers to build and run real-time inference on multi-trillion-parameter large language models (LLMs) more efficiently than on previous-generation Nvidia GPUs on Amazon EC2.
“NVIDIA's next-generation Grace Blackwell Superchip marks a significant step forward in generative AI and GPU computing,” said Adam Selipsky, CEO of AWS.
“When combined with AWS's powerful Elastic Fabric Adapter networking, the hyperscale clustering of Amazon EC2 UltraClusters, and the advanced virtualization and security capabilities of our unique Nitro System, we make it possible for customers to build and run multi-trillion-parameter large language models faster, at massive scale, and more securely than anywhere else.”
Accelerated LLMs through AWS
As part of the expanded partnership, Nvidia's Blackwell platform, featuring the GB200 NVL72 system, will now be available through AWS, complete with 72 Blackwell GPUs and 36 Grace CPUs interconnected by fifth-generation Nvidia NVLink.
The platform will connect to AWS's EFA network and leverage the cloud giant's advanced Nitro System virtualization and EC2 UltraClusters hyperscale clustering.
AWS said this combination will allow customers to scale to thousands of GB200 Superchips and accelerate inference workloads for resource-intensive, multi-trillion-parameter language models.
Additionally, AWS plans to offer EC2 instances with the new B100 GPUs deployed in EC2 UltraClusters to accelerate generative AI training and inference at larger scales.
The GB200 will also be available on Nvidia's DGX Cloud platform to help accelerate the development of generative AI and LLMs that can reach more than 1 trillion parameters.
Improved security
AWS and Nvidia are also building on existing AI security measures, with the combination of the AWS Nitro System and Nvidia's GB200 designed to prevent unauthorized users from accessing model weights.
The GB200 enables inline encryption of the NVLink connections between GPUs and encrypts data transfers from the Grace CPU to the Blackwell GPU, while EFA encrypts data between servers for distributed training and inference.
The GB200 will also benefit from the AWS Nitro System's ability to offload I/O functions from the host CPU/GPU to specialized AWS hardware, while providing enhanced security to protect customer code and data during processing.
With GB200 on Amazon EC2, AWS said customers will be able to create a trusted execution environment alongside their EC2 instance, leveraging AWS Nitro Enclaves to encrypt training data and weights with AWS KMS.
Users can create the enclave from the GB200 instance so that it can communicate directly with the superchip, which allows KMS to talk directly to the enclave and transfer key material to it securely.
The enclave can then pass that material to the GB200 securely and in a way that prevents AWS operators from accessing the key or decrypting the training data or model weights.
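To make that flow concrete, here is a minimal sketch of what the parent-instance side of such a setup could look like in Python, assuming boto3 and a Linux host with vsock support. The key alias, enclave CID, and port are hypothetical placeholders, and the enclave-side attestation and decryption steps are omitted; this is an illustration of the general pattern, not AWS's implementation.

```python
# Minimal sketch of the parent-instance side of the Nitro Enclaves + KMS flow
# described above. Assumptions (not from the announcement): boto3 credentials
# are configured, the KMS key alias below exists, and an enclave is already
# running and listening on the hypothetical vsock CID/port used here.
import socket

import boto3

KMS_KEY_ID = "alias/model-weights-key"  # hypothetical key alias
ENCLAVE_CID = 16                        # assumed CID assigned at enclave launch
ENCLAVE_PORT = 5000                     # assumed port the enclave listens on

kms = boto3.client("kms")

# Ask KMS for a data key but receive only the encrypted copy, so the plaintext
# key never exists on the parent instance. Only the enclave, presenting its
# signed attestation document to KMS, can recover the plaintext.
resp = kms.generate_data_key_without_plaintext(
    KeyId=KMS_KEY_ID, KeySpec="AES_256"
)
encrypted_key = resp["CiphertextBlob"]

# Hand the encrypted key to the enclave over the local vsock channel; the
# length prefix is just a simple framing convention for this sketch.
with socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM) as sock:
    sock.connect((ENCLAVE_CID, ENCLAVE_PORT))
    sock.sendall(len(encrypted_key).to_bytes(4, "big") + encrypted_key)
```

The point of the design is that the parent instance, and by extension AWS operators, only ever handle ciphertext; KMS releases plaintext key material exclusively to the attested enclave.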
More details on Project Ceiba
Nvidia and AWS are also collaborating to build one of the world's fastest AI supercomputers, first announced at AWS re:Invent 2023.
Nicknamed Project Ceiba, the new supercomputer will be hosted on AWS and used by Nvidia to advance AI for LLMs, graphics and simulation, digital biology, robotics, self-driving cars, and Nvidia Earth-2 climate prediction.
The supercomputer will feature 20,736 B200 GPUs and will be built using the new Nvidia GB200 NVL72 system, with fifth-generation NVLink connecting to 10,368 Grace CPUs. It will also leverage fourth-generation EFA networking to scale out, delivering up to 800 Gbps of high-bandwidth, low-latency network performance per superchip.
The pair said this combination will be capable of processing up to 414 exaflops of AI, a sixfold increase over earlier plans to build Ceiba on Nvidia's Hopper architecture.
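Those headline numbers are easy to sanity-check from the NVL72 configuration quoted above, as the quick script below shows. The 20-petaflop FP4 figure per B200 is Nvidia's published spec rather than something stated in this announcement, so the exaflops line should be read as an estimate.

```python
# Back-of-the-envelope check of the Project Ceiba figures quoted above.
B200_GPUS = 20_736
GPUS_PER_NVL72 = 72        # Blackwell GPUs per GB200 NVL72 system
GRACE_PER_NVL72 = 36       # Grace CPUs per GB200 NVL72 system
FP4_PFLOPS_PER_B200 = 20   # assumed: Nvidia's quoted FP4 throughput per B200

systems = B200_GPUS // GPUS_PER_NVL72
print(systems)                                 # 288 NVL72 systems
print(systems * GRACE_PER_NVL72)               # 10,368 Grace CPUs, matching the spec
print(B200_GPUS * FP4_PFLOPS_PER_B200 / 1000)  # ~414 exaflops of FP4 compute
```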
“AI is driving advances at an unprecedented pace, generating new applications, business models and innovation across industries,” said Jensen Huang, founder and CEO of Nvidia.
“Our collaboration with AWS is accelerating new generative AI capabilities and giving customers unprecedented computing power to push the boundaries of what's possible.”