Supercharging Performance using NVIDIA Virtual Compute Server on Dell EMC Servers

A new Reference Architecture for NVIDIA Virtual Compute Server (vCS) on Dell EMC infrastructure provides a solution to enable server GPU virtualization.  

A recent study that analyzed GPU utilization metrics across different customer sites running AI workloads revealed that GPU resources were underutilized in most cases. Here we present the study’s two key findings, along with recommendations for addressing them.

  1. Nearly a third of users average less than 15% GPU utilization, and average GPU memory usage is similarly low. Given that the users are experienced deep learning practitioners, this is very surprising. GPUs are getting faster and faster, but it doesn’t matter if the applications don’t completely use them.

Recommendation: Improve utilization by sharing the GPU across multiple users by using virtualization. Those who use optimal batch size, learning rates and hyper-parameters to fully utilize the GPU memory and compute core capabilities can be allocated a dedicated virtualized GPU instance or multiple GPUs inside a single virtual machine (VM).

  2. There’s another, probably larger, waste of resources: GPUs that sit unused. It’s hard to queue up work efficiently for GPUs. In a typical workflow, a data scientist will set up many experiments, wait for them to finish, and then spend quite a lot of time digesting the results while the GPUs sit idle.

Recommendation: GPU pooling and disaggregation can solve this problem by providing the ability to dynamically re-assign and spin up resources, allowing idle resources to be used by other data scientists' applications. Using VMware® vSphere® vMotion™ to dynamically move GPU-accelerated VMs and their workloads between hosts can reduce the number of GPUs that sit idle.
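To see whether these two findings apply to your own environment, you can summarize GPU utilization samples per user. The following is only a minimal sketch: the user names and sample values are hypothetical, and it assumes you have already collected periodic utilization readings (in percent) with a monitoring tool such as nvidia-smi. The 15% threshold comes from the study's first finding.

```python
from statistics import mean

# Threshold taken from the study's first finding (percent utilization).
LOW_UTILIZATION_THRESHOLD = 15.0


def low_utilization_users(samples_by_user, threshold=LOW_UTILIZATION_THRESHOLD):
    """Return (share of users below the threshold, sorted list of those users)."""
    averages = {user: mean(samples) for user, samples in samples_by_user.items()}
    low = sorted(user for user, avg in averages.items() if avg < threshold)
    return len(low) / len(averages), low


def idle_fraction(samples_by_user):
    """Return the fraction of all samples in which the GPU was completely idle."""
    all_samples = [s for samples in samples_by_user.values() for s in samples]
    idle = sum(1 for s in all_samples if s == 0)
    return idle / len(all_samples)


# Hypothetical utilization samples (percent), one list per user.
samples_by_user = {
    "alice": [5, 10, 0, 20],
    "bob": [80, 90, 70, 95],
    "carol": [0, 0, 40, 10],
    "dave": [30, 50, 20, 60],
}

share, low_users = low_utilization_users(samples_by_user)
idle = idle_fraction(samples_by_user)

print(f"Users averaging under {LOW_UTILIZATION_THRESHOLD}%: {share:.0%} {low_users}")
print(f"Samples with an idle GPU: {idle:.1%}")
```

Users who show up in the low-utilization list are natural candidates for a shared, fractional virtual GPU, while consistently busy users may justify a dedicated instance.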

New NVIDIA A100 offers GPU partitioning

NVIDIA® recently announced hardware partitioning with the NVIDIA A100 Tensor Core GPU as a complementary solution to virtualization. The A100 in multi-instance GPU (MIG) mode can run any mix of up to seven AI or HPC workloads of different sizes simultaneously. GPU partitioning is especially useful for AI inferencing jobs and early-stage AI development work, which typically do not consume all the performance that a modern GPU delivers. With GPU virtualization software, a virtual machine (VM) can be run on each of these MIG instances so organizations can take advantage of the management, monitoring, and operational benefits of hypervisor-based server virtualization.

For many years, data centers have used server CPU virtualization to increase IT agility and improve the utilization of their compute hardware. Today, this focus on virtualization is expanding to encompass the GPUs that accelerate many compute-intensive workloads, such as AI training and inferencing as well as data analytics. With virtualization, data centers can make GPUs available to more users, while increasing the overall utilization of these valuable assets.

Virtualizing GPUs inside Dell EMC servers

At Dell Technologies, we’ve worked closely with our technology partners to make GPU virtualization available in our line of GPU-accelerated Dell EMC PowerEdge servers. We took a big step in this direction in August 2019 when we rolled out support for NVIDIA Virtual Compute Server software to enable hypervisor-based virtualization on GPU-accelerated servers equipped with NVIDIA Mellanox® ConnectX-5 or newer network interface cards (NICs). NVIDIA Virtual Compute Server allows data centers to accelerate server virtualization with the latest GPUs so that the most compute-intensive workloads can run in virtual machines.

Today, we’re taking another big step forward with a new Dell EMC reference architecture for NVIDIA Virtual Compute Server. With this solution, your IT administrators can allocate partitions of GPU resources within VMware vSphere, as well as support the live migration of virtual machines running NVIDIA CUDA™ workloads.

There are many valuable benefits in the move to GPU virtualization with Virtual Compute Server on Dell EMC PowerEdge servers. For example, virtualization helps your IT administrators:

  • Democratize GPU access by providing partitions of GPUs on demand
  • Scale GPU resource assignments up and down as needed
  • Support live migration of GPU memory

If your IT organization is considering GPU virtualization in your data center, the Dell EMC reference architecture for NVIDIA Virtual Compute Server is a great place to get started. It walks you through the use cases for Virtual Compute Server and your options for NVIDIA GPUs in Dell EMC PowerEdge servers.

Putting Virtual Compute Server to the Test

Dell Technologies engineers investigated how GPU virtualization with Virtual Compute Server impacts overall performance. These tests initially compared an NVIDIA GPU running on bare-metal Linux to a virtualized GPU. After establishing that baseline of performance, the team conducted additional testing with multiple virtual GPUs and virtual GPU partitions.

Test results show that in most cases, users can expect a small difference in performance, in the range of two to five percent, compared to bare metal when using virtual GPU profiles for machine learning and deep learning workloads. And in an interesting twist, there are scenarios where the performance difference is favorable. For example, when VMs are running a mix of workloads, you might see faster time to result using multiple fractional GPUs in parallel than you would using a full GPU and scheduling the tasks to run serially. This can occur when workloads across virtual machines aren’t executed at the same time, or aren’t always GPU-bound. The choice of GPU scheduling policy can also affect performance, so the team compared the performance of different scheduling policies.
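A back-of-the-envelope model can illustrate why fractional GPUs running in parallel may beat a full GPU running jobs serially. The sketch below is a toy model, not the methodology used in the lab tests. It assumes each job consists of a GPU-bound phase and a non-GPU phase (for example data loading or CPU preprocessing), that a 1/n GPU fraction makes the GPU phase exactly n times slower, and that virtualization overhead is zero. The job timings are hypothetical.

```python
def serial_time_on_full_gpu(jobs):
    """Total time when jobs run one after another on a whole GPU.

    Each job is a (gpu_seconds, non_gpu_seconds) pair measured on a full GPU.
    """
    return sum(gpu + other for gpu, other in jobs)


def parallel_time_on_fractional_gpus(jobs):
    """Total time when every job runs at once on an equal GPU fraction.

    Assumption: with n jobs, each gets 1/n of the GPU, so its GPU phase
    takes n times longer; the non-GPU phase is unaffected.
    """
    n = len(jobs)
    return max(gpu * n + other for gpu, other in jobs)


# Hypothetical mixed workload: 10 s of GPU work and 30 s of non-GPU work per job.
mixed_jobs = [(10.0, 30.0)] * 4
# Hypothetical fully GPU-bound workload: 40 s of GPU work, no other work.
gpu_bound_jobs = [(40.0, 0.0)] * 4

mixed_serial = serial_time_on_full_gpu(mixed_jobs)
mixed_parallel = parallel_time_on_fractional_gpus(mixed_jobs)
bound_serial = serial_time_on_full_gpu(gpu_bound_jobs)
bound_parallel = parallel_time_on_fractional_gpus(gpu_bound_jobs)

print(f"Mixed jobs:     serial {mixed_serial:.0f} s, parallel {mixed_parallel:.0f} s")
print(f"GPU-bound jobs: serial {bound_serial:.0f} s, parallel {bound_parallel:.0f} s")
```

Under these assumptions the mixed workload finishes much sooner on four fractional GPUs, because the non-GPU phases overlap, while the fully GPU-bound workload gains nothing. Real results depend on the scheduling policy and on virtualization overhead, which this model ignores.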

For full details on the performance tests conducted in the Dell EMC Server CTO lab, along with detailed configuration information, see Virtualizing GPUs in VMware vSphere using NVIDIA Virtual Compute Server on Dell EMC infrastructure. Visit here to learn more about Dell EMC PowerEdge server accelerators.

About the Author: Ramesh Radhakrishnan

Ramesh is an Engineering Technologist in Dell's Server CTO Office. He has led technology strategy and architecture for Dell EMC in the areas of Energy Efficient MicroServer Architecture (ARM/Xeon-D) and Microsoft Hybrid Cloud, and is currently engaged in driving technology strategy and architecture for Dell EMC around advanced Analytics and Machine Learning/Deep Learning. He is a member of the Dell Patent Committee and has 15 published patents. He received his Ph.D. in Computer Science and Engineering from the University of Texas at Austin.