The Finer (Floating) Points of Accelerating HPC

This collaboration between Dell EMC PowerEdge and AMD brings powerful hardware together to accelerate your HPC workloads.

Co-author: Bhavesh Patel, Distinguished Engineer, Infrastructure Solutions Group

Dell EMC PowerEdge R7525 and AMD Instinct MI100

The drive to advance research in high performance computing (HPC) and artificial intelligence (AI) has increased the demand for computation at an exponential rate. The performance of HPC systems is more than doubling every two years, while the compute required to train AI models is doubling every three to four months.

In addition to advances in processor technology, architectural enhancements and compute accelerators are becoming the norm for meeting the growing computational demands of HPC and AI. Specialized architectures and accelerators can speed up the core arithmetic operations, such as floating-point multiply-accumulate, vector operations, and matrix operations, that consume most of the execution cycles in HPC and AI workloads.
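To see why these operations dominate, consider a single dense matrix multiplication: multiplying an M x K matrix by a K x N matrix takes roughly 2*M*N*K floating-point operations, essentially all of them multiply-accumulates. The short Python sketch below is a back-of-the-envelope illustration, not a benchmark; the 46.1 TFLOPS figure is the peak FP32 matrix rate from footnote 1, and the matrix size is an arbitrary example:

```python
# Rough estimate of the multiply-accumulate work in one dense matrix multiply
# and the best-case time on an accelerator at a given peak rate.
# Illustrative only; real kernels never reach peak throughput.

def gemm_flops(m: int, n: int, k: int) -> float:
    """FLOPs for C = A @ B with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k          # one multiply + one add per element pair

peak_fp32_matrix_tflops = 46.1      # MI100 peak FP32 matrix rate (see footnote 1)

m = n = k = 8192                    # an example large, training-sized multiply
flops = gemm_flops(m, n, k)
best_case_s = flops / (peak_fp32_matrix_tflops * 1e12)

print(f"{flops / 1e12:.2f} TFLOPs of work, >= {best_case_s * 1e3:.2f} ms at peak")
```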

Dell EMC PowerEdge R7525

Several industries need these advanced HPC capabilities. Data-intensive analytics workloads, such as Reverse Time Migration used in the oil and gas industry, require accelerated computing to keep up with massive IOPS demands. Private and government research organizations require dense compute and GPU capabilities to run complex simulations that must resolve the complications of chaos theory. Academia leverages GPUs to accelerate virus simulations, genomics research, and quantum physics workloads. Modern GPUs are extending their capabilities to run efficiently as compute co-processors for HPC and AI workloads, adding hardware support for a variety of numerical precisions and increased memory bandwidth. The AMD Instinct MI100 is the latest entry into the compute accelerator arena, delivering compelling performance coupled with a flexible and open software ecosystem.

Dell has teamed with AMD to bring you the PowerEdge R7525 with the AMD Instinct MI100. This combination enables faster scientific discoveries by speeding up simulations and time to insight, and by powering complex deep learning models for your most compute-intensive use cases. The AMD Instinct MI100 uses the new AMD CDNA (Compute DNA) architecture with all-new Matrix Core technology to deliver a nearly 7x (FP16) performance boost for AI workloads versus AMD's prior generation. Scientific applications benefit from the MI100's single-precision (FP32) Matrix Core performance, a nearly 3.5x boost for HPC and AI workloads versus the prior generation. Leading AI researchers can leverage the MI100's support for newer machine learning formats such as bfloat16 to help reduce training time from weeks or days to hours on the PowerEdge R7525.
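As a concrete illustration of how bfloat16 typically enters a training loop, here is a minimal mixed-precision sketch, assuming a ROCm-enabled build of PyTorch (where AMD GPUs are addressed through the usual torch.cuda interface); the model, data, and hyperparameters are placeholders, not a recommended configuration:

```python
# Minimal bfloat16 mixed-precision training step (placeholder model and data).
# Assumes a ROCm-enabled PyTorch build, where AMD GPUs appear via torch.cuda.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 1024, device=device)          # placeholder input batch
y = torch.randint(0, 10, (64,), device=device)    # placeholder labels

optimizer.zero_grad()
# Run the forward pass in bfloat16 so the large matrix multiplies can use the
# GPU's low-precision matrix hardware; weights and gradients stay in FP32.
with torch.autocast(device_type=device, dtype=torch.bfloat16):
    loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
```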

AMD Instinct MI100

The R7525 with the MI100 leverages PCIe Gen4, which is ideal for bandwidth-intensive HPC applications with a lot of data movement over the PCIe bus. These performance boosts lead to faster solution times, more efficient resource utilization, and more seamless HPC scaling.

The MI100 brings more to the table than just 2x the processing density of the previous generation of AMD products; it also complements the other components within the R7525. On its own, the MI100 delivers:

  1. The world’s fastest HPC accelerator, with up to 11.5 TFLOPS peak double-precision (FP64) performance¹
  2. Nearly 3.5x (FP32) matrix performance for HPC and a nearly 7x (FP16) performance boost for AI workloads versus AMD's prior generation² (see the quick check below)
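Those generational multipliers follow directly from the peak figures quoted in the footnotes. As a quick back-of-the-envelope check (peak theoretical rates only, so real application speedups will differ):

```python
# Ratio of MI100 peak rates (footnote 1) to MI50 peak rates (footnote 2).
mi100_fp32_matrix_tflops = 46.1
mi100_fp16_tflops = 184.6
mi50_fp32_matrix_tflops = 13.25
mi50_fp16_tflops = 26.5

print(f"FP32 matrix: {mi100_fp32_matrix_tflops / mi50_fp32_matrix_tflops:.2f}x")  # ~3.5x
print(f"FP16:        {mi100_fp16_tflops / mi50_fp16_tflops:.2f}x")                # ~7.0x
```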

The new AMD Infinity Architecture connects three MI100s inside the R7525 over PCIe® Gen4. The higher bandwidth and lower latency of PCIe Gen4 compared with Gen3 improve GPU utilization, making HPC workloads run more efficiently.
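To put the Gen3-to-Gen4 jump in concrete terms, the sketch below compares the theoretical per-direction bandwidth of an x16 link (8 GT/s versus 16 GT/s per lane with 128b/130b encoding) and the best-case time to stage a 32 GB dataset, the size of the MI100's HBM2 memory, onto the GPU. These are peak link rates; protocol overhead means real transfers run somewhat slower:

```python
# Theoretical per-direction bandwidth of an x16 PCIe link and the best-case
# time to copy a dataset the size of the MI100's 32 GB HBM2 onto the GPU.
# Peak link rates only; protocol overhead makes real transfers slower.

def x16_bandwidth_gb_s(gt_per_s: float) -> float:
    # 128b/130b encoding: 128 payload bits per 130 line bits, 16 lanes, 8 bits/byte
    return gt_per_s * (128 / 130) * 16 / 8

gen3 = x16_bandwidth_gb_s(8.0)     # ~15.75 GB/s
gen4 = x16_bandwidth_gb_s(16.0)    # ~31.5 GB/s

dataset_gb = 32.0                  # MI100 HBM2 capacity
print(f"Gen3 x16: {gen3:.2f} GB/s -> {dataset_gb / gen3:.2f} s to fill 32 GB")
print(f"Gen4 x16: {gen4:.2f} GB/s -> {dataset_gb / gen4:.2f} s to fill 32 GB")
```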

Further, AMD’s ROCm open software platform lets you work in different compute languages and move code across compute platforms. It is an open and portable ecosystem that supports multiple architectures, including GPUs from other vendors. AMD also provides a tool called HIPIFY, which converts code written in native CUDA to the AMD ROCm HIP programming model with minimal post-conversion tuning or optimization.
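Much of what HIPIFY does is a mechanical renaming of CUDA runtime calls to their HIP equivalents (for example, cudaMalloc becomes hipMalloc). The toy Python sketch below mimics that idea on a few common calls purely for illustration; the real tools (hipify-perl and hipify-clang) are far more thorough, handling kernel-launch syntax, library calls, and more:

```python
# Toy illustration of the kind of source translation HIPIFY performs:
# a handful of common CUDA runtime calls and their HIP equivalents.
# The real hipify-perl/hipify-clang tools handle far more than this.
import re

CUDA_TO_HIP = {
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

def toy_hipify(cuda_source: str) -> str:
    """Replace known CUDA runtime identifiers with their HIP counterparts."""
    # Match longer names first so cudaMemcpyHostToDevice wins over cudaMemcpy.
    pattern = re.compile("|".join(sorted(CUDA_TO_HIP, key=len, reverse=True)))
    return pattern.sub(lambda m: CUDA_TO_HIP[m.group(0)], cuda_source)

cuda_snippet = "cudaMalloc(&d_a, n); cudaMemcpy(d_a, h_a, n, cudaMemcpyHostToDevice);"
print(toy_hipify(cuda_snippet))
# -> hipMalloc(&d_a, n); hipMemcpy(d_a, h_a, n, hipMemcpyHostToDevice);
```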

Hopefully this is enough to get you excited about your next model training run, scientific simulation, or discovery session. If you would like to learn more, please check out the Dell accelerators page or view the rest of what Dell has to offer for HPC. The Dell EMC PowerEdge R7525 with the AMD Instinct MI100 is the next step on your accelerated computing journey.


¹ Calculations conducted by AMD Performance Labs as of Sep 18, 2020 for the AMD Instinct™ MI100 (32GB HBM2 PCIe® card) accelerator at 1,502 MHz peak boost engine clock resulted in 11.54 TFLOPS peak double precision (FP64), 46.1 TFLOPS peak single precision matrix (FP32), 23.1 TFLOPS peak single precision (FP32), and 184.6 TFLOPS peak half precision (FP16) theoretical floating-point performance.

² The results calculated for the Radeon Instinct™ MI50 GPU at 1,725 MHz peak engine clock resulted in 26.5 TFLOPS peak theoretical half precision (FP16) and 13.25 TFLOPS peak theoretical single precision (FP32) matrix floating-point performance. Server manufacturers may vary configuration offerings, yielding different results.

About the Author: Ramesh Radhakrishnan

Ramesh is an Engineering Technologist in Dell's Server CTO Office. He has led technology strategy and architecture for Dell EMC in the areas of energy-efficient microserver architecture (ARM/Xeon-D) and Microsoft Hybrid Cloud, and is currently driving technology strategy and architecture for Dell EMC around advanced analytics and machine learning/deep learning. He is a member of the Dell Patent Committee and has 15 published patents. He received his Ph.D. in Computer Science and Engineering from the University of Texas at Austin.