Public Research Seminar by Advanced Materials Thrust, Function Hub - Towards Efficient and Scalable Multi-GPU Computing
In the past decade, Graphics Processing Units (GPUs) have rapidly evolved into one of the most popular computing platforms, providing significant acceleration across a wide range of application fields. Multi-GPU systems have become popular as a way to cater to ever-growing application parallelism and input dataset sizes. However, the delivered performance rarely scales with the number of GPUs. Specifically, execution efficiency suffers from expensive address translation and NUMA overheads in multi-GPU systems. The translation process involves non-local components, such as the TLB and page table on the CPU's I/O memory management unit (IOMMU), resulting in long, non-uniform, and unpredictable latencies. Moreover, frequent page migration caused by page sharing incurs substantial page table management overheads. In this talk, I will walk through the address translation procedure in multi-GPU systems in detail and identify the root causes of the scalability and performance constraints. I will then discuss several architecture and runtime innovations that optimize address translation and page placement on multi-GPU systems for scalable execution.
Dr. Xulong Tang is an Assistant Professor in the Computer Science Department at the University of Pittsburgh. He received his Ph.D. degree from the Pennsylvania State University in 2019. His current research focuses on i) designing next-generation GPU architectures and systems, ii) exploring efficient edge computing, and iii) advancing quantum computing systems. His work has been published in top-tier venues including MICRO, HPCA, ISCA, PLDI, DAC, NeurIPS, ICLR, and ECCV. You can find out more about him at https://xzt102.github.io/.
For enquiries, please contact Ms. Lina ZHOU at linalnzhou@hkust-gz.edu.cn.