Supporting the following United Nations Sustainable Development Goals:
Examination Committee
Prof Ning LI, LIFS/HKUST (Chairperson)
Prof Wei ZHANG, ECE/HKUST (Thesis Supervisor)
Prof Deming CHEN, Department of Electrical and Computer Engineering, University of Illinois, Urbana-Champaign (External Examiner)
Prof Weichuan YU, ECE/HKUST
Prof Chin-Tau LEA, ECE/HKUST
Prof Qiong LUO, CSE/HKUST
Abstract
As numerous and varied automated software/hardware applications take over ever more aspects of our lives, the underlying computation platforms have increasingly shown that no single architecture can handle all workloads well. The philosophy behind heterogeneous computing is to combine computation resources with different strengths so that, together, they surpass traditional homogeneous computing systems in performance, power, cost, and other metrics. Field Programmable Gate Array (FPGA) based heterogeneous systems, for example, provide the excellent performance and power efficiency inherent to FPGAs while maintaining high flexibility and scalability.
We first investigated a commercial FPGA-based heterogeneous system and its potential to accelerate Deep Neural Network (DNN) applications. We proposed an end-to-end mapping flow, FP-DNN, that takes a DNN described in TensorFlow Python code and automatically generates a hardware implementation on a CPU-FPGA heterogeneous system. Compared with shallow neural networks, DNN applications are both computation-intensive and memory-intensive, which makes efficient automatic deployment challenging. Our mapping flow addresses performance, power, and flexibility simultaneously and is the first to cover sophisticated DNNs such as ResNet and Inception with state-of-the-art performance.
Although accelerating a single type of application on an FPGA is common practice for enhancing performance in CPU-FPGA heterogeneous systems, the versatility of application domains often leaves the FPGA in an awkward position: switching between multiple tasks on the FPGA is very time-consuming. Compared with traditional single-context FPGAs, multi-context FPGAs store multiple configurations for their logic units on-chip, which enables faster context switching between tasks. We propose a heterogeneous system with a CPU and a multi-context FPGA, and we re-design the on-chip configuration memory controller to enable context pre-fetching, further reducing runtime reconfiguration overhead. We also develop static and dynamic placement and scheduling algorithms for hardware tasks, the first such solutions for multi-context FPGAs, to improve FPGA runtime resource utilization and real-time task performance.