PPoPP 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Tue 5 Mar 2024 11:30 - 11:50 at Moorfoot - ML Workloads Chair(s): Xipeng Shen

Convolutional neural networks (CNNs) have achieved remarkable success in various application fields. Model compression techniques mitigate the ever-increasing resource demands of large CNN models, but the compressed models usually exhibit irregular memory access and unstructured sparsity. These properties make it difficult for dominant operators such as sparse convolution to achieve the expected speedup on popular inference platforms such as GPUs. In this paper, we propose Tetris, an efficient sparse convolution approach optimized for GPUs. Tetris first fully exploits the input reuse opportunity of sparse convolution to reduce accesses to global memory. It then adopts a stride packed filter (SPF) format and a bank-sensing reorganization scheme to eliminate the irregular memory accesses caused by unstructured sparsity. It also leverages a filter group reorder technique to address load imbalance among threads, and a parameter tuning method to determine the optimal parameters of the sparse convolution implementation. Experimental results show that Tetris outperforms dense and sparse convolution libraries as well as cutting-edge implementations, with promising speedups.
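
To make two of the ideas in the abstract concrete, the following is a minimal CPU-side sketch in Python with NumPy. It is not the Tetris implementation: the abstract does not specify the SPF layout, the bank-sensing scheme, or the reorder policy, so the coordinate packing in `pack_filter` and the greedy longest-first grouping in `balance_filters` are assumptions chosen purely for illustration, and all function names are hypothetical. The sketch shows a direct sparse convolution in which each nonzero weight is applied to a whole shifted window of the input (a simple form of input reuse), and a grouping step that spreads filters with uneven nonzero counts across workers.

```python
import numpy as np

def pack_filter(w):
    """Pack one pruned filter of shape (C, R, S) into nonzero coordinates and values.

    Assumption: a plain coordinate list, not the SPF format from the paper.
    """
    c, r, s = np.nonzero(w)
    return c, r, s, w[c, r, s]

def sparse_conv2d(x, weights):
    """Direct sparse convolution with stride 1 and no padding.

    x: input of shape (C, H, W); weights: pruned filters of shape (K, C, R, S).
    """
    K, C, R, S = weights.shape
    _, H, W = x.shape
    out_h, out_w = H - R + 1, W - S + 1
    out = np.zeros((K, out_h, out_w), dtype=x.dtype)
    for k in range(K):
        c_idx, r_idx, s_idx, vals = pack_filter(weights[k])
        for c, r, s, v in zip(c_idx, r_idx, s_idx, vals):
            # One nonzero weight scales one shifted input window;
            # zero weights are never touched.
            out[k] += v * x[c, r:r + out_h, s:s + out_w]
    return out

def balance_filters(weights, num_groups):
    """Assign filters to groups so that nonzero counts are roughly even.

    Assumption: greedy longest-first assignment, used here only as a
    stand-in for a filter group reorder step.
    """
    nnz = np.count_nonzero(weights.reshape(weights.shape[0], -1), axis=1)
    order = np.argsort(-nnz, kind="stable")  # heaviest filters first
    groups = [[] for _ in range(num_groups)]
    loads = [0] * num_groups
    for k in order:
        g = loads.index(min(loads))  # currently lightest group
        groups[g].append(int(k))
        loads[g] += int(nnz[k])
    return groups, loads
```

On a GPU, each group would correspond to the work assigned to a set of threads; the sketch only models the assignment, not the memory hierarchy.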

Tue 5 Mar

Displayed time zone: London

11:30 - 12:30
ML Workloads (Main Conference) at Moorfoot
Chair(s): Xipeng Shen (North Carolina State University)
11:30
20m
Talk
Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU
Main Conference
Xiaoyan Liu (Beihang University); Xuegui Zheng (Beihang University); Hailong Yang (Beihang University, China); Zhongzhi Luan (Beihang University); Depei Qian (Beihang University, China)
11:50
20m
Talk
Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips
Main Conference
Ismet Dagli (Colorado School of Mines); Mehmet Belviranli (Colorado School of Mines)
12:10
20m
Talk
Training one DeePMD Model in Minutes: a Step Towards Online Learning
Main Conference
Siyu Hu (Institute of Computing Technology, Chinese Academy of Sciences); Tong Zhao (Institute of Computing Technology, Chinese Academy of Sciences); Qiuchen Sha (Institute of Computing Technology, Chinese Academy of Sciences); Enji Li (Institute of Computing Technology, Chinese Academy of Sciences); Xiangyu Meng (College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum); Liping Liu (Institute of Semiconductors, Chinese Academy of Sciences); Lin-Wang Wang (Institute of Semiconductors, Chinese Academy of Sciences); Guangming Tan (Chinese Academy of Sciences (CAS)); Weile Jia (Institute of Computing Technology, Chinese Academy of Sciences)