PPoPP 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Tue 5 Mar 2024 16:10 - 16:30 at Moorfoot - Optimizing for Memory Chair(s): Yan Gu

Tensor Core Units (TCUs) are increasingly integrated into modern high-performance processors to accelerate matrix multiplication. However, because TCUs are specialized for that one operation, their potential to improve other critical scientific operations, such as stencil computation, remains untapped. This paper presents ConvStencil, a novel stencil computing system designed to efficiently transform stencil computation into matrix multiplication on Tensor Cores. We first develop a performance model for ConvStencil to guide algorithm design and optimization on TCUs. Based on this model, we propose three techniques: (1) Memory-efficient Layout Transformation using the stencil2row method; (2) Computation-dense Compute Adaptation with Dual Tessellation and kernel fusion; and (3) Performance-boosting Conflict Removal using a Lookup Table and Dirty Bits Padding. ConvStencil outperforms other stencil optimization frameworks, achieving significant speedups over AMOS, cuDNN, Brick, DRStencil, and TCStencil. By transforming stencil computation for execution on Tensor Cores, ConvStencil promises to improve the performance of various scientific and engineering applications.
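
To make the general idea of expressing a stencil as a matrix product concrete, here is a minimal sketch in plain Python. It uses a generic im2row-style lowering of a 1D stencil, which is an assumption chosen for illustration and is not the paper's stencil2row layout. The function names (`stencil_direct`, `to_rows`, `matvec`, `stencil_as_matmul`) and the sample data are hypothetical.

```python
# Illustrative sketch only: an im2row-style lowering of a 1D stencil to a
# matrix-vector product. This is NOT the paper's stencil2row method; all
# names and data below are hypothetical.

def stencil_direct(grid, weights):
    """Apply a 1D stencil to interior points with a sliding window."""
    k = len(weights)
    return [
        sum(weights[j] * grid[i + j] for j in range(k))
        for i in range(len(grid) - k + 1)
    ]

def to_rows(grid, k):
    """Copy each length-k window of the grid into its own matrix row."""
    return [grid[i:i + k] for i in range(len(grid) - k + 1)]

def matvec(matrix, vector):
    """Plain matrix-vector product; stands in for a hardware matrix unit."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def stencil_as_matmul(grid, weights):
    """Compute the same stencil as one product over the row matrix."""
    return matvec(to_rows(grid, len(weights)), weights)

grid = [1.0, 2.0, 4.0, 8.0, 16.0]
weights = [0.25, 0.5, 0.25]  # 3-point weighted average
direct = stencil_direct(grid, weights)
lowered = stencil_as_matmul(grid, weights)
```

Both paths produce the same interior values. Note that the row matrix in this sketch stores each interior grid element up to `len(weights)` times. Redundancy of this kind is one plausible motivation for a memory-efficient layout, although the abstract does not describe how stencil2row works.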

Tue 5 Mar

Displayed time zone: London

16:10 - 17:10
Optimizing for Memory, Main Conference, at Moorfoot
Chair(s): Yan Gu University of California, Riverside
16:10
20m
Talk
ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores (Best Paper Award)
Main Conference
Yuetao Chen Microsoft Research, Kun Li Microsoft Research, Yuhao Wang Microsoft Research, Donglin Bai Microsoft Research, Lei Wang Microsoft Research, Lingxiao Ma Microsoft Research, Liang Yuan Chinese Academy of Sciences, Yunquan Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research
16:30
20m
Talk
CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers
Main Conference
Brian Wheatman Johns Hopkins University, Randal Burns Johns Hopkins University, Aydin Buluc University of California at Berkeley & Lawrence Berkeley National Lab, Helen Xu Lawrence Berkeley National Laboratory
16:50
20m
Talk
Gallatin: A General-Purpose GPU Memory Manager
Main Conference
Hunter James McCoy University of Utah, Prashant Pandey University of Utah