PPoPP 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Wed 6 Mar 2024 10:00 - 10:20 at Moorfoot - Linear Algebra Chair(s): I-Ting Angelina Lee

Sparse-Matrix Dense-Matrix Multiplication (SpMM) and Sampled Dense-Dense Matrix Multiplication (SDDMM) are important sparse kernels in various computation domains. The uneven distribution of non-zeros in the sparse matrix and the tight data dependence between sparse and dense matrices make it challenging to run sparse matrix multiplication efficiently on GPUs. Based on an analysis of these problems, we propose a row decomposition (RoDe)-based approach to optimize the two kernels on GPUs, using the standard Compressed Sparse Row (CSR) format. Specifically, RoDe divides the rows of the sparse matrix into regular parts and residual parts so that the computation of each can be optimized separately. We also devise the corresponding load-balancing and fine-grained pipelining techniques. Profiling results show that RoDe achieves more efficient memory access and significantly reduces warp stall cycles. Compared to the state-of-the-art (SOTA) alternatives on the SuiteSparse dataset, RoDe achieves a speedup of up to 7.86x with a geometric mean of 1.45x for SpMM, and a speedup of up to 8.99x with a geometric mean of 1.49x for SDDMM. RoDe also outperforms its counterpart on the deep learning dataset. Furthermore, its preprocessing overhead is significantly smaller, averaging only 16% of that of the SOTA.
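To make the idea of splitting CSR rows into regular and residual parts concrete, here is a minimal CPU sketch in Python. It is not the authors' GPU implementation. The split rule (the regular part holds the largest multiple of `block` non-zeros in a row, the residual part holds the rest), the parameter `block`, and the function names `split_rows`, `spmm_segments`, and `spmm_decomposed` are all assumptions made for illustration; the abstract does not specify them.

```python
def split_rows(row_ptr, block=2):
    """Split each CSR row into a regular part and a residual part.

    Assumed rule (hypothetical, not taken from the paper): the regular part
    covers the largest multiple of `block` non-zeros in the row, and the
    residual part covers the remaining (< block) non-zeros.
    Returns two lists of (row, start, end) ranges into col_idx / values.
    """
    regular, residual = [], []
    for row in range(len(row_ptr) - 1):
        start, end = row_ptr[row], row_ptr[row + 1]
        cut = start + ((end - start) // block) * block
        if cut > start:
            regular.append((row, start, cut))
        if end > cut:
            residual.append((row, cut, end))
    return regular, residual


def spmm_segments(segments, col_idx, values, dense, out):
    """Accumulate out[row] += values[j] * dense[col_idx[j]] for each segment."""
    for row, start, end in segments:
        for j in range(start, end):
            a, b_row = values[j], dense[col_idx[j]]
            for c in range(len(b_row)):
                out[row][c] += a * b_row[c]


def spmm_decomposed(row_ptr, col_idx, values, dense, block=2):
    """SpMM (sparse CSR times dense) computed as regular pass + residual pass."""
    n_rows, n_cols = len(row_ptr) - 1, len(dense[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    regular, residual = split_rows(row_ptr, block)
    # The two passes are independent, so each could be tuned separately.
    spmm_segments(regular, col_idx, values, dense, out)
    spmm_segments(residual, col_idx, values, dense, out)
    return out


# 3x4 sparse matrix in CSR form:
#   row 0: [1, 0, 2, 3]
#   row 1: [0, 0, 0, 0]
#   row 2: [4, 5, 0, 0]
row_ptr = [0, 3, 3, 5]
col_idx = [0, 2, 3, 0, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
dense = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]  # 4x2 dense matrix

regular, residual = split_rows(row_ptr, block=2)
result = spmm_decomposed(row_ptr, col_idx, values, dense, block=2)
```

In this sketch the regular segments all have lengths that are multiples of `block`, which is the kind of uniformity that would let a GPU kernel process them with a fixed amount of work per step, while the short residual segments are handled in a separate pass. How RoDe actually maps the two parts to GPU kernels, and how it balances load between them, is described in the paper rather than here.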

Wed 6 Mar

Displayed time zone: London

10:00 - 11:00
Linear Algebra (Main Conference) at Moorfoot
Chair(s): I-Ting Angelina Lee Washington University in St. Louis, USA
10:00
20m
Talk
A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs
Main Conference
Pang Meng Department of Computer Science and Technology, Tsinghua University, Xiang Fei Department of Computer Science and Technology, Tsinghua University, Peng Qu Department of Computer Science and Technology, Tsinghua University, Youhui Zhang Department of Computer Science and Technology, Tsinghua University, Zhaolin Li Department of Computer Science and Technology, Tsinghua University
10:20
20m
Talk
Fast Kronecker Matrix-Matrix Multiplications on GPUs
Main Conference
Abhinav Jangda Microsoft Research, Mohit Yadav University of Massachusetts Amherst
10:40
20m
Talk
Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication
Main Conference
Lukas Gianinazzi ETH Zurich, Alexandros Nikolaos Ziogas ETH Zurich, Piotr Luczynski ETH Zurich, Langwen Huang ETH Zurich, Saleh Ashkboosh ETH Zurich, Florian Scheidl ETH Zurich, Armon Carigiet ETH Zurich, Chio Ge ETH Zurich, Nabil Abubaker ETH Zurich, Maciej Besta ETH Zurich, Tal Ben-Nun Lawrence Livermore National Laboratory, Torsten Hoefler ETH Zurich