PPoPP 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Tue 5 Mar 2024 16:50 - 17:10 at Moorfoot - Optimizing for Memory Chair(s): Yan Gu

Dynamic memory management is critical for efficiently porting modern data processing pipelines to GPUs. However, building a general-purpose dynamic memory manager on GPUs is challenging due to the massive parallelism and weak memory coherence. Existing state-of-the-art GPU memory managers, Ouroboros and Reg-Eff, employ traditional data structures such as arrays and linked lists to manage memory objects. They build specialized pipelines to achieve performance for a fixed set of allocation sizes and fall back to the CUDA allocator for allocating large sizes. In the process, they lose general-purpose usability and fail to support critical applications such as streaming graph processing.

In this paper, we introduce Gallatin, a general-purpose and high-performance GPU memory manager. Gallatin uses the van Emde Boas (vEB) tree to manage memory objects efficiently and supports allocations of any size. We develop a wait-free GPU implementation of the vEB tree to exploit massive parallelism on GPUs. It supports constant time insertions, deletions, and successor operations for a given memory size.
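To make the insert, delete, and successor operations concrete, here is a minimal sequential sketch of a fixed-depth, bitmap-based tree in the spirit of a vEB tree. It is not Gallatin's implementation: it is single-threaded Python rather than a wait-free GPU structure, and the names `TwoLevelBitTree` and `lowest_set_bit`, the two-level depth, and the 64-bit word width are all assumptions made for illustration. With the depth fixed, each operation touches a bounded number of words, which is one way to read "constant time for a given memory size".

```python
WORD_BITS = 64  # assumed word width; the universe is WORD_BITS * WORD_BITS = 4096 indices


def lowest_set_bit(word):
    """Index of the least significant set bit of a non-zero integer."""
    return (word & -word).bit_length() - 1


class TwoLevelBitTree:
    """Hypothetical two-level bitmap tracking which block indices are free."""

    def __init__(self):
        self.summary = 0               # bit i set => leaves[i] has at least one member
        self.leaves = [0] * WORD_BITS  # bit j of leaves[i] set => index i*64 + j is a member

    def insert(self, x):
        hi, lo = divmod(x, WORD_BITS)
        self.leaves[hi] |= 1 << lo
        self.summary |= 1 << hi

    def delete(self, x):
        hi, lo = divmod(x, WORD_BITS)
        self.leaves[hi] &= ~(1 << lo)
        if self.leaves[hi] == 0:
            self.summary &= ~(1 << hi)  # leaf became empty, clear its summary bit

    def successor(self, x):
        """Smallest member >= x, or None if there is none (assumes x >= 0)."""
        if x >= WORD_BITS * WORD_BITS:
            return None
        hi, lo = divmod(x, WORD_BITS)
        # Look in the same leaf first, ignoring bits below lo.
        masked = (self.leaves[hi] >> lo) << lo
        if masked:
            return hi * WORD_BITS + lowest_set_bit(masked)
        # Otherwise use the summary to jump to the next non-empty leaf.
        upper = (self.summary >> (hi + 1)) << (hi + 1)
        if upper:
            nxt = lowest_set_bit(upper)
            return nxt * WORD_BITS + lowest_set_bit(self.leaves[nxt])
        return None
```

An allocator built on such a structure could treat members as free blocks: `successor(0)` finds the lowest free block, `delete` claims it, and `insert` returns it on free. A concurrent GPU version would need atomic bit operations in place of the plain read-modify-write steps shown here.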

In our evaluation, we compare Gallatin with state-of-the-art specialized allocator variants. It is up to 568× faster on single-sized allocations and up to 374× faster on mixed-size allocations than the next-best allocator. Gallatin also scales well as the number of threads increases and is up to 146× faster for single-sized allocations. For the graph benchmarks, Gallatin is faster than the state-of-the-art for range operations and is the fastest allocator for all graph expansion tests.

Tue 5 Mar

16:10 - 17:10
Optimizing for Memory, Main Conference at Moorfoot
Chair(s): Yan Gu University of California, Riverside
16:10
20m
Talk
ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores (Best Paper Award)
Main Conference
Yuetao Chen Microsoft Research, Kun Li Microsoft Research, Yuhao Wang Microsoft Research, Donglin Bai Microsoft Research, Lei Wang Microsoft Research, Lingxiao Ma Microsoft Research, Liang Yuan Chinese Academy of Sciences, Yunquan Zhang Zhang, Ting Cao Microsoft Research, Mao Yang Microsoft Research
16:30
20m
Talk
CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers
Main Conference
Brian Wheatman Johns Hopkins University, Randal Burns Johns Hopkins, Aydin Buluc University of California at Berkeley & Lawrence Berkeley National Lab, Helen Xu Lawrence Berkeley National Laboratory
16:50
20m
Talk
Gallatin: A General-Purpose GPU Memory Manager
Main Conference
Hunter James McCoy University of Utah, Prashant Pandey University of Utah