PPoPP 2024
Sat 2 - Wed 6 March 2024 Edinburgh, United Kingdom
Sun 3 Mar 2024 18:00 - 20:00 at Strathblane Hall - Reception and Poster Session

Recommendation is an important application of deep learning. These ever-growing models consist of a sparse part with a memory footprint of TBs and a dense part that demands PFLOPs of computing capability to train. Achieving both memory efficiency and computation scalability is therefore critical to distributed training. Existing systems relieve memory pressure by partitioning embedding tables, yet they scale poorly because of the high sparse communication cost between the different parallel strategies used in different training stages. While this cost can be reduced by designing fine-grained strategies for the sparse part that exploit the skewed access pattern of the data, doing so incurs the performance tax of maintaining consistency.

We design P2Res, a two-fold system that replicates hot embedding vectors on all GPUs and stores the remaining ones across hosts. Our parallel strategy guarantees consistency of all embedding vectors handled by both parts, keeping the system transparent to training algorithms. A performance model selects the items to replicate for optimal communication latency. We reduce the overhead of accessing embedding vectors residing in different processes using a pipeline over decentralized indexing tables and a contention-avoiding schedule for data exchange. In our evaluation on 32 GPUs over real-world datasets, P2Res achieves a 2.16× to 16.8× end-to-end speedup over HugeCTR, TorchRec and TFDE.
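
The hot/cold split described above can be illustrated with a toy sketch. This is not the P2Res implementation or its performance model: the function names, the fixed `replica_budget`, and the modulo placement of cold vectors are all assumptions made for illustration. It only shows how a skewed access pattern lets a small replicated set serve most lookups locally.

```python
from collections import Counter


def split_hot_cold(access_log, replica_budget, num_hosts):
    """Pick the most frequently accessed ids for replication; shard the rest.

    Hypothetical helper: a fixed budget stands in for the abstract's
    performance model, which is not described in enough detail to reproduce.
    """
    counts = Counter(access_log)
    # Rank ids by descending access count; break ties by id for determinism.
    ranked = sorted(counts, key=lambda i: (-counts[i], i))
    hot = set(ranked[:replica_budget])
    # Assumption: cold ids are placed on hosts by a simple modulo rule.
    cold_owner = {i: i % num_hosts for i in ranked[replica_budget:]}
    return hot, cold_owner


def local_hit_ratio(access_log, hot):
    """Fraction of lookups that the replicated (hot) set can serve locally."""
    if not access_log:
        return 0.0
    return sum(1 for i in access_log if i in hot) / len(access_log)


# A small skewed access log: ids 0 and 1 dominate.
log = [0, 0, 0, 0, 1, 1, 1, 2, 3, 4]
hot, cold_owner = split_hot_cold(log, replica_budget=2, num_hosts=2)
ratio = local_hit_ratio(log, hot)
```

In this toy log, replicating two of the five distinct ids covers seven of the ten lookups; the remaining three would require communication with the host that owns the cold vector.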

Sun 3 Mar

Displayed time zone: London

18:00 - 20:00
Reception and Poster Session, Main Conference, at Strathblane Hall
18:00
2h
Poster
POSTER - H3: A Hash-table Based and Holistically Optimized High-Performance Sparse Tensor Contraction
Main Conference
Guofeng Feng Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Weile Jia Institute of Computing Technology, Chinese Academy of Sciences, Ninghui Sun State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences(CAS), Jiajia Li North Carolina State University
18:00
2h
Poster
POSTER - P2Res: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training
Main Conference
Jiaao He Tsinghua University, China, Shengqi Chen Tsinghua University, Jidong Zhai Tsinghua University
18:00
2h
Poster
POSTER - gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters
Main Conference
Jiajun Huang University of California, Riverside, Sheng Di Argonne National Laboratory, Xiaodong Yu Stevens Institute of Technology, Yujia Zhai University of California, Riverside, Jinyang Liu University of California, Riverside, Yafan Huang The University of Iowa, Ken Raffenetti Argonne National Laboratory, Hui Zhou Argonne National Laboratory, Kai Zhao Florida State University, Zizhong Chen University of California, Riverside, Franck Cappello Argonne National Laboratory, Yanfei Guo Argonne National Laboratory, Rajeev Thakur Argonne National Laboratory
18:00
2h
Poster
POSTER - RadiK: Scalable Radix Top-K Selection on GPUs
Main Conference
Yifei Li Alibaba Group, Bole Zhou Independent, Jiejing Zhang Alibaba Group, Xuechao Wei Alibaba Group, Yinghan Li Alibaba Group, Yingda Chen Alibaba Group
18:00
2h
Poster
POSTER - Accelerating High-Precision Integer Multiplication used in Cryptosystems with GPUs
Main Conference
Zhuoran Ji Shandong University, Zhaorui Zhang The Hong Kong Polytechnic University, Jiming Xu Ant Group, Lei Ju Shandong University
18:00
2h
Poster
POSTER - Enabling Extreme-Scale Phase Field Simulation with In-situ Feature Extraction
Main Conference
Zhichen Feng Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jialin Li Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Yaqian Gao Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, TianShaobo Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Huang Ye Computer Network Information Center, Chinese Academy of Sciences, Jian Zhang Computer Network Information Center, Chinese Academy of Sciences
18:00
2h
Poster
POSTER - ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters
Main Conference
lishunde Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Junyu Gu Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jue Wang Computer Network Information Center, Chinese Academy of Sciences, Tiechui Yao Computer Network Information Center, Chinese Academy of Sciences, ZhiQiang Liang Computer Network Information Center, Chinese Academy of Sciences, Yumeng Shi Computer Network Information Center, Chinese Academy of Sciences, Shigang Li Beijing University of Posts and Telecommunications, Weiting Xi North China Electric Power University, Shushen Li North China Electric Power University, Chunbao Zhou Computer Network Information Center, Chinese Academy of Sciences, Yangang Wang Computer Network Information Center, Chinese Academy of Sciences, Xuebin Chi Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences
18:00
2h
Poster
POSTER - RELAX: Durable Data Structures with Swift Recovery
Main Conference
Almog Zur Technion, Nachshon Cohen Amazon, Michal Friedman ETH Zurich, Switzerland, Erez Petrank Technion
18:00
2h
Poster
POSTER - FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences
Main Conference
Lixian Ma State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Haoruo Chen State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, En Shao State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Leping Wang State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Quan Chen Shanghai Jiao Tong University, Guangming Tan Chinese Academy of Sciences(CAS)
18:00
2h
Poster
POSTER - LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization
Main Conference
Juntao Zhao The University of Hong Kong, Borui Wan The University of Hong Kong, Chuan Wu The University of Hong Kong, Yanghua Peng ByteDance Inc., Haibin Lin ByteDance Inc.
18:00
2h
Poster
POSTER - StructMG: A Fast and Scalable Structured Multigrid
Main Conference
Yi Zong Tsinghua University, Xinliang Wang Huawei Technologies Co., Ltd, Haopeng Huang Tsinghua University, Chensong Zhang Academy of Mathematics and Systems Science, Xiaowen Xu Institute of Applied Physics and Computational Mathematics, Jian Sun CMA Earth System Modeling and Prediction Center, Bowen Yan Tsinghua University, Qin Wang Huawei Technologies Co., Ltd, Sicong Li Huawei Technologies Co., Ltd, Zhaohui Ding Huawei Technologies Co., Ltd, Wei Xue Tsinghua University
18:00
2h
Poster
POSTER - OCToPus: Semantic-aware Concurrency Control for Blockchain Transactions
Main Conference
dePaul Miller Lehigh University, Henry F. Korth Lehigh University, Roberto Palmieri Lehigh University