she/her/hers
Ph.D. Candidate
Department of Computer Science
City University of Hong Kong
Email: jiaminli.icy [at] gmail.com
CV (I usually keep my CV up-to-date.)
I am a fourth-year Ph.D. candidate at City University of Hong Kong, advised by Prof. Hong Xu (Henry) and Prof. Cong Wang. Previously, I obtained my bachelor's degree in Computer Science from City University of Hong Kong in 2019. My research interests lie primarily in the area of distributed machine learning systems (MLSys).
I am on the job market, with an expected graduation in Spring 2024, hopefully :)
[J2] Libin Liu, Hong Xu, Zhixiong Niu, Jingzong Li, Wei Zhang, Peng Wang, Jiamin Li, Chun Jason Xue, Cong Wang, "ScaleFlux: Efficient Stateful Scaling in NFV", IEEE Transactions on Parallel and Distributed Systems, 2022.
[J1] Libin Liu, Chengxi Gao, Peng Wang, Hongming Huang, Jiamin Li, Hong Xu, Wei Zhang, "Bottleneck-Aware Non-Clairvoyant Coflow Scheduling with Fai", IEEE Transactions on Cloud Computing, 2021.
[C3] Jiamin Li, Yimin Jiang, Yibo Zhu, Cong Wang, Hong Xu, "Accelerating Distributed MoE Training and Inference with Lina", USENIX Annual Technical Conference (ATC), 2023.
[C2] Jiamin Li, Hong Xu, Yibo Zhu, Zherui Liu, Chuanxiong Guo, Cong Wang, "Lyra: Elastic Scheduling for Deep Learning Clusters", ACM European Conference on Computer Systems (EuroSys), 2023.
[C1] Kaiwei Mo, Chen Chen, Jiamin Li, Hong Xu, Chun Jason Xue, "Two-Dimensional Learning Rate Decay: Towards Accurate Federated Learning with Non-IID Data", IEEE International Joint Conference on Neural Networks (IJCNN), 2021.
Visiting Student Researcher @The University of Texas at Austin
Supervised by Prof. Aditya Akella.
WIP
Apr 2023 - Dec 2023, Austin, TX, United States
Research Intern @ByteDance
Supervised by Dr. Yibo Zhu.
We designed and implemented Lyra, an elastic GPU cluster scheduler for deep learning. The key idea is to exploit cluster-level elasticity by loaning idle inference servers for training, and job-level elasticity by scaling jobs to better utilize the dynamic resource pool.
May 2019 - May 2021, Beijing, Shenzhen
Part-time Research Assistant @CityU
Supervised by Prof. Henry Xu.
We aim to accelerate parameter server (PS) based distributed training by reducing the transfer size of each communication operation, and we develop a control knob that schedules send and receive operations in the PS architecture (a minimal sketch follows this entry).
June 2018 - May 2019, HKSAR
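For intuition, here is a minimal sketch of such a control knob; this is my own illustration, not the project's actual code, and the layer names and sizes are made up. It reorders PS operations with a priority queue so that the traffic needed earliest by the next iteration's forward pass drains first.
```python
# Illustrative sketch (not the project's code): reorder PS send/receive
# operations so earlier layers, which the next forward pass consumes
# first, are communicated first; pulls beat pushes at the same layer.
import heapq

def schedule(ops):
    """ops: list of (layer_index, kind, size_mb); kind is 'pull' or 'push'."""
    order = {"pull": 0, "push": 1}
    heap = [(layer, order[kind], kind, size) for layer, kind, size in ops]
    heapq.heapify(heap)
    while heap:
        layer, _, kind, size = heapq.heappop(heap)
        print(f"{kind} layer{layer}: {size} MB")

# Backprop emits pushes in reverse layer order; the knob reorders them.
schedule([(3, "push", 120), (2, "push", 80),
          (1, "pull", 80), (1, "push", 40)])
```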
Software Developer @Jardine Matheson
Designed and developed web services for employee recruitment in the Group Human Resources department.
May 2017 - May 2018, HKSAR
Collaborated with ByteDance.
Lina is a new system that accelerates all-to-all communication in distributed training of large MoE models. Our key idea is to combine priority-based micro-op communication scheduling with pipeline-driven expert packing.
Abstract: Scaling model parameters usually improves model quality, but at the price of high computation overhead. Sparsely activated models, usually in the form of Mixture of Experts (MoE) architecture, keep the computation cost constant as the model scales, unlike their dense counterparts, thus providing opportunities to train and serve a large model at a reasonable cost... However, the distributed training of an MoE model is prone to low efficiency, mainly due to the interleaved all-to-all communication during model computation. This project makes three main contributions. First, we systematically analyze the all-to-all overhead in distributed training of MoE. Second, we propose a new communication scheduling scheme based on tensor partitioning that prioritizes the all-to-all operations over other communication, due to their blocking nature. Third, we introduce expert packing that reduces the all-to-all transfer size and incorporates optimizations to mitigate its overheads. Both techniques effectively tackle the all-to-all bottleneck, and we integrate them into a new system called Lina. Experiments on an A100 GPU testbed show that Lina improves the training step time of popular NLP models by up to 1.73x over the state-of-the-art.
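To make the scheduling idea concrete, below is a minimal sketch under my own assumptions, not Lina's implementation: tensors are partitioned into micro-ops, and the blocking all-to-all's micro-ops drain ahead of concurrent allreduce traffic via a priority queue. All op names and sizes are hypothetical, and expert packing is omitted.
```python
# Illustrative sketch: tensor partitioning turns each communication op
# into micro-ops, so a blocking all-to-all can jump ahead of an
# already-queued allreduce instead of waiting behind the whole tensor.
import heapq
from dataclasses import dataclass, field

ALL_TO_ALL, ALLREDUCE = 0, 1  # lower value = higher priority

@dataclass(order=True)
class MicroOp:
    priority: int
    seq: int                              # FIFO tie-breaker
    name: str = field(compare=False)
    size_mb: float = field(compare=False)

def partition(name, size_mb, chunk_mb, priority, seq0):
    """Split one communication op into equal-sized micro-ops."""
    n = max(1, round(size_mb / chunk_mb))
    return [MicroOp(priority, seq0 + i, f"{name}[{i}]", size_mb / n)
            for i in range(n)]

queue = []
# A large gradient allreduce is already queued...
for op in partition("allreduce.grads", 400, 50, ALLREDUCE, seq0=0):
    heapq.heappush(queue, op)
# ...when a blocking all-to-all arrives; its micro-ops go first.
for op in partition("all_to_all.layer3", 100, 50, ALL_TO_ALL, seq0=100):
    heapq.heappush(queue, op)

while queue:
    op = heapq.heappop(queue)
    print(f"send {op.name:<22} {op.size_mb:5.1f} MB")
```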
Collaborated with Microsoft Research Asia.
Work completed at ByteDance.
Lyra is an elastic GPU cluster scheduler for deep learning. The key idea is to exploit cluster-level elasticity by loaning idle inference servers for training, and job-level elasticity by scaling jobs to better utilize the dynamic resource pool.
Abstract: Organizations build separate training and inference GPU clusters for deep learning, and use separate schedulers to manage them. This leads to problems for both: inference clusters have low GPU utilization when the traffic load is low; training jobs often experience long queuing due to lack of resources... We introduce Lyra, a new cluster scheduler to address these problems. Lyra introduces capacity loaning to loan idle inference GPU servers for training jobs. It further exploits elastic scaling that scales a training job’s GPU allocation to better utilize loaned resources. Capacity loaning and elastic scaling create new challenges to cluster management. When the loaned servers need to be returned, we need to minimize job preemptions; when more GPUs become available, we need to allocate them to elastic jobs and minimize the job completion time (JCT). Lyra addresses these combinatorial problems with principled heuristics. It introduces the notion of server preemption cost which it greedily reduces during server reclaiming. It further relies on the JCT reduction value defined for each additional worker for an elastic job to solve the scheduling problem as a multiple-choice knapsack problem. Prototype implementation on a 64-GPU testbed and large-scale simulation with 15-day traces of over 50,000 production jobs show that Lyra brings 1.53x and 1.50x reductions in average queuing time and JCT, and improves cluster usage by up to 26.9%.
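As a rough illustration of the knapsack formulation (my own sketch, not Lyra's code): each elastic job is one choice class, the free GPUs are the capacity, and the value of giving a job k extra workers is its estimated JCT reduction. The job names and values below are hypothetical.
```python
# Illustrative sketch: elastic GPU allocation as a multiple-choice
# knapsack solved by dynamic programming. Each job picks exactly one
# option (0..k extra GPUs); total extra GPUs must fit in the free pool.
def allocate_gpus(jobs, free_gpus):
    """jobs maps a job name to a list whose entry k is the estimated
    JCT reduction with k extra GPUs (entry 0 is 0).
    Returns (total_reduction, {job: extra_gpus})."""
    NEG = float("-inf")
    dp = [0.0] + [NEG] * free_gpus  # dp[c]: best value using exactly c GPUs
    picks = []                      # per-job backtracking tables
    for name, values in jobs.items():
        new_dp = [NEG] * (free_gpus + 1)
        pick = [0] * (free_gpus + 1)
        for c in range(free_gpus + 1):
            for k, v in enumerate(values[: c + 1]):  # give this job k GPUs
                if dp[c - k] != NEG and dp[c - k] + v > new_dp[c]:
                    new_dp[c] = dp[c - k] + v
                    pick[c] = k
        dp = new_dp
        picks.append((name, pick))
    c = max(range(free_gpus + 1), key=lambda i: dp[i])
    best, alloc = dp[c], {}
    for name, pick in reversed(picks):  # backtrack the chosen allocation
        alloc[name] = pick[c]
        c -= pick[c]
    return best, alloc

# Hypothetical JCT-reduction estimates (minutes) per extra-GPU count.
jobs = {"jobA": [0, 30, 45, 50], "jobB": [0, 25, 48], "jobC": [0, 10]}
print(allocate_gpus(jobs, free_gpus=4))
# -> (93.0, {'jobA': 2, 'jobB': 2, 'jobC': 0})
```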