Poster in Workshop: Workshop on Machine Learning and Compression
Training Block-wise Sparse Models Using Kronecker Product Decomposition
Ding Zhu · Zhiqun Zuo · Mahdi Khalili
Large-scale machine learning (ML) models are increasingly used in critical domains such as education, lending, recruitment, healthcare, and criminal justice. However, training, deploying, and using these models demands substantial computational resources. To reduce computation and memory costs, ML models with sparse weight matrices are widely used in the literature. Among sparse models, those with special sparse structures (e.g., models with block-wise sparse weight matrices) generally fit hardware accelerators better and can reduce memory and computation costs during inference. Unfortunately, while weight matrices with special sparsity patterns make models efficient during inference, there is no efficient method for training these models. In particular, existing training methods for block-wise sparse models start from full, dense models, which makes the training process inefficient. In this work, we focus on training models with block-wise sparse matrices and propose an efficient training algorithm that reduces both computation and memory costs during training. Our extensive empirical and theoretical analyses show that the proposed algorithm can significantly reduce computation and memory costs without a performance drop compared to baselines.
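The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general property the title points to, not the authors' method: the Kronecker product of a sparse matrix with a dense block is block-wise sparse, and its matrix-vector product can be computed from the two small factors without building the large matrix. The names `S`, `B`, `kron`, and `kron_matvec` are illustrative assumptions, and the example uses plain Python lists to stay dependency-free.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
        for i in range(len(a) * p)
    ]


def kron_matvec(a, b, x):
    """Compute kron(a, b) @ x without materializing kron(a, b).

    Uses the row-major identity (A kron B) vec(X) = vec(A X B^T),
    where X reshapes x to (cols of A) x (cols of B).
    """
    n, q = len(a[0]), len(b[0])
    # Reshape x into an n x q matrix, row by row.
    X = [x[j * q:(j + 1) * q] for j in range(n)]
    # XBt[j][k] = sum_l X[j][l] * B[k][l], i.e. X @ B^T.
    XBt = [
        [sum(X[j][l] * b[k][l] for l in range(q)) for k in range(len(b))]
        for j in range(n)
    ]
    out = []
    for i in range(len(a)):
        for k in range(len(b)):
            # Zero entries of the sparse factor are skipped entirely.
            out.append(sum(a[i][j] * XBt[j][k] for j in range(n) if a[i][j] != 0))
    return out


# Hypothetical factors: S is the sparse factor, B is a dense 2x2 block.
S = [[1, 0],
     [0, 2]]
B = [[1, 2],
     [3, 4]]

W = kron(S, B)  # 4x4 matrix whose off-diagonal 2x2 blocks are all zero
x = [1, 1, 1, 1]
dense_result = [sum(w * v for w, v in zip(row, x)) for row in W]
fast_result = kron_matvec(S, B, x)
```

In this toy case the two factors hold 8 numbers while the full matrix holds 16, and every zero in `S` produces an entire zero block in `W`. Storing and updating only the factors is the kind of saving a factorized parameterization can offer under these assumptions.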