Poster in Workshop: Workshop on Machine Learning and Compression
Layer-Importance guided Adaptive Quantization for Efficient Speech Emotion Recognition
Tushar Shinde · RITIKA JAIN · Avinash Kumar Sharma
Speech Emotion Recognition (SER) systems are crucial for enhancing human-machine interaction. Deep learning models have achieved significant success in SER without manually engineered features, but they require substantial computational resources, processing power, and hyperparameter tuning, limiting their deployment on edge devices. To address these limitations, we propose an efficient and lightweight Multilayer Perceptron (MLP) classifier within a custom SER framework. Furthermore, we introduce a novel adaptive quantization scheme that reduces model size based on layer importance. The method balances compression and performance by adaptively selecting the bit-width precision of each layer according to its importance, ensuring that the quantized model maintains accuracy within an acceptable threshold. Unlike previous mixed-precision methods, which are often complex and costly, our approach is both interpretable and efficient. We evaluate the model on benchmark SER datasets using Mel-Frequency Cepstral Coefficient (MFCC), Chroma, and Mel-spectrogram features. Experiments show that the proposed quantization scheme achieves performance comparable to state-of-the-art methods while significantly reducing model size, making it well suited for lightweight devices.
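To make the pipeline concrete, below is a minimal sketch of the feature extraction and lightweight MLP classifier described in the abstract. It assumes librosa for MFCC, Chroma, and Mel-spectrogram extraction and PyTorch for the classifier; the feature dimensions, time-averaging step, hidden-layer sizes, and class count are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of the SER feature pipeline and a lightweight MLP classifier.
# Assumptions (not specified in the abstract): librosa feature extraction,
# time-averaged features, 40 MFCCs, 12 Chroma bins, 128 Mel bands, 8 classes.
import librosa
import numpy as np
import torch
import torch.nn as nn


def extract_features(path: str) -> np.ndarray:
    """Concatenate time-averaged MFCC, Chroma, and Mel-spectrogram features."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=40)            # (40, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)              # (12, T)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)  # (128, T)
    return np.concatenate([f.mean(axis=1) for f in (mfcc, chroma, mel)])  # (180,)


class EmotionMLP(nn.Module):
    """Lightweight MLP classifier over the concatenated acoustic features."""

    def __init__(self, in_dim: int = 180, num_classes: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)
```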
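The sketch below illustrates one way a layer-importance guided bit-width assignment could look, under the assumption that importance is approximated by mean absolute weight magnitude and that bit-widths come from a small candidate set such as {2, 4, 8}. The abstract does not specify the actual importance metric, candidate bit-widths, or accuracy threshold, so all three are hypothetical here.

```python
# Sketch of layer-importance guided adaptive quantization for an MLP.
# Assumptions: importance = mean |weight| per layer (a proxy, not the paper's
# metric); symmetric uniform weight quantization; bit-widths from (2, 4, 8).
import torch
import torch.nn as nn


def uniform_quantize(weight: torch.Tensor, n_bits: int) -> torch.Tensor:
    """Symmetric uniform quantization of a weight tensor to n_bits."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = weight.abs().max() / qmax
    if scale == 0:
        return weight.clone()
    return torch.round(weight / scale).clamp(-qmax - 1, qmax) * scale


def layer_importance(layer: nn.Linear) -> float:
    """Proxy importance score: mean absolute weight magnitude (an assumption)."""
    return layer.weight.detach().abs().mean().item()


def adaptive_quantize(model: nn.Module, bit_candidates=(2, 4, 8)) -> dict:
    """Give more important layers higher precision, less important ones lower."""
    linears = [m for m in model.modules() if isinstance(m, nn.Linear)]
    scores = [layer_importance(m) for m in linears]
    # Rank layers by importance (ascending) and spread ranks over the bit-widths,
    # so the least important layers receive the smallest bit-width.
    order = sorted(range(len(linears)), key=lambda i: scores[i])
    bits = {}
    for rank, idx in enumerate(order):
        bits[idx] = bit_candidates[rank * len(bit_candidates) // len(linears)]
    with torch.no_grad():
        for idx, layer in enumerate(linears):
            layer.weight.copy_(uniform_quantize(layer.weight, bits[idx]))
    return {f"layer_{i}": b for i, b in bits.items()}
```

In the full scheme described in the abstract, the bit-width selection would also be checked against a validation-accuracy threshold, promoting layers to higher precision if the accuracy drop exceeds the acceptable margin; that feedback loop is omitted from this sketch.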