

Poster in Workshop on Video-Language Models

Dual-Model Distillation for Efficient Action Classification with Hybrid Edge-Cloud Solution

Timothy Wei · Hsien Xin Peng · Elaine Xu · Bryan Zhao · Lei Ding · Diji Yang


Abstract:

As Artificial Intelligence models, such as large Video-Language Models (VLMs), grow in size, their deployment in real-world applications becomes increasingly challenging due to hardware limitations and computational costs. This is particularly evident in elderly care applications, where non-intrusive action prediction is critical but must operate within the constraints of edge devices. To address this, we design a hybrid edge-cloud solution that leverages the efficiency of smaller models for local processing while deferring to larger, more accurate cloud-based models when necessary. Specifically, we propose a novel unsupervised data generation method, Dual-Model Distillation (DMD), to train a lightweight switcher model that can predict when the edge model’s output is uncertain and selectively offload inference to the large model on the cloud. Experimental results on fall detection tasks show that our framework not only reduces computational overhead but also improves accuracy compared to using a large model alone. Our framework provides a scalable and adaptable solution for action classification in resource-constrained environments, with potential applications beyond healthcare. Notably, while DMD-generated data is used to optimize both performance and resource usage in our pipeline, we expect the concept of DMD to further support future research on knowledge alignment across multiple models.
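The abstract describes two mechanisms: DMD's unsupervised generation of training data for the switcher, and switcher-gated routing between the edge and cloud models at inference time. Below is a minimal Python sketch of one plausible reading of both steps; the function names, the agreement-based labeling rule, and the threshold-based routing are illustrative assumptions, not the authors' released implementation.

```python
from typing import Callable, List, Tuple

# Stand-ins for the two action classifiers. In practice these would wrap a
# small on-device model and a large cloud-hosted VLM; these type aliases are
# hypothetical placeholders, not the authors' API.
EdgeModel = Callable[[dict], str]    # video clip -> predicted action label
CloudModel = Callable[[dict], str]
Switcher = Callable[[dict], float]   # video clip -> trust score in [0, 1]

def generate_dmd_data(clips: List[dict], edge: EdgeModel,
                      cloud: CloudModel) -> List[Tuple[dict, int]]:
    """One plausible reading of DMD's unsupervised data generation: run both
    models over unlabeled clips and label each clip with whether the edge
    model agrees with the (assumed more reliable) cloud model. No human
    annotation is needed; the resulting dataset trains the switcher."""
    dataset = []
    for clip in clips:
        # Target 1 = trust the edge model, 0 = offload to the cloud model.
        dataset.append((clip, int(edge(clip) == cloud(clip))))
    return dataset

def route_inference(clip: dict, edge: EdgeModel, cloud: CloudModel,
                    switcher: Switcher, threshold: float = 0.5) -> str:
    """Hybrid edge-cloud inference: the switcher scores how likely the edge
    model's output is to be correct; low scores trigger cloud offloading."""
    if switcher(clip) >= threshold:
        return edge(clip)    # cheap local path
    return cloud(clip)       # expensive, more accurate path

# Toy usage with trivial stand-ins for the models and a trained switcher.
if __name__ == "__main__":
    edge = lambda c: "fall" if c["motion"] > 0.8 else "no_fall"
    cloud = lambda c: "fall" if c["motion"] > 0.6 else "no_fall"
    switcher = lambda c: 0.9 if abs(c["motion"] - 0.7) > 0.2 else 0.1
    print(route_inference({"motion": 0.95}, edge, cloud, switcher))  # edge path
    print(route_inference({"motion": 0.65}, edge, cloud, switcher))  # cloud path
```

Under this reading, the switcher learns from edge-observable inputs alone to anticipate disagreement with the cloud model, so the expensive model is consulted only on clips the edge model is likely to get wrong.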
