"In this demo, we present a conditional early-exiting framework for efficient on-device video understanding. The proposed method is based on our recently published work, FrameExit [1], which automatically learns to process fewer frames for simpler videos and more frames for complex ones. Our model sequentially observes sampled frames from a video up to the current time step and uses a gating module to automatically determine the earliest exiting point in processing where an inference is sufficiently reliable. To enable the execution of the model on-device, we use state-of-the-art quantization techniques from the open-sourced AI Model Efficiency Toolkit and a novel compiler stack that supports models with dynamic inference graphs. Our model outperforms competing methods on the HVU benchmark and on average enables a 4X reduction in compute and latency at comparable accuracy.
[1] Ghodrati, Amir, Babak Ehteshami Bejnordi, and Amirhossein Habibian. ""FrameExit: Conditional early exiting for efficient video recognition."" Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021."
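
The sequential early-exiting loop described in the abstract can be illustrated with a minimal Python sketch. This is not the FrameExit implementation: the per-frame class scores are taken as given, they are aggregated with a running mean, and a fixed softmax-confidence threshold stands in for the learned gating module. The names `early_exit_predict` and `confidence_gate`, the threshold value, and the toy scores are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Iterable, List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_gate(threshold: float) -> Callable[[List[float]], bool]:
    """Hypothetical gate: exit once the top class probability reaches the threshold.

    Assumption: a fixed threshold replaces the learned gating module.
    """
    return lambda probs: max(probs) >= threshold


def early_exit_predict(
    frame_logits: Iterable[List[float]],
    gate: Callable[[List[float]], bool],
) -> Tuple[int, int]:
    """Process frames one at a time and stop as soon as the gate fires.

    Returns (predicted_class, frames_used). If the gate never fires,
    every frame is used and the final aggregate decides the class.
    """
    running_sum = None
    frames_used = 0
    probs = None
    for logits in frame_logits:
        frames_used += 1
        # Aggregate evidence from all frames seen so far (running mean, an assumption).
        if running_sum is None:
            running_sum = list(logits)
        else:
            running_sum = [s + x for s, x in zip(running_sum, logits)]
        mean_logits = [s / frames_used for s in running_sum]
        probs = softmax(mean_logits)
        if gate(probs):
            break  # Early exit: later frames are never processed.
    if probs is None:
        raise ValueError("need at least one frame")
    return probs.index(max(probs)), frames_used


# Toy per-frame class scores (illustrative values, not real model outputs).
easy_video = [[4.0, 0.0, 0.0]] * 4
hard_video = [[0.5, 0.4, 0.0], [0.2, 0.9, 0.0], [0.0, 3.0, 0.0], [0.0, 5.0, 0.0]]

gate = confidence_gate(0.9)
easy_result = early_exit_predict(easy_video, gate)
hard_result = early_exit_predict(hard_video, gate)
print(easy_result, hard_result)
```

With these toy scores, the unambiguous video exits after its first frame, while the ambiguous one consumes all four frames before a class is chosen. This mirrors the intended behavior of spending less compute on simpler videos and more on complex ones.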