Poster in Workshop: Workshop on Machine Learning and Compression
SNeRV: Scalable Neural Representations for Video Coding
Yiying Wei · Hadi Amirpour · Christian Timmerer
Scalable or layered video coding encodes a video stream into multiple layers so that it can be decoded at different levels of quality or resolution, depending on the capabilities of the device or the available network bandwidth. Traditional approaches are built as extensions of existing video coding standards but have seen little industry deployment. In this paper, we propose a Scalable Neural Representation for video coding (SNeRV) that encodes multi-resolution/-quality videos into a single neural network comprising multiple layers. The base layer (BL) of the network encodes the lowest-resolution/-quality version of the video stream. Enhancement layers (ELs) encode additional information that, starting from the BL, is used to reconstruct higher-resolution/-quality video during decoding. This multi-layered structure allows the scalable bitstream to be truncated to match the client's bandwidth or computational decoding constraints. Unlike conventional video codecs, which are constrained by complex, hand-designed modules, SNeRV represents a video as a neural network and can use any model weight compression method to compress the video. Experimental results demonstrate that SNeRV outperforms the Scalable Video Coding (SVC) extension of H.264/AVC and achieves comparable decoding speed at high resolutions.
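The truncation property can be illustrated with a short Python sketch. It assumes the layers are stored in dependency order (BL first), that each enhancement layer is decodable only together with every layer below it, and that each layer has a known compressed size; the names `Layer` and `select_layers` and all sizes are hypothetical and not part of SNeRV itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str       # e.g. "BL", "EL1", ... (hypothetical labels)
    size_bits: int  # hypothetical compressed size of this layer's weights


def select_layers(layers: List[Layer], budget_bits: int) -> List[Layer]:
    """Return the longest prefix of layers (BL first) whose total size fits the budget.

    Each enhancement layer is assumed to depend on all layers below it, so the
    stream can only be truncated at the end; a middle layer is never skipped.
    An empty result means even the BL does not fit, i.e. nothing is decodable.
    """
    selected: List[Layer] = []
    total = 0
    for layer in layers:
        if total + layer.size_bits > budget_bits:
            break  # this layer and every layer above it are dropped
        selected.append(layer)
        total += layer.size_bits
    return selected


if __name__ == "__main__":
    stream = [Layer("BL", 400_000), Layer("EL1", 600_000), Layer("EL2", 1_000_000)]
    for budget in (300_000, 500_000, 1_200_000, 5_000_000):
        chosen = select_layers(stream, budget)
        print(budget, [layer.name for layer in chosen])
    # 300000 []
    # 500000 ['BL']
    # 1200000 ['BL', 'EL1']
    # 5000000 ['BL', 'EL1', 'EL2']
```

A greedy prefix is used rather than a general knapsack-style selection because, under the stated dependency assumption, any subset other than a prefix could not be decoded. A real client would derive the budget from measured throughput and segment duration, which this sketch leaves out.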