Poster in Affinity Event: LatinX in AI

Generative Interpolation of Sign Language Poses using RVQ-VAE

Fidel Omar Tito Cruz · Gissella Bejarano Nicho


Abstract:

In Sign Language Production (SLP), a common approach is to take the motion representations of individual sign language words and concatenate them to form complete sentences. However, this process leaves frames missing between consecutive signs, which leads to abrupt transitions and reduced smoothness and makes the resulting sequences difficult to interpret. To address this issue, this paper presents a Residual Vector Quantized Variational Autoencoder (RVQ-VAE) model for interpolating 2D keypoint motion in videos. Our experiments simulate transitions between individual signs by randomly hiding groups of frames within a sequence of video keypoints. The proposed model is evaluated by comparing its performance with a baseline method on the hidden frames. Improvements in matrix distance errors and dynamic time warping (DTW) metrics indicate that the RVQ-VAE model produces promising results for generating intermediate frames. These findings highlight the potential for developing applications that enhance sign language production to benefit the deaf community.
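The evaluation setup described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the baseline, so linear interpolation stands in for it; the function names (`hide_frame_block`, `linear_fill`, `dtw_distance`), the sequence shape, and the synthetic data are hypothetical; and the mean per-keypoint Euclidean error is only a stand-in for the paper's distance metric. The RVQ-VAE itself is not shown.

```python
import numpy as np


def hide_frame_block(num_frames, block_len, rng):
    """Return a boolean mask marking one contiguous block of hidden frames.

    The first and last frames always stay visible, so the gap has known
    keypoints on both sides.
    """
    if not 1 <= block_len <= num_frames - 2:
        raise ValueError("block_len must leave both end frames visible")
    start = int(rng.integers(1, num_frames - block_len))
    hidden = np.zeros(num_frames, dtype=bool)
    hidden[start:start + block_len] = True
    return hidden


def linear_fill(keypoints, hidden):
    """Assumed baseline: fill hidden frames by linear interpolation over time."""
    num_frames = keypoints.shape[0]
    t = np.arange(num_frames)
    visible = ~hidden
    flat = keypoints.reshape(num_frames, -1)
    filled = flat.copy()
    for c in range(flat.shape[1]):
        filled[hidden, c] = np.interp(t[hidden], t[visible], flat[visible, c])
    return filled.reshape(keypoints.shape)


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two pose sequences.

    The per-frame cost is the Euclidean distance between flattened keypoints.
    """
    n, m = len(a), len(b)
    a2, b2 = a.reshape(n, -1), b.reshape(m, -1)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a2[i - 1] - b2[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


# Synthetic stand-in data: 30 frames of 21 2D keypoints moving linearly.
rng = np.random.default_rng(0)
T, K = 30, 21
base = rng.normal(size=(K, 2))
velocity = rng.normal(size=(K, 2))
sequence = base + np.arange(T)[:, None, None] * velocity  # shape (T, K, 2)

hidden = hide_frame_block(T, block_len=8, rng=rng)
filled = linear_fill(sequence, hidden)

# Error is measured on the hidden frames only, as in the abstract.
mean_error = np.linalg.norm(filled[hidden] - sequence[hidden], axis=-1).mean()
dtw = dtw_distance(sequence, filled)
print(f"mean keypoint error on hidden frames: {mean_error:.6f}, DTW: {dtw:.6f}")
```

Because the synthetic motion is linear, this baseline recovers the hidden frames almost exactly. Real signing motion is not linear, which is the gap a learned interpolator is meant to close.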