Poster
in
Workshop: Gaze Meets ML
Interaction-aware Dynamic 3D Gaze Estimation in Videos
Chenyi Kuang · Jeffrey O Kephart · Qiang Ji
Human gaze during in-the-wild and outdoor activities is a continuous, dynamic process driven by anatomical eye movements such as fixations, saccades, and smooth pursuit. However, learning gaze dynamics from videos remains challenging because annotating human gaze in videos is labor-intensive. In this paper, we propose a novel method for dynamic 3D gaze estimation in videos that exploits human interaction labels. Our model contains a temporal gaze estimator built upon an autoregressive Transformer architecture. In addition, the model learns the spatial relationships among the gaze of multiple subjects by constructing a Human Interaction Graph from predicted gaze and updating the gaze features with a structure-aware Transformer. The model predicts future gaze autoregressively, conditioned on historical gaze and the gaze interactions. We propose a multi-state training algorithm that alternately updates the interaction module and the dynamic gaze estimation module when training on a mixture of labeled and unlabeled sequences. We show significant improvements in both within-domain gaze estimation accuracy and cross-domain generalization on the state-of-the-art physically unconstrained, in-the-wild Gaze360 gaze estimation benchmark.
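The two-stage loop the abstract describes, predicting each subject's next gaze from its history and then refining the predictions through inter-subject interactions, can be sketched as follows. This is a minimal illustration only: the causal averaging stands in for the autoregressive Transformer, the similarity-based attention stands in for the structure-aware Transformer over the Human Interaction Graph, and all function names and shapes are assumptions, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def interaction_update(gaze, temp=1.0):
    # gaze: (num_subjects, 3) unit gaze vectors. Attention weights are
    # derived from pairwise gaze similarity -- a stand-in for the
    # structure-aware Transformer acting on the Human Interaction Graph.
    sim = gaze @ gaze.T / temp
    attn = softmax(sim, axis=-1)
    refined = attn @ gaze
    return refined / np.linalg.norm(refined, axis=-1, keepdims=True)

def autoregressive_rollout(history, steps):
    # history: (num_subjects, T, 3) unit gaze vectors over T frames.
    # Each step predicts the next gaze from past gaze (here a simple
    # causal average standing in for the autoregressive Transformer),
    # refines it with the interaction module, and appends it so the
    # next prediction is conditioned on it.
    history = history.copy()
    for _ in range(steps):
        pred = history.mean(axis=1)
        pred /= np.linalg.norm(pred, axis=-1, keepdims=True)
        pred = interaction_update(pred)
        history = np.concatenate([history, pred[:, None, :]], axis=1)
    return history
```

In this sketch, the rollout mirrors the paper's autoregressive conditioning: every predicted frame becomes part of the history used for subsequent frames, and the interaction step couples the subjects' predictions at each frame.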