Poster in Workshop: NeurIPS 2023 Workshop on Tackling Climate Change with Machine Learning: Blending New and Existing Knowledge Systems
Resource Efficient and Generalizable Representation Learning of High-Dimensional Weather and Climate Data
Juan Nathaniel · Marcus Freitag · Patrick Curran · Isabel Ruddick · Johannes Schmude
Abstract:
We study self-supervised representation learning on high-dimensional data under resource constraints. Our work is motivated by applications of vision transformers to weather and climate data. Such data frequently comes in the form of tensors that are both higher-dimensional and larger than the RGB imagery encountered in many computer vision experiments. This raises scaling issues and creates a need to use available compute resources efficiently. Motivated by results on masked autoencoders, we show that it is possible to use sampling of subtensors as the sole augmentation strategy for contrastive learning with a sampling ratio of $\sim 1\%$. This compares to typical masking ratios of $75\%$ or $90\%$ for image and video data, respectively. In an ablation study, we explore extreme sampling ratios and find comparable skill for ratios as low as $\sim 0.0625\%$. Finally, in pursuit of further efficiency, we investigate whether it is possible to generate robust embeddings for dimension values that were not present at training time. We answer this question in the affirmative by using learnable position encoders that depend continuously on dimension values.
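To give a feel for the sampling ratios quoted above, here is a minimal sketch of how two randomly sampled views might be drawn from a grid of subtensors. It is not the authors' implementation: the grid shape `(time, level, lat, lon)`, the helper names `sample_token_indices` and `unravel`, and the choice of uniform sampling without replacement are all assumptions made for illustration.

```python
import math
import random


def unravel(flat_index, grid_shape):
    """Convert a flat index into a multi-index (row-major order)."""
    coords = []
    for size in reversed(grid_shape):
        flat_index, c = divmod(flat_index, size)
        coords.append(c)
    return tuple(reversed(coords))


def sample_token_indices(grid_shape, ratio, rng):
    """Pick a uniform random subset of subtensor positions.

    Assumption: the data has already been split into a regular grid of
    subtensors (tokens); `ratio` is the fraction of tokens that is kept.
    """
    n_tokens = math.prod(grid_shape)
    n_keep = max(1, round(ratio * n_tokens))
    flat = rng.sample(range(n_tokens), n_keep)  # without replacement
    return [unravel(i, grid_shape) for i in flat]


# Hypothetical token grid: (time, level, lat, lon) subtensors.
grid_shape = (8, 10, 32, 64)
rng = random.Random(0)

# Two independently sampled views of the same data sample, as one might
# feed to a contrastive objective (assumed setup, not from the source).
view_a = sample_token_indices(grid_shape, 0.01, rng)
view_b = sample_token_indices(grid_shape, 0.01, rng)

# The most extreme ratio mentioned in the abstract.
view_tiny = sample_token_indices(grid_shape, 0.000625, rng)

print(math.prod(grid_shape), len(view_a), len(view_tiny))
```

On this hypothetical grid of 163,840 tokens, a 1% ratio keeps 1,638 tokens per view and a 0.0625% ratio keeps 102, which illustrates why the encoder cost can drop sharply when only the sampled tokens are processed.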