

Poster in Workshop: Foundation Models for Science: Progress, Opportunities, and Challenges

Leveraging foundation models for data-limited ecological applications

Kyle Doherty · Max Gurinas · Erik Samsoe · Charles Casper · Beau Larkin · Philip Ramsey · Brandon Trabucco · Ruslan Salakhutdinov

Keywords: [ conservation ] [ ecology ] [ weed management ] [ few-shot ]


Abstract:

Human-driven change in natural ecosystems is a global challenge relevant to human and natural communities alike. Yet ecological data (species presence/absence or abundance), the bedrock of global change impact monitoring, are difficult to gather due to challenging field conditions, few subject experts (e.g., botanists), and brief monitoring windows. Therefore, the default condition of ecological data analysis is one of data limitation. As the generalization abilities of large foundation models grow, we might leverage these models to derive ecological insights from limited data. And because ecological data are inherently rarer, they also offer the machine learning community an opportunity to better study the out-of-distribution performance of foundation models in few-shot contexts. To illustrate these principles, we gathered a field-validated dataset of presence and absence of leafy spurge (Euphorbia esula), a weed that invades natural areas and displaces native species in North America. We then surveyed these areas with a consumer-grade drone and extracted images from ground truth locations. We fine-tuned a convolutional neural network and a state-of-the-art vision transformer on these data, then contrasted few-shot performance with that of off-the-shelf GPT-4 checkpoints. While we achieved state-of-the-art classification performance on the full dataset with the fine-tuned DINOv2 vision transformer (0.85 test accuracy), GPT-4o nearly matched prior SOTA performance (0.75 test accuracy) when shown only 8 examples per class. Furthermore, we observed a 10% test accuracy improvement between the best GPT-4-turbo and GPT-4o results, illustrating the rapid advances in recent months. Our findings demonstrate the mutual benefit of pairing ecological data with the study of the generalization abilities of foundation models. We release the Leafy Spurge Dataset for further few-shot experiments and evaluation.
We will release code and data (Creative Commons Attribution 4.0 International license) for our experiments upon conclusion of double-blind peer review.
