Invited Talk
in
Workshop: Workshop on Advancing Neural Network Training (WANT): Computational Efficiency, Scalability, and Resource Optimization
A Data-Centric View on Workflows that Couple HPC with Large-Scale Models
Ana Gainaru
Abstract: In recent years, scientific computing workloads at HPC facilities have been undergoing a significant shift. While traditionally dominated by numerical simulations, these facilities are increasingly handling AI/ML applications for training and inference, processing and producing ever-increasing amounts of scientific data. Despite the focus on optimizing the execution of new AI/HPC workflows, little attention has been paid to the I/O runtime challenges they present. This talk aims to address that gap by analyzing these emerging trends from an I/O perspective. We will explore the performance of multilayer high-performance I/O systems under the strain of these new workflows, in which traditional HPC techniques and AI interact in new and challenging ways.
Speaker's Bio: Ana Gainaru is a computer scientist in the CSM division at Oak Ridge National Laboratory, working on data management and performance optimization for large-scale scientific workflows, with a focus on codes coupling traditional HPC with AI. She received her PhD from the University of Illinois at Urbana-Champaign, where she worked on fault tolerance and scheduling for large-scale systems. In her current position, she works with application developers in fusion, neutron scattering, and materials sciences to deploy digital twins and large models and to improve their performance at scale.