As Machine Learning systems increasingly become part of user-facing applications, their reliability and robustness are key to building and maintaining trust with users. While advances in learning continuously improve model performance in expectation and in isolation, there is an emergent need for identifying, understanding, and mitigating cases where models fail in unexpected ways and therefore break human trust or dependencies within larger software ecosystems. Current development infrastructures and methodologies, often designed with traditional software in mind, still provide very little support to help practitioners debug and troubleshoot systems over time.
This talk will share some of the latest progress we have made on building interactive tools that enable engineering teams and ML developers to analyze and understand errors in learning models and systems prior to deployment and updates. The talk will cover case studies showing how these tools can be used to efficiently identify errors, compare different model versions, and train models that remain backward compatible with their previous versions. In this context, we will also share ongoing work on integrating interpretability offerings with error analysis to guide debugging experiences.
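To make the notion of backward compatibility concrete, here is a minimal, illustrative sketch (not the talk's actual tooling): one common way to quantify compatibility between two model versions is the fraction of examples the previous model predicted correctly that the updated model still predicts correctly. All names below (old_model, new_model, X_eval, y_eval) are hypothetical placeholders.

```python
import numpy as np


def backward_compatibility(y_true, y_pred_old, y_pred_new):
    """Fraction of examples the old model got right that the new model
    also gets right (1.0 means the update introduced no new regressions)."""
    y_true = np.asarray(y_true)
    correct_old = np.asarray(y_pred_old) == y_true
    correct_new = np.asarray(y_pred_new) == y_true
    if correct_old.sum() == 0:
        return 1.0  # nothing to regress on
    return float((correct_old & correct_new).sum() / correct_old.sum())


def regressed_examples(y_true, y_pred_old, y_pred_new):
    """Indices where the update introduced a new error: previously correct,
    now wrong. These are natural candidates for closer error analysis."""
    y_true = np.asarray(y_true)
    correct_old = np.asarray(y_pred_old) == y_true
    wrong_new = np.asarray(y_pred_new) != y_true
    return np.flatnonzero(correct_old & wrong_new)


# Hypothetical usage with two fitted scikit-learn-style models:
# score = backward_compatibility(y_eval,
#                                old_model.predict(X_eval),
#                                new_model.predict(X_eval))
# print(f"backward compatibility: {score:.3f}")
```

Tracking such a metric alongside aggregate accuracy is one way to surface cases where an update that looks better in expectation still breaks trust on examples users previously relied on.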