Poster in Workshop: Medical Imaging meets NeurIPS
Metrics Reloaded
Annika Reinke · Lena Maier-Hein · Patrick Scholz · Minu D. Tizabi · Evangelia Christodoulou · Ben Glocker · Fabian Isensee · Jens Kleesiek · Michal Kozubek · Mauricio Reyes · Michael A. Riegler · Manuel Wiesenfarth · Michael Baumgartner · Matthias Eisenmann · Doreen Heckmann-Nötzel · A. Kavur · Tim Rädsch · Laura Acion · Michela Antonelli · Tal Arbel · Spyridon Bakas · Pete Bankhead · Arriel Benis · Florian Buettner · M. Jorge Cardoso · Veronika Cheplygina · Beth Cimini · Gary Collins · Keyvan Farahani · Luciana Ferrer · Adrian Galdran · Bram van Ginneken · Robert Haase · Daniel Hashimoto · Michael Hoffman · Merel Huisman · Pierre Jannin · Charles Kahn · Dagmar Kainmueller · Alexandros Karargyris · Bernhard Kainz · Alan Karthikesalingam · Hannes Kenngott · Florian Kofler · Annette Kopp-Schneider · Anna Kreshuk · Tahsin Kurc · Bennett Landman · Geert Litjens · Amin Madani · Klaus H. Maier-Hein · Anne Martel · Peter Mattson · Erik Meijering · Bjoern Menze · David Moher · Karel G.M. Moons · Henning Mueller · Brennan Nichyporuk · Felix Nickel · Jens Petersen · Nasir Rajpoot · Nicola Rieke · Julio Saez-Rodriguez · Clarisa Sanchez · Shravya Shetty · Maarten van Smeden · Carole Sudre · Ronald Summers · Abdel Aziz Taha · Sotirios Tsaftaris · Ben Van Calster · Gaël Varoquaux · Paul Jäger
Flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, the chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering the translation of ML techniques into practice. A large international expert consortium has now created Metrics Reloaded, a comprehensive framework that guides researchers towards problem-aware metric selection. The framework is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects relevant for metric selection, from the domain interest to properties of the target structure(s), data set, and algorithm output. It supports image-level classification, object detection, semantic segmentation, and instance segmentation tasks. Users are guided through the process of selecting and applying appropriate validation metrics while being made aware of pitfalls. To improve the user experience, we implemented the framework in an online tool, which also provides a common point of access for exploring the strengths and weaknesses of individual metrics. An instantiation of the framework for various biomedical image analysis use cases demonstrates its broad applicability across domains.