Metrics reloaded: recommendations for image analysis validation.

dc.contributor.author

Maier-Hein, Lena

dc.contributor.author

Reinke, Annika

dc.contributor.author

Godau, Patrick

dc.contributor.author

Tizabi, Minu D

dc.contributor.author

Buettner, Florian

dc.contributor.author

Christodoulou, Evangelia

dc.contributor.author

Glocker, Ben

dc.contributor.author

Isensee, Fabian

dc.contributor.author

Kleesiek, Jens

dc.contributor.author

Kozubek, Michal

dc.contributor.author

Reyes, Mauricio

dc.contributor.author

Riegler, Michael A

dc.contributor.author

Wiesenfarth, Manuel

dc.contributor.author

Kavur, A Emre

dc.contributor.author

Sudre, Carole H

dc.contributor.author

Baumgartner, Michael

dc.contributor.author

Eisenmann, Matthias

dc.contributor.author

Heckmann-Nötzel, Doreen

dc.contributor.author

Rädsch, Tim

dc.contributor.author

Acion, Laura

dc.contributor.author

Antonelli, Michela

dc.contributor.author

Arbel, Tal

dc.contributor.author

Bakas, Spyridon

dc.contributor.author

Benis, Arriel

dc.contributor.author

Blaschko, Matthew B

dc.contributor.author

Cardoso, M Jorge

dc.contributor.author

Cheplygina, Veronika

dc.contributor.author

Cimini, Beth A

dc.contributor.author

Collins, Gary S

dc.contributor.author

Farahani, Keyvan

dc.contributor.author

Ferrer, Luciana

dc.contributor.author

Galdran, Adrian

dc.contributor.author

van Ginneken, Bram

dc.contributor.author

Haase, Robert

dc.contributor.author

Hashimoto, Daniel A

dc.contributor.author

Hoffman, Michael M

dc.contributor.author

Huisman, Merel

dc.contributor.author

Jannin, Pierre

dc.contributor.author

Kahn, Charles E

dc.contributor.author

Kainmueller, Dagmar

dc.contributor.author

Kainz, Bernhard

dc.contributor.author

Karargyris, Alexandros

dc.contributor.author

Karthikesalingam, Alan

dc.contributor.author

Kofler, Florian

dc.contributor.author

Kopp-Schneider, Annette

dc.contributor.author

Kreshuk, Anna

dc.contributor.author

Kurc, Tahsin

dc.contributor.author

Landman, Bennett A

dc.contributor.author

Litjens, Geert

dc.contributor.author

Madani, Amin

dc.contributor.author

Maier-Hein, Klaus

dc.contributor.author

Martel, Anne L

dc.contributor.author

Mattson, Peter

dc.contributor.author

Meijering, Erik

dc.contributor.author

Menze, Bjoern

dc.contributor.author

Moons, Karel GM

dc.contributor.author

Müller, Henning

dc.contributor.author

Nichyporuk, Brennan

dc.contributor.author

Nickel, Felix

dc.contributor.author

Petersen, Jens

dc.contributor.author

Rajpoot, Nasir

dc.contributor.author

Rieke, Nicola

dc.contributor.author

Saez-Rodriguez, Julio

dc.contributor.author

Sánchez, Clara I

dc.contributor.author

Shetty, Shravya

dc.contributor.author

van Smeden, Maarten

dc.contributor.author

Summers, Ronald M

dc.contributor.author

Taha, Abdel A

dc.contributor.author

Tiulpin, Aleksei

dc.contributor.author

Tsaftaris, Sotirios A

dc.contributor.author

Van Calster, Ben

dc.contributor.author

Varoquaux, Gaël

dc.contributor.author

Jäger, Paul F

dc.date.accessioned

2025-08-09T02:31:57Z

dc.date.available

2025-08-09T02:31:57Z

dc.date.issued

2024-02

dc.description.abstract

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint: a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.

dc.identifier

10.1038/s41592-023-02151-z

dc.identifier.issn

1548-7091

dc.identifier.issn

1548-7105

dc.identifier.uri

https://hdl.handle.net/10161/33085

dc.language

eng

dc.publisher

Springer Science and Business Media LLC

dc.relation.ispartof

Nature methods

dc.relation.isversionof

10.1038/s41592-023-02151-z

dc.rights.uri

https://creativecommons.org/licenses/by-nc/4.0

dc.subject

Algorithms

dc.subject

Semantics

dc.subject

Image Processing, Computer-Assisted

dc.subject

Machine Learning

dc.title

Metrics reloaded: recommendations for image analysis validation.

dc.type

Journal article

duke.contributor.orcid

Benis, Arriel|0000-0002-9125-8300

pubs.begin-page

195

pubs.end-page

212

pubs.issue

2

pubs.organisational-group

Duke

pubs.organisational-group

Pratt School of Engineering

pubs.organisational-group

Biomedical Engineering

pubs.publication-status

Published

pubs.volume

21

Files

Original bundle

Name:
Metrics reloaded recommendations for image analysis validation.pdf
Size:
35.18 MB
Format:
Adobe Portable Document Format