Unachievable Region in Precision-Recall Space and Its Effect on Empirical Evaluation.
Repository Usage Stats
Precision-recall (PR) curves and the areas under them are widely used to summarize machine learning results, especially for data sets exhibiting class skew. They are often used analogously to ROC curves and the area under ROC curves. It is known that PR curves vary as class skew changes. What was not recognized before this paper is that there is a region of PR space that is completely unachievable, and the size of this region depends only on the skew. This paper precisely characterizes the size of that region and discusses its implications for empirical evaluation methodology in machine learning.
More InfoShow full item record
Instructor in the Department of Biostatistics & Bioinformatics
David Page works on algorithms for data mining and machine learning, and their applications to biomedical data, especially de-identified electronic health records and high-throughput genetic and other molecular data. Of particular interest are machine learning methods for complex multi-relational data (such as electronic health records or molecules as shown) and irregular temporal data, and methods that find causal relationships or produce human-interpretable output (such as the rules for molecu