State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory.


BACKGROUND: This paper is part of a series comparing different psychometric approaches to evaluate patient-reported outcome (PRO) measures using the same items and dataset. We provide an overview and example application to demonstrate 1) using item response theory (IRT) to identify poor and well performing items; 2) testing if items perform differently based on demographic characteristics (differential item functioning, DIF); and 3) balancing IRT and content validity considerations to select items for short forms.

METHODS: Model fit, local dependence, and DIF were examined for 51 items initially considered for the Patient-Reported Outcomes Measurement Information System® (PROMIS®) Depression item bank. Samejima's graded response model was used to examine how well each item measured severity levels of depression and how well it distinguished between individuals with high and low levels of depression. Two short forms were constructed based on psychometric properties and consensus discussions with instrument developers, including psychometricians and content experts. Calibrations presented here are for didactic purposes and are not intended to replace official PROMIS parameters or to be used for research.

RESULTS: Of the 51 depression items, 14 exhibited local dependence, 3 exhibited DIF for gender, and 9 exhibited misfit, and these items were removed from consideration for short forms. Short form 1 prioritized content, and thus items were chosen to meet DSM-V criteria rather than being discarded for lower discrimination parameters. Short form 2 prioritized well performing items, and thus fewer DSM-V criteria were satisfied. Short forms 1-2 performed similarly for model fit statistics, but short form 2 provided greater item precision.

CONCLUSIONS: IRT is a family of flexible models providing item- and scale-level information, making it a powerful tool for scale construction and refinement.
Strengths of IRT models include placing respondents and items on the same metric, testing DIF across demographic or clinical subgroups, and facilitating creation of targeted short forms. Limitations include the need for large sample sizes to obtain stable item parameters and the familiarity with measurement methods required to interpret results. Combining psychometric data with stakeholder input (including people with lived experiences of the health condition and clinicians) is highly recommended for scale development and evaluation.
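To make the role of Samejima's graded response model more concrete, the following is a minimal sketch of how category response probabilities and item information can be computed for a single item. It is an illustration under stated assumptions, not the authors' analysis code: the function names, the discrimination value `a`, and the threshold values are hypothetical and are not PROMIS calibrations.

```python
import math


def _cumulative_probs(theta, a, thresholds):
    """P(response >= k | theta) for k = 0..K, with fixed bounds 1.0 and 0.0."""
    inner = [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    return [1.0] + inner + [0.0]


def grm_category_probs(theta, a, thresholds):
    """Graded response model probability of each ordered response category.

    theta      : respondent's latent trait level (e.g. depression severity)
    a          : item discrimination (slope); hypothetical value
    thresholds : ordered threshold (location) parameters; hypothetical values
    """
    cum = _cumulative_probs(theta, a, thresholds)
    # Category probability is the difference of adjacent cumulative curves.
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]


def grm_item_information(theta, a, thresholds):
    """Item information at theta: sum over categories of (dP/dtheta)^2 / P."""
    cum = _cumulative_probs(theta, a, thresholds)
    # Derivative of each cumulative curve; zero at the fixed bounds.
    slopes = [a * p * (1.0 - p) for p in cum]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p = cum[k] - cum[k + 1]
        dp = slopes[k] - slopes[k + 1]
        if p > 0.0:
            info += dp * dp / p
    return info


# Hypothetical 4-category item evaluated at an average trait level.
probs = grm_category_probs(theta=0.0, a=2.0, thresholds=[-1.0, 0.0, 1.0])
info_high_a = grm_item_information(0.0, 2.0, [-1.0, 0.0, 1.0])
info_low_a = grm_item_information(0.0, 1.0, [-1.0, 0.0, 1.0])
```

In this sketch, the item with the larger discrimination parameter yields more information at the same trait level, which is the sense in which a short form built from highly discriminating items can offer greater precision.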





Published Version (Please cite this version)


Publication Info

Stover, Angela M, Lori D McLeod, Michelle M Langer, Wen-Hung Chen and Bryce B Reeve (2019). State of the psychometric methods: patient-reported outcome measure development and refinement using item response theory. Journal of Patient-Reported Outcomes, 3(1). p. 50. 10.1186/s41687-019-0130-5 Retrieved from

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.



Bryce B. Reeve

Professor in Population Health Sciences

Dr. Bryce Reeve is a Professor of Population Health Sciences and Professor of Pediatrics at Duke University School of Medicine. He has also served as Director of the Center for Health Measurement since 2017. Trained in psychometric methods, Dr. Reeve focuses his work on assessing the impact of disease and treatments on the lives of patients and their caregivers. This includes the development of clinical outcome assessments using both qualitative and quantitative methods, and the integration of patient-centered data in research and healthcare delivery settings to inform decision-making. From 2000 to 2010, Dr. Reeve served as Program Director for the U.S. National Cancer Institute and oversaw a portfolio of health-related quality of life research in cancer patients. From 2010 to 2017, he served as Professor of Health Policy and Management at the University of North Carolina. From 2011 to 2013, Dr. Reeve served as President of the International Society for Quality of Life Research (ISOQOL). In 2015, he received the John Ware and Alvin Tarlov Career Achievement Prize in Patient-Reported Outcomes Measures. In 2017, 2018, 2019 and 2021, he was ranked among the top 1% most-cited researchers in his field over the preceding 11-year period.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.