Variable and threshold selection to control predictive accuracy in logistic regression

Kuk, AYC; Li, J; John Rush, A

Date

2014-01-01

Authors

Kuk, AYC

Li, J

John Rush, A

Abstract

Using data collected from the 'Sequenced treatment alternatives to relieve depression' study, we use logistic regression to predict whether a patient will respond to treatment on the basis of early symptom change and patient characteristics. Model selection criteria such as the Akaike information criterion (AIC) and the mean-squared error of prediction (MSEP) may not be appropriate if the aim is to predict with a high degree of certainty who will respond or not respond to treatment. Towards this aim, we generalize the definition of the positive and negative predictive value curves to the case of multiple predictors. We point out that it is the ordering rather than the precise values of the response probabilities which is important, and we arrive at a unified approach to model selection via two-sample rank tests. To avoid overfitting, we define a cross-validated version of the positive and negative predictive value curves and compare these curves after smoothing for various models. When applied to the study data, we obtain a ranking of models that differs from the rankings based on AIC and MSEP, and from those given by a tree-based method and by regularized logistic regression using a lasso penalty. Our selected model performs consistently well for both 4-week-ahead and 7-week-ahead predictions. © 2014 Royal Statistical Society.
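The abstract's remark that only the ordering of the response probabilities matters can be illustrated with a small sketch. The Python below is not the authors' method: it assumes one simple empirical reading of the curves, namely PPV as the response rate among the top-ranked fraction of patients and NPV as the non-response rate among the bottom-ranked fraction. The function name `ppv_npv_at_fraction`, its parameters, and the toy scores and outcomes are all hypothetical, and no cross-validation or smoothing is attempted.

```python
from math import ceil


def ppv_npv_at_fraction(scores, outcomes, top_fraction, bottom_fraction):
    """Empirical PPV among top-ranked and NPV among bottom-ranked patients.

    scores   -- predicted response scores (hypothetical; any monotone scale)
    outcomes -- 1 if the patient responded, 0 otherwise
    """
    n = len(scores)
    # Rank patients from highest to lowest score; only the ordering is used.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    k_top = max(1, ceil(top_fraction * n))
    k_bottom = max(1, ceil(bottom_fraction * n))
    top = order[:k_top]
    bottom = order[-k_bottom:]
    # PPV: share of responders among those predicted most likely to respond.
    ppv = sum(outcomes[i] for i in top) / k_top
    # NPV: share of non-responders among those predicted least likely to respond.
    npv = sum(1 - outcomes[i] for i in bottom) / k_bottom
    return ppv, npv


# Made-up toy data for illustration only (not from the study).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0]

print(ppv_npv_at_fraction(scores, outcomes, 0.25, 0.25))
print(ppv_npv_at_fraction(scores, outcomes, 0.5, 0.5))

# A strictly increasing transformation keeps the ranking, hence the values.
cubed = [s ** 3 for s in scores]
print(ppv_npv_at_fraction(cubed, outcomes, 0.5, 0.5))
```

Because the function uses only ranks, any strictly increasing transformation of the scores (such as the cubing in the last lines) leaves both values unchanged, which is consistent with comparing models through rank-based statistics.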

Type

Journal article

Subjects

Antidepression trials, Cross-validation, Indeterminate class, Logistic regression, Model selection, Negative predictive value, Positive predictive value

Permalink

https://hdl.handle.net/10161/24819

Published Version (Please cite this version)

10.1111/rssc.12058

Publication Info

Kuk, AYC, J Li and A John Rush (2014). Variable and threshold selection to control predictive accuracy in logistic regression. Journal of the Royal Statistical Society. Series C: Applied Statistics, 63(4), pp. 657–672. doi:10.1111/rssc.12058. Retrieved from https://hdl.handle.net/10161/24819.

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.

Collections

Scholarly Articles

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.
