Variable and threshold selection to control predictive accuracy in logistic regression
dc.contributor.author | Kuk, AYC | |
dc.contributor.author | Li, J | |
dc.contributor.author | Rush, AJ | |
dc.date.accessioned | 2022-04-14T00:10:29Z | |
dc.date.available | 2022-04-14T00:10:29Z | |
dc.date.issued | 2014-01-01 | |
dc.date.updated | 2022-04-14T00:10:29Z | |
dc.description.abstract | Summary: Using data collected from the 'Sequenced treatment alternatives to relieve depression' study, we use logistic regression to predict whether a patient will respond to treatment on the basis of early symptom change and patient characteristics. Model selection criteria such as the Akaike information criterion (AIC) and the mean-squared error of prediction (MSEP) may not be appropriate if the aim is to predict with a high degree of certainty who will respond or not respond to treatment. Towards this aim, we generalize the definition of the positive and negative predictive value curves to the case of multiple predictors. We point out that it is the ordering rather than the precise values of the response probabilities which is important, and we arrive at a unified approach to model selection via two-sample rank tests. To avoid overfitting, we define a cross-validated version of the positive and negative predictive value curves and compare these curves after smoothing for various models. When applied to the study data, we obtain a ranking of models that differs from those based on AIC and MSEP, as well as from those based on a tree-based method and regularized logistic regression using a lasso penalty. Our selected model performs consistently well for both 4-week-ahead and 7-week-ahead predictions. © 2014 Royal Statistical Society. | |
dc.identifier.issn | 0035-9254 | |
dc.identifier.issn | 1467-9876 | |
dc.identifier.uri | ||
dc.language | en | |
dc.publisher | Wiley | |
dc.relation.ispartof | Journal of the Royal Statistical Society. Series C: Applied Statistics | |
dc.relation.isversionof | 10.1111/rssc.12058 | |
dc.subject | Antidepression trials | |
dc.subject | Cross-validation | |
dc.subject | Indeterminate class | |
dc.subject | Logistic regression | |
dc.subject | Model selection | |
dc.subject | Negative predictive value | |
dc.subject | Positive predictive value | |
dc.title | Variable and threshold selection to control predictive accuracy in logistic regression | |
dc.type | Journal article | |
pubs.begin-page | 657 | |
pubs.end-page | 672 | |
pubs.issue | 4 | |
pubs.organisational-group | Duke | |
pubs.organisational-group | School of Medicine | |
pubs.publication-status | Published | |
pubs.volume | 63 | |
Files
Original bundle
- Name: Kuk_Rush-2014_Variable and threshold selection to control predictive accuracy in logistic regression.pdf
- Size: 1.2 MB
- Format: Adobe Portable Document Format