Generalized and Scalable Optimal Sparse Decision Trees

Zhong, Chudi

Date

2020

Authors

Zhong, Chudi

Advisors

Rudin, Cynthia

Abstract

Decision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that have allowed practical algorithms to find optimal decision trees. These new techniques have the potential to trigger a paradigm shift where it is possible to construct sparse decision trees to efficiently optimize a variety of objective functions without relying on greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution of this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state of the art.

Type

Master's thesis

Department

Statistical Science

Subjects

Statistics, Computer science, Decision trees, Interpretable Model, Operations research, Optimization

Permalink

https://hdl.handle.net/10161/20774

Citation

Zhong, Chudi (2020). Generalized and Scalable Optimal Sparse Decision Trees. Master's thesis, Duke University. Retrieved from https://hdl.handle.net/10161/20774.

Collections

Masters Theses

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.
