dc.description.abstract |
<p>Decision tree optimization is notoriously difficult from a computational perspective
but essential for the field of interpretable machine learning. Despite efforts over
the past 40 years, only recently have optimization breakthroughs been made that have
allowed practical algorithms to find <em>optimal</em> decision trees. These new techniques
have the potential to trigger a paradigm shift where it is possible to construct sparse
decision trees to efficiently optimize a variety of objective functions without relying
on greedy splitting and pruning heuristics that often lead to suboptimal solutions.
This work contributes a general framework for decision tree optimization that
addresses two significant open problems in the area: handling imbalanced data
and fully optimizing over continuous variables. We present techniques
that produce optimal decision trees over a variety of objectives including F-score,
AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm
that produces provably optimal results in the presence of continuous variables and
speeds up decision tree construction by several orders of magnitude relative to the
state of the art.</p>