Browsing by Subject "Decision trees"
Results Per Page
Sort Options
Item Open Access Generalized and Scalable Optimal Sparse Decision Trees(2020) Zhong, ChudiDecision tree optimization is notoriously difficult from a computational perspective but essential for the field of interpretable machine learning. Despite efforts over the past 40 years, only recently have optimization breakthroughs been made that have allowed practical algorithms to find \textit{optimal} decision trees. These new techniques have the potential to trigger a paradigm shift where it is possible to construct sparse decision trees to efficiently optimize a variety of objective functions without relying on greedy splitting and pruning heuristics that often lead to suboptimal solutions. The contribution in this work is to provide a general framework for decision tree optimization that addresses the two significant open problems in the area: treatment of imbalanced data and fully optimizing over continuous variables. We present techniques that produce optimal decision trees over a variety of objectives including F-score, AUC, and partial area under the ROC convex hull. We also introduce a scalable algorithm that produces provably optimal results in the presence of continuous variables and speeds up decision tree construction by several orders of magnitude relative to the state-of-the art.
Item Open Access Human-in-the-loop Machine Learning System via Model Interpretability(2023) Chen, ZhiThe interpretability of a machine learning system is crucial in situations where it involves human-model interaction or affects the well-being of society. By making the decision process understandable to humans, interpretability makes it easier to troubleshoot, acquire knowledge from, and interact with machine learning models. However, designing an interpretable machine learning system that maximizes the human-in-the-loop experience can be challenging. My thesis aims to address the major challenges in interpretable machine learning and lay the foundations for a more interactive machine learning system.
In this thesis, I first tackle the challenge of building machine learning models with interpretability constraints, particularly in applications with unstructured data such as computer vision and materials science. I propose interpretable models that effectively capture the underlying patterns in the data and allow users to understand the model's decision-making process. Furthermore, this thesis studies the exploration and approximation of the set of all near-optimal models for interpretable model classes, enabling users to visualize, select, and modify multiple well-performing models. Lastly, I demonstrate how interpretable models can provide insights into the data, detecting common dataset flaws such as poorly imputed missing values, confoundings, and biases.
Item Open Access Optimal Sparse Decision Trees(2019) Hu, XiyangDecision tree algorithms have been among the most popular algorithms for interpretable (transparent) machine learning since the early 1980's. The problem that has plagued decision tree algorithms since their inception is their lack of optimality, or lack of guarantees of closeness to optimality: decision tree algorithms are often greedy or myopic, and sometimes produce unquestionably suboptimal models. Hardness of decision tree optimization is both a theoretical and practical obstacle, and even careful mathematical programming approaches have not been able to solve these problems efficiently. This work introduces the first practical algorithm for optimal decision trees for binary variables. The algorithm is a co-design of analytical bounds that reduce the search space and modern systems techniques, including data structures and a custom bit-vector library. We highlight possible steps to improving the scalability and speed of future generations of this algorithm based on insights from our theory and experiments.