FLAME: A Fast Large-scale Almost Matching Exactly Approach to Causal Inference
Abstract
A classical problem in causal inference is that of matching, where treatment units need to be matched to control units. Some of the main challenges in developing matching methods arise from the tension among (i) inclusion of as many covariates as possible in defining the matched groups, (ii) having matched groups with enough treated and control units for a valid estimate of the Average Treatment Effect (ATE) in each group, and (iii) computing the matched pairs efficiently for large datasets. In this paper we propose a fast and novel method for approximate and exact matching in causal analysis called FLAME (Fast Large-scale Almost Matching Exactly). We define an optimization objective for match quality, which gives preference to matching on covariates that can be useful for predicting the outcome while encouraging as many matches as possible. FLAME aims to optimize our match quality measure, leveraging techniques that are natural for query processing in the area of database management. We provide two implementations of FLAME, using SQL queries and bit-vector techniques.
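To make the tension between (i) and (ii) concrete, the following is a minimal sketch of exact matching as a group-by over covariate values, loosely analogous to a SQL GROUP BY. It is an illustration only and not the FLAME algorithm from the paper: it does not implement the match quality objective or the bit-vector variant, and the function name `exact_match`, the field names `t` and `y`, and the toy data are all assumptions introduced here.

```python
from collections import defaultdict

def exact_match(units, covariates):
    """Group units by their values on `covariates` and return, for every
    group containing both treated and control units, the difference in
    mean outcome (treated minus control). Hypothetical helper."""
    groups = defaultdict(list)
    for u in units:
        key = tuple(u[c] for c in covariates)  # the "GROUP BY" key
        groups[key].append(u)

    effects = {}
    for key, members in groups.items():
        treated = [m["y"] for m in members if m["t"] == 1]
        control = [m["y"] for m in members if m["t"] == 0]
        if treated and control:  # a valid matched group needs both arms
            effects[key] = sum(treated) / len(treated) - sum(control) / len(control)
    return effects

# Toy data (assumed): two binary covariates, treatment flag t, outcome y.
units = [
    {"x1": 0, "x2": 1, "t": 1, "y": 5.0},
    {"x1": 0, "x2": 1, "t": 0, "y": 3.0},
    {"x1": 1, "x2": 0, "t": 1, "y": 4.0},
    {"x1": 1, "x2": 1, "t": 0, "y": 1.0},
]

effects_all = exact_match(units, ["x1", "x2"])  # match on every covariate
effects_x1 = exact_match(units, ["x1"])         # match after dropping x2
```

In this toy data, matching on both covariates leaves two units unmatched, while matching on `x1` alone places every unit in a group with both arms. Which covariates are worth keeping is the question the abstract's match quality objective is meant to answer; the sketch leaves that choice to the caller.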
Scholars@Duke
Sudeepa Roy
I joined the Department of Computer Science at Duke University in Fall 2015.
Before joining Duke, I was a postdoctoral research associate in the Department of Computer Science and Engineering, University of Washington, where I worked with Prof. Dan Suciu and the database group.
I graduated from the University of Pennsylvania with a Ph.D. in Computer and Information Science, where I was advised by Prof. Susan Davidson and Prof. Sanjeev Khanna. During my Ph.D., I did two internships at IBM Research, Almaden, and received a Google PhD fellowship in Structured Data in 2011.
I obtained my master's and bachelor's degrees in Computer Science from the Indian Institute of Technology, Kanpur and Jadavpur University, respectively.
Research Interests
I am broadly interested in data and information management with a focus on foundational aspects of big data analysis. My research objective is to help users with heterogeneous backgrounds and interests leverage the maximum benefit from the available data. While my ongoing work on explanations in databases directly aims to help users gain deep insights into data by providing rich explanations to their questions, my work in the areas of data and workflow provenance, probabilistic databases, and crowd-sourcing probes into compelling, fundamental questions that need to be answered to enable end-to-end processing and analysis of unstructured, noisy, and unreliable data in today's world while preserving its entire context.
Cynthia D. Rudin
Cynthia Rudin is a professor of computer science, electrical and computer engineering, statistical science, and biostatistics & bioinformatics at Duke University, and directs the Interpretable Machine Learning Lab. Previously, Prof. Rudin held positions at MIT, Columbia, and NYU. She holds an undergraduate degree from the University at Buffalo, and a PhD from Princeton University. She is the recipient of the 2022 Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from the Association for the Advancement of Artificial Intelligence (AAAI). This award carries a monetary reward at the million-dollar level, comparable to world-renowned recognitions such as the Nobel Prize and the Turing Award. She is also a three-time winner of the INFORMS Innovative Applications in Analytics Award, was named as one of the "Top 40 Under 40" by Poets and Quants in 2015, and was named by Businessinsider.com as one of the 12 most impressive professors at MIT in 2015. She is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics.
She is past chair of both the INFORMS Data Mining Section and the Statistical Learning and Data Science Section of the American Statistical Association. She has also served on committees for DARPA, the National Institute of Justice, AAAI, and ACM SIGKDD. She has served on three committees for the National Academies of Sciences, Engineering and Medicine, including the Committee on Applied and Theoretical Statistics, the Committee on Law and Justice, and the Committee on Analytic Research Foundations for the Next-Generation Electric Grid. She has given keynote/invited talks at several conferences including KDD (twice), AISTATS, CODE, Machine Learning in Healthcare (MLHC), Fairness, Accountability and Transparency in Machine Learning (FAT-ML), ECML-PKDD, and the Nobel Conference. Her work has been featured in news outlets including the NY Times, Washington Post, Wall Street Journal, the Boston Globe, Businessweek, and NPR.
Alexander Volfovsky
I am interested in theory and methodology for network analysis, causal inference and statistical/computational tradeoffs and in applications in the social sciences. Modern data streams frequently do not follow the traditional paradigms of n independent observations on p quantities of interest. They can include complex dependencies among the observations (e.g. interference in the study of causal effects) or among the quantities of interest (e.g. probabilities of edge formation in a network). My research is concerned with developing theory and methodological tools for approaching such modern data structures by better understanding these underlying dependence structures. My work concentrates on better understanding Kronecker covariance structures as they are related to network analysis and high dimensional unbalanced factorial designs. I work on theory and methodology for high dimensional data as it relates to network analysis, causal inference and computational and statistical tradeoffs. My primary applied interest is in the health and social sciences with past and ongoing collaborations studying friendship formation in high schools, employment outcomes for college graduates and job mobility as a function of an underlying social network.
Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.