Browsing by Author "Li, Fan"

Item Open Access
A Dynamic Directional Model for Effective Brain Connectivity using Electrocorticographic (ECoG) Time Series (J Am Stat Assoc, 2015-03-01)
Zhang, Tingting; Wu, Jingwei; Li, Fan; Caffo, Brian; Boatman-Reich, Dana

We introduce a dynamic directional model (DDM) for studying brain effective connectivity based on intracranial electrocorticographic (ECoG) time series. The DDM consists of two parts: a set of differential equations describing the neuronal activity of brain components (state equations), and observation equations linking the underlying neuronal states to the observed data. When applied to functional MRI or EEG data, DDMs usually have complex formulations and thus can accommodate only a few regions, due to limitations in the spatial and/or temporal resolution of these imaging modalities. In contrast, we formulate our model in the context of ECoG data. The combined high temporal and spatial resolution of ECoG data results in a much simpler DDM, allowing investigation of complex connections between many regions. To identify functionally segregated sub-networks, a biologically economical form of brain network organization, we propose the Potts model for the DDM parameters. The neuronal states of brain components are represented by cubic spline bases, and the parameters are estimated by minimizing a log-likelihood criterion that combines the state and observation equations. The Potts model is converted to the Potts penalty in the penalized regression approach to achieve sparsity in parameter estimation, for which a fast iterative algorithm is developed. The methods are applied to an auditory ECoG dataset.

Item Open Access
Accommodating the ecological fallacy in disease mapping in the absence of individual exposures (Stat Med, 2017-09-19)
Wang, Feifei; Wang, Jian; Gelfand, Alan; Li, Fan

In health exposure modeling, in particular disease mapping, the ecological fallacy arises because the relationship between aggregated disease incidence on areal units and average exposure on those units differs from the relationship between the event of individual incidence and the associated individual exposure. This article presents a novel modeling approach to address the ecological fallacy in the least informative data setting. We assume a known population at risk with an observed incidence for a collection of areal units and, separately, environmental exposure recorded during the period of incidence at a collection of monitoring stations. We do not assume any partial individual-level information or random allocation of individuals to observed exposures. We specify a conceptual incidence surface over the study region as a function of an exposure surface, resulting in a stochastic integral for the block-average disease incidence. The true block-level incidence is an unavailable Monte Carlo integration of this stochastic integral, and we propose an alternative, manageable Monte Carlo integration in its place. Modeling in this setting is immediately hierarchical, and we fit our model within a Bayesian framework. To alleviate the resulting computational burden, we offer two strategies for efficient model fitting: one through modularization, the other through sparse or dimension-reduced Gaussian processes. We illustrate the performance of our model with simulations based on a heat-related mortality dataset in Ohio and then analyze the associated real data.
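As a rough illustration of the block-averaging idea in this abstract, the sketch below Monte-Carlo-integrates a hypothetical individual-level incidence surface over an areal unit by averaging a link function of exposure at random locations. The exposure field, the logistic link, and all parameter values are invented for illustration and are not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

def exposure(locs):
    """Hypothetical smooth exposure surface over the unit square (a stand-in
    for a Gaussian-process exposure field interpolated from monitors)."""
    return 1.0 + 0.5 * np.sin(2 * np.pi * locs[:, 0]) * np.cos(2 * np.pi * locs[:, 1])

def incidence_prob(x, beta0=-3.0, beta1=0.4):
    """Conceptual individual-level incidence as a function of exposure
    (assumed logistic link; the actual link is model-specific)."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

# Monte Carlo approximation of the block-average incidence for one areal unit:
# average the individual-level incidence surface over random locations in the unit.
locs = rng.uniform(0.0, 1.0, size=(5000, 2))
block_avg_incidence = incidence_prob(exposure(locs)).mean()

# Expected disease count given a known population at risk in the unit.
population_at_risk = 12000
print(f"block-average incidence ~ {block_avg_incidence:.4f}")
print(f"expected count ~ {population_at_risk * block_avg_incidence:.1f}")
```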
Item Open Access
Advancements in Probabilistic Machine Learning and Causal Inference for Personalized Medicine (2019)
Lorenzi, Elizabeth Catherine

In this dissertation, we present four novel contributions to the field of statistics with the shared goal of personalizing medicine to individual patients. These methods are developed to directly address problems in health care through two subfields of statistics: probabilistic machine learning and causal inference. The projects include improving predictions of adverse events after surgery and learning the effectiveness of treatments for specific subgroups and individuals. We begin the dissertation in Chapter 1 with a discussion of personalized medicine, the use of electronic health record (EHR) data, and a brief discussion of learning heterogeneous treatment effects. In Chapter 2, we present a novel algorithm, Predictive Hierarchical Clustering (PHC), for agglomerative hierarchical clustering of current procedural terminology (CPT) codes. PHC aims to cluster subgroups found within the data, rather than individual observations, such that the discovered clusters yield optimal performance of a classification model, specifically one predicting surgical complications. In Chapter 3, we develop a hierarchical infinite latent factor model (HIFM) to appropriately account for the covariance structure across subpopulations in the data. We propose a novel hierarchical Dirichlet process shrinkage prior on the loadings matrix that flexibly captures the underlying structure of the data across subpopulations while sharing information to improve inference and prediction. We apply this work to the problem of predicting surgical complications using EHR data for geriatric patients at Duke University Health System (DUHS). The last chapters of the dissertation address personalized medicine from a causal perspective, where the goal is to understand how interventions affect individuals rather than whole populations. In Chapter 4, we address heterogeneous treatment effects across subgroups, where guidance for observational comparisons within subgroups is lacking, as is a connection to classic design principles for propensity score (PS) analyses. We address these shortcomings by proposing a novel propensity score method for subgroup analysis (SGA) that seeks to balance existing strategies in an automatic and efficient way. Using overlap weights, we prove that an over-specified propensity model including interactions between subgroups and all covariates results in exact covariate balance within subgroups. This is paired with variable selection approaches to adjust for a possibly over-specified propensity score model. Finally, Chapter 5 presents our final contribution, a longitudinal matching algorithm aiming to predict individual treatment effects of a medication change for diabetes patients. This project develops a novel and generalizable causal inference framework for learning heterogeneous treatment effects from EHR data. The key methodological innovation is to cast the sparse and irregularly spaced EHR time series into a functional data analysis framework in the design stage to adjust for confounding that changes over time.
We conclude the dissertation and discuss future work in Chapter 6, outlining many directions for continued research on these topics.
Item Open Access
Association of body mass index with mortality and functional outcome after acute ischemic stroke (Scientific Reports, 2017-05-31)
Sun, Weiping; Huang, Yining; Xian, Ying; Zhu, Sainan; Jia, Zhirong; Liu, Ran; Li, Fan; Wei, Jade W; Wang, Ji-Guang; Liu, Ming; Anderson, Craig S

The relation between obesity and stroke outcome has been disputed. This study aimed to determine the association of body mass index (BMI) with mortality and functional outcome in patients with acute ischemic stroke. Data were from a national, multi-centre, prospective, hospital-based register: the ChinaQUEST (Quality Evaluation of Stroke Care and Treatment) study. Of 4782 acute ischemic stroke patients, 282 were underweight (BMI < 18.5 kg/m2), 2306 were normal-weight (BMI 18.5 to < 24 kg/m2), 1677 were overweight (BMI 24 to < 28 kg/m2) and 517 were obese (BMI ≥ 28 kg/m2). After adjusting for baseline characteristics, the risks of death at 12 months and of death or high dependency at 3 and 12 months did not differ significantly between overweight patients (HR: 0.97, 95% CI: 0.78-1.20; OR: 0.93, 95% CI: 0.80-1.09; OR: 0.95, 95% CI: 0.81-1.12) or obese patients (HR: 1.07, 95% CI: 0.78-1.48; OR: 0.96, 95% CI: 0.75-1.22; OR: 1.06, 95% CI: 0.83-1.35) and normal-weight patients. Underweight patients had significantly increased risks of all three outcomes. In ischemic stroke patients, being overweight or obese was not associated with decreased mortality or better functional recovery, but being underweight predicted unfavourable outcomes.

Item Open Access
Bayesian Analysis of Latent Threshold Dynamic Models (2012)
Nakajima, Jouchi

Time series modeling faces increasingly high-dimensional problems in many scientific areas. Lack of relevant, data-based constraints typically leads to increased uncertainty in estimation and degradation of predictive performance. This dissertation addresses these general questions with a new and broadly applicable idea based on latent threshold models. The latent threshold approach is a model-based framework for inducing data-driven shrinkage of elements of parameter processes, collapsing them fully to zero when redundant or irrelevant while allowing for time-varying non-zero values when supported by the data. This dynamic sparsity modeling technique is implemented in broad classes of multivariate time series models and applied to various time series data. The analyses demonstrate the utility of the latent threshold idea in reducing estimation uncertainty and improving predictions as well as model interpretation. Chapter 1 overviews the latent threshold approach and outlines the dissertation. Chapter 2 introduces the new approach to dynamic sparsity using latent threshold modeling and discusses Bayesian analysis and computation for model fitting. Chapter 3 describes latent threshold multivariate models for a wide range of applications in the real data analyses that follow. Chapter 4 provides US and Japanese macroeconomic data analysis using latent threshold VAR models. Chapter 5 analyzes time series of foreign currency exchange rates (FX) using latent threshold dynamic factor models. Chapter 6 provides a study of electroencephalographic (EEG) time series using latent threshold factor process models. Chapter 7 develops a new framework of dynamic network modeling for multivariate time series using the latent threshold approach. Finally, Chapter 8 concludes the dissertation with open questions and directions for future work.
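To make the thresholding idea concrete, a minimal generic form of the latent threshold mechanism for a single time-varying coefficient is sketched below; the notation is ours, not necessarily the dissertation's.

```latex
% Latent threshold mechanism: the latent process \beta_t evolves smoothly,
% but the effective coefficient b_t collapses to zero whenever the latent
% value falls below a threshold d.
\begin{aligned}
b_t &= \beta_t \, s_t, \qquad s_t = \mathbb{1}\{\,|\beta_t| \ge d\,\},\\
\beta_{t+1} &= \mu + \phi\,(\beta_t - \mu) + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \sigma_\eta^2).
\end{aligned}
```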
Item Open Access
Bayesian Estimation and Sensitivity Analysis for Causal Inference (2019)
Zaidi, Abbas M

This dissertation explores Bayesian methods for causal inference. In Chapter 1, we present an overview of fundamental ideas from causal inference along with an outline of the methodological developments that we hope to tackle. In Chapter 2, we develop a Gaussian-process mixture model for heterogeneous treatment effect estimation that leverages the use of transformed outcomes. The approach attempts to improve point estimation and uncertainty quantification relative to past work based on transformed-variable methods as well as traditional outcome modeling. Earlier work on modeling treatment effect heterogeneity using transformed outcomes has relied on tree-based methods such as single regression trees and random forests. Under the umbrella of non-parametric models, outcome modeling has been performed using Bayesian additive regression trees and various flavors of weighted single trees. These approaches work well when large samples are available, but suffer in smaller samples where results are more sensitive to model misspecification; our method attempts to garner improvements in inference quality via a correctly specified model rooted in Bayesian non-parametrics. Furthermore, while we begin with a model that assumes the treatment assignment mechanism is known, an extension in which it is learnt from the data is presented for applications to observational studies. Our approach is applied to simulated and real data to demonstrate the theorized improvements in inference with respect to two causal estimands: the conditional average treatment effect and the average treatment effect. By leveraging our correctly specified model, we are able to estimate the treatment effects more accurately while reducing their variance. In Chapter 3, we parametrically and hierarchically estimate the average causal effects of different lengths of stay in the Udayan Ghar Program under the assumption that selection into different lengths is based on a set of observed covariates. This program was piloted in New Delhi, India as a means of providing a residential surrogate to vulnerable and at-risk children with the hope of improving their psychological development. We find that the estimated effects on the psychological constructs of self-concept and ego resilience (measured by the standardized Piers-Harris score) increase with the length of time spent in the program. We are also able to conclude that there are measurable differences between male and female children who spend time in the program. In Chapter 4, we supplement the hierarchical estimation of dose-response functions by introducing a novel sensitivity analysis and summarization strategy for assessing the robustness of our results to violations of the assumption of unconfoundedness. Finally, in Chapter 5, we summarize what this dissertation has achieved and briefly outline important areas where our work warrants further development.
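For background on the transformed-outcome idea mentioned above, one standard inverse-probability-weighted construction from this literature is stated below; it is given as context, not necessarily the exact transformation used in the chapter.

```latex
% IPW transformed outcome: with treatment T \in \{0,1\}, outcome Y, and
% propensity score e(X) = P(T = 1 \mid X),
Y^{*} \;=\; \frac{T\,Y}{e(X)} \;-\; \frac{(1-T)\,Y}{1 - e(X)},
% whose conditional mean recovers the conditional average treatment effect:
% \mathbb{E}\bigl[\,Y^{*} \mid X = x\,\bigr] \;=\; \mathbb{E}\bigl[\,Y(1) - Y(0) \mid X = x\,\bigr].
```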
Item Open Access
Bayesian Mixture Modeling Approaches for Intermediate Variables and Causal Inference (2010)
Schwartz, Scott Lee

This thesis examines causal inference topics involving intermediate variables and uses Bayesian methodologies to advance analysis capabilities in these areas. First, joint modeling of outcome variables with intermediate variables is considered in the context of birthweight and censored gestational age analyses. The proposed methodology provides improved inference capabilities for birthweight and gestational age, avoids the post-treatment selection bias problems associated with analyses that condition on gestational age, and appropriately assesses the uncertainty associated with censored gestational age. Second, principal stratification methodology for settings where causal inference analysis requires appropriate adjustment for intermediate variables is extended to observational settings with binary treatments and binary intermediate variables. This is done by uncovering the structural pathways of unmeasured confounding affecting principal stratification analysis and directly incorporating them into a model-based sensitivity analysis methodology. Demonstration focuses on a study of the efficacy of influenza vaccination in elderly populations. Third, the flexibility, interpretability, and capability of principal stratification analyses for continuous intermediate variables are improved by replacing the current fully parametric methodologies with semiparametric Bayesian alternatives. This presentation is one of the first uses of nonparametric techniques in causal inference analysis,
and opens a connection between these two fields. Demonstration focuses on two studies: one involving a cholesterol-reduction drug, and one examining the effect of physical activity on cardiovascular disease as it relates to body mass index.
Item Open Access
Essays on Propensity Score Methods for Causal Inference in Observational Studies (2018)
Nguyen, Nghi Le Phuong

In this dissertation, I present three essays from three different research projects, each involving a different use of propensity scores in drawing causal inferences in observational studies.
Chapter 1 reviews the general idea of causal inference as well as the concepts of randomized experiments and observational studies, and introduces the three projects and their contributions to the literature.
Chapter 2 gives a critical review and an extensive discussion of several commonly used propensity score methods for data with a multilevel structure, including matching, weighting, stratification, and methods that combine these with regression. The use of these methods is illustrated with a data set on endoscopic vein-graft harvesting in coronary artery bypass graft (CABG) surgery. We discuss important aspects of the implementation of these methods, such as model specification and standard error calculation. Based on the comparison, we provide general guidelines for using propensity score methods with multilevel data in practice. We also provide the relevant code in the form of an R package, available on GitHub.
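As a schematic of one such strategy, the sketch below fits a propensity score model with cluster (e.g., hospital) fixed effects and forms a weighted difference in means. The column names and the fixed-effects specification are illustrative assumptions, not the chapter's prescribed model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# df is assumed to contain: a binary treatment column "t", an outcome "y",
# the listed covariates, and a cluster identifier for the multilevel structure.
def ipw_ate_multilevel(df, covariates, cluster_col="hospital"):
    # Propensity model with cluster fixed effects (dummy indicators), one
    # simple way to let treatment prevalence vary across clusters.
    X = pd.get_dummies(df[covariates + [cluster_col]],
                       columns=[cluster_col], drop_first=True)
    ps = LogisticRegression(max_iter=1000).fit(X, df["t"]).predict_proba(X)[:, 1]

    # Inverse probability weights: 1/ps for treated, 1/(1 - ps) for controls.
    w = np.where(df["t"] == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # Weighted (Hajek) difference in means as the ATE estimate.
    treated, control = df["t"] == 1, df["t"] == 0
    return (np.average(df.loc[treated, "y"], weights=w[treated.values])
            - np.average(df.loc[control, "y"], weights=w[control.values]))
```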
In observational studies, subjects are no longer assigned to treatment at random as in randomized experiments, so the association between the treatment and the outcome can be due to some unmeasured variable that affects both. Chapter 3 focuses on sensitivity analysis for assessing the robustness of the estimated quantity when the unconfoundedness assumption is violated. Two approaches to sensitivity analysis are presented, both extensions of previous work to accommodate a count outcome. One method is based on the subclassification estimator and relies on maximum likelihood estimation. The second method is more flexible in its estimation and is based on simulation. We illustrate both methods using a data set from a traffic safety research study on the safety effectiveness (measured as reduction in crash counts) of the combined application of center line rumble strips and shoulder rumble strips on two-lane rural roads in Pennsylvania.
Chapter 4 proposes a method for estimating heterogeneous causal effects in observational studies by augmenting additive-interactive Gaussian process regression with the propensity scores, yielding a flexible yet robust way to predict the potential outcome surfaces from which conditional treatment effects can be calculated. We show that our method works well even in the presence of strong confounding, and illustrate this by comparison with commonly used methods in different settings using simulated data.
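A minimal sketch of the general idea follows, assuming a single-kernel GP rather than the additive-interactive construction in the chapter: include the treatment indicator and the estimated propensity score as GP inputs, then read off conditional effects by contrasting predictions at t = 1 and t = 0.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.linear_model import LogisticRegression

def cate_gp_with_ps(X, t, y):
    # Step 1: estimate the propensity score e(x) = P(T = 1 | X = x).
    ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

    # Step 2: GP regression of the outcome on (covariates, treatment, e(x)).
    Z = np.column_stack([X, t, ps])
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(Z, y)

    # Step 3: conditional treatment effects as the difference between the
    # predicted potential-outcome surfaces at t = 1 and t = 0.
    Z1 = np.column_stack([X, np.ones_like(t), ps])
    Z0 = np.column_stack([X, np.zeros_like(t), ps])
    return gp.predict(Z1) - gp.predict(Z0)
```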
Finally, Chapter 5 concludes the dissertation and discusses possible future work for each of the projects.
Item Open Access
Hölder Bounds for Sensitivity Analysis in Causal Reasoning (CoRR, 2021)
Assaad, Serge; Zeng, Shuxi; Pfister, Henry; Li, Fan; Carin, Lawrence

We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t] - E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connections U -> T and U -> Y). These bounds are tight either when U is independent of T or when U is independent of Y given T (i.e., when there is no unobserved confounding). We focus on a special case of this bound that depends on the total variation distance between the distributions p(U) and p(U|T=t), as well as on the maximum (over all possible values of U) deviation of the conditional expected outcome E[Y|U=u,T=t] from the average expected outcome E[Y|T=t]. We discuss possible calibration strategies for this bound to obtain interval estimates for treatment effects, and experimentally validate the bound using synthetic and semi-synthetic datasets.
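For intuition, a bound of this total-variation form can be obtained as follows, in our notation; the paper's exact statement and constants may differ.

```latex
% Under the backdoor decomposition E[Y \mid do(T{=}t)] = \int E[Y \mid u, t]\, p(u)\, du,
% and writing M_t = \max_u \left| \mathbb{E}[Y \mid U{=}u, T{=}t] - \mathbb{E}[Y \mid T{=}t] \right|:
\bigl| \mathbb{E}[Y \mid T{=}t] - \mathbb{E}[Y \mid do(T{=}t)] \bigr|
 \;=\; \Bigl| \int \bigl( \mathbb{E}[Y \mid u, t] - \mathbb{E}[Y \mid t] \bigr)
        \bigl( p(u \mid t) - p(u) \bigr)\, du \Bigr|
 \;\le\; 2\, M_t \,\mathrm{TV}\bigl( p(U \mid T{=}t),\, p(U) \bigr).
```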
Item Open Access
Modeling and Methodological Advances in Causal Inference (2021)
Zeng, Shuxi

This thesis presents several novel modeling and methodological advancements in causal inference. First, we investigate the use of propensity score weighting in randomized trials for covariate adjustment. We introduce the class of balancing weights and study its theoretical properties. We demonstrate that it is asymptotically equivalent to the analysis of covariance (ANCOVA) and derive the closed-form variance estimator. We further recommend the overlap weighting estimator based on its semiparametric efficiency and good finite-sample performance. Next, we focus on comparative effectiveness studies with survival outcomes. As opposed to approaches that couple weighting with a Cox proportional hazards model, we follow a "once for all" approach and construct pseudo-observations of the censored outcomes. We study the theoretical properties of the propensity score weighting estimator based on pseudo-observations and provide closed-form variance estimators. The third contribution lies in the domain of causal mediation analysis, which studies how much of the treatment effect is mediated or explained through a given intermediate variable. Existing approaches are not directly applicable when both the mediator and the outcome are measured on sparse and irregular time grids. We propose a causal mediation framework that treats the sparse and irregular data as realizations of smooth processes, and we provide the assumptions for nonparametric identification. We also provide a functional principal component analysis (FPCA) approach for estimation and carry out inference within a Bayesian paradigm. Furthermore, we study how to achieve double robustness with machine learning approaches. We develop a new algorithm that learns double-robust representations in observational studies; the proposed method learns the low-dimensional representations and the balancing weights simultaneously. Lastly, we study how to build a robust prediction model by exploiting causal relationships. From a causal perspective, we argue robust models should capture stable causal relationships rather than spurious correlations. We propose a causal transfer random forest method that learns stable causal relationships efficiently from large-scale observational data and a small amount of randomized data. We provide theoretical justification and validate the algorithm empirically with synthetic experiments and real-world prediction tasks.
In summary, this thesis contributes to three major areas of causal inference: (i) propensity score weighting methods for randomized experiments and observational studies, covering (a) randomized controlled trials (Chapter 2) and (b) survival outcomes (Chapter 3); (ii) causal mediation analysis with sparse and irregular longitudinal data (Chapter 4); and (iii) machine learning methods for causal inference, covering (a) double robustness (Chapter 5) and (b) the causal transfer random forest (Chapter 6).
Item Open Access
Multiple Imputation on Missing Values in Time Series Data (2015)
Oh, Sohae

Financial stock market data, for various reasons, frequently contain missing values. One reason is that, because the markets close for holidays, daily stock prices are not always observed. This creates gaps in information, making it difficult to predict the following day's stock prices. In this situation, information during the holiday can be "borrowed" from other countries' stock markets, since global stock prices tend to show similar movements and are in fact highly correlated. The main goal of this study is to combine stock index data from various markets around the world and develop an algorithm that imputes the missing values in an individual stock index using "information sharing" between different time series. To develop an imputation algorithm that accommodates time-series-specific features, we take a multiple imputation approach using a dynamic linear model for time series and panel data. The algorithm assumes an ignorable missing-data mechanism, which is plausible when missingness is due to holidays. The posterior distribution of the parameters, including the missing values, is simulated using Markov chain Monte Carlo (MCMC) methods, and estimates from the sets of draws are then combined using Rubin's combination rule, rendering final inference for the data set. Specifically, we use the Gibbs sampler and Forward Filtering and Backward Sampling (FFBS) to simulate the joint posterior distribution and the posterior predictive distribution of the latent variables and other parameters. A simulation study is conducted to check the validity and performance of the algorithm using two error-based measurements: Root Mean Square Error (RMSE) and Normalized Root Mean Square Error (NRMSE). We compared the overall trend of the imputed time series with the complete data set, and inspected the in-sample predictability of the algorithm using the Last Value Carried Forward (LVCF) method as a benchmark. The algorithm is applied to real stock price index data from the US, Japan, Hong Kong, the UK, and Germany. From both the simulation and the application, we conclude that the imputation algorithm performs well enough to achieve our original goal, predicting the opening price after a holiday, and outperforms the benchmark method. We believe this multiple imputation algorithm can be used in many applications that deal with time series containing missing values, such as financial, economic, and biomedical data.
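Rubin's combination rule referenced above is straightforward to implement; a minimal generic sketch (not tied to the FFBS imputation itself) pools m completed-data estimates and their variances:

```python
import numpy as np

def rubin_pool(estimates, variances):
    """Pool m completed-data estimates via Rubin's rules.

    estimates: length-m array of point estimates, one per imputed data set.
    variances: length-m array of their within-imputation variances.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    m = len(estimates)

    q_bar = estimates.mean()                   # pooled point estimate
    w_bar = variances.mean()                   # within-imputation variance
    b = estimates.var(ddof=1)                  # between-imputation variance
    total_var = w_bar + (1.0 + 1.0 / m) * b    # Rubin's total variance

    return q_bar, total_var

# Example: pooling five imputed-data estimates of a mean.
est = [1.02, 0.97, 1.10, 1.05, 0.99]
var = [0.04, 0.05, 0.04, 0.06, 0.05]
print(rubin_pool(est, var))
```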
Item Open Access
Predicting the Risk of Huntington's Disease with Multiple Longitudinal Biomarkers (Journal of Huntington's Disease, 2019-06-22)
Li, Fan; Li, Kan; Li, Cai; Luo, Sheng; PREDICT-HD and ENROLL-HD Investigators of the Huntington Study Group

BACKGROUND: Huntington's disease (HD) has gradually become a public health threat, and there is growing interest in developing prognostic models to predict the time to HD diagnosis.
OBJECTIVE: This study aims to develop a novel prognostic model that leverages multiple longitudinal biomarkers to inform the risk of HD.
METHODS: Multivariate functional principal component analysis was used to summarize the essential information from multiple longitudinal markers and to obtain a set of prognostic scores. The prognostic scores were used as predictors in a Cox model to predict the right-censored time to diagnosis. We used cross-validation to determine the best model in PREDICT-HD (n = 1,039) and ENROLL-HD (n = 1,776); external validation was carried out in ENROLL-HD.
RESULTS: We considered six commonly measured longitudinal biomarkers in PREDICT-HD and ENROLL-HD (Total Motor Score, Symbol Digit Modalities Test, Stroop Word Test, Stroop Color Test, Stroop Interference Test, and Total Functional Capacity). The prognostic model utilizing these longitudinal biomarkers significantly improved predictive performance over the model with baseline biomarker information. A new prognostic index was computed using the proposed model and can be dynamically updated over time as new biomarker measurements become available.
CONCLUSION: Longitudinal measurements of commonly measured clinical biomarkers substantially improve the risk prediction of Huntington's disease diagnosis. Calculation of the prognostic index informs the patient's risk category and facilitates patient selection in future clinical trials.
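The scores-into-Cox pipeline can be sketched as follows. This toy version substitutes ordinary PCA on per-subject biomarker summaries for the full multivariate FPCA, and the column names and lifelines usage are illustrative assumptions.

```python
import pandas as pd
from sklearn.decomposition import PCA
from lifelines import CoxPHFitter

# features: one row per subject, columns summarizing each biomarker's
# longitudinal trajectory (a crude stand-in for multivariate FPCA).
# time / event: right-censored time to diagnosis and the event indicator.
def prognostic_index(features, time, event, n_scores=4):
    # Step 1: reduce the trajectory summaries to a few prognostic scores.
    scores = PCA(n_components=n_scores).fit_transform(features)
    df = pd.DataFrame(scores, columns=[f"score{k}" for k in range(n_scores)])
    df["time"] = time
    df["event"] = event

    # Step 2: Cox model with the prognostic scores as predictors.
    cph = CoxPHFitter()
    cph.fit(df, duration_col="time", event_col="event")

    # Step 3: the partial hazard serves as a prognostic index that can be
    # recomputed as new biomarker measurements update the scores.
    return cph.predict_partial_hazard(df)
```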
Item Open Access
Propensity Score Methods For Causal Subgroup Analysis (2022)
Yang, Siyun

Subgroup analyses are frequently conducted in comparative effectiveness research and randomized clinical trials to assess evidence of heterogeneous treatment effects across patient subpopulations. Though widely used in medical research, causal inference methods for conducting statistical analyses on a range of pre-specified subpopulations remain underdeveloped, particularly in observational studies. This dissertation develops and extends propensity score methods for causal subgroup analysis.

In Chapter 2, we develop a suite of analytical methods and visualization tools for causal subgroup analysis. First, we introduce the estimand of the subgroup weighted average treatment effect and provide the corresponding propensity score weighting estimator. We show that balancing covariates within a subgroup bounds the bias of the estimator of subgroup causal effects. Second, we propose using the overlap weighting method to achieve exact balance within subgroups. We further propose a method that combines overlap weighting and LASSO to balance the bias-variance tradeoff in subgroup analysis. Finally, we design a new diagnostic plot, the Connect-S plot, for visualizing subgroup covariate balance. Extensive simulation studies are presented to compare the proposed method with several existing methods. We apply the proposed methods to the observational COMPARE-UF study to evaluate the causal effects of myomectomy versus hysterectomy on the relief of symptoms and quality of life (a continuous outcome) in a number of pre-specified subgroups of patients with uterine fibroids.
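A small sketch of the exact-balance property described above, under assumed column names: fit a logistic propensity model with subgroup-by-covariate interactions, form overlap weights, and check the weighted covariate means within each subgroup.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def overlap_weights_by_subgroup(df, covariates, subgroup_col="subgroup"):
    # Design matrix: covariates, subgroup dummies, and subgroup-by-covariate
    # interactions, i.e., the over-specified propensity model described above.
    dummies = pd.get_dummies(df[subgroup_col], prefix="sg")
    inter = [dummies.mul(df[c], axis=0).add_suffix(f"_x_{c}") for c in covariates]
    X = pd.concat([df[covariates], dummies] + inter, axis=1)

    # Near-unpenalized logistic MLE (large C): exact balance follows from
    # the logistic score equations at the MLE.
    e = LogisticRegression(C=1e6, max_iter=5000).fit(X, df["t"]).predict_proba(X)[:, 1]

    # Overlap weights: 1 - e(x) for treated units, e(x) for controls.
    w = pd.Series(np.where(df["t"] == 1, 1.0 - e, e), index=df.index)

    # Diagnostic: weighted covariate means should match within each subgroup.
    for g, sub in df.groupby(subgroup_col):
        wg = w[sub.index]
        for c in covariates:
            t1, t0 = sub["t"] == 1, sub["t"] == 0
            m1 = np.average(sub.loc[t1, c], weights=wg[t1])
            m0 = np.average(sub.loc[t0, c], weights=wg[t0])
            print(f"subgroup {g}, {c}: treated {m1:.3f} vs control {m0:.3f}")
    return w
```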
In Chapter 3, we investigate propensity score weighting methods for causal subgroup analysis with time-to-event outcomes. We introduce two causal estimands, the subgroup marginal hazard ratio and the subgroup restricted average causal effect, and provide the corresponding propensity score weighting estimators. We analytically establish that the bias of the subgroup restricted average causal effect estimator is determined by subgroup covariate balance. Using extensive simulations, we compare the performance of various combinations of propensity score models (logistic regression, random forests, LASSO, and generalized boosted models) and weighting schemes (inverse probability weighting and overlap weighting) for estimating the survival causal estimands. We find that the logistic model with subgroup-covariate interactions selected by LASSO consistently outperforms the other propensity score models. Also, overlap weighting generally outperforms inverse probability weighting in terms of balance, bias, and variance, and the advantage is particularly pronounced in small subgroups and/or in the presence of poor overlap. We again apply the methods to the COMPARE-UF study, with a time-to-event outcome: the time to disease recurrence after receiving a procedure.
In Chapter 4, we extend the propensity score weighting methodology for covariate adjustment to improve the precision and power of subgroup analyses in RCTs. We fit a logistic propensity model with pre-specified covariate-subgroup interactions and show that, by construction, overlap weighting exactly balances the covariates with interaction terms in each subgroup. Extensive simulations compare the operating characteristics of the unadjusted estimator, several propensity score weighting estimators, and the analysis of covariance estimator. We apply these methods to the HF-ACTION trial to evaluate the effect of exercise training on the 6-minute walk test in several pre-specified subgroups.
Item Open Access
Spatial Bayesian Variable Selection with Application to Functional Magnetic Resonance Imaging (fMRI) (2011)
Yang, Ying

Functional magnetic resonance imaging (fMRI) is a major neuroimaging methodology that has greatly facilitated basic cognitive neuroscience research. However, the analysis of fMRI data poses multiple statistical challenges, including dimension reduction, multiple testing, and the inter-dependence of the MRI responses. In this thesis, a spatial Bayesian variable selection (BVS) model is proposed for the analysis of multi-subject fMRI data. The BVS framework simultaneously accounts for uncertainty in the model-specific parameters as well as the model selection process, addressing the multiple testing problem. A spatial prior incorporates the spatial relationship of the MRI responses, accounting for their inter-dependence. Compared to a non-spatial BVS model, the spatial BVS model enhances the sensitivity and accuracy of identifying activated voxels.
Item Open Access
Topics and Applications of Weighting Methods in Case-Control and Observational Studies (2019)
Li, Fan

Weighting methods have been widely used in statistics and related applications. For example, inverse probability weighting is a standard approach to correct for survey non-response. The case-control design, frequently seen in epidemiologic or genetic studies, can be regarded as a special type of survey design; analogous inverse probability weighting approaches have been explored both when the interest is the association between exposures and the disease (primary analysis) and when the interest is the association among the exposures themselves (secondary analysis). Meanwhile, in observational comparative effectiveness research, inverse probability weighting has been suggested as a valid approach to correct for confounding bias. This dissertation develops and extends weighting methods for case-control and observational studies.
The first part of this dissertation extends the inverse probability weighting approach for secondary analysis of case-control data. We revisit an inverse probability weighting estimator to offer new insights and extensions. Specifically, we construct its more general form via generalized least squares (GLS). This construction allows us to connect the GLS estimator with the generalized method of moments and motivates a new specification test designed to assess the adequacy of the inverse probability weights. The specification test statistic measures the weighted discrepancy between the case and control subsample estimators, and asymptotically follows a chi-squared distribution under correct model specification. We illustrate the GLS estimator and the specification test using a case-control sample of peripheral arterial disease, and use simulations to shed light on the operating characteristics of the specification test.

The second part develops a robust difference-in-differences (DID) estimator for estimating causal effects with observational before-after data. Within the DID framework, two common estimation strategies are outcome regression and propensity score weighting. Motivated by a real application in traffic safety research, we propose a new double-robust DID estimator that hybridizes outcome regression and propensity score weighting. We show that the proposed estimator possesses a desirable large-sample robustness property: consistency requires only one of the outcome model or the propensity score model to be correctly specified. We apply the new estimator to study the causal effect of rumble strips in reducing vehicle crashes, and conduct a simulation study to examine its finite-sample performance.

The third part discusses a unified framework, the balancing weights, for estimating causal effects in observational studies with multiple treatments. These weights incorporate the generalized propensity scores to balance the weighted covariate distribution of each treatment group, all weighted toward a common pre-specified target population. Within this framework, we further develop the generalized overlap weights, constructed as the product of the inverse probability weights and the harmonic mean of the generalized propensity scores. The generalized overlap weights correspond to the target population with the most overlap in covariates between treatments, similar to the population in equipoise in clinical trials. We show that the generalized overlap weights minimize the total asymptotic variance of the nonparametric estimators for the pairwise contrasts within the class of balancing weights. We apply the new weighting method to study racial disparities in medical expenditure and further examine its operating characteristics by simulation.
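As a small illustration of the construction just described, the sketch below computes generalized overlap weights from a multinomial propensity model; the input layout and the use of a plain multinomial logistic fit are assumptions for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def generalized_overlap_weights(X, t):
    """Generalized overlap weights for multiple treatments.

    X: (n, p) covariate matrix; t: length-n treatment labels in {0, ..., J-1}.
    Weight for unit i receiving treatment j: h(x_i) / e_j(x_i), where
    h(x) = [sum_k 1 / e_k(x)]^{-1} is (up to a constant) the harmonic
    mean of the generalized propensity scores.
    """
    t = np.asarray(t)

    # Multinomial logistic model for the generalized propensity scores.
    gps = LogisticRegression(max_iter=2000).fit(X, t).predict_proba(X)

    h = 1.0 / (1.0 / gps).sum(axis=1)     # tilting function h(x)
    e_own = gps[np.arange(len(t)), t]     # e_j(x_i) for the received treatment
    return h / e_own

# Weighted group means of an outcome under these weights then target the
# "overlap population", in which the pairwise contrasts have minimal variance.
```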