Browsing by Author "Page, David"
Item Open Access
An Efficient Pseudo-likelihood Method for Sparse Binary Pairwise Markov Network Estimation
Geng, Sinong; Kuang, Zhaobin; Page, David
The pseudo-likelihood method is one of the most popular algorithms for learning sparse binary pairwise Markov networks. In this paper, we formulate the $L_1$ regularized pseudo-likelihood problem as a sparse multiple logistic regression problem. In this way, many insights and optimization procedures for sparse logistic regression can be applied to the learning of discrete Markov networks. Specifically, we use the coordinate descent algorithm for generalized linear models with convex penalties, combined with strong screening rules, to solve the pseudo-likelihood problem with $L_1$ regularization. As a result, a substantial speedup can be achieved without losing any accuracy. Furthermore, this method is more stable than the node-wise logistic regression approach on unbalanced high-dimensional data when penalized by small regularization parameters. Thorough numerical experiments on simulated data and real-world data demonstrate the advantages of the proposed method.

Item Embargo
Exposomic modeling approaches for social and environmental determinants of health
(2023) McCormack, Kara
Studies of human health have recently expanded to focus on the exposome paradigm, encompassing all exposures humans encounter from conception onward. The central theme of this work is to develop and test novel statistical methodologies that can address the challenges of the complex relationships between environmental exposures, socioeconomic distress, and health outcomes. However, source, measurement, and volume intricacies inherent to these data have constrained progression of statistical methods for key research questions.
In this work, we explore three approaches to characterizing community health and its potential impact on several types of disease outcomes. In the first approach, we apply a latent class model to socioeconomic and comorbidities data and explore these classifications as fixed effects in an ecological spatial model of COVID-19 cases and deaths in NYC during two time periods of the pandemic. In the second, we use a non-parametric Bayesian approach to form socio-economic and pollution cluster profiles across US counties. We then use these profiles to inform a Bayesian spatial model on breast cancer mortality for data from 2014. In the final approach, we utilize a latent network model traditionally used in psychometrics research to explore structural racism. Using information from five domains (employment, education, housing, health, and criminal justice), we identify new variable complexes to illustrate the complex manifestations of structural racism at the census tract level in Pennsylvania.

Item Open Access
Machine Learning to Predict Developmental Neurotoxicity with High-throughput Data from 2D Bio-engineered Tissues
Kuusisto, Finn; Costa, Vitor Santos; Hou, Zhonggang; Thomson, James; Page, David; Stewart, Ron
There is a growing need for fast and accurate methods for testing developmental neurotoxicity across several chemical exposure sources. Current approaches, such as in vivo animal studies, and assays of animal and human primary cell cultures, suffer from challenges related to time, cost, and applicability to human physiology. We previously demonstrated success employing machine learning to predict developmental neurotoxicity using gene expression data collected from human 3D tissue models exposed to various compounds. The 3D model is biologically similar to developing neural structures, but its complexity necessitates extensive expertise and effort to employ. By instead focusing solely on constructing an assay of developmental neurotoxicity, we propose that a simpler 2D tissue model may prove sufficient. We thus compare the accuracy of predictive models trained on data from a 2D tissue model with those trained on data from a 3D tissue model, and find the 2D model to be substantially more accurate. Furthermore, we find the 2D model to be more robust under stringent gene set selection, whereas the 3D model suffers substantial accuracy degradation. While both approaches have advantages and disadvantages, we propose that our described 2D approach could be a valuable tool for decision makers when prioritizing neurotoxicity screening.

Item Open Access
Privacy-Preserving Collaborative Prediction using Random Forests
Giacomelli, Irene; Jha, Somesh; Kleiman, Ross; Page, David; Yoon, Kyonghwan
We study the problem of privacy-preserving machine learning (PPML) for ensemble methods, focusing our effort on random forests. In collaborative analysis, PPML attempts to solve the conflict between the need for data sharing and privacy.
This is especially important in privacy-sensitive applications such as learning predictive models for clinical decision support from EHR data from different clinics, where each clinic has a responsibility for its patients' privacy. We propose a new approach for ensemble methods: each entity learns a model from its own data, and then, when a client asks for the prediction for a new private instance, the answers from all the locally trained models are used to compute the prediction in such a way that no extra information is revealed. We implement this approach for random forests and we demonstrate its high efficiency and potential accuracy benefit via experiments on real-world datasets, including actual EHR data.
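
The collaborative prediction step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only the plain aggregation of answers from locally trained models and deliberately omits the privacy-preserving mechanism that prevents extra information from being revealed. The function names `train_local_stump` and `collaborative_predict` are hypothetical, and a one-feature threshold rule stands in for a random forest to keep the example short.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "model" here is any function mapping a feature vector to a 0/1 label.
Model = Callable[[Sequence[float]], int]


def train_local_stump(rows: List[Sequence[float]], labels: List[int],
                      feature: int = 0) -> Model:
    """Hypothetical stand-in for a locally trained model.

    Thresholds one feature at its local mean and predicts the majority
    label observed above that threshold (binary labels 0/1 assumed).
    """
    threshold = sum(r[feature] for r in rows) / len(rows)
    above = [y for r, y in zip(rows, labels) if r[feature] > threshold]
    label_above = Counter(above).most_common(1)[0][0] if above else 1

    def predict(x: Sequence[float]) -> int:
        return label_above if x[feature] > threshold else 1 - label_above

    return predict


def collaborative_predict(models: List[Model], x: Sequence[float]) -> int:
    """Majority vote over the answers of all locally trained models.

    Assumption: votes are combined in the clear. The paper's approach
    computes the result without revealing extra information, which this
    sketch does not attempt.
    """
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Each entity trains only on its own (toy, made-up) data.
    clinic_a = train_local_stump([[1.0], [2.0], [8.0], [9.0]], [0, 0, 1, 1])
    clinic_b = train_local_stump([[0.0], [4.0], [6.0], [10.0]], [0, 0, 1, 1])
    clinic_c = train_local_stump([[2.0], [3.0], [4.0], [11.0]], [0, 0, 0, 1])
    print(collaborative_predict([clinic_a, clinic_b, clinic_c], [7.0]))
```

Using an odd number of local models avoids ties in the binary vote; a real deployment would also need the secure computation layer that the abstract refers to.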