Privacy-Preserving Collaborative Prediction using Random Forests
We study the problem of privacy-preserving machine learning (PPML) for ensemble methods, focusing on random forests. In collaborative analysis, PPML attempts to resolve the conflict between the need for data sharing and the need for privacy. This is especially important in privacy-sensitive applications such as learning predictive models for clinical decision support from electronic health record (EHR) data held by different clinics, where each clinic is responsible for its patients' privacy. We propose a new approach for ensemble methods: each entity learns a model from its own data, and when a client requests a prediction for a new private instance, the answers from all the locally trained models are combined to compute the prediction in such a way that no extra information is revealed. We implement this approach for random forests and demonstrate its high efficiency and potential accuracy benefit via experiments on real-world datasets, including actual EHR data.
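To make the aggregation idea concrete, the following is a minimal sketch in Python, not the protocol from the paper. It assumes a binary classification task in which each clinic has already counted how many trees in its local forest vote for the positive class, and it uses plain additive secret sharing so that only the total vote count is reconstructed. The names `share`, `secure_vote_sum`, `collaborative_predict`, and the modulus `PRIME` are hypothetical. The sketch hides each clinic's individual vote count but does not hide the client's instance from the clinics, which the approach described above also aims to protect.

```python
import secrets

# Illustrative modulus for additive shares (an assumption, not from the paper).
PRIME = 2**61 - 1


def share(value, n_parties, modulus=PRIME):
    """Split an integer into n_parties additive shares that sum to value mod modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_vote_sum(local_votes, modulus=PRIME):
    """Return the sum of all local vote counts without exposing any single count.

    local_votes[i] is the number of trees at party i that vote for the
    positive class on the queried instance.
    """
    n = len(local_votes)
    # shares_matrix[i][j] is the share that party i sends to party j.
    shares_matrix = [share(v, n, modulus) for v in local_votes]
    # Each party j adds up the shares it received and publishes only that sum.
    partial_sums = [
        sum(shares_matrix[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # The published partial sums reveal only the overall total.
    return sum(partial_sums) % modulus


def collaborative_predict(local_votes, trees_per_party):
    """Majority vote over all trees of all parties (ties go to class 0)."""
    total_positive = secure_vote_sum(local_votes)
    total_trees = sum(trees_per_party)
    return 1 if 2 * total_positive > total_trees else 0
```

For example, with three clinics holding 10 trees each and positive-vote counts of 6, 7, and 5, `collaborative_predict([6, 7, 5], [10, 10, 10])` reconstructs only the total of 18 positive votes out of 30 and returns class 1.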
Instructor in the Department of Biostatistics & Bioinformatics
David Page works on algorithms for data mining and machine learning, and their applications to biomedical data, especially de-identified electronic health records and high-throughput genetic and other molecular data. Of particular interest are machine learning methods for complex multi-relational data (such as electronic health records or molecules as shown) and irregular temporal data, and methods that find causal relationships or produce human-interpretable output (such as the rules for molecu