Browsing by Author "Carroll, Robert J"
Item Open Access
An Atlas of Genetic Variation Linking Pathogen-Induced Cellular Traits to Human Disease. (Cell host & microbe, 2018-08)
Wang, Liuyang; Pittman, Kelly J; Barker, Jeffrey R; Salinas, Raul E; Stanaway, Ian B; Williams, Graham D; Carroll, Robert J; Balmat, Tom; Ingham, Andy; Gopalakrishnan, Anusha M; Gibbs, Kyle D; Antonia, Alejandro L; eMERGE Network; Heitman, Joseph; Lee, Soo Chan; Jarvik, Gail P; Denny, Joshua C; Horner, Stacy M; DeLong, Mark R; Valdivia, Raphael H; Crosslin, David R; Ko, Dennis C
Pathogens have been a strong driving force for natural selection. Therefore, understanding how human genetic differences impact infection-related cellular traits can mechanistically link genetic variation to disease susceptibility. Here we report the Hi-HOST Phenome Project (H2P2): a catalog of cellular genome-wide association studies (GWAS) comprising 79 infection-related phenotypes in response to 8 pathogens in 528 lymphoblastoid cell lines. Seventeen loci surpass genome-wide significance for infection-associated phenotypes ranging from pathogen replication to cytokine production. We combined H2P2 with clinical association data from patients to identify a SNP near CXCL10 as a risk factor for inflammatory bowel disease. A SNP in the transcriptional repressor ZBTB20 demonstrated pleiotropy, likely through suppression of multiple target genes, and was associated with viral hepatitis. These data are available on a web portal to facilitate interpreting human genome variation through the lens of cell biology and should serve as a rich resource for the research community.

Item Open Access
Applying active learning to high-throughput phenotyping algorithms for electronic health records data. (Journal of the American Medical Informatics Association : JAMIA, 2013-12)
Chen, Yukun; Carroll, Robert J; Hinz, Eugenia R McPeek; Shah, Anushi; Eyler, Anne E; Denny, Joshua C; Xu, Hua
Objectives
Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.
Methods
We integrated an uncertainty sampling AL approach with support vector machines-based phenotyping algorithms and evaluated its performance using three annotated disease cohorts including rheumatoid arthritis (RA), colorectal cancer (CRC), and venous thromboembolism (VTE). We investigated performance using two types of feature sets: unrefined features, which contained at least all clinical concepts extracted from notes and billing codes; and a smaller set of refined features selected by domain experts. The performance of the AL was compared with a passive learning (PL) approach based on random sampling.
Results
Our evaluation showed that AL outperformed PL on three phenotyping tasks. When unrefined features were used in the RA and CRC tasks, AL reduced the number of annotated samples required to achieve an area under the curve (AUC) score of 0.95 by 68% and 23%, respectively. AL also achieved a reduction of 68% for VTE with an optimal AUC of 0.70 using refined features. As expected, refined features improved the performance of phenotyping classifiers and required fewer annotated samples.
Conclusions
This study demonstrated that AL can be useful in ML-based phenotyping methods. Moreover, AL and feature engineering based on domain knowledge could be combined to develop efficient and generalizable phenotyping methods.
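
The contrast drawn in the Methods between uncertainty sampling (AL) and random sampling (PL) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a margin-based classifier such as a support vector machine has already produced a signed decision score for every sample, and that "most uncertain" means closest to the decision boundary (score nearest zero). The function names, the example scores, and the batch size are all hypothetical.

```python
import random
from typing import List, Sequence, Set


def select_uncertain(scores: Sequence[float], labeled: Set[int], k: int) -> List[int]:
    """Active learning step: pick the k unlabeled samples whose decision
    scores are closest to zero, i.e. nearest the assumed decision boundary."""
    candidates = [i for i in range(len(scores)) if i not in labeled]
    candidates.sort(key=lambda i: abs(scores[i]))
    return candidates[:k]


def select_random(n_samples: int, labeled: Set[int], k: int,
                  rng: random.Random) -> List[int]:
    """Passive learning baseline: pick k unlabeled samples uniformly at random."""
    candidates = [i for i in range(n_samples) if i not in labeled]
    return rng.sample(candidates, min(k, len(candidates)))


# Hypothetical signed decision scores for six samples; sample 0 is already annotated.
scores = [2.1, -0.05, 0.8, -1.7, 0.2, -0.4]
labeled = {0}

next_for_annotation = select_uncertain(scores, labeled, k=2)
baseline = select_random(len(scores), labeled, k=2, rng=random.Random(0))
print(next_for_annotation)  # indices of the two scores nearest zero
print(baseline)
```

In a full loop, the selected samples would be annotated, added to the labeled set, and the classifier retrained before the next selection round. The sketch covers only the selection step.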