Browsing by Subject "Support Vector Machine"
Item Open Access
A new open-access platform for measuring and sharing mTBI data.
(Scientific Reports, 2021-04) Domel, August G; Raymond, Samuel J; Giordano, Chiara; Liu, Yuzhe; Yousefsani, Seyed Abdolmajid; Fanton, Michael; Cecchi, Nicholas J; Vovk, Olga; Pirozzi, Ileana; Kight, Ali; Avery, Brett; Boumis, Athanasia; Fetters, Tyler; Jandu, Simran; Mehring, William M; Monga, Sam; Mouchawar, Nicole; Rangel, India; Rice, Eli; Roy, Pritha; Sami, Sohrab; Singh, Heer; Wu, Lyndia; Kuo, Calvin; Zeineh, Michael; Grant, Gerald; Camarillo, David B
Despite numerous research efforts, the precise mechanisms of concussion have yet to be fully uncovered. Clinical studies on high-risk populations, such as contact sports athletes, have become more common and give insight into the link between impact severity and brain injury risk through the use of wearable sensors and neurological testing. However, as the number of institutions operating these studies grows, there is a growing need for a platform to share these data to facilitate our understanding of concussion mechanisms and aid in the development of suitable diagnostic tools. To that end, this paper puts forth two contributions: (1) a centralized, open-access platform for storing and sharing head impact data, in collaboration with the Federal Interagency Traumatic Brain Injury Research informatics system (FITBIR), and (2) a deep learning impact detection algorithm (MiGNet) to differentiate between true head impacts and false positives for the previously biomechanically validated instrumented mouthguard sensor (MiG2.0), all of which easily interfaces with FITBIR. We report 96% accuracy using MiGNet, based on a neural network model, improving on previous work based on Support Vector Machines achieving 91% accuracy, on an out-of-sample dataset of high school and collegiate football head impacts.
The integrated MiG2.0 and FITBIR system serves as a collaborative research tool to be disseminated across multiple institutions towards creating a standardized dataset for furthering the knowledge of concussion biomechanics.

Item Open Access
A unifying framework for interpreting and predicting mutualistic systems.
(Nature Communications, 2019-01) Wu, Feilun; Lopatkin, Allison J; Needs, Daniel A; Lee, Charlotte T; Mukherjee, Sayan; You, Lingchong
Coarse-grained rules are widely used in chemistry, physics and engineering. In biology, however, such rules are less common and under-appreciated. This gap can be attributed to the difficulty in establishing general rules to encompass the immense diversity and complexity of biological systems. Furthermore, even when a rule is established, it is often challenging to map it to mechanistic details and to quantify these details. Here we report a framework that addresses these challenges for mutualistic systems. We first deduce a general rule that predicts the various outcomes of mutualistic systems, including coexistence and productivity. We further develop a standardized machine-learning-based calibration procedure to use the rule without the need to fully elucidate or characterize their mechanistic underpinnings. Our approach consistently provides explanatory and predictive power with various simulated and experimental mutualistic systems. Our strategy can pave the way for establishing and implementing other simple rules for biological systems.

Item Open Access
Applying active learning to high-throughput phenotyping algorithms for electronic health records data.
(Journal of the American Medical Informatics Association : JAMIA, 2013-12) Chen, Yukun; Carroll, Robert J; Hinz, Eugenia R McPeek; Shah, Anushi; Eyler, Anne E; Denny, Joshua C; Xu, Hua
Objectives
Generalizable, high-throughput phenotyping methods based on supervised machine learning (ML) algorithms could significantly accelerate the use of electronic health records data for clinical and translational research. However, they often require large numbers of annotated samples, which are costly and time-consuming to review. We investigated the use of active learning (AL) in ML-based phenotyping algorithms.
Methods
We integrated an uncertainty sampling AL approach with support vector machines-based phenotyping algorithms and evaluated its performance using three annotated disease cohorts including rheumatoid arthritis (RA), colorectal cancer (CRC), and venous thromboembolism (VTE). We investigated performance using two types of feature sets: unrefined features, which contained at least all clinical concepts extracted from notes and billing codes; and a smaller set of refined features selected by domain experts. The performance of the AL was compared with a passive learning (PL) approach based on random sampling.
Results
Our evaluation showed that AL outperformed PL on three phenotyping tasks. When unrefined features were used in the RA and CRC tasks, AL reduced the number of annotated samples required to achieve an area under the curve (AUC) score of 0.95 by 68% and 23%, respectively. AL also achieved a reduction of 68% for VTE with an optimal AUC of 0.70 using refined features. As expected, refined features improved the performance of phenotyping classifiers and required fewer annotated samples.
Conclusions
This study demonstrated that AL can be useful in ML-based phenotyping methods. Moreover, AL and feature engineering based on domain knowledge could be combined to develop efficient and generalizable phenotyping methods.

Item Open Access
Stability selection for regression-based models of transcription factor-DNA binding specificity.
(Bioinformatics, 2013-07-01) Mordelet, Fantine; Horton, John; Hartemink, Alexander J; Engelhardt, Barbara E; Gordân, Raluca
MOTIVATION: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret.
RESULTS: We propose novel regression-based models of TF-DNA binding specificity, trained using high resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity.
To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity.
AVAILABILITY: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.
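
The contrast drawn in the Mordelet et al. abstract above can be sketched in a few lines of Python. A position weight matrix scores a site as a sum of independent per-base terms, whereas the regression models also use position-specific di- and trinucleotide features. This is only an illustration, not the authors' code: the toy matrix values, the example site `CACG`, and the names `TOY_PWM`, `pwm_score` and `kmer_position_features` are all hypothetical.

```python
# Hypothetical 4-position log-odds PWM (one dict per position); values are made up.
TOY_PWM = [
    {"A": -1.0, "C": 1.2, "G": -0.5, "T": -1.5},
    {"A": 1.5, "C": -1.0, "G": -1.2, "T": -0.8},
    {"A": -1.3, "C": 1.4, "G": -0.9, "T": -1.1},
    {"A": -1.0, "C": -1.2, "G": 1.6, "T": -0.7},
]


def pwm_score(site, pwm):
    """Additive PWM score: each base contributes independently of its neighbors."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(column[base] for column, base in zip(pwm, site))


def kmer_position_features(seq, max_k=3):
    """Binary 'k-mer at position i' features for k = 1..max_k.

    The 1-mer features correspond to the independent PWM-style terms; the
    2-mer and 3-mer features let a regression model capture dependencies
    between neighboring bases.
    """
    features = {}
    for k in range(1, max_k + 1):
        for i in range(len(seq) - k + 1):
            features[f"{k}mer:{i}:{seq[i:i + k]}"] = 1
    return features


site = "CACG"  # hypothetical example site
score = pwm_score(site, TOY_PWM)
features = kmer_position_features(site)
print("PWM score:", round(score, 2))
print("Feature names:", sorted(features))
```

In a setup like the one the abstract describes, feature vectors of this kind would be the inputs to a regression model trained on PBM measurements, with feature selection keeping only a small subset. That training step is omitted here.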