Browsing by Author "Lucas, Joseph E"
Item Open Access A flexible statistical model for alignment of label-free proteomics data--incorporating ion mobility and product ion information. (BMC Bioinformatics, 2013-12-16) Benjamin, Ashlee M; Thompson, J Will; Soderblom, Erik J; Geromanos, Scott J; Henao, Ricardo; Kraus, Virginia B; Moseley, M Arthur; Lucas, Joseph E
BACKGROUND: The goal of many proteomics experiments is to determine the abundance of proteins in biological samples, and the variation thereof in various physiological conditions. High-throughput quantitative proteomics, specifically label-free LC-MS/MS, allows rapid measurement of thousands of proteins, enabling large-scale studies of various biological systems. Prior to analyzing these information-rich datasets, raw data must undergo several computational processing steps. We present a method to address one of the essential steps in proteomics data processing--the matching of peptide measurements across samples. RESULTS: We describe a novel method for label-free proteomics data alignment with the ability to incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion information. We compare the results of our alignment method to PEPPeR and OpenMS, and compare alignment accuracy achieved by different versions of our method utilizing various data characteristics. Our method results in increased match recall rates and similar or improved mismatch rates compared to PEPPeR and OpenMS feature-based alignment. We also show that the inclusion of drift time and product ion information results in higher recall rates and more confident matches, without increases in error rates. CONCLUSIONS: Based on the results presented here, we argue that the incorporation of ion mobility drift time and product ion information are worthy pursuits.
Alignment methods should be flexible enough to utilize all available data, particularly with recent advancements in experimental separation methods.
Item Open Access A host transcriptional signature for presymptomatic detection of infection in humans exposed to influenza H1N1 or H3N2. (PLoS One, 2013) Woods, Christopher W; McClain, Micah T; Chen, Minhua; Zaas, Aimee K; Nicholson, Bradly P; Varkey, Jay; Veldman, Timothy; Kingsmore, Stephen F; Huang, Yongsheng; Lambkin-Williams, Robert; Gilbert, Anthony G; Hero, Alfred O; Ramsburg, Elizabeth; Glickman, Seth; Lucas, Joseph E; Carin, Lawrence; Ginsburg, Geoffrey S
There is great potential for host-based gene expression analysis to impact the early diagnosis of infectious diseases. In particular, the influenza pandemic of 2009 highlighted the challenges and limitations of traditional pathogen-based testing for suspected upper respiratory viral infection. We inoculated human volunteers with either influenza A (A/Brisbane/59/2007 (H1N1) or A/Wisconsin/67/2005 (H3N2)), and assayed the peripheral blood transcriptome every 8 hours for 7 days. Of 41 inoculated volunteers, 18 (44%) developed symptomatic infection. Using unbiased sparse latent factor regression analysis, we generated a gene signature (or factor) for symptomatic influenza capable of detecting 94% of infected cases. This gene signature is detectable as early as 29 hours post-exposure and achieves maximal accuracy on average 43 hours (p = 0.003, H1N1) and 38 hours (p-value = 0.005, H3N2) before peak clinical symptoms. In order to test the relevance of these findings in naturally acquired disease, a composite influenza A signature built from these challenge studies was applied to Emergency Department patients where it discriminates between swine-origin influenza A/H1N1 (2009) infected and non-infected individuals with 92% accuracy.
The host genomic response to Influenza infection is robust and may provide the means for detection before typical clinical symptoms are apparent.
Item Open Access A network of substrates of the E3 ubiquitin ligases MDM2 and HUWE1 control apoptosis independently of p53. (Sci Signal, 2013-05-07) Kurokawa, Manabu; Kim, Jiyeon; Geradts, Joseph; Matsuura, Kenkyo; Liu, Liu; Ran, Xu; Xia, Wenle; Ribar, Thomas J; Henao, Ricardo; Dewhirst, Mark W; Kim, Wun-Jae; Lucas, Joseph E; Wang, Shaomeng; Spector, Neil L; Kornbluth, Sally
In the intrinsic pathway of apoptosis, cell-damaging signals promote the release of cytochrome c from mitochondria, triggering activation of the Apaf-1 and caspase-9 apoptosome. The ubiquitin E3 ligase MDM2 decreases the stability of the proapoptotic factor p53. We show that it also coordinated apoptotic events in a p53-independent manner by ubiquitylating the apoptosome activator CAS and the ubiquitin E3 ligase HUWE1. HUWE1 ubiquitylates the antiapoptotic factor Mcl-1, and we found that HUWE1 also ubiquitylated PP5 (protein phosphatase 5), which indirectly inhibited apoptosome activation. Breast cancers that are positive for the tyrosine receptor kinase HER2 (human epidermal growth factor receptor 2) tend to be highly aggressive. In HER2-positive breast cancer cells treated with the HER2 tyrosine kinase inhibitor lapatinib, MDM2 was degraded and HUWE1 was stabilized. In contrast, in breast cancer cells that acquired resistance to lapatinib, the abundance of MDM2 was not decreased and HUWE1 was degraded, which inhibited apoptosis, regardless of p53 status. MDM2 inhibition overcame lapatinib resistance in cells with either wild-type or mutant p53 and in xenograft models.
These findings demonstrate broader, p53-independent roles for MDM2 and HUWE1 in apoptosis and specifically suggest the potential for therapy directed against MDM2 to overcome lapatinib resistance.
Item Open Access Bayesian Gaussian Copula Factor Models for Mixed Data. (J Am Stat Assoc, 2013-06-01) Murray, Jared S; Dunson, David B; Carin, Lawrence; Lucas, Joseph E
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science.
The models in this paper are implemented in the R package bfa.
Item Open Access Cannabinoid exposure and altered DNA methylation in rat and human sperm. (Epigenetics, 2018-01) Murphy, Susan K; Itchon-Ramos, Nilda; Visco, Zachary; Huang, Zhiqing; Grenier, Carole; Schrott, Rose; Acharya, Kelly; Boudreau, Marie-Helene; Price, Thomas M; Raburn, Douglas J; Corcoran, David L; Lucas, Joseph E; Mitchell, John T; McClernon, F Joseph; Cauley, Marty; Hall, Brandon J; Levin, Edward D; Kollins, Scott H
Little is known about the reproductive effects of paternal cannabis exposure. We evaluated associations between cannabis or tetrahydrocannabinol (THC) exposure and altered DNA methylation in sperm from humans and rats, respectively. DNA methylation, measured by reduced representation bisulfite sequencing, differed in the sperm of human users from non-users by at least 10% at 3,979 CpG sites. Pathway analyses indicated Hippo Signaling and Pathways in Cancer as enriched with altered genes (Bonferroni p < 0.02). These same two pathways were also enriched with genes having altered methylation in sperm from THC-exposed versus vehicle-exposed rats (p < 0.01). Data validity is supported by significant correlations between THC exposure levels in humans and methylation for 177 genes, and substantial overlap in THC target genes in rat sperm (this study) and genes previously reported as having altered methylation in the brain of rat offspring born to parents both exposed to THC during adolescence. In humans, cannabis use was also associated with significantly lower sperm concentration.
Findings point to possible pre-conception paternal reproductive risks associated with cannabis use.
Item Open Access Computational Processing of Omics Data: Implications for Analysis (2013) Benjamin, Ashlee Marie
In this work, I present four studies across the range of 'omics data types - a Genome-Wide Association Study for gene-by-sex interaction of obesity traits, computational models for transcription start site classification, an assessment of reference-based mapping methods for RNA-Seq data from non-model organisms, and a statistical model for open-platform proteomics data alignment.
Obesity is an increasingly prevalent and severe health concern with a substantial heritable component, and marked sex differences. We sought to determine if the effect of genetic variants also differed by sex by performing a genome-wide association study modeling the effect of genotype-by-sex interaction on obesity phenotypes. Genotype data from individuals in the Framingham Heart Study Offspring cohort were analyzed across five exams. Although no variants showed genome-wide significant gene-by-sex interaction in any individual exam, four polymorphisms displayed a consistent BMI association (P-values .00186 to .00010) across all five exams. These variants were clustered downstream of LYPLAL1, which encodes a lipase/esterase expressed in adipose tissue, a locus previously identified as having sex-specific effects on central obesity. Primary effects in males were in the opposite direction as females and were replicated in Framingham Generation 3. Our data support a sex-influenced association between genetic variation at the LYPLAL1 locus and obesity-related traits.
The application of deep sequencing to map 5' capped transcripts has confirmed the existence of at least two distinct promoter classes in metazoans: focused promoters with transcription start sites (TSSs) that occur in a narrowly defined genomic span and dispersed promoters with TSSs that are spread over a larger window. Previous studies have explored the presence of genomic features, such as CpG islands and sequence motifs, in these promoter classes, and our collaborators recently investigated the relationship with chromatin features. It was found that promoter classes are significantly differentiated by nucleosome organization and chromatin structure. Here, we present computational models supporting the stronger contribution of chromatin features to the definition of dispersed promoters compared to focused start sites. Specifically, dispersed promoters display enrichment for well-positioned nucleosomes downstream of the TSS and a more clearly defined nucleosome free region upstream, while focused promoters have a less organized nucleosome structure, yet higher presence of RNA polymerase II. These differences extend to histone variants (H2A.Z) and marks (H3K4 methylation), as well as insulator binding (such as CTCF), independent of the expression levels of affected genes.
The application of next-generation sequencing technology to gene expression quantification analysis, namely, RNA-Sequencing, has transformed the way in which gene expression studies are conducted and analyzed. These advances are of particular interest to researchers studying non-model organisms, as the need for knowledge of sequence information is overcome. De novo assembly methods have gained widespread acceptance in the RNA-Seq community for non-model organisms with no true reference genome or transcriptome. While such methods have tremendous utility, computational complexity is still a significant challenge for organisms with large and complex genomes. Here we present a comparison of four reference-based mapping methods for non-human primate data. We explore mapping efficacy, correlation between computed expression values, and utility for differential expression analyses. We show that reference-based mapping methods indeed have utility in RNA-Seq analysis of mammalian data with no true reference, and that the details of mapping methods should be carefully considered when doing so. We find that shorter seed sequences, allowance of mismatches, and allowance of gapped alignments, in addition to splice junction gaps, result in more sensitive alignments of non-human primate RNA-Seq data.
Open-platform proteomics experiments seek to quantify and identify the proteins present in biological samples. Much like differential gene expression analyses, it is often of interest to determine how protein abundance differs in various physiological conditions. Label-free LC-MS/MS enables the rapid measurement of thousands of proteins, providing a wealth of peptide intensity information for differential analysis. However, the processing of raw proteomics data poses significant challenges that must be overcome prior to analysis. We specifically address the matching of peptide measurements across samples - an essential pre-processing step in every proteomics experiment. Presented here is a novel method for open-platform proteomics data alignment with the ability to incorporate previously unused aspects of the data, particularly ion mobility drift times and product ion data. Our results suggest that the inclusion of additional data results in higher numbers of more confident matches, without increasing the number of mismatches. We also show that the incorporation of product ion data can improve results dramatically. Based on these results, we argue that the incorporation of ion mobility drift times and product ion information are worthy pursuits. In addition, alignment methods should be flexible enough to utilize all available data, particularly with recent advancements in experimental separation methods. The addition of drift times and/or high energy to alignment methods and accurate mass and time (AMT) tag databases can greatly improve experimenters' ability to identify measured peptides, reducing analysis costs and potentially the need to run additional experiments.
Item Open Access Effect of genetic testing for risk of type 2 diabetes mellitus on health behaviors and outcomes: study rationale, development and design. (BMC Health Serv Res, 2012-01-18) Cho, Alex H; Killeya-Jones, Ley A; O'Daniel, Julianne M; Kawamoto, Kensaku; Gallagher, Patrick; Haga, Susanne; Lucas, Joseph E; Trujillo, Gloria M; Joy, Scott V; Ginsburg, Geoffrey S
BACKGROUND: Type 2 diabetes is a prevalent chronic condition globally that results in extensive morbidity, decreased quality of life, and increased health services utilization. Lifestyle changes can prevent the development of diabetes, but require patient engagement. Genetic risk testing might represent a new tool to increase patients' motivation for lifestyle changes. Here we describe the rationale, development, and design of a randomized controlled trial (RCT) assessing the clinical and personal utility of incorporating type 2 diabetes genetic risk testing into comprehensive diabetes risk assessments performed in a primary care setting. METHODS/DESIGN: Patients are recruited in the laboratory waiting areas of two primary care clinics and enrolled into one of three study arms. Those interested in genetic risk testing are randomized to receive either a standard risk assessment (SRA) for type 2 diabetes incorporating conventional risk factors plus upfront disclosure of the results of genetic risk testing ("SRA+G" arm), or the SRA alone ("SRA" arm). Participants not interested in genetic risk testing will not receive the test, but will receive SRA (forming a third, "no-test" arm). Risk counseling is provided by clinic staff (not study staff external to the clinic). Fasting plasma glucose, insulin levels, body mass index (BMI), and waist circumference are measured at baseline and 12 months, as are patients' self-reported behavioral and emotional responses to diabetes risk information.
Primary outcomes are changes in insulin resistance and BMI after 12 months; secondary outcomes include changes in diet patterns, physical activity, waist circumference, and perceived risk of developing diabetes. DISCUSSION: The utility, feasibility, and efficacy of providing patients with genetic risk information for common chronic diseases in primary care remain unknown. The study described here will help to establish whether providing type 2 diabetes genetic risk information in a primary care setting can help improve patients' clinical outcomes, risk perceptions, and/or their engagement in healthy behavior change. In addition, study design features such as the use of existing clinic personnel for risk counseling could inform the future development and implementation of care models for the use of individual genetic risk information in primary care. TRIAL REGISTRATION: ClinicalTrials.gov: NCT00849563.
Item Open Access Factor Models to Describe Linear and Non-linear Structure in High Dimensional Gene Expression Data (2011) Mayrink, Vinicius Diniz
An important problem in the analysis of gene expression data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. For some microarray platforms there are many, relatively short, probes for each gene of interest. In this case, it is possible that a given probe is not measuring its targeted transcript, but rather a different gene with a similar region (called cross-hybridization). Similarly, the incorrect mapping of short nucleotide sequences to a target gene is a common issue related to the young technology producing RNA-Seq data. The expression pattern across samples is a valuable source of information, which can be used to address distinct problems through the application of factor models.
Our first study is focused on the identification of the presence/absence status of a gene in a sample. We compare our factor model to state-of-the-art detection methods; the results suggest superior performance of the factor analysis for detecting transcripts. In the second study, we apply factor models to investigate gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well known and important feature of most cancers. Copy number alteration is detected for a group of genes in breast cancer; our goal is to examine this abnormality in the same chromosomal region for other types of tumors (Ovarian, Lung and Brain). In the third application, the expression pattern related to RNA-Seq count data is evaluated through a factor model based on the Poisson distribution. Here, the presence/absence of coherent patterns is closely associated with the number of incorrect read mappings. The final study of this dissertation is dedicated to the analysis of multi-factor models with linear and non-linear structure of interactions between latent factors. The interaction terms can have important implications in the model; they represent relationships between genes which cannot be captured in an ordinary analysis.
Item Open Access Gene Expression Profiles Link Respiratory Viral Infection, Platelet Response to Aspirin, and Acute Myocardial Infarction. (PLoS One, 2015) Rose, Jason J; Voora, Deepak; Cyr, Derek D; Lucas, Joseph E; Zaas, Aimee K; Woods, Christopher W; Newby, L Kristin; Kraus, William E; Ginsburg, Geoffrey S
BACKGROUND: Influenza infection is associated with myocardial infarction (MI), suggesting that respiratory viral infection may induce biologic pathways that contribute to MI. We tested the hypotheses that 1) a validated blood gene expression signature of respiratory viral infection (viral GES) was associated with MI and 2) respiratory viral exposure changes levels of a validated platelet gene expression signature (platelet GES) of platelet function in response to aspirin that is associated with MI. METHODS: A previously defined viral GES was projected into blood RNA data from 594 patients undergoing elective cardiac catheterization and used to classify patients as having evidence of viral infection or not and tested for association with acute MI using logistic regression. A previously defined platelet GES was projected into blood RNA data from 81 healthy subjects before and after exposure to four respiratory viruses: Respiratory Syncytial Virus (RSV) (n=20), Human Rhinovirus (HRV) (n=20), Influenza A virus subtype H1N1 (H1N1) (n=24), Influenza A Virus subtype H3N2 (H3N2) (n=17). We tested for the change in platelet GES with viral exposure using linear mixed-effects regression and by symptom status. RESULTS: In the catheterization cohort, 32 patients had evidence of viral infection based upon the viral GES, of which 25% (8/32) had MI versus 12.2% (69/567) among those without evidence of viral infection (OR 2.3; CI [1.03-5.5], p=0.04). In the infection cohorts, only H1N1 exposure increased platelet GES over time (time course p-value = 1e-04).
CONCLUSIONS: A viral GES of non-specific, respiratory viral infection was associated with acute MI; 18% of the top 49 genes in the viral GES are involved with hemostasis and/or platelet aggregation. Separately, H1N1 exposure, but not exposure to other respiratory viruses, increased a platelet GES previously shown to be associated with MI. Together, these results highlight specific genes and pathways that link viral infection, platelet activation, and MI especially in the case of H1N1 influenza infection.
Item Open Access Latent factor analysis to discover pathway-associated putative segmental aneuploidies in human cancers. (PLoS Comput Biol, 2010-09-02) Lucas, Joseph E; Kung, Hsiu-Ni; Chi, Jen-Tsan A
Tumor microenvironmental stresses, such as hypoxia and lactic acidosis, play important roles in tumor progression. Although gene signatures reflecting the influence of these stresses are powerful approaches to link expression with phenotypes, they do not fully reflect the complexity of human cancers. Here, we describe the use of latent factor models to further dissect the stress gene signatures in a breast cancer expression dataset. The genes in these latent factors are coordinately expressed in tumors and depict distinct, interacting components of the biological processes. The genes in several latent factors are highly enriched in chromosomal locations. When these factors are analyzed in independent datasets with gene expression and array CGH data, the expression values of these factors are highly correlated with copy number alterations (CNAs) of the corresponding BAC clones in both the cell lines and tumors. Therefore, variation in the expression of these pathway-associated factors is at least partially caused by variation in gene dosage and CNAs among breast cancers.
We have also found the expression of two latent factors without any chromosomal enrichment is highly associated with 12q CNA, likely an instance of "trans"-variations in which CNA leads to the variations in gene expression outside of the CNA region. In addition, we have found that factor 26 (1q CNA) is negatively correlated with HIF-1alpha protein and hypoxia pathways in breast tumors and cell lines. This agrees with, and for the first time links, known good prognosis associated with both a low hypoxia signature and the presence of CNA in this region. Taken together, these results suggest the possibility that tumor segmental aneuploidy makes significant contributions to variation in the lactic acidosis/hypoxia gene signatures in human cancers and demonstrate that latent factor analysis is a powerful means to uncover such a linkage.
Item Open Access Nasopharyngeal Protein Biomarkers of Acute Respiratory Virus Infection. (EBioMedicine, 2017-03) Burke, Thomas W; Henao, Ricardo; Soderblom, Erik; Tsalik, Ephraim L; Thompson, J Will; McClain, Micah T; Nichols, Marshall; Nicholson, Bradly P; Veldman, Timothy; Lucas, Joseph E; Moseley, M Arthur; Turner, Ronald B; Lambkin-Williams, Robert; Hero, Alfred O; Woods, Christopher W; Ginsburg, Geoffrey S
Infection of respiratory mucosa with viral pathogens triggers complex immunologic events in the affected host. We sought to characterize this response through proteomic analysis of nasopharyngeal lavage in human subjects experimentally challenged with influenza A/H3N2 or human rhinovirus, and to develop targeted assays measuring peptides involved in this host response allowing classification of acute respiratory virus infection. Unbiased proteomic discovery analysis identified 3285 peptides corresponding to 438 unique proteins, and revealed that infection with H3N2 induces significant alterations in protein expression. These include proteins involved in acute inflammatory response, innate immune response, and the complement cascade.
These data provide insights into the nature of the biological response to viral infection of the upper respiratory tract, and the proteins that are dysregulated by viral infection form the basis of a signature that accurately classifies the infected state. Verification of this signature using targeted mass spectrometry in independent cohorts of subjects challenged with influenza or rhinovirus demonstrates that it performs with high accuracy (0.8623 AUROC, 75% TPR, 97.46% TNR). With further development as a clinical diagnostic, this signature may have utility in rapid screening for emerging infections, avoidance of inappropriate antibacterial therapy, and more rapid implementation of appropriate therapeutic and public health strategies.