Browsing by Subject "Genomics"
Item Open Access A Cloud-Based Infrastructure for Cancer Genomics (2020) Panea, Razvan Ioan
The advent of new genomic approaches, particularly next generation sequencing (NGS), has resulted in explosive growth of biological data. As the size of biological data keeps growing at exponential rates, new methods for data management and data processing are becoming essential in bioinformatics and computational biology. Indeed, data analysis has now become the central challenge in genomics.
NGS has provided rich tools for defining genomic alterations that cause cancer. The processing time and computing requirements have now become a serious bottleneck to the characterization and analysis of these genomic alterations. Moreover, as the adoption of NGS continues to increase, the computing power required often exceeds what any single institution can provide, leading to major restraints in the type and number of analyses that can be performed.
Cloud computing represents a potential solution to this problem. On a cloud platform, computing resources can be available on-demand, thus allowing users to implement scalable and highly parallel methods. However, few centralized frameworks exist to allow the average researcher the ability to apply bioinformatics workflows using cloud resources. Moreover, bioinformatics approaches are associated with multiple processing challenges, such as the variability in the methods or data used and the reproducibility requirements of the research analysis.
Here, we present CloudConductor, a software system that is specifically designed to harness the power of cloud computing to perform complex analysis pipelines on large biological datasets. CloudConductor was designed with five central features in mind: scalability, modularity, parallelism, reproducibility and platform agnosticism.
We demonstrate the processing power afforded by CloudConductor on a real-world genomics problem. Using CloudConductor, we processed and analyzed 101 whole genome tumor-normal paired samples from Burkitt lymphoma subtypes to identify novel genomic alterations. We identified a total of 72 driver genes associated with the disease. Somatic events were identified in both coding and non-coding regions of nearly all driver genes, notably in genes IGLL5, BACH2, SIN3A, and DNMT1. We have developed the analysis framework by implementing a graphical user interface, a back-end database system, a data loader and a workflow management system.
In this thesis, we develop the concepts and describe an implementation of automated cloud-based infrastructure to analyze genomics data, creating a fast and efficient analysis resource for genomics researchers.
Item Open Access A Genocentric Approach to Discovery of Mendelian Disorders. (American journal of human genetics, 2019-11) Hansen, Adam W; Murugan, Mullai; Li, He; Khayat, Michael M; Wang, Liwen; Rosenfeld, Jill; Andrews, B Kim; Jhangiani, Shalini N; Coban Akdemir, Zeynep H; Sedlazeck, Fritz J; Ashley-Koch, Allison E; Liu, Pengfei; Muzny, Donna M; Task Force for Neonatal Genomics; Davis, Erica E; Katsanis, Nicholas; Sabo, Aniko; Posey, Jennifer E; Yang, Yaping; Wangler, Michael F; Eng, Christine M; Sutton, V Reid; Lupski, James R; Boerwinkle, Eric; Gibbs, Richard A
The advent of inexpensive, clinical exome sequencing (ES) has led to the accumulation of genetic data from thousands of samples from individuals affected with a wide range of diseases, but for whom the underlying genetic and molecular etiology of their clinical phenotype remains unknown. In many cases, detailed phenotypes are unavailable or poorly recorded and there is little family history to guide study. To accelerate discovery, we integrated ES data from 18,696 individuals referred for suspected Mendelian disease, together with relatives, in an Apache Hadoop data lake (Hadoop Architecture Lake of Exomes [HARLEE]) and implemented a genocentric analysis that rapidly identified 154 genes harboring variants suspected to cause Mendelian disorders. The approach did not rely on case-specific phenotypic classifications but was driven by optimization of gene- and variant-level filter parameters utilizing historical Mendelian disease-gene association discovery data.
Variants in 19 of the 154 candidate genes were subsequently reported as causative of a Mendelian trait and additional data support the association of all other candidate genes with disease endpoints.
Item Open Access A high-resolution map of human evolutionary constraint using 29 mammals. (Nature, 2011-10-12) Lindblad-Toh, Kerstin; Garber, Manuel; Zuk, Or; Lin, Michael F; Parker, Brian J; Washietl, Stefan; Kheradpour, Pouya; Ernst, Jason; Jordan, Gregory; Mauceli, Evan; Ward, Lucas D; Lowe, Craig B; Holloway, Alisha K; Clamp, Michele; Gnerre, Sante; Alföldi, Jessica; Beal, Kathryn; Chang, Jean; Clawson, Hiram; Cuff, James; Di Palma, Federica; Fitzgerald, Stephen; Flicek, Paul; Guttman, Mitchell; Hubisz, Melissa J; Jaffe, David B; Jungreis, Irwin; Kent, W James; Kostka, Dennis; Lara, Marcia; Martins, Andre L; Massingham, Tim; Moltke, Ida; Raney, Brian J; Rasmussen, Matthew D; Robinson, Jim; Stark, Alexander; Vilella, Albert J; Wen, Jiayu; Xie, Xiaohui; Zody, Michael C; Broad Institute Sequencing Platform and Whole Genome Assembly Team; Baldwin, Jen; Bloom, Toby; Chin, Chee Whye; Heiman, Dave; Nicol, Robert; Nusbaum, Chad; Young, Sarah; Wilkinson, Jane; Worley, Kim C; Kovar, Christie L; Muzny, Donna M; Gibbs, Richard A; Baylor College of Medicine Human Genome Sequencing Center Sequencing Team; Cree, Andrew; Dihn, Huyen H; Fowler, Gerald; Jhangiani, Shalili; Joshi, Vandita; Lee, Sandra; Lewis, Lora R; Nazareth, Lynne V; Okwuonu, Geoffrey; Santibanez, Jireh; Warren, Wesley C; Mardis, Elaine R; Weinstock, George M; Wilson, Richard K; Genome Institute at Washington University; Delehaunty, Kim; Dooling, David; Fronik, Catrina; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Minx, Patrick; Sodergren, Erica; Birney, Ewan; Margulies, Elliott H; Herrero, Javier; Green, Eric D; Haussler, David; Siepel, Adam; Goldman, Nick; Pollard, Katherine S; Pedersen, Jakob S; Lander, Eric S; Kellis, Manolis
The comparison of related genomes has emerged as a powerful lens for genome
interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.
Item Open Access A Hyb-Seq phylogeny of Boechera and related genera using a combination of Angiosperms353 and Brassicaceae-specific bait sets. (American journal of botany, 2023-10) Hay, Nikolai M; Windham, Michael D; Mandáková, Terezie; Lysak, Martin A; Hendriks, Kasper P; Mummenhoff, Klaus; Lens, Frederic; Pryer, Kathleen M; Bailey, C Donovan
Premise
Although Boechera (Boechereae, Brassicaceae) has become a plant model system for both ecological genomics and evolutionary biology, all previous phylogenetic studies have had limited success in resolving species relationships within the genus. The recent effective application of sequence data from target enrichment approaches to resolve the evolutionary relationships of several other challenging plant groups prompted us to investigate their usefulness in Boechera and Boechereae.
Methods
To resolve the phylogeny of Boechera and closely related genera, we utilized the HybPiper pipeline to analyze two combined bait sets: Angiosperms353, with broad applicability across flowering plants; and a Brassicaceae-specific bait set designed for use in the mustard family. Relationships for 101 samples representing 81 currently recognized species were inferred from a total of 1114 low-copy nuclear genes using both supermatrix and species coalescence methods.
Results
Our analyses resulted in a well-resolved and highly supported phylogeny of the tribe Boechereae. Boechereae is divided into two major clades, one comprising all western North American species of Boechera, the other encompassing the eight other genera of the tribe. Our understanding of relationships within Boechera is enhanced by the recognition of three core clades that are further subdivided into robust regional species complexes.
Conclusions
This study presents the first broadly sampled, well-resolved phylogeny for most known sexual diploid Boechera. This effort provides the foundation for a new phylogenetically informed taxonomy of Boechera that is crucial for its continued use as a model system.
Item Open Access A Roadmap for the Human Oral and Craniofacial Cell Atlas. (Journal of dental research, 2022-10) Caetano, AJ; Human Cell Atlas Oral and Craniofacial Bionetwork; Sequeira, I; Byrd, KM
Oral and craniofacial tissues are uniquely adapted for continuous and intricate functioning, including breathing, feeding, and communication. To achieve these vital processes, this complex is supported by incredible tissue diversity, variously composed of epithelia, vessels, cartilage, bone, teeth, ligaments, and muscles, as well as mesenchymal, adipose, and peripheral nervous tissue. Recent single cell and spatial multiomics assays (specifically, genomics, epigenomics, transcriptomics, proteomics, and metabolomics) have annotated known and new cell types and cell states in human tissues and animal models, but these concepts remain limitedly explored in the human postnatal oral and craniofacial complex. Here, we highlight the collaborative and coordinated efforts of the newly established Oral and Craniofacial Bionetwork as part of the Human Cell Atlas, which aims to leverage single cell and spatial multiomics approaches to first understand the cellular and molecular makeup of human oral and craniofacial tissues in health and to then address common and rare diseases. These powerful assays have already revealed the cell types that support oral tissues, and they will unravel cell types and molecular networks utilized across development, maintenance, and aging as well as those affected in diseases of the craniofacial complex.
This level of integration and cell annotation with partner laboratories across the globe will be critical for understanding how multiple variables, such as age, sex, race, and ancestry, influence these oral and craniofacial niches. Here, we 1) highlight these recent collaborative efforts to employ new single cell and spatial approaches to resolve our collective biology at a higher resolution in health and disease, 2) discuss the vision behind the Oral and Craniofacial Bionetwork, 3) outline the stakeholders who contribute to and will benefit from this network, and 4) outline directions for creating the first Human Oral and Craniofacial Cell Atlas.
Item Open Access Adaptive sequence divergence forged new neurodevelopmental enhancers in humans. (Cell, 2022-11) Mangan, Riley J; Alsina, Fernando C; Mosti, Federica; Sotelo-Fonseca, Jesús Emiliano; Snellings, Daniel A; Au, Eric H; Carvalho, Juliana; Sathyan, Laya; Johnson, Graham D; Reddy, Timothy E; Silver, Debra L; Lowe, Craig B
Searches for the genetic underpinnings of uniquely human traits have focused on human-specific divergence in conserved genomic regions, which reflects adaptive modifications of existing functional elements. However, the study of conserved regions excludes functional elements that descended from previously neutral regions. Here, we demonstrate that the fastest-evolved regions of the human genome, which we term "human ancestor quickly evolved regions" (HAQERs), rapidly diverged in an episodic burst of directional positive selection prior to the human-Neanderthal split, before transitioning to constraint within hominins. HAQERs are enriched for bivalent chromatin states, particularly in gastrointestinal and neurodevelopmental tissues, and genetic variants linked to neurodevelopmental disease. We developed a multiplex, single-cell in vivo enhancer assay to discover that rapid sequence divergence in HAQERs generated hominin-unique enhancers in the developing cerebral cortex.
We propose that a lack of pleiotropic constraints and elevated mutation rates poised HAQERs for rapid adaptation and subsequent susceptibility to disease.
Item Open Access An evolutionary genomics approach towards understanding Plasmodium vivax in central Africa (2022) Gartner, Valerie
Increased attention has recently been placed on understanding the natural variation of the malaria parasite Plasmodium vivax across the globe, as in 2020 alone, P. vivax caused an estimated 4.5 million malaria cases and led to over 600,000 deaths around the world. P. vivax infections in central Africa have been of particular interest, as humans in Sub-Saharan Africa frequently possess a P. vivax resistance allele known as the Duffy-negative phenotype that is believed to prevent infection in these individuals. However, new reports of asymptomatic and symptomatic infections in Duffy-negative individuals in Africa raise the possibility that P. vivax is evolving to evade host resistance.
Whole genome sequencing has become more common as a means of understanding the population diversity of P. vivax. However, there is still a scarcity of information about P. vivax in central Africa. In this dissertation, I analyze whole genome sequencing data from a new P. vivax sample collected from the Democratic Republic of the Congo in central Africa. By studying P. vivax from central Africa, we can begin to understand the evolutionary history of the pathogen in this part of the world as it relates to the global context of this pathogen. I also investigate the relationship of P. vivax in the DRC with a potential animal reservoir of a closely related species, P. vivax-like, in non-human primates in this region. Due to the scarcity of P. vivax samples in central Africa, I also investigated methods with which to best make use of whole genome sequencing data, particularly in generating phylogenetic trees. While many studies of P. vivax genetic diversity employ whole genome variation data in order to study evolutionary relationships of P. vivax populations, in this dissertation I make use of the P. vivax apicoplast, a non-photosynthetic plastid organelle genome. The apicoplast genome is five times longer than the mitochondrial genome and does not undergo recombination, making it a valuable locus for studying P. vivax evolutionary history using phylogenetic trees.
Item Open Access An Exploration into Fern Genome Space. (Genome Biol Evol, 2015-08-26) Wolf, PG; Sessa, EB; Marchant, DB; Li, F; Rothfels, CJ; Sigel, EM; Gitzendanner, MA; Visger, CJ; Banks, JA; Soltis, DE
Ferns are one of the few remaining major clades of land plants for which a complete genome sequence is lacking. Knowledge of genome space in ferns will enable broad-scale comparative analyses of land plant genes and genomes, provide insights into genome evolution across green plants, and shed light on genetic and genomic features that characterize ferns, such as their high chromosome numbers and large genome sizes. As part of an initial exploration into fern genome space, we used a whole genome shotgun sequencing approach to obtain low-density coverage (∼0.4X to 2X) for six fern species from the Polypodiales (Ceratopteris, Pteridium, Polypodium, Cystopteris), Cyatheales (Plagiogyria), and Gleicheniales (Dipteris). We explore these data to characterize the proportion of the nuclear genome represented by repetitive sequences (including DNA transposons, retrotransposons, ribosomal DNA, and simple repeats) and protein-coding genes, and to extract chloroplast and mitochondrial genome sequences. Such initial sweeps of fern genomes can provide information useful for selecting a promising candidate fern species for whole genome sequencing.
We also describe variation of genomic traits across our sample and highlight some differences and similarities in repeat structure between ferns and seed plants.
Item Open Access Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. (Nature, 2002-12-05) Okazaki, Y; Furuno, M; Kasukawa, T; Adachi, J; Bono, H; Kondo, S; Nikaido, I; Osato, N; Osato, N; Saito, R; Suzuki, H; Yamanaka, I; Kiyosawa, H; Yagi, K; Tomaru, Y; Hasegawa, Y; Nogami, A; Schönbach, C; Gojobori, T; Baldarelli, R; Hill, DP; Bult, C; Hume, DA; Hume, DA; Quackenbush, J; Schriml, LM; Kanapin, A; Matsuda, H; Batalov, S; Beisel, KW; Blake, JA; Bradt, D; Brusic, V; Chothia, C; Corbani, LE; Cousins, S; Dalla, E; Dragani, TA; Fletcher, CF; Forrest, A; Frazer, KS; Gaasterland, T; Gariboldi, M; Gissi, C; Godzik, A; Gough, J; Grimmond, S; Gustincich, S; Hirokawa, N; Jackson, IJ; Jarvis, ED; Kanai, A; Kawaji, H; Kawasawa, Y; Kedzierski, RM; King, BL; Konagaya, A; Kurochkin, IV; Lee, Y; Lenhard, B; Lyons, PA; Maglott, DR; Maltais, L; Marchionni, L; McKenzie, L; Miki, H; Nagashima, T; Numata, K; Okido, T; Pavan, WJ; Pertea, G; Pesole, G; Petrovsky, N; Pillai, R; Pontius, JU; Qi, D; Ramachandran, S; Ravasi, T; Reed, JC; Reed, DJ; Reid, J; Ring, BZ; Ringwald, M; Sandelin, A; Schneider, C; Semple, CAM; Setou, M; Shimada, K; Sultana, R; Takenaka, Y; Taylor, MS; Teasdale, RD; Tomita, M; Verardo, R; Wagner, L; Wahlestedt, C; Wang, Y; Watanabe, Y; Wells, C; Wilming, LG; Wynshaw-Boris, A; Yanagisawa, M; Yang, I; Yang, L; Yuan, Z; Zavolan, M; Zhu, Y; Zimmer, A; Carninci, P; Hayatsu, N; Hirozane-Kishikawa, T; Konno, H; Nakamura, M; Sakazume, N; Sato, K; Shiraki, T; Waki, K; Kawai, J; Aizawa, K; Arakawa, T; Fukuda, S; Hara, A; Hashizume, W; Imotani, K; Ishii, Y; Itoh, M; Kagawa, I; Miyazaki, A; Sakai, K; Sasaki, D; Shibata, K; Shinagawa, A; Yasunishi, A; Yoshino, M; Waterston, R; Lander, ES; Rogers, J; Birney, E; Hayashizaki, Y; FANTOM Consortium; RIKEN Genome Exploration Research Group Phase I & II Team
Only a small proportion of the mouse genome is transcribed into mature messenger RNA transcripts. There is an international collaborative effort to identify all full-length mRNA transcripts from the mouse, and to ensure that each is represented in a physical collection of clones. Here we report the manual annotation of 60,770 full-length mouse complementary DNA sequences. These are clustered into 33,409 'transcriptional units', contributing 90.1% of a newly established mouse transcriptome database. Of these transcriptional units, 4,258 are new protein-coding and 11,665 are new non-coding messages, indicating that non-coding RNA is a major component of the transcriptome. 41% of all transcriptional units showed evidence of alternative splicing. In protein-coding transcripts, 79% of splice variations altered the protein product. Whole-transcriptome analyses resulted in the identification of 2,431 sense-antisense pairs. The present work, completely supported by physical clones, provides the most comprehensive survey of a mammalian transcriptome so far, and is a valuable resource for functional genomics.
Item Open Access Assessing the utility of thermodynamic features for microRNA target prediction under relaxed seed and no conservation requirements. (PLoS One, 2011) Lekprasert, Parawee; Mayhew, Michael; Ohler, Uwe
BACKGROUND: Many computational microRNA target prediction tools are focused on several key features, including complementarity to 5'seed of miRNAs and evolutionary conservation. While these features allow for successful target identification, not all miRNA target sites are conserved and adhere to canonical seed complementarity. Several studies have propagated the use of energy features of mRNA:miRNA duplexes as an alternative feature. However, different independent evaluations reported conflicting results on the reliability of energy-based predictions.
Here, we reassess the usefulness of energy features for mammalian target prediction, aiming to relax or eliminate the need for perfect seed matches and conservation requirement. METHODOLOGY/PRINCIPAL FINDINGS: We detect significant differences of energy features at experimentally supported human miRNA target sites and at genome-wide sites of AGO protein interaction. This trend is confirmed on datasets that assay the effect of miRNAs on mRNA and protein expression changes, and a simple linear regression model leads to significant correlation of predicted versus observed expression change. Compared to 6-mer seed matches as baseline, application of our energy-based model leads to ∼3-5-fold enrichment on highly down-regulated targets, and allows for prediction of strictly imperfect targets with enrichment above baseline. CONCLUSIONS/SIGNIFICANCE: In conclusion, our results indicate significant promise for energy-based miRNA target prediction that includes a broader range of targets without having to use conservation or impose stringent seed match rules.
Item Open Access Avian genomes. A flock of genomes. Introduction. (Science, 2014-12-12) Zhang, Guojie; Jarvis, Erich D; Gilbert, M Thomas P
Item Open Access Avianbase: a community resource for bird genomics. (Genome Biol, 2015-01-29) Eöry, Lél; Gilbert, M Thomas P; Li, Cai; Li, Bo; Archibald, Alan; Aken, Bronwen L; Zhang, Guojie; Jarvis, Erich; Flicek, Paul; Burt, David W
Giving access to sequence and annotation data for genome assemblies is important because, while facilitating research, it places both assembly and annotation quality under scrutiny, resulting in improvements to both. Therefore we announce Avianbase, a resource for bird genomics, which provides access to data released by the Avian Phylogenomics Consortium.
Item Open Access Behavior genetics and postgenomics. (Behav Brain Sci, 2012-10) Charney, Evan
The science of genetics is undergoing a paradigm shift.
Recent discoveries, including the activity of retrotransposons, the extent of copy number variations, somatic and chromosomal mosaicism, and the nature of the epigenome as a regulator of DNA expressivity, are challenging a series of dogmas concerning the nature of the genome and the relationship between genotype and phenotype. According to three widely held dogmas, DNA is the unchanging template of heredity, is identical in all the cells and tissues of the body, and is the sole agent of inheritance. Rather than being an unchanging template, DNA appears subject to a good deal of environmentally induced change. Instead of identical DNA in all the cells of the body, somatic mosaicism appears to be the normal human condition. And DNA can no longer be considered the sole agent of inheritance. We now know that the epigenome, which regulates gene expressivity, can be inherited via the germline. These developments are particularly significant for behavior genetics for at least three reasons: First, epigenetic regulation, DNA variability, and somatic mosaicism appear to be particularly prevalent in the human brain and probably are involved in much of human behavior; second, they have important implications for the validity of heritability and gene association studies, the methodologies that largely define the discipline of behavior genetics; and third, they appear to play a critical role in development during the perinatal period and, in particular, in enabling phenotypic plasticity in offspring. I examine one of the central claims to emerge from the use of heritability studies in the behavioral sciences, the principle of minimal shared maternal effects, in light of the growing awareness that the maternal perinatal environment is a critical venue for the exercise of adaptive phenotypic plasticity. 
This consideration has important implications for both developmental and evolutionary biology.
Item Open Access Between two fern genomes. (Gigascience, 2014) Sessa, Emily B; Banks, Jo Ann; Barker, Michael S; Der, Joshua P; Duffy, Aaron M; Graham, Sean W; Hasebe, Mitsuyasu; Langdale, Jane; Li, Fay-Wei; Marchant, D Blaine; Pryer, Kathleen M; Rothfels, Carl J; Roux, Stanley J; Salmi, Mari L; Sigel, Erin M; Soltis, Douglas E; Soltis, Pamela S; Stevenson, Dennis W; Wolf, Paul G
Ferns are the only major lineage of vascular plants not represented by a sequenced nuclear genome. This lack of genome sequence information significantly impedes our ability to understand and reconstruct genome evolution not only in ferns, but across all land plants. Azolla and Ceratopteris are ideal and complementary candidates to be the first ferns to have their nuclear genomes sequenced. They differ dramatically in genome size, life history, and habit, and thus represent the immense diversity of extant ferns. Together, this pair of genomes will facilitate myriad large-scale comparative analyses across ferns and all land plants. Here we review the unique biological characteristics of ferns and describe a number of outstanding questions in plant biology that will benefit from the addition of ferns to the set of taxa with sequenced nuclear genomes. We explain why the fern clade is pivotal for understanding genome evolution across land plants, and we provide a rationale for how knowledge of fern genomes will enable progress in research beyond the ferns themselves.
Item Open Access Beyond Blood and Belonging: Alternarratives for a Global Citizenry (2011) Bardill, Jessica Dawn
In my dissertation, I interrogate the ways blood influences identity construction and how it shifts into a paradigmatic story, known as a blood narrative, that further determines belonging. In five chapters, I argue that the use of a blood narrative undermines sovereignty as well as the creative evolution of nations.
I move from an examination of a blood narrative throughout American literature (chapter 1), through a study of legislation and science (chapters 2 and 3). In these latter two chapters, I turn to the Cherokee Nation's expulsion of Freedmen and the Eastern Band of Cherokee Indians' new membership requirement of DNA testing, which demonstrate influences of a blood narrative upon policy and legislation, and how biotechnology maintains this narrative through DNA and genomics. Finally, I explore novels from Gerald Vizenor (White Earth Anishinaabe) and Thomas King (Cherokee) that offer alternatives to a blood narrative (chapters 4 and 5). I use the term alternarrative here instead of counternarrative to focus on original alternatives, particularly from the alter position of the Native, not on reactionary or countering stories. The alternatives to this blood narrative emerge in both the modern and traditional stories of Native American peoples, providing recourse to understanding identity in ways other than blood. This new sense of belonging is especially important in a world where so many identities are determined by national boundaries, and limited by blood. These alternative narratives provide a new way of moving forward by embracing a survivance for the future, not just reacting to the past.
Item Open Access Chiropteran types I and II interferon genes inferred from genome sequencing traces by a statistical gene-family assembler. (BMC Genomics, 2010-07-21) Kepler, Thomas B; Sample, Christopher; Hudak, Kathryn; Roach, Jeffrey; Haines, Albert; Walsh, Allyson; Ramsburg, Elizabeth A
BACKGROUND: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. RESULTS: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli.
CONCLUSION: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
Item Open Access Comparative genomics reveals insights into avian genome evolution and adaptation. (Science, 2014-12-12) Zhang, Guojie; Li, Cai; Li, Qiye; Li, Bo; Larkin, Denis M; Lee, Chul; Storz, Jay F; Antunes, Agostinho; Greenwold, Matthew J; Meredith, Robert W; Ödeen, Anders; Cui, Jie; Zhou, Qi; Xu, Luohao; Pan, Hailin; Wang, Zongji; Jin, Lijun; Zhang, Pei; Hu, Haofu; Yang, Wei; Hu, Jiang; Xiao, Jin; Yang, Zhikai; Liu, Yang; Xie, Qiaolin; Yu, Hao; Lian, Jinmin; Wen, Ping; Zhang, Fang; Li, Hui; Zeng, Yongli; Xiong, Zijun; Liu, Shiping; Zhou, Long; Huang, Zhiyong; An, Na; Wang, Jie; Zheng, Qiumei; Xiong, Yingqi; Wang, Guangbiao; Wang, Bo; Wang, Jingjing; Fan, Yu; da Fonseca, Rute R; Alfaro-Núñez, Alonzo; Schubert, Mikkel; Orlando, Ludovic; Mourier, Tobias; Howard, Jason T; Ganapathy, Ganeshkumar; Pfenning, Andreas; Whitney, Osceola; Rivas, Miriam V; Hara, Erina; Smith, Julia; Farré, Marta; Narayan, Jitendra; Slavov, Gancho; Romanov, Michael N; Borges, Rui; Borges, Rui; Machado, João Paulo; Khan, Imran; Springer, Mark S; Gatesy, John; Hoffmann, Federico G; Opazo, Juan C; Håstad, Olle; Sawyer, Roger H; Kim, Heebal; Kim, Kyu-Won; Kim, Hyeon Jeong; Cho, Seoae; Li, Ning; Huang, Yinhua; Bruford, Michael W; Zhan, Xiangjiang; Dixon, Andrew; Bertelsen, Mads F; Derryberry, Elizabeth; Warren, Wesley; Wilson, Richard K; Li, Shengbin; Ray, David A; Green, Richard E; O'Brien, Stephen J; Griffin, Darren; Johnson, Warren E; Haussler, David; Ryder, Oliver A; Willerslev, Eske; Graves, Gary R; Alström, Per; Fjeldså, Jon; Mindell, David P; Edwards, Scott V; Braun, Edward L; Rahbek, Carsten; Burt, David W; Houde, Peter; Zhang, Yong; Yang, Huanming; Wang, Jian; Avian Genome Consortium; Jarvis, Erich D; Gilbert, M Thomas P; Wang, Jun
Birds are the most species-rich class of tetrapod vertebrates and have wide relevance across many research fields. We explored bird macroevolution using full genomes from 48 avian species representing all major extant clades. The avian genome is principally characterized by its constrained size, which predominantly arose because of lineage-specific erosion of repetitive elements, large segmental deletions, and gene loss. Avian genomes furthermore show a remarkably high degree of evolutionary stasis at the levels of nucleotide sequence, gene synteny, and chromosomal structure. Despite this pattern of conservation, we detected many non-neutral evolutionary changes in protein-coding genes and noncoding regions. These analyses reveal that pan-avian genomic diversity covaries with adaptations to different lifestyles and convergent evolution of traits.
Item Open Access Comparative genomics reveals molecular features unique to the songbird lineage. (BMC Genomics, 2014-12-13) Wirthlin, Morgan; Lovell, Peter V; Jarvis, Erich D; Mello, Claudio V
BACKGROUND: Songbirds (oscine Passeriformes) are among the most diverse and successful vertebrate groups, comprising almost half of all known bird species. Identifying the genomic innovations that might be associated with this success, as well as with characteristic songbird traits such as vocal learning and the brain circuits that underlie this behavior, has proven difficult, in part due to the small number of avian genomes available until recently. Here we performed a comparative analysis of 48 avian genomes to identify genomic features that are unique to songbirds, as well as an initial assessment of function by investigating their tissue distribution and predicted protein domain structure.
RESULTS: Using BLAT alignments and gene synteny analysis, we curated a large set of Ensembl gene models that were annotated as novel or duplicated in the most commonly studied songbird, the Zebra finch (Taeniopygia guttata), and then extended this analysis to 47 additional avian and 4 non-avian genomes. We identified 10 novel genes uniquely present in songbird genomes. A refined map of chromosomal synteny disruptions in the Zebra finch genome revealed that the majority of these novel genes localized to regions of genomic instability associated with apparent chromosomal breakpoints. Analyses of in situ hybridization and RNA-seq data revealed that a subset of songbird-unique genes is expressed in the brain and/or other tissues, and that 2 of these (YTHDC2L1 and TMRA) are highly differentially expressed in vocal learning-associated nuclei relative to the rest of the brain.
CONCLUSIONS: Our study reveals novel genes unique to songbirds, including some that may subserve their unique vocal control system, substantially improves the quality of Zebra finch genome annotations, and contributes to a better understanding of how genomic features may have evolved in conjunction with the emergence of the songbird lineage.
Item Open Access Concordance Between Genomic Alterations Detected by Tumor and Germline Sequencing: Results from a Tertiary Care Academic Center Molecular Tumor Board. (The oncologist, 2023-01) Green, Michelle F; Watson, Catherine H; Tait, Sarah; He, Jie; Pavlick, Dean C; Frampton, Garrett; Riedel, Jinny; Plichta, Jennifer K; Armstrong, Andrew J; Previs, Rebecca A; Kauff, Noah; Strickler, John H; Datto, Michael B; Berchuck, Andrew; Menendez, Carolyn S
Objective
The majority of tumor sequencing currently performed on cancer patients does not include a matched normal control, and in cases where germline testing is performed, it is usually run independently of tumor testing. The rates of concordance between variants identified via germline and tumor testing in this context are poorly understood. We compared tumor and germline sequencing results in patients with breast, ovarian, pancreatic, and prostate cancer who were found to harbor alterations in genes associated with homologous recombination deficiency (HRD) and increased hereditary cancer risk. We then evaluated the potential for a computational somatic-germline-zygosity (SGZ) modeling algorithm to predict germline status based on tumor-only comprehensive genomic profiling (CGP) results.
Methods
A retrospective chart review was performed using an academic cancer center's databases of somatic and germline sequencing tests, and concordance between tumor and germline results was assessed. SGZ modeling from tumor-only CGP was compared to germline results to assess this method's accuracy in determining germline mutation status.
Results
A total of 115 patients with 146 total alterations were identified. Concordance rates between somatic and germline alterations ranged from 0% to 85.7% depending on the gene and variant classification. After correcting for differences in variant classification and filtering practices, SGZ modeling was found to have 97.2% sensitivity and 90.3% specificity for the prediction of somatic versus germline origin.
Conclusions
Mutations in HRD genes identified by tumor-only sequencing are frequently germline. Providers should be aware that technical differences related to assay design, variant filtering, and variant classification can contribute to discordance between tumor-only and germline sequencing test results. In addition, SGZ modeling had high predictive power to distinguish between mutations of somatic and germline origin without the need for a matched normal control, and could potentially be considered to inform clinical decision-making.
Item Open Access Connecting Populations Across Ocean Basins: Genomics of Short-finned Pilot Whales (Globicephala macrorhynchus) in the Western North Atlantic (2022-04-18) Hanson, Sophie
Short-finned pilot whales (Globicephala macrorhynchus) are widely distributed throughout the Atlantic Ocean. These whales are capable of traveling large distances, yet their regional movement patterns and population structure are poorly defined, making stock identification and species management challenging. To understand the population structure of these whales, I analyzed genetic relatedness across 56 distinct individuals in three geographic locations: the Caribbean nation of St. Vincent & the Grenadines (n = 17), Florida, USA (n = 7), and North Carolina, USA (n = 36). I generated genetic sequences from tissue samples using double digest restriction site associated DNA sequencing (ddRAD-Seq). I then derived 3,227 single nucleotide polymorphisms (SNPs) from the Freebayes bioinformatics pipeline. To infer population structure, I used a Bayesian clustering analysis implemented in STRUCTURE software. The results indicate that individuals from all three sampling locations are genetically similar. This supports the hypothesis that there is substantial gene flow between the eastern Caribbean and southeast United States. It is likely that the Gulf Stream and extensive continental shelf facilitate long-ranging individual or group movement, and thus connectivity.
Interestingly, the results also indicate a second, genetically distinct population comprising three individuals (two from St. Vincent and one from North Carolina). While more sampling is needed to confirm this second population, it is possible that there is a larger oceanic stock in the western North Atlantic. Together, these findings can be used to better inform the management of short-finned pilot whales, which is imperative considering rising anthropogenic pressures, mass strandings, and the species’ cultural importance in artisanal whaling.