Integrative Computational Genomics Defines the Molecular Origins and Outcomes of Lymphoma

Moffitt, Andrea Barrett


Dave, Sandeep S


Lymphomas are a heterogeneous group of hematological malignancies with diverse molecular origins and clinical outcomes. Derived from immune cells of lymphoid origin, lymphoma can arise from lymphoid cells anywhere in the body, from the spleen and lymph nodes to peripheral sites like the liver and intestines. Current strategies for lymphoma diagnosis rely primarily on histopathological examination of the tumor biopsy, including cytogenetics and immunophenotyping. As more data become available, diagnoses may increasingly depend on the genomic features that define each disease. Classification of lymphoid neoplasms is generally based on the cell of origin, that is, the lineage of the normal cell from which the cancer is thought to arise. Lymphomas can be classified into dozens of distinct diagnostic entities, yet any two patients with the same diagnosis may have very different outcomes and molecular underpinnings. We therefore need to understand both the commonalities among patients with the same disease and the unique features that may require personalized treatment strategies. Patient prognosis depends greatly on the type of lymphoma, ranging from nearly curable diseases with five-year survival rates over 90% to the worst entities, in which most patients die within the first year. Greater clarity is needed on how the underlying genomics contribute to these variable treatment responses and clinical outcomes.

Next-generation sequencing approaches allow us to delve into the molecular underpinnings of lymphomas and gain insight into the origin and evolution of these diseases. High-throughput sequencing protocols allow us to examine the whole genome, exome, epigenome, or transcriptome of cancer cells in tens to hundreds of patients for each disease. As the cost of sequencing falls and the capacity to generate data grows, we face increasing computational challenges in both processing and interpreting the wealth of data available in cancer genomics. Efficient and effective bioinformatics tools are needed to transform billions of sequencing reads into actionable hypotheses about the role of specific genes or biological pathways in a given cancer type or patient.

In this dissertation, I present several strategies and applications of integrative computational genomics in lymphoma, with contributions throughout the research process, from development of initial assays and quality control strategies for the sequencing data, to joint analysis of clinical and genomic data, and finally through follow-up experimental models for lymphoma.

First, I focus on two rare T cell lymphomas, hepatosplenic T cell lymphoma (HSTL) and enteropathy-associated T cell lymphoma (EATL), both of which have very poor clinical outcomes and whose genetic basis was previously poorly understood. We define the somatic mutation landscape of HSTL through exome sequencing and find SETD2 to be the most highly mutated gene. We further use the exome sequencing data to investigate copy number alterations and show a significant survival difference between cases with and without certain arm-level copy number alterations. Knockdown of SETD2 in an HSTL cell line, followed by RNA sequencing, demonstrates the role of SETD2 loss in proliferation and cell cycle changes, linking the SETD2 mutations to a potential oncogenic mechanism. Furthermore, we investigate potentially targetable mutations in the JAK-STAT pathway and demonstrate their oncogenic downstream molecular phenotypes and potential druggability. In the EATL study, we apply exome and RNA sequencing to a large EATL cohort. Our findings show a significant role for loss-of-function mutations in chromatin modifiers and for mutations in JAK-STAT signaling genes. EATL can be separated into two subtypes, Type I and Type II, which we show to have convergent genomic features despite divergent gene expression: the RNA sequencing data define a distinct separation between the two subtypes. Delving further into the role of SETD2 in these T cell lymphomas, we generate a mouse model with a conditional knockout of SETD2 in T cells and demonstrate a role for SETD2 in altering the lineage development of T cells.
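To make the survival comparison between cases with and without an arm-level copy number alteration more concrete, the sketch below computes Kaplan-Meier survival estimates for two groups. It is only an illustration: the abstract does not state which survival method was used, the function name `kaplan_meier` is hypothetical, and the follow-up times and event flags are invented toy values, not data from the dissertation.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death occurred.

    times  -- follow-up time per patient (for example, in months)
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # everyone who leaves the risk set at time t
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohorts (NOT from the dissertation).
with_cna = kaplan_meier([2, 4, 4, 7, 9], [1, 1, 1, 0, 1])
without_cna = kaplan_meier([6, 10, 14, 20, 24], [1, 0, 1, 0, 0])

for label, curve in (("with CNA", with_cna), ("without CNA", without_cna)):
    print(label, [(t, round(s, 3)) for t, s in curve])
```

A real analysis would typically pair such curves with a formal comparison (for instance a log-rank test) from an established statistics package rather than hand-written code.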

To understand why certain genetic abnormalities are recurrent in some disease entities and not others, we turn to the cell of origin for clues. We pair two different lymphomas, Burkitt lymphoma and mantle cell lymphoma, with their associated cells of origin, germinal center B cells and naive B cells, respectively. These closely related cell types have much in common as B cells, but studies of their transcriptomes show that many molecular differences distinguish the two. In this work, after looking more closely at mantle cell lymphoma genomics, we examine the underlying chromatin markers that define the epigenomes of these B cells. We test the association between chromatin markers and gene mutation rates across these two cell types and lymphomas, and find that genes with more open chromatin may have a higher mutation rate when comparing closely related cells and lymphomas. Finally, I present my work on developing an RNA sequencing based strategy for defining the complete transcriptome of diffuse large B cell lymphoma (DLBCL). Gene expression profiling with microarrays has shown the existence of two subtypes in DLBCL, activated B cell like (ABC) and germinal center B cell like (GCB). However, the role of non-coding RNAs, alternative splicing, and mutations in these two subtypes and in DLBCL as a whole was previously not well understood. We develop a strand-specific RNA sequencing strategy that allows investigation of the total RNA transcriptome in DLBCL, including microRNAs, lncRNAs, and other important non-coding RNAs. Furthermore, we show that RNA sequencing can be used to distinguish the two subtypes, both through RNA sequencing based mutation calls and through differentially expressed lncRNAs that we define for the first time in DLBCL.
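As an illustration of how an association between chromatin openness and per-gene mutation rate could be tested, the sketch below computes a Spearman rank correlation in plain Python. The abstract does not say which statistic was used, so rank correlation is an assumption chosen for simplicity; the function names and the per-gene values are hypothetical and not taken from the dissertation.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-gene values (NOT from the dissertation):
# an open-chromatin score and mutations per megabase for five genes.
accessibility = [0.1, 0.4, 0.35, 0.8, 0.9]
mutation_rate = [1.0, 2.5, 2.0, 2.2, 6.0]

rho = spearman(accessibility, mutation_rate)
print(round(rho, 3))
```

A positive `rho` would be consistent with more open chromatin going along with higher mutation rates; a real analysis would also need a significance test and controls for confounders such as gene length and expression.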

Broadly, this dissertation contributes novel findings to the field of lymphoma genomics and presents a framework for integrative computational genomics that can guide future studies. The heterogeneity of lymphoma across cases requires us to dive deep into individual diseases, even rare ones, and to appreciate the similarities and differences across lymphomas. To improve diagnoses, prognoses, and treatment options, we need to understand the molecular origins of lymphoma. Using a range of molecular and computational approaches, we can move closer to true personalized medicine at the genomic level.





Moffitt, Andrea Barrett (2016). Integrative Computational Genomics Defines the Molecular Origins and Outcomes of Lymphoma. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.