Skip to main content
Duke University Libraries
DukeSpace Scholarship by Duke Authors
  • Login
  • Ask
  • Menu
  • Login
  • Ask a Librarian
  • Search & Find
  • Using the Library
  • Research Support
  • Course Support
  • Libraries
  • About
View Item 
  •   DukeSpace
  • Theses and Dissertations
  • Duke Dissertations
  • View Item
  •   DukeSpace
  • Theses and Dissertations
  • Duke Dissertations
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

A Cloud-Based Infrastructure for Cancer Genomics

Thumbnail
View / Download
3.2 Mb
Date
2020
Author
Panea, Razvan Ioan
Advisor
Dave, Sandeep S
Repository Usage Stats
833
views
1,038
downloads
Abstract

The advent of new genomic approaches, particularly next generation sequencing (NGS) has resulted in explosive growth of biological data. As the size of biological data keeps growing at exponential rates, new methods for data management and data processing are becoming essential in bioinformatics and computational biology. Indeed, data analysis has now become the central challenge in genomics.

NGS has provided rich tools for defining genomic alterations that cause cancer. The processing time and computing requirements have now become a serious bottleneck to the characterization and analysis of these genomic alterations. Moreover, as the adoption of NGS continues to increase, the computing power required often exceeds what any single institution can provide, leading to major restraints in the type and number of analyses that can be performed.

Cloud computing represents a potential solution to this problem. On a cloud platform, computing resources can be available on-demand, thus allowing users to implement scalable and highly parallel methods. However, few centralized frameworks exist to allow the average researcher the ability to apply bioinformatics workflows using cloud resources. Moreover, bioinformatics approaches are associated with multiple processing challenges, such as the variability in the methods or data used and the reproducibility requirements of the research analysis.

Here, we present CloudConductor, a software system that is specifically designed to harness the power of cloud computing to perform complex analysis pipelines on large biological datasets. CloudConductor was designed with five central features in mind: scalability, modularity, parallelism, reproducibility and platform agnosticism.

We demonstrate the processing power afforded by CloudConductor on a real-world genomics problem. Using CloudConductor, we processed and analyzed 101 whole genome tumor-normal paired samples from Burkitt lymphoma subtypes to identify novel genomic alterations. We identified a total of 72 driver genes associated with the disease. Somatic events were identified in both coding and non-coding regions of nearly all driver genes, notably in genes IGLL5, BACH2, SIN3A, and DNMT1. We have developed the analysis framework by implementing a graphical user interface, a back-end database system, a data loader and a workflow management system.

In this thesis, we develop the concepts and describe an implementation of automated cloud-based infrastructure to analyze genomics data, creating a fast and efficient analysis resource for genomics researchers.

Description
Dissertation
Type
Dissertation
Department
Computational Biology and Bioinformatics
Subject
Bioinformatics
Automation
Bioinformatics
Cancer
Cloud Computing
Computing Infrastructure
Genomics
Permalink
https://hdl.handle.net/10161/21006
Citation
Panea, Razvan Ioan (2020). A Cloud-Based Infrastructure for Cancer Genomics. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/21006.
Collections
  • Duke Dissertations
More Info
Show full item record
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.

Rights for Collection: Duke Dissertations


Works are deposited here by their authors, and represent their research and opinions, not that of Duke University. Some materials and descriptions may include offensive content. More info

Make Your Work Available Here

How to Deposit

Browse

All of DukeSpaceCommunities & CollectionsAuthorsTitlesTypesBy Issue DateDepartmentsAffiliations of Duke Author(s)SubjectsBy Submit DateThis CollectionAuthorsTitlesTypesBy Issue DateDepartmentsAffiliations of Duke Author(s)SubjectsBy Submit Date

My Account

LoginRegister

Statistics

View Usage Statistics
Duke University Libraries

Contact Us

411 Chapel Drive
Durham, NC 27708
(919) 660-5870
Perkins Library Service Desk

Digital Repositories at Duke

  • Report a problem with the repositories
  • About digital repositories at Duke
  • Accessibility Policy
  • Deaccession and DMCA Takedown Policy

TwitterFacebookYouTubeFlickrInstagramBlogs

Sign Up for Our Newsletter
  • Re-use & Attribution / Privacy
  • Harmful Language Statement
  • Support the Libraries
Duke University