Transcription Factor-Centric Approaches to Identify Regulatory Driver Mutations in Cancer

Most previous efforts to identify cancer driver mutations have focused on protein-coding genes. In recent years, the decreasing cost of DNA sequencing has enabled whole-genome sequencing (WGS) studies of thousands of tumor samples, making it possible to survey non-coding regions systematically for potential driver events. These studies have identified millions of somatic mutations in cancer, the majority of which are non-coding. However, identifying drivers remains far more challenging in non-coding regions than in coding genes, primarily because of the incomplete annotation of the non-coding genome and the unknown functional impact of non-coding mutations.

In this work, we present new approaches to identify putative regulatory driver mutations in cancer, based on a new method for predicting the quantitative effects of single nucleotide variants (SNVs) on transcription factor (TF) binding. Unlike most previous work on driver identification, our method does not require driver mutations to be highly recurrent; instead, we assess the significance of mutations by testing whether they cause larger TF binding changes than expected for completely random mutations. Because gene regulation relies on the cooperation of multiple regulatory elements, we also devised a way to combine the effects of all regulatory mutations of a gene. This lets us identify genes whose regulation is likely to be significantly perturbed, through changes in TF binding, by the mutations observed in their regulatory elements.
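To make the idea of testing against random mutations concrete, here is a minimal sketch of a gene-level permutation test. It is not the binding-effect predictor developed in this work. It assumes a single TF represented by a position weight matrix (PWM), scores only the forward strand, measures a mutation's effect as the absolute change in the best PWM log-odds score among windows overlapping the mutated base, and uses a null of uniformly random SNVs within the gene's regulatory regions. The statistic (summed binding change) and all function names (`log_odds_pwm`, `binding_change`, `gene_level_pvalue`, etc.) are hypothetical choices made for illustration.

```python
import math
import random

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts (list of dicts) into log2-odds weights."""
    pwm = []
    for row in counts:
        total = sum(row.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((row.get(b, 0) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_site_score(seq, pwm, pos):
    """Best forward-strand PWM score among windows that overlap position `pos`.

    Assumes len(seq) >= motif width and that seq contains only A/C/G/T.
    """
    w = len(pwm)
    best = -math.inf
    for start in range(max(0, pos - w + 1), min(pos, len(seq) - w) + 1):
        window = seq[start:start + w]
        best = max(best, sum(pwm[i][window[i]] for i in range(w)))
    return best


def binding_change(seq, pwm, pos, alt):
    """Absolute change in the best overlapping PWM score caused by one SNV."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    return abs(best_site_score(mutated, pwm, pos) - best_site_score(seq, pwm, pos))


def random_snv(regions, rng):
    """Draw one SNV uniformly over all positions of a gene's regulatory regions."""
    k = rng.randrange(sum(len(r) for r in regions))
    for idx, region in enumerate(regions):
        if k < len(region):
            alt = rng.choice([b for b in BASES if b != region[k]])
            return idx, k, alt
        k -= len(region)


def gene_level_pvalue(regions, pwm, snvs, n_perm=1000, seed=0):
    """Empirical p-value: is the summed binding change of a gene's observed SNVs
    larger than that of the same number of uniformly random SNVs?

    Each SNV is (region_index, position, alt_base) and is scored independently
    against the reference sequence.
    """
    rng = random.Random(seed)
    observed = sum(binding_change(regions[i], pwm, p, a) for i, p, a in snvs)
    hits = 0
    for _ in range(n_perm):
        draws = [random_snv(regions, rng) for _ in snvs]
        null = sum(binding_change(regions[i], pwm, p, a) for i, p, a in draws)
        if null >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy "GATA" motif: 10 observations of the consensus base at each position.
    pwm = log_odds_pwm([{"G": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
    regions = ["CCGATACCCCGATACC", "TTTTTTTTTT"]  # hypothetical regulatory regions
    snvs = [(0, 3, "C"), (0, 12, "C")]           # both SNVs disrupt a GATA site
    observed, p = gene_level_pvalue(regions, pwm, snvs, n_perm=200, seed=1)
    print(f"summed binding change = {observed:.2f}, empirical p = {p:.3f}")
```

Summing effects over all of a gene's observed SNVs, and drawing the same number of random SNVs for the null, is one simple way to combine mutations across regulatory elements without requiring any single site to be recurrent. The +1 in the numerator and denominator keeps the empirical p-value above zero when only a finite number of permutations is run.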

We applied our TF-centric approaches to single nucleotide variants identified in a liver cancer data set from the International Cancer Genome Consortium (ICGC) and identified potentially dysregulated genes whose regulatory mutations could cause significant TF binding changes. Notably, these genes differ from those prioritized by recurrence-based approaches. Nevertheless, most of them show large changes in gene expression, are cancer prognostic genes, or both. Our results suggest that regulatory mutations should be investigated further, not only by their recurrence but also by their functional effects, such as TF binding changes, to uncover dysregulated genes that may drive tumorigenesis.

Zhao, Jingkang (2020). Transcription Factor-Centric Approaches to Identify Regulatory Driver Mutations in Cancer. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public under a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.