Studies into Location-specific cis-Regulatory Motifs
Gene expression and its regulation are major determinants of the phenotypic traits displayed across species. Although the DNA sequence elements that control gene expression play a crucial role in determining species morphology, predicting cis-regulatory elements through sequence analysis alone remains difficult. A few regulatory elements, such as the TATA-box and the Initiator sequence, are known to be overrepresented at specific locations within the proximal promoter. However, the extent to which this occurs among cis-regulatory elements in general is not well understood. Here, we take a genome-wide approach to detecting such functional sequence elements, using location-specific overrepresentation as a criterion for regulatory function. We provide evidence that a surprisingly large number of regulatory elements exhibit locational overrepresentation with respect to the transcription start site. We then use this characteristic to predict novel cis-regulatory elements overrepresented at particular locations within the proximal promoter.
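As an illustration only, location-specific overrepresentation can be sketched as a comparison between the number of motif occurrences observed in a positional window and the number expected if occurrences were spread uniformly along the promoter. The sketch below is not the model developed in this work: the function names, the toy sequences, the uniform background, and the binomial test are all assumptions chosen for brevity, and the sequences are assumed to be of equal length and aligned to the transcription start site.

```python
from math import comb


def motif_positions(promoters, motif):
    """Count motif start positions across equal-length, TSS-aligned sequences."""
    length = len(promoters[0])
    counts = [0] * (length - len(motif) + 1)
    for seq in promoters:
        start = seq.find(motif)
        while start != -1:
            counts[start] += 1
            start = seq.find(motif, start + 1)  # overlapping hits are counted
    return counts


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def window_enrichment(counts, lo, hi):
    """Compare occurrences at positions [lo, hi) with a uniform expectation."""
    total = sum(counts)
    observed = sum(counts[lo:hi])
    p = (hi - lo) / len(counts)  # assumed background: uniform over positions
    expected = total * p
    return observed, expected, binomial_tail(observed, total, p)


# Hypothetical toy promoters: three carry the motif at index 5, one at index 12.
promoters = [
    "GCGCGTATAAAGCGCGCGCG",
    "CCGGCTATAAACCGGCCGGC",
    "GGCCGTATAAAGGCCGGCCG",
    "GCGCGCGCGCGCTATAAAGC",
]
counts = motif_positions(promoters, "TATAAA")
observed, expected, p_value = window_enrichment(counts, 4, 7)
```

A realistic analysis would need a background that accounts for local nucleotide composition and a correction for testing many motifs and windows; the uniform background here is used only to keep the example short.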
Transcriptional regulation is most often controlled not by single protein factors acting in isolation, but by multiple transcription factors acting together within multi-protein complexes. Because protein-protein interactions are largely determined by protein structure, we would expect to see patterns of spatial preference between pairs of motifs that bind interacting factors. However, in the absence of methods to predict such spatial preferences, comprehensive assessments of these inter-relationships have not previously been conducted. Because our model provides a general tool for detecting the positional specificity of a motif relative to a given reference point, we extended it to measure distance preferences between pairs of motifs on a genome-wide scale. We show that patterns of spatial dependency often exist between pairs of sequence elements that bind interacting protein factors. We find that regulatory motifs binding interacting proteins often have multiple inter-motif distances at which they preferentially occur, and we show that the intervals between preferred distances are highly consistent across motif-pairs. This distance preference 'phasing' was empirically found to occur at consistent intervals of approximately 8-10 bp, roughly the number of nucleotides within a single turn of the DNA double helix. This finding suggests a tendency for protein factor-pairs to interact in a specific orientation with respect to the turn of the DNA molecule, and offers a convenient method for identifying, de novo, motif-pairs that bind interacting transcription factors.
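The idea of preferred inter-motif distances and the intervals between them can be illustrated with a small sketch that tabulates start-to-start distances between two motifs and reports the gaps between the most frequent distances. This is a simplified stand-in, not the model used in this work: the helper names, the two motif strings, the synthetic sequences, and the count threshold used to call a distance "preferred" are all hypothetical.

```python
from collections import Counter


def find_all(seq, motif):
    """Return every start index of motif in seq, overlaps included."""
    hits, start = [], seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def distance_histogram(sequences, motif_a, motif_b, max_dist=50):
    """Count start-to-start distances from motif_a to a downstream motif_b."""
    hist = Counter()
    for seq in sequences:
        for a in find_all(seq, motif_a):
            for b in find_all(seq, motif_b):
                d = b - a
                if 0 < d <= max_dist:
                    hist[d] += 1
    return hist


def peak_intervals(hist, min_count=2):
    """Return distances seen at least min_count times and the gaps between them."""
    peaks = sorted(d for d, c in hist.items() if c >= min_count)
    gaps = [q - p for p, q in zip(peaks, peaks[1:])]
    return peaks, gaps


def make_sequence(distance, motif_a="CACGTG", motif_b="TGACTC"):
    """Build a synthetic sequence with the two motifs `distance` bp apart."""
    return motif_a + "N" * (distance - len(motif_a)) + motif_b


# Synthetic data: spacings of 10, 20 and 30 bp occur twice, 14 bp occurs once.
sequences = [make_sequence(d) for d in (10, 10, 20, 20, 30, 30, 14)]
hist = distance_histogram(sequences, "CACGTG", "TGACTC")
peaks, gaps = peak_intervals(hist)
```

On real data the threshold would be replaced by a statistical test against a background distribution of distances, and the regularity of the gaps could be assessed with an autocorrelation or Fourier analysis of the histogram.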
While little is known about the mechanisms by which individual cis-regulatory elements ultimately control gene expression, even less is known about how such elements evolve over time. A single transcription factor can potentially target hundreds of genes across the genome, and thus a modification in the binding affinity of such a protein must induce corresponding changes at a multitude of functional sites in order to preserve the set of target genes that the trans-factor regulates. It is therefore commonly assumed that such changes occur rarely and slowly over the course of evolution. Despite this widespread assumption, we find that a surprisingly large number of cis-regulatory elements have undergone significant lineage-specific changes in consensus sequence. Here, we demonstrate that the genomic landscape is highly adaptable, rapidly adjusting to global changes in preferred regulatory consensus sequences. Focusing on regulatory elements exhibiting location-specific overrepresentation, we find that a substantial fraction have been subject to evolutionary modification, even between closely related eutherians. These findings have broad implications for how phenotypes evolve across species.
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Rights for Collection: Duke Dissertations