Intergenic and genic sequence lengths have opposite relationships with respect to gene expression.

Thumbnail Image



Journal Title

Journal ISSN

Volume Title

Repository Usage Stats


Citation Stats


Eukaryotic genomes are mostly composed of noncoding DNA whose role is still poorly understood. Studies in several organisms have shown correlations between the length of the intergenic and genic sequences of a gene and the expression of its corresponding mRNA transcript. Some studies have found a positive relationship between intergenic sequence length and expression diversity between tissues, and concluded that genes under greater regulatory control require more regulatory information in their intergenic sequences. Other reports found a negative relationship between expression level and gene length and the interpretation was that there is selection pressure for highly expressed genes to remain small. However, a correlation between gene sequence length and expression diversity, opposite to that observed for intergenic sequences, has also been reported, and to date there is no testable explanation for this observation. To shed light on these varied and sometimes conflicting results, we performed a thorough study of the relationships between sequence length and gene expression using cell-type (tissue) specific microarray data in Arabidopsis thaliana. We measured median gene expression across tissues (expression level), expression variability between tissues (expression pattern uniformity), and expression variability between replicates (expression noise). We found that intergenic (upstream and downstream) and genic (coding and noncoding) sequences have generally opposite relationships with respect to expression, whether it is tissue variability, median, or expression noise. To explain these results we propose a model, in which the lengths of the intergenic and genic sequences have opposite effects on the ability of the transcribed region of the gene to be epigenetically regulated for differential expression. These findings could shed light on the role and influence of noncoding sequences on gene expression.





Published Version (Please cite this version)


Publication Info

Colinas, Juliette, Scott C Schmidler, Gil Bohrer, Borislav Iordanov and Philip N Benfey (2008). Intergenic and genic sequence lengths have opposite relationships with respect to gene expression. PLoS One, 3(11). p. e3670. 10.1371/journal.pone.0003670 Retrieved from

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.



Scott C. Schmidler

Associate Professor of Statistical Science

Research Interests:

  • Monte Carlo methods; high-dimensional sampling algorithms; Mixing times of Markov chains; MCMC; Sequential Monte Carlo; probabilistic graphical models; Bayesian computation; Computational complexity of statistical inference.

  • Computational biology; Protein structure and folding; computational immunology; computational biophysics; statistical physics; computational statistical mechanics; molecular evolution.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.