Stability selection for regression-based models of transcription factor-DNA binding specificity.

dc.contributor.author

Mordelet, Fantine

dc.contributor.author

Horton, John

dc.contributor.author

Hartemink, Alexander J

dc.contributor.author

Engelhardt, Barbara E

dc.contributor.author

Gordân, Raluca

dc.coverage.spatial

England

dc.date.accessioned

2017-08-01T20:48:48Z

dc.date.available

2017-08-01T20:48:48Z

dc.date.issued

2013-07-01

dc.description.abstract

MOTIVATION: The DNA binding specificity of a transcription factor (TF) is typically represented using a position weight matrix model, which implicitly assumes that individual bases in a TF binding site contribute independently to the binding affinity, an assumption that does not always hold. For this reason, more complex models of binding specificity have been developed. However, these models have their own caveats: they typically have a large number of parameters, which makes them hard to learn and interpret. RESULTS: We propose novel regression-based models of TF-DNA binding specificity, trained using high-resolution in vitro data from custom protein-binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of putative DNA binding sites for the TFs of interest (yeast TFs Cbf1 and Tye7, and human TFs c-Myc, Max and Mad2) in their native genomic context. These high-throughput quantitative data are well suited for training complex models that take into account not only independent contributions from individual bases, but also contributions from di- and trinucleotides at various positions within or near the binding sites. To ensure that our models remain interpretable, we use feature selection to identify a small number of sequence features that accurately predict TF-DNA binding specificity. To further illustrate the accuracy of our regression models, we show that even in the case of paralogous TFs with highly similar position weight matrices, our new models can distinguish the specificities of individual factors. Thus, our work represents an important step toward better sequence-based models of individual TF-DNA binding specificity. AVAILABILITY: Our code is available at http://genome.duke.edu/labs/gordan/ISMB2013. The PBM data used in this article are available in the Gene Expression Omnibus under accession number GSE47026.
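
The sequence features described in the abstract (single bases plus di- and trinucleotides at specific positions) can be pictured as a binary indicator vector per DNA sequence. The sketch below is only an illustration of that idea, not the authors' code: the function names, the `max_k` parameter, and the example sequence are assumptions made here, and the regression and feature selection steps are not shown.

```python
from itertools import product

ALPHABET = "ACGT"


def feature_index(seq_len, max_k=3):
    """List every possible (k, position, k-mer) feature for a fixed sequence length."""
    return [
        (k, pos, "".join(letters))
        for k in range(1, max_k + 1)
        for pos in range(seq_len - k + 1)
        for letters in product(ALPHABET, repeat=k)
    ]


def positional_kmers(seq, max_k=3):
    """Return the set of (k, position, k-mer) features actually present in seq."""
    seq = seq.upper()
    present = set()
    for k in range(1, max_k + 1):
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            # Skip k-mers containing ambiguous characters such as N.
            if set(kmer) <= set(ALPHABET):
                present.add((k, pos, kmer))
    return present


def encode(seq, max_k=3):
    """Encode seq as a 0/1 vector aligned with feature_index(len(seq), max_k)."""
    index = feature_index(len(seq), max_k)
    present = positional_kmers(seq, max_k)
    return index, [1 if feature in present else 0 for feature in index]


# Hypothetical 6-bp example sequence, chosen only for illustration.
index, vector = encode("CACGTG")
```

Vectors of this kind, one row per probe sequence, would form the design matrix for a regression against measured binding intensity. The feature count grows quickly with sequence length and `max_k`, which is why the abstract stresses selecting a small subset of features.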

dc.identifier

https://www.ncbi.nlm.nih.gov/pubmed/23812975

dc.identifier

btt221

dc.identifier.eissn

1367-4811

dc.identifier.uri

https://hdl.handle.net/10161/15155

dc.language

eng

dc.publisher

Oxford University Press (OUP)

dc.relation.ispartof

Bioinformatics

dc.relation.isversionof

10.1093/bioinformatics/btt221

dc.subject

Algorithms

dc.subject

Binding Sites

dc.subject

DNA

dc.subject

Genome

dc.subject

Humans

dc.subject

Linear Models

dc.subject

Protein Array Analysis

dc.subject

Protein Binding

dc.subject

Saccharomyces cerevisiae Proteins

dc.subject

Support Vector Machine

dc.subject

Transcription Factors

dc.title

Stability selection for regression-based models of transcription factor-DNA binding specificity.

dc.type

Journal article

duke.contributor.orcid

Hartemink, Alexander J|0000-0002-1292-2606

pubs.author-url

https://www.ncbi.nlm.nih.gov/pubmed/23812975

pubs.begin-page

i117

pubs.end-page

i125

pubs.issue

13

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Computer Science

pubs.organisational-group

Duke

pubs.organisational-group

Faculty

pubs.organisational-group

Molecular Genetics and Microbiology

pubs.organisational-group

School of Medicine

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.publication-status

Published

pubs.volume

29

Files

Original bundle

Name:
Stability selection for regression-based models of transcription factor-DNA binding specificity.pdf
Size:
1.01 MB
Format:
Adobe Portable Document Format
Description:
Published version