Identification and utilization of arbitrary correlations in models of recombination signal sequences.

dc.contributor.author

Cowell, Lindsay G

dc.contributor.author

Davila, Marco

dc.contributor.author

Kepler, Thomas B

dc.contributor.author

Kelsoe, Garnett

dc.coverage.spatial

England

dc.date.accessioned

2016-01-08T18:01:48Z

dc.date.issued

2002

dc.description.abstract

BACKGROUND: A significant challenge in bioinformatics is to develop methods for detecting and modeling patterns in variable DNA sequence sites, such as protein-binding sites in regulatory DNA. Current approaches sometimes perform poorly when positions in the site do not independently affect protein binding. We developed a statistical technique for modeling the correlation structure in variable DNA sequence sites. The method places no restrictions on the number of correlated positions or on their spatial relationship within the site. No prior empirical evidence for the correlation structure is necessary.

RESULTS: We applied our method to the recombination signal sequences (RSS) that direct assembly of B-cell and T-cell antigen-receptor genes via V(D)J recombination. The technique is based on model selection by cross-validation and produces models that allow computation of an information score for any signal-length sequence. We also modeled RSS using order zero and order one Markov chains. The scores from all models are highly correlated with measured recombination efficiencies, but the models arising from our technique are better than the Markov models at discriminating RSS from non-RSS.

CONCLUSIONS: Our model-development procedure produces models that estimate well the recombinogenic potential of RSS and are better at RSS recognition than the order zero and order one Markov models. Our models are, therefore, valuable for studying the regulation of both physiologic and aberrant V(D)J recombination. The approach could be equally powerful for the study of promoter and enhancer elements, splice sites, and other DNA regulatory sites that are highly variable at the level of individual nucleotide positions.
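The abstract compares the authors' models against order zero and order one Markov chains. As a rough illustration of what such baseline models might look like for fixed-length sites, the sketch below fits position-specific base frequencies (order zero) and position-specific transition probabilities (order one), then scores a sequence by its log-likelihood. This is not the authors' procedure: the function names, the pseudocount, the log-likelihood score, and the toy heptamer-like training strings are all assumptions made for illustration, and the cross-validated correlation models described in the abstract are not reproduced here.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

def fit_order_zero(seqs, pseudocount=1.0):
    """Per-position base probabilities; positions are treated as independent."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    model = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        model.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return model

def fit_order_one(seqs, pseudocount=1.0):
    """Per-position transition probabilities P(x_i | x_{i-1}) for i >= 1."""
    transitions = []
    for i in range(1, len(seqs[0])):
        counts = Counter((s[i - 1], s[i]) for s in seqs)
        table = {}
        for prev in ALPHABET:
            row_total = sum(counts[(prev, b)] for b in ALPHABET)
            row_total += pseudocount * len(ALPHABET)
            for b in ALPHABET:
                table[(prev, b)] = (counts[(prev, b)] + pseudocount) / row_total
        transitions.append(table)
    return transitions

def score_order_zero(seq, model):
    """Log-likelihood of seq under the independent-position model."""
    return sum(math.log(model[i][b]) for i, b in enumerate(seq))

def score_order_one(seq, model, transitions):
    """Log-likelihood of seq: first position from the order-zero model,
    each later position conditioned on its left neighbour."""
    score = math.log(model[0][seq[0]])
    for i in range(1, len(seq)):
        score += math.log(transitions[i - 1][(seq[i - 1], seq[i])])
    return score

# Toy, hand-written training strings (hypothetical, not data from the paper).
training = ["CACAGTG", "CACAGTG", "CACAATG", "CACTGTG"]
zero = fit_order_zero(training)
one = fit_order_one(training)

site_like = "CACAGTG"
background_like = "GTTCCAA"
print(score_order_zero(site_like, zero), score_order_zero(background_like, zero))
print(score_order_one(site_like, zero, one), score_order_one(background_like, zero, one))
```

In this toy setting a sequence resembling the training set receives a higher log-likelihood than an unrelated one under both models. The order one model captures only dependence between adjacent positions, which is the limitation the abstract's method is meant to overcome by allowing correlations between arbitrary positions.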

dc.identifier

https://www.ncbi.nlm.nih.gov/pubmed/12537561

dc.identifier.eissn

1474-760X

dc.identifier.uri

https://hdl.handle.net/10161/11484

dc.language

eng

dc.publisher

Springer Science and Business Media LLC

dc.relation.ispartof

Genome Biol

dc.subject

Animals

dc.subject

B-Lymphocytes

dc.subject

Chromosome Mapping

dc.subject

Chromosomes

dc.subject

Computational Biology

dc.subject

Conserved Sequence

dc.subject

DNA, Intergenic

dc.subject

Gene Rearrangement, B-Lymphocyte

dc.subject

Gene Rearrangement, T-Lymphocyte

dc.subject

Genetic Variation

dc.subject

Immunoglobulin Joining Region

dc.subject

Immunoglobulin Variable Region

dc.subject

Markov Chains

dc.subject

Mice

dc.subject

Models, Genetic

dc.subject

Nucleic Acid Conformation

dc.subject

Recombination, Genetic

dc.subject

Regulatory Sequences, Nucleic Acid

dc.subject

T-Lymphocytes

dc.title

Identification and utilization of arbitrary correlations in models of recombination signal sequences.

dc.type

Journal article

duke.contributor.orcid

Kelsoe, Garnett|0000-0002-8770-040X

pubs.author-url

https://www.ncbi.nlm.nih.gov/pubmed/12537561

pubs.begin-page

RESEARCH0072

pubs.issue

12

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biostatistics & Bioinformatics

pubs.organisational-group

Duke

pubs.organisational-group

Duke Cancer Institute

pubs.organisational-group

Duke Human Vaccine Institute

pubs.organisational-group

Immunology

pubs.organisational-group

Institutes and Centers

pubs.organisational-group

School of Medicine

pubs.publication-status

Published

pubs.volume

3

Files

Original bundle

Name: Identification and utilization of arbitrary correlations in models of recombination signal sequences.pdf
Size: 1.75 MB
Format: Adobe Portable Document Format