Identification and utilization of arbitrary correlations in models of recombination signal sequences.
Abstract
BACKGROUND: A significant challenge in bioinformatics is to develop methods for detecting
and modeling patterns in variable DNA sequence sites, such as protein-binding sites
in regulatory DNA. Current approaches sometimes perform poorly when positions in the
site do not independently affect protein binding. We developed a statistical technique
for modeling the correlation structure in variable DNA sequence sites. The method
places no restrictions on the number of correlated positions or on their spatial relationship
within the site. No prior empirical evidence for the correlation structure is necessary.
RESULTS: We applied our method to the recombination signal sequences (RSS) that direct
assembly of B-cell and T-cell antigen-receptor genes via V(D)J recombination. The
technique is based on model selection by cross-validation and produces models that
allow computation of an information score for any signal-length sequence. We also
modeled RSS using order zero and order one Markov chains. The scores from all models
are highly correlated with measured recombination efficiencies, but the models arising
from our technique are better than the Markov models at discriminating RSS from non-RSS.
CONCLUSIONS: Our model-development procedure produces models that estimate well the
recombinogenic potential of RSS and are better at RSS recognition than the order zero
and order one Markov models. Our models are, therefore, valuable for studying the
regulation of both physiologic and aberrant V(D)J recombination. The approach could
be equally powerful for the study of promoter and enhancer elements, splice sites,
and other DNA regulatory sites that are highly variable at the level of individual
nucleotide positions.
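The order zero and order one Markov chain baselines named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes position-specific (inhomogeneous) chains, add-one pseudocounts, and a score defined as a log2-odds ratio against a uniform background; none of these choices is stated in the abstract. The function names and the toy heptamer-like training set `TOY_SITES` are hypothetical and made up for illustration only.

```python
import math
from collections import Counter

ALPHABET = "ACGT"

# Hypothetical, made-up training sites (not data from the paper).
TOY_SITES = ["CACAGTG", "CACAGTG", "CACAATG", "CACTGTG", "CACAGCG"]


def fit_order_zero(seqs, pseudocount=1.0):
    """Order zero: independent base frequencies at each position."""
    total = len(seqs) + pseudocount * len(ALPHABET)
    model = []
    for i in range(len(seqs[0])):
        counts = Counter(s[i] for s in seqs)
        model.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return model


def fit_order_one(seqs, pseudocount=1.0):
    """Order one: each position is conditioned on the preceding base."""
    first = fit_order_zero([s[0] for s in seqs], pseudocount)[0]
    transitions = []
    for i in range(1, len(seqs[0])):
        pair_counts = Counter((s[i - 1], s[i]) for s in seqs)
        prev_counts = Counter(s[i - 1] for s in seqs)
        table = {}
        for a in ALPHABET:
            denom = prev_counts[a] + pseudocount * len(ALPHABET)
            for b in ALPHABET:
                table[(a, b)] = (pair_counts[(a, b)] + pseudocount) / denom
        transitions.append(table)
    return first, transitions


def score_order_zero(model, seq, background=0.25):
    """Log2-odds of seq under the order zero model vs. a uniform background."""
    return sum(math.log2(model[i][b] / background) for i, b in enumerate(seq))


def score_order_one(model, seq, background=0.25):
    """Log2-odds of seq under the order one model vs. a uniform background."""
    first, transitions = model
    score = math.log2(first[seq[0]] / background)
    for i in range(1, len(seq)):
        pair = (seq[i - 1], seq[i])
        score += math.log2(transitions[i - 1][pair] / background)
    return score


zero_model = fit_order_zero(TOY_SITES)
one_model = fit_order_one(TOY_SITES)
```

Under these assumptions, any sequence of the training length can be scored with `score_order_zero(zero_model, seq)` or `score_order_one(one_model, seq)`, and a higher score indicates a closer match to the training sites. The order one model captures dependence between adjacent positions only, whereas the method described in the abstract places no restriction on which positions may be correlated.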
Type
Journal article
Subject
Animals
B-Lymphocytes
Chromosome Mapping
Chromosomes
Computational Biology
Conserved Sequence
DNA, Intergenic
Gene Rearrangement, B-Lymphocyte
Gene Rearrangement, T-Lymphocyte
Genetic Variation
Immunoglobulin Joining Region
Immunoglobulin Variable Region
Markov Chains
Mice
Models, Genetic
Nucleic Acid Conformation
Recombination, Genetic
Regulatory Sequences, Nucleic Acid
T-Lymphocytes
Permalink
https://hdl.handle.net/10161/11484
Scholars@Duke
Lindsay Grey Cowell
Adjunct Assistant Professor in the Department of Biostatistics and Bioinformatics
Somatic Diversification of Lymphocyte Antigen Receptor Genes
* V(D)J Recombination
* Somatic Hypermutation
Biomedical Ontology
* Ontological Representation of Cells of Hematopoietic Lineage
Biomedical Text Mining
Logic-based Reasoning
Garnett H. Kelsoe
James B. Duke Distinguished Professor of Immunology
1. Lymphocyte development and antigen-driven diversification of immunoglobulin and
T cell antigen receptor genes. 2. The germinal center reaction and mechanisms for
clonal selection and self-tolerance. The origins of autoimmunity. 3. Interaction
of innate and adaptive immunity and the role of inflammation in lymphoid organogenesis.
4. The role of secondary V(D)J gene rearrangement in lymphocyte development and malignancies.
5. Mathematical modeling of immune responses.
Thomas B. Kepler
Adjunct Professor in the Department of Immunology
Computational and Systems Immunology, Theoretical and Evolutionary Medicine
Alphabetical list of authors with Scholars@Duke profiles.
