<p>As of late 2009, the Protein Data Bank (PDB) has grown to contain over 70,000 models.
This recent increase in the amount of structural data allows for more extensive explication
of the governing principles of macromolecular folding and association to complement
traditional studies focused on a single molecule or complex. PDB-wide characterization
of structural features yields insights that are useful in prediction and validation
of the 3D structure of macromolecules and their complexes. Here, these insights lead
to a deeper understanding of protein-protein interfaces, full-atom critical assessment
of increasingly accurate structure predictions, a better defined library of RNA
backbone conformers for validation and building 3D models, and knowledge-based automatic
correction of errors in protein sidechain rotamers.</p><p>My study of protein-protein
interfaces identifies amino acid pairing preferences in a set of 146 transient interfaces.
Using a geometric interface surface definition devoid of arbitrary cutoffs common
to previous studies of interface composition, I calculate inter- and intrachain amino
acid pairing preferences. As expected, salt bridges and hydrophobic patches are prevalent,
but likelihood correction of observed pairing frequencies reveals some surprising
pairing preferences, such as Cys-His interchain pairs and Met-Met intrachain pairs.
To complement my statistical observations, I introduce a 2D visualization of the 3D
interface surface that can display a variety of interface characteristics, including
residue type, atomic distance and backbone/sidechain composition. </p><p>My study
of protein interiors finds that 3D structure prediction from sequence (as part of
the CASP experiment) is very close to full-atom accuracy. Validation of structure
prediction should therefore consider all atom positions instead of the traditional
Cα-only evaluation. I introduce six new full-model quality criteria to assess
the accuracy of CASP predictions, which demonstrate that groups who use structural
knowledge culled from the PDB to inform their prediction protocols produce the most
accurate results. </p><p>My study of RNA backbone introduces a set of rotamer-like
"suite" conformers. Initially hand-identified by the Richardson laboratory, these
7D conformers represent backbone segments that are found to be genuine and favorable.
X-ray crystallographers can use backbone conformers for model building, where backbone
density is often poor, and for validation after refinement. Increasing amounts of high-quality
RNA data allow for improved conformer identification, but also complicate hand-curation.
I demonstrate that affinity propagation successfully differentiates between two related
but distinct suite conformers, and is a useful tool for automated conformer clustering.
</p><p>My study of protein sidechain rotamers in X-ray structures identifies a class
of systematic errors that results in sidechains misfit by approximately 180 degrees.
I introduce Autofix, a method for automated detection and correction of such errors.
Autofix corrects over 40% of errors for Leu, Thr, and Val residues, and a significant
number of Arg residues. On average, Autofix made four corrections per PDB file in
945 X-ray structures. Autofix will be incorporated into MolProbity and PHENIX for
easy integration into X-ray crystallography workflows.</p>
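<p>To make the idea of a sidechain "misfit by approximately 180 degrees" concrete, the following minimal Python sketch computes a dihedral angle from four atom positions and tests whether a modeled angle lies roughly 180 degrees from an expected one. It is an illustration only and is not the Autofix procedure, whose detection criteria are not described here. The function names, the <code>chi_expected</code> input, and the 30-degree tolerance are all assumptions chosen for the example.</p>

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def wrap_degrees(angle):
    """Map any angle onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def angular_difference(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs(wrap_degrees(a - b))

def looks_flipped(chi_modeled, chi_expected, tolerance=30.0):
    """True when chi_modeled is within `tolerance` of chi_expected + 180.

    Hypothetical criterion for illustration; the tolerance is an assumption.
    """
    return angular_difference(chi_modeled, chi_expected + 180.0) <= tolerance

def flip(chi):
    """Rotate a dihedral angle by 180 degrees."""
    return wrap_degrees(chi + 180.0)

# Toy coordinates: the last atom is placed cis, then trans, about the central bond.
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
```

<p>In this toy case the two placements of the terminal atom differ by 180 degrees, so <code>looks_flipped(trans, cis)</code> is true, and applying <code>flip</code> to the modeled angle recovers the expected one.</p>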