The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.

dc.contributor.author

Williams, Christopher J

dc.contributor.author

Richardson, David C

dc.contributor.author

Richardson, Jane S

dc.date.accessioned

2023-12-01T18:52:06Z

dc.date.available

2023-12-01T18:52:06Z

dc.date.issued

2022-01

dc.date.updated

2023-12-01T18:52:03Z

dc.description.abstract

We have curated a high-quality, "best-parts" reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron density support for a physically acceptable model conformation. The resulting prefiltered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the total set of files was selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Then each residue was critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 prefiltered datasets have been released on the Zenodo online web service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue filtered on main chain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at four different sequence-identity levels. Here, we illustrate both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best of structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data. Therefore, the open distribution of these very large, prefiltered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.

dc.identifier.issn

0961-8368

dc.identifier.issn

1469-896X

dc.identifier.uri

https://hdl.handle.net/10161/29455

dc.language

eng

dc.publisher

Wiley

dc.relation.ispartof

Protein science : a publication of the Protein Society

dc.relation.isversionof

10.1002/pro.4239

dc.subject

Proteins

dc.subject

Crystallography, X-Ray

dc.subject

Computational Biology

dc.subject

Protein Conformation

dc.subject

Algorithms

dc.subject

Models, Molecular

dc.subject

Software

dc.subject

Databases, Protein

dc.title

The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.

dc.type

Journal article

duke.contributor.orcid

Richardson, Jane S|0000-0002-3311-2944

pubs.begin-page

290

pubs.end-page

300

pubs.issue

1

pubs.organisational-group

Duke

pubs.organisational-group

School of Medicine

pubs.organisational-group

Basic Science Departments

pubs.organisational-group

Biochemistry

pubs.publication-status

Published

pubs.volume

31

Files

Original bundle

Name:
The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.pdf
Size:
3.81 MB
Format:
Adobe Portable Document Format