Annotation of phenotypes using ontologies: a gold standard for the training and evaluation of natural language processing systems.
Abstract
Natural language descriptions of organismal phenotypes, a principal object of study
in biology, are abundant in the biological literature. Expressing these phenotypes
as logical statements using ontologies would enable large-scale analysis on phenotypic
information from diverse systems. However, considerable human effort is required to
make these phenotype descriptions amenable to machine reasoning. Natural language
processing tools have been developed to facilitate this task, and the training and
evaluation of these tools depend on the availability of high quality, manually annotated
gold standard data sets. We describe the development of an expert-curated gold standard
data set of annotated phenotypes for evolutionary biology. The gold standard was developed
for the curation of complex comparative phenotypes for the Phenoscape project. It
was created by consensus among three curators and consists of entity-quality expressions
of varying complexity. We use the gold standard to evaluate annotations created by
human curators and those generated by the Semantic CharaParser tool. Using four annotation
accuracy metrics that can account for any level of relationship between terms from
two phenotype annotations, we found that machine-human consistency, or similarity,
was significantly lower than inter-curator (human-human) consistency. Surprisingly,
allowing curators access to external information did not significantly increase the
similarity of their annotations to the gold standard or have a significant effect
on inter-curator consistency. We found that the similarity of machine annotations
to the gold standard increased after new relevant ontology terms had been added. Evaluation
by the original authors of the character descriptions indicated that the gold standard
annotations came closer to representing their intended meaning than did either the
curator or machine annotations. These findings point toward ways to better design
software to augment human curators, and the use of the gold standard corpus will allow
training and assessment of new tools to improve phenotype annotation accuracy at scale.
Type
Journal article
Permalink
https://hdl.handle.net/10161/26579
Published Version (Please cite this version)
10.1093/database/bay110
Publication Info
Dahdul, Wasila; Manda, Prashanti; Cui, Hong; Balhoff, James P; Dececchi, T Alexander;
Ibrahim, Nizar; ... Mabee, Paula M (2018). Annotation of phenotypes using ontologies: a gold standard for the training and evaluation
of natural language processing systems. Database : the journal of biological databases and curation, 2018. 10.1093/database/bay110. Retrieved from https://hdl.handle.net/10161/26579.
This is constructed from limited available data and may be imprecise. To cite this
article, please review & use the official citation provided by the journal.
Scholars@Duke
Hilmar Lapp
Dir, IT

Rights for Collection: Scholarly Articles