Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies.

dc.contributor.author

Dececchi, T Alexander

dc.contributor.author

Balhoff, James P

dc.contributor.author

Lapp, Hilmar

dc.contributor.author

Mabee, Paula M

dc.date.accessioned

2023-02-07T20:29:59Z

dc.date.available

2023-02-07T20:29:59Z

dc.date.issued

2015-11

dc.date.updated

2023-02-07T20:29:56Z

dc.description.abstract

The reality of larger and larger molecular databases and the need to integrate data scalably have presented a major challenge for the use of phenotypic data. Morphology is currently primarily described in discrete publications, entrenched in noncomputer readable text, and requires enormous investments of time and resources to integrate across large numbers of taxa and studies. Here we present a new methodology, using ontology-based reasoning systems working with the Phenoscape Knowledgebase (KB; kb.phenoscape.org), to automatically integrate large amounts of evolutionary character state descriptions into a synthetic character matrix of neomorphic (presence/absence) data. Using the KB, which includes more than 55 studies of sarcopterygian taxa, we generated a synthetic supermatrix of 639 variable characters scored for 1051 taxa, resulting in over 145,000 populated cells. Of these characters, over 76% were made variable through the addition of inferred presence/absence states derived by machine reasoning over the formal semantics of the source ontologies. Inferred data reduced the missing data in the variable character-subset from 98.5% to 78.2%. Machine reasoning also enables the isolation of conflicts in the data, that is, cells where both presence and absence are indicated; reports regarding conflicting data provenance can be generated automatically. Further, reasoning enables quantification and new visualizations of the data, here for example, allowing identification of character space that has been undersampled across the fin-to-limb transition. The approach and methods demonstrated here to compute synthetic presence/absence supermatrices are applicable to any taxonomic and phenotypic slice across the tree of life, providing the data are semantically annotated. Because such data can also be linked to model organism genetics through computational scoring of phenotypic similarity, they open a rich set of future research questions into phenotype-to-genome relationships.

dc.identifier

syv031

dc.identifier.issn

1063-5157

dc.identifier.issn

1076-836X

dc.identifier.uri

https://hdl.handle.net/10161/26576

dc.language

eng

dc.publisher

Oxford University Press (OUP)

dc.relation.ispartof

Systematic biology

dc.relation.isversionof

10.1093/sysbio/syv031

dc.subject

Animals

dc.subject

Data Interpretation, Statistical

dc.subject

Computational Biology

dc.subject

Phenotype

dc.subject

Classification

dc.subject

Amphibians

dc.subject

Biological Evolution

dc.subject

Biological Ontologies

dc.title

Toward Synthesizing Our Knowledge of Morphology: Using Ontologies and Machine Reasoning to Extract Presence/Absence Evolutionary Phenotypes across Studies.

dc.type

Journal article

duke.contributor.orcid

Lapp, Hilmar|0000-0001-9107-0714

pubs.begin-page

936

pubs.end-page

952

pubs.issue

6

pubs.organisational-group

Duke

pubs.organisational-group

Staff

pubs.publication-status

Published

pubs.volume

64

Files

Original bundle

Name:
Toward Synthesizing Our Knowledge of Morphology Using Ontologies and Machine Reasoning to Extract PresenceAbsence Evolutiona.pdf
Size:
1.87 MB
Format:
Adobe Portable Document Format
Description:
Published version