BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules.

dc.contributor.author

Lin, Tzyy-Shyang

dc.contributor.author

Coley, Connor W

dc.contributor.author

Mochigase, Hidenobu

dc.contributor.author

Beech, Haley K

dc.contributor.author

Wang, Wencong

dc.contributor.author

Wang, Zi

dc.contributor.author

Woods, Eliot

dc.contributor.author

Craig, Stephen L

dc.contributor.author

Johnson, Jeremiah A

dc.contributor.author

Kalow, Julia A

dc.contributor.author

Jensen, Klavs F

dc.contributor.author

Olsen, Bradley D

dc.date.accessioned

2019-11-01T18:37:14Z

dc.date.available

2019-11-01T18:37:14Z

dc.date.issued

2019-09-12

dc.date.updated

2019-11-01T18:37:12Z

dc.description.abstract

Having a compact yet robust structurally based identifier or representation system is a key enabling factor for efficient sharing and dissemination of research results within the chemistry community, and such systems lay down the essential foundations for future informatics and data-driven research. While substantial advances have been made for small molecules, the polymer community has struggled in coming up with an efficient representation system. This is because, unlike other disciplines in chemistry, the basic premise that each distinct chemical species corresponds to a well-defined chemical structure does not hold for polymers. Polymers are intrinsically stochastic molecules that are often ensembles with a distribution of chemical structures. This difficulty limits the applicability of all deterministic representations developed for small molecules. In this work, a new representation system that is capable of handling the stochastic nature of polymers is proposed. The new system is based on the popular "simplified molecular-input line-entry system" (SMILES), and it aims to provide representations that can be used as indexing identifiers for entries in polymer databases. As a pilot test, the entries of the standard data set of the glass transition temperature of linear polymers (Bicerano, 2002) were converted into the new BigSMILES language. Furthermore, it is hoped that the proposed system will provide a more effective language for communication within the polymer community and increase cohesion between the researchers within the community.

dc.identifier.issn

2374-7943

dc.identifier.issn

2374-7951

dc.identifier.uri

https://hdl.handle.net/10161/19465

dc.language

eng

dc.publisher

American Chemical Society (ACS)

dc.relation.ispartof

ACS Central Science

dc.relation.isversionof

10.1021/acscentsci.9b00476

dc.subject

Science & Technology

dc.subject

Physical Sciences

dc.subject

Chemistry, Multidisciplinary

dc.subject

Chemistry

dc.subject

LANGUAGE

dc.subject

MACHINE

dc.subject

SMILES

dc.subject

REPRESENTATION

dc.subject

PREDICTION

dc.title

BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules.

dc.type

Journal article

duke.contributor.orcid

Craig, Stephen L|0000-0002-8810-0369

pubs.begin-page

1523

pubs.end-page

1531

pubs.issue

9

pubs.organisational-group

Trinity College of Arts & Sciences

pubs.organisational-group

Duke

pubs.organisational-group

Chemistry

pubs.publication-status

Published

pubs.volume

5

Files

Original bundle

Name:
BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules.pdf
Size:
1.76 MB
Format:
Adobe Portable Document Format
Description:
Published version