BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules.
Abstract
Having a compact yet robust structurally based identifier or representation system
is a key enabling factor for efficient sharing and dissemination of research results
within the chemistry community, and such systems lay down the essential foundations
for future informatics and data-driven research. While substantial advances have been
made for small molecules, the polymer community has struggled in coming up with an
efficient representation system. This is because, unlike other disciplines in chemistry,
the basic premise that each distinct chemical species corresponds to a well-defined
chemical structure does not hold for polymers. Polymers are intrinsically stochastic
molecules that are often ensembles with a distribution of chemical structures. This
difficulty limits the applicability of all deterministic representations developed
for small molecules. In this work, a new representation system that is capable of
handling the stochastic nature of polymers is proposed. The new system is based on
the popular "simplified molecular-input line-entry system" (SMILES), and it aims to
provide representations that can be used as indexing identifiers for entries in polymer
databases. As a pilot test, the entries of the standard data set of the glass transition
temperature of linear polymers (Bicerano, 2002) were converted into the new BigSMILES
language. Furthermore, it is hoped that the proposed system will provide a more effective
language for communication within the polymer community and increase cohesion between
the researchers within the community.
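As a rough illustration of how a SMILES-based line notation with stochastic segments might be processed before being used as a database key, the sketch below splits a string into ordinary SMILES text and brace-delimited segments. It assumes, and this is not stated in the abstract, that stochastic objects are written inside curly braces, that repeat units inside them are separated by commas, and that end groups follow a semicolon. The function names and example strings are hypothetical and are not taken from the article.

```python
import re


def split_stochastic_objects(bigsmiles: str):
    """Split a BigSMILES-like string into (kind, text) pieces.

    Assumption: non-nested {...} segments mark stochastic objects;
    everything outside the braces is treated as ordinary SMILES text.
    """
    parts = []
    for match in re.finditer(r"\{[^{}]*\}|[^{}]+", bigsmiles):
        text = match.group(0)
        kind = "stochastic" if text.startswith("{") else "smiles"
        parts.append((kind, text))
    return parts


def repeat_units(segment: str):
    """List the comma-separated units inside one brace-delimited segment.

    Assumption: anything after a ';' denotes end groups and is ignored here.
    """
    body = segment.strip("{}")
    units_part = body.split(";", 1)[0]
    return [unit.strip() for unit in units_part.split(",") if unit.strip()]


# Illustrative strings only; they are not entries from the Bicerano data set.
pieces = split_stochastic_objects("CC{[$]CC[$]}O")
units = repeat_units("{[$]CC[$],[$]CC(CC)[$]}")
```

Because the whole string stays a plain line of text, it could in principle serve directly as a dictionary or database key; the splitting step above would only be needed when the individual repeat units have to be inspected.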
Type
Journal article
Subject
Science & Technology
Physical Sciences
Chemistry, Multidisciplinary
Chemistry
LANGUAGE
MACHINE
SMILES
REPRESENTATION
PREDICTION
Permalink
https://hdl.handle.net/10161/19465
Published Version (Please cite this version)
10.1021/acscentsci.9b00476
Publication Info
Lin, Tzyy-Shyang; Coley, Connor W; Mochigase, Hidenobu; Beech, Haley K; Wang, Wencong;
Wang, Zi; ... Olsen, Bradley D (2019). BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules. ACS Central Science, 5(9). pp. 1523-1531. 10.1021/acscentsci.9b00476. Retrieved from https://hdl.handle.net/10161/19465.
This citation is constructed from limited available data and may be imprecise. To cite this
article, please review and use the official citation provided by the journal.
Scholars@Duke
Stephen L Craig
William T. Miller Distinguished Professor of Chemistry
Research interests in Prof. Craig's group bridge physical organic and materials chemistry.
Many of these interests are guided by the vision that important challenges in materials
science might be better tackled not from the traditional perspective of an engineer,
but rather from the molecular perspective of an organic chemist. Current interests
include the design and synthesis of self-healing polymers and the use of modern mechanochemistry
in new stress-responsive polymers, catalysis, and the st
