BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules.


Having a compact yet robust structurally based identifier or representation system is a key enabling factor for efficient sharing and dissemination of research results within the chemistry community, and such systems lay down the essential foundations for future informatics and data-driven research. While substantial advances have been made for small molecules, the polymer community has struggled in coming up with an efficient representation system. This is because, unlike other disciplines in chemistry, the basic premise that each distinct chemical species corresponds to a well-defined chemical structure does not hold for polymers. Polymers are intrinsically stochastic molecules that are often ensembles with a distribution of chemical structures. This difficulty limits the applicability of all deterministic representations developed for small molecules. In this work, a new representation system that is capable of handling the stochastic nature of polymers is proposed. The new system is based on the popular "simplified molecular-input line-entry system" (SMILES), and it aims to provide representations that can be used as indexing identifiers for entries in polymer databases. As a pilot test, the entries of the standard data set of the glass transition temperature of linear polymers (Bicerano, 2002) were converted into the new BigSMILES language. Furthermore, it is hoped that the proposed system will provide a more effective language for communication within the polymer community and increase cohesion between the researchers within the community.





Publication Info

Lin, Tzyy-Shyang, Connor W Coley, Hidenobu Mochigase, Haley K Beech, Wencong Wang, Zi Wang, Eliot Woods, Stephen L Craig, et al. (2019). BigSMILES: A Structurally-Based Line Notation for Describing Macromolecules. ACS Central Science, 5(9), pp. 1523–1531. doi:10.1021/acscentsci.9b00476 Retrieved from

This is constructed from limited available data and may be imprecise. To cite this article, please review & use the official citation provided by the journal.



Stephen L Craig

William T. Miller Distinguished Professor of Chemistry

Research interests in Prof. Craig's group bridge physical organic and materials chemistry. Many of these interests are guided by the vision that important challenges in materials science might be better tackled not from the traditional perspective of an engineer, but rather from the molecular perspective of an organic chemist. Current interests include the design and synthesis of self-healing polymers and the use of modern mechanochemistry in new stress-responsive polymers, catalysis, and the study of transition states and reactive intermediates. These areas require an interdisciplinary and nontraditional mix of synthetic organic and polymer chemistry, single-molecule spectroscopy, supramolecular chemistry, and materials characterization. Research interests are complemented by numerous teaching and outreach activities, including: (1) hosting intensive undergraduate and high school research experiences for a diverse group of both Duke and non-Duke students; (2) exploiting effective, scalable, and low-cost mechanisms for content dissemination; and (3) using team-based and active learning content in the undergraduate and graduate classroom.

Unless otherwise indicated, scholarly articles published by Duke faculty members are made available here with a CC-BY-NC (Creative Commons Attribution Non-Commercial) license, as enabled by the Duke Open Access Policy. If you wish to use the materials in ways not already permitted under CC-BY-NC, please consult the copyright owner. Other materials are made available here through the author’s grant of a non-exclusive license to make their work openly accessible.