Browsing by Subject "Materials informatics"
Item (Open Access): Data curation of a findable, accessible, interoperable, reusable polymer nanocomposites data resource - MaterialsMine (2022)
Hu, Bingyin
A polymer nanocomposite (PNC) is a composite material consisting of a polymer matrix and stiff fillers with at least one dimension smaller than 100 nm. Adding a small amount of filler to the polymer matrix can substantially enhance the mechanical, viscoelastic, dielectric, thermal, optical, and other physicochemical properties of a PNC relative to the pure polymer or pure filler alone. PNCs have therefore attracted considerable research interest in recent years. To accelerate materials design, we need findable, accessible, interoperable, and reusable (FAIR) data resources that provide enough data for data-driven approaches to replace traditional trial-and-error exploration in the lab. With the goal of building a FAIR data resource for the PNC community, we built NanoMine in 2016; it later evolved into MaterialsMine with the addition of MetaMine for the metamaterial domain. To be FAIR, a resource needs a clear and extensible data representation that enables interoperable knowledge exchange, so we designed the NanoMine XML schema. With the data framework and representation in place, we still needed tools and a user-friendly interface for data curation. This dissertation describes in detail the tools and data interfaces we developed to ensure a smooth data curation pathway for NanoMine/MaterialsMine. Reducing and preventing curation errors, and thus improving data quality, requires data validation mechanisms; we therefore discuss the validation mechanisms embedded both during and after curation. Even in the absence of human curation errors, however, the data resource often cannot perform to its full capacity because of data inconsistencies.
For example, polymers are indexed inconsistently because the same polymer is named in many different ways, and composite compositions are specified inconsistently, sometimes as mass fraction and sometimes as volume fraction. To address the need for data standardization, this dissertation discusses in detail two tools developed to bypass manual curation: a mass fraction – volume fraction conversion agent, and ChemProps, a RESTful API-enabled polymer/filler name-mapping methodology based on multiple algorithms. To create truly powerful and transformative materials design paradigms, and to secure a sustainable future for MaterialsMine, we need to harness AI to efficiently extract a substantial body of data from the published, archival literature. Natural Language Processing (NLP) offers an opportunity to make these data accessible and readily reusable by humans and machines. The first step is to generate a sample list from which curators can easily find the number of samples reported in a paper, along with their compositions and properties. The task is handled in a pretraining-finetuning fashion. The downstream tasks are Named Entity Recognition (NER), which detects sample codes, sample compositions, properties, and group references to samples in the captions, and Relation Extraction (RE), which predicts the relations between pairs of detected named entities. This dissertation describes in detail how the two corpora for pretraining and finetuning were constructed. A T5-base model pretrained on the caption-mention corpus and finetuned for the NER and RE tasks is proposed, and it is evaluated alongside an array of BERT-based and seq2seq models for potential use in a semi-automated curation pipeline for MaterialsMine.
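The mass fraction/volume fraction conversion mentioned above rests on a simple relation for a two-phase composite. The Python sketch below assumes ideal mixing with no voids and known filler and matrix densities. It illustrates only the underlying arithmetic and is not the dissertation's conversion agent. The function names and the example densities are hypothetical.

```python
def mass_to_volume_fraction(w_filler: float, rho_filler: float, rho_matrix: float) -> float:
    """Filler volume fraction from filler mass fraction in a two-phase composite.

    phi_f = (w_f / rho_f) / (w_f / rho_f + (1 - w_f) / rho_m), assuming no voids.
    """
    if not 0.0 <= w_filler <= 1.0:
        raise ValueError("mass fraction must lie in [0, 1]")
    if rho_filler <= 0.0 or rho_matrix <= 0.0:
        raise ValueError("densities must be positive")
    v_filler = w_filler / rho_filler          # filler volume per unit composite mass
    v_matrix = (1.0 - w_filler) / rho_matrix  # matrix volume per unit composite mass
    return v_filler / (v_filler + v_matrix)


def volume_to_mass_fraction(phi_filler: float, rho_filler: float, rho_matrix: float) -> float:
    """Inverse conversion: filler mass fraction from filler volume fraction."""
    if not 0.0 <= phi_filler <= 1.0:
        raise ValueError("volume fraction must lie in [0, 1]")
    if rho_filler <= 0.0 or rho_matrix <= 0.0:
        raise ValueError("densities must be positive")
    m_filler = phi_filler * rho_filler          # filler mass per unit composite volume
    m_matrix = (1.0 - phi_filler) * rho_matrix  # matrix mass per unit composite volume
    return m_filler / (m_filler + m_matrix)


if __name__ == "__main__":
    # Illustrative densities in g/cm^3 (assumed values for a dense filler in a light polymer matrix).
    phi = mass_to_volume_fraction(0.05, rho_filler=2.2, rho_matrix=1.2)
    print(round(phi, 4))  # 0.0279
```

With these illustrative densities, 5 wt% filler corresponds to roughly 2.8 vol%. A database that mixes the two conventions without converting them would therefore place such a sample at the wrong filler loading.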
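ChemProps is described here only at a high level, as a multi-algorithm polymer/filler name-mapping methodology. The Python sketch below illustrates the general idea of layering several matching strategies: normalization, alias lookup, and then fuzzy matching with the standard-library `difflib`. The alias table, the function names, and the similarity cutoff are hypothetical. They do not reflect ChemProps' actual algorithms, data, or REST API.

```python
import difflib
import re

# Hypothetical alias table: standardized polymer name -> known variant spellings/abbreviations.
ALIASES = {
    "poly(methyl methacrylate)": ["PMMA", "polymethyl methacrylate", "poly methyl methacrylate"],
    "polystyrene": ["PS", "poly(styrene)"],
    "poly(vinylidene fluoride)": ["PVDF", "polyvinylidene fluoride", "PVF2"],
}


def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits so spacing/punctuation variants collide."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


# Lookup from normalized variant -> standardized name (standard names map to themselves).
LOOKUP = {}
for standard, variants in ALIASES.items():
    for variant in [standard, *variants]:
        LOOKUP[normalize(variant)] = standard


def map_polymer_name(raw: str, cutoff: float = 0.85):
    """Return the standardized name for a raw polymer name, or None if nothing matches."""
    key = normalize(raw)
    # Strategy 1: exact match after normalization.
    if key in LOOKUP:
        return LOOKUP[key]
    # Strategy 2: fuzzy match on normalized strings to absorb small typos.
    close = difflib.get_close_matches(key, LOOKUP.keys(), n=1, cutoff=cutoff)
    return LOOKUP[close[0]] if close else None


if __name__ == "__main__":
    for raw in ["Poly(Methyl Methacrylate)", "pmma", "polymethyl metacrylate", "epoxy"]:
        print(raw, "->", map_polymer_name(raw))
```

Ordering the strategies from strict to lenient means a fuzzy match is attempted only when exact lookup fails. Returning `None` lets an unmatched name fall back to manual curation instead of being forced onto a wrong entry.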
Item (Open Access): Machine Learning, Phase Stability, and Disorder with the Automatic Flow Framework for Materials Discovery (2018)
Oses, Corey
Traditional materials discovery approaches, which rely primarily on laborious experiments, have governed the pace of technological development. Computational approaches instead offer an accelerated path: high-throughput exploration and characterization of virtual structures. These efforts, performed by automated ab initio frameworks, have rapidly expanded the volume of programmatically accessible data, creating opportunities for data-driven approaches. Herein, a collection of robust characterization methods, implemented within the Automatic Flow Framework for Materials Discovery (AFLOW), is presented; these methods leverage materials data to predict phase diagrams and the properties of disordered materials. They directly address the issue of materials synthesizability, bridging the gap between simulation and experiment. Powering these predictions is the AFLOW.org repository for inorganic crystals, the largest and most comprehensive database of its kind, containing more than 2 million compounds with about 100 different properties computed for each. Because the data are calculated with standardized parameter sets, they also provide a favorable learning environment. Machine learning algorithms are employed for property prediction, descriptor development, design rule discovery, and the identification of candidate functional materials. When combined with physical models and intelligently formulated descriptors, the data become a powerful tool for discovering new materials for applications ranging from high-temperature superconductors to thermoelectrics. These methods have been validated by the synthesis of two new permanent magnets introduced herein, the first permanent magnets discovered by computational approaches.
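Phase stability in high-throughput frameworks is commonly assessed with the convex hull of formation enthalpies. A compound is predicted stable if it lies on the lower hull, and its distance above the hull indicates how unstable it is. The Python sketch below shows this idea for a binary A-B system using Andrew's monotone chain algorithm. It is a minimal illustration under that assumption, not AFLOW's implementation. The function names are hypothetical, and the energies in the usage block are toy values.

```python
from bisect import bisect_right


def lower_hull(entries):
    """Lower convex hull of (x_B, H_f) points for a binary A-B system.

    entries: iterable of (composition x_B in [0, 1], formation enthalpy in eV/atom).
    Only the lowest-enthalpy entry at each composition can lie on the hull.
    """
    lowest = {}
    for x, h in entries:
        lowest[x] = min(h, lowest.get(x, h))
    hull = []
    for x, h in sorted(lowest.items()):
        # Andrew's monotone chain: drop the last hull point while it lies on or above
        # the segment joining its predecessor to the new point.
        while len(hull) >= 2:
            (x1, h1), (x2, h2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (h - h1) - (h2 - h1) * (x - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append((x, h))
    return hull


def energy_above_hull(x, h, hull):
    """Vertical distance (eV/atom) of a point above the hull; 0 means on the hull."""
    xs = [p[0] for p in hull]
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("composition outside the hull's range")
    i = bisect_right(xs, x)
    if i == len(xs):  # x coincides with the last hull vertex
        return h - hull[-1][1]
    (x1, h1), (x2, h2) = hull[i - 1], hull[i]
    # Linear interpolation of the hull energy on the segment containing x.
    return h - (h1 + (h2 - h1) * (x - x1) / (x2 - x1))


if __name__ == "__main__":
    # Toy formation enthalpies (eV/atom), invented for illustration only.
    entries = [(0.0, 0.0), (0.25, -0.10), (0.5, -0.30), (0.75, -0.12), (1.0, 0.0)]
    hull = lower_hull(entries)
    print(hull)                                            # [(0.0, 0.0), (0.5, -0.3), (1.0, 0.0)]
    print(round(energy_above_hull(0.75, -0.12, hull), 3))  # 0.03
```

In this toy system the x = 0.25 and x = 0.75 compounds lie above the line between their neighbors on the hull, so they would be predicted to decompose into the end members and the x = 0.5 phase.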
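The abstract does not spell out how the properties of disordered materials are obtained. One common approach, assumed here, represents a disordered material by an ensemble of symmetrically distinct ordered supercells. A property is then averaged over these supercells with probabilities weighted by degeneracy and by the Boltzmann factor. The Python sketch below shows only that weighting step. The function name and inputs are hypothetical, and it is not presented as AFLOW's method.

```python
import math

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def ensemble_average(energies, degeneracies, values, temperature):
    """Degeneracy- and Boltzmann-weighted average of a property over ordered configurations.

    energies: configuration energies in eV, all on the same basis (e.g. per cell of equal size)
    degeneracies: number of equivalent configurations represented by each entry
    values: the property computed for each configuration
    temperature: in kelvin (> 0)
    """
    if not (len(energies) == len(degeneracies) == len(values)) or not energies:
        raise ValueError("inputs must be non-empty and of equal length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    e_min = min(energies)
    kt = K_B_EV_PER_K * temperature
    # Shift by the lowest energy so the largest exponent is 0 (avoids overflow).
    weights = [g * math.exp(-(e - e_min) / kt) for e, g in zip(energies, degeneracies)]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total
```

At low temperature the average collapses onto the lowest-energy configuration. At high temperature it approaches the degeneracy-weighted mean of the configurations.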