Molecular Dynamics and Machine Learning for Small Molecules and Proteins

Limited Access
This item is unavailable until:
2025-01-27

Date

2022

Abstract

Molecular dynamics (MD) simulation is a powerful and widely used approach for understanding chemical processes in atomic detail for small molecules, biomolecules, and materials. The accuracy of MD simulation results depends strongly on the force field. Quantum mechanical (QM) calculations are highly accurate, but their computational cost is prohibitive for long MD simulations. Therefore, MD simulations generally use either traditional molecular mechanical (MM) force fields, which divide the energy into classical bond, angle, dihedral, electrostatic, and van der Waals (vdW) terms, or hybrid QM/MM methods, which trade off QM accuracy against MM efficiency. Machine learning (ML) can generate accurate potentials at the QM level without much additional computational effort, and ML-based potentials have seen rapid development and widespread application over the past decade. In this dissertation, we apply MD and ML techniques to develop new methods for simulating small molecules and proteins. First, we train ML models to raise the accuracy of QM/MM from the semiempirical to the ab initio level. Active learning with gradient boosting is used to update the ML models efficiently on the fly, and new data from MD simulations are sampled according to the boundary of reference energies, distance-based clustering, and density-based clustering. Solvation free energies of small molecules obtained from the QM/MM ML models show good agreement with experiment. Next, neural network (NN) force fields are constructed for the QM/MM vdW interaction, which is normally described with the Lennard-Jones (LJ) potential. We develop a new QM/MM NN architecture, dubbed QM-NN/MM-NN, and new center-of-mass-based input features for the NN, which describe non-bonded interactions better than other descriptors.
The NN force fields greatly outperform LJ potentials and show good transferability to different small molecules. In addition, general and transferable NN force fields based on the CHARMM force fields, named CHARMM-NN, are constructed for proteins using a residue-based systematic molecular fragmentation method. The NNs are based on atom types, and we propose new input features similar to MM inputs, which enhance the compatibility of CHARMM-NN with MM MD. Validations on geometric data, relative potential energies, and reorganization energies demonstrate that the potential energy minima of CHARMM-NN are very similar to those of QM, but simulations of peptides and proteins indicate that solvent effects and non-bonded interactions should be modeled in future development of NN force fields. Finally, we develop a piecewise approach for running all-atom steered MD (SMD) simulations within a small water box, avoiding the large computational resources required to run all-atom SMD simulations in a large water box. The robustness of this approach is validated with a small protein, NI3C. Compared with coarse-grained SMD, the all-atom SMD simulations of luciferase reveal more atomic-resolution detail in the force-extension plots and the key secondary structures related to mechanical stability along the unfolding pathway.
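
For readers unfamiliar with the Lennard-Jones baseline that the abstract says the NN force fields replace, the following is a minimal Python sketch of a 12-6 LJ pair energy summed over QM-MM atom pairs. It is illustrative only and is not taken from the dissertation: the function names, the (position, epsilon, sigma) tuple layout, and the use of Lorentz-Berthelot combining rules are all assumptions made for this example, and the textbook sigma form is used rather than any particular force field's parameter convention.

```python
import math


def lj_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones energy for one pair at separation r (textbook sigma form)."""
    if r <= 0.0:
        raise ValueError("separation must be positive")
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lorentz_berthelot(eps_i, sigma_i, eps_j, sigma_j):
    """Assumed combining rules: geometric mean for epsilon, arithmetic mean for sigma."""
    return math.sqrt(eps_i * eps_j), 0.5 * (sigma_i + sigma_j)


def qmmm_vdw_energy(qm_atoms, mm_atoms):
    """Sum LJ energies over all QM-MM pairs.

    Each atom is a hypothetical (position, epsilon, sigma) tuple, where
    position is an (x, y, z) tuple in the same length unit as sigma.
    """
    total = 0.0
    for pos_q, eps_q, sig_q in qm_atoms:
        for pos_m, eps_m, sig_m in mm_atoms:
            eps, sig = lorentz_berthelot(eps_q, sig_q, eps_m, sig_m)
            total += lj_energy(math.dist(pos_q, pos_m), eps, sig)
    return total
```

In this form the pair energy is zero at r = sigma and reaches its minimum of -epsilon at r = 2^(1/6) sigma. The fixed functional shape is what a learned NN potential is free to depart from.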

Citation

Zhang, Pan (2022). Molecular Dynamics and Machine Learning for Small Molecules and Proteins. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/26876.

Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.