Deep Learning Algorithms for Automating and Accelerating the Cryo-EM Data Processing Pipeline
Abstract
Cryo-electron microscopy (cryo-EM) has solidified its position in the structural biology field as an invaluable method for achieving near-atomic resolution of macromolecular structures in their native conditions. However, the inherently fragile nature of biological samples imposes stringent limitations on the electron doses that can be used during imaging, resulting in data characterized by notably low signal-to-noise ratios (SNR). To obtain a three-dimensional (3D) representation of these biological entities, substantial volumes of data need to be acquired and averaged in 3D to remove noise and improve resolution. The cryo-EM structure determination workflow involves many intricate steps, starting with sample preparation and vitrification and progressing to sample screening and data collection. During data analysis, macromolecular structures of interest need to be accurately identified and localized before they can be used for 3D reconstruction. A key challenge in this process is the extensive manual intervention and time required to analyze the large volumes of data that are necessary to achieve high resolution. In this thesis, we propose strategies that harness the capabilities of deep learning to accelerate and reduce manual intervention during the data acquisition and image processing pipelines, with the goal of automating and streamlining the determination of protein structures of biomedical relevance.
To improve the efficiency of data collection, we introduce cryo-ZSSR, a method based on deep internal learning that enables the determination of 3D structures at resolutions surpassing the limits imposed by the imaging system. By combining low-magnification imaging with in-silico image super-resolution (SR), cryo-ZSSR accelerates cryo-EM data collection by allowing more particles to be included in each exposure without sacrificing resolution. To mitigate the need for manual intervention and further streamline sample screening and data collection, we develop the Smartscope framework, which leverages deep learning-based navigation techniques to enable specimen screening in a fully automated manner, significantly increasing efficiency and reducing operational costs. For downstream data processing, we introduce deep learning-based detection algorithms to streamline and automate particle identification in both 2D (single particle analysis, SPA) and 3D (cryo-electron tomography, CET). Our approach enables precise detection of proteins of interest with minimal human intervention while reducing detection time from days to minutes, allowing the analysis of larger datasets than previously possible.
Collectively, we show that these methods substantially boost the efficiency of cryo-EM data acquisition and help streamline the SPA and CET image analysis pipelines, paving the way for the development of high-throughput strategies for high-resolution structure determination of biomolecules. We conclude this thesis by discussing the potential benefits and shortcomings of using deep learning-based algorithms in cryo-EM image analysis tasks.
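The abstract's remark that large volumes of low-SNR data must be averaged to remove noise can be illustrated with a toy calculation. The sketch below is not taken from the thesis: it assumes a 1D stand-in signal, perfectly aligned copies, and independent Gaussian noise, and all function names are hypothetical. Under those assumptions the residual error of the average should shrink roughly as the inverse square root of the number of copies.

```python
import math
import random


def noisy_copies(signal, n_copies, noise_sigma, rng):
    """Return n_copies of signal, each with independent Gaussian noise added."""
    return [[s + rng.gauss(0.0, noise_sigma) for s in signal]
            for _ in range(n_copies)]


def average(copies):
    """Average already-aligned copies element by element."""
    n = len(copies)
    return [sum(values) / n for values in zip(*copies)]


def rms_error(estimate, signal):
    """Root-mean-square difference between an estimate and the true signal."""
    total = sum((e - s) ** 2 for e, s in zip(estimate, signal))
    return math.sqrt(total / len(signal))


rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]  # toy 1D "particle"
sigma = 5.0  # assumed noise level, much stronger than the signal (low SNR)

errors = {}
for n in (1, 100, 2500):
    averaged = average(noisy_copies(signal, n, sigma, rng))
    errors[n] = rms_error(averaged, signal)
    print(f"{n:5d} copies averaged: RMS error = {errors[n]:.3f}")
```

Real cryo-EM averaging is harder than this toy case because particle orientations and positions are unknown and must be estimated before any averaging is possible, which is part of why accurate particle identification matters.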
Citation
Huang, Qinwen (2023). Deep Learning Algorithms for Automating and Accelerating the Cryo-EM Data Processing Pipeline. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/30351.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.