Deep Learning Algorithms for Automating and Accelerating the Cryo-EM Data Processing Pipeline

dc.contributor.advisor

Bartesaghi, Alberto

dc.contributor.author

Huang, Qinwen

dc.date.accessioned

2024-03-07T18:39:48Z

dc.date.issued

2023

dc.department

Computer Science

dc.description.abstract

Cryo-electron microscopy (cryo-EM) has solidified its position in the structural biology field as an invaluable method for achieving near-atomic resolution of macromolecular structures in their native conditions. However, the inherently fragile nature of biological samples imposes stringent limitations on the electron doses that can be used during imaging, resulting in data characterized by notably low signal-to-noise ratios (SNR). To obtain a three-dimensional (3D) representation of these biological entities, substantial volumes of data need to be acquired and averaged in 3D to remove noise and improve resolution. The cryo-EM structure determination workflow involves many intricate steps, starting with sample preparation and vitrification and progressing to sample screening and data collection. During data analysis, macromolecular structures of interest need to be accurately identified and localized before they can be used for 3D reconstruction. A key challenge in this process is the extensive manual intervention and time required to analyze the large volumes of data that are necessary to achieve high resolution. In this thesis, we propose strategies that harness the capabilities of deep learning to accelerate and reduce manual intervention during the data acquisition and image processing pipelines, with the goal of automating and streamlining the determination of protein structures of biomedical relevance.

To improve the efficiency of data collection, we introduce cryo-ZSSR, a deep internal learning-based method that enables the determination of 3D structures at resolutions surpassing the limits imposed by the imaging system. By combining low-magnification imaging with in silico image super-resolution (SR), cryo-ZSSR accelerates cryo-EM data collection by allowing more particles to be included in each exposure without sacrificing resolution. To mitigate the need for manual intervention and further streamline sample screening and data collection, we develop the Smartscope framework, which leverages deep learning-based navigation techniques to enable specimen screening in a fully automated manner, significantly increasing efficiency and reducing operational costs. For downstream data processing, we introduce deep learning-based detection algorithms to streamline and automate particle identification in both 2D, for single particle analysis (SPA), and 3D, for cryo-electron tomography (CET). Our approach enables precise detection of proteins of interest with minimal human intervention while reducing detection time from days to minutes, allowing the analysis of larger datasets than previously possible.

Collectively, we show that these methods substantially boost the efficiency of cryo-EM data acquisition and help streamline the SPA and CET image analysis pipelines, paving the way for the development of high-throughput strategies for high-resolution structure determination of biomolecules. We conclude this thesis by discussing the potential benefits and shortcomings of using deep learning-based algorithms in cryo-EM image analysis tasks.

dc.identifier.uri

https://hdl.handle.net/10161/30351

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Computer science

dc.subject

Biochemistry

dc.subject

Algorithms

dc.subject

Cryo electron tomography

dc.subject

Cryo-EM

dc.subject

Deep learning

dc.subject

Single particle analysis

dc.title

Deep Learning Algorithms for Automating and Accelerating the Cryo-EM Data Processing Pipeline

dc.type

Dissertation

duke.embargo.months

11

duke.embargo.release

2025-02-07T18:39:48Z

Files

Original bundle

Name: Huang_duke_0066D_17729.pdf
Size: 165.03 MB
Format: Adobe Portable Document Format
