Develop Novel Statistical and Computational Methods for Omics Data Analysis

dc.contributor.advisor

Xie, Jichun

dc.contributor.author

Gao, Qi

dc.date.accessioned

2024-03-07T18:39:00Z

dc.date.issued

2023

dc.department

Biostatistics and Bioinformatics Doctor of Philosophy

dc.description.abstract

Recent advances in sequencing technologies have enabled the measurement of gene expression and other omics profiles at multi-cell, single-cell, or subcellular resolution. However, these advances also pose challenges for data analysis, such as identifying differentially expressed feature gene sets with high accuracy and benchmarking computational methods for various analysis tasks on data with complex heterogeneity. In this dissertation, we focus on developing novel statistical and computational methods to address these challenges.

In project 1, we developed SifiNet, a versatile pipeline to identify cell-subpopulation-specific feature genes, annotate cell subpopulations, and reveal their relationships. The major advantage of SifiNet is that it bypasses cell clustering and thus avoids possible bias introduced by inaccurate clustering; as a result, SifiNet achieves significantly higher accuracy in feature gene identification and cell annotation than traditional two-step methods that rely on clustering. SifiNet can analyze both single-cell RNA sequencing (scRNA-seq) and single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) data, providing insight into multiomic cellular profiles.

In project 2, we developed GeneScape, a novel scRNA-seq data simulator that can simulate complex cellular heterogeneity. Existing scRNA-seq data simulators are limited in their ability to simulate data with complex or subtle cellular heterogeneity, especially for cells that exhibit both cell-type and cell-state differences (such as differences in cell cycle, senescence level, and DNA-damage level). GeneScape can successfully simulate gene expression for cells with complex heterogeneity structures.

In project 3, we developed GeneScape-S (GeneScape-Spatial), a simulator for spatially resolved transcriptomics (SRT) data. Existing SRT-specific simulators cannot fulfill customized needs such as simulating multi-layer data, mimicking local tissue heterogeneity, and accommodating mixed cell-type structures in low-resolution spots. To fill these gaps, GeneScape-S preserves the expression and spatial patterns of real SRT data and offers functions specifically designed for these customized needs. GeneScape-S also incorporates the features of GeneScape to simulate complex heterogeneity.

dc.identifier.uri

https://hdl.handle.net/10161/30257

dc.rights.uri

https://creativecommons.org/licenses/by-nc-nd/4.0/

dc.subject

Biostatistics

dc.subject

Bioinformatics

dc.title

Develop Novel Statistical and Computational Methods for Omics Data Analysis

dc.type

Dissertation

duke.embargo.months

23

duke.embargo.release

2026-02-07T18:39:00Z

Files

Original bundle

Name: Gao_duke_0066D_17602.pdf
Size: 94.33 MB
Format: Adobe Portable Document Format
