EVALUATING AND INTERPRETING MACHINE LEARNING OUTPUTS IN GENOMICS DATA

dc.contributor.advisor

Xie, Jichun

dc.contributor.author

Fang, Jiyuan

dc.date.accessioned

2022-09-21T13:54:59Z

dc.date.available

2022-09-22T08:17:16Z

dc.date.issued

2022

dc.department

Biostatistics and Bioinformatics Doctor of Philosophy

dc.description.abstract

In this dissertation, we developed statistical and computational tools to evaluate and interpret machine learning outputs in genomics data. The first two projects focus on single-cell RNA-sequencing (scRNA-seq) data. In project 1, we evaluated how well widely used distribution families fit scRNA-seq UMI counts and concluded that UMI counts of polyclonal cells follow gene-specific, cell-type-specific negative binomial (NB) distributions without zero-inflation. Based on this model, we proposed the working dispersion score (WDS) to select genes that are differentially expressed across cell types. In project 2, we developed a new internal (unsupervised) index, the Clustering Deviation Index (CDI), to evaluate cell label sets obtained from clustering algorithms. We conducted in silico and experimental scRNA-seq studies to show that CDI can select the optimal clustering label set. We also benchmarked CDI against other internal indices in terms of their agreement with external indices, using high-quality benchmark label sets. In addition, we demonstrated that CDI is more computationally efficient than other internal indices, especially for million-scale datasets. In project 3, we proposed a model-agnostic hypothesis testing framework to interpret feature interactions underlying complex machine learning models. Simulation studies demonstrated high power while controlling the type I error rate.

dc.identifier.uri

https://hdl.handle.net/10161/25813

dc.subject

Biostatistics

dc.title

EVALUATING AND INTERPRETING MACHINE LEARNING OUTPUTS IN GENOMICS DATA

dc.type

Dissertation

duke.embargo.months

-0.06575342465753424

Files

Original bundle

Name:
Fang_duke_0066D_16942.pdf
Size:
9.1 MB
Format:
Adobe Portable Document Format
