Understanding Dimension Reduction for Data Visualization
Date
2024
Abstract
Dimension reduction (DR) algorithms have emerged as critical tools that allow scientists to gain insight into high-dimensional data. DR algorithms map high-dimensional data to a low-dimensional embedding, enabling data visualization. A high-quality visualization can help the user understand the cluster structure and distributional characteristics of the data. A low-quality DR visualization, on the other hand, can create the appearance of structure that does not actually exist in the data. Without an understanding of the algorithms’ loss functions, and of which aspects of those loss functions affect the embedding, it is difficult to substantially improve on the algorithms. In addition, given the importance of the insights drawn from DR, DR methods should be evaluated carefully before their results are trusted.
My research (1) presents a framework for understanding how dimension reduction tools work, including how the choice of loss function and of which graph components to include affects the final embedding (Chapter 2); (2) presents a framework for systematically evaluating popular DR methods, including t-SNE, art-SNE, UMAP, PaCMAP, TriMap, and ForceAtlas2, which can help users choose DR tools that align with their scientific goals (Chapter 3); and (3) introduces three variants of PaCMAP, each addressing a different aspect of dimension reduction: BridgeMAP, LocalMAP, and ParamRepulsor (Chapters 4, 5, and 6).
Citation
Wang, Yingfan (2024). Understanding Dimension Reduction for Data Visualization. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/32584.
Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.