Joint Data Modeling Using Variational Autoencoders




Nervous systems are macroscopic nonequilibrium physical systems that produce intricate behaviors that remain difficult to analyze, classify, and understand. In my thesis, I develop and analyze a machine-learning-based statistical technique that attempts to improve upon previous efforts to analyze behavioral data by being multimodal: it combines information from different kinds of observables to provide better insight than any single observable can offer alone. Many modern experiments simultaneously record data from multiple sources (e.g., audio, video, neural data), and learning the relationships between these sources is of great interest. Multimodal datasets present a challenge for latent variable models, which must capture not only the variance within each data type but also the relationships among types. Typically, this is done by training a collection of unimodal experts whose outputs are aggregated in a shared latent space. Here, building on recent developments in identifiable variational autoencoders (VAEs), I propose a new joint analysis method, the product of identifiable sufficient experts (POISE-VAE), which posits a latent representation unique to each modality, with the latent spaces interacting via an undirected graphical model. This model guarantees identifiability of the latent spaces without the need for additional covariates and, given a simple yet flexible class of approximate posteriors, can be trained by maximizing an evidence lower bound approximated via Gibbs sampling. I show performance comparable to existing methods on a variety of toy and benchmark datasets in generating realistic samples, with applications to the simultaneous modeling of brain calcium imaging data and behavior. Finally, I use the VAE framework to investigate the vocalizations of hearing and deaf mice during courtship.
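To make the "aggregation of unimodal experts" baseline concrete: a common approach (e.g., in product-of-experts multimodal VAEs) combines each modality's diagonal-Gaussian approximate posterior into a single shared-latent Gaussian, with precisions adding and means precision-weighted. The sketch below is illustrative of that standard baseline, not of the POISE-VAE model itself (which instead couples per-modality latent spaces through an undirected graphical model); all names are hypothetical.

```python
import numpy as np

def product_of_gaussian_experts(mus, logvars):
    """Combine diagonal-Gaussian expert posteriors q_i(z | x_i) = N(mu_i, var_i)
    into one Gaussian: precisions add, means are precision-weighted averages."""
    precisions = [np.exp(-lv) for lv in logvars]   # 1 / var_i per dimension
    total_precision = np.sum(precisions, axis=0)
    joint_var = 1.0 / total_precision
    joint_mu = joint_var * np.sum(
        [p * m for p, m in zip(precisions, mus)], axis=0
    )
    return joint_mu, np.log(joint_var)
```

For two unit-variance experts with means 0 and 2, the combined posterior has mean 1 and variance 0.5 in each dimension, illustrating how confident (high-precision) experts dominate the shared latent code.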
A central question is whether auditory feedback affects the vocalizations these mice produce. I use the low-dimensional representation of the data learned by the VAE to compare the vocalizations produced by the two groups. My statistical analysis, based on the maximum mean discrepancy (MMD), finds no statistically significant difference between the vocalizations of the two groups. I conclude with a discussion of possible extensions of the model.
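A two-sample MMD test of the kind described above can be sketched as follows: compute an unbiased estimate of the squared MMD between the two groups' latent representations under an RBF kernel, then obtain a p-value by permutation. This is a generic sketch of the standard MMD permutation test (Gretton et al.), not the thesis's exact implementation; the function names and the fixed kernel bandwidth are assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    # Gaussian (RBF) kernel matrix between rows of X and rows of Y.
    d2 = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    # Unbiased estimator of squared MMD: exclude diagonal self-similarities.
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    np.fill_diagonal(Kxx, 0.0)
    np.fill_diagonal(Kyy, 0.0)
    return (Kxx.sum() / (m * (m - 1))
            + Kyy.sum() / (n * (n - 1))
            - 2.0 * Kxy.mean())

def permutation_test(X, Y, n_perm=1000, sigma=1.0, seed=0):
    # Null distribution: pool both samples and randomly relabel group membership.
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(X, Y, sigma)
    pooled = np.vstack([X, Y])
    m = len(X)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(pooled))
        Xp, Yp = pooled[perm[:m]], pooled[perm[m:]]
        if mmd2_unbiased(Xp, Yp, sigma) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)   # add-one smoothing for a valid p-value
```

Applied to the hearing and deaf groups' VAE latent codes, a large p-value would be consistent with the "no statistically significant difference" conclusion reported above; in practice the kernel bandwidth is often set by the median heuristic rather than fixed.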






Kumar, Achint (2022). Joint Data Modeling Using Variational Autoencoders. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public under a Creative Commons Attribution-NonCommercial-NoDerivatives (CC-BY-NC-ND) license.