Browsing by Author "Carin, Lawrence"
Item Open Access: A Deep-Learning Algorithm for Thyroid Malignancy Prediction From Whole Slide Cytopathology Images
Dov, David; Kovalsky, Shahar Z; Assaad, Serge; Cohen, Jonathan; Range, Danielle Elliott; Pendse, Avani A; Henao, Ricardo; Carin, Lawrence
We consider thyroid-malignancy prediction from ultra-high-resolution whole-slide cytopathology images. We propose a deep-learning-based algorithm that is inspired by the way a cytopathologist diagnoses the slides. The algorithm identifies diagnostically relevant image regions and assigns them local malignancy scores that, in turn, are incorporated into a global malignancy prediction. We discuss the relation of our deep-learning-based approach to multiple-instance learning (MIL) and describe how it deviates from classical MIL methods through the use of a supervised procedure to extract relevant regions from the whole slide. The analysis of our algorithm further reveals a close relation to hypothesis testing, which, along with unique characteristics of thyroid cytopathology, allows us to devise an improved training strategy. We further propose an ordinal regression framework for the simultaneous prediction of thyroid malignancy and an ordered diagnostic score acting as a regularizer, which further improves the predictions of the network. Experimental results demonstrate that the proposed algorithm outperforms several competing methods, achieving performance comparable to human experts.

Item Open Access: A host transcriptional signature for presymptomatic detection of infection in humans exposed to influenza H1N1 or H3N2 (PLoS One, 2013)
Woods, Christopher W; McClain, Micah T; Chen, Minhua; Zaas, Aimee K; Nicholson, Bradly P; Varkey, Jay; Veldman, Timothy; Kingsmore, Stephen F; Huang, Yongsheng; Lambkin-Williams, Robert; Gilbert, Anthony G; Hero, Alfred O; Ramsburg, Elizabeth; Glickman, Seth; Lucas, Joseph E; Carin, Lawrence; Ginsburg, Geoffrey S
There is great potential for host-based gene expression analysis to impact the early diagnosis of infectious diseases. In particular, the influenza pandemic of 2009 highlighted the challenges and limitations of traditional pathogen-based testing for suspected upper respiratory viral infection. We inoculated human volunteers with either influenza A A/Brisbane/59/2007 (H1N1) or A/Wisconsin/67/2005 (H3N2), and assayed the peripheral blood transcriptome every 8 hours for 7 days. Of 41 inoculated volunteers, 18 (44%) developed symptomatic infection. Using unbiased sparse latent factor regression analysis, we generated a gene signature (or factor) for symptomatic influenza capable of detecting 94% of infected cases. This gene signature is detectable as early as 29 hours post-exposure and achieves maximal accuracy on average 43 hours (p = 0.003, H1N1) and 38 hours (p = 0.005, H3N2) before peak clinical symptoms. To test the relevance of these findings in naturally acquired disease, a composite influenza A signature built from these challenge studies was applied to Emergency Department patients, where it discriminated between swine-origin influenza A/H1N1 (2009) infected and non-infected individuals with 92% accuracy.
The host genomic response to influenza infection is robust and may provide the means for detection before typical clinical symptoms are apparent.

Item Open Access: A modular simulation system for the bidomain equations
Pormann, J
Cardiac arrhythmias and fibrillation are potentially life-threatening diseases that can result from the improper conduction of electrical impulses in the heart. Experimental study of such cardiac abnormalities is dangerous at best, often requiring the subject to be placed in fibrillation for some time before attempting a large "rescue" shock. Thus, almost all studies are done in animals rather than humans. Furthermore, there is some indication that heart size may have considerable implications for fibrillation and other conduction abnormalities, so animal models for defibrillation studies must be chosen with great care. As an alternative, researchers are now using computer simulation to study the factors that generate and sustain arrhythmias, hoping to obtain at least preliminary data to guide fewer, more targeted experimental studies. Computer simulations of the bidomain equations have become very complex as they have been applied to many problems in cardiac electrophysiology. More complex membrane dynamics, irregular grids, and 3-D data sets are all being investigated. Software engineering principles will need to be applied to manage this continuing growth in complexity. We propose a modular framework for the development of a simulation system whereby a researcher may mix and match program elements to generate a simulator tailored to their particular problem. The modular approach simplifies the generation and maintenance of the different program elements, and it enables the end researcher to determine the proper mix of complexity versus speed for their particular problem of interest. The contrary approach, one monolithic program that can run all simulations of all complexities, is simply unrealistic: it would impose too great a burden on maintenance and upgradability, and it would be difficult to provide good performance for a wide range of applications. The modular approach also allows for the incremental inclusion of various complexities in the bidomain model. From a simple monodomain simulation on a 2-D homogeneous, isotropic, regular grid, we can progress, step by step, to a bidomain simulation with a fully implicit time-integration scheme on irregular, 3-D grids with arbitrary anisotropy and inhomogeneity and a non-trivial membrane model. Simulations with such a wealth of complexity have not been performed to date. As microprocessors have become cheaper and more powerful, parallel computing has become more widespread. Machines with hundreds of high-performance CPUs connected by fast networks are commonplace and are now capable of surpassing traditional vector-based supercomputers in overall performance. The simulation system presented here incorporates data-parallelism to allow large-scale bidomain problems to be run on these newest parallel supercomputers. The large amount of distributed memory in such machines can be harnessed to run extremely large-scale simulations, and the large number of CPUs provides tremendous computational power to run such simulations more quickly. Finally, the results presented here show that a modular simulation system is feasible for a wide range of applications, and that it can obtain very good performance over this range of applications.
The parallel speed-up observed was very good, regularly achieving a factor of 13 on 16 processors. The results presented here also show that we can simulate bidomain problems using an implicit time-integrator on an irregular, anisotropic, and inhomogeneous grid with a non-trivial membrane model. We are able to run such simulations on parallel computers, thereby harnessing a tremendous amount of memory and computational resources. Such simulations have not been run to date.

Item Open Access: An active learning approach for rapid characterization of endothelial cells in human tumors (PLoS One, 2014)
Padmanabhan, Raghav K; Somasundar, Vinay H; Griffith, Sandra D; Zhu, Jianliang; Samoyedny, Drew; Tan, Kay See; Hu, Jiahao; Liao, Xuejun; Carin, Lawrence; Yoon, Sam S; Flaherty, Keith T; Dipaola, Robert S; Heitjan, Daniel F; Lal, Priti; Feldman, Michael D; Roysam, Badrinath; Lee, William MF
Currently, no available pathological or molecular measures of tumor angiogenesis predict response to antiangiogenic therapies used in clinical practice. Recognizing that tumor endothelial cells (EC) and EC activation and survival signaling are the direct targets of these therapies, we sought to develop an automated platform for quantifying activity of critical signaling pathways and other biological events in the EC of patient tumors by histopathology. Computer image analysis of EC in highly heterogeneous human tumors by a statistical classifier trained using examples selected by human experts performed poorly due to subjectivity and selection bias. We hypothesized that the analysis could be optimized by a more active process to aid experts in identifying informative training examples. To test this hypothesis, we incorporated a novel active learning (AL) algorithm into the FARSIGHT image analysis software that aids the expert by seeking out informative examples for the operator to label. The resulting FARSIGHT-AL system identified EC with specificity and sensitivity consistently greater than 0.9 and outperformed traditional supervised classification algorithms. The system modeled individual operator preferences and generated reproducible results. Using the results of EC classification, we also quantified proliferation (Ki67) and activity in important signal transduction pathways (MAP kinase, STAT3) in immunostained human clear cell renal cell carcinoma and other tumors. FARSIGHT-AL enables characterization of EC in conventionally preserved human tumors in a more automated process suitable for testing and validation in clinical trials. The results of our study support a unique opportunity for quantifying angiogenesis in a manner that can now be tested for its ability to identify novel predictive and response biomarkers.

Item Open Access: An integrated transcriptome and expressed variant analysis of sepsis survival and death (Genome Med, 2014)
Tsalik, Ephraim L; Langley, Raymond J; Dinwiddie, Darrell L; Miller, Neil A; Yoo, Byunggil; van Velkinburgh, Jennifer C; Smith, Laurie D; Thiffault, Isabella; Jaehne, Anja K; Valente, Ashlee M; Henao, Ricardo; Yuan, Xin; Glickman, Seth W; Rice, Brandon J; McClain, Micah T; Carin, Lawrence; Corey, G Ralph; Ginsburg, Geoffrey S; Cairns, Charles B; Otero, Ronny M; Fowler, Vance G; Rivers, Emanuel P; Woods, Christopher W; Kingsmore, Stephen F
BACKGROUND: Sepsis, a leading cause of morbidity and mortality, is not a homogeneous disease but rather a syndrome encompassing many heterogeneous pathophysiologies.
Patient factors, including genetics, predispose to poor outcomes, though current clinical characterizations fail to identify those at greatest risk of progression and mortality. METHODS: The Community Acquired Pneumonia and Sepsis Outcome Diagnostic study enrolled 1,152 subjects with suspected sepsis. We sequenced peripheral blood RNA of 129 representative subjects with systemic inflammatory response syndrome (SIRS) or sepsis (SIRS due to infection), including 78 sepsis survivors and 28 sepsis non-survivors who had previously undergone plasma proteomic and metabolomic profiling. Gene expression differences were identified between sepsis survivors, sepsis non-survivors, and SIRS, followed by gene enrichment pathway analysis. Expressed sequence variants were identified, followed by testing for association with sepsis outcomes. RESULTS: The expression of 338 genes differed between subjects with SIRS and those with sepsis, primarily reflecting immune activation in sepsis. Expression of 1,238 genes differed with sepsis outcome: non-survivors had lower expression of many immune function-related genes. Functional genetic variants associated with sepsis mortality were sought based on a common disease-rare variant hypothesis. VPS9D1, whose expression was increased in sepsis survivors, had a higher burden of missense variants in sepsis survivors. The presence of variants was associated with altered expression of 3,799 genes, primarily reflecting Golgi and endosome biology. CONCLUSIONS: The activation of immune response-related genes seen in sepsis survivors was muted in sepsis non-survivors. The association of sepsis survival with a robust immune response and the presence of missense variants in VPS9D1 warrants replication and further functional studies. TRIAL REGISTRATION: ClinicalTrials.gov NCT00258869. Registered on 23 November 2005.

Item Open Access: Application of Stochastic Processes in Nonparametric Bayes (2014)
Wang, Yingjian
This thesis presents theoretical studies of some stochastic processes and their applications in Bayesian nonparametric methods. The stochastic processes discussed in the thesis are mainly those with independent increments, the Lévy processes. We develop new representations for the Lévy measures of two representative examples of Lévy processes, the beta and gamma processes. These representations are manifested in terms of an infinite sum of well-behaved (proper) beta and gamma distributions, with truncation and posterior analyses provided. The decompositions provide new insights into the beta and gamma processes (and their generalizations), and we demonstrate how the proposed representation unifies some properties of the two, as these are of increasing importance in machine learning.
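For reference, the two processes named above are completely random measures with the following standard Lévy measures (concentration c > 0, base measure B_0); the thesis's contribution is re-expressing the resulting processes as infinite sums of proper beta and gamma random variables:

```latex
% Standard L\'evy measures (background definitions, not the new decompositions):
\nu_{\mathrm{BP}}(d\pi, d\omega) = c\,\pi^{-1}(1-\pi)^{c-1}\,d\pi\,B_0(d\omega),
  \quad \pi \in (0,1),
\qquad
\nu_{\mathrm{GP}}(d\theta, d\omega) = c\,\theta^{-1}e^{-c\theta}\,d\theta\,B_0(d\omega),
  \quad \theta > 0.
```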
Next, a new Lévy process is proposed for an uncountable collection of covariate-dependent feature-learning measures; the process is called the kernel beta process. Available covariates are handled efficiently via the kernel construction, with covariates assumed observed with each data sample ("customer"), and latent covariates learned for each feature ("dish"). The dependencies among the data are represented with the covariate-parameterized kernel function. The beta process is recovered as a limiting case of the kernel beta process. An efficient Gibbs sampler is developed for computations, and state-of-the-art results are presented for image processing and music analysis tasks.
Last is a non-Lévy-process example, the multiplicative gamma process applied in the low-rank representation of tensors. The multiplicative gamma process is applied along the super-diagonal of tensors in the rank decomposition, and its shrinkage property nonparametrically learns the rank from the multiway data. This model is constructed to be conjugate for the continuous multiway data case. For the non-conjugate binary multiway data, a Pólya-Gamma auxiliary variable is sampled to elicit closed-form Gibbs sampling updates. This rank decomposition of tensors driven by the multiplicative gamma process yields state-of-the-art performance on various synthetic and benchmark real-world datasets, with desirable model scalability.
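A hedged sketch of the construction just described, assuming a CP-style rank decomposition (the thesis's exact prior settings may differ): the multiplicative gamma process shrinks the super-diagonal weights of higher-rank components toward zero,

```latex
\mathcal{X} \approx \sum_{r=1}^{R} \lambda_r \, u_r^{(1)} \circ \cdots \circ u_r^{(K)},
\qquad
\lambda_r \sim \mathcal{N}(0, \tau_r^{-1}),
\quad
\tau_r = \prod_{k=1}^{r} \delta_k,
\quad
\delta_k \sim \mathrm{Gamma}(a, 1),
```

so that, for a > 1, the precisions tend to grow with r and the effective rank is learned from the data.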
Item Open Access: Applications of Deep Representation Learning to Natural Language Processing and Satellite Imagery (2020)
Wang, Guoyin
Deep representation learning has shown its effectiveness in many tasks such as text classification and image processing. Much research has been devoted to directly improving representation quality. However, how to improve representation quality by incorporating ancillary data sources, or by letting representations interact with one another, is still not fully explored. Using representation learning to help other tasks is also worth further exploration.
In this work, we explore these directions by solving various problems in natural language processing and image processing. In the natural language processing part, we first discuss how to introduce alternative representations to improve the original representation quality and hence boost model performance. We then discuss a text-representation matching algorithm. By introducing such a matching algorithm, we can better align different text representations in text generation models and hence improve generation quality.
For the image processing part, we consider a real-world air-quality prediction problem: ground-level $PM_{2.5}$ estimation. To solve this problem, we introduce a joint model that improves image representation learning by combining an image encoder with ancillary data sources and a random forest model. We then further extend this model with ranking information for a semi-supervised learning setup. The semi-supervised model can then utilize low-cost sensors for $PM_{2.5}$ estimation.
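A minimal sketch of this kind of joint design, assuming a hypothetical pretrained image encoder and synthetic data (not the dissertation's actual pipeline): image features are concatenated with ancillary covariates and fed to a random forest regressor.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 500
image_features = rng.normal(size=(n, 128))  # stand-in for CNN embeddings of satellite tiles
ancillary = rng.normal(size=(n, 6))         # stand-in for e.g. meteorological covariates
# Synthetic PM2.5 targets that depend on both sources, so fusion helps
pm25 = 3 * image_features[:, 0] + 2 * ancillary[:, 0] + rng.normal(scale=0.5, size=n)

X = np.hstack([image_features, ancillary])  # simple feature-level fusion
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X[:400], pm25[:400])
print("held-out R^2:", model.score(X[400:], pm25[400:]))
```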
Finally, we introduce a recurrent kernel machine concept to explain the representation-interaction mechanism within time-dependent neural network models, and hence unify a variety of algorithms into a generalized framework.
Item Open Access: Bayesian and Information-Theoretic Learning of High Dimensional Data (2012)
Chen, Minhua
The concept of sparseness is harnessed to learn a low-dimensional representation of high-dimensional data. This sparseness assumption is exploited in multiple ways. In the Bayesian Elastic Net, a small number of correlated features are identified for the response variable. In the sparse Factor Analysis for biomarker trajectories, high-dimensional gene expression data are reduced to a small number of latent factors, each with a prototypical dynamic trajectory. In the Bayesian Graphical LASSO, the inverse covariance matrix of the data distribution is assumed to be sparse, inducing a sparsely connected Gaussian graph. In the nonparametric Mixture of Factor Analyzers, the covariance matrices in the Gaussian mixture model are forced to be low-rank, which is closely related to the concept of block sparsity.
Finally, in the information-theoretic projection design, a linear projection matrix is explicitly sought for information-preserving dimensionality reduction. All the methods mentioned above prove effective in learning from both simulated and real high-dimensional datasets.
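The Bayesian Graphical LASSO above has a widely used point-estimate analogue: L1-penalized sparse inverse-covariance estimation. A minimal sketch with scikit-learn (not the thesis's Bayesian sampler):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
P = np.eye(10)                  # sparse ground-truth precision matrix
P[0, 1] = P[1, 0] = 0.4         # a few nonzero conditional dependencies
P[2, 3] = P[3, 2] = -0.3
X = rng.multivariate_normal(np.zeros(10), np.linalg.inv(P), size=2000)

est = GraphicalLasso(alpha=0.05).fit(X)
print("nonzero off-diagonals in estimated precision:",
      int((np.abs(est.precision_[np.triu_indices(10, k=1)]) > 1e-3).sum()))
```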
Item Open Access: Bayesian Gaussian Copula Factor Models for Mixed Data (J Am Stat Assoc, 2013-06-01)
Murray, Jared S; Dunson, David B; Carin, Lawrence; Lucas, Joseph E
Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models that decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

Item Restricted: Bayesian inference of the number of factors in gene-expression analysis: application to human virus challenge studies (BMC Bioinformatics, 2010-11-09)
Chen, Bo; Chen, Minhua; Paisley, John; Zaas, Aimee; Woods, Christopher; Ginsburg, Geoffrey S; Hero, Alfred; Lucas, Joseph; Dunson, David; Carin, Lawrence

Item Open Access: Bayesian Learning with Dependency Structures via Latent Factors, Mixtures, and Copulas (2016)
Han, Shaobo
Bayesian methods offer a flexible and convenient probabilistic learning framework for extracting interpretable knowledge from complex and structured data. Such methods can characterize dependencies among multiple levels of hidden variables and share statistical strength across heterogeneous sources. In the first part of this dissertation, we develop two dependent variational inference methods for full posterior approximation in non-conjugate Bayesian models, through hierarchical mixture- and copula-based variational proposals, respectively. The proposed methods move beyond the widely used factorized approximation to the posterior and apply generically to a broad class of probabilistic models with minimal model-specific derivations. In the second part of this dissertation, we design probabilistic graphical models to accommodate multimodal data, describe dynamical behaviors, and account for task heterogeneity. In particular, the sparse latent factor model is able to reveal common low-dimensional structures from high-dimensional data. We demonstrate the effectiveness of the proposed statistical learning methods on both synthetic and real-world data.
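For the Gaussian copula factor model above, the decoupling of dependence from margins can be sketched schematically (notation mine, not the paper's exact parameterization): latent Gaussian variables carry the factor structure, while arbitrary margins F_j are obtained by transformation,

```latex
\eta_i \sim \mathcal{N}(0, I_k), \qquad
z_i \mid \eta_i \sim \mathcal{N}(\Lambda \eta_i,\, I_p), \qquad
x_{ij} = F_j^{-1}\!\big(\Phi(z_{ij}/\sigma_j)\big),
\quad \sigma_j^2 = 1 + \lambda_j^\top \lambda_j,
```

so the loadings Lambda govern dependence while the F_j govern the marginal distributions separately.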
Item Open Access: Bayesian Nonparametric Modeling of Latent Structures (2014)
Xing, Zhengming
An unprecedented amount of data has been collected in diverse fields such as social networks, infectious disease, and political science in this era of information explosion. High-dimensional, complex, and heterogeneous data pose tremendous challenges for traditional statistical models. Bayesian nonparametric methods address these challenges by providing models whose complexity can grow with the data. In this thesis, we design novel Bayesian nonparametric models for datasets from three different fields: hyperspectral image analysis, infectious disease, and voting behavior.
First, we consider the analysis of noisy and incomplete hyperspectral imagery, with the objective of removing the noise and inferring the missing data. The noise statistics may be wavelength-dependent, and the fraction of data missing (at random) may be substantial, potentially including entire bands, offering the potential to significantly reduce the quantity of data that must be measured. We achieve this objective by employing a Bayesian dictionary learning model, considering two distinct means of imposing sparse dictionary usage and drawing the dictionary elements from a Gaussian process prior, which imposes structure on the wavelength dependence of the dictionary elements.
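A hedged, non-Bayesian sketch of the dictionary-learning idea above, treating each pixel's spectrum as a signal with sparse dictionary usage (the thesis's model is Bayesian, handles missing bands, and places a Gaussian process prior across wavelengths; none of that is shown here):

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)
n_pixels, n_bands = 1000, 50
t = np.linspace(0, 1, n_bands)
# Synthetic "hyperspectral" data: sparse mixtures of smooth spectral prototypes
prototypes = np.stack([np.sin(2 * np.pi * k * t) for k in range(1, 6)])
weights = rng.exponential(size=(n_pixels, 5)) * (rng.random((n_pixels, 5)) < 0.3)
Y = weights @ prototypes + 0.05 * rng.normal(size=(n_pixels, n_bands))

dico = MiniBatchDictionaryLearning(n_components=8, alpha=0.5, random_state=0)
codes = dico.fit_transform(Y)
print("avg nonzero coefficients per pixel:", (np.abs(codes) > 1e-8).sum(axis=1).mean())
```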
Second, a Bayesian statistical model is developed for analysis of the time-evolving properties of infectious disease, with a particular focus on viruses. The model employs a latent semi-Markovian state process, and the state-transition statistics are driven by three terms: (i) a general time-evolving trend of the overall population, (ii) a semi-periodic term that accounts for effects caused by the days of the week, and (iii) a regression term that relates the probability of infection to covariates (here, specifically, to the Google Flu Trends data).
Third, extensive information on 3 million randomly sampled United States citizens is used to construct a statistical model of constituent preferences for each U.S. congressional district. This model is linked to the legislative voting record of the legislator from each district, yielding an integrated model for constituency data, legislative roll-call votes, and the text of the legislation. The model is used to examine the extent to which legislators' voting records are aligned with constituent preferences, and the implications of that alignment (or lack thereof) for subsequent election outcomes. The analysis is based on a Bayesian nonparametric formalism, with fast inference via a stochastic variational Bayesian analysis.
Item Open Access: Coded aperture compressive temporal imaging (Opt Express, 2013-05-06)
Llull, Patrick; Liao, Xuejun; Yuan, Xin; Yang, Jianbo; Kittle, David; Carin, Lawrence; Sapiro, Guillermo; Brady, David J
We use mechanical translation of a coded aperture for code-division multiple access compression of video. We discuss the compressed video's temporal resolution and present experimental results for reconstructions of more than 10 frames of temporal data per coded snapshot.

Item Open Access: Convolutional neural network to identify symptomatic Alzheimer's disease using multimodal retinal imaging (The British Journal of Ophthalmology, 2020-11-26)
Wisely, C Ellis; Wang, Dong; Henao, Ricardo; Grewal, Dilraj S; Thompson, Atalie C; Robbins, Cason B; Yoon, Stephen P; Soundararajan, Srinath; Polascik, Bryce W; Burke, James R; Liu, Andy; Carin, Lawrence; Fekrat, Sharon
BACKGROUND/AIMS: To develop a convolutional neural network (CNN) to detect symptomatic Alzheimer's disease (AD) using a combination of multimodal retinal images and patient data. METHODS: Colour maps of ganglion cell-inner plexiform layer (GC-IPL) thickness, superficial capillary plexus (SCP) optical coherence tomography angiography (OCTA) images, and ultra-widefield (UWF) colour and fundus autofluorescence (FAF) scanning laser ophthalmoscopy images were captured in individuals with AD or healthy cognition. A CNN to predict AD diagnosis was developed using multimodal retinal images, OCT and OCTA quantitative data, and patient data. RESULTS: 284 eyes of 159 subjects (222 eyes from 123 cognitively healthy subjects and 62 eyes from 36 subjects with AD) were used to develop the model. Area under the receiver operating characteristic curve (AUC) values for predicted probability of AD for the independent test set varied by input used: UWF colour AUC 0.450 (95% CI 0.282, 0.592), OCTA SCP 0.582 (95% CI 0.440, 0.724), UWF FAF 0.618 (95% CI 0.462, 0.773), and GC-IPL maps 0.809 (95% CI 0.700, 0.919). A model incorporating all images, quantitative data, and patient data (AUC 0.836 (95% CI 0.729, 0.943)) performed similarly to a model incorporating only images (AUC 0.829 (95% CI 0.719, 0.939)); a model using GC-IPL maps, quantitative data, and patient data achieved AUC 0.841 (95% CI 0.739, 0.943). CONCLUSION: Our CNN used multimodal retinal images to successfully predict the diagnosis of symptomatic AD in an independent test set. GC-IPL maps were the most useful single inputs for prediction. Models including only images performed similarly to models also including quantitative data and patient data.

Item Open Access: Deep Automatic Threat Recognition: Considerations for Airport X-Ray Baggage Screening (2020)
Liang, Kevin J
Deep learning has made significant progress in recent years, contributing to major advancements in many fields. One such field is automatic threat recognition, where methods based on neural networks have surpassed more traditional machine learning methods. In particular, we evaluate the performance of convolutional object detection models within the context of X-ray baggage screening at airport checkpoints. To do so, we collected a large dataset of scans containing threats from a diverse set of classes, and then trained and compared a number of models. Many currently deployed X-ray scanners contain multiple X-ray emitter-detector pairs arranged to give multiple views of the scanned object, and we find that combining predictions from these views improves overall performance.
We select the best-performing models fitting our design criteria and integrate them into the X-ray scanning machines, resulting in functional prototypes capable of simulating live screening deployment.
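One simple way to combine per-view detector outputs in the spirit of the result above (an illustrative fusion rule, not necessarily the dissertation's scheme): an object flagged confidently in any view raises the bag-level score for its class.

```python
from collections import defaultdict

def fuse_views(view_detections):
    """view_detections: one list per X-ray view, each a list of
    (class_name, confidence) pairs for the same scanned bag."""
    fused = defaultdict(float)
    for detections in view_detections:
        for cls, conf in detections:
            fused[cls] = max(fused[cls], conf)  # max-fusion; mean or noisy-OR are alternatives
    return dict(fused)

views = [
    [("sharp", 0.42), ("liquid", 0.10)],  # view 1: blade partially self-occluded
    [("sharp", 0.91)],                    # view 2: blade seen edge-on, high confidence
]
print(fuse_views(views))  # {'sharp': 0.91, 'liquid': 0.1}
```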
We also explore a number of subfields of deep learning with potential to improve these deep automatic threat recognition algorithms. For example, as data collection efforts are scaled up and the number of threat categories is expanded, the likelihood of missing annotations will also increase, especially if this new data is collected from real airport traffic. Such a setting is actually common in object detection datasets, and we show that a positive-unlabeled learning assumption better fits the characteristics of the data. Additionally, real-world data distributions tend to drift over time or evolve cyclically with the seasons. Baggage scan images also tend to be sensitive, meaning storing data may represent a security or privacy risk. As a result, a continual learning setting may be more appropriate for these kinds of data, which we examine in the context of generative adversarial networks. Finally, the sensitivity of security applications makes understanding models especially important. We thus spend some time examining how certain popular neural networks emerge from assumptions made starting from kernel methods. Through these works, we find that deep learning methods show considerable promise for improving existing automatic threat recognition systems.
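The positive-unlabeled assumption mentioned above is commonly operationalized with a non-negative PU risk estimator (shown here in its generic classification form, not the dissertation's detection-specific variant), where pi_p is the positive-class prior and l a loss:

```latex
\widehat{R}_{\mathrm{pu}}(f) =
\pi_p\,\widehat{\mathbb{E}}_{p}\big[\ell(f(x), +1)\big]
+ \max\Big\{0,\;
\widehat{\mathbb{E}}_{u}\big[\ell(f(x), -1)\big]
- \pi_p\,\widehat{\mathbb{E}}_{p}\big[\ell(f(x), -1)\big]\Big\},
```

with expectations estimated over the labeled-positive (p) and unlabeled (u) samples.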
Item Open Access: Deep Generative Models for Image Representation Learning (2018)
Pu, Yunchen
Recently there has been increasing interest in developing generative models of data, offering the promise of learning based on the often vast quantity of unlabeled data. With such learning, one typically seeks to build rich, hierarchical probabilistic models that are able to
fit the distribution of complex real data and are also capable of realistic data synthesis. In this dissertation, novel models and learning algorithms are proposed for deep generative models.
This dissertation consists of three main parts.
The first part developed a deep generative model for the joint analysis of images and associated labels or captions. The model is efficiently learned using a variational autoencoder. A multilayered (deep) convolutional dictionary representation is employed as a decoder of the
latent image features. Stochastic unpooling is employed to link consecutive layers in the image model, yielding top-down image generation. A deep convolutional neural network (CNN) is used as an image encoder; the CNN is used to approximate a distribution for the latent DGDN (deep generative deconvolutional network) features/code. The latent code is also linked to generative models for labels (a Bayesian support vector machine) or captions (a recurrent neural network). When predicting a label/caption for a new image at test time, averaging is performed across the distribution of latent codes; this is computationally efficient as a consequence of the learned CNN-based encoder. Since the framework is capable of modeling the image in the presence or absence of associated labels/captions, a new semi-supervised setting is manifested for CNN learning with images; the framework even allows unsupervised CNN learning, based on images alone. Excellent results are obtained on several benchmark datasets, including ImageNet, demonstrating that the proposed model achieves results that are highly competitive with similarly sized convolutional neural networks.
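A minimal convolutional VAE sketch in PyTorch, illustrating the encoder/decoder pairing described above; it omits the dictionary representation, stochastic unpooling, and the label/caption branches, so it is an assumption-laden analogue rather than the dissertation's model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvVAE(nn.Module):
    def __init__(self, z_dim=32):
        super().__init__()
        self.enc = nn.Sequential(                      # 1x28x28 -> 64x7x7
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.fc_mu = nn.Linear(64 * 7 * 7, z_dim)
        self.fc_logvar = nn.Linear(64 * 7 * 7, z_dim)
        self.fc_dec = nn.Linear(z_dim, 64 * 7 * 7)
        self.dec = nn.Sequential(                      # 64x7x7 -> 1x28x28
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.enc(x).flatten(1)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        logits = self.dec(self.fc_dec(z).view(-1, 64, 7, 7))
        return logits, mu, logvar

def elbo_loss(logits, x, mu, logvar):
    rec = F.binary_cross_entropy_with_logits(logits, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl  # negative ELBO

model = ConvVAE()
x = torch.rand(8, 1, 28, 28)  # stand-in for a batch of images
logits, mu, logvar = model(x)
print("negative ELBO:", elbo_loss(logits, x, mu, logvar).item())
```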
The second part developed a new method for learning variational autoencoders (VAEs), based on Stein variational gradient descent. A key advantage of this approach is that one need not make parametric assumptions about the form of the encoder distribution. Performance is further enhanced by integrating the proposed encoder with importance sampling. Excellent performance is demonstrated across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets.
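A hedged NumPy sketch of the Stein variational gradient descent (SVGD) update underlying this encoder-learning approach; the target here is a standard Gaussian (so grad log p(z) = -z), with a fixed-bandwidth RBF kernel supplying the repulsive term.

```python
import numpy as np

def svgd_step(z, grad_logp, h=1.0, step=0.1):
    """z: (n, d) particles; grad_logp: (n, d) gradients of log target at z."""
    diffs = z[:, None, :] - z[None, :, :]              # (n, n, d): z_j - z_i
    sq = (diffs ** 2).sum(-1)                          # squared pairwise distances
    K = np.exp(-sq / h)                                # RBF kernel matrix k(z_j, z_i)
    gradK = -2.0 / h * diffs * K[:, :, None]           # d k(z_j, z_i) / d z_j
    phi = (K @ grad_logp + gradK.sum(0)) / z.shape[0]  # Stein direction: drift + repulsion
    return z + step * phi

rng = np.random.default_rng(0)
z = rng.normal(loc=5.0, size=(100, 2))  # particles start far from the target
for _ in range(500):
    z = svgd_step(z, grad_logp=-z)      # target: N(0, I)
print("particle mean ~ 0:", z.mean(0))
```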
The third part developed a new form of variational autoencoder, in which the joint distribution of data and codes is considered in two (symmetric) forms: (i) from observed data fed through the encoder to yield codes, and (ii) from latent codes drawn from a simple
prior and propagated through the decoder to manifest data. Lower bounds are learned for the marginal log-likelihoods of the observed data and the latent codes. When learning with the variational bound, one seeks to minimize the symmetric Kullback-Leibler divergence of
joint density functions from (i) and (ii), while simultaneously seeking to maximize the two marginal log-likelihoods. To facilitate learning, a new form of adversarial training is developed. An extensive set of experiments is performed, in which we demonstrate state-of-the-art data reconstruction and generation on several image benchmark datasets.
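One way to write the objective implied above, with q_phi(x,z) = q(x) q_phi(z|x) (data through the encoder) and p_theta(x,z) = p(z) p_theta(x|z) (prior through the decoder):

```latex
\min_{\theta, \phi}\;
\mathrm{KL}\big(q_\phi(x, z)\,\|\,p_\theta(x, z)\big)
+ \mathrm{KL}\big(p_\theta(x, z)\,\|\,q_\phi(x, z)\big),
```

minimized via the learned lower bounds and the adversarial training described above.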
Item Open Access: Deep Generative Models for Vision and Language Intelligence (2018)
Gan, Zhe
Deep generative models have achieved tremendous success in recent years, with applications in various tasks involving vision and language intelligence. In this dissertation, I will mainly discuss the contributions that I have made in this field during my Ph.D. study. Specifically, the dissertation is divided into two parts.
In the first part, I will mainly focus on one specific kind of deep directed generative model, called the Sigmoid Belief Network (SBN). First, I will present a fully Bayesian algorithm for efficient learning and inference of the SBN. Second, since the original SBN can only be used for binary image modeling, I will also discuss its generalization to model sparse count-valued data for topic modeling, and sequential data for motion capture synthesis, music generation, and dynamic topic modeling.
In the second part, I will mainly focus on visual captioning (i.e., image-to-text generation) and conditional image synthesis. Specifically, I will first present the Semantic Compositional Network for visual captioning, emphasizing the interpretability and controllability revealed in the learning algorithm via a mixture-of-experts design and the usage of detected semantic concepts. I will then present the Triangle Generative Adversarial Network, a general framework that can be used for joint distribution matching and learning bidirectional mappings between two different domains. We consider the joint modeling of image-label, image-image, and image-attribute pairs, with applications in semi-supervised image classification, image-to-image translation, and attribute-based image editing.
Item Open Access: Deep Generative Models for Vision, Languages and Graphs (2019)
Wang, Wenlin
Deep generative models have achieved remarkable success in modeling various types of data, ranging from vision and language to graphs. They offer flexible and complementary representations for both labeled and unlabeled data. Moreover, they are naturally capable of generating realistic data. In this thesis, novel variations of generative models are proposed for various learning tasks, organized in three parts.
In the first part, generative models are designed to learn generalized representations for images under the zero-shot learning (ZSL) setting. An attribute-conditioned variational autoencoder is introduced, representing each class as a latent-space distribution and enabling the learning of highly discriminative and robust feature representations. It endows the generative model with discriminative power by choosing the class that maximizes the variational lower bound. I further show that the model naturally generalizes to transductive and few-shot settings.
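A plausible form of the resulting decision rule (notation mine): classify a test image by the class whose attribute-conditioned variational lower bound is largest,

```latex
\hat{y}(x) = \arg\max_{c \in \mathcal{C}}\;
\mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z, a_c)\big]
- \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z \mid a_c)\big),
```

where a_c is the attribute vector describing class c.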
In the second part, generative models are proposed for controllable language generation. Specifically, two types of topic-aware language generation models are proposed. The first introduces a topic compositional neural language model for controllable and interpretable language generation via a mixture-of-experts design. The second solves the problem via a VAE framework with a topic-conditioned GMM design. Both models boost the performance of existing language generation systems while providing controllable properties.
In the third part, generative models are introduced for the broader domain of graph data. First, a variational homophilic embedding (VHE) model is proposed. It is a fully generative model that learns network embeddings by modeling textual semantic information with a variational autoencoder, while accounting for graph structure information through a homophilic prior design. Second, for heterogeneous multi-task learning, a novel graph-driven generative model is developed to unify multiple tasks in the same framework. It combines a graph convolutional network (GCN) with multiple VAEs, embedding the nodes of the graph in a uniform manner while specializing their organization and usage to different tasks.
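For reference, the standard GCN propagation rule used by such models (Kipf and Welling), with the self-looped adjacency and its degree matrix:

```latex
H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\big),
\qquad \tilde{A} = A + I.
```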
Item Open Access: Deep Latent-Variable Models for Natural Language Understanding and Generation (2020)
Shen, Dinghan
Deep latent-variable models have been widely adopted to model various types of data, due to their ability to: 1) infer rich high-level information from the input data (especially in a low-resource setting); and 2) yield a generative network that can synthesize samples unseen during training. In this dissertation, I will present the contributions I have made in applying the general latent-variable framework to various natural language processing problems, which are especially challenging given the discrete nature of text sequences. Specifically, the dissertation is divided into two parts.
In the first part, I will present two of my recent explorations on leveraging deep latent-variable models for natural language understanding. The goal here is to learn meaningful text representations that can be helpful for tasks such as sentence classification, natural language inference, and question answering. First, I will propose a variational autoencoder for textual data to digest unlabeled information. To alleviate the observed posterior-collapse issue, a specially designed deconvolutional decoder is employed as the generative network. The resulting sentence embeddings greatly boost downstream task performance. Then I will present a model that learns compressed/binary sentence embeddings, which are storage-efficient and applicable to on-device applications.
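A hedged sketch of one common way to obtain such binary embeddings: binarize with a sign function in the forward pass and pass gradients through unchanged (a straight-through estimator). The dissertation's actual binarization scheme may differ.

```python
import torch

class StraightThroughSign(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return (x > 0).float() * 2 - 1  # binarize to {-1, +1}

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output              # identity gradient approximation

emb = torch.randn(4, 256, requires_grad=True)  # continuous sentence embeddings
binary = StraightThroughSign.apply(emb)        # storage: 1 bit per dimension
binary.sum().backward()                        # gradients still reach emb
print(binary[0, :8], emb.grad.abs().sum().item())
```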
As for the second part, I will introduce a multi-level variational autoencoder (VAE) to model long-form text sequences (with as many as 60 words). A multi-level generative network is leveraged to capture word-level and sentence-level coherence, respectively. Moreover, with a hierarchical design of the latent space, long-form and coherent texts can be more reliably produced (relative to baseline text VAE models). Semantically rich latent representations are also obtained in this unsupervised manner. Human evaluation further demonstrates the superiority of the proposed method.
Item Open Access: Dependent Hierarchical Bayesian Models for Joint Analysis of Social Networks and Associated Text (2012)
Wang, Eric Xun
This thesis presents spatially and temporally dependent hierarchical Bayesian models for the analysis of social networks and associated textual data. Social network analysis has received significant recent attention and has been applied to fields as varied as the analysis of Supreme Court votes, Congressional roll call data, and inferring links between authors of scientific papers. In many traditional social network analysis models, temporal and spatial dependencies are not considered due to computational difficulties, even though such dependencies often play a significant role in the underlying generative process of the observed social network data.
Thus motivated, this thesis presents four new models that consider spatial and/or temporal dependencies and (when available) the associated text. The first is a time-dependent (dynamic) relational topic model that models nodes by their relevant documents and uses a probit regression construction to map topic overlap between nodes to a link. The second is a factor model with dynamic random effects that is used to analyze the voting patterns of the United States Supreme Court. The last two models present the primary contribution of this thesis: two spatially and temporally dependent models that jointly analyze legislative roll call data and their associated legislative text, introducing a new paradigm for social network factor analysis in which new columns (or rows) of matrices can be predicted from the text. The first uses a nonparametric joint clustering approach to link the factor and topic models, while the second uses a text regression construction. Finally, two other models, for video analysis and tracking, are also presented and discussed.
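One plausible form of the probit construction mentioned above (notation mine): with theta_i the average topic proportions of node i's documents and the elementwise product measuring topic overlap, the link probability is

```latex
p(y_{ij} = 1) = \Phi\big(w^\top(\bar{\theta}_i \circ \bar{\theta}_j) + b\big),
```

where Phi is the standard normal CDF, so greater topic overlap between two nodes raises the probability of a link.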