Scalable Bayesian Matrix and Tensor Factorization for Discrete Data

dc.contributor.advisor Carin, Lawrence
dc.contributor.author Hu, Changwei
dc.date.accessioned 2017-05-16T17:27:26Z
dc.date.available 2017-05-16T17:27:26Z
dc.date.issued 2017
dc.identifier.uri https://hdl.handle.net/10161/14383
dc.description.abstract Matrix and tensor factorization methods decompose observed matrix and tensor data into a set of factor matrices. They provide a useful way to extract latent factors or features from complex data, and also to predict missing data. Matrix and tensor factorization has drawn significant attention in a wide variety of applications, such as topic modeling, recommender systems, and learning from social networks and knowledge bases. However, developing factorization methods for massive and sparse observations remains a challenge, especially when the data are binary or count-valued (which is true of most real-world data). In this thesis, we present a set of scalable Bayesian factorization models for low-rank approximation of massive matrices or tensors with binary and count-valued observations. The proposed models have the following properties: (1) the inference complexity scales linearly in the number of non-zeros in the data; (2) side information along a given dimension, such as pairwise relationships between entities (e.g., an adjacency network), can be easily leveraged to handle issues such as data sparsity and the cold-start problem; (3) the models have full local conjugacy, leading to simple, closed-form batch inference as well as online inference; (4) in contrast to many existing matrix and tensor factorization methods, in which factor matrices are usually assumed to be real-valued, we constrain the factor matrices to be non-negative, which makes them easy to interpret; (5) for tensor factorization, the number of "topics", or in other words the rank of the tensor, can be inferred from the data. We evaluate the proposed models on a variety of real-world data sets from diverse domains, including scholarly text data, political science data, large-scale market transaction data, and knowledge graphs.
dc.subject Statistics
dc.subject Computer science
dc.subject Artificial intelligence
dc.title Scalable Bayesian Matrix and Tensor Factorization for Discrete Data
dc.type Dissertation
dc.department Electrical and Computer Engineering

