Scalable Bayesian Matrix and Tensor Factorization for Discrete Data

Matrix and tensor factorization methods decompose observed matrix and tensor data into a set of factor matrices. They provide a useful way to extract latent factors or features from complex data and to predict missing entries. Matrix and tensor factorization has drawn significant attention in a wide variety of applications, such as topic modeling, recommender systems, and learning from social networks and knowledge bases. However, developing factorization methods for massive and sparse observations remains a challenge, especially when the data are binary or count-valued, as is true of much real-world data. In this thesis, we present a set of scalable Bayesian factorization models for low-rank approximation of massive matrices and tensors with binary and count-valued observations. The proposed models have the following properties: (1) the inference complexity scales linearly in the number of non-zeros in the data; (2) side information along a given dimension, such as pairwise relationships between entities (e.g., an adjacency network), can be easily leveraged to handle issues such as data sparsity and the cold-start problem; (3) the models have full local conjugacy, leading to simple, closed-form batch inference as well as online inference; (4) in contrast to many existing matrix and tensor factorization methods, in which factor matrices are usually assumed to be real-valued, we constrain the factor matrices to be non-negative, which makes them easy to interpret; (5) for tensor factorization, the number of "topics", or in other words the rank of the tensor, can be inferred from the data. We evaluate the proposed models on a variety of real-world data sets from diverse domains, including scholarly text data, political science data, large-scale market transaction data, and knowledge graphs.
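To illustrate how a cost linear in the number of non-zeros can arise, the following is a minimal sketch, not the thesis's implementation. It assumes a Poisson likelihood with non-negative factors, y_ij ~ Poisson(sum_k U[i,k] V[j,k]), which is a common choice for count data but is not stated in the abstract. The function name `poisson_loglik_sparse` and the variables `U`, `V`, `rows`, `cols`, `vals` are hypothetical. Under that assumption, zero entries contribute only their rate to the log-likelihood, and the sum of all rates factorizes over column sums, so the dense matrix never has to be formed.

```python
import numpy as np
from math import lgamma


def poisson_loglik_sparse(rows, cols, vals, U, V):
    """Log-likelihood of a count matrix under the ASSUMED model
    y_ij ~ Poisson(sum_k U[i, k] * V[j, k]), given only its non-zeros.

    rows, cols, vals: coordinates and values of the non-zero entries.
    U (I x K), V (J x K): non-negative factor matrices.
    """
    # Rates at the non-zero positions only: O(nnz * K).
    rates = np.einsum("nk,nk->n", U[rows], V[cols])
    # Sum of rates over ALL I*J entries, computed from column sums:
    # sum_ij sum_k U[i,k] V[j,k] = sum_k (sum_i U[i,k]) (sum_j V[j,k]).
    # Cost is O((I + J) * K), independent of the number of zeros.
    total_rate = float(U.sum(axis=0) @ V.sum(axis=0))
    # log(y!) is zero for y = 0, so only non-zeros contribute here too.
    log_fact = sum(lgamma(v + 1.0) for v in vals)
    return float(np.sum(vals * np.log(rates)) - total_rate - log_fact)


# Small synthetic example (made-up data, for illustration only).
rng = np.random.default_rng(0)
U = rng.gamma(1.0, 1.0, size=(5, 3))   # non-negative row factors
V = rng.gamma(1.0, 1.0, size=(4, 3))   # non-negative column factors
Y = rng.poisson(U @ V.T)               # simulated counts
rows, cols = np.nonzero(Y)
vals = Y[rows, cols].astype(float)
print(poisson_loglik_sparse(rows, cols, vals, U, V))
```

The same decomposition into a term over non-zeros plus a factorized term over all entries is what lets this kind of model skip the zeros entirely; the thesis's actual inference procedures may differ in detail.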





Hu, Changwei (2017). Scalable Bayesian Matrix and Tensor Factorization for Discrete Data. Dissertation, Duke University. Retrieved from


Duke's student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.