dc.description.abstract |
<p>Matrix and tensor factorization methods decompose observed matrix or tensor
data into a set of factor matrices. They provide a useful way to extract latent factors
or features from complex data, and also to predict missing data. Matrix and tensor
factorization has drawn significant attention in a wide variety of applications,
such as topic modeling, recommender systems, and learning from social networks and knowledge
bases. However, developing factorization methods for massive and sparse observations
remains a challenge, especially when the data are binary or count-valued (as is
true of most real-world data). In this thesis, we present a set of scalable Bayesian
factorization models for low-rank approximation of massive matrices or tensors with
binary and count-valued observations. The proposed models enjoy the following properties:
(1) the inference complexity scales linearly in the number of non-zeros in the data;
(2) side information along a given dimension, such as pairwise relationships
(e.g., an adjacency network) between entities, can be easily leveraged to handle issues
such as data sparsity and the cold-start problem; (3) the proposed models have full
local conjugacy, leading to simple, closed-form batch inference as well as online
inference; (4) in contrast to many existing matrix and tensor factorization methods,
in which the factor matrices are usually assumed to be real-valued, we assume non-negativity
on the factor matrices, which makes them easy to interpret; (5) for tensor factorization,
the number of "topics", i.e., the rank of the tensor, can be inferred from the data.
We evaluate the proposed models on a variety of real-world datasets from diverse
domains, such as scholarly text data, political science data, large-scale market
transaction data, and knowledge graphs.</p>