Probabilistic Methods for Distributed Learning

dc.contributor.advisor

Carin, Lawrence

dc.contributor.author

Zhang, XianXing

dc.date.accessioned

2014-05-14T19:18:31Z

dc.date.available

2014-05-14T19:18:31Z

dc.date.issued

2014

dc.department

Electrical and Computer Engineering

dc.description.abstract

Access to data at massive scale has proliferated recently. A significant machine learning challenge concerns development of methods that efficiently model and learn from data at this scale, while retaining analysis flexibility and sophistication.

Many statistical learning problems are formulated as regularized empirical risk minimization [15]. To handle the big data now commonplace in many applications, it is desirable to extend empirical risk minimization efficiently to the large-scale setting. When the data are too large to store on a single machine, or at least too large to hold in a single localized memory, one popular solution is to store and process the data in a distributed manner. Consequently, the focus of this dissertation is to study distributed learning algorithms [3] for empirical risk minimization problems.

Toward this end we propose a series of probabilistic methods for divide-and-conquer distributed learning, with these methods accounting for an increasing set of challenges. The basic Maximum Entropy Mixture (MEM) method is proposed first, to model the uncertainty introduced by randomly partitioning the data across computing nodes. We then develop a hierarchical extension of MEM, termed hMEM, that facilitates sharing of statistical strength among data blocks. Finally, to address small-sample bias, we impose the constraint that the mean of the inferred parameters is the same across all data blocks, yielding a hierarchical MEM with an expectation constraint (termed hecMEM). Computations are performed with a generalized Expectation-Maximization algorithm. The hecMEM method achieves state-of-the-art results for distributed matrix completion and logistic regression at massive scale, with comparisons made to MEM, hMEM and several alternative approaches.
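To make the divide-and-conquer setting above concrete, here is a minimal sketch (not the dissertation's MEM method) of distributed regularized empirical risk minimization: data are randomly partitioned into blocks, each "node" solves a local ridge regression in closed form, and the local parameter estimates are averaged. All function names and parameters here are hypothetical illustrations.

```python
# Illustrative sketch of divide-and-conquer empirical risk minimization
# via averaging of local ridge-regression solutions. This is a generic
# baseline, not the MEM/hMEM/hecMEM methods developed in the dissertation.
import numpy as np

def local_ridge(X, y, lam=1.0):
    # Closed-form regularized ERM on one data block:
    # w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def averaged_ridge(X, y, rng, n_blocks=4, lam=1.0):
    # Randomly partition the rows across n_blocks "computing nodes",
    # solve each block locally, then average the parameter estimates.
    blocks = np.array_split(rng.permutation(len(y)), n_blocks)
    return np.mean([local_ridge(X[i], y[i], lam) for i in blocks], axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.normal(size=2000)
w_avg = averaged_ridge(X, y, rng)
```

Simple averaging of this kind is exactly the point of departure for the probabilistic methods above: it ignores uncertainty from the random partition and suffers small-sample bias on each block, which MEM and its hierarchical extensions are designed to account for.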

dc.identifier.uri

https://hdl.handle.net/10161/8728

dc.subject

Electrical engineering

dc.title

Probabilistic Methods for Distributed Learning

dc.type

Dissertation

Files

Original bundle

Name:
Zhang_duke_0066D_12366.pdf
Size:
626.61 KB
Format:
Adobe Portable Document Format