Probabilistic Methods for Distributed Learning

dc.contributor.advisor

Carin, Lawrence

dc.contributor.author

Zhang, XianXing

dc.date.accessioned

2014-05-14T19:18:31Z

dc.date.available

2014-05-14T19:18:31Z

dc.date.issued

2014

dc.department

Electrical and Computer Engineering

dc.description.abstract

Access to data at massive scale has proliferated recently. A significant machine learning challenge concerns development of methods that efficiently model and learn from data at this scale, while retaining analysis flexibility and sophistication.

Many statistical learning problems are formulated as regularized empirical risk minimization [15]. To handle the big data now commonplace in many applications, it is desirable to extend empirical risk minimization efficiently to the large-scale setting. When the data are too large to store on a single machine, or at least too large to hold in a single localized memory, one popular solution is to store and process the data in a distributed manner. Consequently, the focus of this dissertation is to study distributed learning algorithms [3] for empirical risk minimization problems.

Toward this end we propose a series of probabilistic methods for divide-and-conquer distributed learning, with these methods accounting for an increasing set of challenges. The basic Maximum Entropy Mixture (MEM) method is proposed first, to model the uncertainty introduced by randomly partitioning the data across computing nodes. We then develop a hierarchical extension of MEM, termed hMEM, that facilitates sharing of statistical strength among data blocks. Finally, to address small-sample bias, we impose the constraint that the mean of the inferred parameters is the same across all data blocks, yielding a hierarchical MEM with an expectation constraint (termed hecMEM). Computations are performed with a generalized Expectation-Maximization algorithm. The hecMEM method achieves state-of-the-art results for distributed matrix completion and logistic regression at massive scale, with comparisons made to MEM, hMEM and several alternative approaches.
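To make the divide-and-conquer setting above concrete, here is a minimal sketch (not the dissertation's MEM method) of distributed regularized empirical risk minimization: data are randomly partitioned into blocks, each "node" solves a local ridge regression in closed form, and the local parameter estimates are averaged. All function names and parameters here are hypothetical illustrations.

```python
# Illustrative sketch of divide-and-conquer empirical risk minimization
# via averaging of local ridge-regression solutions. This is a generic
# baseline, not the MEM/hMEM/hecMEM methods developed in the dissertation.
import numpy as np

def local_ridge(X, y, lam=1.0):
    # Closed-form regularized ERM on one data block:
    # w = (X^T X + lam * I)^{-1} X^T y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def averaged_ridge(X, y, rng, n_blocks=4, lam=1.0):
    # Randomly partition the rows across n_blocks "computing nodes",
    # solve each block locally, then average the parameter estimates.
    blocks = np.array_split(rng.permutation(len(y)), n_blocks)
    return np.mean([local_ridge(X[i], y[i], lam) for i in blocks], axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.normal(size=2000)
w_avg = averaged_ridge(X, y, rng)
```

Simple averaging of this kind is exactly the point of departure for the probabilistic methods above: it ignores uncertainty from the random partition and suffers small-sample bias on each block, which MEM and its hierarchical extensions are designed to account for.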

dc.identifier.uri

https://hdl.handle.net/10161/8728

dc.subject

Electrical engineering

dc.title

Probabilistic Methods for Distributed Learning

dc.type

Dissertation

Files

Original bundle

Name:
Zhang_duke_0066D_12366.pdf
Size:
626.61 KB
Format:
Adobe Portable Document Format