Probabilistic Methods for Distributed Learning

Loading...
Thumbnail Image

Date

2014

Advisors

Carin, Lawrence

Journal Title

Journal ISSN

Volume Title

Repository Usage Stats

384
views
784
downloads

Abstract

Access to data at massive scale has proliferated recently. A significant machine learning challenge concerns development of methods that efficiently model and learn from data at this scale, while retaining analysis flexibility and sophistication.

Many statistical learning problems are formulated in terms of regularized empirical risk minimization [15]. To scale this method to big data that are becoming commonplace in various applications, it is desirable to efficiently extend empirical risk minimization to a large-scale setting. When the size of the data is too large to be stored on a single machine, or at least too large to keep in a single localized memory, one popular solution is to store and process the data in a distributed manner. Consequently, the focus of this dissertation is to study distributed learning algorithms [3] for empirical risk minimization problems.

Toward this end we propose a series of probabilistic methods for divide-and-conquer distributed learning, with these methods accounting for an increasing set of challenges. The basic Maximum Entropy Mixture (MEM) method is first proposed, to model uncertainty caused by randomly partitioning the data across computing nodes. We then develop a hierarchical extension to MEM, termed hMEM, facilitating sharing of statistical strength among data blocks. Finally, to addresses small sample bias, we impose the constraint that the mean of inferred parameters is the same across all data blocks, yielding a hierarchical MEM with expectation constraint (termed hecMEM). Computations are performed with a generalized Expectation-Maximization algorithm. The hecMEM method achieves state-of-the-art results for distributed matrix completion and logistic regression at massive scale, with comparisons made to MEM, hMEM and several alternative approaches.

Description

Provenance

Citation

Citation

Zhang, XianXing (2014). Probabilistic Methods for Distributed Learning. Dissertation, Duke University. Retrieved from https://hdl.handle.net/10161/8728.

Collections


Dukes student scholarship is made available to the public using a Creative Commons Attribution / Non-commercial / No derivative (CC-BY-NC-ND) license.