Efficient and Scalable Deep Learning

Wen, Wei

dc.contributor.advisor	Li, Hai
dc.contributor.advisor	Chen, Yiran
dc.contributor.author	Wen, Wei
dc.date.accessioned	2020-02-10T17:28:11Z
dc.date.available	2020-02-10T17:28:11Z
dc.date.issued	2019
dc.department	Electrical and Computer Engineering
dc.description.abstract	Deep Neural Networks (DNNs) can achieve accuracy superior to that of traditional machine learning models because of their large learning capacity and the availability of large amounts of labeled data. In general, larger DNNs obtain higher accuracy. However, two obstacles hinder us from building larger DNNs: (1) inference of large DNNs is slow, which limits their deployment on small devices; (2) training large DNNs is also slow, which slows down research exploration. To remove these obstacles, this dissertation focuses on accelerating DNN inference and training. To accelerate DNN inference, original DNNs are compressed while preserving their original accuracy. More specifically, Structurally Sparse Deep Neural Networks (SSDNNs) are proposed to remove neural components. In Convolutional Neural Networks (CNNs), neurons, filters, channels and layers can be removed; in Recurrent Neural Networks (RNNs), hidden sizes can be reduced. The study shows that SSDNNs can achieve higher speedup than sparse DNNs with non-structured sparsity. Besides SSDNNs, Force Regularization is proposed to force DNNs into a lower-rank space, such that DNNs can be decomposed into lower-rank architectures with fewer ranks than traditional methods achieve. The dissertation also demonstrates that SSDNNs and Force Regularization are orthogonal and can be combined for higher speedup. To accelerate DNN training, distributed deep learning is required. However, two problems hinder us from using more compute nodes for higher training speed: the Communication Bottleneck and the Generalization Gap. The Communication Bottleneck arises because communication time increases and comes to dominate as distributed systems scale to many compute nodes. To reduce gradient communication in Stochastic Gradient Descent (SGD), SGD with low-precision gradients (TernGrad) is proposed. Moreover, in distributed deep learning, a large batch size is required to exploit the system's computing power; unfortunately, accuracy decreases when the batch size is very large, which is referred to as the Generalization Gap. One hypothesis to explain the Generalization Gap is that large-batch SGD gets stuck at sharp minima. The dissertation proposes a stochastic smoothing method (SmoothOut) to escape sharp minima. The dissertation shows that TernGrad overcomes the Communication Bottleneck and that SmoothOut helps to close the Generalization Gap.
dc.identifier.uri	https://hdl.handle.net/10161/20143
dc.subject	Artificial intelligence
dc.subject	Computer science
dc.subject	Computer engineering
dc.subject	Deep neural networks
dc.subject	Distributed Training
dc.subject	Model Compression
dc.subject	Quantization
dc.subject	Sharp Minima
dc.subject	Sparsity
dc.title	Efficient and Scalable Deep Learning
dc.type	Dissertation

Files

Original bundle

Name: Wen_duke_0066D_15445.pdf
Size: 4.44 MB
Format: Adobe Portable Document Format

Collections

Dissertations