dc.description.abstract |
<p>Subspaces and manifolds are two powerful models for high dimensional signals. Subspaces
model linear correlation and are a good fit to signals generated by physical systems,
such as frontal images of human faces and multiple sources impinging on an antenna
array. Manifolds model sources that are not linearly correlated, but where signals
are determined by a small number of parameters. Examples are images of human faces
under different poses or expressions, and handwritten digits with varying styles.
However, there will always be some degree of model mismatch between the subspace or
manifold model and the true statistics of the source. This dissertation exploits subspace
and manifold models as prior information in various signal processing and machine
learning tasks.</p><p>A near-low-rank Gaussian mixture model measures proximity to
a union of linear or affine subspaces. This simple model can effectively capture the
signal distribution when each class is near a subspace. This dissertation studies
how the pairwise geometry between these subspaces affects classification performance.
When model mismatch is vanishingly small, the probability of misclassification is
determined by the product of the sines of the principal angles between subspaces.
When the model mismatch is more significant, the probability of misclassification
is determined by the sum of the squares of the sines of the principal angles. Reliability
of classification is derived in terms of the distribution of signal energy across
principal vectors. Larger principal angles lead to smaller classification error, motivating
a linear transform that optimizes principal angles. This linear transformation, termed
TRAIT, also preserves features specific to each class, making it complementary to
a recently developed Low Rank Transform (LRT). Moreover, when the model mismatch is
more significant, TRAIT outperforms LRT.</p>
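<p>As a concrete illustration of this geometry, the following minimal sketch computes the principal angles between two subspaces from the singular values of the product of their orthonormal bases, together with the two error statistics discussed above: the product of sines and the sum of squared sines. The dimensions, the random bases, and the function name are illustrative choices, not the dissertation's code.</p>
<pre>
# Minimal sketch: principal angles between two subspaces and the two
# statistics that govern misclassification (illustrative only).
import numpy as np

def principal_angles(U, V):
    """Principal angles (radians) between span(U) and span(V),
    where the columns of U and V are orthonormal bases."""
    # The singular values of U^T V are the cosines of the principal angles.
    cosines = np.linalg.svd(U.T @ V, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

# Two random 3-dimensional subspaces of R^20 (hypothetical example data).
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((20, 3)))
V, _ = np.linalg.qr(rng.standard_normal((20, 3)))

theta = principal_angles(U, V)
prod_sines = np.prod(np.sin(theta))        # governs error when mismatch vanishes
sum_sq_sines = np.sum(np.sin(theta) ** 2)  # governs error when mismatch is larger
print(theta, prod_sines, sum_sq_sines)
</pre>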
<p>The manifold model constrains the degrees of freedom of data variation. Learning features that
are robust to data variation is important, especially when the size of the training
set is small. A learning machine with a large number of parameters, e.g., a deep neural
network, can describe a very complicated data distribution well. However, it is also
more likely to be sensitive to small perturbations of the data and to suffer from
degraded performance when generalizing to unseen (test) data.</p><p>From
the perspective of the complexity of function classes, such a learning machine has a large
capacity (complexity) and thus tends to overfit. The manifold model provides a
way of regularizing the learning machine so as to reduce the generalization error
and thereby mitigate overfitting. Two approaches to preventing overfitting are
proposed, one from the perspective of data variation, the other from the perspective of capacity/complexity
control. In the first approach, the learning machine is encouraged to make decisions
that vary smoothly for data points in local neighborhoods on the manifold. In the
second approach, a graph adjacency matrix is derived for the manifold, and the learned
features are encouraged to be aligned with the principal components of this adjacency
matrix. Experiments on benchmark datasets demonstrate a clear
advantage of the proposed approaches when the training set is small.</p>
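<p>To make the two regularizers concrete, the sketch below implements a generic version of each on a k-nearest-neighbor graph: a graph-Laplacian smoothness penalty on the learner's outputs (first approach) and the leading eigenvectors of the adjacency matrix with which learned features can be aligned (second approach). The graph construction, the penalty form, and all function names are assumptions for illustration, not the dissertation's exact formulation.</p>
<pre>
# Illustrative sketch of the two regularizers, assuming a k-NN graph
# stands in for the manifold's local neighborhoods.
import numpy as np

def knn_adjacency(X, k=5):
    """Symmetric 0/1 adjacency matrix of the k-NN graph on the rows of X."""
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    W = np.zeros_like(d2)
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, np.argsort(d2, axis=1)[:, :k].ravel()] = 1.0
    return np.maximum(W, W.T)  # symmetrize

def smoothness_penalty(F, W):
    """First approach: sum_ij W_ij ||F_i - F_j||^2 = 2 tr(F^T L F),
    where F holds the learner's outputs (one row per datum) and L = D - W.
    Added to the training loss, this penalizes decisions that vary
    sharply across neighborhoods on the manifold."""
    L = np.diag(W.sum(axis=1)) - W
    return 2.0 * np.trace(F.T @ L @ F)

def leading_components(W, r=3):
    """Second approach: the r leading eigenvectors (principal components)
    of the adjacency matrix, with which learned features are aligned."""
    _, evecs = np.linalg.eigh(W)  # eigenvalues in ascending order
    return evecs[:, -r:]
</pre>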
<p>Stochastic optimization makes it possible to track a slowly varying subspace underlying streaming
data. By approximating local neighborhoods using affine subspaces, a slowly varying
manifold can be efficiently tracked as well, even with corrupted and noisy data. The
more local neighborhoods are used, the better the approximation, but the higher the computational
complexity. A multiscale approximation scheme is proposed, where the local approximating
subspaces are organized in a tree structure. Splitting and merging of the tree nodes
then allows efficient control of the number of neighborhoods. The deviation of each
datum from the learned model is estimated, yielding a series of statistics for anomaly
detection. This framework extends the classical <em>changepoint detection</em> technique,
which applies only to one-dimensional signals. Simulations and experiments highlight
the robustness and efficacy of the proposed approach in detecting an abrupt change
in an otherwise slowly varying low-dimensional manifold.</p>
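<p>The following sketch conveys the flavor of the single-subspace case: an orthonormal basis is updated by a stochastic gradient step on each incoming datum, and each datum's projection residual serves as the deviation statistic monitored for abrupt changes. This is a generic first-order tracker written for illustration; the step size, initialization, and function name are assumptions, and the multiscale tree of local subspaces described above is not shown.</p>
<pre>
# Minimal sketch: stochastic tracking of a slowly varying d-dimensional
# subspace, yielding one residual statistic per datum for anomaly detection.
import numpy as np

def track_subspace(stream, d, step=0.1, seed=0):
    """Yield the residual energy of each datum under the tracked subspace."""
    rng = np.random.default_rng(seed)
    stream = iter(stream)
    first = next(stream)
    # Initialize the basis with the first datum plus random directions.
    U, _ = np.linalg.qr(np.column_stack(
        [first, rng.standard_normal((len(first), d - 1))]))
    for x in stream:
        w = U.T @ x            # coefficients of x in the current subspace
        r = x - U @ w          # residual orthogonal to the subspace
        yield float(r @ r)     # deviation statistic for changepoint tests
        U = U + step * np.outer(r, w)  # gradient step on ||x - Uw||^2
        U, _ = np.linalg.qr(U)         # re-orthonormalize the basis

# A spike in the yielded residuals signals an abrupt change in the
# otherwise slowly varying subspace.
</pre>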