Browsing by Author "Maggioni, Mauro"
Item Open Access A maximum entropy-based approach for the description of the conformational ensemble of calmodulin from paramagnetic NMR (2016-05-04) Thelot, Francois
Characterizing protein dynamics is an essential step towards a better understanding of protein function. Experimentally, we can access information about protein dynamics from paramagnetic NMR data such as pseudocontact shifts, which integrate ensemble-averaged information about the motion of proteins. In this report, we recognize that the relative position of the two domains of calmodulin can be represented as the evolution of one of the domains in the space of Euclidean motions. From this perspective, we suggest a maximum entropy-based approach for finding a probability distribution on SE(3) satisfying experimental NMR measurements. While sampling of SE(3) is performed with the ensemble generator EOM, the proposed framework can be extended to uniform sampling of the space of Euclidean motions. At the end of this study, we find that the most represented conformations of calmodulin are those in which the two protein domains are in close contact, despite differing substantially from one another. Such a representation agrees with the random coil linker model, and sharply differs from the extended crystal structure of calmodulin.

Item Open Access Atlas Simulation: A Numerical Scheme for Approximating Multiscale Diffusions Embedded in High Dimensions (2014) Crosskey, Miles Martin
When simulating multiscale stochastic differential equations (SDEs) in high dimensions, the separation of timescales and the high dimensionality can make simulations expensive. The computational cost is dictated by microscale properties and interactions of many variables, while the interesting behavior often occurs on the macroscale with few important degrees of freedom.
For many problems, bridging the gap between the microscale and the macroscale by direct simulation is computationally infeasible, and one would like to learn a fast macroscale simulator instead. In this work we present an unsupervised learning algorithm that uses short, parallelizable microscale simulations to learn provably accurate macroscale SDE models. The learning algorithm takes as input the microscale simulator, a local distance function, and a homogenization scale. The learned macroscale model can then be used for fast computation and storage of long simulations. We discuss various examples, both low- and high-dimensional, as well as results about the accuracy of the fast simulators we construct and their dependence on the number of short paths requested from the microscale simulator.
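The core estimation step behind this kind of scheme, fitting local drift and diffusion coefficients from ensembles of short microscale bursts, can be sketched in one dimension. The following is a minimal illustration under invented assumptions (a double-well potential, and step size, burst length, and path counts chosen for the example), not the atlas construction from the thesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# "True" microscale 1-D SDE (unknown to the learner):
# dX = b(X) dt + sigma dW, with double-well potential U(x) = (x^2 - 1)^2.
def drift(x):
    return -4.0 * x * (x**2 - 1.0)

sigma, dt = 0.5, 1e-3

def short_paths(x0, n_paths, n_steps):
    """Run many short, parallelizable microscale bursts from x0 (Euler-Maruyama)."""
    x = np.full(n_paths, x0)
    for _ in range(n_steps):
        x = x + drift(x) * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_paths)
    return x

# Estimate the local drift and squared diffusion at a point from burst statistics:
# mean displacement / time, and displacement variance / time.
x0, n_steps = 0.5, 50
ends = short_paths(x0, n_paths=5000, n_steps=n_steps)
tau = n_steps * dt
b_hat = (ends.mean() - x0) / tau    # estimated drift at x0
s2_hat = ends.var() / tau           # estimated squared diffusion

print(f"drift at {x0}: true {drift(x0):.2f}, estimated {b_hat:.2f}")
print(f"sigma^2: true {sigma**2:.3f}, estimated {s2_hat:.3f}")
```

A full multiscale scheme would repeat this estimation in local charts across the state space and stitch the learned coefficients into a global macroscale simulator.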
Item Open Access Dimensionality Reduction and Learning on Networks (2011) Balachandran, Prakash
Machine learning is a powerful branch of mathematics and statistics that allows the automation of tasks that would otherwise take humans a long time to perform. Two particular fields of machine learning that have been developing over the last two decades are dimensionality reduction and semi-supervised learning.
Dimensionality reduction is a powerful tool in the analysis of high-dimensional data: it reduces the number of variables under consideration while approximately preserving some quantity of interest (usually pairwise distances). Methods such as Principal Component Analysis (PCA) or Isometric Feature Mapping (ISOMAP) do this by embedding the data, equipped with a nonnegative, symmetric similarity kernel or adjacency matrix, into Euclidean space and finding a linear subspace or low-dimensional submanifold which best fits the data, respectively.
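As a concrete illustration of the embedding step, here is a minimal PCA sketch on synthetic data (the data, dimensions, and noise level are invented for the example): points lying near a 2-D plane in R^10 are centered and projected onto the top right singular vectors, which recover the best-fit subspace.

```python
import numpy as np

rng = np.random.default_rng(1)

# 200 points near a 2-D plane embedded in R^10, plus small ambient noise.
basis = np.linalg.qr(rng.standard_normal((10, 2)))[0]   # orthonormal 10x2 basis
coords = rng.standard_normal((200, 2)) * [3.0, 1.0]     # planar coordinates
X = coords @ basis.T + 0.05 * rng.standard_normal((200, 10))

# PCA: center the data, then take the top-k right singular vectors
# as the best-fitting k-dimensional linear subspace.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
Y = Xc @ Vt[:k].T          # low-dimensional embedding of the data

# The top two singular values dominate: the plane captures almost all variance.
explained = (S[:k]**2).sum() / (S**2).sum()
print(f"variance explained by top {k} components: {explained:.3f}")
```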
When the data takes the form of a network, it remains an important open problem how to perform such dimensionality reduction intrinsically, without resorting to an embedding, in a way that extends to nonnegative, non-symmetric adjacency matrices. In the first part of my dissertation, using current techniques in local spectral clustering to partition the network via a Markov process induced by the adjacency matrix, we deliver an intrinsic dimensionality reduction of the network in terms of a non-Markov process on a reduced state space that approximately preserves transitions of the original Markov process between clusters. By iterating the process, one obtains a family of non-Markov processes on successively finer state spaces representing the original network and its diffusion at different scales, which can be used to approximate the law of the original process at a particular time scale. We give applications of this theory to a variety of synthetic data sets and evaluate its performance accordingly.
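The coarse-graining idea can be sketched on a toy network. This minimal example uses a plain global spectral split rather than the local spectral clustering techniques of the dissertation, and all sizes and weights are invented: build the random walk induced by the adjacency matrix, partition the nodes, and aggregate transitions between clusters weighted by the stationary distribution.

```python
import numpy as np

rng = np.random.default_rng(2)

# Block-structured network: two communities with weak cross links.
n = 20
A = rng.uniform(size=(2 * n, 2 * n)) * 0.02
A[:n, :n] += rng.uniform(size=(n, n))
A[n:, n:] += rng.uniform(size=(n, n))
A = (A + A.T) / 2

# Markov process induced by the adjacency matrix (row-stochastic).
P = A / A.sum(axis=1, keepdims=True)

# Simple spectral split: sign of the second eigenvector of P.
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
labels = (vecs[:, order[1]].real > 0).astype(int)

# Stationary distribution pi: left eigenvector of P for eigenvalue 1.
w, vl = np.linalg.eig(P.T)
pi = vl[:, np.argmax(w.real)].real
pi = pi / pi.sum()

# Coarse transition matrix between the two clusters, weighted by pi.
Pc = np.zeros((2, 2))
for i in (0, 1):
    for j in (0, 1):
        mask_i, mask_j = labels == i, labels == j
        Pc[i, j] = (pi[mask_i] @ P[np.ix_(mask_i, mask_j)].sum(axis=1)) / pi[mask_i].sum()

print("coarse transition matrix:\n", np.round(Pc, 3))
```

Because the communities are well separated, the coarse matrix is strongly diagonal: the coarse process rarely jumps between clusters, mirroring the original walk at a longer time scale.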
Next, consider the problem of detecting astronomical phenomena solely in terms of the light intensities observed. There already exists a large database of previously recorded phenomena that have been categorized by humans as a function of the observed light intensity. Given these so-called class labels, how can we automate the procedure of extending them to the massive amount of data that is currently being observed? This is the problem of concern in semi-supervised learning.
In the second part of my thesis, we consider data sets in which relations between data points are more complex than simply pairwise. Examples include gene networks, where the data points are random variables and similarities of a subset are measured by non-independence of the corresponding random variables. Such data sets can be represented as hypergraphs, and the natural question for diagnosis becomes: how does one perform transductive inference (a particular form of semi-supervised learning)? Using the simple case of pairwise and threewise similarities, we construct a reversible random walk on undirected edges induced by threewise relations (faces). By pulling the random walk back to a random walk on the vertex set and mixing it with the random walk induced by pairwise similarities, we perform diffusive transductive inference. We present applications and results of this technique, and analyze its performance on a variety of data sets.
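The mixing of pairwise and threewise walks can be sketched on a toy example. Everything here (the six-vertex graph, the faces, the mixing weight, the label-propagation scheme) is invented for illustration and is not the construction from the thesis: each face is pulled back to pairwise weights on its vertices, the two vertex-level walks are blended, and labels diffuse from two seed vertices.

```python
import numpy as np

# Toy data: 6 vertices, two classes; labels known for vertices 0 and 3.
n = 6
W2 = np.zeros((n, n))                       # pairwise similarities
for u, v in [(0, 1), (1, 2), (3, 4), (4, 5), (2, 3)]:
    W2[u, v] = W2[v, u] = 1.0
faces = [(0, 1, 2), (3, 4, 5)]              # threewise relations

# Pull each face back to pairwise weights: a walk on a face moves
# between any two of its three vertices.
W3 = np.zeros((n, n))
for a, b, c in faces:
    for u, v in [(a, b), (b, c), (a, c)]:
        W3[u, v] += 1.0
        W3[v, u] += 1.0

def walk(W):
    """Row-normalize a similarity matrix into a random walk."""
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    return W / d[:, None]

alpha = 0.5
T = alpha * walk(W2) + (1 - alpha) * walk(W3)   # mixed random walk

# Diffusive transductive inference: propagate labels, clamping the seeds.
f = np.zeros(n)
seeds = {0: 1.0, 3: -1.0}
for _ in range(200):
    f = T @ f
    for v, y in seeds.items():
        f[v] = y

pred = np.where(f > 0, "A", "B")
print(pred)
```

On this toy graph the face (0, 1, 2) keeps vertex 2 in class A despite its pairwise edge to vertex 3, so the threewise relations visibly influence the labeling.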
Item Open Access Estimating the Intrinsic Dimension of High-Dimensional Data Sets: A Multiscale, Geometric Approach (2011) Little, Anna Victoria
This work deals with the problem of estimating the intrinsic dimension of noisy, high-dimensional point clouds. A general class of sets is considered, consisting of sets which are locally well-approximated by k-dimensional planes but which are embedded in a D >> k dimensional Euclidean space. Assuming one has samples from such a set, possibly corrupted by high-dimensional noise, if the data is linear the dimension can be recovered using PCA. However, when the data is non-linear, PCA fails, overestimating the intrinsic dimension. A multiscale version of PCA is thus introduced which is robust to small sample size, noise, and non-linearities in the data.
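The multiscale idea can be sketched numerically. This is a crude illustration with invented parameters (a noisy circle, a hand-picked significance threshold), not the estimator of the thesis: local singular values are computed in balls of growing radius around a point, and the number of significant ones changes with scale. At tiny scales noise inflates the count, at large scales curvature does (the global PCA overestimate), and at intermediate scales the correct intrinsic dimension k = 1 appears.

```python
import numpy as np

rng = np.random.default_rng(3)

# A 1-D manifold (a circle) embedded in R^20 with small ambient noise:
# intrinsic dimension k = 1, ambient dimension D = 20.
m, D, noise = 2000, 20, 0.005
t = rng.uniform(0, 2 * np.pi, m)
X = np.zeros((m, D))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += noise * rng.standard_normal((m, D))

def local_singular_values(X, center, r):
    """Singular values of the centered points within distance r of center."""
    nbrs = X[np.linalg.norm(X - center, axis=1) < r]
    Z = nbrs - nbrs.mean(axis=0)
    return np.linalg.svd(Z, compute_uv=False) / np.sqrt(len(nbrs))

center = X[0]
k_est = {}
for r in (0.05, 0.3, 2.5):
    sv = local_singular_values(X, center, r)
    k_est[r] = int((sv > 0.2 * sv[0]).sum())   # crude significance threshold
    print(f"r={r}: k_est={k_est[r]}")
```

At r = 0.3 the single tangent direction dominates (k_est = 1); at r = 2.5 the ball contains the whole circle, and PCA sees the two dimensions it spans.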
Item Open Access Non-parametric approximate linear programming for MDPs (2012) Pazis, Jason
One of the most difficult tasks in value function based methods for learning in Markov Decision Processes is finding an approximation architecture that is expressive enough to capture the important structure in the value function, while at the same time not overfitting the training samples. This thesis presents a novel Non-Parametric approach to Approximate Linear Programming (NP-ALP), which requires nothing more than a smoothness assumption on the value function. NP-ALP can make use of real-world, noisy sampled transitions rather than requiring samples from the full Bellman equation, while providing the first known max-norm, finite sample performance guarantees for ALP under mild assumptions. Additionally, NP-ALP is amenable to problems with large (multidimensional) or even infinite (continuous) action spaces, and does not require a model to select actions using the resulting approximate solution.
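For context, here is the exact linear program that ALP methods approximate, solved for a tiny hand-built MDP. This sketch assumes SciPy is available and is not NP-ALP itself: it minimizes the sum of values subject to the Bellman inequality V(s) >= R(s, a) + gamma * sum_s' P(s'|s, a) V(s') for every state-action pair, whose optimum is the optimal value function.

```python
import numpy as np
from scipy.optimize import linprog

# A 2-state, 2-action MDP (invented for the example).
# Action 0 stays in the current state; action 1 switches state.
gamma = 0.9
n_states, n_actions = 2, 2
P = np.zeros((n_actions, n_states, n_states))
P[0] = np.eye(2)                          # action 0: stay
P[1] = np.array([[0.0, 1.0],
                 [1.0, 0.0]])             # action 1: switch
R = np.array([[1.0, 0.0],                 # R[s, a]
              [2.0, 0.0]])

# Exact LP: minimize sum_s V(s)
# subject to V(s) >= R(s, a) + gamma * P[a, s] . V for all (s, a),
# rewritten as (gamma * P[a, s] - e_s) . V <= -R(s, a).
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        row = gamma * P[a, s].copy()
        row[s] -= 1.0
        A_ub.append(row)
        b_ub.append(-R[s, a])

res = linprog(c=np.ones(n_states), A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states, method="highs")
V = res.x
print("optimal values:", np.round(V, 4))
```

For this MDP the optimal policy stays in state 1 forever (V(1) = 2 / (1 - 0.9) = 20) and switches out of state 0 (V(0) = 0.9 * 20 = 18), which the LP recovers exactly. The thesis replaces the explicit basis of such approximate LPs with a non-parametric smoothness constraint.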