Browsing by Subject "Clustering"
Item (Open Access): Bidding For Parking: The Impact of University Affiliation on Predicting Bid Values in Dutch Auctions of On-Campus Parking Permits (2016-06-14)
Kelly, Grant
Parking is often underpriced, and expanding its capacity is expensive; universities need a way to reduce congestion other than building costly parking garages. Demand-based pricing mechanisms, such as auctions, offer a possible solution by promising to reduce parking demand at peak times. However, faculty, students, and staff at universities have systematically different parking needs, which lead to different parking valuations. In this study, I determine the impact of university affiliation on predicting the bid values cast in three Dutch auctions of on-campus parking permits sold at Chapman University in Fall 2010. Using clustering techniques, cross-checked against university demographic information, to detect affiliation groups, I ran a log-linear regression and found that university affiliation had a larger effect on bid amount than lot location and fraction of auction duration did. In general, faculty were predicted to bid higher, whereas students were predicted to bid lower.
Item (Open Access): Clustering Multiple Related Datasets with a Hierarchical Dirichlet Process (2011)
de Oliveira Sales, Ana Paula
I consider the problem of clustering multiple related groups of data. My approach uses mixture models in the context of hierarchical Dirichlet processes, focusing on their ability to perform inference on the unknown number of components in the mixture and to facilitate the sharing of information and borrowing of strength across the various data groups. Here, I build upon the hierarchical Dirichlet process model proposed by Muller et al. (2004), revising some relevant aspects of the model and improving the MCMC sampler's convergence by combining local Gibbs sampler moves with global Metropolis-Hastings split-merge moves.
I demonstrate the strengths of my model by employing it to cluster both synthetic and real datasets.
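The Dirichlet process prior is what lets a mixture model of this kind infer an unknown number of components. As an illustration only, the sketch below draws cluster labels from the Chinese restaurant process, the standard sequential representation of a single Dirichlet process with concentration parameter `alpha`. It does not implement the hierarchical sharing across groups or the Gibbs/split-merge sampler described above, and the function name and parameters are assumptions.

```python
import random


def sample_crp(n, alpha, seed=None):
    """Draw cluster labels for n items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster k with probability
    counts[k] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha), so the number of clusters is not fixed in advance.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n):
        # Unnormalized weights: one per existing cluster, plus one for a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


labels, counts = sample_crp(20, alpha=1.0, seed=0)
print(len(counts), "clusters with sizes", counts)
```

Larger `alpha` values tend to produce more clusters; in a full Dirichlet process mixture, these prior weights would be combined with each component's likelihood when labels are resampled.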
Item (Open Access): Computational Methods for Investigating Dendritic Cell Biology (2011)
de Oliveira Sales, Ana Paula
The immune system is constantly faced with the daunting task of protecting the host from a large number of ever-evolving pathogens. In vertebrates, the immune response results from the interplay of two cellular systems: innate immunity and adaptive immunity. Over the past decades, dendritic cells have emerged as major players in modulating the immune response, serving as one of the primary links between these two branches of the immune system.
Dendritic cells are pathogen-sensing cells that alert the rest of the immune system to the presence of infection. The signals sent by dendritic cells result in the recruitment of the appropriate cell types and molecules required to clear the infection effectively. A question of utmost importance for our understanding of the immune response, and for our ability to manipulate it in developing vaccines and therapies, is: "How do dendritic cells translate the various cues they perceive from the environment into different signals that specifically activate the appropriate parts of the immune system, resulting in an immune response streamlined to clear the given pathogen?"
Here we have developed computational and statistical methods aimed at addressing specific aspects of this question. In particular, understanding how dendritic cells ultimately modulate the immune response requires an understanding of the subtleties of their maturation process in response to different environmental signals. Hence, the first part of this dissertation focuses on elucidating the changes in the transcriptional program of dendritic cells in response to the detection of two common pathogen-associated molecules, LPS and CpG. We have developed a method based on Langevin and Dirichlet processes to model and cluster temporal gene expression data, and have used it to identify, on a large scale, genes that present unique and common transcriptional behaviors in response to these two stimuli. We have also investigated a different but related aspect of dendritic cell modulation of the adaptive immune response: in the second part of this dissertation, we present a method to predict peptides that will bind to MHC molecules, a requirement for the activation of pathogen-specific T cells. Together, these studies contribute to the elucidation of important aspects of dendritic cell biology.
Item (Open Access): Dose-Guided Automatic IMRT Planning: A Feasibility Study (2014)
Sheng, Yang
Purpose: To develop and evaluate an automatic IMRT planning technique for prostate cancer that uses the dose distributions of prior expert plans as guidance.
Methods and Materials: In this study, the anatomical information of prostate cancer cases was parameterized and quantified into two measures: the percent distance-to-prostate (PDP) and the concaveness angle. Based on these two quantities, a plan atlas composed of 5 expert prostate IMRT plans was built out of a 70-case pool at our institution using k-medoids clustering analysis.
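The atlas construction above selects representative cases with k-medoids, whose cluster centers are always actual cases. As a hedged sketch, assuming each case is reduced to a two-dimensional feature vector (PDP, concaveness angle) and that Euclidean distance is appropriate (the features would normally be standardized first), a small k-medoids routine with a greedy PAM-style initialization and alternating assign/update steps could look like this. The study's exact k-medoids variant and distance metric are not stated, and all names and data here are hypothetical.

```python
import math


def k_medoids(points, k, max_iter=100):
    """Cluster `points` into k groups whose centers (medoids) are data points.

    Returns (medoid_indices, labels), where labels[i] indexes into medoid_indices.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    def total_cost(medoids):
        # Sum of each point's distance to its nearest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    def assign(medoids):
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Greedy PAM-style BUILD: add, one at a time, the point that lowers the cost most.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(medoids + [c])))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the medoid of an empty cluster
                continue
            # New medoid: the member with the smallest summed distance to the others.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Hypothetical (PDP, concaveness angle) features, already scaled; not study data.
cases = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medoids, labels = k_medoids(cases, k=2)
print(medoids, labels)  # → [3, 0] [1, 1, 1, 0, 0, 0]
```

Because medoids are real cases, each cluster's medoid can serve directly as an atlas entry whose expert plan is reused.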
An additional 20 cases were used as query cases to evaluate the dose-guided automatic planning (DAP) scheme. Each query case was matched to an atlas case based on PTV-OAR anatomical features, followed by deformable registration to enhance fine local matching. Using the deformation field, the expert dose in the matched atlas case was warped onto the query case, creating a goal dose conformal to the query case's anatomy. Dose-volume histogram (DVH) objectives were sampled from the goal dose to guide automatic IMRT treatment planning. Dosimetric comparisons between DAP plans and clinical plans were performed.
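Sampling DVH objectives from the goal dose can be pictured with a cumulative DVH. A minimal sketch, assuming equal-volume voxels and that each objective is a (dose, volume-fraction) point read off the goal dose of one structure; the study's actual sampling points and objective types are not given, and the function names and dose values are hypothetical.

```python
import math


def cumulative_dvh(voxel_doses, dose_levels):
    """Fraction of a structure's volume receiving at least each dose level.

    Assumes every voxel has the same volume.
    """
    n = len(voxel_doses)
    return [sum(d >= level for d in voxel_doses) / n for level in dose_levels]


def sample_dvh_objectives(voxel_doses, volume_fractions):
    """Return (D_v, v) pairs: D_v is the minimum dose received by the hottest
    fraction v (0 < v <= 1) of the structure's voxels in the goal dose."""
    ordered = sorted(voxel_doses, reverse=True)
    n = len(ordered)
    objectives = []
    for v in volume_fractions:
        idx = max(0, math.ceil(v * n) - 1)  # last voxel inside the hottest fraction v
        objectives.append((ordered[idx], v))
    return objectives


goal_dose_gy = [10, 20, 30, 40, 50]  # hypothetical goal-dose voxel values for one organ
print(cumulative_dvh(goal_dose_gy, [0, 25, 50, 60]))         # → [1.0, 0.6, 0.2, 0.0]
print(sample_dvh_objectives(goal_dose_gy, [0.2, 0.5, 1.0]))  # → [(50, 0.2), (30, 0.5), (10, 1.0)]
```

Each sampled pair could then be passed to the optimizer as an upper DVH constraint for that organ.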
Results: Goal-dose generation is highly efficient using MIM™ workflows. The deformable registration provides a high-quality goal dose tailored to the query case's anatomy in terms of both the dose falloff at the PTV-OAR boundary and the overall conformity. Automatic planning in Eclipse™ takes ~2.5 min (~70 iterations) without human intervention. Compared with clinical plans, DAP plans improved the conformity index from 0.85±0.04 to 0.88±0.02 (p=0.0045), the bladder gEUD from 40.7±3.2 Gy to 40.0±3.1 Gy (p=0.0003), and the rectum gEUD from 40.4±2.0 Gy to 39.9±2.1 Gy (p=0.0167). Other dosimetric parameters were similar (p>0.05): homogeneity indices were 7.4±0.9% and 7.1±1.5% for DAP plans and clinical plans, respectively.
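The bladder and rectum gEUD values above summarize each dose distribution as a single equivalent dose. One widely used form is gEUD = (Σ_i v_i D_i^a)^(1/a), where v_i is the fractional volume receiving dose D_i and a is a tissue-specific parameter (large positive a for serial organs). The abstract does not state which a was used, so the sketch below takes it as an input; the function name and doses are hypothetical.

```python
def geud(voxel_doses, a, voxel_volumes=None):
    """Generalized equivalent uniform dose: (sum_i v_i * D_i**a) ** (1/a).

    v_i are fractional volumes (voxel volumes normalized to sum to 1).
    Equal voxel volumes are assumed when voxel_volumes is None.
    a must be nonzero; a > 1 weights hot spots more heavily.
    """
    if voxel_volumes is None:
        voxel_volumes = [1.0] * len(voxel_doses)
    total = sum(voxel_volumes)
    weighted = sum((v / total) * d ** a for d, v in zip(voxel_doses, voxel_volumes))
    return weighted ** (1.0 / a)


doses_gy = [30.0, 40.0, 50.0]  # hypothetical voxel doses
print(geud(doses_gy, a=1))  # a = 1 reduces to the mean dose
print(geud(doses_gy, a=8))  # larger a moves gEUD toward the maximum dose
```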
Conclusions: Dose-guided automatic treatment planning is feasible and efficient. Atlas-based patient-specific dose objectives can effectively guide the optimizer to achieve similar or better plan quality compared to clinical plans.
Item (Open Access): Equity Clusters Through the Lens of Realized Semicorrelations
Bollerslev, Tim; Patton, Andrew J; Zhang, Haozhe
Item (Open Access): Exploration and Application of Dimensionality Reduction and Clustering Techniques to Diabetes Patient Health Records (2017-05-24)
Gopinath, Sidharth
This research examines various data dimensionality reduction techniques and clustering methods. The goal was to apply these ideas to a test dataset and a healthcare dataset to see how they work in practice and what conclusions we could draw from their application. Specifically, we hoped to identify similar clusters of diabetes patients and to develop hypotheses about the risk of adverse events for further research into subpopulations of diabetes patients. Upon further research and application, it became apparent that the dimensionality reduction and clustering methods are sensitive to their parameter settings and must be carefully tuned to be successful. Additionally, we saw several statistically significant differences in outcomes among the clusters identified in these data. We focused on coronary artery disease and kidney disease. Focusing on these clusters, we found a high proportion of patients taking medications for heart or kidney conditions. Based on these findings, we were able to decide on future paths, building upon this research, that could lead to more actionable conclusions.
Item (Open Access): Heterogeneity in Mortgage Refinancing (2022-06-22)
Wu, Julia
Many households that would benefit from refinancing their mortgages, and are eligible to do so, fail to refinance. A recent literature has demonstrated a significant degree of heterogeneity in the propensity to refinance across various dimensions, yet much heterogeneity remains unexplained. In this paper, I use a clustering regression to characterize heterogeneity in mortgage refinancing by estimating the distribution of propensities to refinance.
A key novelty of my approach is that I do so without relying on borrower characteristics, which allows me to recover the full degree of heterogeneity rather than only the extent to which the propensity to refinance varies with a given observable. I then explore the role of both observed and unobserved heterogeneity in group placement by regressing group estimates on a set of demographic characteristics. As a complement to this analysis, I provide evidence from a novel dataset with detailed information on borrower perspectives on mortgage refinancing, to paint a more nuanced picture of how household characteristics and behavioral mechanisms play into the decision to refinance. I find a significant degree of heterogeneity in both the average and the marginal propensity to refinance across households. While observables such as education, race, and income do correlate significantly with group heterogeneity, much of the heterogeneity can still be attributed to the presence of unobservable characteristics.
Item (Open Access): Malaria Risk Factors in the Peruvian Amazon: A Multilevel Analysis (2012)
Lana, Justin Thomas
A multilevel analysis of malaria risk factors was conducted using data gathered from community-wide surveillance along the Iquitos-Mazan Road and the Napo River in Loreto, Peru. In total, 1,650 individuals nested within 338 households nested within 18 communities were included in the study. Personal travel (odds ratio [OR] = 2.48; 95% confidence interval [CI] = 1.46, 4.21) and the malaria status of other household members (OR = 2.54; 95% CI = 1.49, 4.32) were both associated with increased odds of having a malaria episode. Having a large household (>5 individuals) (OR = 0.33; 95% CI = 0.12, 0.93), the presence of a community health post / secondary school (OR = 0.26; 95% CI = 0.08, 0.80), and the presence of a church (OR = 0.33; 95% CI = 0.30, 0.78) were associated with lower odds of having a malaria episode.
Malaria clustering was evident: 54% of the malaria burden occurred in only 6% of the households surveyed.
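A figure such as "54% of the malaria burden in 6% of households" is a concentration measure. A minimal sketch of how such a share could be computed from per-household episode counts, assuming burden is measured as episode counts and households are ranked from most to fewest episodes; the function name is hypothetical and the example counts are made up for illustration, not taken from the study.

```python
import math


def burden_share_of_top_households(cases_per_household, household_fraction):
    """Share of all cases that falls in the top `household_fraction` of households,
    ranked from most to fewest cases."""
    counts = sorted(cases_per_household, reverse=True)
    top_n = math.ceil(household_fraction * len(counts))  # round the household count up
    total = sum(counts)
    return sum(counts[:top_n]) / total if total else 0.0


# Hypothetical episode counts for 10 households (not study data).
cases = [5, 3, 1, 1, 0, 0, 0, 0, 0, 0]
print(burden_share_of_top_households(cases, 0.1))  # → 0.5
print(burden_share_of_top_households(cases, 0.2))  # → 0.8
```

Rounding the household count up is one convention among several; with 338 households, 6% corresponds to about 20 households.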
Item (Open Access): Peer Effects & Differential Attrition: Evidence from Tennessee’s Project STAR (2022-04-08)
Satish, Sanjay
This paper explores the effects of attrition on student development in early education. It aims to provide evidence that students' departure from elementary schools has educational impacts on the classmates they leave behind. Using data from Tennessee’s Project STAR experiment, this paper aims to expand the literature on peer effects, as well as on attrition, in public elementary schools. It departs from previous papers by using survival analysis to determine which student characteristics prolonged participation in the experiment. Clustering analysis is then employed to group departed students, to better understand the various channels of attrition present in STAR. The paper finds that students who left Project STAR were more likely to be of lower income and lower ability than their peers. It then uses these findings to estimate the peer effects of attrition on students who remained in the experiment, and discusses potential sources of bias in this estimation and their effects on the explanatory power of the peer-effects estimates.
Item (Open Access): Risk Price Variation: The Missing Half of Empirical Asset Pricing (Economic Research Initiatives at Duke (ERID) Working Paper, 2019-05-24)
Patton, AJ; Weller, BM
Item (Open Access): Separating Features from Noise with Persistence and Statistics (2010)
Wang, Bei
In this thesis, we explore techniques in statistics and persistent homology that detect features in data sets such as graphs, triangulations, and point clouds. We accompany our theorems with algorithms and experiments to demonstrate their effectiveness in practice.
We start with the derivation of graph scan statistics, a measure useful for assessing the statistical significance of a subgraph in terms of edge density. We cluster graphs into densely connected subgraphs based on this measure. We give algorithms for finding such clusterings and run experiments on real-world data.
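The abstract does not state the exact form of the graph scan statistic. A minimal sketch, assuming a Bernoulli likelihood-ratio scan statistic in the style of Kulldorff's spatial scan statistic: it compares the edge density inside a candidate vertex set with the density among all remaining vertex pairs and returns 0 unless the subgraph is denser than the rest. The formulation and all names are assumptions, not the thesis's definition.

```python
import math


def _bernoulli_loglik(k, m):
    """Maximized log-likelihood of k successes in m Bernoulli trials (0 log 0 := 0)."""
    ll = 0.0
    if k > 0:
        ll += k * math.log(k / m)
    if m - k > 0:
        ll += (m - k) * math.log((m - k) / m)
    return ll


def edge_density_scan_statistic(n_vertices, edges, subset):
    """Log-likelihood ratio for 'pairs inside `subset` connect more often than
    pairs elsewhere' versus a single edge probability for the whole simple graph."""
    subset = set(subset)
    edge_set = {frozenset(e) for e in edges}
    total_pairs = n_vertices * (n_vertices - 1) // 2
    total_edges = len(edge_set)
    in_pairs = len(subset) * (len(subset) - 1) // 2
    in_edges = sum(1 for e in edge_set if e <= subset)
    out_pairs = total_pairs - in_pairs
    out_edges = total_edges - in_edges
    # Only reward subgraphs whose edge density exceeds that of the rest of the graph.
    if in_pairs == 0 or out_pairs == 0 or in_edges * out_pairs <= out_edges * in_pairs:
        return 0.0
    return (_bernoulli_loglik(in_edges, in_pairs)
            + _bernoulli_loglik(out_edges, out_pairs)
            - _bernoulli_loglik(total_edges, total_pairs))


# Hypothetical 6-vertex graph: a triangle {0, 1, 2} plus two sparse edges.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (4, 5)]
print(edge_density_scan_statistic(6, edges, {0, 1, 2}))  # dense triangle: high score
print(edge_density_scan_statistic(6, edges, {3, 4}))     # no internal edge: 0.0
```

Finding a clustering would then mean searching over candidate vertex sets for high scores; that search is not shown here.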
We next study statistics on persistence for piecewise-linear functions defined on triangulations of topological spaces. We derive persistence pairing probabilities among vertices in the triangulation. We also provide upper bounds on the expected total persistence.
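Persistence pairing can be illustrated in its simplest case. A minimal sketch, assuming a lower-star filtration of a function given by its values at the vertices of a graph (for example, the 1-skeleton of a triangulation): it computes only 0-dimensional pairs with union-find and the elder rule, and takes total persistence as the sum of finite lifetimes (some definitions raise each lifetime to a power p). It does not reproduce the thesis's probabilistic analysis, and the function names are hypothetical.

```python
import math


def zeroth_persistence_pairs(values, edges):
    """0-dimensional persistence pairs of the lower-star filtration of a graph.

    values[v] is the function value at vertex v and edges lists (u, v) pairs.
    Components are born at local minima and die when they merge into an older
    component (the elder rule). Zero-persistence pairs are omitted, and each
    component that never merges is reported with death = math.inf.
    """
    n = len(values)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    parent = list(range(n))
    birth = list(values)  # birth[r] = birth value of the component rooted at r

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    processed = [False] * n
    pairs = []
    for v in sorted(range(n), key=lambda i: values[i]):
        processed[v] = True
        for u in neighbors[v]:
            if not processed[u]:
                continue
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            # Elder rule: the component with the later birth dies at values[v].
            young, old = (ru, rv) if birth[ru] > birth[rv] else (rv, ru)
            if values[v] > birth[young]:
                pairs.append((birth[young], values[v]))
            parent[young] = old

    roots = {find(v) for v in range(n)}
    pairs.extend((birth[r], math.inf) for r in roots)
    return pairs


def total_persistence(pairs):
    """Sum of death - birth over the finite pairs."""
    return sum(d - b for b, d in pairs if d != math.inf)


# Piecewise-linear function on the path 0-1-2-3-4 with values 0, 3, 1, 4, 2.
values = [0, 3, 1, 4, 2]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
pairs = zeroth_persistence_pairs(values, edges)
print(sorted(pairs))             # → [(0, inf), (1, 3), (2, 4)]
print(total_persistence(pairs))  # → 4
```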
We continue by examining the elevation function defined on the triangulation of a surface. Its local maxima, obtained by persistence pairing, are useful in describing features of the triangulations of protein surfaces. We describe an algorithm to compute these local maxima that in practice runs ten thousand times faster than the previous method. We relate this improvement to the total Gaussian curvature of the surfaces.
Finally, we study a stratification learning problem: given a point cloud sampled from a stratified space, which points belong to the same stratum at a given scale level? We assess the local structure of a point in relation to its neighbors using kernel and cokernel persistent homology. We prove the effectiveness of this assessment through several inference theorems, under the assumption of a dense sample. The topological inference theorem relates the sample density to the homological feature size. The probabilistic inference theorem provides sample estimates for assessing the local structure with confidence. We describe an algorithm that computes the kernel and cokernel persistence diagrams and prove its correctness. We also run experiments on simple synthetic data.
Item (Open Access): Topic Modeling for Inferring Brain States from Electroencephalography (EEG) Signals (2018)
Prabhudesai, Kedar
Inferring brain states from EEG signals supports the management of sleep disorders and brain diseases by providing insight into the electrophysiological state of the brain. We explore the use of topic models, a popular family of text-processing algorithms, to infer brain states from EEG signals. Latent Dirichlet allocation (LDA) is our preferred topic model because of its mixture-of-mixtures nature and its ability to be trained in an unsupervised manner.
First, we present the architecture of a deep convolutional auto-encoder neural network that automatically learns feature representations from EEG signals. The network uses a combination of convolutional and max-pooling layers to reduce the dimensionality of the raw data, and can be trained in an unsupervised manner. We demonstrate an improvement in clustering EEG signals into sleep stages with the LDA topic model when using features derived from the auto-encoder, compared with standard manually extracted EEG features.
Next, we address the issue of modeling continuous-domain data with topic models. In the LDA topic model, topics are modeled as discrete distributions over a finite vocabulary of words. Modeling data spanning a continuous domain with LDA requires discrete approximations of the continuous data, which can lead to loss of information and may not represent the true structure of the underlying data. We present the GMM-LDA topic model, in which topics are represented using Gaussian mixture models (GMMs), multimodal distributions spanning a continuous domain. We present results demonstrating superior performance in clustering EEG data into sleep stages with the GMM-LDA topic model compared with standard LDA and other clustering algorithms.
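The contrast between discrete LDA topics and GMM topics can be seen on a single continuous feature. A minimal sketch, assuming a one-dimensional feature and made-up topic parameters: a discretized topic assigns the same probability to every value that falls in a bin, whereas a GMM topic gives a density that varies continuously within that bin. It shows only how a topic scores a value, not how GMM-LDA is trained, and all names and numbers are hypothetical.

```python
import math


def gmm_topic_density(x, weights, means, stds):
    """Density of a 1-D Gaussian mixture 'topic' at feature value x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )


def discrete_topic_prob(x, bin_edges, bin_probs):
    """Probability a discrete topic assigns to the bin (word) containing x."""
    for lo, hi, p in zip(bin_edges, bin_edges[1:], bin_probs):
        if lo <= x < hi:
            return p
    return 0.0  # value falls outside the discretized vocabulary


# Hypothetical topic with modes at 1.0 and 4.0, and a three-bin vocabulary.
weights, means, stds = [0.6, 0.4], [1.0, 4.0], [0.5, 0.8]
bin_edges, bin_probs = [0.0, 2.0, 4.0, 6.0], [0.6, 0.1, 0.3]
for x in (0.2, 1.0, 1.9):
    # Same bin, same discrete probability, but different GMM densities.
    print(x, discrete_topic_prob(x, bin_edges, bin_probs),
          round(gmm_topic_density(x, weights, means, stds), 4))
```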
Finally, we explore a set of features that could potentially be used with topic modeling to infer brain states corresponding to brain injury in mice. Spectral, entropy, and moment-related features are extracted from EEG signals recorded from mice with artificially induced brain injury. We present an analysis of the relative importance of these features using bagged decision trees, and demonstrate that a combination of these features can potentially be used to track the progression of brain injury and to predict recovery from brain injury in mice.