# Browsing by Subject "density estimation"


Item Open Access: A Data-Retaining Framework for Tail Estimation (2020) Cunningham, Erika

Modeling of extreme data often involves thresholding, or retaining only the most extreme observations, in order that the tail may "speak" and not be overwhelmed by the bulk of the data. We describe a transformation-based framework that allows univariate density estimation to transition smoothly from a flexible, semi-parametric estimate of the bulk into a parametric estimate of the tail without thresholding. In the limit, this framework has desirable theoretical tail-matching properties to the selected parametric distribution. We develop three Bayesian models under the framework: one using a logistic Gaussian process (LGP) approach; one using a Dirichlet process mixture model (DPMM); and one using a predictive recursion approximation of the DPMM. The models produce estimates and intervals for density, distribution, and quantile functions across the full data range, as well as for the tail index (inverse-power-decay parameter), under an assumption of heavy tails. For each approach, we carry out a simulation study to explore the model's practical usage in non-asymptotic settings, comparing its performance to methods that involve thresholding.
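
The transformation idea can be sketched numerically. The toy example below is a minimal illustration, not the thesis's LGP or DPMM models: the data are pushed through an assumed Pareto CDF (playing the role of the parametric tail component), and the transformed values are then estimated flexibly with a logit-scale kernel density estimate standing in for the semi-parametric bulk estimate. The tail index `alpha` and all function names are illustrative assumptions.

```python
# Hypothetical sketch: a parametric CDF transform followed by flexible
# density estimation on (0, 1), then a change of variables back.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.pareto(2.0, size=500) + 1.0          # heavy-tailed sample on (1, inf)

# Parametric transform: Pareto CDF with an assumed tail index alpha.
alpha = 2.0
u = 1.0 - x ** (-alpha)                      # transformed data on (0, 1)

# Flexible estimate of the transformed density via a logit-scale KDE
# (a stand-in for the semi-parametric bulk model).
z = np.log(u / (1.0 - u))
kde = stats.gaussian_kde(z)

def density(x_new):
    """Density estimate on the original scale via change of variables."""
    u_new = 1.0 - x_new ** (-alpha)
    z_new = np.log(u_new / (1.0 - u_new))
    jac_logit = 1.0 / (u_new * (1.0 - u_new))    # dz/du
    jac_cdf = alpha * x_new ** (-alpha - 1.0)    # du/dx (Pareto density)
    return kde(z_new) * jac_logit * jac_cdf

print(density(np.array([1.5, 3.0, 10.0])))
```

Because the transform is one-to-one, no observation is discarded: the bulk informs the flexible piece while the parametric CDF controls tail behavior.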

Among the three models proposed, the LGP has the lowest bias through the bulk and generally the highest quantile interval coverage. Compared to thresholding methods, its tail predictions have lower root mean squared error (RMSE) in all scenarios but the most complicated, e.g., one with a sharp bulk-to-tail transition. The LGP's consistent underestimation of the tail index does not hinder tail estimation in pre-extrapolation to moderate-extrapolation regions, but it does affect extreme extrapolations.

An interplay between the parametric transform and the natural sparsity of the DPMM sometimes causes the DPMM to favor estimation of the bulk over estimation of the tail. This can be overcome by increasing prior precision on less sparse (flatter) base-measure density shapes. A finite mixture model (FMM), substituted for the DPMM in simulation, proves effective at reducing tail RMSE over thresholding methods in some, but not all, scenarios and quantile levels.

The predictive recursion marginal posterior (PRMP) model is fast and does the best job among proposed models of estimating the tail-index parameter. This allows it to reduce RMSE in extrapolation over thresholding methods in most scenarios considered. However, bias from the predictive recursion contaminates the tail, casting doubt on the PRMP's predictions in tail regions where data should still inform estimation. We recommend the PRMP model as a quick tool for visualizing the marginal posterior over transformation parameters, which can aid in diagnosing multimodality and informing the precision needed to overcome sparsity in the mixture model approach.
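
The speed of the predictive recursion comes from its one-pass update of the mixing distribution. The sketch below implements Newton's predictive recursion for a toy normal-location mixture; the grid, N(θ, 1) kernel, and weight schedule are illustrative choices, not the settings used in the thesis.

```python
# Minimal sketch of Newton's predictive recursion for a normal
# location mixture on a grid (illustrative settings throughout).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 150), rng.normal(2, 1, 150)])

theta = np.linspace(-6, 6, 241)          # grid for the mixing distribution
d_theta = theta[1] - theta[0]
f = np.full_like(theta, 1.0 / 12.0)      # uniform initial guess on [-6, 6]

for i, xi in enumerate(rng.permutation(x), start=1):
    w = (i + 1.0) ** -0.67                         # decaying weight sequence
    k = stats.norm.pdf(xi, loc=theta, scale=1.0)   # N(theta, 1) kernel
    m = np.sum(k * f) * d_theta                    # predictive density at xi
    f = (1.0 - w) * f + w * k * f / m              # recursion update

def mix_density(x_new):
    """Mixture density implied by the recursively estimated mixing weights."""
    k = stats.norm.pdf(x_new[:, None], loc=theta[None, :], scale=1.0)
    return k @ f * d_theta

print(mix_density(np.array([-2.0, 0.0, 2.0])))
```

Each observation is used once, with no MCMC, which is why this style of approximation is fast enough to serve as a diagnostic visualization tool.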

In summary, there is not enough information in the likelihood alone to prevent the bulk from overwhelming the tail. However, a model that harnesses the likelihood with a carefully specified prior can allow both the bulk and tail to speak without an explicit separation of the two. Moreover, retaining all of the data under this framework reduces quantile variability, improving prediction in the tails compared to methods that threshold.

Item Open Access: Evaluation of Quantitative Potential of Breast Tomosynthesis Using a Voxelized Anthropomorphic Breast Phantom (2010) Mehtaji, Deep Sunil

Purpose: To assess the quantitative potential of breast tomosynthesis by estimating the percent density of voxelized anthropomorphic breast phantoms.

Methods and Materials: A Siemens breast tomosynthesis system was modeled using Monte Carlo methods and a voxelized anthropomorphic breast phantom. The images generated by the simulation were reconstructed using Siemens filtered back-projection software. The non-uniform background due to scatter, heel effect, and limited angular sampling was estimated by simulating and subtracting images of a uniform 100% fatty breast phantom. To estimate the density of each slice, the total number of fatty and glandular voxels was calculated both before and after applying a thresholding algorithm to classify voxels as fat vs. glandular. Finally, the estimated density of the reconstructed slice was compared to the known percent density of the corresponding slice from the voxelized phantom. This percent density estimation comparison was done for a 35%- and a 60%-dense 5 cm breast phantom.
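
The percent-density-by-thresholding step can be illustrated on synthetic data. The slice, noise level, and 0.5 threshold below are invented for illustration and are not the values from the Siemens simulation; fat and glandular tissue are assumed to map to low and high voxel values.

```python
# Illustrative percent-density computation for a reconstructed slice
# (toy data: ground-truth voxel labels plus additive acquisition noise).
import numpy as np

rng = np.random.default_rng(2)

# Toy "reconstructed slice": ~35% glandular voxels plus noise.
truth = rng.random((64, 64)) < 0.35            # True = glandular voxel
slice_vals = np.where(truth, 1.0, 0.0) + rng.normal(0, 0.2, (64, 64))

# Threshold halfway between the nominal fat and glandular values.
classified = slice_vals > 0.5

percent_density = 100.0 * classified.mean()
agreement = 100.0 * (classified == truth).mean()
print(f"estimated density {percent_density:.1f}%, "
      f"voxel agreement {agreement:.1f}%")
```

Comparing `percent_density` against the known phantom value mirrors the slice-level comparison described above, and `agreement` mirrors the voxel-to-voxel matching.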

Results: Without thresholding, overall density estimation errors for the central eleven slices were 4.97% and 2.55% for the 35%- and 60%-dense phantoms, respectively. After thresholding to classify voxels as fat vs. glandular, errors for the central eleven slices were 7.99% and 6.26%, respectively. Voxel-to-voxel matching of the phantom and reconstructed slices demonstrated that 75.69% and 75.25% of voxels, respectively, were correctly classified.

Conclusion: The errors in slice density estimation were <8% for both phantoms, implying that quantification of breast density using tomosynthesis is possible. However, limitations of the acquisition and reconstruction process continue to pose challenges for density estimation, leading to voxel-to-voxel errors that warrant further investigation.

Item Open Access: Topics in Bayesian Computer Model Emulation and Calibration, with Applications to High-Energy Particle Collisions (2019) Coleman, Jacob Ryan

Problems involving computer model emulation arise when scientists simulate expensive experiments with computer models that are themselves computationally expensive. To probe the experimental design space more quickly, statisticians build emulators that act as fast surrogates for the expensive computer models. The emulators are typically Gaussian processes, chosen to induce spatial correlation across the input space. Often the main scientific interest lies in inference on one or more input parameters of the computer model that do not vary in nature. Inference on these inputs is referred to as "calibration," and the inputs themselves are referred to as "calibration parameters." We first detail our emulation and calibration model for an application in high-energy particle physics; this model brings together existing ideas in the literature on handling multivariate output and lays out a foundation for the remainder of the thesis.
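
As a rough sketch of the emulation idea (not the thesis's multivariate model), the surrogate below interpolates a handful of runs of a toy "simulator" with a squared-exponential Gaussian-process covariance. The function names, length-scale, and design are all illustrative assumptions.

```python
# Minimal GP emulator: posterior mean interpolation of a few
# runs of a stand-in for an expensive computer model.
import numpy as np

def simulator(t):
    # Toy stand-in for the expensive computer model.
    return np.sin(3.0 * t) + 0.5 * t

X = np.linspace(0.0, 2.0, 10)          # small design of simulator runs
y = simulator(X)

def sq_exp(a, b, ell=0.4):
    """Squared-exponential covariance between two 1-D input sets."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

K = sq_exp(X, X) + 1e-6 * np.eye(len(X))   # jitter for numerical stability
alpha = np.linalg.solve(K, y)

def emulate(x_new):
    """GP posterior mean (zero prior mean, noise-free observations)."""
    return sq_exp(x_new, X) @ alpha

x_test = np.array([0.7, 1.3])
print(emulate(x_test), simulator(x_test))
```

Once `alpha` is precomputed, each prediction costs only a covariance evaluation and a dot product, which is what makes the emulator cheap to query during calibration.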

In the next two chapters, we introduce novel ideas in the field of computer model emulation and calibration. The first addresses model comparison in this context: how to compare competing computer models while simultaneously performing calibration. Using a mixture model to facilitate the comparison, we demonstrate that by conditioning on the mixture parameter we can recover the calibration parameter posterior from an independent calibration model. This mixture is then extended to the case of correlated data, a crucial innovation for making the comparison framework useful in the particle collision setting. Lastly, we explore two possible non-exchangeable mixture models, in which model preference changes over the input space.

The second novel idea addresses density estimation when only coarse bin counts are available. We develop an estimation method that avoids costly numerical integration and maintains plausible correlation between nearby bins. Additionally, we extend the method to density regression, so that a full density can be predicted from an input parameter after training only on coarse histograms. This enables inference on the input parameter, and we develop an importance sampling method that compares favorably to the foundational calibration method detailed earlier.
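
One way to see how a density can be fit to bin counts without numerical integration is to express the bin probabilities as closed-form CDF differences inside a multinomial likelihood. The single-normal model below is a deliberately simple stand-in for the richer method developed in the thesis; the bin edges and data are invented for illustration.

```python
# Hedged illustration: maximum-likelihood fit of a density to coarse
# bin counts, with bin probabilities as closed-form CDF differences.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(3)
edges = np.linspace(-4, 4, 9)                      # 8 coarse bins
counts, _ = np.histogram(rng.normal(0.5, 1.2, 1000), bins=edges)

def neg_log_lik(params):
    mu, log_sigma = params
    cdf = stats.norm.cdf(edges, mu, np.exp(log_sigma))
    probs = np.clip(np.diff(cdf), 1e-12, None)     # bin probabilities
    return -np.sum(counts * np.log(probs))         # multinomial log-likelihood

res = optimize.minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)
```

Because every bin probability is a difference of two CDF evaluations, no quadrature is needed, and the smooth parametric form automatically ties nearby bins together.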