Browsing by Subject "statistical modeling"
Item Open Access An Analysis of NBA Spatio-Temporal Data (2017) Robertson, Megan
This project examines the utility of spatio-temporal tracking data from professional basketball games by fitting models that predict whether a player will make a shot. The first part of the project explored the data, evaluated its issues, and generated features to use as covariates in the models. The second part fit various classification models and evaluated their predictive performance. The paper concludes with a discussion of methods to improve the models and of future work.
Item Open Access An Ensemble Approach to Knowledge-Based Intensity-Modulated Radiation Therapy Planning. (Frontiers in oncology, 2018-01) Zhang, Jiahan; Wu, Q Jackie; Xie, Tianyi; Sheng, Yang; Yin, Fang-Fang; Ge, Yaorong
Knowledge-based planning (KBP) utilizes experienced planners' knowledge embedded in prior plans to estimate the optimal achievable dose-volume histogram (DVH) of new cases. In the regression-based KBP framework, previously planned patients' anatomical features and DVHs are extracted, and prior knowledge is summarized as the regression coefficients that transform features into organ-at-risk DVH predictions. In our study, we find that different regression methods work better in different settings. To improve the robustness of KBP models, we propose an ensemble method that combines the strengths of various linear regression models, including stepwise, lasso, elastic net, and ridge regression. In the ensemble approach, we first obtain individual model prediction metadata using in-training-set leave-one-out cross validation. A constrained optimization is subsequently performed to decide individual model weights. The metadata is also used to filter out impactful training set outliers. We evaluate our method on a fresh set of retrospectively retrieved anonymized prostate intensity-modulated radiation therapy (IMRT) cases and head and neck IMRT cases. The proposed approach is more robust than the individual models against small training set size, wrongly labeled cases, and dosimetrically inferior plans. In summary, we believe the improved robustness makes the proposed method more suitable for clinical settings than individual models.
Item Open Access Protein Crystallization: Soft Matter and Chemical Physics Perspectives (2014) Fusco, Diana
X-ray and neutron crystallography are the predominant methods for obtaining atomic-scale information on biological macromolecules.
Despite the success of these techniques, generating well-diffracting crystals remains the critical bottleneck in going from protein to structure. In practice, the crystallization process proceeds through knowledge-informed empiricism. Better physico-chemical understanding remains elusive because of the large number of variables involved; hence, little guidance is available to systematically identify solution conditions that promote crystallization.
The fields of structural biology and soft matter have independently sought out fundamental principles to rationalize protein crystallization. Yet the conceptual differences and limited overlap between the two disciplines may have prevented a comprehensive understanding of the phenomenon from emerging. Part of this dissertation focuses on computational studies of rubredoxin and human ubiquitin that bridge the two fields.
Using atomistic simulations, the protein crystal contacts are characterized, and patchy particle models are parameterized accordingly. Comparing the phase diagrams of these schematic models with experimental results enables a critical review of the assumptions behind the two approaches and reveals insights about protein-protein interactions that can be leveraged to crystallize proteins more generally. In addition, exploration of the model parameter space provides a rationale for several experimental observations, such as the success and occasional failure of George and Wilson's proposal for protein crystallization conditions and the competition between different crystal forms.
These simple physical models illuminate the connection between protein phase behavior and protein-protein interactions, which are, however, remarkably sensitive to the protein chemical environment. To help determine relationships between the physico-chemical protein properties and crystallization propensity, statistical models are trained on samples for 182 proteins supplied by the Northeast Structural Genomics consortium. Gaussian processes, which capture trends beyond the reach of linear statistical models, distinguish between two main physico-chemical mechanisms driving crystallization. One is characterized by low levels of side chain entropy and has been extensively reported in the literature. The other identifies specific electrostatic interactions not previously described in the crystallization context. Because evidence for two distinct mechanisms can be gleaned both from crystal contacts and from solution conditions leading to successful crystallization, the model offers future avenues for optimizing crystallization screens based on partial structural information. The availability of crystallization data coupled with structural outcomes analyzed through state-of-the-art statistical models may thus guide macromolecular crystallization toward a more rational basis.
To conclude, the behavior of water in protein crystals is specifically examined. Water is not only essential for the correct functioning and folding of proteins, but it is also a key player in protein crystal assembly. Although water occupies up to 80% of the volume of a protein crystal, its structure has so far received little attention, and it is often oversimplified in the structural refinement process. Merging information derived from molecular dynamics simulations with the original structural information provides a way to better understand the behavior of water in crystals and to develop a method that enriches standard structural refinement.