Browsing by Author "Yang, Yun"
A widespread length-dependent splicing dysregulation in cancer. (Science Advances, 2022-08) [Open Access] Zhang, Sirui; Mao, Miaowei; Lv, Yuesheng; Yang, Yingqun; He, Weijing; Song, Yongmei; Wang, Yongbo; Yang, Yun; Al Abo, Muthana; Freedman, Jennifer A; Patierno, Steven R; Wang, Yang; Wang, Zefeng

Dysregulation of alternative splicing is a key molecular hallmark of cancer. However, the common features and underlying mechanisms of this dysregulation remain unclear. Here, we report an intriguing length-dependent splicing regulation in cancers. By systematically analyzing the transcriptomes of thousands of cancer patients, we found that short exons are more likely to be mis-spliced and are preferentially excluded in cancers. Compared with other exons, cancer-associated short exons (CASEs) are more conserved and more likely to encode in-frame low-complexity peptides, with functional enrichment in GTPase regulators and cell adhesion. We developed a CASE-based panel that serves as a reliable cancer stratification marker and a strong predictor of survival; the panel is clinically useful because short-exon splicing is practical to detect. Mechanistically, mis-splicing of CASEs is regulated by elevated transcription and by alterations in certain RNA-binding proteins in cancers. Our findings uncover a common feature of cancer-specific splicing dysregulation with important clinical implications for cancer diagnosis and therapy.

Nonparametric Bayes for Big Data. (2014) [Open Access] Yang, Yun

Classical asymptotic theory deals with models in which the sample size $n$ goes to infinity while the number of parameters $p$ stays fixed. However, rapid technological advances have enabled today's scientists to collect a huge number of explanatory variables to predict a response. Many modern applications in science and engineering belong to the "big data" regime, in which both $p$ and $n$ may be very large. Some genomic applications even have $p$ substantially greater than $n$. With the advent of MCMC, Bayesian approaches exploded in popularity, and Bayesian inference is often easier to interpret than frequentist inference. It has therefore become important to understand and evaluate Bayesian procedures for "big data" from a frequentist perspective.
In this dissertation, we address a number of questions related to solving large-scale statistical problems via Bayesian nonparametric methods.
It is well known that classical estimators can be inconsistent in the high-dimensional regime if no constraints are placed on the model. Imposing additional low-dimensional structure on the high-dimensional ambient space is therefore unavoidable. In the first two chapters of the thesis, we study the prediction performance of high-dimensional nonparametric regression from a minimax point of view. We consider two different low-dimensional constraints: (1) the response depends only on a small subset of the covariates; (2) the covariates lie on a low-dimensional manifold in the original high-dimensional ambient space. We also provide Bayesian nonparametric methods based on Gaussian process priors and show that they adapt to unknown smoothness or low-dimensional manifold structure, attaining minimax convergence rates up to log factors. In Chapter 3, we consider high-dimensional classification problems in which all data are categorical. We build a parsimonious model based on Bayesian tensor factorization that performs classification while also drawing inferences about the important predictors.
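To make constraint (1) concrete, here is a minimal sketch of Gaussian process regression with one length scale per covariate (an "ARD" squared-exponential kernel), assuming Gaussian noise with a known standard deviation. A very large length scale makes the kernel ignore that covariate, so the fit depends only on the relevant subset. The names `ard_kernel` and `gp_posterior_mean`, the kernel choice, and the hand-fixed length scales are illustrative assumptions; a fully Bayesian treatment of the kind the thesis studies would place priors on such hyperparameters rather than fixing them.

```python
import numpy as np

def ard_kernel(a, b, length_scales, variance=1.0):
    """Squared-exponential kernel with one length scale per covariate (ARD)."""
    a_s, b_s = a / length_scales, b / length_scales
    # Pairwise squared distances between rows of a_s and rows of b_s.
    d2 = (np.sum(a_s**2, axis=1)[:, None] + np.sum(b_s**2, axis=1)[None, :]
          - 2.0 * a_s @ b_s.T)
    return variance * np.exp(-0.5 * np.maximum(d2, 0.0))

def gp_posterior_mean(x_train, y_train, x_test, length_scales, noise_sd=0.1):
    """Posterior mean of f(x_test) under a zero-mean GP prior and Gaussian noise."""
    k = ard_kernel(x_train, x_train, length_scales)
    k_star = ard_kernel(x_test, x_train, length_scales)
    # Solve (K + sigma^2 I) alpha = y with a Cholesky factor for numerical stability.
    chol = np.linalg.cholesky(k + noise_sd**2 * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    return k_star @ alpha

# Usage: 5 covariates, but the response depends only on covariate 0.
rng = np.random.default_rng(0)
n, d = 200, 5
x = rng.uniform(-1.0, 1.0, size=(n, d))
y = np.sin(2.0 * x[:, 0]) + 0.1 * rng.standard_normal(n)
x_new = rng.uniform(-1.0, 1.0, size=(50, d))
# Huge length scales effectively drop covariates 1..4 from the kernel.
ls = np.array([0.5] + [1e6] * (d - 1))
pred = gp_posterior_mean(x, y, x_new, ls)
mse = float(np.mean((pred - np.sin(2.0 * x_new[:, 0])) ** 2))
```

In this sketch the irrelevant covariates are switched off by hand; the point of the adaptive methods described above is to discover such structure from the data.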
Ensemble approaches, which combine multiple algorithms or models, are generally believed to outperform any single algorithm at machine learning tasks such as prediction. In Chapter 5, we propose Bayesian convex and linear aggregation approaches motivated by regression applications. We show that the proposed approach is minimax optimal when the true data-generating model is a convex or linear combination of the models in the list. Moreover, the method adapts to sparsity structure in which certain models should receive zero weight, and, unlike competing methods, it requires no tuning parameters. More generally, under an M-open view, where the truth falls outside the space of all convex/linear combinations, our theory suggests that the posterior measure tends to concentrate on the best approximation of the truth at the minimax rate.
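The idea of Bayesian convex aggregation can be illustrated with a small sketch: place a Dirichlet prior on weights over the simplex, use a Gaussian likelihood for the aggregated prediction, and compute posterior-mean weights. This is not the dissertation's estimator or sampler; it is a simplified illustration assuming known noise variance, a hypothetical Dirichlet concentration `alpha=0.5`, and self-normalized importance sampling with the prior as proposal (adequate only for a handful of candidate models). The function name `bayes_convex_aggregate` is hypothetical.

```python
import numpy as np

def bayes_convex_aggregate(preds, y, noise_sd, alpha=0.5, n_draws=50_000, seed=0):
    """Posterior-mean convex weights via importance sampling from a Dirichlet prior.

    preds: (n, M) predictions of M candidate models; y: (n,) responses.
    Assumes y = preds @ w + N(0, noise_sd^2) noise, with w on the simplex.
    """
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.full(preds.shape[1], alpha), size=n_draws)  # (S, M) prior draws
    # RSS(w) = y'y - 2 w'F'y + w'F'F w, computed from sufficient statistics.
    gram, fy = preds.T @ preds, preds.T @ y
    rss = y @ y - 2.0 * w @ fy + np.einsum("sm,mk,sk->s", w, gram, w)
    log_lik = -0.5 * rss / noise_sd**2
    # Self-normalized importance weights (subtract the max for stability).
    iw = np.exp(log_lik - log_lik.max())
    iw /= iw.sum()
    return iw @ w

# Usage: the truth is a convex combination in which the third model gets zero weight.
rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, size=300)
preds = np.column_stack([np.sin(2.0 * x), x, np.cos(3.0 * x)])
truth = np.array([0.7, 0.3, 0.0])
y = preds @ truth + 0.3 * rng.standard_normal(300)
w_hat = bayes_convex_aggregate(preds, y, noise_sd=0.3)
```

A Dirichlet concentration below 1 puts extra prior mass near the edges of the simplex, which is one simple way to favor sparse weight vectors; linear aggregation would instead need a prior on unconstrained weights.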
Chapter 6 is devoted to sequential Markov chain Monte Carlo algorithms for Bayesian online learning with big data. The last chapter attempts to justify, from a frequentist perspective, the use of the posterior distribution to conduct statistical inference in semiparametric estimation problems (the semiparametric Bernstein-von Mises theorem).
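For readers unfamiliar with online Bayesian updating, the following is a minimal sketch of one generic approach, a resample-move particle scheme: particles are reweighted as each observation arrives, resampled when the effective sample size drops, and then rejuvenated with a Metropolis step that leaves the current posterior invariant. It is not claimed to be the algorithm of Chapter 6. The model (a normal mean with known noise and a normal prior), the function names, and all tuning constants are assumptions chosen so the result can be checked against the exact conjugate posterior mean.

```python
import numpy as np

def log_post(theta, s, t, prior_sd, noise_sd):
    """Unnormalized log posterior of a normal mean from sufficient statistics (sum s, count t)."""
    return -0.5 * theta**2 / prior_sd**2 - 0.5 * (t * theta**2 - 2.0 * theta * s) / noise_sd**2

def resample_move_smc(stream, n_particles=2000, prior_sd=10.0, noise_sd=1.0, seed=0):
    """Online posterior mean of theta for y ~ N(theta, noise_sd^2), theta ~ N(0, prior_sd^2)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, prior_sd, n_particles)  # initial particles drawn from the prior
    logw = np.zeros(n_particles)
    s, t = 0.0, 0
    for y in stream:
        s, t = s + y, t + 1
        # Reweight by the likelihood of the newest observation only.
        logw += -0.5 * (y - theta) ** 2 / noise_sd**2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        if 1.0 / np.sum(w**2) < n_particles / 2:  # effective sample size has collapsed
            # Multinomial resampling, then reset to equal weights.
            theta = theta[rng.choice(n_particles, size=n_particles, p=w)]
            logw = np.zeros(n_particles)
            # One random-walk Metropolis step per particle, invariant for the current posterior.
            step = 2.4 / np.sqrt(t / noise_sd**2 + 1.0 / prior_sd**2)
            prop = theta + step * rng.standard_normal(n_particles)
            log_ratio = (log_post(prop, s, t, prior_sd, noise_sd)
                         - log_post(theta, s, t, prior_sd, noise_sd))
            accept = np.log(rng.uniform(size=n_particles)) < log_ratio
            theta = np.where(accept, prop, theta)
    w = np.exp(logw - logw.max())
    return float(np.sum(w * theta) / np.sum(w))

# Usage: compare with the exact conjugate posterior mean (noise_sd = 1, prior_sd = 10).
rng = np.random.default_rng(42)
data = rng.normal(1.5, 1.0, size=200)
est = resample_move_smc(data)
exact = data.sum() / (len(data) + 1.0 / 10.0**2)
```

The Metropolis move here uses sufficient statistics so each update costs O(number of particles); for models without such statistics, the move step must revisit past data, which is exactly the cost that big-data online methods try to control.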
Quantitative comparison of automatic and manual IMRT optimization for prostate cancer: the benefits of DVH prediction. (Journal of Applied Clinical Medical Physics, 2015-03-08) [Open Access] Yang, Yun; Li, Taoran; Yuan, Lunlin; Ge, Yaorong; Yin, Fang-Fang; Lee, W Robert; Wu, Q Jackie

A recent publication indicated that the patient anatomical feature (PAF) model can predict optimal objectives based on past experience. In this study, we evaluated the benefits of prostate IMRT optimization that uses PAF-predicted objectives as guidance. Three optimization methods were compared:
1) Expert Plan: Ten prostate cases (16 plans) were planned by an expert planner using the conventional trial-and-error approach, starting from the institution's modified OAR and PTV constraints. Optimization was stopped at 150 iterations, and that plan was saved as the Expert Plan.
2) Clinical Plan: The planner continued working on the Expert Plan until satisfied with its dosimetric quality; the final plan is referred to as the Clinical Plan.
3) PAF Plan: A third set of plans for the same ten patients was generated fully automatically using predicted DVHs as guidance. The optimization was based on the PAF-predicted objectives and ran for 150 iterations without human interaction.

DMAX and D98% for the PTV; DMAX for the femoral heads; and DMAX, D10cc, D25%/D17%, and D40% for the bladder/rectum were compared. Clinical Plans were further optimized with more iterations and adjustments but in general provided limited dosimetric benefit over Expert Plans. PTV D98% agreed within 2.31% among Expert, Clinical, and PAF Plans. Between Clinical and PAF Plans, differences in DMAX for the PTV, bladder, and rectum were within 2.65%, 2.46%, and 2.20%, respectively. Bladder D10cc was higher for PAF Plans, but in general by less than 1.54%. Bladder D25% and D40% were lower for PAF Plans, by up to 7.71% and 6.81%, respectively. Rectum D10cc, D17%, and D40% were 2.11%, 2.72%, and 0.27% lower for PAF Plans, respectively. DMAX for the femoral heads was comparable (< 35 Gy on average). Compared with the Clinical Plan (Primary + Boost), the PAF Plan reduced optimization time by 5.2 min on average, with a maximum reduction of 7.1 min. PAF Plans required fewer total MUs per plan than Clinical Plans, indicating better delivery efficiency. The PAF-guided planning process can generate clinical-quality prostate IMRT plans with no human intervention. Compared with manual optimization, this automatic optimization increases planning and delivery efficiency while maintaining plan quality.
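The plan comparison above relies on dose-volume metrics. As a hedged illustration of what they mean, the sketch below computes DMAX, Dx% (the minimum dose received by the hottest x% of a structure's volume), Dx cc (the same for the hottest x cubic centimeters), and the percent difference between two plans from a list of voxel doses. It assumes equal-volume voxels and a simple discrete definition with no interpolation; treatment planning systems typically compute these from a cumulative DVH and may differ slightly. All function names and the toy dose array are hypothetical.

```python
import math
import numpy as np

def dmax(dose):
    """Maximum voxel dose in the structure."""
    return float(np.max(dose))

def d_percent(dose, pct):
    """D_pct%: minimum dose to the hottest pct% of the volume (equal-volume voxels)."""
    d = np.sort(np.asarray(dose, dtype=float))[::-1]  # hottest voxel first
    k = math.ceil(pct * d.size / 100.0 - 1e-9)        # number of voxels in the hottest pct%
    return float(d[min(max(k, 1), d.size) - 1])

def d_cc(dose, cc, voxel_cc):
    """D_cc: minimum dose to the hottest `cc` cubic centimeters, given the voxel volume."""
    d = np.sort(np.asarray(dose, dtype=float))[::-1]
    k = math.ceil(cc / voxel_cc - 1e-9)               # voxels needed to fill `cc`
    return float(d[min(max(k, 1), d.size) - 1])

def pct_diff(plan_a, plan_b):
    """Percent difference of a metric in plan_a relative to plan_b."""
    return 100.0 * (plan_a - plan_b) / plan_b

# Usage on a toy structure: 1000 voxels of 0.1 cc each with doses 1..1000 (arbitrary units).
dose = np.arange(1.0, 1001.0)
d_max = dmax(dose)
d98 = d_percent(dose, 98)
d40 = d_percent(dose, 40)
d10cc = d_cc(dose, 10.0, voxel_cc=0.1)
diff = pct_diff(102.0, 100.0)
```

Because Dx% is read off the sorted dose distribution, a lower D25% or D40% for an organ at risk means that the corresponding fraction of the organ receives less dose, which is how the bladder and rectum comparisons above should be read.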