Browsing by Author "Sun, Yiyang"
Item Open Access
Analysis of clinical predictors of kidney diseases in type 2 diabetes patients based on machine learning. (International Urology and Nephrology, 2022-09)
Hui, Dongna; Sun, Yiyang; Xu, Shixin; Liu, Junjie; He, Ping; Deng, Yuhui; Huang, Huaxiong; Zhou, Xiaoshuang; Li, Rongshan

Background:
The heterogeneity of Type 2 Diabetes Mellitus (T2DM) complicated by renal disease is not fully understood in clinical practice. The purpose of this study was to propose potential predictive factors for identifying diabetic kidney disease (DKD), nondiabetic kidney disease (NDKD), and DKD superimposed on NDKD (DKD + NDKD) in T2DM patients noninvasively and accurately.

Methods:
Two hundred forty-one eligible patients, each confirmed by renal biopsy, were enrolled in this retrospective analytical study. Features comprising clinical and biochemical data collected prior to renal biopsy were extracted from the patients' electronic medical records. Machine learning algorithms were used to distinguish between each pair of kidney disease groups, and the feature variables selected in each developed model were evaluated.

Results:
A logistic regression model achieved an accuracy of 0.8306 ± 0.0057 for the DKD versus NDKD classification; hematocrit, diabetic retinopathy (DR), hematuria, platelet distribution width, and history of hypertension were identified as important risk factors. An SVM model then differentiated NDKD from DKD + NDKD with an accuracy of 0.8686 ± 0.052, with hematuria, diabetes duration, international normalized ratio (INR), D-dimer, and high-density lipoprotein cholesterol as the top risk factors. Finally, a logistic regression model indicated that D-dimer, hematuria, INR, systolic pressure, and DR were likely predictive factors for distinguishing DKD from DKD + NDKD.

Conclusion:
Predictive factors were successfully identified among the different renal diseases in type 2 diabetes patients via machine learning methods. More attention should be paid to coagulation factors in DKD + NDKD patients, which may indicate a hypercoagulable state and an increased risk of thrombosis.
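As a rough illustration of the pairwise classification scheme described in this abstract, the sketch below uses scikit-learn with repeated stratified cross-validation to produce mean ± standard deviation accuracies. The DataFrame df, the "diagnosis" column, and all modeling choices are illustrative assumptions, not the authors' pipeline.

# Illustrative sketch of pairwise kidney-disease classification (not the authors' code).
# Assumes a pandas DataFrame df with a "diagnosis" column taking values in
# {"DKD", "NDKD", "DKD+NDKD"} and numeric clinical/biochemical feature columns.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def pairwise_accuracy(df, class_a, class_b, estimator):
    """Mean and std of cross-validated accuracy for one pairwise comparison."""
    pair = df[df["diagnosis"].isin([class_a, class_b])]
    X = pair.drop(columns="diagnosis").to_numpy()
    y = (pair["diagnosis"] == class_b).to_numpy().astype(int)
    model = make_pipeline(StandardScaler(), estimator)  # standardize, then classify
    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=0)
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    return scores.mean(), scores.std()

# Logistic regression for DKD vs NDKD; an SVM for NDKD vs DKD + NDKD:
# acc, sd = pairwise_accuracy(df, "DKD", "NDKD", LogisticRegression(max_iter=1000))
# acc, sd = pairwise_accuracy(df, "NDKD", "DKD+NDKD", SVC(kernel="rbf"))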
Item Open Access
Multi-objective learning and explanation for stroke risk assessment in Shanxi province. (Scientific Reports, 2022-12)
Ma, Jing; Sun, Yiyang; Liu, Junjie; Huang, Huaxiong; Zhou, Xiaoshuang; Xu, Shixin

Stroke is the leading cause of death in China (Zhou et al., The Lancet, 2019). A dataset from Shanxi Province is analyzed to predict patients' risk across four states (low/medium/high/attack) and to estimate transition probabilities between states via a SHAP DeepExplainer. To handle the issues posed by an imbalanced sample set, the quadratic interactive deep model (QIDeep) is proposed; it flexibly selects quadratic interaction features and appends them to the model inputs (a sketch of this construction appears after the last item in this listing). Experimental results showed that the QIDeep model with 3 interactive features achieved a state-of-the-art accuracy of 83.33% (95% CI 83.14%-83.52%). Blood pressure, physical inactivity, smoking, weight, and total cholesterol are the top five most important features. To obtain high recall in the attack state, stroke occurrence prediction is treated as an auxiliary objective in multi-objective learning. With the same features, prediction accuracy improved, and the recall of the attack state increased by 17.79% compared to QIDeep alone (from 71.49% to 82.06%). The prediction model and analysis tool presented in this paper provide not only a prediction method but also an attribution explanation of each patient's risk states and transition direction, a valuable tool for doctors analyzing and diagnosing the disease.

Item Embargo
Sparse and Faithful Explanations Without Sparse Models (2024)
Sun, Yiyang

Even if a model is not globally sparse, it is possible for decisions made by that model to be accurately and faithfully described by a small number of features. For example, an application for a large loan might be denied to someone because they have no credit history, which overwhelms any evidence of their creditworthiness. In this paper, we introduce the Sparse Explanation Value (SEV), a new way to measure sparsity in machine learning models. In the loan denial example above, the SEV is 1 because only one factor is needed to explain why the loan was denied. SEV is a measure of decision sparsity rather than overall model sparsity, and we show that many machine learning models, even if they are not sparse, actually have low decision sparsity as measured by SEV. SEV is defined using moves over a hypercube with a predefined population commons (reference), allowing SEV to be defined consistently across model classes, with movement restrictions that reflect real-world constraints. Moreover, by allowing flexibility in this reference, and by considering how distances along the hypercube translate into distances in feature space, we derive sparse and meaningful explanations for different function classes and propose three approaches: cluster-based SEV, SEV with flexible references, and tree-based SEV. Ultimately, we propose algorithms aimed at reducing SEV without compromising model accuracy, thereby offering sparse yet fully faithful explanations even in the absence of globally sparse models.
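For the SEV item above, the following is a minimal brute-force sketch of the core definition, assuming a scikit-learn-style classifier with a predict method; the names sev, predict, and max_order are illustrative. SEV is read off as the smallest number of query features that must be moved to their reference (population-commons) values for the prediction to flip to the reference's class. The paper's cluster-based, flexible-reference, and tree-based variants are not reproduced here.

# Brute-force SEV sketch (illustrative; not the paper's algorithms).
from itertools import combinations
import numpy as np

def sev(predict, x, reference, max_order=3):
    """Smallest k such that moving k features of x to their reference values
    flips the prediction to the reference's class; None if k exceeds max_order.
    predict   : function mapping an (n, d) array to an (n,) array of labels
    x         : (d,) query instance as a numpy array
    reference : (d,) population-commons point"""
    base = predict(reference.reshape(1, -1))[0]  # class at the reference point
    for k in range(1, max_order + 1):            # Hamming-distance-k moves on the hypercube
        for idx in combinations(range(len(x)), k):
            probe = x.copy()
            probe[list(idx)] = reference[list(idx)]
            if predict(probe.reshape(1, -1))[0] == base:
                return k
    return None

# In the loan example above: if setting only the credit-history feature to its
# reference value flips a denial to an approval, then SEV = 1.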
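Returning to the stroke-risk item, the quadratic-interaction idea behind QIDeep can be sketched as appending a small set of selected pairwise feature products to the raw inputs before a standard deep classifier; the pair indices below are placeholders, and the paper's selection procedure and network architecture are not reproduced here.

# Hedged sketch of quadratic interaction features in the spirit of QIDeep
# (not the authors' implementation).
import numpy as np

def append_quadratic_interactions(X, pairs):
    """Append the products X[:, i] * X[:, j] for each selected (i, j) pair.
    X     : (n_samples, n_features) array of standardized risk factors
    pairs : selected index pairs, e.g. [(0, 1), (0, 4), (2, 3)] to mirror the
            3-interaction setting reported in the abstract (indices are placeholders)"""
    inter = np.stack([X[:, i] * X[:, j] for i, j in pairs], axis=1)
    return np.concatenate([X, inter], axis=1)

# X_aug = append_quadratic_interactions(X, pairs=[(0, 1), (0, 4), (2, 3)])
# X_aug then feeds a multi-class deep network over the four risk states, with
# stroke occurrence prediction as an auxiliary head in multi-objective training.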