Abstract:
As more diagnostic testing options become available to physicians, it becomes
more difficult to combine the various types of medical information in order to
optimize the overall diagnosis. To improve diagnostic performance, here we introduce an
approach to optimize a decision-fusion technique to combine heterogeneous information,
such as from different modalities, feature categories, or institutions. This dissertation
presents a computer aid known as optimized decision fusion, and explores both its
underlying theory and practical application.
The purpose of this work was 1) to present optimized decision fusion, a
classification algorithm designed for noisy, heterogeneous data sets with few samples,
and 2) to evaluate decision fusion’s classification ability on clinical, heterogeneous
breast cancer data sets. This study used the following three clinical data sets:
heterogeneous breast mass lesions, heterogeneous breast microcalcification lesions, and
breast blood serum protein levels. In addition to these clinical data sets, we also used
various simulated data sets.
We used two variants of our decision fusion algorithm: 1) DF-A, which
optimized the area (AUC) under the receiver operating characteristic (ROC) curve, and
2) DF-P, which optimized the high-sensitivity partial area (pAUC) under the curve. We
compared decision fusion’s classification performance with that of the following
classifiers: linear discriminant analysis (LDA), an artificial neural network (ANN),
classical regression models (linear, logistic, and probit), Bayesian model averaging of
these regression models, least angle regression, and a support vector machine.
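As a rough illustration of the two figures of merit named above, the following Python sketch computes an empirical AUC and a normalized high-sensitivity pAUC from classifier scores. It is not the implementation used in this work: the function names, the 0.90 sensitivity cutoff, and the normalization by (1 - cutoff) are assumptions made for this example only.

```python
import numpy as np


def roc_points(scores, labels):
    """Empirical ROC points (FPR, TPR). Labels: 1 = positive, 0 = negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos, neg = labels == 1, labels == 0
    fpr, tpr = [0.0], [0.0]
    # Sweep the decision threshold from strictest to most lenient.
    for t in np.unique(scores)[::-1]:
        called_positive = scores >= t
        tpr.append((called_positive & pos).sum() / pos.sum())
        fpr.append((called_positive & neg).sum() / neg.sum())
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Full area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


def high_sensitivity_pauc(fpr, tpr, min_sensitivity=0.90):
    """Area under the ROC curve and above TPR = min_sensitivity.

    Assumption: the area is divided by (1 - min_sensitivity), so a perfect
    classifier scores 1.0. The cutoff of 0.90 is a hypothetical default.
    """
    area = 0.0
    for f0, t0, f1, t1 in zip(fpr[:-1], tpr[:-1], fpr[1:], tpr[1:]):
        if f1 == f0 or t1 <= min_sensitivity:
            continue  # vertical segment, or segment entirely below the cutoff
        if t0 >= min_sensitivity:
            # Segment lies entirely above the cutoff: trapezoid above the line.
            area += (f1 - f0) * ((t0 + t1) / 2.0 - min_sensitivity)
        else:
            # Segment crosses the cutoff: keep only the triangle above it.
            fc = f0 + (min_sensitivity - t0) / (t1 - t0) * (f1 - f0)
            area += (f1 - fc) * (t1 - min_sensitivity) / 2.0
    return float(area / (1.0 - min_sensitivity))
```

Under this normalization, a chance-level ROC curve (the diagonal) has a pAUC of 0.05 at a 0.90 cutoff, which shows why pAUC values tend to be much lower than full AUC values for the same classifier.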
The simulation studies showed that decision fusion is able to maintain high
classification performance on data sets with many weak features and few samples,
although performance was lowered by feature correlations.
For the calcification data set, DF-A outperformed the other classifiers in terms of
AUC (p < 0.02) and achieved AUC = 0.85 ± 0.01. The DF-P surpassed the other
classifiers in terms of pAUC (p < 0.01) and reached pAUC = 0.38 ± 0.02. For the mass
data set, DF-A outperformed both the ANN and the LDA (p < 0.04) and achieved
AUC = 0.94 ± 0.01. Although for this data set there were no statistically significant
differences among the classifiers’ pAUC values (pAUC = 0.57 ± 0.07 to 0.67 ± 0.05, p >
0.10), the DF-P did significantly improve specificity versus the LDA at both 98% and
100% sensitivity (p < 0.04).
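Specificity at a fixed sensitivity, as reported above, can be read off the empirical ROC curve. The sketch below is one plausible way to compute it, assuming that higher scores indicate malignancy and that the operating point is the strictest threshold still meeting the target sensitivity; the function name and these conventions are illustrative assumptions, not the procedure used in this work.

```python
import numpy as np


def specificity_at_sensitivity(scores, labels, target_sensitivity):
    """Best specificity among thresholds whose sensitivity meets the target.

    Labels: 1 = positive (e.g., malignant), 0 = negative (e.g., benign).
    A case is called positive when its score is >= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos_scores = scores[labels == 1]
    neg_scores = scores[labels == 0]
    best = 0.0
    for t in np.unique(scores):
        sensitivity = np.mean(pos_scores >= t)
        if sensitivity >= target_sensitivity:
            # Specificity = fraction of negatives correctly called negative.
            best = max(best, float(np.mean(neg_scores < t)))
    return best
```

With few positive cases, the achievable sensitivities are coarse (multiples of one over the number of positives), so a target such as 98% may in practice coincide with 100% sensitivity.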
For the data set of blood serum proteins, there were no statistically significant
differences among the classifiers for distinguishing normal tissue from malignant lesions
(AUC = 0.79 to 0.84, p > 0.12), but decision fusion was able to achieve significantly
higher specificity, 60%, at 90% sensitivity (p < 0.02). For the task of distinguishing
benign from malignant lesions, the other classifiers performed very poorly (AUC = 0.50 to
0.57), but decision fusion achieved the best performance at AUC = 0.64 (p < 0.05). The
proteins were probably indicative of secondary effects, such as inflammatory response,
rather than specific for cancer.
In conclusion, decision fusion directly optimized clinically significant
performance measures such as AUC and pAUC, and sometimes outperformed other
machine-learning techniques when applied to three different breast cancer data sets. By
testing on a wide variety of simulated and clinical data sets, we show that decision
fusion is robust to noisy data and can handle heterogeneous data structures when given
relatively few observations.