Correlation-based linear discriminant classification for gene expression data.
Microarray gene expression technology provides a systematic approach to patient classification. However, microarray data pose a considerable computational challenge owing to their high dimensionality, small sample sizes, and potential correlations among genes. A recent study has shown that, in contrast to some previous results, gene-gene correlations have a positive effect on the accuracy of classification models. In this study, a recently developed correlation-based classifier, the ensemble of random subspace (RS) Fisher linear discriminants (FLDs), was used. The impact of gene-gene correlations on the performance of this classifier and of other classifiers was studied using simulated and real datasets. A cross-validation framework was used to evaluate each classifier on both kinds of data, and misclassification rates (MRs) were computed. On the simulated data, the average MRs of the correlation-based classifiers decreased as the correlations increased when there were more correlated genes. On the real data, the correlation-based classifiers outperformed the non-correlation-based classifiers, especially when the gene-gene correlations were high. The correlation-based ensemble RS-FLD classifier was effective and benefited from gene-gene correlations, particularly when the correlations were high, which makes it a potential state-of-the-art computational method.
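The evaluation pipeline described above (an ensemble of FLDs, each fitted on a random subset of genes and scored by cross-validated MR) can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the majority-vote combination rule, the pseudo-inverse of the pooled covariance, the subspace size, and the equicorrelated Gaussian simulation are all assumptions made here for illustration.

```python
import numpy as np

def fit_fld(X, y):
    """Two-class Fisher linear discriminant: predict class 1 when w @ x + b > 0."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance: this is where gene-gene correlations enter.
    Sw = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (len(X) - 2)
    # Assumption: pseudo-inverse guards against a singular covariance estimate.
    w = np.linalg.pinv(Sw) @ (mu1 - mu0)
    b = -w @ (mu0 + mu1) / 2.0  # threshold at the midpoint of the class means
    return w, b

def fit_rs_fld(X, y, n_models=50, subspace_size=5, rng=None):
    """Fit one FLD per randomly drawn gene subset (random subspace)."""
    rng = np.random.default_rng(rng)
    models = []
    for _ in range(n_models):
        idx = rng.choice(X.shape[1], size=subspace_size, replace=False)
        w, b = fit_fld(X[:, idx], y)
        models.append((idx, w, b))
    return models

def predict_rs_fld(models, X):
    """Combine base FLDs by majority vote (an assumed combination rule)."""
    votes = np.array([X[:, idx] @ w + b > 0 for idx, w, b in models])
    return (votes.mean(axis=0) > 0.5).astype(int)

def cv_misclassification_rate(X, y, n_folds=5, seed=0, **ensemble_kwargs):
    """K-fold cross-validated misclassification rate (unstratified folds)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = 0
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        models = fit_rs_fld(X[train], y[train], rng=rng, **ensemble_kwargs)
        errors += np.sum(predict_rs_fld(models, X[test]) != y[test])
    return errors / len(y)

def simulate_two_class(n_per_class=30, n_genes=100, n_informative=10,
                       rho=0.5, shift=1.0, seed=0):
    """Hypothetical simulation: the informative genes share pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    cov = np.eye(n_genes)
    cov[:n_informative, :n_informative] = rho
    np.fill_diagonal(cov, 1.0)
    mean1 = np.zeros(n_genes)
    mean1[:n_informative] = shift  # class 1 is shifted on the informative genes
    X0 = rng.multivariate_normal(np.zeros(n_genes), cov, size=n_per_class)
    X1 = rng.multivariate_normal(mean1, cov, size=n_per_class)
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n_per_class, dtype=int),
                        np.ones(n_per_class, dtype=int)])
    return X, y

if __name__ == "__main__":
    # Report the cross-validated MR at a few correlation levels.
    for rho in (0.0, 0.3, 0.6):
        X, y = simulate_two_class(rho=rho)
        print(f"rho={rho:.1f}  MR={cv_misclassification_rate(X, y):.3f}")
```

The subspace size is kept well below the number of training samples so that each base FLD can estimate its covariance matrix. How the MR changes with the correlation level in this toy setup depends on the simulated mean and covariance structure, so its output should not be read as a reproduction of the reported results.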