Fuzzy Discretization and Rough Set based Feature Selection for High-Dimensional Classification
Authors
Prema Ramasamy and Premalatha Kandhasamy
1 Prema Ramasamy, Assistant Professor, New Horizon College of Engineering, Bangalore
E-mail: premabit@gmail.com
2 Professor, Department of Computer Science and Engineering, Bannari Amman Institute of Technology,
Sathyamangalam.
(Received May 11 2018, accepted July 16 2018)
Abstract
Contemporary biological technologies such as gene expression microarrays produce extremely high-dimensional
datasets with limited samples: typically a large number of genes but only a small number of samples. Analyzing
these data is essential for retrieving the required information in microarray gene expression studies, but the
complicated relations among the genes make the analysis difficult, and removing irrelevant genes improves the
quality of the results. In this regard, a new feature selection algorithm called 2-level MRMS, based on rough set
theory, is presented. It selects a set of genes from microarray data by maximizing the relevance and significance
of the selected genes. The paper also presents a novel discretization method, Gaussian Fuzzy Discretization, which
uses fuzzy logic to discretize the continuous gene expression values. The performance of the proposed algorithm is
compared with that of other related feature selection methods using the classification accuracy of k-Nearest
Neighbor (kNN) and Support Vector Machine (SVM) classifiers on four microarray data sets. The experimental
results show that the genes selected using 2-level MRMS give higher classification accuracy than those selected by
the other methods.
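To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It rests on several assumptions that the abstract does not state: Gaussian membership functions with evenly spaced centers and a shared width, assignment of each value to the fuzzy set with the highest membership, relevance measured as the rough-set dependency degree of the class labels on a gene, and significance measured as the gain in dependency when a gene is added to an already chosen one. The function names (gaussian_fuzzy_discretize, dependency, relevance, significance) and the parameter n_bins are hypothetical, and the 2-level selection procedure itself is not reproduced here.

```python
from collections import defaultdict

import numpy as np


def gaussian_fuzzy_discretize(values, n_bins=3):
    """Map continuous values to the index of the fuzzy set with highest membership.

    Assumption: n_bins >= 2 Gaussian sets with centers evenly spaced between
    the minimum and maximum value, all sharing the same width sigma.
    """
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:  # constant gene: a single interval
        return np.zeros(values.shape, dtype=int)
    centers = np.linspace(lo, hi, n_bins)
    sigma = (hi - lo) / (2.0 * (n_bins - 1))
    # membership[i, j] = Gaussian membership of sample i in fuzzy set j
    membership = np.exp(-0.5 * ((values[:, None] - centers[None, :]) / sigma) ** 2)
    return membership.argmax(axis=1)


def dependency(columns, labels):
    """Rough-set dependency degree of the class labels on discrete columns.

    Samples with identical values on all columns form an equivalence class;
    the result is the fraction of samples lying in classes with a single label.
    """
    labels_seen = defaultdict(set)
    sizes = defaultdict(int)
    for key, y in zip(zip(*columns), labels):
        labels_seen[key].add(y)
        sizes[key] += 1
    consistent = sum(n for key, n in sizes.items() if len(labels_seen[key]) == 1)
    return consistent / len(labels)


def relevance(gene, labels):
    """Assumed relevance: dependency of the labels on one discretized gene."""
    return dependency([gene], labels)


def significance(gene, given, labels):
    """Assumed significance: dependency gained by adding `gene` to `given`."""
    return dependency([given, gene], labels) - dependency([given], labels)


# Toy data: six samples, two classes (illustrative values, not from the paper).
y = [0, 0, 0, 1, 1, 1]
g0 = gaussian_fuzzy_discretize([0.1, 0.2, 0.15, 0.9, 1.0, 0.95], n_bins=2).tolist()
g1 = gaussian_fuzzy_discretize([0.4, 0.1, 0.9, 0.4, 0.1, 0.9], n_bins=2).tolist()
print(relevance(g0, y), relevance(g1, y), significance(g0, g1, y))
```

In this toy example the first gene separates the two classes after discretization while the second does not, so the first gene has the higher relevance and also contributes all of the dependency when added to the second. With a shared width, choosing the highest membership is equivalent to choosing the nearest center; a per-interval width would change that behavior.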