TY - JOUR
T1 - Gene selection for microarray data classification based on Gray Wolf Optimizer enhanced with TRIZ-inspired operators
AU - Alomari, Osama Ahmad
AU - Makhadmeh, Sharif Naser
AU - Al-Betar, Mohammed Azmi
AU - Alyasseri, Zaid Abdi Alkareem
AU - Doush, Iyad Abu
AU - Abasi, Ammar Kamal
AU - Awadallah, Mohammed A.
AU - Zitar, Raed Abu
N1 - Publisher Copyright: © 2021 Elsevier B.V.
PY - 2021/7/8
Y1 - 2021/7/8
N2 - DNA microarray technology is the fabrication of a single chip containing thousands of genetic codes. Each microarray experiment can analyze many thousands of genes in parallel. The outcome of a DNA microarray experiment is a table (matrix) called gene expression data. Pattern recognition algorithms are widely applied to gene expression data to differentiate between healthy and cancerous patient samples. However, gene expression data are high-dimensional and typically contain redundant, noisy, and irrelevant genes. Datasets with such characteristics pose a challenge to machine learning algorithms: they impede training and testing and entail resource-intensive computation, which degrades classification performance. To avoid these pitfalls, gene selection is needed. This paper proposes a new hybrid filter-wrapper approach. Robust Minimum Redundancy Maximum Relevancy (rMRMR) serves as the filter that chooses the top-ranked genes, and a Modified Gray Wolf Optimizer (MGWO) serves as the wrapper that searches for still smaller gene subsets. In MGWO, new optimization operators inspired by the TRIZ inventive solution are coupled with the original GWO to increase the diversity of the population. The proposed method is evaluated on nine well-known microarray datasets. A support vector machine (SVM) classifier is used to estimate the quality of each selected gene subset. The effectiveness of the TRIZ optimization operators in MGWO is assessed by comparing the convergence behavior of GWO with and without them. Moreover, the results of MGWO are compared with seven state-of-the-art gene selection methods on the same datasets in terms of classification accuracy and the number of selected genes. The proposed method achieves the best results on four of the nine datasets and obtains remarkable results on the remaining datasets. The experimental results demonstrate that the proposed method searches the gene space effectively and is able to find the best gene combinations.
AB - DNA microarray technology is the fabrication of a single chip containing thousands of genetic codes. Each microarray experiment can analyze many thousands of genes in parallel. The outcome of a DNA microarray experiment is a table (matrix) called gene expression data. Pattern recognition algorithms are widely applied to gene expression data to differentiate between healthy and cancerous patient samples. However, gene expression data are high-dimensional and typically contain redundant, noisy, and irrelevant genes. Datasets with such characteristics pose a challenge to machine learning algorithms: they impede training and testing and entail resource-intensive computation, which degrades classification performance. To avoid these pitfalls, gene selection is needed. This paper proposes a new hybrid filter-wrapper approach. Robust Minimum Redundancy Maximum Relevancy (rMRMR) serves as the filter that chooses the top-ranked genes, and a Modified Gray Wolf Optimizer (MGWO) serves as the wrapper that searches for still smaller gene subsets. In MGWO, new optimization operators inspired by the TRIZ inventive solution are coupled with the original GWO to increase the diversity of the population. The proposed method is evaluated on nine well-known microarray datasets. A support vector machine (SVM) classifier is used to estimate the quality of each selected gene subset. The effectiveness of the TRIZ optimization operators in MGWO is assessed by comparing the convergence behavior of GWO with and without them. Moreover, the results of MGWO are compared with seven state-of-the-art gene selection methods on the same datasets in terms of classification accuracy and the number of selected genes. The proposed method achieves the best results on four of the nine datasets and obtains remarkable results on the remaining datasets. The experimental results demonstrate that the proposed method searches the gene space effectively and is able to find the best gene combinations.
KW - Classification
KW - Gene selection
KW - Gray Wolf Optimizer
KW - Optimization
KW - SVM
KW - TRIZ
KW - rMRMR
UR - http://www.scopus.com/inward/record.url?scp=85105697444&partnerID=8YFLogxK
U2 - 10.1016/j.knosys.2021.107034
DO - 10.1016/j.knosys.2021.107034
M3 - Article
AN - SCOPUS:85105697444
SN - 0950-7051
VL - 223
JO - Knowledge-Based Systems
JF - Knowledge-Based Systems
M1 - 107034
ER -