TY - JOUR
T1 - New feature selection paradigm based on hyper-heuristic technique
AU - Ibrahim, Rehab Ali
AU - Abd Elaziz, Mohamed
AU - Ewees, Ahmed A.
AU - El-Abd, Mohammed
AU - Lu, Songfeng
N1 - Ibrahim, R. A., Abd Elaziz, M., Ewees, A. A., El-Abd, M., & Lu, S. (2021). New feature selection paradigm based on hyper-heuristic technique. Applied Mathematical Modelling, 98, 14-37. https://doi.org/10.1016/j.apm.2021.04.018
PY - 2021/10
Y1 - 2021/10
N2 - Feature selection (FS) is a crucial step for effective data mining since it has the largest effect on improving the performance of classifiers. This is achieved by removing the irrelevant features and using only the relevant ones. Many metaheuristic approaches in the literature attempt to address this problem. The performance of these approaches differs based on the settings of a number of factors, including the use of chaotic maps, opposition-based learning (OBL) and the percentage of the population that OBL is applied to, the metaheuristic (MH) algorithm adopted, the classifier utilized, and the threshold value used to convert real solutions to binary ones. However, it is not easy to identify the best settings for these components in order to determine the relevant features for a specific dataset. Moreover, running extensive experiments to fine-tune these settings for every dataset consumes considerable time. To mitigate this issue, a hyper-heuristic-based FS paradigm is proposed. In the proposed model, a two-stage approach is adopted to identify the best combination of these components. In the first stage, referred to as the training stage, the Differential Evolution (DE) algorithm is used as a controller for selecting the best combination of components to be used by the second stage. In the second stage, referred to as the testing stage, the received combination is evaluated using a testing set. Empirical evaluation of the proposed framework is based on numerous experiments performed on the 18 most popular datasets from the UCI machine learning repository. Experimental results illustrate that the generated generic configuration provides better performance than eight other metaheuristic algorithms over all performance measures when applied to the UCI datasets. Moreover, the overall paradigm ranks first when compared against state-of-the-art algorithms. Finally, the generic configuration provides very competitive performance for high-dimensional datasets.
AB - Feature selection (FS) is a crucial step for effective data mining since it has the largest effect on improving the performance of classifiers. This is achieved by removing the irrelevant features and using only the relevant ones. Many metaheuristic approaches in the literature attempt to address this problem. The performance of these approaches differs based on the settings of a number of factors, including the use of chaotic maps, opposition-based learning (OBL) and the percentage of the population that OBL is applied to, the metaheuristic (MH) algorithm adopted, the classifier utilized, and the threshold value used to convert real solutions to binary ones. However, it is not easy to identify the best settings for these components in order to determine the relevant features for a specific dataset. Moreover, running extensive experiments to fine-tune these settings for every dataset consumes considerable time. To mitigate this issue, a hyper-heuristic-based FS paradigm is proposed. In the proposed model, a two-stage approach is adopted to identify the best combination of these components. In the first stage, referred to as the training stage, the Differential Evolution (DE) algorithm is used as a controller for selecting the best combination of components to be used by the second stage. In the second stage, referred to as the testing stage, the received combination is evaluated using a testing set. Empirical evaluation of the proposed framework is based on numerous experiments performed on the 18 most popular datasets from the UCI machine learning repository. Experimental results illustrate that the generated generic configuration provides better performance than eight other metaheuristic algorithms over all performance measures when applied to the UCI datasets. Moreover, the overall paradigm ranks first when compared against state-of-the-art algorithms. Finally, the generic configuration provides very competitive performance for high-dimensional datasets.
KW - Chaotic maps
KW - Differential evolution
KW - Feature selection
KW - Hyper-heuristic
KW - Meta-heuristic
KW - Opposition-based learning
UR - http://www.scopus.com/inward/record.url?scp=85106941149&partnerID=8YFLogxK
U2 - 10.1016/j.apm.2021.04.018
DO - 10.1016/j.apm.2021.04.018
M3 - Article
SN - 0307-904X
VL - 98
SP - 14
EP - 37
JO - Applied Mathematical Modelling
JF - Applied Mathematical Modelling
ER -