Experimental Study on 164 Algorithms Available in Software Tools for Solving Standard Non-Linear Regression Problems
Author: Gacto Colorado, María José; Soto Hidalgo, José Manuel; Alcalá Fernández, Jesús; Alcalá Fernández, Rafael
Keywords: Data Mining; Supervised learning; Regression algorithms; Experimental study
M. J. Gacto, J. M. Soto-Hidalgo, J. Alcalá-Fdez and R. Alcalá, "Experimental Study on 164 Algorithms Available in Software Tools for Solving Standard Non-Linear Regression Problems," in IEEE Access, vol. 7, pp. 108916-108939, 2019. [doi: 10.1109/ACCESS.2019.2933261]
Sponsorship: This work was supported in part by the University of Córdoba under the project PPG2019-UCOSOCIAL-03, and in part by the Spanish Ministry of Science, Innovation and Universities under Grant TIN2015-68454-R and Grant TIN2017-89517-P.
The specialized literature offers a large number of proposals for solving regression problems, drawn from different research areas. However, researchers tend to use only proposals from the area in which they are experts. This paper analyzes the performance of a large number of the regression algorithms available in some of the best-known and most widely used software tools. The aim is twofold: to help non-expert users from other areas properly solve their own regression problems, and to help specialized researchers develop well-founded future proposals by properly comparing and identifying algorithms that will enable them to focus on significant further developments. In summary, we have analyzed 164 algorithms from 14 main families (Neural Networks, Support Vector Machines, Regression Trees, Rule-Based Methods, Stacking, Random Forests, Model Trees, Generalized Linear Models, Nearest Neighbor methods, Partial Least Squares and Principal Component Regression, Multivariate Adaptive Regression Splines, Bagging, Boosting, and other methods), available in 6 software tools, over 52 datasets. A new measure has also been proposed to show the goodness of each algorithm with respect to the others. Finally, a statistical analysis using non-parametric tests has been carried out over all the algorithms and over the best 30 algorithms, both with and without bagging. The results show that algorithms from the Random Forest, Model Tree and Support Vector Machine families obtain the best positions in the rankings produced by the statistical tests when bagging is not considered. In addition, the use of bagging techniques significantly improves the performance of the algorithms without an excessive increase in computational time.
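To make the idea of a ranking produced by a non-parametric test concrete, the following is a minimal sketch of a Friedman-style average-rank comparison. It assumes a Friedman-type procedure, which the abstract does not name explicitly. The algorithm names, the error values and the helper functions `average_ranks` and `friedman_statistic` are hypothetical illustrations, not data or code from the paper.

```python
from statistics import mean


def average_ranks(errors):
    """Mean rank of each algorithm across datasets (lower error = better rank).

    errors: dict mapping algorithm name -> list of errors, one per dataset.
    Tied algorithms share the average of the positions they occupy.
    """
    names = list(errors)
    n_datasets = len(next(iter(errors.values())))
    ranks = {name: [] for name in names}
    for d in range(n_datasets):
        ordered = sorted(names, key=lambda a: errors[a][d])
        i = 0
        while i < len(ordered):
            # Extend j over the run of algorithms tied with ordered[i].
            j = i
            while (j + 1 < len(ordered)
                   and errors[ordered[j + 1]][d] == errors[ordered[i]][d]):
                j += 1
            shared = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
            for k in range(i, j + 1):
                ranks[ordered[k]].append(shared)
            i = j + 1
    return {name: mean(r) for name, r in ranks.items()}


def friedman_statistic(avg_ranks, n_datasets):
    """Friedman chi-square statistic (without tie correction)."""
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks.values())
    return 12 * n_datasets / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)


# Hypothetical test errors of three algorithms on four datasets.
example_errors = {
    "algo_a": [0.10, 0.20, 0.15, 0.30],
    "algo_b": [0.12, 0.25, 0.15, 0.35],
    "algo_c": [0.20, 0.22, 0.40, 0.33],
}
example_ranks = average_ranks(example_errors)
example_stat = friedman_statistic(example_ranks, n_datasets=4)
print(example_ranks, example_stat)
```

The algorithm with the lowest average rank takes the best position in the ranking. A full analysis would also convert the statistic into a p-value and apply a post-hoc procedure, both of which this sketch omits.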