New Spark solutions for distributed frequent itemset and association rule mining algorithms
Metadata
Publisher
Springer
Subject
Big Data; Data Mining; Association Rule; Frequent Itemset; Distributed computing; Spark
Date
2023-04-30
Bibliographic reference
Fernandez-Basso, C. et al. New Spark solutions for distributed frequent itemset and association rule mining algorithms. Cluster Computing. [https://doi.org/10.1007/s10586-023-04014-w]
Sponsor
Universidad de Granada/CBUA; Junta de Andalucia P18-RT-1765; Ministry of Science and Innovation, Spain (MICINN) Instituto de Salud Carlos III Spanish Government PID2021-123960OB-I00, TED2021-129402B-C21; ERDF A way of making Europe; European Union NextGenerationEU; Ministry of Universities through the EU
Abstract
The large amount of data generated every day makes it necessary to re-implement methods so that they can handle
massive data efficiently. This is the case for association rules, an unsupervised data mining tool capable of extracting information
in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase
that precedes association rule mining) from very large databases, high computational cost and insufficient memory remain major
problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for
frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed
computing, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with
existing proposals while varying the number of transactions and the number of items. To this end, we have used the Spark platform,
which has been shown to outperform existing distributed algorithmic implementations.
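
The two phases mentioned in the abstract (frequent itemset extraction, then IF-THEN rule generation) can be illustrated with a minimal single-machine sketch. This is not one of the paper's Spark algorithms: it is a brute-force illustration in plain Python, and the function names `frequent_itemsets` and `association_rules`, the toy transactions, and the thresholds are all assumptions chosen for the example. In a distributed setting, the counting step would be spread across workers rather than run in one loop.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Phase 1 (illustrative, brute force): count every subset of every
    transaction and keep those whose relative support meets the threshold."""
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for subset in combinations(items, k):
                counts[frozenset(subset)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(supports, min_confidence):
    """Phase 2: split each frequent itemset into antecedent -> consequent
    and keep rules whose confidence meets the threshold."""
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is guaranteed to be in the dictionary.
                confidence = supp / supports[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, supp, confidence))
    return rules


# Hypothetical toy data, not taken from the paper.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["bread", "milk", "butter"],
    ["milk"],
]

supports = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(supports, min_confidence=0.8)

for antecedent, consequent, supp, conf in rules:
    print(f"IF {sorted(antecedent)} THEN {sorted(consequent)} "
          f"(support={supp:.2f}, confidence={conf:.2f})")
```

Enumerating every subset grows exponentially with transaction length, which is the kind of computational and memory cost that motivates distributed approaches on large data.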