New Spark solutions for distributed frequent itemset and association rule mining algorithms
Metadata
Publisher
Springer
Subject
Big Data; Data Mining; Association Rule; Frequent Itemset; Distributed computing; Spark
Date
2023-04-30
Bibliographic reference
Fernandez-Basso, C. et al. New Spark solutions for distributed frequent itemset and association rule mining algorithms. Cluster Computing. [https://doi.org/10.1007/s10586-023-04014-w]
Sponsorship
Universidad de Granada/CBUA; Junta de Andalucia P18-RT-1765; Ministry of Science and Innovation, Spain (MICINN) Instituto de Salud Carlos III Spanish Government PID2021-123960OB-I00, TED2021-129402B-C21; ERDF A way of making Europe; European Union NextGenerationEU; Ministry of Universities through the EU
Abstract
The large amount of data generated every day makes it necessary to re-implement methods so that they can handle
massive data efficiently. This is the case for association rules, an unsupervised data mining tool that extracts information
in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase
that precedes association rule mining) from very large databases, high computational cost and insufficient memory remain major
problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for
frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed
computing, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with
existing proposals while varying the number of transactions and the number of items. For this purpose, we have used the Spark platform,
which has been shown to outperform existing distributed algorithmic implementations.