New Spark solutions for distributed frequent itemset and association rule mining algorithms

Fernández Basso, Carlos Jesús; Ruiz Jiménez, María Dolores; Martín Bautista, María José

doi:10.1007/s10586-023-04014-w

s10586-023-04014-w.pdf (2.143Mb)

Identifiers

URI: https://hdl.handle.net/10481/82042

DOI: 10.1007/s10586-023-04014-w

Publisher

Springer

Subject

Big Data

Data Mining

Association Rule

Frequent Itemset

Distributed computing

Spark

Date

2023-04-30

Bibliographic reference

Fernandez-Basso, C. et al. New Spark solutions for distributed frequent itemset and association rule mining algorithms. Cluster Computing. [https://doi.org/10.1007/s10586-023-04014-w]

Sponsor

Universidad de Granada/CBUA; Junta de Andalucia P18-RT-1765; Ministry of Science and Innovation, Spain (MICINN) Instituto de Salud Carlos III Spanish Government PID2021-123960OB-I00, TED2021-129402B-C21; ERDF A way of making Europe; European Union NextGenerationEU; Ministry of Universities through the EU

Abstract

The large amount of data generated every day makes it necessary to re-implement methods so that they can handle massive data efficiently. This is the case of Association Rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for the extraction of frequent itemsets (the phase that precedes association rule mining) in very large databases, high computational cost and lack of memory remain major problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed computation, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals while varying the number of transactions and the number of items. To this end, we have used the Spark platform, which has been shown to outperform existing distributed algorithmic implementations.

Collections

DCCIA - Articles

Except where otherwise noted, this item's license is described as Attribution 4.0 International