New Spark solutions for distributed frequent itemset and association rule mining algorithms

Fernández Basso, Carlos Jesús; Ruiz Jiménez, María Dolores; Martín Bautista, María José

doi:10.1007/s10586-023-04014-w

dc.contributor.author	Fernández Basso, Carlos Jesús
dc.contributor.author	Ruiz Jiménez, María Dolores
dc.contributor.author	Martín Bautista, María José
dc.date.accessioned	2023-05-31T07:21:48Z
dc.date.available	2023-05-31T07:21:48Z
dc.date.issued	2023-04-30
dc.identifier.citation	Fernandez-Basso, C. et al. New Spark solutions for distributed frequent itemset and association rule mining algorithms. Cluster Computing. [https://doi.org/10.1007/s10586-023-04014-w]	es_ES
dc.identifier.uri	https://hdl.handle.net/10481/82042
dc.description	Funding for open access publishing: Universidad de Granada/CBUA. The research reported in this paper was partially supported by the BIGDATAMED project, which has received funding from the Andalusian Government (Junta de Andalucía) under grant agreement No P18-RT-1765, by Grants PID2021-123960OB-I00 and TED2021-129402B-C21 funded by Ministerio de Ciencia e Innovación, by ERDF A way of making Europe, and by the European Union NextGenerationEU. In addition, this work has been partially supported by the Ministry of Universities through the EU-funded Margarita Salas programme NextGenerationEU. Funding for open access charge: Universidad de Granada/CBUA	es_ES
dc.description.abstract	The large amount of data generated every day makes it necessary to re-implement methods so that they can handle massive data efficiently. This is the case for Association Rules, an unsupervised data mining tool capable of extracting information in the form of IF-THEN patterns. Although several methods have been proposed for extracting frequent itemsets (the phase that precedes association rule mining) from very large databases, high computational cost and lack of memory remain major problems when processing large data. Therefore, the aim of this paper is threefold: (1) to review existing algorithms for frequent itemset and association rule mining, (2) to develop new efficient frequent itemset Big Data algorithms using distributed computation, as well as a new association rule mining algorithm in Spark, and (3) to compare the proposed algorithms with existing proposals while varying the number of transactions and the number of items. To this end, we have used the Spark platform, which has been shown to outperform existing distributed algorithmic implementations.	es_ES
dc.description.sponsorship	Universidad de Granada/CBUA	es_ES
dc.description.sponsorship	Junta de Andalucia P18-RT-1765	es_ES
dc.description.sponsorship	Ministry of Science and Innovation, Spain (MICINN) Instituto de Salud Carlos III Spanish Government PID2021-123960OB-I00, TED2021-129402B-C21	es_ES
dc.description.sponsorship	ERDF A way of making Europe	es_ES
dc.description.sponsorship	European Union NextGenerationEU	es_ES
dc.description.sponsorship	Ministry of Universities through the EU	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Springer	es_ES
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Big Data	es_ES
dc.subject	Data Mining	es_ES
dc.subject	Association Rule	es_ES
dc.subject	Frequent Itemset	es_ES
dc.subject	Distributed computing	es_ES
dc.subject	Spark	es_ES
dc.title	New Spark solutions for distributed frequent itemset and association rule mining algorithms	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.1007/s10586-023-04014-w
dc.type.hasVersion	VoR	es_ES

Files in this item

Name: s10586-023-04014-w.pdf
Size: 2.143Mb
Format: PDF

This item appears in the following collection(s)

DCCIA - Articles

Except where otherwise noted, this item's license is described as Attribution 4.0 International