dc.contributor.author: Kharitonova, Ksenia
dc.contributor.author: Callejas Carrión, Zoraida
dc.contributor.author: Pérez Fernández, David
dc.contributor.author: Gutiérrez Fandiño, Asier
dc.contributor.author: Griol Barres, David
dc.date.accessioned: 2023-11-09T09:47:36Z
dc.date.available: 2023-11-09T09:47:36Z
dc.date.issued: 2023-09-14
dc.identifier.citation: K. Kharitonova, Z. Callejas and D. Pérez-Fernández et al. / Data in Brief 50 (2023) 109565 [https://doi.org/10.1016/j.dib.2023.109565] [es_ES]
dc.identifier.uri: https://hdl.handle.net/10481/85546
dc.description.abstract: The ChatSubs dataset [5] contains dialogue data in Spanish and three of Spain’s co-official languages (Catalan, Basque, and Galician). It was obtained from OpenSubtitles, from which we gathered the movie subtitles in our languages of interest and processed them to generate clearly segmented dialogues and their turns. The data processing code is publicly accessible. The result is 206,706 JSON files with more than 20 million dialogues and 96 million turns, which makes it one of the largest dialogue corpora available, as similar datasets in better-resourced languages either do not reach 500k dialogues or contain less clearly defined conversations. Thus, the ChatSubs dataset is an ideal resource for research teams interested in training dialogue models in Spanish, Catalan, Basque, and Galician. [es_ES]
dc.description.sponsorship: CONVERSA (TED2021-132470B-I00) funded by MCIN/AEI/10.13039/501100011033 [es_ES]
dc.description.sponsorship: European Union NextGenerationEU/PRTR [es_ES]
dc.language.iso: eng [es_ES]
dc.publisher: Elsevier [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Dialogue [es_ES]
dc.subject: Conversation [es_ES]
dc.subject: Chatbots [es_ES]
dc.subject: Conversational AI [es_ES]
dc.subject: Speech [es_ES]
dc.subject: Natural language processing [es_ES]
dc.title: ChatSubs: A dataset of dialogues in Spanish, Catalan, Basque and Galician extracted from movie subtitles for developing advanced conversational models [es_ES]
dc.type: journal article [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.identifier.doi: 10.1016/j.dib.2023.109565
dc.type.hasVersion: VoR [es_ES]


Except where otherwise noted, this document is licensed under Attribution-NonCommercial-NoDerivatives 4.0 International.