A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language

Minutolo, Aniello; Fujita, Hamido

doi:10.1007/s00521-022-07641-3

dc.contributor.author	Minutolo, Aniello
dc.contributor.author	Fujita, Hamido
dc.date.accessioned	2022-10-17T12:41:11Z
dc.date.available	2022-10-17T12:41:11Z
dc.date.issued	2022-09-19
dc.identifier.citation	Minutolo, A... [et al.]. A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language. Neural Comput & Applic (2022). [https://doi.org/10.1007/s00521-022-07641-3]	es_ES
dc.identifier.uri	https://hdl.handle.net/10481/77364
dc.description.abstract	In the last decade, the demand for readily accessible corpora has touched all areas of natural language processing, including coreference resolution. However, it is one of the least considered sub-fields in recent developments. Moreover, almost all existing resources are available only for the English language. To fill this gap, this work proposes a methodology to create a corpus for coreference resolution in Italian by exploiting annotated resources in other languages. Starting from OntoNotes, the methodology translates and refines English utterances to obtain utterances that respect Italian grammar, deal with language-specific phenomena, and preserve coreference and mentions. A quantitative and qualitative evaluation is performed to assess the well-formedness of the generated utterances, considering readability, grammaticality, and acceptability indexes. The results confirm the effectiveness of the methodology in generating a good dataset for coreference resolution starting from an existing one. The quality of the dataset is also assessed by training a coreference resolution model based on the BERT language model, achieving promising results. Although the methodology has been tailored to English and Italian, it has a general basis that is easily extendable to other languages by adapting a small number of language-dependent rules to cover most of the linguistic phenomena of the language under examination.	es_ES
dc.language.iso	eng	es_ES
dc.publisher	Springer	es_ES
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.subject	Coreference resolution	es_ES
dc.subject	Corpus creation	es_ES
dc.subject	Automated translation	es_ES
dc.subject	Cross-language	es_ES
dc.subject	Natural language processing	es_ES
dc.subject	Linguistic phenomena	es_ES
dc.title	A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language	es_ES
dc.type	journal article	es_ES
dc.rights.accessRights	open access	es_ES
dc.identifier.doi	10.1007/s00521-022-07641-3
dc.type.hasVersion	VoR	es_ES

Files in this item

Name: s00521-022-07641-3.pdf
Size: 2.465Mb
Format: PDF

This item appears in the following Collection(s)

DCCIA - Artículos

Except where otherwise noted, this item's license is described as Attribution 4.0 International