A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language
Metadata
Publisher
Springer
Subject
Coreference resolution; Corpus creation; Automated translation; Cross-language; Natural language processing; Linguistic phenomena
Date
2022-09-19
Bibliographic reference
Minutolo, A... [et al.]. A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language. Neural Comput & Applic (2022). [https://doi.org/10.1007/s00521-022-07641-3]
Abstract
In the last decade, the demand for readily accessible corpora has touched all areas of natural language processing, including
coreference resolution. However, it is one of the least considered sub-fields in recent developments. Moreover, almost all
existing resources are available only for English. To address this gap, this work proposes a methodology to
create a corpus for coreference resolution in Italian by exploiting annotated resources in other languages.
Starting from OntoNotes, the methodology translates and refines English utterances to obtain utterances respecting Italian
grammar, dealing with language-specific phenomena and preserving coreference and mentions. A quantitative and qualitative
evaluation is performed to assess the well-formedness of generated utterances, considering readability, grammaticality,
and acceptability indexes. The results confirm the effectiveness of the methodology in generating a good
dataset for coreference resolution starting from an existing one. The quality of the dataset is also assessed by training a
coreference resolution model based on the BERT language model, achieving promising results. Although the methodology
has been tailored to English and Italian, it has a general basis that is easily extendable to other languages by adapting a
small number of language-dependent rules to handle most of the linguistic phenomena of the language under
examination.