<rdf:RDF xmlns:rdf="http://www.openarchives.org/OAI/2.0/rdf/" xmlns:ow="http://www.ontoweb.org/ontology/1#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:ds="http://dspace.org/ds/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:doc="http://www.lyncode.com/xoai" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/rdf/ http://www.openarchives.org/OAI/2.0/rdf.xsd">
   <ow:Publication rdf:about="oai:digibug.ugr.es:10481/73197">
      <dc:title>Fine-Tuning BERT Models for Intent Recognition Using a Frequency Cut-Off Strategy for Domain-Specific Vocabulary Extension</dc:title>
      <dc:creator>Fernández Martínez, Fernando</dc:creator>
      <dc:creator>Griol Barres, David</dc:creator>
      <dc:creator>Callejas Carrión, Zoraida</dc:creator>
      <dc:subject>Topic classification</dc:subject>
      <dc:subject>Intent detection</dc:subject>
      <dc:subject>Conversational systems</dc:subject>
      <dc:subject>Recurrent networks</dc:subject>
      <dc:subject>Attentive RNN</dc:subject>
      <dc:subject>Attentive LSTM</dc:subject>
      <dc:subject>Transformer models</dc:subject>
      <dc:subject>Transfer learning</dc:subject>
      <dc:description>The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the R&amp;D&amp;i projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa" ("A way of making Europe")), and AMICPoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon 2020 research and innovation program under grant agreement No. 823907 (http://menhirproject.eu, accessed on 2 February 2022). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).</dc:description>
      <dc:description>Intent recognition is a key component of any task-oriented conversational system. The
intent recognizer first classifies the user’s utterance into one of several predefined classes
(intents) that capture the user’s current goal, so that the most adequate response can then be
provided. Intent recognizers also often appear as joint models that perform
the natural language understanding and dialog management tasks together as a single process, thus
simplifying the set of problems that a conversational system must solve. This is especially
true for frequently asked question (FAQ) conversational systems. In this work, we first present an
exploratory analysis in which different deep learning (DL) models for intent detection and classification
were evaluated. In particular, we experimentally compare and analyze conventional recurrent
neural networks (RNN) and state-of-the-art transformer models. Our experiments confirmed that
the best performance is achieved by transformers, specifically by
fine-tuning the so-called BETO model (a Spanish pretrained bidirectional encoder representations
from transformers (BERT) model from the Universidad de Chile) on our intent detection task. Then, as
the main contribution of the paper, we analyze the effect of inserting unseen domain words to extend
the vocabulary of the model as part of the fine-tuning or domain-adaptation process. In particular,
a very simple word frequency cut-off strategy is experimentally shown to be a suitable method for
deciding which unseen words should be added to the vocabulary. The results of our analysis show that
the proposed method helps to effectively extend the original vocabulary of the pretrained models.
We validated our approach on a subset of the corpus acquired with the Hispabot-Covid19 system,
obtaining satisfactory results.</dc:description>
      <dc:date>2022-03-08T07:22:40Z</dc:date>
      <dc:date>2022-02-03</dc:date>
      <dc:type>journal article</dc:type>
      <dc:identifier>Fernández-Martínez, F... [et al.]. Fine-Tuning BERT Models for Intent Recognition Using a Frequency Cut-Off Strategy for Domain-Specific Vocabulary Extension. Appl. Sci. 2022, 12, 1610. [https://doi.org/10.3390/app12031610]</dc:identifier>
      <dc:identifier>http://hdl.handle.net/10481/73197</dc:identifier>
      <dc:identifier>10.3390/app12031610</dc:identifier>
      <dc:language>eng</dc:language>
      <dc:relation>info:eu-repo/grantAgreement/EC/H2020/823907</dc:relation>
      <dc:rights>http://creativecommons.org/licenses/by/3.0/es/</dc:rights>
      <dc:rights>open access</dc:rights>
      <dc:rights>Attribution 3.0 Spain</dc:rights>
      <dc:publisher>MDPI</dc:publisher>
   </ow:Publication>
</rdf:RDF>