Title: A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset

Authors: Luna Jiménez, Cristina; Griol Barres, David; Callejas Carrión, Zoraida

Keywords: Audio–visual emotion recognition; Human-computer interaction; Computational paralinguistics; xlsr-Wav2Vec2.0 Transformer; Transfer learning; Action Units; RAVDESS; Speech emotion recognition; Facial emotion recognition

Funding: The work leading to these results was supported by the Spanish Ministry of Science and Innovation through the projects GOMINOLA (PID2020-118112RB-C21 and PID2020-118112RB-C22, funded by MCIN/AEI/10.13039/501100011033), CAVIAR (TEC2017-84593-C2-1-R, funded by MCIN/AEI/10.13039/501100011033/FEDER "Una manera de hacer Europa"), and AMIC-PoC (PDC2021-120846-C42, funded by MCIN/AEI/10.13039/501100011033 and by the European Union "NextGenerationEU/PRTR"). This research also received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 823907 (http://menhir-project.eu, accessed on 17 November 2021). Furthermore, R.K.'s research was supported by the Spanish Ministry of Education (FPI grant PRE2018-083225).

Abstract: Emotion recognition is attracting the attention of the research community due to its multiple applications in different fields, such as medicine or autonomous driving. In this paper, we proposed an automatic emotion recognition system consisting of a speech emotion recognizer (SER) and a facial emotion recognizer (FER). For the SER, we evaluated a pre-trained xlsr-Wav2Vec2.0 Transformer using two transfer-learning techniques: embedding extraction and fine-tuning. The best accuracy was achieved when we fine-tuned the whole model with a multilayer perceptron appended on top of it, confirming that training is more robust when it does not start from scratch and the network's prior knowledge is close to the target task. For the facial emotion recognizer, we extracted the Action Units of the videos and compared the performance of static models against sequential models. The results showed that sequential models beat static models by a narrow margin. Error analysis indicated that the visual system could improve with a detector of frames carrying a high emotional load, which opens a new line of research into ways of learning from videos. Finally, by combining these two modalities with a late fusion strategy, we achieved 86.70% accuracy on the RAVDESS dataset under subject-wise 5-fold cross-validation when classifying eight emotions. The results demonstrated that both modalities carry relevant information for detecting a user's emotional state, and that combining them improves the performance of the final system.

Date accessioned: 2022-03-18T08:38:33Z
Date available: 2022-03-18T08:38:33Z
Date issued: 2021-12-30
Type: journal article
Citation: Luna-Jiménez, C. [et al.]. A Proposal for Multimodal Emotion Recognition Using Aural Transformers and Action Units on RAVDESS Dataset. Appl. Sci. 2022, 12, 327. https://doi.org/10.3390/app12010327
URI: http://hdl.handle.net/10481/73537
DOI: 10.3390/app12010327
Language: eng
Grant agreement: info:eu-repo/grantAgreement/EC/H2020/823907
Rights: open access; Attribution 3.0 Spain (Atribución 3.0 España), http://creativecommons.org/licenses/by/3.0/es/
Publisher: MDPI
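
Note: As an illustration of the two transfer-learning settings named in the abstract, the sketch below (PyTorch with the Hugging Face transformers library) wraps a pre-trained xlsr-Wav2Vec2.0 encoder with a multilayer perceptron head. The checkpoint name, head sizes, and mean pooling over time are assumptions for illustration; the record does not specify the paper's exact configuration. Freezing the encoder corresponds to embedding extraction, while leaving it trainable corresponds to fine-tuning the whole model.

    import torch.nn as nn
    from transformers import Wav2Vec2Model

    class SpeechEmotionClassifier(nn.Module):
        """xlsr-Wav2Vec2.0 encoder with an MLP head for the 8 RAVDESS emotions."""
        def __init__(self, num_emotions=8, freeze_encoder=False):
            super().__init__()
            # XLSR-53 checkpoint assumed here; the record only names "xlsr-Wav2Vec2.0".
            self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-large-xlsr-53")
            if freeze_encoder:  # embedding extraction: encoder weights stay fixed
                for p in self.encoder.parameters():
                    p.requires_grad = False
            hidden = self.encoder.config.hidden_size  # 1024 for the large model
            self.mlp = nn.Sequential(
                nn.Linear(hidden, 256),  # hypothetical head width
                nn.ReLU(),
                nn.Linear(256, num_emotions),
            )

        def forward(self, input_values):  # raw 16 kHz waveforms: (batch, samples)
            states = self.encoder(input_values).last_hidden_state  # (batch, frames, hidden)
            return self.mlp(states.mean(dim=1))  # average-pool over time, then classify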
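
The static-versus-sequential comparison on Action Units could be instantiated as below. This is a minimal sketch assuming per-frame AU intensity vectors (for example, the 17 intensity AUs produced by OpenFace 2.0); the actual feature set and model sizes used in the paper are not given in this record. The static model sees one aggregated AU vector per video, while the sequential model (here an LSTM) sees the full frame sequence.

    import torch.nn as nn

    NUM_AUS = 17       # assumed AU vector size per frame
    NUM_EMOTIONS = 8   # RAVDESS emotion classes

    class StaticAUClassifier(nn.Module):
        """Static model: classifies a single video-level AU statistic (e.g., per-AU means)."""
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(NUM_AUS, 64), nn.ReLU(), nn.Linear(64, NUM_EMOTIONS)
            )
        def forward(self, au_means):   # (batch, NUM_AUS)
            return self.net(au_means)

    class SequentialAUClassifier(nn.Module):
        """Sequential model: a recurrent network over the per-frame AU activations."""
        def __init__(self):
            super().__init__()
            self.rnn = nn.LSTM(NUM_AUS, 64, batch_first=True)
            self.out = nn.Linear(64, NUM_EMOTIONS)
        def forward(self, au_frames):  # (batch, frames, NUM_AUS)
            _, (h, _) = self.rnn(au_frames)
            return self.out(h[-1])     # classify from the last hidden state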
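
The record describes a "late fusion strategy" without further detail. One common instantiation is a weighted average of the per-class posteriors produced by the two unimodal systems, as in this hypothetical sketch; the weighting scheme is an assumption, not the paper's documented method.

    import numpy as np

    def late_fusion(p_speech, p_face, w_speech=0.5):
        """Weighted average of class posteriors from the SER and FER branches."""
        return w_speech * p_speech + (1.0 - w_speech) * p_face

    # Hypothetical posteriors over the eight RAVDESS emotion classes.
    p_speech = np.array([0.05, 0.10, 0.40, 0.05, 0.10, 0.10, 0.10, 0.10])
    p_face   = np.array([0.05, 0.05, 0.55, 0.05, 0.10, 0.05, 0.10, 0.05])
    fused = late_fusion(p_speech, p_face)
    print(fused.argmax())  # index of the predicted emotion after fusion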