Show simple item record

dc.contributor.author     Luna Jiménez, Cristina
dc.contributor.author     Griol Barres, David
dc.contributor.author     Callejas Carrión, Zoraida
dc.contributor.author     Kleinlein, Ricardo
dc.contributor.author     Montero, Juan M.
dc.contributor.author     Fernández Martínez, Fernando
dc.date.accessioned       2021-11-19T08:40:57Z
dc.date.available         2021-11-19T08:40:57Z
dc.date.issued            2021
dc.identifier.citation    Luna-Jiménez, C.; Griol, D.; Callejas, Z.; Kleinlein, R.; Montero, J.M.; Fernández-Martínez, F. Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning. Sensors 2021, 21, 7665. https://doi.org/10.3390/s21227665  es_ES
dc.identifier.uri         http://hdl.handle.net/10481/71614
dc.description.abstract   Emotion recognition is attracting the attention of the research community due to the multiple areas where it can be applied, such as healthcare or road safety systems. In this paper, we propose a multimodal emotion recognition system that relies on speech and facial information. For the speech-based modality, we evaluated several transfer-learning techniques, more specifically, embedding extraction and fine-tuning. The best accuracy results were achieved when we fine-tuned the CNN-14 of the PANNs framework, confirming that training was more robust when it did not start from scratch and the tasks were similar. Regarding the facial emotion recognizers, we propose a framework that consists of a pre-trained Spatial Transformer Network on saliency maps and facial images, followed by a bi-LSTM with an attention mechanism. The error analysis reported that the frame-based systems could present some problems when used directly to solve a video-based task despite the domain adaptation, which opens a new line of research to discover ways to correct this mismatch and take advantage of the embedded knowledge of these pre-trained models. Finally, by combining these two modalities with a late fusion strategy, we achieved 80.08% accuracy on the RAVDESS dataset in a subject-wise 5-CV evaluation, classifying eight emotions. The results revealed that these modalities carry relevant information to detect users’ emotional state and that their combination improves system performance.  es_ES
dc.language.iso           eng  es_ES
dc.publisher              MDPI  es_ES
dc.rights                 Attribution 3.0 Spain
dc.rights.uri             http://creativecommons.org/licenses/by/3.0/es/
dc.subject                Audio–visual emotion recognition  es_ES
dc.subject                Human-computer interaction  es_ES
dc.subject                Computational paralinguistics  es_ES
dc.subject                Spatial transformers  es_ES
dc.subject                Transfer learning  es_ES
dc.subject                Speech emotion recognition  es_ES
dc.subject                Facial emotion recognition  es_ES
dc.title                  Multimodal Emotion Recognition on RAVDESS Dataset Using Transfer Learning  es_ES
dc.type                   journal article  es_ES
dc.rights.accessRights    open access  es_ES
dc.identifier.doi         10.3390/s21227665
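
The late fusion strategy named in the abstract combines the posteriors that the speech and facial recognizers produce independently. Below is a minimal sketch of that idea, assuming a simple weighted average over the eight RAVDESS emotion classes; the `late_fusion` helper, the weight `w_speech`, and the example probability vectors are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

# The eight emotion classes annotated in the RAVDESS dataset.
EMOTIONS = ["neutral", "calm", "happy", "sad",
            "angry", "fearful", "disgust", "surprised"]

def late_fusion(speech_probs, face_probs, w_speech=0.5):
    """Fuse per-modality class posteriors with a weighted average.

    `speech_probs` and `face_probs` are length-8 probability vectors
    produced independently by the speech and facial recognizers.
    The weight `w_speech` is a hypothetical tuning parameter.
    """
    fused = (w_speech * np.asarray(speech_probs)
             + (1.0 - w_speech) * np.asarray(face_probs))
    return EMOTIONS[int(np.argmax(fused))]

# Example: the speech model leans "angry", the facial model leans "happy";
# with equal weights, the stronger posterior wins.
speech = [0.05, 0.05, 0.10, 0.05, 0.55, 0.05, 0.05, 0.10]
face   = [0.05, 0.05, 0.45, 0.05, 0.20, 0.05, 0.05, 0.10]
print(late_fusion(speech, face))  # -> "angry"
```

In a late fusion scheme like this, each modality is trained and evaluated on its own, and only the final class probabilities are combined, which is what allows the authors to reuse independently pre-trained speech and facial models.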


Files in this item

[PDF]

This item appears in the following collection(s)


Attribution 3.0 Spain
Except where otherwise noted, this item's license is described as Attribution 3.0 Spain