Please use this identifier to cite or link to this item: http://dx.doi.org/10.25673/122078
Title: A Hybrid Spiking-Attention Transformer Model for Robust and Efficient Speech Emotion Recognition on Multi-Dataset Benchmarks
Author(s): Abbas Ali, Samah
Granting Institution: Hochschule Anhalt
Issue Date: 2025-08
Extent: 1 online resource (7 pages)
Language: English
Abstract: This study introduces a novel hybrid deep learning model for Speech Emotion Recognition (SER) that combines Spiking Neural Networks (SNNs), temporal attention, and Transformer encoders. SER is essential for improving human-computer interaction by enabling intelligent systems to recognize emotions from speech. Unlike traditional methods that rely on shallow classifiers and manually engineered features, the proposed approach exploits the energy efficiency of SNNs, the selective focus provided by temporal attention, and the long-range temporal modeling capability of Transformer architectures. The model was evaluated on a multi-dataset corpus comprising TESS, SAVEE, RAVDESS, and CREMA-D, achieving a consistent accuracy of 98% across all emotion classes. These results demonstrate the model's effectiveness and highlight its potential for real-time use in resource-constrained environments. The hybrid approach surpasses existing state-of-the-art SER techniques and offers a reliable foundation for real-world affective computing applications.
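
For illustration, the sketch below shows one way such a hybrid pipeline could be assembled in PyTorch: a leaky integrate-and-fire (LIF) spiking front end over frame-level acoustic features, a Transformer encoder for long-range temporal context, and a learned temporal-attention pooling ahead of the emotion classifier. The layer sizes, LIF dynamics, surrogate gradient, 40-dimensional MFCC-style input, and seven output classes are illustrative assumptions only and do not reproduce the authors' implementation.

# Minimal sketch of a hybrid SNN + temporal-attention + Transformer SER model.
# All hyperparameters and the LIF formulation are assumptions for illustration.
import torch
import torch.nn as nn

class LIFLayer(nn.Module):
    """Leaky integrate-and-fire layer with a straight-through surrogate gradient."""
    def __init__(self, in_features, out_features, beta=0.9, threshold=1.0):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)
        self.beta = beta            # membrane decay per time step (assumed value)
        self.threshold = threshold  # firing threshold (assumed value)

    def forward(self, x):           # x: (batch, time, in_features)
        batch, steps, _ = x.shape
        mem = torch.zeros(batch, self.fc.out_features, device=x.device)
        spikes = []
        for t in range(steps):
            mem = self.beta * mem + self.fc(x[:, t])
            spk_hard = (mem >= self.threshold).float()
            surrogate = torch.sigmoid(mem - self.threshold)
            # Hard spike on the forward pass, sigmoid gradient on the backward pass.
            spk = spk_hard + surrogate - surrogate.detach()
            mem = mem - spk_hard * self.threshold   # soft reset after spiking
            spikes.append(spk)
        return torch.stack(spikes, dim=1)           # (batch, time, out_features)

class TemporalAttentionPool(nn.Module):
    """Learned attention weights over time steps, collapsing (B, T, D) to (B, D)."""
    def __init__(self, d_model):
        super().__init__()
        self.score = nn.Linear(d_model, 1)

    def forward(self, h):                           # h: (batch, time, d_model)
        w = torch.softmax(self.score(h), dim=1)     # attention weight per frame
        return (w * h).sum(dim=1)

class HybridSpikingAttentionTransformer(nn.Module):
    def __init__(self, n_features=40, d_model=128, n_heads=4, n_layers=2, n_classes=7):
        super().__init__()
        self.spiking_frontend = LIFLayer(n_features, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=256, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.temporal_pool = TemporalAttentionPool(d_model)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, x):                           # x: (batch, time, n_features)
        s = self.spiking_frontend(x)                # spike trains
        h = self.transformer(s)                     # long-range temporal context
        pooled = self.temporal_pool(h)              # attention-weighted summary
        return self.classifier(pooled)              # emotion logits

if __name__ == "__main__":
    # Example: 8 utterances, 200 frames of 40-dim features, 7 emotion classes.
    model = HybridSpikingAttentionTransformer()
    logits = model(torch.randn(8, 200, 40))
    print(logits.shape)                             # torch.Size([8, 7])

In this sketch the spiking layer handles frame-level feature encoding cheaply, while the Transformer and the attention pooling decide which parts of the utterance carry the emotional cues; the actual division of labor in the published model may differ.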
URI: https://opendata.uni-halle.de//handle/1981185920/124026
Open Access: Open access publication
License: Creative Commons Attribution ShareAlike 4.0 (CC BY-SA 4.0)
Appears in Collections: International Conference on Applied Innovations in IT (ICAIIT)

Files in This Item:
File: 2-8-ICAIIT_2025_13(4).pdf (1.03 MB, Adobe PDF)