DSTER: A Dual-Stream Transformer-based Emotion Recognition Model through Keystroke Dynamics
Published in IEEE International Joint Conference on Biometrics, 2024
Abstract:
Emotion recognition is a critical research area for enhancing human-computer interaction. Keystroke dynamics, a behavioral biometric capturing typing patterns, offers a non-intrusive, user-friendly means of recognizing emotions. We propose a Dual-Stream Transformer-based Emotion Recognition (DSTER) model, which leverages keystroke dynamics to infer emotional states. The DSTER model features a dual-stream architecture that separately extracts temporal-over-channel and channel-over-temporal information. Each stream employs multi-head self-attention mechanisms, Long Short-Term Memory (LSTM), and Convolutional Neural Network (CNN) layers, along with dense vector embeddings of keycode data, to improve the extraction of temporal and contextual information from typing sequences. To the best of our knowledge, the DSTER model is the first to integrate the transformer architecture with keystroke dynamics for emotion recognition. Our experiments on a widely used fixed-text dataset demonstrate that the DSTER model significantly outperforms the three most recent baseline models, achieving average F1 scores of up to 0.989 and average accuracy improvements of up to 66.04%. Unlike the large performance variations reported for the baseline models, the DSTER model maintains consistent and robust performance across all five tested emotional states. Further analysis shows that the model performs better with longer window lengths and greater overlaps.
Keywords:
Emotion Recognition, Keystroke Dynamics, Transformer
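To make the dual-stream design concrete, the sketch below shows one plausible PyTorch realization of the components the abstract names: dense keycode embeddings, multi-head self-attention, CNN, and LSTM layers arranged in two streams, one attending over time steps and one over feature channels. The abstract does not specify the actual layer ordering, sizes, or fusion scheme, so every concrete choice here (embedding width, hidden sizes, head count, the attention-CNN-LSTM order, and concatenation-based fusion) is an illustrative assumption, not the authors' implementation.

import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    # One stream: multi-head self-attention -> 1-D CNN -> LSTM, returning
    # the final LSTM hidden state as the stream's summary vector.
    # (Assumed ordering; the paper may arrange these layers differently.)
    def __init__(self, dim, heads, hidden):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv = nn.Conv1d(dim, hidden, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)

    def forward(self, x):                       # x: (batch, steps, dim)
        a, _ = self.attn(x, x, x)               # contextual re-weighting
        c = self.conv(a.transpose(1, 2)).transpose(1, 2)  # local patterns
        out, _ = self.lstm(c)                   # sequential dependencies
        return out[:, -1]                       # (batch, hidden)

class DSTER(nn.Module):
    def __init__(self, n_keycodes, n_feats, seq_len, n_classes=5,
                 embed_dim=16, hidden=64, heads=4):
        super().__init__()
        self.embed = nn.Embedding(n_keycodes, embed_dim)
        dim = n_feats + embed_dim               # channels per time step
        # Stream 1 attends over time steps (temporal-over-channel);
        # Stream 2 attends over channels (channel-over-temporal), obtained
        # here by transposing the (time, channel) axes of the same input.
        self.temporal_stream = StreamEncoder(dim, heads, hidden)
        self.channel_stream = StreamEncoder(seq_len, heads, hidden)
        self.classifier = nn.Linear(2 * hidden, n_classes)

    def forward(self, keycodes, timings):
        # keycodes: (batch, seq) integer key codes
        # timings:  (batch, seq, n_feats) timing features, e.g. hold times
        x = torch.cat([self.embed(keycodes), timings], dim=-1)
        t = self.temporal_stream(x)             # (batch, hidden)
        c = self.channel_stream(x.transpose(1, 2))
        return self.classifier(torch.cat([t, c], dim=-1))

# Example shapes: 8 windows of 32 keystrokes with 4 timing features each;
# both (4 + 16) and 32 are divisible by the 4 attention heads.
model = DSTER(n_keycodes=256, n_feats=4, seq_len=32)
logits = model(torch.randint(0, 256, (8, 32)), torch.randn(8, 32, 4))
print(logits.shape)                             # torch.Size([8, 5])

Concatenating the two stream summaries before a single linear classifier is the simplest fusion choice and is used here only for illustration; the published model may combine the streams differently.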
