Research Article
Audio Deepfake Detection Using a Hybrid Model of Convolutional and Bidirectional Long Short-term Memory Networks
Samar Al-Halabi*, Adnan Kafri
Issue: Volume 11, Issue 1, March 2026
Pages: 1-7
Received: 1 November 2025
Accepted: 13 November 2025
Published: 7 January 2026
DOI: 10.11648/j.aas.20261101.11
Abstract: With the rapid advancement of audio deepfake technologies, detecting such manipulations has become a critical cybersecurity challenge. This study proposes a novel hybrid model that combines Convolutional Neural Networks (CNNs) with Bidirectional Long Short-Term Memory (BiLSTM) networks to detect spoofed audio. The research is based on the Release-in-the-Wild dataset, which simulates real-world acoustic conditions, and employs a preprocessing pipeline that extracts Mel-Frequency Cepstral Coefficients (MFCCs) augmented with their first- and second-order derivatives. The proposed model achieved an accuracy of 99% with an Equal Error Rate (EER) of 0.011 while remaining lightweight, with only 473k trainable parameters. Beyond numerical performance, the model is robust to acoustic variability, environmental noise, and speaker diversity, which highlights its potential for deployment in uncontrolled real-world scenarios. Its compact design keeps computational demand low, making it practical to integrate into online verification systems, intelligent voice assistants, and security monitoring infrastructures. Comparative experiments further confirm that the hybrid CNN–BiLSTM architecture achieves a better balance of accuracy, efficiency, and generalization than recent Transformer-based models. Overall, this work contributes an interpretable and resource-efficient framework for generalized audio deepfake detection. The findings show that high detection accuracy and lightweight design are not mutually exclusive. Future research will focus on extending the approach to multimodal systems that jointly analyze audio and visual cues for more reliable deepfake forensics.
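The Equal Error Rate quoted in the abstract is the operating point at which the false-positive rate and the false-negative rate of a detector coincide. As a minimal sketch of how such a figure is commonly computed, and not the authors' evaluation code, the following pure-Python function sweeps a decision threshold over the observed scores. The function name `equal_error_rate`, the label convention (1 = spoofed, 0 = bona fide), and the "higher score means more likely spoofed" convention are assumptions made for illustration.

```python
def equal_error_rate(scores, labels):
    """Return (eer, threshold) for binary labels.

    Assumed convention (illustrative): label 1 = spoofed, 0 = bona fide,
    and a sample is flagged as spoofed when score >= threshold.
    """
    spoofed = [s for s, y in zip(scores, labels) if y == 1]
    bona_fide = [s for s, y in zip(scores, labels) if y == 0]
    if not spoofed or not bona_fide:
        raise ValueError("need at least one sample of each class")

    best = None
    # Candidate thresholds: every observed score, plus one above the maximum
    # so that the "flag nothing" operating point is also considered.
    for t in sorted(set(scores)) + [max(scores) + 1.0]:
        # Bona fide samples wrongly flagged as spoofed.
        fpr = sum(s >= t for s in bona_fide) / len(bona_fide)
        # Spoofed samples that slip under the threshold.
        fnr = sum(s < t for s in spoofed) / len(spoofed)
        gap = abs(fpr - fnr)
        if best is None or gap < best[0]:
            # With finite data the two rates may never be exactly equal,
            # so report their mean at the closest crossing.
            best = (gap, (fpr + fnr) / 2.0, t)
    return best[1], best[2]
```

On a finite test set the two error curves are step functions, so this sketch reports the mean of the two rates at the threshold where they are closest. Evaluation toolkits often interpolate along the ROC curve instead, which can give slightly different values.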