Significance of Teager Energy Operator for Speech Applications
Abstract
Speech is used in various applications apart from voice communication, such as pathology detection, severity-level classification of dysarthria, and replay spoof speech detection for voice biometrics and voice assistants. The first part of this thesis deals with the development of a countermeasure (CM) system for replay Spoof Speech Detection (SSD). A replay attack on a voice biometric system refers to the fraudulent attempt made by an impostor to spoof another person's identity by replaying pre-recorded voice samples in front of Automatic Speaker Verification (ASV) systems or Voice Assistants (VAs). Another part of this thesis studies and analyses dysarthria, a neuromotor speech disorder, using various speech processing and deep learning approaches. Conditions such as dysarthria, Parkinson's disease, and cerebral palsy impair neuromotor functions of the human body and give rise to atypical speech. Among these, dysarthria is one of the most common. Analysis of a patient's dysarthric condition depends on the severity level, which is generally assessed by Speech-Language Pathologists (SLPs). However, to make the assessment immune to human biases and errors, this thesis is oriented towards developing a severity-level classification system for dysarthric speech using signal processing and deep learning approaches. The thesis presents an analysis of dysarthric vs. normal speech using the Teager Energy Operator (TEO) based Teager Energy Cepstral Coefficients (TECC) and the Squared Energy Operator (SEO) based Squared Energy Cepstral Coefficients (SECC) as the frontend features. These features are provided as input to deep learning and pattern recognition models, which predict the severity-level class for dysarthria. Lastly, the generalization of the countermeasure system for replay attacks on ASV systems and VAs is analysed using the TEO-based TECC feature set.
The generalization of the CM system is assessed through cross-database evaluation across the Voice Spoofing Detection Corpus (VSDC), ASVspoof 2017 version 2.0, and ASVspoof 2019 PA datasets. Further, an analysis of One-Point Replay (1PR) and Two-Point Replay (2PR) is presented in this thesis.
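For readers unfamiliar with the two energy operators named above, the following is a minimal illustrative sketch, not the implementation used in this thesis. It assumes the commonly used discrete-time definition of the TEO, Ψ[x(n)] = x(n)^2 - x(n-1) x(n+1), and takes the SEO to be the plain squared sample value x(n)^2. The function names, the test tone, and its parameters are hypothetical, and the sketch stops at the operator level; it does not compute TECC or SECC features.

```python
import numpy as np


def teager_energy(x):
    """Discrete-time Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the first
    and last samples have no neighbour on one side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def squared_energy(x):
    """Conventional squared energy per sample (assumed form of the SEO)."""
    x = np.asarray(x, dtype=float)
    return x ** 2


# Hypothetical test tone: amplitude 2.0, digital frequency 0.3 rad/sample.
amplitude, omega = 2.0, 0.3
n = np.arange(200)
tone = amplitude * np.cos(omega * n + 0.5)

psi = teager_energy(tone)
sq = squared_energy(tone)

# For a pure sinusoid the Teager energy is constant and equal to
# A^2 * sin^2(omega), so it reflects both amplitude and frequency,
# whereas the squared energy oscillates and depends on amplitude only.
expected_psi = amplitude ** 2 * np.sin(omega) ** 2
```

On the test tone, `psi` is flat at `expected_psi` while `sq` varies between 0 and `amplitude ** 2`. This dependence on frequency as well as amplitude is the usual reason given for preferring the TEO over squared energy as an energy estimate.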