Healthcare Applications and Security Concerns of Speech Processing Systems

Doctoral Dissertation


Conventionally, most research on speech processing focuses on automatic speech recognition (ASR), i.e., transcribing speech to text. However, natural speech carries more than text content: it also conveys information such as emotion and even the speaker’s health status. We can therefore extract this additional information from speech and use it for novel applications. In particular, we can develop speech processing systems for healthcare applications, such as convenient and low-cost diagnostic, screening, or monitoring solutions. In the first part of this thesis, I investigate how to build such systems. Specifically, I explore the use of speech systems for monitoring and early diagnosis of autism spectrum disorders, emotional and behavioral disorders, and major depressive disorder.

On the other hand, with the fast-growing number of users and usage scenarios, the security of speech processing systems (e.g., Amazon Alexa) has become a new concern. Recent work has found that speech processing systems are vulnerable to multiple types of attacks. However, it is still unclear how dangerous these attacks are in realistic settings. Therefore, in the second part of the thesis, I first systematically explore the vulnerabilities of speech processing systems. Then, I conduct a focused study of adversarial attacks on deep neural network-based models, since deep neural networks are becoming the mainstream technique in a variety of speech applications such as speech recognition and speaker identification. Finally, I investigate effective defense strategies for protecting speech processing systems against malicious attacks in realistic settings.

Overall, this thesis aims to address two orthogonal problems in speech processing, and the goal of the research is to broaden the applications and improve the robustness of machine learning-based speech processing systems.


Attribute Name Values
Author Yuan Gong
Contributor Meng Jiang, Committee Member
Contributor Adam Czajka, Committee Member
Contributor Christian Poellabauer, Research Director
Contributor Taeho Jung, Committee Member
Degree Level Doctoral Dissertation
Degree Discipline Computer Science and Engineering
Degree Name Doctor of Philosophy
Defense Date 2020-07-09
Submission Date 2020-07-16
Subject machine learning; security; speech processing; healthcare
Language English

Record Visibility Public

