Bang Tran, Youxiang Zhu, James W. Schwoebel, and Xiaohui Liang
Excessive sleepiness during critical tasks and jobs can lead to adverse outcomes, such as work accidents and car crashes. Detecting and monitoring sleepiness levels can help prevent these events. In this paper, we propose an attention-based sleepiness detection method using HuBERT embeddings and eGeMAPS features of human speech. Specifically, we propose an attention-based convolutional neural network (CNN) model that achieves 82.57% accuracy in sleepiness detection using HuBERT embeddings plus age and gender as inputs. We also show that the embedded attention layers significantly improve detection accuracy across different input configurations. We then explore the attention weights from the attention layers and observe that the long and semantically different responses from the “Picture description,” “Microphone test,” and “Free speech” tasks are more relevant to sleepiness detection when the model is trained with HuBERT only, whereas the short and semantically similar responses from the “Sustained phonation” and “Diadochokinetic” tasks are more relevant when the model is trained with HuBERT plus age and gender. The attention mechanism enables our model to take all responses as one input, which simplifies data pre-processing and identifies the speech responses most relevant to sleepiness detection.
Accepted by IEEE International Conference on Communications (ICC) 2023.
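
The idea of taking all responses as one input and reading relevance from attention weights can be illustrated with a minimal sketch of softmax attention pooling. This is not the architecture described in the paper, which embeds attention layers in a CNN. It is a generic illustration under stated assumptions: each speech response is already reduced to one fixed-length embedding, and the scoring vector `query` stands in for parameters a real model would learn. The names `attention_pool`, `responses`, and `query`, and the toy values, are hypothetical.

```python
import math


def attention_pool(responses, query):
    """Pool per-response embeddings into one vector with softmax attention.

    responses: list of equal-length embedding vectors, one per speech response.
    query: scoring vector of the same length (learned in a real model; fixed here).
    Returns (pooled_vector, weights), where weights sum to 1.
    """
    dim = len(query)
    # Scaled dot-product score for each response.
    scores = [
        sum(r_i * q_i for r_i, q_i in zip(r, query)) / math.sqrt(dim)
        for r in responses
    ]
    # Numerically stable softmax over the responses.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the embeddings, one output value per dimension.
    pooled = [
        sum(w * r[j] for w, r in zip(weights, responses))
        for j in range(dim)
    ]
    return pooled, weights


# Toy 2-dimensional embeddings for three responses (illustrative values only).
responses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]
pooled, weights = attention_pool(responses, query)
```

In this sketch, a larger entry in `weights` marks a response that contributes more to the pooled representation, which is the sense in which attention weights can be inspected to see which responses matter.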