More Data, Better Accuracy? An Empirical Study on Few-shot Speech-based Dementia Detection (SoCal 2022)

Youxiang Zhu and Xiaohui Liang

Speech-based dementia detection suffers from two problems: small datasets and weak correlation between data and labels. Existing work still evaluates on this small data with cross-validation or a fixed training/testing split, which may overestimate performance. We propose a new evaluation protocol under the few-shot learning setting. Our evaluation results show that training on more data does not necessarily yield higher accuracy for a dementia detection model, partly because of the weak data-label correlation. Moreover, we observed that pre-training may matter more than additional downstream data for improving dementia detection performance.

Accepted by the Southern California Natural Language Processing Symposium (SoCal), 2022.
