Youxiang Zhu and Xiaohui Liang
Speech-based dementia detection faces two problems: small datasets and weak data-label correlation. Existing work still relies on cross-validation or a fixed train-test split as the evaluation protocol on these small datasets, which may overestimate performance. We propose a new evaluation protocol under the few-shot learning setting. Our evaluation results show that training on more data does not necessarily yield higher dementia detection accuracy, partly because of the weak data-label correlation. Moreover, we observe that pre-training may be more critical than additional downstream data for improving dementia detection performance.
Accepted by the Southern California Natural Language Processing Symposium (SoCal), 2022.