Abstract
Low- and middle-income countries face challenges arising from a
lack of data on cause of death (COD), which can limit decisions on population
health and disease management. A verbal autopsy (VA) can provide information
about a COD in areas without robust death registration systems. A VA consists
of structured data, combining numeric and binary features, and unstructured
data in the form of open-ended narrative text. This study assesses the
performance of various machine learning approaches when analyzing both the
structured and unstructured components of the VA report. The algorithms were
trained and tested via cross-validation in three settings: binary features,
text features, and a combination of binary and text features, derived
from VA reports from rural South Africa. The results indicate that
narrative text features contain valuable information for determining COD and
that a combination of binary and text features improves the automated COD
classification task.
Keywords: Diabetes Mellitus, Verbal Autopsy, Cause of Death, Machine
Learning, Natural Language Processing