Poster
in
Affinity Workshop: Women in Machine Learning
Natural language processing for automated information extraction of cancer parameters from free-text pathology reports
Okechinyere Achilonu
A cancer pathology report is a valuable medical document that provides information for prognosis, personalised treatment plan and patient management. Developing countries still use the unstructured (free text) reporting format. However, this reporting format has been associated with several limitations arising from variations in the quality of reporting. Manual information extraction and report classification can be intrinsically complex and resource intensive, given a free-text format. Extracting information from these reports into a structured layout is also essential for research, auditing, and cancer incidence reporting. This study aimed to develop and evaluate strategies for extracting relevant information to classify cancer pathology reports and to develop a rule-based function to automatically extract cancer prognostic parameters from these reports and transform them into structured data to uncover the trend of the parameters over the years.We retrieved colorectal and prostate cancer diagnostic cases from the National Health Laboratory Services. TM and ML algorithms were used for data preprocessing, visualisations, feature selections, text classification and performance evaluation. Secondly, we developed a rule-based NLP algorithm that retrieved and extracted important prognostic parameters from the reports to explore their trends.Results showed inconsistencies and incompleteness in reporting each year and throughout the study period. The findings also indicate that the developed rule-based function achieved high accurate annotation for all the parameters extracted, with performance measures ranging from 83% -100%. The trend analysis result showed significant trends comparable to previous studies.In conclusion, we developed reproducible frameworks using NLP and ML algorithms that can form the basis for future studies in South Africa. Our study bridged the gap between data availability and actionable knowledge.