Publication

Semi-Supervised Information Extraction for Cancer Pathology Reports...

Publication Type
Conference Paper
Journal Name
IEEE-EMBS International Conference on Biomedical and Health Informatics 2019
Book Title
2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI)
Publication Date
Page Numbers
1 to 4
Issue
1
Conference Name
IEEE EMBS International Conference on Biomedical & Health Informatics (IEEE-EMBS BHI 2019)
Conference Location
Chicago, Illinois, United States of America
Conference Sponsor
IEEE
Conference Date
-

Pathology reports are a primary source of data for cancer surveillance programs. Manual coding of pathology reports is labor-intensive, but it is necessary to obtain the labeled data used to train automated information extraction systems. In this study, we investigated semi-supervised deep learning as a way to improve the performance of a multitask information extraction system for the automated annotation of pathology reports. We used a set of over 374,000 pathology reports from the Louisiana Tumor Registry and a novel convolutional attention-based auto-encoder. We performed a set of experiments in which supervised training was augmented with unlabeled data at 1%, 5%, 10%, and 50% of the original data size. We also evaluated the impact of extending text processing to include unlabeled tokens. We found that semi-supervised training consistently improved individual task performance, increasing micro-averaged F-scores by 0.012 to 0.064 and macro-averaged F-scores by up to 0.158. These results demonstrate that semantic information learned through unsupervised learning can be used to improve performance on supervised clinical tasks.
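The abstract reports gains under two averaging schemes, and the macro-averaged gain (up to 0.158) is larger than the micro-averaged one (at most 0.064). As a general property of the metrics, not a claim about the paper's per-class results, micro-averaging pools true positives, false positives, and false negatives across all classes, so frequent classes dominate. Macro-averaging takes the unweighted mean of per-class F1, so improvements on rare classes count as much as improvements on common ones. The minimal sketch below illustrates this. The function name `f1_scores` and the toy labels `"a"` (common class) and `"b"` (rare class) are hypothetical and are not taken from the paper.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions.

    Hypothetical helper for illustration; classes are those appearing in
    either y_true or y_pred.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    # Micro-average: pool the counts over all classes, then compute one F1.
    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    micro_denom = 2 * total_tp + total_fp + total_fn
    micro = 2 * total_tp / micro_denom if micro_denom else 0.0

    # Macro-average: compute F1 per class, then take the unweighted mean.
    per_class = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        per_class.append(2 * tp[c] / denom if denom else 0.0)
    macro = sum(per_class) / len(per_class)
    return micro, macro


# Toy data (hypothetical): 8 examples of a common class, 2 of a rare class.
y_true = ["a"] * 8 + ["b"] * 2

# Model A predicts the common class for everything.
pred_a = ["a"] * 10
# Model B makes one error on the common class but recovers the rare class.
pred_b = ["a"] * 7 + ["b"] + ["b", "b"]

for name, pred in [("A", pred_a), ("B", pred_b)]:
    micro, macro = f1_scores(y_true, pred)
    print(f"model {name}: micro={micro:.3f} macro={macro:.3f}")
# model A: micro=0.800 macro=0.444
# model B: micro=0.900 macro=0.867
```

In this toy case, recovering the rare class raises micro-F1 by 0.100 but macro-F1 by about 0.422. This shows how a change that mostly helps infrequent classes can move the macro average much more than the micro average.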