Abstract
Effective public health surveillance is important for national security. With novel emerging infectious diseases being reported across different parts of the world, there is a need to build effective bio-surveillance systems that can track, monitor, and report such events in a timely manner. Additionally, there is a need to identify susceptible geographic regions and populations where these diseases may have a significant impact, and to design preemptive strategies to tackle them. With the digitization of health-related information through electronic health records (EHR) and electronic healthcare claim reimbursements (eHCR), there is a tremendous opportunity to exploit these datasets for public health surveillance. In this paper, we present our analysis of the use of eHCR data for bio-surveillance by studying the 2009-2010 H1N1 pandemic flu season. We present a novel approach to extract spatial and temporal patterns of flu incidence across the United States (US) from eHCRs and find that a small but distinct set of break-out patterns governs the flu and asthma incidence rates across the entire country. Further, we observe a distinct temporal lag in the onset of flu when compared to asthma across geographic regions in the US. The patterns extracted from the data collectively indicate how these break-out patterns are coupled, even though flu is an infectious disease whereas asthma is a typical chronic condition. Taken together, our approach demonstrates how mining eHCRs can provide novel insights for tackling public health concerns.