Publication

Automatic Labeling for Entity Extraction in Cyber Security

by Robert A Bridges, Corinne L Jones, Michael Iannacone, Kelly M Huffer, John R Goodall
Publication Type
Conference Paper
Publication Date
Conference Name
2014 ASE International Conference on Cyber Security
Conference Location
Stanford, California, United States of America
Conference Date

Timely analysis of cyber-security information necessitates automated information extraction from unstructured text.
While state-of-the-art extraction methods produce extremely accurate results, they require ample training data, which is generally unavailable for specialized applications, such as detecting security-related entities; moreover, manual annotation of corpora is very costly and often not a viable solution.
In response, we develop a very precise method to automatically label text from several data sources by leveraging related, domain-specific, structured data and provide public access to a corpus annotated with cyber-security entities.
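As a rough illustration of labeling text from related structured data, the sketch below builds a lookup table from structured records and tags matching token spans in free text. It is not the authors' method: the record fields (`vendor`, `product`), the BIO tag scheme, the greedy longest-match rule, and the function names `build_gazetteer` and `auto_label` are all assumptions made for this example.

```python
def build_gazetteer(records):
    """Map each lower-cased token sequence to an entity label.

    `records` is a list of dicts such as {"vendor": "Acme"}; the field
    names are hypothetical and stand in for whatever structured source
    is available.
    """
    gazetteer = {}
    for record in records:
        for label, value in record.items():
            gazetteer[tuple(value.lower().split())] = label
    return gazetteer


def auto_label(tokens, gazetteer):
    """Tag tokens with BIO labels by greedy longest match; others get 'O'."""
    labels = ["O"] * len(tokens)
    max_len = max((len(key) for key in gazetteer), default=0)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shrink.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(lowered[i:i + n]))
            if label is not None:
                labels[i] = "B-" + label
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + label
                i += n
                break
        else:
            # No span starting at position i matched.
            i += 1
    return labels
```

Exact string matching of this kind favors precision over recall: a span is labeled only when it appears verbatim in the structured source.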
Next, we implement a Maximum Entropy Model trained with the averaged perceptron on a portion of our corpus (~750,000 words) and achieve near-perfect precision, recall, and accuracy, with training times under 17 seconds.
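For readers unfamiliar with perceptron training, the following is a minimal sketch of a multiclass averaged perceptron over sparse string features. It is an assumption-laden illustration, not the authors' implementation: the class name `AveragedPerceptron`, the feature strings, the epoch count, and the lazy-averaging bookkeeping are choices made for this example.

```python
from collections import defaultdict


class AveragedPerceptron:
    """Multiclass perceptron whose final weights are averaged over all steps."""

    def __init__(self, classes):
        self.classes = sorted(classes)
        self.weights = defaultdict(float)   # (feature, label) -> weight
        self._totals = defaultdict(float)   # accumulated weight over time
        self._stamps = defaultdict(int)     # step of last update per key
        self._step = 0

    def predict(self, features):
        """Return the label with the highest summed feature weight."""
        scores = {c: 0.0 for c in self.classes}
        for f in features:
            for c in self.classes:
                scores[c] += self.weights.get((f, c), 0.0)
        # Ties resolve to the first label in sorted order.
        return max(self.classes, key=lambda c: scores[c])

    def _update(self, key, delta):
        # Credit the old weight for the steps it was in effect, then change it.
        self._totals[key] += (self._step - self._stamps[key]) * self.weights[key]
        self._stamps[key] = self._step
        self.weights[key] += delta

    def train(self, examples, epochs=5):
        """`examples` is a list of (feature_list, gold_label) pairs."""
        for _ in range(epochs):
            for features, gold in examples:
                self._step += 1
                guess = self.predict(features)
                if guess != gold:
                    for f in features:
                        self._update((f, gold), 1.0)
                        self._update((f, guess), -1.0)
        # Replace each weight with its average over all training steps.
        for key, w in list(self.weights.items()):
            total = self._totals[key] + (self._step - self._stamps[key]) * w
            self.weights[key] = total / self._step
```

Averaging the weights over all training steps, rather than keeping only the final values, tends to reduce sensitivity to the order of the last few examples. Updates touch only the features of misclassified examples, which keeps each training step cheap.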