An effective tool for managing and sharing documents and data is needed to support spent fuel activities.
Oak Ridge National Laboratory's Computational Data Analytics Group has spent more than 12 years creating text analytics systems that quickly discover meaningful information in raw data. These capabilities focus on six key areas, emphasizing high performance over very large sets of raw documents.
Collecting and Extracting: Collecting millions of documents from databases, the Internet, social media, and hard drives; extracting text from hundreds of file formats; and translating this information into multiple languages.
Storing and Indexing: Storing and indexing millions of documents in search servers, distributed file systems (MapReduce), relational databases, and file systems.
Recommending: Filtering the full content of millions of documents to recommend the most valuable and relevant information based on a user’s own information, or user selections, or a user’s interactions with information.
Categorizing: Grouping items based on the full content of documents using supervised and semi-supervised machine learning methods and targeted search lists.
Clustering: Creating a hierarchical group of documents based on similarity using unsupervised learning methods on the full content of each document.
Visualizing: Showing hierarchies, groups, and relationships among documents to help the user quickly understand their value and see new connections.
This work has resulted in eight issued patents (7,072,883; 7,315,858; 7,693,903; 7,805,446; 7,937,389; 8,473,314; 8,825,710; 9,256,649) and one pending patent, several commercial licenses (including Pro2Serve and TextOre), a spin-off company (Global Security Information Analysts LLC, GSIA), an R&D 100 Award, and scores of peer-reviewed research publications.
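To make the clustering capability listed above more concrete, here is a minimal sketch of hierarchical (agglomerative) document clustering. It is an illustration only and not the group's patented method: the Jaccard similarity over word sets, the average-linkage rule, and every function name (`tokens`, `jaccard`, `agglomerate`) are assumptions chosen for brevity.

```python
import re


def tokens(text):
    """Lowercase a document and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, token_sets):
    """Average-linkage similarity between two clusters of document indices."""
    total = sum(jaccard(token_sets[i], token_sets[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerate(docs):
    """Repeatedly merge the two most similar clusters.

    Returns a nested tuple tree whose leaves are document indices,
    or None when no documents are given.
    """
    if not docs:
        return None
    token_sets = [tokens(d) for d in docs]
    # Each cluster is (tree, member_indices); start with one per document.
    clusters = [(i, [i]) for i in range(len(docs))]
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = cluster_similarity(clusters[x][1], clusters[y][1], token_sets)
                if best is None or sim > best[0]:
                    best = (sim, x, y)
        _, x, y = best
        merged = (
            (clusters[x][0], clusters[y][0]),
            clusters[x][1] + clusters[y][1],
        )
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
    return clusters[0][0]


if __name__ == "__main__":
    sample = [
        "spent fuel storage cask",
        "spent fuel storage pool",
        "clean energy innovation metrics",
        "clean energy innovation geography",
    ]
    print(agglomerate(sample))
```

The pairwise search is quadratic in the number of clusters at every step, so this sketch only suits small collections; a system working over millions of documents would need indexing and approximate methods.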
Big data calls for intelligent recommender agents that can enhance a person's situational or domain awareness. Keen awareness of, and ready access to, relevant information provides a critical competitive edge. Unfortunately, there is simply too much data streaming too quickly for a person to manually process, analyze, and act on within a reasonable amount of time. To alleviate this challenge, many people subscribe to relevant Internet information. There are many forms of subscription, the most common being Really Simple Syndication (RSS) feeds, blogs, and even Facebook and Twitter. The concept is simple: when new information is posted to the site, a subscriber sees a list of the new items and can follow a link to read more. This is a useful and successful model for monitoring data, but it has significant drawbacks. In practice, feeds become quite lengthy and contain more information than can practically be read. Furthermore, a significant number of items may be of little interest to the subscriber. Thus, the ability to find new and relevant information proves critical. We have developed a content-based recommender system that addresses both of these problems. Its flexible input makes the system adaptable to industry and government use cases and data sets such as news feeds, resumes, and proposal requests.
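The general idea behind a content-based recommender can be sketched in a few lines: represent the user's own text and each feed item as weighted term vectors, then rank items by similarity to the user's profile. The sketch below assumes TF-IDF weighting and cosine similarity, which are common choices but are not confirmed as the techniques used in the system described here; the function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict of term -> weight) per document."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    doc_freq = Counter()
    for counts in counts_per_doc:
        doc_freq.update(counts.keys())
    # Smoothed IDF keeps the weight positive even for terms in every document.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in counts.items()} for counts in counts_per_doc]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(profile_text, items, top_k=3):
    """Rank items by similarity to the profile; return (item_index, score) pairs."""
    vectors = tfidf_vectors([profile_text] + list(items))
    profile, item_vectors = vectors[0], vectors[1:]
    scored = [(cosine(profile, v), i) for i, v in enumerate(item_vectors)]
    # Highest score first; ties broken by original feed order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, round(score, 3)) for score, i in scored[:top_k]]


if __name__ == "__main__":
    feed = [
        "New cask design for spent nuclear fuel storage",
        "Local football team wins championship",
        "Transportation of nuclear fuel by rail",
    ]
    for index, score in recommend("spent nuclear fuel storage and transportation", feed, top_k=2):
        print(score, feed[index])
```

Because the profile is just text, the same function accepts a resume, a proposal request, or a set of previously selected articles as input, which mirrors the flexible-input point made above.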
Scientific facilities and organizations constantly report and evaluate performance and impact based on publication metrics. A major problem with this process is that many facilities lack an effective method of finding, tracking, or managing their publications. COBRA is a publication discovery and management system designed to automate the discovery of publications while providing a management solution and making the manual steps that remain necessary more efficient. The system is the culmination of several years' worth of work. It builds upon previous methods of publication discovery and adds a complete management system and infrastructure.
While the term ‘innovation ecosystem’ is often used, the concept is rarely quantified. Oak Ridge National Laboratory conducted a ground-breaking application of natural language processing, link analysis, and other computational techniques to transform text and numerical data into metrics on clean energy innovation activity and geography for the U.S. Department of Energy. The project demonstrates that a machine-assisted methodology gives the user a replicable way to rapidly identify, quantify, and characterize clean energy innovation ecosystems.