Pattern Discovery Through Reversing Time Flow
One of the growing, critical challenges of big data is to produce actionable intelligence from massive (and still-increasing) datasets in a decreasing amount of time. Current analysis tools enable analysts to examine (and confirm or reject) hypotheses they have formed. These confirmatory methods and tools are a necessary part of data analysis; however, they rely on a trial-and-error approach that consumes large amounts of time.
Previous analysis methods were human-centric and, as such, allowed the extraordinary decision capabilities of the mind to be leveraged in analyzing the pertinent, hard-to-come-by intelligence data. With the massive collections of data that occur every day, indeed every hour, in a region of interest, the human mind can only leverage its power on an infinitesimal portion of the collected data. Moreover, it is now an extremely complex challenge to know where the pertinent data are buried in the massive amounts of collected data.
A need exists for an automated, exploratory, data-centric mode of analysis capable of discovering patterns, creating metadata, or simply generating a more concentrated grouping of data. Added to the manual, confirmatory, human-centric mode, it would facilitate analysis of the vast majority of the data collected.
This invention, developed by researchers at the Johns Hopkins Applied Physics Laboratory (JHU/APL), provides systems and methods in the form of a software tool for automatically mining massive intelligence databases to discover sequential patterns therein using a novel combination of forward and reverse temporal processing techniques as an enhancement to well-known pattern discovery algorithms.
Rule induction algorithms constitute a well-known class of pattern discovery algorithms that can be used to facilitate automated discovery of patterns or associations within structured datasets. The patterns comprise associations of database elements that repeat throughout an examined time-space. One type of rule induction algorithm is known as Sequential Rule Induction (SRI), which discovers repetitive sequential patterns (RSPs). In general, SRI discovers RSPs by first amassing candidate patterns and subsequently pruning candidate patterns that do not pass one or more statistical thresholds set by the user. This invention enhances these well-known pattern discovery algorithms by incorporating forward and reverse temporal processing in a fully automated capability for efficiently discovering a subset of the RSPs hidden in typically large datasets.
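The candidate-then-prune idea can be sketched in Python as follows. This is a minimal illustration, not the patented algorithm: it assumes events are (time, label) pairs, considers only two-element patterns, and uses support and confidence as the user-set thresholds. The function names, the `window` parameter, and the sample data are all hypothetical.

```python
from collections import Counter

def candidate_rules(events, window):
    """Amass candidates: count every ordered pair (a -> b) where
    b occurs after a and within `window` time units (assumed rule form)."""
    events = sorted(events)  # (time, label) tuples, ordered by time
    pair_counts = Counter()
    label_counts = Counter(label for _, label in events)
    for i, (t_a, a) in enumerate(events):
        for t_b, b in events[i + 1:]:
            if t_b - t_a > window:
                break  # later events are even further away
            if t_b > t_a and a != b:
                pair_counts[(a, b)] += 1
    return pair_counts, label_counts

def prune(pair_counts, label_counts, min_support, min_confidence):
    """Prune candidates that fail either user-set threshold."""
    kept = {}
    for (a, b), n in pair_counts.items():
        confidence = n / label_counts[a]  # share of a's followed by b
        if n >= min_support and confidence >= min_confidence:
            kept[(a, b)] = (n, confidence)
    return kept

# Hypothetical sample data: A is usually followed by B one time unit later.
events = [(1, "A"), (2, "B"), (5, "A"), (6, "B"),
          (9, "A"), (10, "C"), (13, "A"), (14, "B")]
pairs, labels = candidate_rules(events, window=2)
rules = prune(pairs, labels, min_support=3, min_confidence=0.7)
print(rules)
```

With this sample data, the candidate A -> C occurs only once and is pruned, while A -> B survives both thresholds.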
The software tool uses sequential rule induction as its underlying algorithm, together with reverse and forward temporal processing techniques, to mine massive databases and discover sequential patterns exhibited by the database elements. In a first processing segment, the software tool operates on the input data to output (discover) a candidate set of patterns. In a second processing segment, statistical thresholds are used to prune the candidate set of patterns output from the first processing segment, generating the final set of patterns. Using the final set of patterns, the final processing segment reaches into the original input dataset(s) and extracts the actual data elements that comprise each of the patterns in the final set. These extracted data elements are then post-processed and arranged in the order described by the patterns. This output can be input to a visualization tool to display the pattern elements. Alternatively, the visualization can be a simple tabular listing of the patterns.
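The three processing segments might be sketched in Python as below. The text does not say how the forward and reverse passes are combined, so this sketch makes an assumption: a pattern is kept only if it passes a confidence threshold in both time directions. It is limited to two-element patterns, and the record fields (`id`, `time`, `label`), function names, and sample data are hypothetical.

```python
from collections import Counter

def count_pairs(records, window, reverse=False):
    """Segment 1: candidate patterns, scanning in one time direction."""
    sign = -1 if reverse else 1
    ordered = sorted(records, key=lambda r: sign * r["time"])
    counts = Counter()
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            gap = sign * (second["time"] - first["time"])
            if gap > window:
                break
            if gap > 0 and first["label"] != second["label"]:
                counts[(first["label"], second["label"])] += 1
    return counts

def prune(forward, backward, label_counts, min_support, min_confidence):
    """Segment 2: keep (a, b) only if it passes in both directions
    (an assumed combination rule, not taken from the source)."""
    final = []
    for (a, b), n in forward.items():
        m = backward.get((b, a), 0)  # same pattern seen in reversed time
        if (n >= min_support
                and n / label_counts[a] >= min_confidence
                and m / label_counts[b] >= min_confidence):
            final.append((a, b))
    return sorted(final)

def extract(records, patterns, window):
    """Segment 3: pull the actual records behind each final pattern,
    arranged in the order the pattern describes."""
    ordered = sorted(records, key=lambda r: r["time"])
    out = {}
    for a, b in patterns:
        hits = []
        for i, first in enumerate(ordered):
            if first["label"] != a:
                continue
            for second in ordered[i + 1:]:
                gap = second["time"] - first["time"]
                if gap > window:
                    break
                if gap > 0 and second["label"] == b:
                    hits.append((first["id"], second["id"]))
        out[(a, b)] = hits
    return out

# Hypothetical input dataset.
raw = [(1, "A"), (2, "B"), (5, "A"), (6, "B"),
       (9, "A"), (10, "C"), (13, "A"), (14, "B")]
records = [{"id": k, "time": t, "label": l} for k, (t, l) in enumerate(raw)]
label_counts = Counter(r["label"] for r in records)

forward = count_pairs(records, window=2)
backward = count_pairs(records, window=2, reverse=True)
final = prune(forward, backward, label_counts, min_support=2, min_confidence=0.7)
extracted = extract(records, final, window=2)

# Simple tabular listing of the patterns and their supporting record ids.
for (a, b), hits in extracted.items():
    print(f"{a} -> {b}: {hits}")
```

In this sketch the forward confidence asks how often A is followed by B, while the reverse confidence asks how often B is preceded by A; requiring both is one plausible way the two directions could complement each other.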
This invention provides capabilities for mining massive datasets in an automated fashion and in a timely manner to discover sequential patterns buried within the data. Experimental results have indicated typical processing times on commercial off-the-shelf platforms range from several minutes for large regular datasets, to one or two hours when handling large dense datasets.
We are seeking a commercial transition licensee, either a start-up or an existing company, that focuses on non-US Government customers and can successfully commercialize this IP.
Patent Status: U.S. patent 7,945,572 issued.
Mr. E. Chalfin
Phone: (443) 778-7473