Frequent Pattern Mining Analytics

Lin, Xika

Etd

Public Deposited

This dissertation focuses on frequent pattern mining over static and streaming data. Frequent patterns are patterns that appear frequently in a data set. Mining them is critical for applications ranging from retail analysis and bioinformatics to financial analysis and web usage mining, because mined patterns often indicate important observations or discoveries hidden in the data. However, frequent pattern mining is known to be computationally intensive, and response times for mining such patterns tend to be unacceptably long. In addition, existing mining systems tend to be black boxes that give no guidance on which precise parameter settings match the analyst’s interest. Their usability is further limited by the lack of support for making sense of the mined patterns. Finally, little progress has been made toward mining frequent patterns and tracking their evolution over streaming data. This dissertation addresses these problems as follows.

First, we propose a novel parameter space model, called PARAS, that enables efficient rule mining by compactly maintaining the final rulesets. The PARAS model is based on the abstraction of stable regions, which form a coarse-granularity ruleset space. Based on new insights into the redundancy relationships among rules, PARAS establishes a surprisingly compact representation of complex redundancy relationships while enabling efficient redundancy resolution at query time.

Second, the widespread restriction to positive rules can miss important insights and lead to misleading results. Discovering both negative and positive rules, each of which can be extremely revealing, is therefore important. Unfortunately, generating negative rules slows down the mining process even further. To tackle this shortcoming, we extend PARAS to incorporate negative rule mining; the resulting model, called PARASc, enables efficient mining of complete rule types, i.e., both positive and negative rules. To further complicate matters, in the context of complete rules, redundancy relationships exist across different rule types. We establish a theoretical foundation of rule redundancy relationships that facilitates redundancy relationship modeling. This redundancy meta-knowledge enables effective and efficient redundancy resolution across the full spectrum of positive and negative rules at query time.

Third, we design a visual frequent pattern exploration framework, called FIRE, that features innovative visual displays and interactions to enable interactive frequent pattern exploration. We propose two linked interactive displays, the parameter space view and the rule space view, which together support enhanced sense-making of rule relationships by users, as confirmed by case studies.

Lastly, we develop a frequent itemset-based community evolution mining system, called eCommunity, that supports efficient mining of community evolution patterns on streaming data. eCommunity is equipped with both computational community extraction and evolution tracking techniques, which allow it not only to discover communities of interest efficiently but also to track their evolution patterns efficiently.

Our comprehensive experimental studies, using both synthetic and real data from different domains, demonstrate the superior performance of our proposed strategies over alternative methods from the literature in both effectiveness and efficiency.

Creator