PARALLELIZATION OF TOP-K HIGH-UTILITY ITEMSET MINING IN REAL-TIME DATA STREAMS USING A GENETIC ALGORITHM
Abstract
High-Utility Itemset Mining (HUIM) in real-time data streams is challenging because the data are unbounded, arrive at high speed, and evolve continuously. Traditional approaches rely on a user-specified minimum utility threshold (minUtil), which is difficult to choose in advance: a poorly chosen value can discard important patterns or incur high computational cost. Top-K HUIM avoids this parameter by returning the K itemsets with the highest utility. This paper proposes a parallelized genetic algorithm (GA) for Top-K HUIM in data streams. The proposed method combines a sliding window model with parallelized fitness evaluation, enabling real-time processing while maintaining high accuracy. Data are represented using bitmap and hash table structures to optimize memory usage and reduce redundant evaluations during the evolutionary process. Experiments on the benchmark datasets Retail, Mushroom, Chess, and Accidents show that the proposed approach significantly improves runtime, memory efficiency, and scalability compared with a sequential GA and existing HUIM methods. These results highlight the potential of parallelized genetic algorithms as an effective solution for large-scale, real-time utility mining tasks.
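The components named above (sliding window, bitmap representation, hash table cache, parallel fitness evaluation, Top-K selection) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The toy stream, the window size, and all function names are hypothetical. The genetic operators (selection, crossover, mutation) are omitted. A thread pool stands in for the parallel evaluator, although a process pool or native code would likely be needed for real speedups on CPU-bound work in CPython.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy stream: each transaction maps item -> utility
# (for example quantity * unit profit). Not taken from the paper's datasets.
STREAM = [
    {"a": 5, "b": 2},
    {"a": 3, "c": 4},
    {"b": 1, "c": 5},
    {"a": 2, "b": 2, "c": 1},
]
WINDOW_SIZE = 3  # assumed window length, in transactions


def build_bitmaps(window):
    """Map each item to an integer bitmap; bit t is set if transaction t contains it."""
    bitmaps = {}
    for t, trans in enumerate(window):
        for item in trans:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << t)
    return bitmaps


def utility(itemset, window, bitmaps):
    """Fitness of a candidate: its total utility over the transactions that contain it."""
    cover = (1 << len(window)) - 1  # start with every transaction in the window
    for item in itemset:
        cover &= bitmaps.get(item, 0)  # bitwise AND keeps only common transactions
    total = 0
    for t, trans in enumerate(window):
        if (cover >> t) & 1:
            total += sum(trans[i] for i in itemset)
    return total


def evaluate_population(population, window, cache, workers=4):
    """Evaluate candidates in parallel, skipping any already stored in the hash table."""
    bitmaps = build_bitmaps(window)
    todo = [c for c in set(population) if c not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda c: utility(c, window, bitmaps), todo)
        for cand, score in zip(todo, scores):
            cache[cand] = score
    return {c: cache[c] for c in population}


def top_k(fitness, k):
    """Return the k candidates with the highest utility (ties broken by item names)."""
    return sorted(fitness.items(), key=lambda kv: (-kv[1], sorted(kv[0])))[:k]


def demo():
    window = deque(maxlen=WINDOW_SIZE)  # oldest transaction drops out automatically
    cache = {}
    for trans in STREAM:
        window.append(trans)
        cache.clear()  # cached utilities are stale once the window slides
    population = [
        frozenset({"a"}),
        frozenset({"a", "c"}),
        frozenset({"b", "c"}),
        frozenset({"a", "c"}),  # duplicate candidate, evaluated only once
    ]
    return top_k(evaluate_population(population, window, cache), 2)


if __name__ == "__main__":
    print(demo())
```

On this toy stream the final window holds the last three transactions, and `demo()` returns the itemsets {a, c} (utility 10) and {b, c} (utility 9). Clearing the cache on every slide is the simplest correct choice for a sketch; an incremental update of cached utilities is an obvious refinement but is not shown here.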
