PARALLELIZATION OF TOP-K HIGH UTILITY ITEMSET MINING IN REAL-TIME DATA STREAMS USING GENETIC ALGORITHM

Authors

  • Đức Thành Phạm (Faculty of Information Technology)
  • Nguyện Lê Thị Minh
  • Duy Trần Anh
  • Thái Trần Minh

Abstract

High-Utility Itemset Mining (HUIM) in real-time data streams is challenging because the data are unbounded, arrive at high speed, and evolve continuously. Traditional approaches depend on a user-specified minimum utility threshold (minUtil) that is difficult to choose: a threshold set too high discards important patterns, while one set too low drives up computational cost. This paper addresses the problem by proposing a parallelized genetic algorithm (GA) for Top-K HUIM in data streams, which asks for the K highest-utility itemsets instead of a threshold. The proposed method combines a sliding window model with parallelized fitness evaluation, enabling real-time processing while maintaining high accuracy. Data are represented using bitmap and hash table structures to reduce memory usage and avoid redundant evaluations during the evolutionary process. Experiments on the benchmark datasets Retail, Mushroom, Chess, and Accidents show that the proposed approach significantly improves runtime, memory efficiency, and scalability compared with a sequential GA and existing HUIM methods. These results highlight the potential of parallelized genetic algorithms as an effective solution for large-scale, real-time utility mining tasks.
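The abstract does not give implementation details, so the following Python sketch is only an illustration of how the named ingredients might fit together: a sliding window over transactions, per-item bitmaps, a hash table that caches fitness values, and fitness evaluation spread over worker threads. All names (`SlidingWindow`, `utility`, `evaluate_population`, `top_k`) and the toy data are assumptions made for this example, not the authors' code, and the GA operators (selection, crossover, mutation) are omitted.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
import heapq


class SlidingWindow:
    """Keeps the most recent `size` transactions (each a dict item -> utility)."""

    def __init__(self, size):
        self.transactions = deque(maxlen=size)  # oldest entry is dropped automatically

    def add(self, transaction):
        self.transactions.append(dict(transaction))

    def bitmaps(self):
        """Map each item to an int whose bit t is set if transaction t contains it."""
        maps = {}
        for t, tx in enumerate(self.transactions):
            for item in tx:
                maps[item] = maps.get(item, 0) | (1 << t)
        return maps


def utility(itemset, window, maps):
    """Sum the itemset's utility over the window transactions containing all its items."""
    if not itemset:
        return 0
    cover = -1  # all bits set; ANDing narrows it to the supporting transactions
    for item in itemset:
        cover &= maps.get(item, 0)
    total, t = 0, 0
    while cover:
        if cover & 1:
            tx = window.transactions[t]
            total += sum(tx[i] for i in itemset)
        cover >>= 1
        t += 1
    return total


def evaluate_population(population, window, cache, workers=4):
    """Score each candidate itemset, reusing cached values and evaluating the rest concurrently."""
    maps = window.bitmaps()
    pending = [s for s in set(population) if s not in cache]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = pool.map(lambda s: utility(s, window, maps), pending)
        cache.update(zip(pending, scores))
    return [cache[s] for s in population]


def top_k(cache, k):
    """Return the k (itemset, utility) pairs with the highest utility."""
    return heapq.nlargest(k, cache.items(), key=lambda kv: kv[1])


# Toy data (hypothetical): a window of three transactions.
window = SlidingWindow(size=3)
for tx in ({"a": 5, "b": 2}, {"a": 1, "c": 5}, {"a": 3, "b": 6, "c": 1}):
    window.add(tx)

population = [frozenset("a"), frozenset("ab"), frozenset("bc"), frozenset("ac")]
cache = {}  # hash table: itemset -> utility in the current window
scores = evaluate_population(population, window, cache)
best = top_k(cache, 2)
```

Two design points are worth noting. The cache is only valid for one window state, so it would need to be cleared (or updated incrementally) each time the window slides. Threads are used here purely to keep the sketch short; in CPython a process pool or a native extension would likely be needed for real speedups on CPU-bound fitness evaluation.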

Published

15-04-2026

How to Cite

Phạm, Đức T., Lê Thị Minh, N., Trần Anh, D., & Trần Minh, T. (2026). PARALLELIZATION OF TOP-K HIGH UTILITY ITEMSET MINING IN REAL-TIME DATA STREAMS USING GENETIC ALGORITHM. HUFLIT Journal of Science, 10(1), 35. Retrieved from https://hjs.huflit.edu.vn/index.php/hjs/article/view/331

Section

Science and Technology
