Genetic-Algorithm-Based Mining of High-Utility Itemsets from Data Streams

Authors

  • Nguyện Lê, Trường ĐH Ngoại ngữ - Tin học TP.HCM (HUFLIT)
  • Thành Phạm Đức
  • Duy Trần Anh

Keywords:

High Utility Itemset Mining; Data Stream; Hash Table; Genetic Algorithms; Sliding Window

Abstract

High-utility itemset mining (HUIM) from data streams under limited time and memory is a challenging task. Traditional algorithms often scan the data multiple times and rely on complex data structures to link, store, and update information. In addition, heuristic algorithms can miss itemsets, and consecutive data batches generate duplicate itemsets that are evaluated repeatedly; both make such algorithms inefficient in time and space. To address these problems, we propose HUIM_DS_GA, a new genetic-algorithm-based method for mining high-utility itemsets from data streams that copes effectively with limited storage space. HUIM_DS_GA introduces a new population update strategy that speeds up convergence and minimizes the loss of important itemsets. We also propose a hash-table storage strategy that avoids re-evaluating duplicate itemsets, thereby improving execution efficiency. Experiments on real and synthetic datasets show that the algorithm performs effectively and significantly reduces memory consumption while scaling better than previous methods.
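
The idea of using a hash table to avoid re-evaluating duplicate itemsets can be illustrated with a minimal Python sketch. This is not the HUIM_DS_GA implementation: the class name UtilityCache, the function itemset_utility, the transaction layout (a dict mapping each item to its utility in that transaction), and the policy of clearing the table when the window slides are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumed layout: each transaction maps an item to its utility in that transaction.
Transaction = Dict[str, int]


def itemset_utility(itemset: FrozenSet[str], window: List[Transaction]) -> int:
    """Sum the utility of `itemset` over every transaction that contains all its items."""
    total = 0
    for tx in window:
        if all(item in tx for item in itemset):
            total += sum(tx[item] for item in itemset)
    return total


class UtilityCache:
    """Hypothetical hash-table cache keyed by itemset for the current window."""

    def __init__(self, window: List[Transaction]) -> None:
        self.window = window
        self.table: Dict[FrozenSet[str], int] = {}
        self.evaluations = 0  # counts real scans of the window

    def utility(self, items: Iterable[str]) -> int:
        key = frozenset(items)  # order-independent, hashable key
        if key not in self.table:
            self.table[key] = itemset_utility(key, self.window)
            self.evaluations += 1
        return self.table[key]

    def slide(self, new_window: List[Transaction]) -> None:
        # Assumption: cached utilities are invalid once the window changes.
        self.window = new_window
        self.table.clear()


window = [
    {"a": 5, "b": 2},
    {"a": 3, "c": 4},
    {"a": 1, "b": 6, "c": 2},
]
cache = UtilityCache(window)
first = cache.utility(["a", "b"])   # scans the window
second = cache.utility(["b", "a"])  # same itemset, served from the table
```

In a genetic algorithm, the same candidate itemset can reappear across generations; with a cache like this, each distinct itemset costs only one scan of the window in the sketch above.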

Published

04-03-2025

How to Cite

Lê, N., Phạm Đức, T., & Trần Anh, D. (2025). Khai thác tập mục hữu ích cao từ các luồng dữ liệu dựa trên di truyền. HUFLIT Journal of Science, 9(1), 45. Retrieved from https://hjs.huflit.edu.vn/index.php/hjs/article/view/233

Section

Articles
