Genetic algorithm-based mining of high-utility itemsets from data streams

Nguyện Lê; Thành Phạm Đức; Duy Trần Anh

Authors

Nguyện Lê, Trường ĐH Ngoại ngữ - Tin học TP.HCM
Thành Phạm Đức
Duy Trần Anh

Keywords:

High Utility Itemset Mining; Data Stream; Hash Table; Genetic Algorithms; Sliding Window

Abstract

High-utility itemset mining (HUIM) from data streams under tight time and memory constraints is a challenging task. Traditional algorithms often scan the data many times and rely on complex data structures to link, store, and update information. Furthermore, heuristic algorithms can lose itemsets, and successive data batches generate duplicate itemsets that are evaluated repeatedly; both make such algorithms inefficient in time and space. To address these problems, we propose HUIM_DS_GA, a new genetic-algorithm-based method for mining high-utility itemsets from data streams that copes effectively with limited storage space. HUIM_DS_GA introduces a new population update strategy that speeds up convergence and minimizes the loss of important itemsets. In addition, we propose a hash-table storage strategy that avoids re-evaluating duplicate itemsets, thereby improving execution efficiency. Experiments on real and synthetic datasets show that the algorithm performs effectively and significantly reduces memory consumption while scaling better than previous methods.
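
As a rough illustration of the hash-table idea described in the abstract, the following minimal sketch caches the utility of each itemset so that a duplicate candidate produced by a genetic search is not scored twice within the same window. It is not the HUIM_DS_GA implementation: the class name `UtilityCache`, the dictionary-based transaction format, and the choice to clear the cache whenever the window slides are all assumptions made for this example.

```python
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical toy representation: a transaction maps each item
# to the utility that item contributes in that transaction.
Transaction = Dict[str, int]


class UtilityCache:
    """Caches itemset utilities for the current window (illustrative only)."""

    def __init__(self, window: List[Transaction]) -> None:
        self.window = window
        self.cache: Dict[FrozenSet[str], int] = {}
        self.evaluations = 0  # number of full scans actually performed

    def utility(self, itemset: Iterable[str]) -> int:
        # A frozenset key makes {"a", "b"} and {"b", "a"} the same entry.
        key = frozenset(itemset)
        if key in self.cache:
            return self.cache[key]  # duplicate candidate: no rescan
        self.evaluations += 1
        total = 0
        for transaction in self.window:
            # Only transactions containing every item contribute utility.
            if all(item in transaction for item in key):
                total += sum(transaction[item] for item in key)
        self.cache[key] = total
        return total

    def slide(self, new_window: List[Transaction]) -> None:
        # Assumption: cached values are invalidated when the window moves.
        self.window = new_window
        self.cache.clear()


window = [
    {"a": 5, "b": 2},
    {"a": 3, "c": 4},
    {"a": 1, "b": 6, "c": 2},
]
scorer = UtilityCache(window)
first = scorer.utility({"a", "b"})
second = scorer.utility(["b", "a"])  # same itemset, served from the cache
```

In this sketch the second call returns the cached value without scanning the window again. A real stream miner would likely update cached utilities incrementally rather than discard them on every slide.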
