Genetic Algorithm-Based High-Utility Itemset Mining from Data Streams
Keywords:
High Utility Itemset Mining; Data Stream; Hash Table; Genetic Algorithms; Sliding Window
Abstract
High-utility itemset mining (HUIM) from data streams under limited time and space is a challenging task. Traditional algorithms often scan the data multiple times and rely on complex data structures to link, store, and update information. Furthermore, heuristic algorithms can miss itemsets, and the same itemsets are generated and evaluated repeatedly across data batches; both problems make these algorithms inefficient in time and space. To address these problems, we propose HUIM_DS_GA, a new algorithm based on genetic algorithms for mining high-utility itemsets from data streams that copes effectively with limited storage space. HUIM_DS_GA introduces a new population update strategy, which speeds up convergence and minimizes the loss of important itemsets. In addition, we propose a hash-table storage strategy that avoids evaluating duplicate itemsets, thereby improving execution efficiency. Experiments on real and synthetic datasets show that the algorithm performs effectively and significantly reduces memory consumption while scaling better than previous methods.
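The idea of using a hash table to avoid evaluating duplicate itemsets can be illustrated with a minimal sketch. This is not the HUIM_DS_GA implementation: the names `itemset_utility` and `CachedEvaluator`, the representation of a transaction as an item-to-utility dictionary, and the choice to clear the cache whenever the window slides are all assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumption: a transaction maps each item to its utility in that transaction.
Transaction = Dict[str, int]


def itemset_utility(itemset: Iterable[str], window: List[Transaction]) -> int:
    """Sum the utility of `itemset` over all window transactions containing it."""
    items = list(itemset)
    total = 0
    for tx in window:
        # Only transactions that contain every item of the itemset contribute.
        if all(item in tx for item in items):
            total += sum(tx[item] for item in items)
    return total


class CachedEvaluator:
    """Hypothetical fitness evaluator that memoizes utilities in a hash table."""

    def __init__(self, window: List[Transaction]) -> None:
        self.window = window
        self.cache: Dict[FrozenSet[str], int] = {}
        self.evaluations = 0  # number of real (non-cached) window scans

    def utility(self, itemset: Iterable[str]) -> int:
        key = frozenset(itemset)  # order-independent hash key
        if key not in self.cache:
            self.cache[key] = itemset_utility(key, self.window)
            self.evaluations += 1
        return self.cache[key]

    def slide(self, new_window: List[Transaction]) -> None:
        # Cached utilities refer to the old window, so this sketch discards them.
        self.window = new_window
        self.cache.clear()


if __name__ == "__main__":
    window = [
        {"a": 5, "b": 2},
        {"a": 3, "c": 4},
        {"a": 1, "b": 6, "c": 2},
    ]
    evaluator = CachedEvaluator(window)
    print(evaluator.utility(["a", "b"]))  # scans the window
    print(evaluator.utility(["b", "a"]))  # same itemset, served from the cache
    print(evaluator.evaluations)
```

In a genetic algorithm, different individuals often encode the same itemset, so keying the table on a `frozenset` lets repeated candidates skip the window scan entirely. How the real algorithm updates its table when the sliding window advances is not stated in the abstract; clearing it is simply the most conservative option.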