Mining Frequent Itemsets from Streaming Data Using a Genetic Algorithm
Abstract
This paper studies the mining of frequent itemsets from streaming transaction data under evolving concepts (concept drift). Streaming data is inherently unstable, which poses numerous challenges for the mining process. A method based on a genetic algorithm is proposed, and the relationship between concept drift, sliding window size, and genetic algorithm constraints is explored. Concept drift is identified through changes in the frequent itemsets. The novelty of this study lies in detecting concept drift from the frequent itemsets themselves while mining streaming data within a genetic algorithm framework. An equation is presented for computing the minimum support count in streaming data under a sliding window. Experiments indicate that the ratio between the window size and the number of transactions per drift is a critical factor for achieving good performance. Satisfactory results are difficult to attain with excessively small window sizes, because normal fluctuations in the data can be mistaken for concept drift. The window size must be tuned together with the support value and confidence level to achieve reasonable outcomes. The proposed approach to concept drift detection performs well with larger window sizes.
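To make the interplay between window size, minimum support count, and drift detection concrete, the following is a minimal sketch and not the method of this paper. It makes several assumptions. The minimum support count is taken to be the usual ceiling of the relative support times the window size, which may differ from the equation derived in the paper. Brute-force enumeration of small itemsets stands in for the genetic algorithm search. Drift is flagged when the Jaccard distance between the reference and current sets of frequent itemsets reaches a threshold. All function names and the `drift_threshold` and `max_size` parameters are hypothetical.

```python
import math
from collections import deque
from itertools import combinations


def min_support_count(min_support, window_size):
    """Assumed definition: ceil(min_support * window_size), at least 1.

    The small epsilon guards against float noise such as 0.1 * 30.
    """
    return max(1, math.ceil(min_support * window_size - 1e-9))


def frequent_itemsets(window, min_support, max_size=2):
    """Brute-force stand-in for the genetic search: count every itemset
    of up to max_size items and keep those meeting the support count."""
    threshold = min_support_count(min_support, len(window))
    counts = {}
    for transaction in window:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {itemset for itemset, c in counts.items() if c >= threshold}


def jaccard_distance(a, b):
    """1 - |a & b| / |a | b|, defined as 0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def detect_drift(stream, window_size, min_support, drift_threshold=0.5):
    """Return the stream indices at which drift is flagged."""
    window = deque(maxlen=window_size)  # oldest transaction drops out
    reference = None
    drift_points = []
    for i, transaction in enumerate(stream):
        window.append(transaction)
        if len(window) < window_size:
            continue  # wait until the first window is full
        current = frequent_itemsets(window, min_support)
        if reference is None:
            reference = current
        elif jaccard_distance(reference, current) >= drift_threshold:
            drift_points.append(i)
            reference = current  # adopt the new concept as reference
    return drift_points


if __name__ == "__main__":
    # Synthetic stream: the concept changes after the eighth transaction.
    stream = [["a", "b"]] * 8 + [["c", "d"]] * 8
    print(detect_drift(stream, window_size=4, min_support=0.5,
                       drift_threshold=0.6))
```

In this sketch a smaller `window_size` lowers the support count, so a handful of atypical transactions can change the frequent itemsets and trigger a flag, which illustrates the sensitivity to small windows noted in the abstract.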