Mining frequent itemsets from streaming data using a genetic algorithm

Đức Thành Phạm; Nguyện Lê Thị Minh

Authors

Đức Thành Phạm, Faculty of Information Technology
Nguyện Lê Thị Minh

Abstract

This paper studies the mining of frequent itemsets from streaming transaction data under concept drift. Streaming data is inherently unstable, which poses numerous challenges for the mining process. A method based on a genetic algorithm is proposed, and the relationship between concept drift, sliding window size, and the genetic algorithm's constraints is explored. Concept drift is identified through changes in the frequent itemsets. The novelty of the study lies in detecting concept drift from frequent itemsets while mining streaming data within a genetic algorithm framework. An equation is presented for computing the minimum support count in streaming data over a sliding window. Experiments indicate that the ratio between the window size and the number of transactions per drift is a critical factor for good performance. Satisfactory results are difficult to achieve with very small windows, because normal fluctuations in the data can appear as concept drift. The window size must be tuned together with the support value and confidence level to achieve reasonable outcomes. The proposed approach to concept drift detection performs well with larger window sizes.
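
The idea of flagging drift from changes in the frequent itemsets can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not reproduce the paper's equation or its genetic algorithm, so the sketch makes several assumptions. The minimum support count is taken to be the ceiling of a relative support ratio times the window size, the genetic search is replaced by exhaustive counting of small itemsets, the window advances by its full size, and drift is signalled when the Jaccard distance between consecutive windows' frequent itemsets exceeds a threshold. All function names and the `drift_threshold` parameter are hypothetical.

```python
from itertools import combinations
from math import ceil


def min_support_count(window_size, min_support_ratio):
    # Assumed rule (not the paper's equation): the smallest count that
    # meets a relative support threshold within one window.
    return ceil(min_support_ratio * window_size)


def frequent_itemsets(window, min_count, max_size=2):
    # Exhaustive counting of itemsets up to max_size; a stand-in for
    # the genetic algorithm search described in the abstract.
    counts = {}
    for transaction in window:
        items = sorted(set(transaction))
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] = counts.get(itemset, 0) + 1
    return {itemset for itemset, n in counts.items() if n >= min_count}


def jaccard_distance(a, b):
    # 0.0 for identical sets, 1.0 for disjoint sets.
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def detect_drift(stream, window_size, min_support_ratio, drift_threshold=0.5):
    # Returns the start index of every window whose frequent itemsets
    # differ from the previous window's by more than drift_threshold.
    min_count = min_support_count(window_size, min_support_ratio)
    drifts = []
    previous = None
    # Assumption: the window moves by its full size (no overlap).
    for start in range(0, len(stream) - window_size + 1, window_size):
        window = stream[start:start + window_size]
        current = frequent_itemsets(window, min_count)
        if previous is not None and jaccard_distance(previous, current) > drift_threshold:
            drifts.append(start)
        previous = current
    return drifts
```

With a very small `window_size`, a few unusual transactions are enough to change the frequent itemsets between windows, which is one way to picture why normal fluctuations can be mistaken for drift.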
