MINING FREQUENT ITEMSETS FROM DATA STREAMS BASED ON A GENETIC ALGORITHM USING BITWISE OPERATIONS AND PARALLEL PROCESSING
Abstract
In the age of big data, the ability to extract useful insights from streaming data is essential for applications such as real-time analytics, anomaly detection, and decision-making. This paper proposes a novel approach to mining frequent patterns in a data stream that combines a genetic algorithm, bitwise operations, and parallel processing. The proposed algorithm uses a sliding window technique to dynamically retain and update frequent patterns as new data arrives. This keeps the analysis relevant to recent data and addresses the challenges posed by the transient nature of data streams. By incorporating bitwise operations into the genetic algorithm, the method optimizes the representation and manipulation of frequent patterns, reducing computational cost and improving performance. Parallel processing is implemented with Python's ThreadPoolExecutor, which allows multiple segments of the data stream to be processed concurrently. This not only speeds up the algorithm but also supports scalability and compatibility with high-throughput data environments. Experimental results demonstrate that the proposed method outperforms traditional frequent pattern mining techniques in both speed and accuracy, especially in scenarios involving large, continuous data streams. The paper also provides a detailed discussion of the design and implementation of the genetic algorithm, the bitwise operations, and the parallel processing framework, together with an extensive performance analysis showing how efficient the solution is in real-world data stream scenarios.
The proposed methodology offers a new, widely applicable solution for real-time mining of continuous data streams.
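As a rough illustration of how the three ingredients named in the abstract could fit together, the sketch below encodes each transaction as an integer bitmask, keeps the most recent transactions in a fixed-size sliding window, and counts the support of a candidate itemset over chunks of the window with ThreadPoolExecutor. This is not the paper's implementation: the function names (`encode`, `count_in_chunk`, `support`), the item-to-bit mapping, the window size, and the sample stream are all assumptions made for the example, and the genetic operators (selection, crossover, mutation) are omitted. The support count shown here is the kind of quantity a genetic algorithm could use as a fitness value.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def encode(items, index):
    """Pack a transaction (an iterable of items) into an integer bitmask."""
    mask = 0
    for item in items:
        mask |= 1 << index[item]
    return mask


def count_in_chunk(chunk, candidate):
    """Count the transactions in chunk that contain every item of candidate."""
    return sum(1 for t in chunk if (t & candidate) == candidate)


def support(window, candidate, workers=4):
    """Support of candidate over the window, counted chunk by chunk in threads."""
    data = list(window)
    size = max(1, -(-len(data) // workers))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_in_chunk, chunks, [candidate] * len(chunks)))


# Hypothetical item-to-bit mapping and a tiny sample stream.
index = {"a": 0, "b": 1, "c": 2, "d": 3}
window = deque(maxlen=4)  # sliding window: the oldest transaction is dropped
stream = [{"a", "b"}, {"a", "b", "c"}, {"b", "d"}, {"a", "b", "d"}, {"a", "c"}]
for transaction in stream:
    window.append(encode(transaction, index))

candidate = encode({"a", "b"}, index)
print(support(window, candidate))  # 2
```

With a window of four, the first transaction has already been evicted when the support is computed, so only two of the remaining transactions contain both `a` and `b`. Note that in CPython the global interpreter lock limits the speedup threads can give to pure-Python bitwise loops, so the sketch shows the structure of the computation rather than its performance.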