5 Mining closed high-utility frequent sequential patterns based on a genetic algorithm
Abstract
Mining Frequent Closed High-Utility Sequential Patterns (FCHUSPs) is an important problem in data mining, with practical applications in customer behavior analysis, supply chain optimization, and marketing. However, the mining process faces a very large search space and high computational cost, especially when the input thresholds are low or the dataset is large. In this paper, we propose a method for mining FCHUSPs with a Genetic Algorithm (GA) in which each individual is represented as a bit array, which reduces memory usage and simplifies the genetic operations. To improve efficiency, fitness evaluation is performed in parallel using PySpark MapReduce, and a hash table is used to verify the closedness of the patterns. The proposed algorithm, called PFCloHUS_QUANTITY_GA_SS, has been implemented and evaluated on several experimental datasets. The results show that the proposed method significantly reduces execution time compared to traditional approaches while preserving accuracy in mining frequent closed high-utility sequential patterns.
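To make the two data structures named in the abstract concrete, the following is a minimal sketch of a bit-array individual and a hash-table closedness check. It is an illustration under stated assumptions, not the PFCloHUS_QUANTITY_GA_SS implementation: the abstract does not specify the encoding, so the item alphabet `ITEMS` and the functions `encode`, `decode`, `crossover`, `mutate`, and `closed_patterns` are all hypothetical. For brevity the sketch treats a pattern as an unordered set of items, whereas a sequential pattern is ordered. Supports are assumed to be already computed, and the parallel fitness evaluation is omitted.

```python
from collections import defaultdict

# Hypothetical item alphabet; bit i of an individual means ITEMS[i] is present.
ITEMS = ["a", "b", "c", "d"]


def encode(pattern):
    """Encode a collection of items as an integer used as a bit array."""
    mask = 0
    for item in pattern:
        mask |= 1 << ITEMS.index(item)
    return mask


def decode(mask):
    """Return the items whose bits are set in the bit array."""
    return [item for i, item in enumerate(ITEMS) if (mask >> i) & 1]


def crossover(p1, p2, point):
    """Single-point crossover: the low `point` bits come from p1, the rest from p2."""
    low = (1 << point) - 1
    return (p1 & low) | (p2 & ~low)


def mutate(mask, bit):
    """Flip one bit of the individual."""
    return mask ^ (1 << bit)


def closed_patterns(supports):
    """Keep patterns that have no proper superset with the same support.

    `supports` maps a bit-array pattern to its support. The hash table groups
    patterns by support, so each pattern is compared only within its bucket.
    """
    buckets = defaultdict(list)
    for mask, sup in supports.items():
        buckets[sup].append(mask)
    closed = []
    for mask, sup in supports.items():
        subsumed = any(
            other != mask and (other & mask) == mask for other in buckets[sup]
        )
        if not subsumed:
            closed.append(mask)
    return closed
```

With integers as bit arrays, crossover and mutation reduce to a few bitwise operations, and the subset test used for closedness is a single AND and comparison. Keying the hash table by support is one common way to limit the number of comparisons; the paper's own key may differ.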
