An Efficient Distributed Model for Mining Sequential Patterns on a Large Sequence Dataset

Authors

  • Minh Thai Tran

Abstract

Sequential pattern mining is an active research area because of its many applications, and many studies have proposed efficient mining algorithms. As sequence datasets continue to grow, researchers have begun applying distributed processing models to the problem of mining sequential patterns from sequence databases. One such algorithm is SPAMC, a sequential pattern mining algorithm based on the MapReduce model on the cloud. However, SPAMC remains limited when mining datasets that contain a large number of distinct items. This article proposes a distributed algorithm to address this problem, called DSPDBV: a distributed algorithm for sequential pattern mining on large sequence datasets that uses dynamic bit vector structures on the MapReduce distributed programming model. In addition, the algorithm uses several techniques to prune redundant candidates early and to reduce memory usage. Experimental results show that DSPDBV is highly efficient and scalable on large sequence datasets. Moreover, DSPDBV is more efficient than SPAMC on datasets that have a large number of distinct items.

Published

14-03-2022

How to Cite

Tran, M. T. (2022). An Efficient Distributed Model for Mining Sequential Patterns on a Large Sequence Dataset. Tạp Chí Khoa học HUFLIT, 6(1), 99. Retrieved from https://hjs.huflit.edu.vn/index.php/hjs/article/view/96

Section

Articles