An Efficient Distributed Model for Mining Sequential Patterns on a Large Sequence Dataset

Minh Thai Tran

Authors

Minh Thai Tran

Abstract

Sequential pattern mining is an active research area because of its many applications, and many efficient mining algorithms have been proposed. As sequence datasets continue to grow, researchers have begun applying distributed processing models to the problem of mining sequential patterns from sequence databases. One algorithm that does so is SPAMC, a sequential pattern mining algorithm based on the MapReduce model on the cloud. However, SPAMC remains limited when mining datasets that have a large number of distinct items. This article proposes a distributed algorithm to address this problem, called DSPDBV: a distributed algorithm for sequential pattern mining on a large sequence dataset that uses dynamic bit vector structures on the MapReduce distributed programming model. In addition, the algorithm uses several techniques to prune redundant candidates early and to reduce memory usage. Experimental results show that DSPDBV is highly efficient and scalable on large sequence datasets. Moreover, DSPDBV is more efficient than SPAMC when handling datasets that have a large number of distinct items.
