SPAM is a new algorithm for finding all frequent sequences within a transactional database. The algorithm is especially efficient when the sequential patterns in the database are very long. A depth-first search strategy is used to generate candidate sequences, and various pruning mechanisms are implemented to reduce the search space.
The transactional data is stored using a vertical bitmap representation, which allows for efficient support counting as well as significant bitmap compression. A salient feature of our algorithm is that it incrementally outputs new frequent itemsets in an online fashion.
In a thorough experimental evaluation using standard benchmark data, we isolate the effects of the individual components of our algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of 3 to over an order of magnitude.
An animated gif that demonstrates the SPAM algorithm can be found here.
The SourceForge download page has instructions on downloading the last stable version of the code. You can also download the datasets used for testing.
CVS access is also available:
Please send email to or contact the authors directly:
Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick.
Sequential
PAttern Mining Using Bitmaps.
In Proceedings of the
Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining.
Edmonton, Alberta, Canada, July 2002.