Himalaya| SourceForge Project| Code Releases| Cornell Research| FAQ
MAFIA › Overview » Itemset Tree » Search Space Pruning » Vertical Bitmaps › Download Code › Contact › Publications › Code Documentation › Presentation Slides
SPAM › Overview › Example › Download Code › Contact › Publications › Code Documentation › Presentation Slides
SECRET › Overview › Download Code › Contact › Publications › Code Documentation

SPAM: Sequential PAttern Mining

SPAM is a new algorithm for finding all frequent sequences within a transactional database. The algorithm is especially efficient when the sequential patterns in the database are very long. A depth-first search strategy is used to generate candidate sequences, and various pruning mechanisms are implemented to reduce the search space.

The transactional data is stored using a vertical bitmap representation, which allows for efficient support counting as well as significant bitmap compression. A salient feature of our algorithm is that it incrementally outputs new frequent itemsets in an online fashion.

In a thorough experimental evaluation using standard benchmark data, we isolate the effects of the individual components of our algorithm. Our performance numbers show that our algorithm outperforms previous work by a factor of 3 to over an order of magnitude.

Illustrative Example

An animated gif that demonstrates the SPAM algorithm can be found here.

Code Download

The SourceForge download page has instructions on downloading the last stable version of the code. You can also download the datasets used for testing.

CVS access is also available:

  1. Browse the source tree hosted at SourceForge: CVS Tree
  2. Type 'cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/himalaya-tools login'
  3. Press Enter when prompted for a password.
  4. Type 'cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/himalaya-tools co Spam'
  5. A source code tree rooted in a directory called Spam will be created.

Contact

Please send email to or contact the authors directly:

Publications

Jay Ayres, Johannes Gehrke, Tomi Yiu, and Jason Flannick.
Sequential PAttern Mining Using Bitmaps.
In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Edmonton, Alberta, Canada, July 2002.