admin/ | Contains config files for compiling. Should not be altered. |
datasets/ | Contains several datasets for testing. |
src/ | Contains all of the source code for SPAM. |
src/Bitmap4.h src/Bitmap4.cpp src/Bitmap8.h src/Bitmap8.cpp src/Bitmap16.h src/Bitmap16.cpp src/Bitmap32.h src/Bitmap32.cpp src/Bitmap64.h src/Bitmap64.cpp | Vertical representations of the data. They handle both uncompressed and compressed data. A bitmap of width x (4, 8, 16, 32, or 64) can represent customers with up to x transactions, since it allocates exactly x bits per customer. |
src/DatasetInfo.h | A class for representing the info gathered from the dataset |
src/FileInput.cpp | The functions necessary for reading in datasets. |
src/ResizableArray.h | A class containing a resizable array data structure |
src/SeqBitmap.h src/SeqBitmap.cpp | A representation of a sequence (or an item) |
src/Spam.cpp | Main file with most of the algorithmic code |
src/Stats.h src/Stats.cpp | Collects statistics about SPAM as it is run |
src/StringMap.h | A class containing a data structure that allows for two way mapping between ints and strings |
src/Tables.h | For gathering the lookup tables in one place |
src/TreeNode.h | Class for representing nodes in the search tree |
INSTALL | Generic installation instructions |
spam.{kdevprj,kdevses} | KDevelop project files for Linux |
Spam.{sln,vcproj} | Visual Studio .NET project files for Windows |
README | Pointer to this page |
test | Contains Perl scripts and executables for testing SPAM |
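
The fixed-width layout described for the Bitmap classes in the table above can be illustrated with a small sketch. It is not SPAM's actual Bitmap8 code: the class name FixedWidthBitmap8, its methods, and the choice of counting support as "customers with at least one bit set" are all assumptions made for illustration. It only shows the idea of giving every customer exactly 8 bits, one per transaction.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration, not SPAM's Bitmap8 class: the vertical bitmap
// of a single item, where every customer owns exactly 8 bits and bit k
// stands for that customer's k-th transaction.
class FixedWidthBitmap8 {
public:
    explicit FixedWidthBitmap8(std::size_t numCustomers)
        : bits_(numCustomers, 0) {}

    // Mark the item as present in transaction slot `slot` (0..7) of
    // `customer`. Returns false if the customer or slot is out of range,
    // which mirrors the "up to x transactions" limit of a width-x bitmap.
    bool set(std::size_t customer, unsigned slot) {
        if (customer >= bits_.size() || slot >= 8) return false;
        bits_[customer] |= static_cast<std::uint8_t>(1u << slot);
        return true;
    }

    // True if the item is present in the given transaction slot.
    bool test(std::size_t customer, unsigned slot) const {
        if (customer >= bits_.size() || slot >= 8) return false;
        return ((bits_[customer] >> slot) & 1u) != 0;
    }

    // Assumed definition of support: the number of customers whose
    // 8-bit section has at least one bit set.
    std::size_t supportCount() const {
        std::size_t count = 0;
        for (std::uint8_t section : bits_) {
            if (section != 0) ++count;
        }
        return count;
    }

private:
    std::vector<std::uint8_t> bits_;  // one 8-bit section per customer
};
```

A customer with more than 8 transactions would not fit in this sketch, which is presumably why the source tree provides bitmaps of several widths.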
Usage: spam -sup <minSup> [-fn <infile>] [-stdin] [-ascii] [-str] [-outFile <outfile>] [-stdout]
minSup - The minimum support (between 0.0 and 1.0)
infile - The data file to read in (see below for specifications)
stdin - Use this flag if the data should be read in from stdin. Must be used if -fn is not specified. Overrides any file specified via -fn
ascii - Use this flag if your input is ASCII text; otherwise SPAM assumes it is in a binary file format
str - Use this flag if your input is a list of strings representing customers, transactions, and items (see documentation for full file-format description)
outfile - The file to place the output in. If -outFile and -stdout are not present, no output will be produced
stdout - Use this flag if you want the output to go to stdout. Overrides any file specified via -outFile
There are three input data formats:
1. ASCII numbers (use -ascii): This data format is ASCII text with each line containing the customer ID, the transaction ID, and the item ID, separated by spaces. The data must be sorted in ascending order first by customer ID, then by transaction ID, then by item ID. Note that SPAM has the limitation that a transaction can contain no more than 64 items.
2. ASCII strings (use -str): This data format is also ASCII text, but each customer, transaction, and item is an actual string instead of a number. Since the strings may contain spaces, the customer, transaction, and item must be separated by newline characters instead of by spaces as in format #1. This option should not be used with extremely large files because input and output are slow compared to format #1.
3. Binary file (use neither -ascii nor -str): This data format is present to support AssocGen-generated data files. See the Perl scripts in the test directory for information on how to generate binary files.
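
As a rough illustration of the two text formats above, the following sketch checks a format #1 stream and reads a format #2 stream. Neither function is part of SPAM; the names firstInvalidAsciiLine, StringRecord, and readStringRecords are hypothetical. The sketch also assumes that repeated item IDs within a transaction are tolerated by the sort rule, and that format #2 stores each record as three consecutive lines (customer, transaction, item) with no separator between records.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical checker for format #1 (not part of SPAM). Returns 0 if every
// line holds "custID transID itemID", the lines are sorted ascending by
// customer, then transaction, then item, and no transaction has more than
// 64 items. Otherwise returns the 1-based number of the first bad line.
std::size_t firstInvalidAsciiLine(std::istream& in) {
    long prevCust = 0, prevTrans = 0, prevItem = 0;
    bool havePrev = false;
    std::size_t itemsInTrans = 0;
    std::size_t lineNo = 0;
    std::string line;
    while (std::getline(in, line)) {
        ++lineNo;
        std::istringstream fields(line);
        long cust = 0, trans = 0, item = 0;
        if (!(fields >> cust >> trans >> item)) return lineNo;  // malformed
        const bool sameTrans =
            havePrev && cust == prevCust && trans == prevTrans;
        if (havePrev) {
            if (cust < prevCust) return lineNo;
            if (cust == prevCust && trans < prevTrans) return lineNo;
            if (sameTrans && item < prevItem) return lineNo;
        }
        itemsInTrans = sameTrans ? itemsInTrans + 1 : 1;
        if (itemsInTrans > 64) return lineNo;  // documented 64-item limit
        prevCust = cust;
        prevTrans = trans;
        prevItem = item;
        havePrev = true;
    }
    return 0;
}

// One record of format #2: three strings that may contain spaces.
struct StringRecord {
    std::string customer;
    std::string transaction;
    std::string item;
};

// Hypothetical reader for format #2, assuming three lines per record.
// A trailing incomplete record is silently dropped.
std::vector<StringRecord> readStringRecords(std::istream& in) {
    std::vector<StringRecord> records;
    StringRecord r;
    while (std::getline(in, r.customer) &&
           std::getline(in, r.transaction) &&
           std::getline(in, r.item)) {
        records.push_back(r);
    }
    return records;
}
```

Running a check like firstInvalidAsciiLine over a file before passing it to spam with -ascii is one way to catch sorting mistakes early; whether SPAM itself reports such errors is not stated here.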
New in release 1.3: -stdin and -stdout. Input can now be read from standard input, and output can be written to standard output. Note that SPAM does not support reading binary files from stdin.

Program Testing
All datasets used by SPAM were generated by the IBM AssocGen synthetic data generator. Several sample datasets are included in the datasets directory, and the AssocGen executable can be used along with the Perl scripts in the test directory to generate custom datasets with varying parameters. Please view the Perl scripts for instructions on how to use them.
Generated on Thu Mar 11 12:01:51 2004 for SPAM by 1.3.4