
SPAM Code Documentation




Linux Compilation

  1. automake
  2. ./configure
  3. make
  4. make install

Windows Compilation

  1. Open Spam.sln in Visual Studio .NET
  2. Build the program

Directory Structure

admin/ Contains config files for compiling. Should not be altered.
datasets/ Contains several datasets for testing
src/ Contains all of the source code for SPAM
Vertical representations of the data. They handle both uncompressed and compressed data.

Each bitmap can represent customers with up to x transactions since it allocates exactly x bits per customer

src/DatasetInfo.h A class for representing the info gathered from the dataset
src/FileInput.cpp The functions necessary for reading in datasets.
src/ResizableArray.h A class containing a resizable array data structure
A representation of a sequence (or an item)
src/Spam.cpp Main file with most of the algorithmic code
Collects statistics about SPAM as it is run
src/StringMap.h A class containing a data structure that allows for two way mapping between ints and strings
src/Tables.h For gathering the lookup tables in one place
src/TreeNode.h Class for representing nodes in the search tree
INSTALL Generic installation instructions
spam.{kdevprj,kdevses} KDevelop project files for Linux
Spam.{sln,vcproj} Visual Studio .NET project files for Windows
README Pointer to this page
test/ Contains Perl scripts and executables to test SPAM
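
The bitmap note in the listing above (exactly x bits per customer) can be pictured with a small sketch. This is not the SPAM bitmap code: the class name FixedWidthBitmap and its member functions are hypothetical, compression is ignored, and the support count assumes that a customer counts once if any bit in that customer's section is set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration only; not a class from the SPAM sources.
// Each customer owns a fixed-size section of bitsPerCustomer bits,
// one bit per transaction.
class FixedWidthBitmap {
public:
    FixedWidthBitmap(std::size_t numCustomers, std::size_t bitsPerCustomer)
        : bitsPerCustomer_(bitsPerCustomer),
          bits_(numCustomers * bitsPerCustomer, false) {}

    // Record that the item occurs in transaction txn of customer cust.
    void set(std::size_t cust, std::size_t txn) {
        assert(txn < bitsPerCustomer_);  // at most x transactions per customer
        bits_[cust * bitsPerCustomer_ + txn] = true;
    }

    bool test(std::size_t cust, std::size_t txn) const {
        assert(txn < bitsPerCustomer_);
        return bits_[cust * bitsPerCustomer_ + txn];
    }

    // Assumption: support is the number of customers whose section
    // contains at least one set bit.
    std::size_t supportCount() const {
        std::size_t count = 0;
        for (std::size_t start = 0; start < bits_.size(); start += bitsPerCustomer_) {
            for (std::size_t i = 0; i < bitsPerCustomer_; ++i) {
                if (bits_[start + i]) { ++count; break; }
            }
        }
        return count;
    }

private:
    std::size_t bitsPerCustomer_;
    std::vector<bool> bits_;
};
```

With 3 customers and 4 bits each, setting bits for customers 0 and 2 gives a support count of 2, because customer 1 has no set bit.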

Program Usage

Usage: spam -sup <minSup> [-fn <infile>] [-stdin] [-ascii] [-str] [-outFile <outfile>] [-stdout]

  minSup  - The minimum support (between 0.0 and 1.0)
  infile  - The data file to read in (see below for specifications)
  stdin   - Use this flag if the data should be read in from stdin. Must use if -fn is not specified. Overrides any file specified via -fn
  ascii   - Use this flag if your input is ASCII text; otherwise SPAM assumes it is in a binary file format
  str     - Use this flag if your input is a list of strings representing customers, transactions, and items (see documentation for full file-format description)
  outfile - The file to place the output in. If -outFile and -stdout are not present, no output will be produced
  stdout  - Use this flag if you want the output to go to stdout. Overrides any file specified via -outFile
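
The precedence rules among these flags can be summarized in a short sketch. This is not SPAM's own argument handling: SpamOptions and parseOptions are hypothetical names. Treating the 0.0 and 1.0 bounds as inclusive is an assumption. The binary-over-stdin check is also an assumption, drawn from the release 1.3 note later on this page.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for the documented flags; not from the SPAM sources.
struct SpamOptions {
    double minSup = -1.0;
    std::string inFile;
    std::string outFile;
    bool useStdin = false;
    bool ascii = false;
    bool str = false;
    bool useStdout = false;
};

// args holds the command line without the program name.
SpamOptions parseOptions(const std::vector<std::string>& args) {
    SpamOptions opt;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& flag = args[i];
        // Fetch the value that must follow flags such as -sup or -fn.
        auto value = [&]() -> const std::string& {
            if (i + 1 >= args.size())
                throw std::invalid_argument(flag + " needs a value");
            return args[++i];
        };
        if (flag == "-sup") opt.minSup = std::stod(value());
        else if (flag == "-fn") opt.inFile = value();
        else if (flag == "-outFile") opt.outFile = value();
        else if (flag == "-stdin") opt.useStdin = true;
        else if (flag == "-ascii") opt.ascii = true;
        else if (flag == "-str") opt.str = true;
        else if (flag == "-stdout") opt.useStdout = true;
        else throw std::invalid_argument("unknown flag: " + flag);
    }
    // Assumption: the bounds 0.0 and 1.0 are themselves accepted.
    if (opt.minSup < 0.0 || opt.minSup > 1.0)
        throw std::invalid_argument("-sup must be between 0.0 and 1.0");
    // -stdin must be used if -fn is not specified.
    if (!opt.useStdin && opt.inFile.empty())
        throw std::invalid_argument("specify -fn <infile> or -stdin");
    // Binary input is not supported on stdin.
    if (opt.useStdin && !opt.ascii && !opt.str)
        throw std::invalid_argument("-stdin requires -ascii or -str");
    if (opt.useStdin) opt.inFile.clear();    // -stdin overrides -fn
    if (opt.useStdout) opt.outFile.clear();  // -stdout overrides -outFile
    return opt;
}
```

If neither -outFile nor -stdout is given, the sketch leaves both output fields empty, matching the documented behavior that no output is produced.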

There are three input data formats:

  1. ASCII numbers (use -ascii)
     This data format is ASCII text with each line containing the customer ID, the transaction ID, and the item ID separated by spaces. The data must be sorted in ascending order first by customer ID, then by transaction ID, then by item ID. Note that SPAM has the limitation that transactions can contain no more than 64 items.

  2. ASCII strings (use -str)
     This data format is also ASCII text, but each customer, transaction, and item is an actual string instead of a number. Since the strings may contain spaces, the customer, transaction, and item must be separated by newline characters instead of by spaces as in format #1. This option should not be used with extremely large files because input and output are slow compared to format #1.

  3. Binary file (don't use -ascii or -str)
     This data format is present to support AssocGen-generated data files. See the Perl scripts in the test directory for information on how to generate binary files.

New in release 1.3: -stdin and -stdout. Input can now be read from standard input, and output can be written to standard output. Note that SPAM does not support reading binary files from stdin.

Program Testing

All datasets used by SPAM were generated by the IBM AssocGen synthetic data generator. Several sample datasets are included in the datasets directory, and the AssocGen executable can be used along with the Perl scripts in the test directory to generate custom datasets with varying parameters. See the Perl scripts for instructions on how to use them.
Generated on Thu Mar 11 12:01:51 2004 for SPAM by doxygen 1.3.4