
MAFIA Code Documentation


Contact

Download

Linux Compilation

  1. ./configure
  2. make

Windows Compilation

  1. Use Cygwin and follow the Linux instructions, or
  2. Use a Windows compiler or IDE (such as Visual Studio) to compile the source code.

Directory Structure

admin/ Contains config files for compiling. Should not be altered.
src/ Contains all of the source code for MAFIA
src/Transaction.h, src/Transaction.cpp Class for reading transactions from ASCII datasets
src/ItemsetOutput.h, src/ItemsetOutput.cpp Class for writing itemsets to file output
src/BaseBitmap.h, src/BaseBitmap.cpp Simple bitmap class for name bitmaps
src/Bitmap.h, src/Bitmap.cpp Main bitmap class for transaction bitmaps
src/Mafia.cpp Main class file with most of the MAFIA code
src/Tables.h Stores precomputed lookup tables (not included in documentation due to very large tables)
src/TreeNode.h Class for representing nodes in the search tree
INSTALL Generic installation instructions for Linux/Unix
mafia.{kdevprj,kdevses} KDevelop project files for Linux
README Pointer to this page

Program Usage

    Usage: mafia [-mfi/-fci/-fi] [min sup (percent)]
            [-ascii/-binary] [input filename]
            [output filename (optional)]
    Ex: mafia -mfi .5 -ascii connect4.ascii mfi.txt
    Ex: mafia -mfi .3 -binary chess.binary
    

File Input

Datasets can be in ASCII or binary format. For ASCII files, the file format must be:
    [item_id_1] [item_id_2] ... [item_id_n]
    

Items do not have to be sorted within each transaction. Items are separated by spaces and each transaction should end with a newline, e.g.

    1 4 2
    2 8 9 4
    2 5
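
The ASCII layout above can be read with a few lines of C++. The following is a minimal sketch, not MAFIA's own Transaction class: the names Transaction and readAsciiTransactions are hypothetical, and skipping blank lines is an assumption that the format description does not cover.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One transaction is the list of item ids found on one line (hypothetical type).
using Transaction = std::vector<int>;

// Read whitespace-separated item ids, one transaction per line.
// Assumption: blank lines are skipped rather than stored as empty transactions.
std::vector<Transaction> readAsciiTransactions(std::istream& in) {
    std::vector<Transaction> transactions;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        Transaction t;
        int item;
        while (fields >> item) {
            t.push_back(item);  // items are kept in file order, unsorted
        }
        if (!t.empty()) {
            transactions.push_back(t);
        }
    }
    return transactions;
}
```

Passing a std::ifstream opened on a file such as connect4.ascii to this function would return one vector of item ids per line of the file.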
    

For binary files, the file format must be:

    [custid][transid][number of items][itemid_1][itemid_2] ... [itemid_n]
    

The custid and transid numbers are ignored at this time. Since the file is in binary format, all numbers are read as integers.

Datasets

Download Datasets-ascii.tar.gz or Datasets-binary.tar.gz for the full set of datasets used for testing.

Program Output

The program writes the frequent itemsets in ASCII, one itemset per line, in the following format:
    [item_id_1] [item_id_2] ... [item_id_n] [(support)]
    
Ex:
    28 64 42 60 40 29 52 58 (966)
    46 64 42 3 25 9 5 48 66 56 34 62 7 36 60 40 29 52 58 (962)
    39 36 40 29 52 58 (960)
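
For post-processing, an output line of this form can be split back into its items and support. The sketch below is one possible way to do that; FrequentItemset and parseItemsetLine are hypothetical names, not part of MAFIA's ItemsetOutput class.

```cpp
#include <sstream>
#include <string>
#include <vector>

// One parsed output line (hypothetical type).
struct FrequentItemset {
    std::vector<int> items;
    int support = 0;
};

// Parse a line such as "39 36 40 29 52 58 (960)".
// Returns false if the line has no parenthesised support value.
bool parseItemsetLine(const std::string& line, FrequentItemset& result) {
    const std::string::size_type open = line.rfind('(');
    const std::string::size_type close = line.rfind(')');
    if (open == std::string::npos || close == std::string::npos || close < open) {
        return false;
    }

    // The number between the parentheses is the support.
    std::istringstream support(line.substr(open + 1, close - open - 1));
    if (!(support >> result.support)) {
        return false;
    }

    // Everything before the opening parenthesis is the item list.
    result.items.clear();
    std::istringstream items(line.substr(0, open));
    int item;
    while (items >> item) {
        result.items.push_back(item);
    }
    return true;
}
```

Calling parseItemsetLine on each line of an output file such as mfi.txt yields the item ids in the order they were written, together with the support value from the parentheses.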
    


Generated on Thu Dec 4 15:22:06 2003 for MAFIA by doxygen 1.2.18