
File Input Functions

It contains all the functions necessary for reading in datasets.

Data Structures

struct  CountInfo
class  DatasetInfo
 A class for representing the info gathered from the dataset.


Functions

void PrintFileReadError (int errorType)
 It prints an error related to the data file input.

void IncArraySize (int *&array, int oldSize, int newSize)
 It resizes the array to have an increased size.

bool CollectBinaryInfo (char *filename, int &custCount, int &itemCount, int &transCount, int *&custTransCount, int *&itemCustCount, int *&cids, int *&tids, int *&iids, int *&transLens, int &overallCount, int &transLensLength)
 It collects information about a binary data file.

bool CollectASCIIInfo (char *filename, bool isStringFile, StringMap *&custStrMap, StringMap *&transStrMap, StringMap *&itemStrMap, int &custCount, int &itemCount, int &lineCount, int *&custTransCount, int *&itemCustCount, int *&cids, int *&tids, int *&iids, int &overallCount)
 It collects information about an ASCII data file.

bool ReadBinary (char *filename, int *cids, int *tids, int *iids, int numEntries, int *transLens, int transLensLength, int *custBitmapMap, int **custMap, int *itemMap, SeqBitmap **f1Buff)
 It reads and stores the binary data file.

bool ReadASCII (char *filename, int *cids, int *tids, int *iids, int numEntries, int *custBitmapMap, int **custMap, int *itemMap, SeqBitmap **f1Buff)
 It reads and stores the ASCII data file.

DatasetInfo * ReadDataset (bool isBinaryFile, bool isStringFile, char *filename, double minSupPercent, StringMap *&custStrMap, StringMap *&transStrMap, StringMap *&itemStrMap)
 It reads the input file and finds the frequent-1 itemsets.


Variables

const int NUM_BITMAP = 5
 number of different bitmaps in SeqBitmap

const int BITMAP_LENGTH [5]
 the size of each customer's data in each bitmap

const int NUM_BITMAPS_USED = 4
 number of bitmaps used: tempAndBitmap, specialBitmap, returnBitmap, SBitmap

const int MAX_STRING_SIZE = 256
 Maximum length of strings used to represent customers, transactions, and items when -str is used.


Detailed Description

It contains all the functions necessary for reading in datasets.


Function Documentation

bool CollectASCIIInfo (char *filename,
    bool isStringFile,
    StringMap *&custStrMap,
    StringMap *&transStrMap,
    StringMap *&itemStrMap,
    int &custCount,
    int &itemCount,
    int &lineCount,
    int *&custTransCount,
    int *&itemCustCount,
    int *&cids,
    int *&tids,
    int *&iids,
    int &overallCount)

It collects information about an ASCII data file.

It finds the number of customers, the number of items, and the number of lines in the file. It also finds the number of transactions each customer has and the number of customers having a particular item in their transactions.

Note:

  • Memory should not be allocated for custTransCount and itemCustCount before calling this function.
  • Memory for custTransCount and itemCustCount must be deallocated by the caller.
  • File format: [custID] [transID] [itemID] [custID] [transID] [itemID] ...
  • Assumes that the transactions of a customer appear together in the file, so the following case cannot occur: [custID-1]... [custID-1]... [custID-2]... [custID-1]...
  • Assumes that the items of a transaction appear together in the file, so the following case cannot occur: ... [transID-1] ... ... [transID-2] ... ... [transID-1] ...

Parameters:
filename the filename of the data file
isStringFile true if the input file contains string names, false if it contains integer IDs
custStrMap [output] Maps cust IDs to strings (only used when isStringFile == true)
transStrMap [output] Maps trans IDs to strings (only used when isStringFile == true)
itemStrMap [output] Maps item IDs to strings (only used when isStringFile == true)
custCount [output] the number of customers
itemCount [output] the number of items
lineCount [output] the number of lines in the file
custTransCount [output] number of transactions each customer has
itemCustCount [output] number of customers having each item in their transactions
cids [output] customer ids exactly as they appear in the file
tids [output] transaction ids exactly as they appear in the file
iids [output] item ids exactly as they appear in the file
overallCount [output] length of the cids, tids, and iids arrays
Returns:
true - if the reading is successful. false - if there is an error in the reading process

Definition at line 398 of file FileInput.cpp.

Referenced by ReadDataset().

bool CollectBinaryInfo (char *filename,
    int &custCount,
    int &itemCount,
    int &transCount,
    int *&custTransCount,
    int *&itemCustCount,
    int *&cids,
    int *&tids,
    int *&iids,
    int *&transLens,
    int &overallCount,
    int &transLensLength)

It collects information about a binary data file.

It finds the number of customers, the number of items, and the number of transactions in the dataset. It also finds the number of transactions each customer has and the number of customers having a particular item in their transactions.

Note:

  • Memory should not be allocated for custTransCount and itemCustCount before calling this function.
  • Memory for custTransCount and itemCustCount must be deallocated by the caller.
  • File format: [custID] [transID] [number of items] [itemID1, ...] [custID] ...
  • Assumes that the transactions of a customer appear together in the file, so the following case cannot occur: [custID-1]... [custID-1]... [custID-2]... [custID-1]...

Parameters:
filename the filename of the data file
custCount [output] the number of customers
itemCount [output] the number of items
transCount [output] the number of transactions
custTransCount [output] number of transactions each customer has
itemCustCount [output] number of customers having each item in their transactions
cids [output] customer ids exactly as they appear in the file
tids [output] transaction ids exactly as they appear in the file
iids [output] item ids exactly as they appear in the file
transLens [output] transaction lengths exactly as they appear in the file
overallCount [output] length of the cids, tids, and iids arrays
transLensLength [output] length of transLens array
Returns:
true - if the reading is successful. false - if there is an error in the reading process.

Definition at line 178 of file FileInput.cpp.

Referenced by ReadDataset().
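
As an illustration of the record layout in the note above, the following sketch walks an in-memory buffer of ints. It assumes the file has already been loaded as native ints (the on-disk integer width and byte order are not stated here) and that cids, tids, and iids hold one entry per item occurrence. BinaryRecords and ParseBinaryRecords are hypothetical names, not part of FileInput.cpp.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical flattened view of a binary dataset.
struct BinaryRecords {
    std::vector<int> cids, tids, iids;  // assumed: one entry per item occurrence
    std::vector<int> transLens;         // one entry per transaction
};

// Walks a buffer laid out as [custID] [transID] [n] [item1 .. itemN] ...
// Returns false if a record header or item list is cut short.
bool ParseBinaryRecords(const std::vector<int> &buf, BinaryRecords &out) {
    std::size_t pos = 0;
    while (pos < buf.size()) {
        if (buf.size() - pos < 3) return false;  // incomplete header
        const int cid = buf[pos];
        const int tid = buf[pos + 1];
        const int n = buf[pos + 2];
        pos += 3;
        if (n < 0) return false;                 // invalid item count
        const std::size_t len = static_cast<std::size_t>(n);
        if (buf.size() - pos < len) return false;  // item list cut short
        out.transLens.push_back(n);
        for (std::size_t k = 0; k < len; ++k) {
            out.cids.push_back(cid);
            out.tids.push_back(tid);
            out.iids.push_back(buf[pos + k]);
        }
        pos += len;
    }
    return true;
}
```

With this layout, the length of transLens counts transactions while the length of cids, tids, and iids counts item occurrences, which is why the real function reports transLensLength and overallCount separately.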

void IncArraySize (int *&array,
    int oldSize,
    int newSize)

It resizes the array to have an increased size.

Parameters:
array pointer to the array
oldSize the current size of the array
newSize the desired size of the array

Definition at line 129 of file FileInput.cpp.

Referenced by CollectASCIIInfo(), and CollectBinaryInfo().
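
A resize with this signature is commonly written as allocate, copy, free. The sketch below shows that pattern under the assumption that the array was allocated with new[]. GrowIntArray is a hypothetical name, and the source does not say whether the original zero-fills the new tail as this version does.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical re-implementation: allocates a larger array, copies the
// old contents, zero-fills the new tail, and frees the old array.
// Assumes `array` is null or was allocated with new[].
void GrowIntArray(int *&array, int oldSize, int newSize) {
    assert(oldSize >= 0 && newSize >= oldSize);
    int *bigger = new int[newSize]();            // value-initialized to 0
    std::copy(array, array + oldSize, bigger);   // keep existing entries
    delete[] array;
    array = bigger;
}
```

Because the pointer is passed by reference, the caller's variable points at the new storage after the call and remains responsible for the final delete[].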

void PrintFileReadError (int errorType)

It prints an error related to the data file input.

Parameters:
errorType the type of file read error encountered

Definition at line 83 of file FileInput.cpp.

Referenced by CollectASCIIInfo(), CollectBinaryInfo(), and ReadDataset().

bool ReadASCII (char *filename,
    int *cids,
    int *tids,
    int *iids,
    int numEntries,
    int *custBitmapMap,
    int **custMap,
    int *itemMap,
    SeqBitmap **f1Buff)

It reads and stores the ASCII data file.

Note:

  • File format: [custID] [transID] [itemID] [custID] [transID] [itemID] ...
  • Assumes that the transactions of a customer appear together in the file, so the following case cannot occur: [custID-1]... [custID-1]... [custID-2]... [custID-1]...
  • Assumes that the items of a transaction appear together in the file, so the following case cannot occur: ... [transID-1] ... ... [transID-2] ... ... [transID-1] ...

Parameters:
filename the filename of the data file
cids the list of customer IDs as it appears in the file
tids the list of transaction IDs
iids the list of item IDs
numEntries the length of the arrays cids, tids, and iids
custBitmapMap map from custID to bitmapID
custMap the mapping of custID from external naming to internal naming
itemMap the mapping of itemID from external naming to internal naming
f1Buff [output] buffer for frequent-one item sets
Returns:
true - if the reading is successful. false - if there is an error in the reading process

Definition at line 774 of file FileInput.cpp.
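
The contiguity assumption in the note above can be checked before reading. The sketch below tests whether every customer ID forms a single run in a list of IDs; CustomersAreContiguous is a hypothetical helper, and nothing here indicates that FileInput.cpp performs such a check.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Returns true if every customer ID forms one contiguous run in cids,
// i.e. the pattern [c1] [c1] [c2] [c1] never occurs.
bool CustomersAreContiguous(const std::vector<int> &cids) {
    std::set<int> finished;  // customers whose run has already ended
    for (std::size_t i = 0; i < cids.size(); ++i) {
        if (i > 0 && cids[i] != cids[i - 1]) finished.insert(cids[i - 1]);
        if (finished.count(cids[i]) != 0) return false;  // customer reappears
    }
    return true;
}
```

The same idea applies to transaction IDs within one customer's run.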

bool ReadBinary (char *filename,
    int *cids,
    int *tids,
    int *iids,
    int numEntries,
    int *transLens,
    int transLensLength,
    int *custBitmapMap,
    int **custMap,
    int *itemMap,
    SeqBitmap **f1Buff)

It reads and stores the binary data file.

Note:

  • File format: [custID] [transID] [number of items] [itemID1, ...] [custID] ...
  • Assumes that the transactions of a customer appear together in the file, so the following case cannot occur: [custID-1]... [custID-1]... [custID-2]... [custID-1]...

Parameters:
filename the filename of the data file
cids the list of customer IDs as it appears in the file
tids the list of transaction IDs
iids the list of item IDs
numEntries the length of the arrays cids, tids, and iids
transLens the list of transaction lengths
transLensLength the length of the list transLens
custBitmapMap map from custID to bitmapID
custMap the mapping of custID from external naming to internal naming
itemMap the mapping of itemID from external naming to internal naming
f1Buff [output] buffer for frequent-one item sets
Returns:
true - if the reading is successful. false - if there is an error in the reading process

Definition at line 642 of file FileInput.cpp.

DatasetInfo* ReadDataset (bool isBinaryFile,
    bool isStringFile,
    char *filename,
    double minSupPercent,
    StringMap *&custStrMap,
    StringMap *&transStrMap,
    StringMap *&itemStrMap)

It reads the input file and finds the frequent-1 itemsets.

Parameters:
isBinaryFile whether the input file is a binary data file
isStringFile true if the input file contains string names, false if it contains integer IDs
filename the filename of the data file
minSupPercent the minimum support percentage
custStrMap [output] Maps cust IDs to strings (only used when isStringFile == true)
transStrMap [output] Maps trans IDs to strings (only used when isStringFile == true)
itemStrMap [output] Maps item IDs to strings (only used when isStringFile == true)
Returns:
DatasetInfo - the information gathered from the dataset

Definition at line 877 of file FileInput.cpp.

Referenced by main().
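
Finding frequent-1 itemsets amounts to keeping the items whose customer count reaches the minimum support. The sketch below shows one way to do this under two loudly stated assumptions: the support is given as a fraction in [0, 1] (the real minSupPercent may use a different scale), and the threshold is rounded up. FrequentItems is a hypothetical function, not the code in ReadDataset.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the indices of items supported by at least
// ceil(minSupFraction * custCount) customers.
// Assumption: minSupFraction is a fraction in [0, 1], not 0..100.
std::vector<int> FrequentItems(const std::vector<int> &itemCustCount,
                               int custCount, double minSupFraction) {
    const int minCount =
        static_cast<int>(std::ceil(minSupFraction * custCount));
    std::vector<int> frequent;
    for (std::size_t i = 0; i < itemCustCount.size(); ++i) {
        if (itemCustCount[i] >= minCount) {
            frequent.push_back(static_cast<int>(i));
        }
    }
    return frequent;
}
```

The input here corresponds to the itemCustCount and custCount values that the Collect functions report.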


Variable Documentation

const int BITMAP_LENGTH[5]
 

Initial value:

    {
        4, 8, 16, 32, 64
    }

the size of each customer's data in each bitmap

Definition at line 65 of file FileInput.cpp.

Referenced by ReadASCII(), ReadBinary(), and ReadDataset().
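
One plausible use of these lengths is to place each customer in the smallest bitmap that can hold all of its transactions. The sketch below shows that selection. It is an assumption about how the lengths are used, not something this page states; PickBitmapIndex, kNumBitmap, and kBitmapLength are hypothetical names that mirror NUM_BITMAP and BITMAP_LENGTH.

```cpp
// Local copies of the documented constants, renamed to avoid clashing
// with the real definitions in FileInput.cpp.
const int kNumBitmap = 5;
const int kBitmapLength[kNumBitmap] = {4, 8, 16, 32, 64};

// Returns the index of the smallest bitmap length that can hold
// transCount transactions, or -1 if none is large enough.
int PickBitmapIndex(int transCount) {
    for (int i = 0; i < kNumBitmap; ++i) {
        if (transCount <= kBitmapLength[i]) return i;
    }
    return -1;  // more than 64 transactions: no bitmap fits in this sketch
}
```

Under this assumption, a customer with 5 transactions would use the 8-slot bitmap at index 1.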

const int MAX_STRING_SIZE = 256
 

Maximum length of strings used to represent customers, transactions, and items when -str is used.

Definition at line 75 of file FileInput.cpp.

Referenced by CollectASCIIInfo().

const int NUM_BITMAP = 5
 

number of different bitmaps in SeqBitmap

Definition at line 62 of file FileInput.cpp.

Referenced by ReadDataset().

const int NUM_BITMAPS_USED = 4
 

number of bitmaps used: tempAndBitmap, specialBitmap, returnBitmap, SBitmap

Definition at line 71 of file FileInput.cpp.


Generated on Thu Mar 11 12:01:54 2004 for SPAM by doxygen 1.3.4