gr.demokritos.iit.jinsect.documentModel.representations
Class DocumentNGramHistogram

java.lang.Object
  extended by gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramHistogram
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
DocumentWordHistogram

public class DocumentNGramHistogram
extends java.lang.Object
implements java.io.Serializable

See Also:
Serialized Form

Field Summary
protected  java.lang.String DataString
           
protected  int MaxSize
           
protected  int MinSize
           
 java.util.HashMap NGramHistogram
           
 TextPreprocessorListener TextPreprocessor
           
 WordEvaluatorListener WordEvaluator
           
 
Constructor Summary
DocumentNGramHistogram()
          Creates a new instance of DocumentNGramHistogram with n-gram sizes from 3 to 5.
DocumentNGramHistogram(int iMinSize, int iMaxSize)
          Creates a new instance of DocumentNGramHistogram with the given minimum and maximum n-gram sizes.
 
Method Summary
 void createHistogram()
          Creates the histogram of n-grams in the data string.
 void deleteItem(java.lang.String sItem)
           
 java.lang.String getDataString()
           
 DocumentNGramHistogram intersectHistogram(DocumentNGramHistogram dgOtherHistogram)
           
 void inverseIntersectHistogram(DocumentNGramHistogram dgOtherHistogram, boolean bAffectOtherHistogram)
           
 int length()
          Returns the size of the histogram (unique element count).
 void loadDataStringFromFile(java.lang.String sFilename)
          Opens a text file and sets its contents as the data string.
 void mergeHistogram(DocumentNGramHistogram dnOtherDocumentNGram, double fNewDataImportance)
          Merges the data of another histogram [dnOtherDocumentNGram] into this histogram.
 void nullify()
           
 void nullifyItem(java.lang.String sItem)
          Sets item value to zero, without removing it.
 int numberOfTotalNGrams()
          Returns the total number of n-grams in the analyzed data string.
 void setDataString(java.lang.String sDataString)
           
 java.lang.String toString()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MinSize

protected int MinSize

MaxSize

protected int MaxSize

DataString

protected java.lang.String DataString

NGramHistogram

public java.util.HashMap NGramHistogram

WordEvaluator

public WordEvaluatorListener WordEvaluator

TextPreprocessor

public TextPreprocessorListener TextPreprocessor
Constructor Detail

DocumentNGramHistogram

public DocumentNGramHistogram(int iMinSize,
                              int iMaxSize)
Creates a new instance of DocumentNGramHistogram with the given minimum and maximum n-gram sizes.

Parameters:
iMinSize - The minimum n-gram size
iMaxSize - The maximum n-gram size

DocumentNGramHistogram

public DocumentNGramHistogram()
Creates a new instance of DocumentNGramHistogram with n-gram sizes from 3 to 5.

Method Detail

length

public int length()
Returns the size of the histogram (unique element count).


numberOfTotalNGrams

public int numberOfTotalNGrams()
Returns the total number of n-grams in the analyzed data string.

Returns:
The total number of n-grams.
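
The difference between length() (unique n-grams) and the total count can be illustrated with a small calculation. The sketch below is not taken from the library: it assumes character n-grams and that every window of every size from the minimum to the maximum is counted once. The class and method names are hypothetical.

```java
class TotalNGramsSketch {
    /**
     * Hypothetical helper: number of character n-gram windows in a string
     * of the given length, summed over all sizes minSize..maxSize.
     * Assumption: a string of length L holds L - n + 1 windows of size n.
     */
    static int total(int dataLength, int minSize, int maxSize) {
        int total = 0;
        for (int size = minSize; size <= maxSize; size++) {
            // Sizes longer than the string contribute no windows.
            total += Math.max(0, dataLength - size + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        // "abcab" has 5 characters; sizes 3..5 give 3 + 2 + 1 windows.
        System.out.println(total("abcab".length(), 3, 5));
    }
}
```

Under these assumptions the total is never smaller than the unique element count, because repeated n-grams share one histogram entry.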

loadDataStringFromFile

public void loadDataStringFromFile(java.lang.String sFilename)
                            throws java.io.IOException,
                                   java.io.FileNotFoundException
Opens a text file and sets its contents as the data string.

Parameters:
sFilename - The filename of the file to open.
Throws:
java.io.IOException
java.io.FileNotFoundException
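
A minimal sketch of reading a whole text file into a string, which is the kind of step this method performs. It is not the library's implementation: the UTF-8 encoding and the helper name readAll are assumptions, since the documentation does not say which charset is used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class LoadDataStringSketch {
    /**
     * Hypothetical helper: returns the full contents of a text file.
     * Assumption: the file is UTF-8 encoded.
     * A missing file surfaces here as java.nio.file.NoSuchFileException,
     * which is a subclass of IOException.
     */
    static String readAll(String filename) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filename));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(readAll(args[0]).length() + " characters read");
        }
    }
}
```

With the documented API, the returned text would correspond to what setDataString receives before createHistogram is called.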

createHistogram

public void createHistogram()
Creates the histogram of n-grams in the data string. The WordEvaluatorListener (if present) is called for every n-gram before the latter is added.
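
The counting step described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: it assumes character n-grams, counts stored as Double values, and it models the WordEvaluatorListener with a plain Predicate that decides whether an n-gram is kept. The real listener interface is not shown in this page, so that behaviour is a guess.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

class NGramHistogramSketch {
    /**
     * Counts every character n-gram of length minSize..maxSize in data.
     * The evaluator is a hypothetical stand-in for WordEvaluatorListener:
     * if it is non-null and rejects an n-gram, the n-gram is skipped.
     */
    static Map<String, Double> build(String data, int minSize, int maxSize,
                                     Predicate<String> evaluator) {
        Map<String, Double> histogram = new HashMap<>();
        for (int size = minSize; size <= maxSize; size++) {
            // Slide a window of the current size over the string.
            for (int start = 0; start + size <= data.length(); start++) {
                String nGram = data.substring(start, start + size);
                if (evaluator != null && !evaluator.test(nGram)) {
                    continue; // evaluator asked to leave this n-gram out
                }
                // Add 1 to the existing count, or start at 1.
                histogram.merge(nGram, 1.0, Double::sum);
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        Map<String, Double> h = build("abcab", 2, 3, null);
        System.out.println(h.size() + " distinct n-grams: " + h);
    }
}
```

In this sketch the map size plays the role of length(), while the sum of the values corresponds to the total number of n-grams seen.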


mergeHistogram

public void mergeHistogram(DocumentNGramHistogram dnOtherDocumentNGram,
                           double fNewDataImportance)
Merges the data of another histogram [dnOtherDocumentNGram] into this histogram. If an n-gram already exists in this histogram, its weight is changed by [fNewDataImportance] * ([iNewWeight] - ExistingWeight), where [iNewWeight] is the new data value for that n-gram. Otherwise, its weight is set to the new data value.

Parameters:
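dnOtherDocumentNGram - The histogram whose data is merged into this one.
fNewDataImportance - A value of 1.0 replaces the existing weight with the new data value. A value of 0.0 leaves the existing weight unchanged. A value of 0.5 moves the existing weight halfway towards the new data value.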
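
The update rule above can be written out on plain maps. The sketch below only restates the documented formula. The class name, the method signature, and the use of Map<String, Double> are assumptions made for illustration, not the library's internals.

```java
import java.util.HashMap;
import java.util.Map;

class MergeSketch {
    /**
     * Applies the documented rule to every entry of other:
     *   existing key: weight += importance * (newWeight - weight)
     *   missing key:  weight  = newWeight
     * Entries that appear only in target are left untouched.
     */
    static void merge(Map<String, Double> target, Map<String, Double> other,
                      double importance) {
        for (Map.Entry<String, Double> entry : other.entrySet()) {
            Double existing = target.get(entry.getKey());
            if (existing == null) {
                target.put(entry.getKey(), entry.getValue());
            } else {
                double updated = existing + importance * (entry.getValue() - existing);
                target.put(entry.getKey(), updated);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Double> target = new HashMap<>();
        target.put("abc", 2.0);
        Map<String, Double> other = new HashMap<>();
        other.put("abc", 4.0);
        other.put("bcd", 3.0);
        merge(target, other, 0.5);
        // "abc" moves halfway from 2.0 towards 4.0; "bcd" is copied as-is.
        System.out.println(target);
    }
}
```

With an importance of 1.0 the same code overwrites existing weights, and with 0.0 it only adds n-grams that were missing.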

intersectHistogram

public DocumentNGramHistogram intersectHistogram(DocumentNGramHistogram dgOtherHistogram)

inverseIntersectHistogram

public void inverseIntersectHistogram(DocumentNGramHistogram dgOtherHistogram,
                                      boolean bAffectOtherHistogram)

deleteItem

public void deleteItem(java.lang.String sItem)

nullifyItem

public void nullifyItem(java.lang.String sItem)
Sets item value to zero, without removing it.

Parameters:
sItem - The item to nullify
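
The contrast between zeroing an item and removing it can be shown on a plain map. This sketch is illustrative only. It assumes that deleteItem removes the entry (its behaviour is not documented here) and that nullifying an absent item does nothing. Both helper methods are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

class NullifySketch {
    /** Sets an existing item's value to zero and keeps the key. */
    static void nullifyItem(Map<String, Double> histogram, String item) {
        // computeIfPresent leaves the map untouched when the key is absent.
        histogram.computeIfPresent(item, (key, value) -> 0.0);
    }

    /** Assumed behaviour of deleteItem: the entry is removed entirely. */
    static void deleteItem(Map<String, Double> histogram, String item) {
        histogram.remove(item);
    }

    public static void main(String[] args) {
        Map<String, Double> histogram = new HashMap<>();
        histogram.put("abc", 2.0);
        histogram.put("bcd", 1.0);
        nullifyItem(histogram, "abc"); // key stays, value becomes 0.0
        deleteItem(histogram, "bcd");  // key is gone
        System.out.println(histogram);
    }
}
```

Under these assumptions a nullified item still counts towards the unique element count, while a deleted item does not.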

nullify

public void nullify()

toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

getDataString

public java.lang.String getDataString()

setDataString

public void setDataString(java.lang.String sDataString)