gr.demokritos.iit.conceptualIndex.documentModel
Class DistributionDocument

java.lang.Object
  extended by gr.demokritos.iit.conceptualIndex.documentModel.DistributionDocument
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
DistributionWordDocument

public class DistributionDocument
extends java.lang.Object
implements java.io.Serializable

Represents a document as a graph of distributions. Each distribution gives the probability that a token (character) appears after a given n-gram, called the source. Supports input and output operations and can serve as a grammar indicator to determine the normality of other texts.
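
To make the idea of a per-n-gram distribution concrete, the following is a minimal, self-contained sketch. It is not the DistributionDocument implementation: the class name NextCharDistributionSketch and its methods are hypothetical, and it assumes that only the character immediately following each source n-gram is counted.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; names and behaviour are assumptions, not the library's code. */
final class NextCharDistributionSketch {

    /** Counts, for every n-gram of the text, which character immediately follows it. */
    static Map<String, Map<Character, Integer>> countFollowers(String text, int n) {
        Map<String, Map<Character, Integer>> counts = new HashMap<>();
        // Stop while there is still one character left after the n-gram.
        for (int i = 0; i + n < text.length(); i++) {
            String source = text.substring(i, i + n);
            char next = text.charAt(i + n);
            counts.computeIfAbsent(source, k -> new HashMap<>())
                  .merge(next, 1, Integer::sum);
        }
        return counts;
    }

    /** Relative frequency of next after source; 0.0 if the source was never seen. */
    static double probability(Map<String, Map<Character, Integer>> counts,
                              String source, char next) {
        Map<Character, Integer> followers = counts.get(source);
        if (followers == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : followers.values()) {
            total += c;
        }
        return followers.getOrDefault(next, 0) / (double) total;
    }
}
```

For the text "abcabd" with 2-character sources, the source "ab" is followed once by 'c' and once by 'd', so each of the two tokens gets an estimated probability of one half.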

See Also:
Serialized Form

Field Summary
protected  java.lang.String DataString
          The string corresponding to input texts directly.
protected  DistributionGraph Graph
          The Graph representing the document.
 IDistributionComparisonListener OnCompare
          An event, used to attach a comparator of distributions to this class.
 
Constructor Summary
DistributionDocument(int iNeighbourhoodWindow)
          Creates a new instance of DistributionDocument.
DistributionDocument(int iNeighbourhoodWindow, int iSourceNGramSize)
          Creates a new instance of DistributionDocument.
 
Method Summary
 void clearDocumentGraph()
          Clears the document graph, resetting the representation.
 java.lang.String getDataString()
          Returns the current data string (i.e. text representation) of the document.
 int length()
          Calculates the size of the full document object from the edge count of the corresponding graph, not from the datastring (i.e. text) size.
 void loadDataStringFromFile(java.lang.String sFilename, boolean clearCurrentData)
          Loads the contents of a file as the datastring.
 void loadDataStringFromFile(java.lang.String sFilename, boolean clearCurrentData, java.lang.String sEncoding)
          Loads the contents of a file as the datastring.
static void main(java.lang.String[] sArgs)
           
 void mergeWith(DistributionDocument tpData, double fLearningRate)
          TODO: Document
 double normality(java.lang.String s)
          Calculates a degree of normality, indicating whether a given string appears in a form similar to text in the document.
 void prune(double dMinCoexistenceImportance)
          TODO: Document
 void setDataString(java.lang.String sDataString, int iNGramSize, boolean clearCurrentData)
          Creates and saves the graph representation of a string, using substrings of selected size as source nodes and substrings of size 1 (letters) as destination nodes.
 void setDocumentGraph(DistributionGraph dgNew)
          Sets the document graph to a selected existing graph.
 java.lang.String toString()
          Returns a string representation of the document graph.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

Graph

protected DistributionGraph Graph
The Graph representing the document.

See Also:
DistributionGraph

DataString

protected java.lang.String DataString
The string corresponding to input texts directly.


OnCompare

public IDistributionComparisonListener OnCompare
An event, used to attach a comparator of distributions to this class. The comparator is used in the normality function.

Constructor Detail

DistributionDocument

public DistributionDocument(int iNeighbourhoodWindow)
Creates a new instance of DistributionDocument. The source n-gram size is set to the default value of 1.

Parameters:
iNeighbourhoodWindow - The size of the window that defines neighbourhood between a source n-gram and a given token.

DistributionDocument

public DistributionDocument(int iNeighbourhoodWindow,
                            int iSourceNGramSize)
Creates a new instance of DistributionDocument.

Parameters:
iNeighbourhoodWindow - The size of the window that defines neighbourhood between a source n-gram and a given token.
iSourceNGramSize - The size of the source n-grams, in characters.
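
The interplay of the two constructor parameters can be sketched as follows. This is an assumption-laden illustration, not the class's actual algorithm: it assumes the window extends forward only, covering the characters that directly follow the source n-gram. The class NeighbourhoodWindowSketch and its edges method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; a forward-looking window is an assumption. */
final class NeighbourhoodWindowSketch {

    /**
     * Lists "source->token" pairs in which the token lies within
     * window characters after the end of the source n-gram.
     */
    static List<String> edges(String text, int nGramSize, int window) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + nGramSize <= text.length(); i++) {
            String source = text.substring(i, i + nGramSize);
            int first = i + nGramSize;                         // first position after the source
            int last = Math.min(text.length(), first + window); // exclusive end of the window
            for (int j = first; j < last; j++) {
                result.add(source + "->" + text.charAt(j));
            }
        }
        return result;
    }
}
```

Under these assumptions, edges("abcd", 1, 2) pairs each single character with up to two following characters, while edges("abcd", 2, 1) pairs each 2-gram with only the next character.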

Method Detail

clearDocumentGraph

public void clearDocumentGraph()
Clears the document graph, resetting the representation.


setDocumentGraph

public void setDocumentGraph(DistributionGraph dgNew)
Sets the document graph to a selected existing graph.

Parameters:
dgNew - The distribution graph to replace the existing one.
See Also:
DistributionGraph

length

public int length()
Calculates the size of the full document object from the edge count of the corresponding graph, not from the datastring (i.e. text) size.

Returns:
The size of the document object, based on edge count.

loadDataStringFromFile

public void loadDataStringFromFile(java.lang.String sFilename,
                                   boolean clearCurrentData)
Loads the contents of a file as the datastring.

Parameters:
sFilename - The filename of the input file.
clearCurrentData - Indicates whether the contents of the new file replace the existing text. If false, the contents are appended to the existing text.

loadDataStringFromFile

public void loadDataStringFromFile(java.lang.String sFilename,
                                   boolean clearCurrentData,
                                   java.lang.String sEncoding)
Loads the contents of a file as the datastring.

Parameters:
sFilename - The filename of the input file.
clearCurrentData - Indicates whether the contents of the new file replace the existing text. If false, the contents are appended to the existing text.
sEncoding - The encoding of the input file.
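
The replace-or-append behaviour of clearCurrentData, together with an explicit encoding, can be sketched with standard Java I/O. The class DataStringLoaderSketch and its methods are hypothetical stand-ins, written only to show the described semantics. How the real method updates the graph and reports I/O errors is not covered here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Illustrative only; mirrors the documented parameters, not the real code. */
final class DataStringLoaderSketch {
    private final StringBuilder dataString = new StringBuilder();

    /** Reads the whole file and passes its bytes to accept(...). */
    void loadFromFile(String filename, boolean clearCurrentData, String encoding) {
        try {
            accept(Files.readAllBytes(Paths.get(filename)), clearCurrentData, encoding);
        } catch (IOException e) {
            // Assumption: surface I/O problems as an unchecked exception.
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes the bytes, then either replaces or extends the stored text. */
    void accept(byte[] raw, boolean clearCurrentData, String encoding) {
        String text = new String(raw, Charset.forName(encoding));
        if (clearCurrentData) {
            dataString.setLength(0); // discard the existing text
        }
        dataString.append(text);
    }

    String getDataString() {
        return dataString.toString();
    }
}
```

Calling accept twice with clearCurrentData set to false concatenates both inputs, while a later call with true leaves only the most recent input.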

setDataString

public void setDataString(java.lang.String sDataString,
                          int iNGramSize,
                          boolean clearCurrentData)
Creates and saves the graph representation of a string, using substrings of selected size as source nodes and substrings of size 1 (letters) as destination nodes.

Parameters:
sDataString - The data string to analyse and represent as a distribution graph.
iNGramSize - The size of the n-grams used as source nodes.
clearCurrentData - Indicates whether the new data replaces the existing data. If false, the new data is appended to the existing data.

getDataString

public java.lang.String getDataString()
Returns the current data string (i.e. text representation) of the document.

Returns:
The data string.

mergeWith

public void mergeWith(DistributionDocument tpData,
                      double fLearningRate)
TODO: Document


prune

public void prune(double dMinCoexistenceImportance)
TODO: Document


toString

public java.lang.String toString()
Returns a string representation of the document graph.

Overrides:
toString in class java.lang.Object

normality

public double normality(java.lang.String s)
Calculates a degree of normality, indicating how closely a given string resembles the text in the document. The calculation compares distributions that lie on the same edges in two graphs: the graph of this DistributionDocument and the graph of a second DistributionDocument built from the given string. If the public field OnCompare has been set, it is used to compare the distributions.
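
The comparison of distributions on shared edges can be sketched as below. The similarity measure used here (one minus half the L1 distance between two normalised distributions, averaged over the source n-grams both texts share) is an assumed stand-in; the measure actually applied by normality, or by a comparator attached through OnCompare, is not specified in this documentation. The class NormalitySketch and all of its methods are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only; the similarity measure is an assumption. */
final class NormalitySketch {

    /** Maps each n-gram to the normalised distribution of the character following it. */
    static Map<String, Map<Character, Double>> distributions(String text, int n) {
        Map<String, Map<Character, Double>> dists = new HashMap<>();
        for (int i = 0; i + n < text.length(); i++) {
            dists.computeIfAbsent(text.substring(i, i + n), k -> new HashMap<>())
                 .merge(text.charAt(i + n), 1.0, Double::sum);
        }
        for (Map<Character, Double> d : dists.values()) {
            double total = 0.0;
            for (double v : d.values()) {
                total += v;
            }
            final double t = total;
            d.replaceAll((ch, v) -> v / t); // counts become probabilities
        }
        return dists;
    }

    /** 1.0 for identical distributions, 0.0 for distributions with no common token. */
    static double similarity(Map<Character, Double> p, Map<Character, Double> q) {
        Set<Character> keys = new HashSet<>(p.keySet());
        keys.addAll(q.keySet());
        double l1 = 0.0;
        for (char c : keys) {
            l1 += Math.abs(p.getOrDefault(c, 0.0) - q.getOrDefault(c, 0.0));
        }
        return 1.0 - l1 / 2.0;
    }

    /** Average similarity over the sources present in both texts; 0.0 if none are shared. */
    static double normality(String reference, String candidate, int n) {
        Map<String, Map<Character, Double>> ref = distributions(reference, n);
        Map<String, Map<Character, Double>> cand = distributions(candidate, n);
        double sum = 0.0;
        int shared = 0;
        for (Map.Entry<String, Map<Character, Double>> e : cand.entrySet()) {
            Map<Character, Double> other = ref.get(e.getKey());
            if (other != null) {
                sum += similarity(e.getValue(), other);
                shared++;
            }
        }
        return shared == 0 ? 0.0 : sum / shared;
    }
}
```

With this stand-in measure, a candidate identical to the reference scores 1.0, and a candidate that shares no source n-gram with the reference scores 0.0.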

See Also:
Distribution

main

public static void main(java.lang.String[] sArgs)