gr.demokritos.iit.jinsect.documentModel.representations
Class DocumentNGramGraph

java.lang.Object
  extended by gr.demokritos.iit.jinsect.documentModel.representations.DocumentNGramGraph
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable
Direct Known Subclasses:
DocumentNGramDistroGraph, DocumentNGramGaussNormGraph, DocumentNGramSymWinGraph, DocumentWordGraph

public class DocumentNGramGraph
extends java.lang.Object
implements java.io.Serializable, java.lang.Cloneable

Represents a document as a graph whose vertices are the n-grams of the document and whose edges are weighted by the number of co-occurrences of the connected n-grams within a given window.
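
The description above can be illustrated with a minimal, self-contained sketch. It is not the JInsect implementation: the class name NGramGraphSketch, the method build, the "source->target" edge key format, and the choice to link each n-gram only to the n-grams that start after it are all assumptions made for illustration. It handles a single n-gram size, whereas the class keeps one graph per size between MinSize and MaxSize.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; NOT the JInsect implementation.
class NGramGraphSketch {

    // Builds edge weights for the character n-grams of one size.
    // Each n-gram is linked to the n-grams that start up to `window`
    // positions after it; the weight counts how often that happens.
    static Map<String, Double> build(String text, int n, int window) {
        Map<String, Double> edges = new HashMap<>();
        int count = text.length() - n + 1; // number of n-grams in the text
        for (int i = 0; i < count; i++) {
            String current = text.substring(i, i + n);
            for (int j = i + 1; j <= i + window && j < count; j++) {
                String neighbour = text.substring(j, j + n);
                // Hypothetical edge key format: "source->target".
                edges.merge(current + "->" + neighbour, 1.0, Double::sum);
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        // Prints the weighted edges for bigrams with a window of 1.
        System.out.println(build("ababab", 2, 1));
    }
}
```

In this sketch a larger window links each n-gram to more of its successors, which is the role the CorrelationWindow field appears to play.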

See Also:
Serialized Form

Field Summary
protected  int CorrelationWindow
           
protected  java.lang.String DataString
           
protected  java.util.HashMap DegradedEdges
           
protected  int MaxSize
           
protected  int MinSize
           
protected  Graph[] NGramGraphArray
           
 NormalizerListener Normalizer
           
 TextPreprocessorListener TextPreprocessor
           
 WordEvaluatorListener WordEvaluator
           
 
Constructor Summary
DocumentNGramGraph()
          Creates a new instance of DocumentNGramGraph.
DocumentNGramGraph(int iMinSize, int iMaxSize, int iCorrelationWindow)
          Creates a new instance of DocumentNGramGraph.
 
Method Summary
 double calcCoexistenceImportance(java.lang.String sNode)
          Returns a function of [element graph edges max] and [number of neighbours], where [element graph edges max] is the maximum weight of the edges that include [sNode] and [number of neighbours] is the number of neighbours of [sNode] in the graph.
 double calcCoexistenceImportance(salvo.jesus.graph.Vertex vNode)
           
 java.lang.Object clone()
           
 void createEdgesConnecting(Graph gGraph, java.lang.String sStartNode, java.util.List lOtherNodes, java.util.HashMap hAppearenceHistogram)
          Creates an edge in [gGraph] connecting [sStartNode] to each node in the [lOtherNodes] list of nodes.
 void createGraphs()
          Creates the graph of n-grams, for all the levels specified in the MinSize, MaxSize range.
 void createWeightedEdgesConnecting(Graph gGraph, java.lang.String sStartNode, java.util.List lOtherNodes, double dStartWeight, double dNewWeight, double dDataImportance)
          Creates an edge in [gGraph] connecting [sStartNode] to each node in the [lOtherNodes] list of nodes.
 void degrade(DocumentNGramGraph dgOtherGraph)
           
 double degredationDegree(salvo.jesus.graph.Edge e)
           
 void deleteItem(java.lang.String sItem)
          Removes an item (node) from all graphs.
 java.util.HashSet getAllNodes()
           
 java.lang.String getDataString()
           
 Graph getGraphLevel(int iIndex)
          Returns the graph at the given level index, where index 0 corresponds to MinSize n-grams.
 Graph getGraphLevelByNGramSize(int iNGramSize)
          Returns the graph for the given n-gram size.
 int getMaxSize()
           
 int getMinSize()
           
 int getWindowSize()
           
protected  void InitGraphs()
           
 DocumentNGramGraph intersectGraph(DocumentNGramGraph dgOtherGraph)
           
 DocumentNGramGraph inverseIntersectGraph(DocumentNGramGraph dgOtherGraph)
          Returns the difference (inverse of the intersection) graph between the current graph and a given graph.
 boolean isEmpty()
           
 int length()
           
 void loadDataStringFromFile(java.lang.String sFilename)
          Creates the graph based on a data string loaded from a given file.
static void main(java.lang.String[] args)
           
 void mergeGraph(DocumentNGramGraph dgOtherGraph, double fWeightPercent)
          Merges the [dgOtherGraph] document graph into this graph: edges that exist only in [dgOtherGraph] are added, and the weights of edges that exist in both graphs are moved towards the values in [dgOtherGraph] according to a tendency modifier.
 void nullify()
          Sets all weights in all graphs to zero.
 void prune(double dMinCoexistenceImportance)
           
 void setDataString(java.lang.String sDataString)
           
 java.lang.String toCooccurenceText(java.util.Map mCooccurenceMap)
           
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

MinSize

protected int MinSize

MaxSize

protected int MaxSize

CorrelationWindow

protected int CorrelationWindow

DataString

protected java.lang.String DataString

DegradedEdges

protected java.util.HashMap DegradedEdges

Normalizer

public NormalizerListener Normalizer

WordEvaluator

public WordEvaluatorListener WordEvaluator

TextPreprocessor

public TextPreprocessorListener TextPreprocessor

NGramGraphArray

protected Graph[] NGramGraphArray
Constructor Detail

DocumentNGramGraph

public DocumentNGramGraph()
Creates a new instance of DocumentNGramGraph.


DocumentNGramGraph

public DocumentNGramGraph(int iMinSize,
                          int iMaxSize,
                          int iCorrelationWindow)
Creates a new instance of DocumentNGramGraph.

Parameters:
iMinSize - The minimum n-gram size
iMaxSize - The maximum n-gram size
iCorrelationWindow - The maximum distance of terms to be considered as correlated.
Method Detail

InitGraphs

protected void InitGraphs()

length

public int length()

isEmpty

public boolean isEmpty()

loadDataStringFromFile

public void loadDataStringFromFile(java.lang.String sFilename)
                            throws java.io.IOException,
                                   java.io.FileNotFoundException
Creates the graph based on a data string loaded from a given file.

Parameters:
sFilename - The filename of the file containing the data string.
Throws:
java.io.IOException
java.io.FileNotFoundException

getGraphLevel

public Graph getGraphLevel(int iIndex)
Returns the graph at the given level index, where index 0 corresponds to MinSize n-grams.

Parameters:
iIndex - The index of the graph. Zero (0) corresponds to the graph for n-grams of size MinSize.
Returns:
The Graph of the corresponding level.

getGraphLevelByNGramSize

public Graph getGraphLevelByNGramSize(int iNGramSize)
Returns the graph for the given n-gram size.

Parameters:
iNGramSize - The n-gram size of the graph.
Returns:
The Graph of the corresponding level.
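
The relationship between the two lookups above can be sketched as follows. This assumes the levels are stored contiguously, one per n-gram size from MinSize to MaxSize, so that the index is simply the size minus MinSize. The class GraphLevelIndexSketch, the method levelIndexFor, and the exception thrown for out-of-range sizes are hypothetical; the source does not say how the library handles sizes outside the range.

```java
// Illustrative sketch only; NOT the JInsect implementation.
class GraphLevelIndexSketch {

    // Maps an n-gram size to a zero-based level index, assuming one level
    // per size in the range [minSize, maxSize].
    static int levelIndexFor(int nGramSize, int minSize, int maxSize) {
        if (nGramSize < minSize || nGramSize > maxSize) {
            // Assumption: out-of-range sizes are rejected in this sketch.
            throw new IllegalArgumentException(
                "n-gram size " + nGramSize + " is outside [" + minSize + ", " + maxSize + "]");
        }
        return nGramSize - minSize;
    }

    public static void main(String[] args) {
        // With minSize = 3 and maxSize = 5, size 3 maps to index 0.
        System.out.println(levelIndexFor(3, 3, 5));
        System.out.println(levelIndexFor(5, 3, 5));
    }
}
```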

getAllNodes

public java.util.HashSet getAllNodes()

createEdgesConnecting

public void createEdgesConnecting(Graph gGraph,
                                  java.lang.String sStartNode,
                                  java.util.List lOtherNodes,
                                  java.util.HashMap hAppearenceHistogram)
Creates an edge in [gGraph] connecting [sStartNode] to each node in the [lOtherNodes] list of nodes. If an edge already exists, its weight is increased; otherwise it is set to an initial weight.

Parameters:
gGraph - The graph to use
sStartNode - The node from which all edges begin
lOtherNodes - The list of nodes to which sStartNode is connected
hAppearenceHistogram - The histogram of appearances of the terms

createWeightedEdgesConnecting

public void createWeightedEdgesConnecting(Graph gGraph,
                                          java.lang.String sStartNode,
                                          java.util.List lOtherNodes,
                                          double dStartWeight,
                                          double dNewWeight,
                                          double dDataImportance)
Creates an edge in [gGraph] connecting [sStartNode] to each node in the [lOtherNodes] list of nodes. If an edge already exists, its weight is moved towards [dNewWeight] according to [dDataImportance]; otherwise its weight is set to [dStartWeight].

Parameters:
gGraph - The graph to use
sStartNode - The node from which all edges begin
lOtherNodes - The list of nodes to which sStartNode is connected
dStartWeight - The initial weight for first-occurring nodes
dNewWeight - The new weight
dDataImportance - The tendency towards the new value. 0.0 means no change to the current value. 1.0 means the old value is completely replaced by the new. 0.5 means the final value is the average of the old and the new.
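
The three reference points given for dDataImportance (0.0, 0.5 and 1.0) are all satisfied by a linear interpolation between the old and the new weight. The sketch below assumes that form; the source does not state the exact formula, and the class WeightUpdateSketch and the method blend are hypothetical names.

```java
// Illustrative sketch only; the linear formula is an assumption that
// matches the documented behaviour at 0.0, 0.5 and 1.0.
class WeightUpdateSketch {

    // Moves oldWeight towards newWeight by the fraction `importance`.
    static double blend(double oldWeight, double newWeight, double importance) {
        return oldWeight + (newWeight - oldWeight) * importance;
    }

    public static void main(String[] args) {
        System.out.println(blend(2.0, 4.0, 0.0)); // old value kept
        System.out.println(blend(2.0, 4.0, 0.5)); // average of old and new
        System.out.println(blend(2.0, 4.0, 1.0)); // old value replaced
    }
}
```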

createGraphs

public void createGraphs()
Creates the graph of n-grams, for all the levels specified in the MinSize, MaxSize range.


mergeGraph

public void mergeGraph(DocumentNGramGraph dgOtherGraph,
                       double fWeightPercent)
Merges the [dgOtherGraph] document graph into this graph: edges that exist only in [dgOtherGraph] are added, and the weights of edges that exist in both graphs are moved towards the values in [dgOtherGraph] according to a tendency modifier. The convergence tendency towards the starting value or the new value is determined by [fWeightPercent].

Parameters:
dgOtherGraph - The second graph used for the merging
fWeightPercent - The convergence tendency parameter. A value of 0.0 leaves the existing value unchanged, 1.0 replaces it with the value from the new graph, and 0.5 sets it to the average of the old and new values.
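
A rough sketch of the merge semantics described above, using plain maps from a hypothetical "source->target" edge key to a weight instead of the library's Graph objects. The class MergeGraphSketch and the method merge are illustrative names. Unlike mergeGraph, which is void and updates this graph, the sketch returns a new map. It also assumes that edges found only in the other graph are copied with their weight unchanged and that shared edges are interpolated linearly; the source fixes only the behaviour at 0.0, 0.5 and 1.0.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; NOT the JInsect implementation.
class MergeGraphSketch {

    // Returns a merged copy of `current`, leaving both inputs untouched.
    static Map<String, Double> merge(Map<String, Double> current,
                                     Map<String, Double> other,
                                     double weightPercent) {
        Map<String, Double> result = new HashMap<>(current);
        for (Map.Entry<String, Double> entry : other.entrySet()) {
            Double oldWeight = result.get(entry.getKey());
            if (oldWeight == null) {
                // Assumption: an edge that exists only in `other` is added as is.
                result.put(entry.getKey(), entry.getValue());
            } else {
                // Shared edge: move the old weight towards the other graph's weight.
                result.put(entry.getKey(),
                           oldWeight + (entry.getValue() - oldWeight) * weightPercent);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> current = new HashMap<>();
        current.put("a->b", 2.0);
        current.put("b->c", 1.0);
        Map<String, Double> other = new HashMap<>();
        other.put("a->b", 4.0);
        other.put("c->d", 5.0);
        System.out.println(merge(current, other, 0.5));
    }
}
```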

intersectGraph

public DocumentNGramGraph intersectGraph(DocumentNGramGraph dgOtherGraph)

inverseIntersectGraph

public DocumentNGramGraph inverseIntersectGraph(DocumentNGramGraph dgOtherGraph)
Returns the difference (inverse of the intersection) graph between the current graph and a given graph.

Parameters:
dgOtherGraph - The graph to compare to.
Returns:
A DocumentNGramGraph that is the difference between the current graph and the given graph.
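
One possible reading of "difference (inverse of the intersection)" is sketched below on plain edge maps: the intersection keeps the edges present in both graphs, and the difference keeps the edges of the current graph that are absent from the other. This reading is an assumption, as is the choice to keep the current graph's weights; the source does not specify how weights are treated. The class GraphDifferenceSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; NOT the JInsect implementation.
class GraphDifferenceSketch {

    // Edges present in both maps, keeping the weights of `current` (assumption).
    static Map<String, Double> intersect(Map<String, Double> current,
                                         Map<String, Double> other) {
        Map<String, Double> result = new HashMap<>(current);
        result.keySet().retainAll(other.keySet());
        return result;
    }

    // Edges of `current` that do not appear in `other`.
    static Map<String, Double> difference(Map<String, Double> current,
                                          Map<String, Double> other) {
        Map<String, Double> result = new HashMap<>(current);
        result.keySet().removeAll(other.keySet());
        return result;
    }

    public static void main(String[] args) {
        Map<String, Double> current = new HashMap<>();
        current.put("a->b", 2.0);
        current.put("b->c", 1.0);
        Map<String, Double> other = new HashMap<>();
        other.put("a->b", 4.0);
        System.out.println(intersect(current, other));
        System.out.println(difference(current, other));
    }
}
```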

getMinSize

public int getMinSize()

getMaxSize

public int getMaxSize()

getWindowSize

public int getWindowSize()

calcCoexistenceImportance

public double calcCoexistenceImportance(java.lang.String sNode)
Returns a function of [element graph edges max] and [number of neighbours], where [element graph edges max] is the maximum weight of the edges that include [sNode] and [number of neighbours] is the number of neighbours of [sNode] in the graph.

Parameters:
sNode - The node whose coexistence importance is calculated.

calcCoexistenceImportance

public double calcCoexistenceImportance(salvo.jesus.graph.Vertex vNode)

prune

public void prune(double dMinCoexistenceImportance)

deleteItem

public void deleteItem(java.lang.String sItem)
Removes an item (node) from all graphs.

Parameters:
sItem - The item to remove.

nullify

public void nullify()
Sets all weights in all graphs to zero.


setDataString

public void setDataString(java.lang.String sDataString)

getDataString

public java.lang.String getDataString()

degrade

public void degrade(DocumentNGramGraph dgOtherGraph)

degredationDegree

public double degredationDegree(salvo.jesus.graph.Edge e)

toCooccurenceText

public java.lang.String toCooccurenceText(java.util.Map mCooccurenceMap)

main

public static void main(java.lang.String[] args)

clone

public java.lang.Object clone()
Overrides:
clone in class java.lang.Object