gr.demokritos.iit.jinsect.indexing
Class SimilarityBasedIndex

java.lang.Object
  extended by gr.demokritos.iit.jinsect.indexing.SimilarityBasedIndex
All Implemented Interfaces:
IIndex<DocumentNGramGraph>, java.io.Serializable

public class SimilarityBasedIndex
extends java.lang.Object
implements java.io.Serializable, IIndex<DocumentNGramGraph>

Describes a hierarchical, similarity-based index. The index holds a representation for each cluster of documents and can identify the best-matching cluster for a new graph.

See Also:
Serialized Form

Field Summary
protected  java.lang.String CLUSTER_OBJECT_CATEGORY
           
protected  IClusterer Clusterer
           
protected  SimilarityComparatorListener Comparator
           
protected  UniqueVertexGraph Hierarchy
           
 IFileLoader<java.lang.String> Loader
          An IFileLoader that can load documents given an identifier.
protected  java.util.Set<DocumentNGramGraph> NamedObjects
           
 NotificationListener Notifier
          A notifier for the progress of various tasks.
protected  INSECTDB<DocumentNGramGraph> Storage
          An INSECTDB type storage, to hold the representations of the documents.
 
Constructor Summary
SimilarityBasedIndex(java.util.Set<NamedDocumentNGramGraph> sNamedObjects, SimilarityComparatorListener sclComparator, INSECTDB<DocumentNGramGraph> dbStorage)
          Creates a new instance of SimilarityBasedIndex, given a set of named graphs, a similarity comparator, and a storage for document representations.
 
Method Summary
 void createIndex()
          Creates the index by creating the clusters and a representative graph for each cluster.
protected  java.util.Set<java.lang.String> getDocumentIDsFromCluster(java.lang.String sClusterLabel)
          Splits a cluster name into the corresponding graph names, by a simple split using a comma as the delimiter.
protected  DocumentNGramGraph getRepresentationFromCluster(java.lang.String sClusterLabel)
          Calculates the representing object of a cluster.
protected  void initComparator()
          Initializes the comparator object to a default comparator, if null.
 java.util.Set<java.lang.String> locateSimilarDocuments(DocumentNGramGraph dngCur)
          Returns the set of documents of the cluster that is most appropriate, given a document graph.
static void main(java.lang.String[] args)
          Function testing the functionality of the class.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

Comparator

protected SimilarityComparatorListener Comparator

NamedObjects

protected java.util.Set<DocumentNGramGraph> NamedObjects

Clusterer

protected IClusterer Clusterer

Hierarchy

protected UniqueVertexGraph Hierarchy

Storage

protected INSECTDB<DocumentNGramGraph> Storage
An INSECTDB type storage, to hold the representations of the documents.


CLUSTER_OBJECT_CATEGORY

protected final java.lang.String CLUSTER_OBJECT_CATEGORY
See Also:
Constant Field Values

Notifier

public NotificationListener Notifier
A notifier for the progress of various tasks.


Loader

public IFileLoader<java.lang.String> Loader
An IFileLoader that can load documents given an identifier. If null, the document ID is used as a file name and the index attempts to load the corresponding file.
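
The null fallback described for Loader can be illustrated with a minimal sketch. It does not use the library's own types: TextLoader is a hypothetical stand-in for IFileLoader<java.lang.String> (whose method names are not shown on this page), and loadDocument is an illustrative helper, not a member of SimilarityBasedIndex.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LoaderFallbackSketch {
    // Hypothetical stand-in for IFileLoader<String>; the real interface may differ.
    public interface TextLoader {
        String load(String documentId);
    }

    // Uses the loader when one is supplied; otherwise treats the ID as a file name.
    public static String loadDocument(String documentId, TextLoader loader) {
        if (loader != null) {
            return loader.load(documentId);
        }
        try {
            // Fallback: the document ID doubles as the path of the file to read.
            return new String(Files.readAllBytes(Paths.get(documentId)), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A custom loader that resolves IDs without touching the file system.
        TextLoader inMemory = id -> "contents of " + id;
        System.out.println(loadDocument("doc-42", inMemory));
    }
}
```

Supplying a loader is useful when document IDs are not file paths, for example database keys or URLs.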

Constructor Detail

SimilarityBasedIndex

public SimilarityBasedIndex(java.util.Set<NamedDocumentNGramGraph> sNamedObjects,
                            SimilarityComparatorListener sclComparator,
                            INSECTDB<DocumentNGramGraph> dbStorage)
Creates a new instance of SimilarityBasedIndex, given a set of named graphs, a similarity comparator, and a storage for document representations.

Parameters:
sNamedObjects - The set of NamedDocumentNGramGraphs to use as training for the index. Each element is expected to carry both the name of the graph and the graph itself.
sclComparator - If null, a default similarity comparator for graphs is used. Otherwise, the given SimilarityComparatorListener is used to compare graphs.
dbStorage - If null, then representations are stored in memory. Otherwise the INSECTDB storage provided is used to store document representations.

Method Detail

initComparator

protected void initComparator()
Initializes the comparator object to a default comparator, if null.


createIndex

public void createIndex()
Creates the index by creating the clusters and a representative graph for each cluster.

Specified by:
createIndex in interface IIndex<DocumentNGramGraph>

getDocumentIDsFromCluster

protected java.util.Set<java.lang.String> getDocumentIDsFromCluster(java.lang.String sClusterLabel)
Splits a cluster name into the corresponding graph names, by a simple split using a comma as the delimiter.
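
A minimal sketch of the comma split described above. The class and method names here are hypothetical, and the actual implementation may differ in details such as trimming or set ordering.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class ClusterLabelSplit {
    // Illustrative equivalent of the documented behaviour: split on commas only.
    public static Set<String> documentIdsFromLabel(String clusterLabel) {
        return new LinkedHashSet<>(Arrays.asList(clusterLabel.split(",")));
    }

    public static void main(String[] args) {
        // prints [doc1.txt, doc2.txt, doc3.txt]
        System.out.println(documentIdsFromLabel("doc1.txt,doc2.txt,doc3.txt"));
    }
}
```

With a plain comma split, a document name that itself contains a comma would be broken into several IDs, so such names are best avoided.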


getRepresentationFromCluster

protected DocumentNGramGraph getRepresentationFromCluster(java.lang.String sClusterLabel)
Calculates the representing object of a cluster.

Parameters:
sClusterLabel - The label of the cluster to look up.
Returns:
An object representing the cluster.

locateSimilarDocuments

public java.util.Set<java.lang.String> locateSimilarDocuments(DocumentNGramGraph dngCur)
Returns the set of documents of the cluster that is most appropriate, given a document graph.

Specified by:
locateSimilarDocuments in interface IIndex<DocumentNGramGraph>
Parameters:
dngCur - The graph of the document used.
Returns:
A Set of strings, corresponding to the document IDs in the cluster that has the most similar content to the given document.
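
The lookup can be pictured with the following sketch, which is an assumption about the general idea rather than the library's algorithm: it scans a flat map of cluster representations instead of descending a hierarchy, and it uses word sets with Jaccard overlap in place of n-gram graphs and a SimilarityComparatorListener. All names in it are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

public class NearestClusterSketch {
    // Returns the label of the cluster whose representation scores highest against the query.
    public static <T> String bestCluster(Map<String, T> representations, T query,
                                         ToDoubleBiFunction<T, T> similarity) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, T> entry : representations.entrySet()) {
            double score = similarity.applyAsDouble(entry.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Toy similarity (stand-in for graph comparison): Jaccard overlap of two word sets.
    public static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0.0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Cluster labels are comma-joined document IDs, as in getDocumentIDsFromCluster.
        Map<String, Set<String>> representations = new LinkedHashMap<>();
        representations.put("doc1,doc2", new HashSet<>(Arrays.asList("graph", "ngram", "index")));
        representations.put("doc3", new HashSet<>(Arrays.asList("cat", "dog")));

        Set<String> query = new HashSet<>(Arrays.asList("ngram", "index", "search"));
        String label = bestCluster(representations, query, NearestClusterSketch::jaccard);

        // The winning label is split back into the document IDs it contains.
        Set<String> similarDocuments = new LinkedHashSet<>(Arrays.asList(label.split(",")));
        System.out.println(similarDocuments);
    }
}
```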

main

public static void main(java.lang.String[] args)
Function testing the functionality of the class.