java.lang.Object
  gr.demokritos.iit.jinsect.indexing.SimilarityBasedIndex
public class SimilarityBasedIndex
extends java.lang.Object
implements IIndex<DocumentNGramGraph>
A class that describes a hierarchical, similarity-based index. The index holds a representation for each cluster of documents and can also identify the best cluster for a given new graph.
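The lookup step described above (finding the best cluster for a new graph) can be sketched as follows. This is a minimal illustration under stated assumptions, not the jinsect implementation: graphs are modelled as hypothetical maps from edge labels to weights, a toy cosine similarity stands in for SimilarityComparatorListener, and cluster labels are assumed to be comma-joined document IDs. `BestClusterSketch`, `similarity` and `bestCluster` are hypothetical names.

```java
import java.util.Map;

// Hypothetical sketch, not the jinsect API: graphs are modelled as maps from
// edge labels to weights, and cluster labels are comma-joined document IDs.
class BestClusterSketch {

    // Toy cosine similarity over edge weights; stands in for a real graph comparator.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double w = e.getValue();
            normA += w * w;
            Double other = b.get(e.getKey());
            if (other != null) {
                dot += w * other;
            }
        }
        for (double w : b.values()) {
            normB += w * w;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the label of the cluster whose representative is most similar
    // to the query graph, or null when there are no clusters.
    static String bestCluster(Map<String, Map<String, Double>> representatives,
                              Map<String, Double> query) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : representatives.entrySet()) {
            double score = similarity(e.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }
}
```

In the documented class, the winning cluster label would then be split into document IDs, which is the kind of result locateSimilarDocuments returns.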
Field Summary | |
---|---|
protected java.lang.String | CLUSTER_OBJECT_CATEGORY |
protected IClusterer | Clusterer |
protected SimilarityComparatorListener | Comparator |
protected UniqueVertexGraph | Hierarchy |
IFileLoader<java.lang.String> | Loader: An IFileLoader variable that can load documents given an identifier. |
protected java.util.Set<DocumentNGramGraph> | NamedObjects |
NotificationListener | Notifier: A notifier for the progress of various tasks. |
protected INSECTDB<DocumentNGramGraph> | Storage: An INSECTDB storage, used to hold the representations of the documents. |
Constructor Summary | |
---|---|
SimilarityBasedIndex(java.util.Set<NamedDocumentNGramGraph> sNamedObjects, SimilarityComparatorListener sclComparator, INSECTDB<DocumentNGramGraph> dbStorage) | Creates a new instance of SimilarityBasedIndex, given a set of named graphs, a similarity comparator and a storage for the document representations. |
Method Summary | |
---|---|
void | createIndex(): Creates the index by creating the clusters and the corresponding representative graph for each cluster. |
protected java.util.Set<java.lang.String> | getDocumentIDsFromCluster(java.lang.String sClusterLabel): Splits a cluster name into the corresponding graph names, using a simple split with a comma as the delimiter. |
protected DocumentNGramGraph | getRepresentationFromCluster(java.lang.String sClusterLabel): Calculates the representative object of a cluster. |
protected void | initComparator(): Initializes the comparator object to a default comparator, if it is null. |
java.util.Set<java.lang.String> | locateSimilarDocuments(DocumentNGramGraph dngCur): Returns the set of documents of the cluster that best matches a given document graph. |
static void | main(java.lang.String[] args): Tests the functionality of the class. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected SimilarityComparatorListener Comparator
protected java.util.Set<DocumentNGramGraph> NamedObjects
protected IClusterer Clusterer
protected UniqueVertexGraph Hierarchy
protected INSECTDB<DocumentNGramGraph> Storage
An INSECTDB storage, used to hold the representations of the documents.
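The constructor documents that a null dbStorage means representations are kept in memory. A minimal sketch of that fallback, assuming a hypothetical `RepresentationStore` interface as a stand-in for INSECTDB (whose real methods are not listed on this page); `InMemoryStore`, `StorageSketch` and `orInMemory` are likewise hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an INSECTDB-like store of representations, keyed by name.
interface RepresentationStore<T> {
    void save(String name, T value);
    T load(String name);
}

// In-memory fallback, used when no external storage is provided.
class InMemoryStore<T> implements RepresentationStore<T> {
    private final Map<String, T> items = new HashMap<>();

    @Override
    public void save(String name, T value) {
        items.put(name, value);
    }

    @Override
    public T load(String name) {
        return items.get(name);
    }
}

class StorageSketch {
    // Returns the given store unchanged, or a fresh in-memory store if it is null.
    static <T> RepresentationStore<T> orInMemory(RepresentationStore<T> given) {
        return (given != null) ? given : new InMemoryStore<T>();
    }
}
```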
protected final java.lang.String CLUSTER_OBJECT_CATEGORY
public NotificationListener Notifier
public IFileLoader<java.lang.String> Loader
An IFileLoader variable that can load documents given an identifier. If null, the document ID is used as a file name and the index attempts to load the corresponding file.
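The Loader fallback can be sketched as below. This is a minimal sketch, assuming a hypothetical `DocumentLoader` interface in place of IFileLoader (whose actual method names are not shown here); reading the file as UTF-8 text is also an assumption. `LoaderFallbackSketch.loadDocument` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical loader abstraction; the real IFileLoader API may differ.
interface DocumentLoader {
    String loadFile(String id) throws IOException;
}

class LoaderFallbackSketch {
    // Loads a document by ID. If no loader is set, the ID itself is treated
    // as a file name and that file is read as UTF-8 text.
    static String loadDocument(DocumentLoader loader, String id) throws IOException {
        if (loader != null) {
            return loader.loadFile(id);
        }
        return new String(Files.readAllBytes(Paths.get(id)), StandardCharsets.UTF_8);
    }
}
```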
Constructor Detail |
---|
public SimilarityBasedIndex(java.util.Set<NamedDocumentNGramGraph> sNamedObjects, SimilarityComparatorListener sclComparator, INSECTDB<DocumentNGramGraph> dbStorage)
Parameters:
sNamedObjects - The set of NamedDocumentNGramGraphs to use as training data for the index. Each element is expected to contain the name of the graph and the graph itself.
sclComparator - If null, a default similarity comparator for graphs is used. Otherwise, the given SimilarityComparatorListener is used to compare graphs.
dbStorage - If null, representations are stored in memory. Otherwise, the provided INSECTDB storage is used to store document representations.
Method Detail |
---|
protected void initComparator()
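The null-check described for initComparator can be sketched as follows. This is a hypothetical illustration: `GraphSimilarity` stands in for SimilarityComparatorListener, graphs are reduced to sets of edge labels, and the toy default (Jaccard overlap of edge sets) is an assumption, since the class's actual default comparator is not documented here.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical comparator interface; stands in for SimilarityComparatorListener.
interface GraphSimilarity {
    double similarity(Set<String> edgesA, Set<String> edgesB);
}

class ComparatorHolder {
    GraphSimilarity comparator;

    ComparatorHolder(GraphSimilarity comparator) {
        this.comparator = comparator;
        initComparator();
    }

    // Installs a default comparator only when none was supplied.
    void initComparator() {
        if (comparator == null) {
            comparator = ComparatorHolder::jaccard;
        }
    }

    // Toy default: size of the edge intersection divided by size of the edge union.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) inter.size() / union.size();
    }
}
```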
public void createIndex()
Specified by: createIndex in interface IIndex<DocumentNGramGraph>
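A minimal sketch of the index-building step, under stated assumptions: the clustering itself (done by an IClusterer in the real class) is taken as a precomputed list of document-ID lists, and the representative is produced by a caller-supplied function, since its exact computation is not given here. Labels are formed by joining IDs with commas, the inverse of the comma split described for getDocumentIDsFromCluster. `CreateIndexSketch` and `representativeOf` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the index-building step described for createIndex().
class CreateIndexSketch {

    // Builds a map from cluster label to cluster representative.
    // clusters: precomputed clustering (stand-in for IClusterer output),
    //           each inner list holding the document IDs of one cluster.
    // representativeOf: hypothetical function computing a cluster's representative.
    static <R> Map<String, R> createIndex(List<List<String>> clusters,
                                          Function<List<String>, R> representativeOf) {
        Map<String, R> index = new LinkedHashMap<>();
        for (List<String> members : clusters) {
            // Join IDs with commas so that a plain comma split recovers them later.
            String label = String.join(",", members);
            index.put(label, representativeOf.apply(members));
        }
        return index;
    }
}
```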
protected java.util.Set<java.lang.String> getDocumentIDsFromCluster(java.lang.String sClusterLabel)
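The comma split described for this method can be sketched in a few lines. `ClusterLabelSketch` is a hypothetical holder class; collecting into an insertion-ordered set is an assumption about the return type's contents.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class ClusterLabelSketch {
    // Splits a cluster label such as "doc1,doc2" into its document IDs,
    // using a plain comma split with no trimming.
    static Set<String> getDocumentIDsFromCluster(String clusterLabel) {
        return new LinkedHashSet<>(Arrays.asList(clusterLabel.split(",")));
    }
}
```

Because the split is plain, a document ID that itself contains a comma would be broken apart, and any whitespace around commas stays part of the IDs.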
protected DocumentNGramGraph getRepresentationFromCluster(java.lang.String sClusterLabel)
Parameters:
sClusterLabel - The label of the cluster to look up.
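This page does not say how the representative object is calculated. One plausible choice, shown here purely as an assumption, is the mean weight of each edge across the member graphs, with graphs modelled as hypothetical maps from edge labels to weights. `RepresentativeSketch.representative` is a hypothetical name, and the real class may instead merge DocumentNGramGraph objects differently.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepresentativeSketch {
    // Toy representative: the mean weight of every edge across the member graphs.
    // An edge missing from a member counts as weight 0 for that member.
    static Map<String, Double> representative(List<Map<String, Double>> members) {
        Map<String, Double> sum = new HashMap<>();
        for (Map<String, Double> graph : members) {
            for (Map.Entry<String, Double> e : graph.entrySet()) {
                sum.merge(e.getKey(), e.getValue(), Double::sum);
            }
        }
        Map<String, Double> mean = new HashMap<>();
        int n = members.size();
        for (Map.Entry<String, Double> e : sum.entrySet()) {
            mean.put(e.getKey(), e.getValue() / n);
        }
        return mean;
    }
}
```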
public java.util.Set<java.lang.String> locateSimilarDocuments(DocumentNGramGraph dngCur)
Specified by: locateSimilarDocuments in interface IIndex<DocumentNGramGraph>
Parameters:
dngCur - The graph of the document used as the query.
Returns:
A Set of strings, corresponding to the document IDs in the cluster whose content is most similar to the given document.
public static void main(java.lang.String[] args)