gr.demokritos.iit.conceptualIndex.documentModel
Class DistributionWordDocument
java.lang.Object
gr.demokritos.iit.conceptualIndex.documentModel.DistributionDocument
gr.demokritos.iit.conceptualIndex.documentModel.DistributionWordDocument
- All Implemented Interfaces:
- java.io.Serializable
public class DistributionWordDocument
- extends DistributionDocument
- See Also:
- Serialized Form
Constructor Summary
DistributionWordDocument(int iNeighbourhoodWindow)
Creates a new instance of DistributionWordDocument.
DistributionWordDocument(int iNeighbourhoodWindow, int iSourceNGramSize)
Creates a new instance of DistributionWordDocument.
Method Summary
static void main(java.lang.String[] sArgs)
double normality(java.lang.String s)
Calculates a degree of normality, indicating whether a given string appears in a form
similar to text in the document.
void setDataString(java.lang.String sDataString, int iNGramSize, boolean clearCurrentData)
Creates and saves the graph representation of a string, using word n-grams of selected size
as source nodes and word n-grams of size 1 (words) as destination nodes.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
DistributionWordDocument
public DistributionWordDocument(int iNeighbourhoodWindow)
- Creates a new instance of DistributionWordDocument. The source n-gram size is set to the default
value of 1.
- Parameters:
iNeighbourhoodWindow
- The size of the window indicative of neighbourhood between a
source n-gram and a given token.
DistributionWordDocument
public DistributionWordDocument(int iNeighbourhoodWindow,
int iSourceNGramSize)
- Creates a new instance of DistributionWordDocument.
- Parameters:
iNeighbourhoodWindow
- The size of the window indicative of neighbourhood between a
source n-gram and a given token.
iSourceNGramSize
- The size of the source n-grams in character length.
setDataString
public void setDataString(java.lang.String sDataString,
int iNGramSize,
boolean clearCurrentData)
- Creates and saves the graph representation of a string, using word n-grams of selected size
as source nodes and word n-grams of size 1 (words) as destination nodes.
- Overrides:
setDataString
in class DistributionDocument
- Parameters:
sDataString
- The data string to analyse and represent as a distribution graph.
iNGramSize
- The size of the n-grams used as source nodes.
clearCurrentData
- Indicates whether the new data replaces existing data. If this parameter
is set to false, the new data is appended to the existing data.
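The behaviour described for setDataString can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class WordDistributionSketch, its count helper, whitespace tokenisation, and the choice to treat only the words that follow a source n-gram as its neighbours are all assumptions made for illustration. The parameter names only mirror the signature documented above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the actual DistributionWordDocument. */
class WordDistributionSketch {
    // source word n-gram -> (neighbouring word -> co-occurrence count)
    private final Map<String, Map<String, Integer>> graph = new HashMap<>();
    private final int neighbourhoodWindow;

    WordDistributionSketch(int neighbourhoodWindow) {
        this.neighbourhoodWindow = neighbourhoodWindow;
    }

    void setDataString(String dataString, int nGramSize, boolean clearCurrentData) {
        if (nGramSize < 1) {
            throw new IllegalArgumentException("nGramSize must be at least 1");
        }
        if (clearCurrentData) {
            graph.clear(); // replace existing data instead of appending to it
        }
        String trimmed = dataString.trim();
        if (trimmed.isEmpty()) {
            return;
        }
        String[] words = trimmed.split("\\s+"); // assumption: whitespace tokenisation
        for (int i = 0; i + nGramSize <= words.length; i++) {
            // Source node: a word n-gram of the selected size.
            String source = String.join(" ", Arrays.copyOfRange(words, i, i + nGramSize));
            // Assumption: neighbours are the words that follow the n-gram within the window.
            int end = Math.min(words.length, i + nGramSize + neighbourhoodWindow);
            for (int j = i + nGramSize; j < end; j++) {
                // Destination node: a single word; the edge carries a count.
                graph.computeIfAbsent(source, k -> new HashMap<>())
                     .merge(words[j], 1, Integer::sum);
            }
        }
    }

    /** Returns how often the word was seen in the neighbourhood of the source n-gram. */
    int count(String source, String word) {
        Map<String, Integer> empty = Collections.emptyMap();
        return graph.getOrDefault(source, empty).getOrDefault(word, 0);
    }

    public static void main(String[] args) {
        WordDistributionSketch doc = new WordDistributionSketch(2);
        doc.setDataString("the cat sat on the mat", 1, true);
        System.out.println(doc.count("the", "cat")); // prints 1
    }
}
```

In this sketch, calling setDataString again with clearCurrentData set to false adds the new counts to the existing edges, while true discards the existing edges first.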
normality
public double normality(java.lang.String s)
- Calculates a degree of normality, indicating whether a given string appears in a form
similar to text in the document. The process compares distributions: those found on the
edges shared by the graph representation of this DistributionDocument object and that of
another DistributionDocument created from the given string.
If the public variable OnCompare has been set, it is used to compare the distributions.
- Overrides:
normality
in class DistributionDocument
- See Also:
Distribution
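The comparison described for normality can be sketched as follows. This is a rough illustration under stated assumptions, not the library's algorithm: the class NormalitySketch, the map-based graph representation, the default measure (one minus the total variation distance between normalised counts), and the averaging over the candidate's source nodes are all hypothetical. The static field onCompare is only a stand-in for the public variable OnCompare, whose actual type is not given here.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical illustration only; not the actual normality computation. */
class NormalitySketch {
    /** Stand-in for OnCompare: when set, it replaces the default measure. */
    static ToDoubleBiFunction<Map<String, Integer>, Map<String, Integer>> onCompare = null;

    /** Assumed default: 1 minus the total variation distance of the normalised counts. */
    static double overlap(Map<String, Integer> p, Map<String, Integer> q) {
        double pSum = 0;
        double qSum = 0;
        for (int c : p.values()) pSum += c;
        for (int c : q.values()) qSum += c;
        if (pSum == 0 || qSum == 0) {
            return 0.0;
        }
        Set<String> keys = new HashSet<>(p.keySet());
        keys.addAll(q.keySet());
        double distance = 0;
        for (String k : keys) {
            distance += Math.abs(p.getOrDefault(k, 0) / pSum - q.getOrDefault(k, 0) / qSum);
        }
        return 1.0 - distance / 2.0;
    }

    /**
     * Compares the distributions on source nodes present in both graphs.
     * Each graph maps a source n-gram to the counts of its neighbouring words.
     * Source nodes of the candidate that the document lacks contribute 0.
     */
    static double normality(Map<String, Map<String, Integer>> document,
                            Map<String, Map<String, Integer>> candidate) {
        if (candidate.isEmpty()) {
            return 0.0;
        }
        double total = 0;
        for (Map.Entry<String, Map<String, Integer>> e : candidate.entrySet()) {
            Map<String, Integer> docDistribution = document.get(e.getKey());
            if (docDistribution == null) {
                continue; // no shared source node, nothing to compare
            }
            total += (onCompare != null)
                    ? onCompare.applyAsDouble(docDistribution, e.getValue())
                    : overlap(docDistribution, e.getValue());
        }
        return total / candidate.size();
    }
}
```

Under these assumptions the result lies between 0 and 1 when the default measure is used: 1 when every distribution of the candidate matches the document's, and 0 when the two graphs share no source nodes.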
main
public static void main(java.lang.String[] sArgs)