java.lang.Object
  extended by gr.demokritos.iit.jinsect.console.grammaticalityEstimator
public class grammaticalityEstimator
The grammaticality estimator uses the probability, estimated from a text corpus, of finding a given token (character) after a given n-gram (string) to determine the normality of other (new) strings.
See Also: DistributionDocument, Serialized Form
Field Summary | |
---|---|
protected java.util.TreeMap<java.lang.Integer,DistributionDocument> | DistroDocs: Map between level and distribution documents. |
protected java.util.TreeMap<java.lang.Integer,DistributionWordDocument> | DistroWordDocs: Map between level and word distribution documents. |
protected java.lang.String | FullTextDataString: The concatenation of all corpus texts. |
protected int | iCharDist: The character n-gram neighbourhood size. |
protected int | iMaxCharNGram: The maximum character n-gram size to take into account. |
protected int | iMaxWordNGram: The maximum word n-gram size to take into account. |
protected int | iMinCharNGram: The minimum character n-gram size to take into account. |
protected int | iMinWordNGram: The minimum word n-gram size to take into account. |
protected int | iWordDist: The word n-gram neighbourhood size. |
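The fields above describe a model with one distribution per level, bounded by minimum and maximum n-gram sizes, plus a neighbourhood window. The following is a minimal sketch of such a structure, not jinsect's implementation. It assumes that a "level" is an n-gram size, that a neighbourhood window of size w counts every character in the w positions following an n-gram, and that a string's score is the mean over levels of the average neighbour probability. The class LevelledNeighbourhoodSketch and its methods are hypothetical; plain maps stand in for DistributionDocument.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: one neighbour distribution per n-gram size (level). */
class LevelledNeighbourhoodSketch {
    private final int window;
    // level (n-gram size) -> n-gram -> neighbouring character -> count
    private final TreeMap<Integer, Map<String, Map<Character, Integer>>> levels = new TreeMap<>();
    // level -> n-gram -> total number of neighbour observations
    private final TreeMap<Integer, Map<String, Integer>> totals = new TreeMap<>();

    LevelledNeighbourhoodSketch(int minN, int maxN, int window) {
        if (minN < 1 || maxN < minN || window < 1) {
            throw new IllegalArgumentException("need 1 <= minN <= maxN and window >= 1");
        }
        this.window = window;
        for (int n = minN; n <= maxN; n++) {
            levels.put(n, new HashMap<>());
            totals.put(n, new HashMap<>());
        }
    }

    /** Counts each character found within `window` positions after each n-gram. */
    void train(List<String> corpus) {
        for (String text : corpus) {
            for (int n : levels.keySet()) {
                Map<String, Map<Character, Integer>> dist = levels.get(n);
                Map<String, Integer> tot = totals.get(n);
                for (int i = 0; i + n < text.length(); i++) {
                    String gram = text.substring(i, i + n);
                    for (int d = 0; d < window && i + n + d < text.length(); d++) {
                        dist.computeIfAbsent(gram, k -> new HashMap<>())
                            .merge(text.charAt(i + n + d), 1, Integer::sum);
                        tot.merge(gram, 1, Integer::sum);
                    }
                }
            }
        }
    }

    /** Mean over levels of the average neighbour probability; levels too long for s are skipped. */
    double normality(String s) {
        double levelSum = 0.0;
        int usedLevels = 0;
        for (int n : levels.keySet()) {
            Map<String, Map<Character, Integer>> dist = levels.get(n);
            Map<String, Integer> tot = totals.get(n);
            double sum = 0.0;
            int observations = 0;
            for (int i = 0; i + n < s.length(); i++) {
                String gram = s.substring(i, i + n);
                Integer total = tot.get(gram);
                for (int d = 0; d < window && i + n + d < s.length(); d++) {
                    if (total != null) { // unseen n-grams contribute probability 0
                        sum += dist.get(gram).getOrDefault(s.charAt(i + n + d), 0) / (double) total;
                    }
                    observations++;
                }
            }
            if (observations > 0) {
                levelSum += sum / observations;
                usedLevels++;
            }
        }
        return usedLevels == 0 ? 0.0 : levelSum / usedLevels;
    }
}
```

Keying the outer map by level in a TreeMap mirrors the declared type of DistroDocs and keeps levels in ascending n-gram order.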
Constructor Summary | |
---|---|
grammaticalityEstimator(java.util.Set FileNames, int iMinChar, int iMaxChar, int iMinWord, int iMaxWord, int iNeighbourhoodWindow) | Creates a new instance of grammaticalityEstimator, using a given set of documents for training. |
grammaticalityEstimator(java.util.Set FileNames, int iMinChar, int iMaxChar, int iCharWindow, int iMinWord, int iMaxWord, int iWordWindow) | Creates a new instance of grammaticalityEstimator, using a given set of documents for training, with separate character and word neighbourhood windows. |
grammaticalityEstimator(java.lang.String sCorpusDir, int iMinChar, int iMaxChar, int iMinWord, int iMaxWord, int iNeighbourhoodWindow, boolean bFlatDir) | Creates a new instance of grammaticalityEstimator, using the texts in a given corpus directory for training. |
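The directory-based constructor takes sCorpusDir and bFlatDir, while the other two take a java.util.Set of filenames. The helper below is a hypothetical sketch of turning a corpus directory into such a set with java.nio.file. It assumes that a flat corpus means texts stored directly in the directory and a non-flat one means texts spread over subdirectories; the bFlatDir description is truncated, so this is not confirmed. CorpusFiles.collect is an illustrative name, not part of jinsect.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: gathers corpus file names for a training set. */
class CorpusFiles {
    /**
     * flat == true: only regular files directly inside corpusDir.
     * flat == false: regular files in corpusDir and all of its subdirectories.
     * Returns a sorted set of path strings.
     */
    static Set<String> collect(Path corpusDir, boolean flat) throws IOException {
        // Files.list and Files.walk hold open directory handles, so close the stream
        try (Stream<Path> paths = flat ? Files.list(corpusDir) : Files.walk(corpusDir)) {
            return paths.filter(Files::isRegularFile)
                        .map(Path::toString)
                        .collect(Collectors.toCollection(TreeSet::new));
        }
    }
}
```

Whether the set-based constructors expect full paths or bare file names is not stated above, so check that before passing such a set as FileNames.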
Method Summary | |
---|---|
double | getCharNormality(java.lang.String sStr): Calculates a character-level degree of normality, indicating whether a given string appears in a form similar to text in the training corpus. |
java.util.TreeMap<java.lang.Integer,DistributionDocument> | getDistroDocs() |
double | getNormality(java.lang.String sStr): Calculates a degree of normality, indicating whether a given string appears in a form similar to text in the training corpus. |
double | getWordNormality(java.lang.String sStr): Calculates a word-level degree of normality, indicating whether a given string appears in a form similar to text in the training corpus. |
static grammaticalityEstimator | loadFromStream(java.io.InputStream is) |
static void | main(java.lang.String[] args): A utility main method that performs grammaticality estimation, given a corpus, a peer document set and a model document set. |
static void | printSyntax(): Provides command-line syntax information for running the class's main method. |
boolean | saveToStream(java.io.OutputStream os) |
void | train(): Performs the training of the distribution model. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
protected java.util.TreeMap<java.lang.Integer,DistributionDocument> DistroDocs
protected java.util.TreeMap<java.lang.Integer,DistributionWordDocument> DistroWordDocs
protected int iMinCharNGram
protected int iMaxCharNGram
protected int iMinWordNGram
protected int iMaxWordNGram
protected int iWordDist
protected int iCharDist
protected java.lang.String FullTextDataString
Constructor Detail |
---|
public grammaticalityEstimator(java.util.Set FileNames, int iMinChar, int iMaxChar, int iCharWindow, int iMinWord, int iMaxWord, int iWordWindow)
Parameters:
FileNames - A set of filenames to be used as the input training set.
iMinChar - The minimum character n-gram size to take into account.
iMaxChar - The maximum character n-gram size to take into account.
iCharWindow - The neighbourhood window to use when calculating the n-gram-to-token neighbourhood of characters.
iMinWord - The minimum word n-gram size to take into account.
iMaxWord - The maximum word n-gram size to take into account.
iWordWindow - The neighbourhood window to use when calculating the n-gram-to-token neighbourhood of words.
public grammaticalityEstimator(java.util.Set FileNames, int iMinChar, int iMaxChar, int iMinWord, int iMaxWord, int iNeighbourhoodWindow)
Parameters:
FileNames - A set of filenames to be used as the input training set.
iMinChar - The minimum character n-gram size to take into account.
iMaxChar - The maximum character n-gram size to take into account.
iMinWord - The minimum word n-gram size to take into account.
iMaxWord - The maximum word n-gram size to take into account.
iNeighbourhoodWindow - The neighbourhood window to use when calculating the n-gram-to-token neighbourhood.
public grammaticalityEstimator(java.lang.String sCorpusDir, int iMinChar, int iMaxChar, int iMinWord, int iMaxWord, int iNeighbourhoodWindow, boolean bFlatDir)
Parameters:
sCorpusDir - The path to the directory containing the training corpus.
iMinChar - The minimum character n-gram size to take into account.
iMaxChar - The maximum character n-gram size to take into account.
iMinWord - The minimum word n-gram size to take into account.
iMaxWord - The maximum word n-gram size to take into account.
iNeighbourhoodWindow - The neighbourhood window to use when calculating the n-gram-to-token neighbourhood.
bFlatDir - If true, then the corpus is supposed to be a set of texts in
Method Detail |
---|
public void train()
public double getNormality(java.lang.String sStr)
Parameters:
sStr - The string to test.
See Also: DistributionDocument
public double getCharNormality(java.lang.String sStr)
Parameters:
sStr - The string to test.
See Also: DistributionDocument
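The core idea in the class description, the probability of a character following an n-gram as estimated from a corpus, can be illustrated with a single n-gram size. This is a minimal sketch under stated assumptions, not the behaviour of getCharNormality: it uses raw relative frequencies with no smoothing and scores a string as the mean conditional probability over its positions. CharNgramSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: P(next character | preceding n-gram), unsmoothed. */
class CharNgramSketch {
    private final int n;
    // n-gram -> next character -> count
    private final Map<String, Map<Character, Integer>> counts = new HashMap<>();
    // n-gram -> total number of observed next characters
    private final Map<String, Integer> totals = new HashMap<>();

    CharNgramSketch(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    /** Counts every (n-gram, following character) pair in the given texts. */
    void train(List<String> corpus) {
        for (String text : corpus) {
            for (int i = 0; i + n < text.length(); i++) {
                String gram = text.substring(i, i + n);
                counts.computeIfAbsent(gram, k -> new HashMap<>())
                      .merge(text.charAt(i + n), 1, Integer::sum);
                totals.merge(gram, 1, Integer::sum);
            }
        }
    }

    /** Relative frequency of next after gram; 0 for unseen n-grams. */
    double probability(String gram, char next) {
        Integer total = totals.get(gram);
        if (total == null) return 0.0;
        return counts.get(gram).getOrDefault(next, 0) / (double) total;
    }

    /** Mean conditional probability over all positions of s; 0 if s is too short. */
    double normality(String s) {
        int positions = 0;
        double sum = 0.0;
        for (int i = 0; i + n < s.length(); i++) {
            sum += probability(s.substring(i, i + n), s.charAt(i + n));
            positions++;
        }
        return positions == 0 ? 0.0 : sum / positions;
    }
}
```

With n = 1 and the corpus "abab", the string "abab" scores 1.0 and "aaaa" scores 0.0, because in training 'a' was only ever followed by 'b'.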
public double getWordNormality(java.lang.String sStr)
Parameters:
sStr - The string to test.
See Also: DistributionDocument
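A word-level analogue of the same idea treats whitespace-separated words as tokens. This is again a hypothetical sketch, assuming whitespace tokenization and unsmoothed relative frequencies; jinsect's own word handling (DistributionWordDocument) may differ. WordNgramSketch is an illustrative name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: P(next word | preceding word n-gram), unsmoothed. */
class WordNgramSketch {
    private final int n;
    // word n-gram (as an immutable list) -> next word -> count
    private final Map<List<String>, Map<String, Integer>> counts = new HashMap<>();
    // word n-gram -> total number of observed next words
    private final Map<List<String>, Integer> totals = new HashMap<>();

    WordNgramSketch(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        this.n = n;
    }

    /** Splits on runs of whitespace and drops empty tokens. */
    static List<String> tokenize(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) words.add(w);
        }
        return words;
    }

    void train(List<String> corpus) {
        for (String text : corpus) {
            List<String> words = tokenize(text);
            for (int i = 0; i + n < words.size(); i++) {
                List<String> gram = List.copyOf(words.subList(i, i + n));
                counts.computeIfAbsent(gram, k -> new HashMap<>())
                      .merge(words.get(i + n), 1, Integer::sum);
                totals.merge(gram, 1, Integer::sum);
            }
        }
    }

    /** Mean conditional probability of each word given the n words before it. */
    double normality(String s) {
        List<String> words = tokenize(s);
        int positions = 0;
        double sum = 0.0;
        for (int i = 0; i + n < words.size(); i++) {
            // List equality is element-wise, so a subList view finds the stored copy
            List<String> gram = words.subList(i, i + n);
            Integer total = totals.get(gram);
            if (total != null) {
                sum += counts.get(gram).getOrDefault(words.get(i + n), 0) / (double) total;
            }
            positions++;
        }
        return positions == 0 ? 0.0 : sum / positions;
    }
}
```

Trained with n = 1 on "the cat sat" and "the cat ran", the sentence "the cat sat" scores 0.75: "cat" always follows "the" (probability 1), and "sat" follows "cat" half of the time (0.5).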
public boolean saveToStream(java.io.OutputStream os)
public static grammaticalityEstimator loadFromStream(java.io.InputStream is)
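The Serialized Form link indicates that the class is serializable, so saveToStream and loadFromStream may rely on Java object serialization, although this page does not say so. The sketch below shows one way to build such a save/load pair with ObjectOutputStream and ObjectInputStream, mirroring the boolean return of saveToStream. ModelStreams and its methods are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

/** Hypothetical save/load helpers based on standard Java serialization. */
class ModelStreams {
    /** Writes obj to os; returns false instead of throwing on I/O failure. */
    static boolean save(Serializable obj, OutputStream os) {
        try {
            ObjectOutputStream oos = new ObjectOutputStream(os);
            oos.writeObject(obj);
            oos.flush(); // flush but do not close: the caller owns os
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Reads one object of the given type from is; returns null on failure. */
    static <T> T load(InputStream is, Class<T> type) {
        try {
            ObjectInputStream ois = new ObjectInputStream(is);
            return type.cast(ois.readObject());
        } catch (IOException | ClassNotFoundException | ClassCastException e) {
            return null;
        }
    }
}
```

Returning a flag or null keeps the caller free of checked exceptions, at the cost of hiding the cause of a failure.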
public static void printSyntax()
public static void main(java.lang.String[] args)
public java.util.TreeMap<java.lang.Integer,DistributionDocument> getDistroDocs()