java.lang.Object
  gr.demokritos.iit.jinsect.console.grammaticalityEstimator
public class grammaticalityEstimator
The grammaticality estimator uses the probability of finding a given token (character) after a given n-gram (string), extracted from a text corpus, in order to determine normality of other (new) strings.
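The idea described above can be illustrated with a minimal, self-contained sketch. It is not the JInsect implementation: the class `CharSuccessorModel`, its method names, and the choice of a plain mean over successor probabilities as the normality score are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not part of JInsect. */
final class CharSuccessorModel {
    // n-gram -> (character that followed it -> number of times seen)
    private final Map<String, Map<Character, Integer>> counts = new HashMap<>();
    private final int n;

    CharSuccessorModel(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** Records, for every n-gram in the text, which character follows it. */
    void train(String text) {
        for (int i = 0; i + n < text.length(); i++) {
            String gram = text.substring(i, i + n);
            char next = text.charAt(i + n);
            counts.computeIfAbsent(gram, k -> new HashMap<>()).merge(next, 1, Integer::sum);
        }
    }

    /** Estimated probability of next following gram; 0.0 for an unseen n-gram. */
    double probability(String gram, char next) {
        Map<Character, Integer> successors = counts.get(gram);
        if (successors == null) {
            return 0.0;
        }
        int total = 0;
        for (int c : successors.values()) {
            total += c;
        }
        return successors.getOrDefault(next, 0) / (double) total;
    }

    /** Mean successor probability over the string (assumed scoring rule). */
    double normality(String s) {
        int positions = s.length() - n;
        if (positions <= 0) {
            return 0.0; // string too short to contain an n-gram plus a successor
        }
        double sum = 0.0;
        for (int i = 0; i < positions; i++) {
            sum += probability(s.substring(i, i + n), s.charAt(i + n));
        }
        return sum / positions;
    }

    public static void main(String[] args) {
        CharSuccessorModel model = new CharSuccessorModel(2);
        model.train("the cat sat on the mat");
        System.out.println(model.normality("the cat"));
        System.out.println(model.normality("xqzvw"));
    }
}
```

A string built from n-grams and successors that occur in the training text scores close to 1, while a string whose n-grams were never seen scores 0.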
See Also: DistributionDocument, Serialized Form

| Field Summary | |
|---|---|
protected java.util.TreeMap<java.lang.Integer,DistributionDocument> |
DistroDocs
Map between level and distribution documents. |
protected java.util.TreeMap<java.lang.Integer,DistributionWordDocument> |
DistroWordDocs
Map between level and word distribution documents. |
protected java.lang.String |
FullTextDataString
The concatenation of all corpus texts. |
protected int |
iCharDist
The word and character n-gram neighbourhood sizes. |
protected int |
iMaxCharNGram
The minimum and maximum n-gram sizes to take into account. |
protected int |
iMaxWordNGram
The minimum and maximum n-gram sizes to take into account. |
protected int |
iMinCharNGram
The minimum and maximum n-gram sizes to take into account. |
protected int |
iMinWordNGram
The minimum and maximum n-gram sizes to take into account. |
protected int |
iWordDist
The word and character n-gram neighbourhood sizes. |
| Constructor Summary | |
|---|---|
grammaticalityEstimator(java.util.Set FileNames,
int iMinChar,
int iMaxChar,
int iMinWord,
int iMaxWord,
int iNeighbourhoodWindow)
Creates a new instance of grammaticalityEstimator, using a given set of documents for training. |
|
grammaticalityEstimator(java.util.Set FileNames,
int iMinChar,
int iMaxChar,
int iCharWindow,
int iMinWord,
int iMaxWord,
int iWordWindow)
Creates a new instance of grammaticalityEstimator, using a given set of documents for training. |
|
grammaticalityEstimator(java.lang.String sCorpusDir,
int iMinChar,
int iMaxChar,
int iMinWord,
int iMaxWord,
int iNeighbourhoodWindow,
boolean bFlatDir)
Creates a new instance of grammaticalityEstimator. |
|
| Method Summary | |
|---|---|
double |
getCharNormality(java.lang.String sStr)
Calculates a degree of normality, indicating whether a given string appears in a form similar to text in the training corpus. |
java.util.TreeMap<java.lang.Integer,DistributionDocument> |
getDistroDocs()
|
double |
getNormality(java.lang.String sStr)
Calculates a degree of normality, indicating whether a given string appears in a form similar to text in the training corpus. |
double |
getWordNormality(java.lang.String sStr)
Calculates a degree of normality, indicating whether a given string appears in a form similar to text in the training corpus. |
static grammaticalityEstimator |
loadFromStream(java.io.InputStream is)
|
static void |
main(java.lang.String[] args)
A utility main method that performs grammaticality estimation, given a corpus, a peer document set and a model document set. |
static void |
printSyntax()
Provides command-line syntax information for the execution of the class's main function. |
boolean |
saveToStream(java.io.OutputStream os)
|
void |
train()
Performs the training of the distribution model. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
protected java.util.TreeMap<java.lang.Integer,DistributionDocument> DistroDocs
protected java.util.TreeMap<java.lang.Integer,DistributionWordDocument> DistroWordDocs
protected int iMinCharNGram
protected int iMaxCharNGram
protected int iMinWordNGram
protected int iMaxWordNGram
protected int iWordDist
protected int iCharDist
protected java.lang.String FullTextDataString
| Constructor Detail |
|---|
public grammaticalityEstimator(java.util.Set FileNames,
int iMinChar,
int iMaxChar,
int iCharWindow,
int iMinWord,
int iMaxWord,
int iWordWindow)
FileNames - A set of filenames to be used as input training set.
iMinChar - The minimum character n-gram size to take into account.
iMaxChar - The maximum character n-gram size to take into account.
iCharWindow - The neighbourhood window to use for the calculation of n-gram - token neighbourhood of characters.
iMinWord - The minimum word n-gram size to take into account.
iMaxWord - The maximum word n-gram size to take into account.
iWordWindow - The neighbourhood window to use for the calculation of n-gram - token neighbourhood of words.
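The minimum and maximum sizes define a range of n-gram levels, and the `DistroDocs` field is documented as a map between level and distribution document. The sketch below shows one plausible reading, in which each level is an n-gram size holding its own counts. The class `NGramLevels` and the method `extract` are hypothetical, and a plain count map stands in for `DistributionDocument`, whose internals are not described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not part of JInsect. */
final class NGramLevels {
    /**
     * Builds a map from level (n-gram size) to the character n-grams of that
     * size and their occurrence counts, for every size from min to max.
     */
    static TreeMap<Integer, Map<String, Integer>> extract(String text, int min, int max) {
        if (min < 1 || max < min) {
            throw new IllegalArgumentException("require 1 <= min <= max");
        }
        TreeMap<Integer, Map<String, Integer>> levels = new TreeMap<>();
        for (int n = min; n <= max; n++) {
            Map<String, Integer> grams = new HashMap<>();
            for (int i = 0; i + n <= text.length(); i++) {
                grams.merge(text.substring(i, i + n), 1, Integer::sum);
            }
            levels.put(n, grams); // one entry per n-gram size
        }
        return levels;
    }

    public static void main(String[] args) {
        // Levels 1 and 2 for a short sample string.
        System.out.println(extract("abab", 1, 2));
    }
}
```

A `TreeMap` keeps the levels ordered by size, which matches the declared type of `DistroDocs` and makes it simple to iterate from the smallest n-gram size to the largest.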
public grammaticalityEstimator(java.util.Set FileNames,
int iMinChar,
int iMaxChar,
int iMinWord,
int iMaxWord,
int iNeighbourhoodWindow)
FileNames - A set of filenames to be used as input training set.
iMinChar - The minimum character n-gram size to take into account.
iMaxChar - The maximum character n-gram size to take into account.
iMinWord - The minimum word n-gram size to take into account.
iMaxWord - The maximum word n-gram size to take into account.
iNeighbourhoodWindow - The neighbourhood window to use for the calculation of n-gram - token neighbourhood.
public grammaticalityEstimator(java.lang.String sCorpusDir,
int iMinChar,
int iMaxChar,
int iMinWord,
int iMaxWord,
int iNeighbourhoodWindow,
boolean bFlatDir)
sCorpusDir - The path to the directory containing the training corpus.
iMinChar - The minimum character n-gram size to take into account.
iMaxChar - The maximum character n-gram size to take into account.
iMinWord - The minimum word n-gram size to take into account.
iMaxWord - The maximum word n-gram size to take into account.
iNeighbourhoodWindow - The neighbourhood window to use for the calculation of n-gram - token neighbourhood.
bFlatDir - If true, then the corpus is supposed to be a set of texts in

| Method Detail |
|---|
public void train()
public double getNormality(java.lang.String sStr)
sStr - The string to test.
See Also: DistributionDocument
public double getCharNormality(java.lang.String sStr)
sStr - The string to test.
See Also: DistributionDocument
public double getWordNormality(java.lang.String sStr)
sStr - The string to test.
See Also: DistributionDocument
public boolean saveToStream(java.io.OutputStream os)
public static grammaticalityEstimator loadFromStream(java.io.InputStream is)
public static void printSyntax()
public static void main(java.lang.String[] args)
public java.util.TreeMap<java.lang.Integer,DistributionDocument> getDistroDocs()
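The `saveToStream` and `loadFromStream` signatures, together with the Serialized Form link, suggest that a trained model is persisted through standard Java object serialization. The following self-contained sketch shows that pattern on a stand-in class. `ModelSketch` is hypothetical, and the behaviour on failure (returning `false` or `null`) is an assumption, since this page does not document it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.TreeMap;

/** Hypothetical stand-in for a trained model; not part of JInsect. */
final class ModelSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Placeholder state: level -> description, standing in for real model data.
    final TreeMap<Integer, String> levels = new TreeMap<>();

    /** Writes this object to the stream; returns false on I/O failure (assumed). */
    boolean saveToStream(OutputStream os) {
        try {
            ObjectOutputStream out = new ObjectOutputStream(os);
            out.writeObject(this);
            out.flush();
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Reads a model back from the stream; returns null on failure (assumed). */
    static ModelSketch loadFromStream(InputStream is) {
        try {
            ObjectInputStream in = new ObjectInputStream(is);
            return (ModelSketch) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ModelSketch model = new ModelSketch();
        model.levels.put(3, "trigram data");

        // Round trip through an in-memory buffer instead of a file.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        boolean saved = model.saveToStream(buffer);
        ModelSketch copy = loadFromStream(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(saved + " " + (copy != null ? copy.levels : null));
    }
}
```

Saving a trained model this way avoids re-reading and re-processing the whole corpus on every run.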