gr.demokritos.iit.jinsect.console
Class summaryFuzzyEvaluator
java.lang.Object
  
gr.demokritos.iit.jinsect.console.summaryEvaluator
      
gr.demokritos.iit.jinsect.console.summaryFuzzyEvaluator
- All Implemented Interfaces: 
 - java.lang.Runnable
 
public class summaryFuzzyEvaluator
- extends summaryEvaluator
 
Performs summary evaluation like the summaryEvaluator superclass, but
 with fuzzy string matching between the n-grams of different texts. Uses the SpectralSpell
 application for fuzzy matching of words.
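
To make the idea of fuzzy n-gram matching concrete, the following is a minimal, self-contained sketch. It does not use JInsect or SpectralSpell: the class FuzzyNGramSketch, its methods, and the use of Levenshtein edit distance as the fuzziness criterion are all assumptions made for illustration, and the actual matching performed by SpectralSpell may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of JInsect or SpectralSpell.
final class FuzzyNGramSketch {

    // Returns all character n-grams of the given size, in order of appearance.
    static List<String> charNGrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Levenshtein edit distance, used here as an assumed stand-in for SpectralSpell.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = cur;
            cur = tmp;
        }
        return prev[b.length()];
    }

    // Two n-grams match fuzzily when their edit distance is at most maxEdits.
    static boolean fuzzyEquals(String a, String b, int maxEdits) {
        return editDistance(a, b) <= maxEdits;
    }

    // Fraction of the candidate's n-grams that fuzzily match some n-gram of the model.
    static double fuzzyOverlap(String candidate, String model, int n, int maxEdits) {
        List<String> cand = charNGrams(candidate, n);
        List<String> mod = charNGrams(model, n);
        if (cand.isEmpty()) {
            return 0.0;
        }
        int hits = 0;
        for (String g : cand) {
            for (String m : mod) {
                if (fuzzyEquals(g, m, maxEdits)) {
                    hits++;
                    break;
                }
            }
        }
        return (double) hits / cand.size();
    }

    public static void main(String[] args) {
        // Exact matching (maxEdits = 0) versus fuzzy matching (maxEdits = 1).
        System.out.println(fuzzyOverlap("summary", "sumary", 3, 0));
        System.out.println(fuzzyOverlap("summary", "sumary", 3, 1));
    }
}
```

With maxEdits set to 0 the sketch reduces to exact n-gram matching, which roughly corresponds to the non-fuzzy comparison of the superclass. Raising it lets misspelled or slightly varied n-grams count as matches.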
 
| Fields inherited from class gr.demokritos.iit.jinsect.console.summaryEvaluator | 
CharDist, CharMax, CharMin, Do, DO_ALL, DO_CHARS, DO_WORDS, hModelCache, hNModelCache, ModelDir, OutFile, OutputSemaphore, SummaryDir, Threads, USE_DISTRO_AVERAGE_AS_WEIGHT, USE_OCCURENCES_AS_WEIGHT, WeightMethod, WordDist, WordMax, WordMin | 
 
| 
Constructor Summary | 
summaryFuzzyEvaluator(java.util.concurrent.Semaphore sOutputSemaphore,
                      java.lang.String sDo,
                      int iWordMin,
                      int iWordMax,
                      int iWordDist,
                      int iCharMin,
                      int iCharMax,
                      int iCharDist,
                      int iThreads,
                      java.lang.String sOutFile,
                      java.lang.String sSummaryDir,
                      java.lang.String sModelDir,
                      boolean bSilent,
                      int iWeightMethod,
                      boolean bProgress,
                      java.lang.String sSspellParams)
 
          Creates a summaryFuzzyEvaluator object. | 
summaryFuzzyEvaluator(java.lang.String[] args)
 
            | 
 
| 
Method Summary | 
protected  SimilarityArray | 
calcSimilarityMeasures(CategorizedFileEntry cfeCur,
                       java.util.List dsModelSet,
                       boolean bOutput,
                       java.io.PrintStream pOut,
                       java.util.concurrent.Semaphore sSem,
                       int WordNGramSize_Min,
                       int WordNGramSize_Max,
                       int Word_Dmax,
                       int CharacterNGramSize_Min,
                       int CharacterNGramSize_Max,
                       int Character_Dmax,
                       boolean bDoCharNGrams,
                       boolean bDoWordNGrams,
                       boolean bSilent)
 
          Performs similarity measurement of a CategorizedFileEntry, given a model set. | 
static void | 
main(java.lang.String[] args)
 
            | 
 
 
| Methods inherited from class java.lang.Object | 
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait | 
 
sspellParams
protected java.lang.String sspellParams
summaryFuzzyEvaluator
public summaryFuzzyEvaluator(java.util.concurrent.Semaphore sOutputSemaphore,
                             java.lang.String sDo,
                             int iWordMin,
                             int iWordMax,
                             int iWordDist,
                             int iCharMin,
                             int iCharMax,
                             int iCharDist,
                             int iThreads,
                             java.lang.String sOutFile,
                             java.lang.String sSummaryDir,
                             java.lang.String sModelDir,
                             boolean bSilent,
                             int iWeightMethod,
                             boolean bProgress,
                             java.lang.String sSspellParams)
- Creates a summaryFuzzyEvaluator object.
- Parameters:
 - sOutputSemaphore - A semaphore that ensures that the output is provided consistently.
 - sDo - The method of evaluation (see DO_WORDS, DO_CHARS, DO_ALL).
 - iWordMin - The min word n-gram rank to take into account, if applicable to the method.
 - iWordMax - The max word n-gram rank to take into account, if applicable to the method.
 - iWordDist - The word n-gram neighbourhood distance to use, if applicable to the method.
 - iCharMin - The min char n-gram rank to take into account, if applicable to the method.
 - iCharMax - The max char n-gram rank to take into account, if applicable to the method.
 - iCharDist - The char n-gram neighbourhood distance to use, if applicable to the method.
 - iThreads - The number of threads to use for multi-threaded processing.
 - sOutFile - The file to write results to.
 - sSummaryDir - The peer summary base directory.
 - sModelDir - The model summaries base directory.
 - bSilent - If true, no debug messages are output.
 - iWeightMethod - The method to use for weighting edges in the n-gram graph. See USE_DISTRO_AVERAGE_AS_WEIGHT, USE_OCCURENCES_AS_WEIGHT.
 - bProgress - If true, progress indication is output even in silent mode.
 - sSspellParams - Custom parameters to pass to SpectralSpell.
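
The role of an output semaphore combined with several worker threads can be illustrated with the standard java.util.concurrent.Semaphore class. The sketch below is not taken from JInsect: the class SemaphoreOutputSketch and its writeRecords method are hypothetical, and it only assumes that each worker must emit a multi-line record without other workers interleaving their output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

// Hypothetical illustration only; not the JInsect implementation.
final class SemaphoreOutputSketch {

    // Several threads append two-line records; the semaphore keeps each record contiguous.
    static List<String> writeRecords(int threads, int recordsPerThread) {
        Semaphore outputSemaphore = new Semaphore(1); // one permit: exclusive access to the output
        List<String> out = new ArrayList<>();         // only touched while holding the permit
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                for (int r = 0; r < recordsPerThread; r++) {
                    try {
                        outputSemaphore.acquire();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    try {
                        out.add("begin " + id + "/" + r);
                        out.add("end " + id + "/" + r);
                    } finally {
                        outputSemaphore.release();
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each "begin" line is immediately followed by its own "end" line.
        for (String line : writeRecords(3, 2)) {
            System.out.println(line);
        }
    }
}
```

Releasing the permit in a finally block ensures that a failure while writing does not leave the other threads blocked.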
 
summaryFuzzyEvaluator
public summaryFuzzyEvaluator(java.lang.String[] args)
calcSimilarityMeasures
protected SimilarityArray calcSimilarityMeasures(CategorizedFileEntry cfeCur,
                                                 java.util.List dsModelSet,
                                                 boolean bOutput,
                                                 java.io.PrintStream pOut,
                                                 java.util.concurrent.Semaphore sSem,
                                                 int WordNGramSize_Min,
                                                 int WordNGramSize_Max,
                                                 int Word_Dmax,
                                                 int CharacterNGramSize_Min,
                                                 int CharacterNGramSize_Max,
                                                 int Character_Dmax,
                                                 boolean bDoCharNGrams,
                                                 boolean bDoWordNGrams,
                                                 boolean bSilent)
- Description copied from class: summaryEvaluator
- Performs similarity measurement of a CategorizedFileEntry, given a model set.
- Overrides:
 calcSimilarityMeasures in class summaryEvaluator
 
- Parameters:
 - cfeCur - The current file to compare to models.
 - dsModelSet - The input model set.
 - bOutput - If true, output is verbose.
 - pOut - The PrintStream to use for output.
 - sSem - The semaphore used to ensure that output is consistent and thread-safe.
 - WordNGramSize_Min - The min word n-gram rank to use in the representation.
 - WordNGramSize_Max - The max word n-gram rank to use in the representation.
 - Word_Dmax - The max neighbourhood distance to use in the word n-gram graph representation.
 - CharacterNGramSize_Min - The min character n-gram rank to use in the representation.
 - CharacterNGramSize_Max - The max character n-gram rank to use in the representation.
 - Character_Dmax - The max neighbourhood distance to use in the character n-gram graph representation.
 - bDoCharNGrams - If true, performs character n-gram comparison. Can be used together with bDoWordNGrams.
 - bDoWordNGrams - If true, performs word n-gram comparison. Can be used together with bDoCharNGrams.
 - bSilent - If true, no debugging information is displayed.
- Returns:
 - A SimilarityArray containing similarity values for the given file.
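
The n-gram rank and neighbourhood distance parameters can be pictured with a small, self-contained sketch of a character n-gram graph. Everything here is an assumption for illustration: the class NGramGraphSketch, the edge key format, and the simple shared-edge ratio are hypothetical and are not the graph representation or the similarity measures that SimilarityArray holds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not the JInsect graph representation.
final class NGramGraphSketch {

    // Counts edges between character n-grams whose start offsets differ by at most dMax.
    static Map<String, Integer> buildGraph(String text, int n, int dMax) {
        Map<String, Integer> edges = new HashMap<>();
        int count = text.length() - n + 1; // number of n-grams; non-positive if the text is too short
        for (int i = 0; i < count; i++) {
            String cur = text.substring(i, i + n);
            for (int j = Math.max(0, i - dMax); j < i; j++) {
                String prev = text.substring(j, j + n);
                edges.merge(prev + "->" + cur, 1, Integer::sum);
            }
        }
        return edges;
    }

    // Share of the edges of graph a that also occur in graph b (a simple illustrative measure).
    static double sharedEdgeRatio(Map<String, Integer> a, Map<String, Integer> b) {
        if (a.isEmpty()) {
            return 0.0;
        }
        int shared = 0;
        for (String edge : a.keySet()) {
            if (b.containsKey(edge)) {
                shared++;
            }
        }
        return (double) shared / a.size();
    }

    public static void main(String[] args) {
        Map<String, Integer> peer = buildGraph("the summary text", 3, 2);
        Map<String, Integer> model = buildGraph("a summary of text", 3, 2);
        System.out.println(sharedEdgeRatio(peer, model));
    }
}
```

In this sketch a larger dMax links each n-gram to more of its predecessors, so the graph captures wider context at the cost of more edges. Running the construction once per rank between the min and max values would give one graph per n-gram size.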
 
 
main
public static void main(java.lang.String[] args)