gr.demokritos.iit.jinsect.algorithms.estimators
Class DistanceEstimator

java.lang.Object
  extended by gr.demokritos.iit.jinsect.algorithms.estimators.DistanceEstimator

public class DistanceEstimator
extends java.lang.Object


Field Summary
protected  NGramSizeEstimator Estimator
           
protected  int MaxRank
           
protected  int MinRank
           
protected  Distribution NonSymbolsPerRank
           
protected  Distribution SymbolsPerRank
           
 
Constructor Summary
DistanceEstimator(Distribution tmSymbolsPerRank, Distribution tmNonSymbolsPerRank)
          Creates a new instance of DistanceEstimator from two distributions, one of symbols and one of non-symbols. The distributions are copied.
DistanceEstimator(Distribution tmSymbolsPerRank, Distribution tmNonSymbolsPerRank, NGramSizeEstimator nseEstimator)
          Creates a new instance of DistanceEstimator from two distributions, one of symbols and one of non-symbols, and an n-gram size estimator. The distributions are copied.
 
Method Summary
 double getAllSymbolProbability(int iMinRank, int iMaxRank, int iDistance)
          Returns the probability that, for a given distance, all n-grams in it will be symbols, given a rank range.
 double getNonSymbolProbability(int iMinRank, int iMaxRank, int iDistance)
          Returns the probability of occurrence of a non-symbol given a range of n-gram ranks.
 int getOptimalDistance(int iMinDist, int iMaxDist)
          Returns the distance corresponding to the highest signal to noise ratio for a given distance range to examine, with respect to the ranks identified by the rank estimator.
 int getOptimalDistance(int iMinDist, int iMaxDist, int iMinRank, int iMaxRank)
          Returns the distance corresponding to the highest signal to noise ratio for a given n-gram rank range, and a given distance range to examine.
 double getSignalToNoise(int iMinRank, int iMaxRank, int iDistance)
          Returns the signal to noise ratio for a given n-gram rank range and distance.
 double getSignalToNoise(int iMinRank, int iMaxRank, int iDistance, int iCurNGramSize)
          Returns the signal to noise ratio for a given n-gram rank range, distance, and n-gram size.
 double getSymbolToNonSymbolPercentage(int iMinRank, int iMaxRank)
          Returns the symbol to non-symbol percentage given a range of n-gram ranks.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

SymbolsPerRank

protected Distribution SymbolsPerRank

NonSymbolsPerRank

protected Distribution NonSymbolsPerRank

MinRank

protected int MinRank

MaxRank

protected int MaxRank

Estimator

protected NGramSizeEstimator Estimator
Constructor Detail

DistanceEstimator

public DistanceEstimator(Distribution tmSymbolsPerRank,
                         Distribution tmNonSymbolsPerRank)
Creates a new instance of DistanceEstimator from two distributions, one of symbols and one of non-symbols. The distributions are copied.

Parameters:
tmSymbolsPerRank - The distribution of symbols per n-gram rank.
tmNonSymbolsPerRank - The distribution of non-symbols per n-gram rank.

DistanceEstimator

public DistanceEstimator(Distribution tmSymbolsPerRank,
                         Distribution tmNonSymbolsPerRank,
                         NGramSizeEstimator nseEstimator)
Creates a new instance of DistanceEstimator from two distributions, one of symbols and one of non-symbols, and an n-gram size estimator. The distributions are copied.

Parameters:
tmSymbolsPerRank - The distribution of symbols per n-gram rank.
tmNonSymbolsPerRank - The distribution of non-symbols per n-gram rank.
nseEstimator - An estimator for various n-gram rank cardinalities.
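Both constructors are documented as taking a copy of the supplied distributions. The sketch below illustrates that defensive-copy idea in isolation. It is not the JInsect implementation: the class name CopyingEstimatorSketch, its accessor methods, and the use of a plain TreeMap in place of the library's Distribution type are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an estimator that copies its inputs.
// A TreeMap<Integer, Double> (rank -> count) replaces the library's
// Distribution type, which is an assumption of this sketch.
final class CopyingEstimatorSketch {
    private final TreeMap<Integer, Double> symbolsPerRank;
    private final TreeMap<Integer, Double> nonSymbolsPerRank;

    CopyingEstimatorSketch(Map<Integer, Double> symbols,
                           Map<Integer, Double> nonSymbols) {
        // Copy the entries so later changes to the caller's maps
        // do not affect this instance.
        this.symbolsPerRank = new TreeMap<>(symbols);
        this.nonSymbolsPerRank = new TreeMap<>(nonSymbols);
    }

    // Count of symbols recorded for the given rank, or 0 if absent.
    double symbolsAt(int rank) {
        return symbolsPerRank.getOrDefault(rank, 0.0);
    }

    // Count of non-symbols recorded for the given rank, or 0 if absent.
    double nonSymbolsAt(int rank) {
        return nonSymbolsPerRank.getOrDefault(rank, 0.0);
    }
}
```

With this arrangement, a caller can keep updating its own distributions after construction without changing what the estimator sees.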
Method Detail

getSymbolToNonSymbolPercentage

public double getSymbolToNonSymbolPercentage(int iMinRank,
                                             int iMaxRank)
Returns the symbol to non-symbol percentage given a range of n-gram ranks.

Parameters:
iMinRank - The minimum rank to take into account.
iMaxRank - The maximum rank to take into account.
Returns:
The percentage of symbols to non-symbols.
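As a rough illustration of a symbol to non-symbol figure over a rank range, the sketch below sums per-rank counts and divides the two totals. This is only one plausible reading of the description above, not the library's code. The class SymbolRatioSketch, the map-based representation of the distributions, the inclusive rank bounds, and the handling of a zero denominator are all assumptions.

```java
import java.util.Map;

// Hypothetical sketch: ratio of symbols to non-symbols over a rank range.
final class SymbolRatioSketch {

    // Sums the counts stored for ranks minRank..maxRank (inclusive).
    // Ranks with no entry contribute 0.
    static double sumOverRanks(Map<Integer, Double> countsPerRank,
                               int minRank, int maxRank) {
        double sum = 0.0;
        for (int rank = minRank; rank <= maxRank; rank++) {
            sum += countsPerRank.getOrDefault(rank, 0.0);
        }
        return sum;
    }

    // Divides the symbol total by the non-symbol total.
    // Assumption: with no non-symbols, returns 0 if there are no symbols
    // either, and positive infinity otherwise.
    static double symbolToNonSymbolRatio(Map<Integer, Double> symbols,
                                         Map<Integer, Double> nonSymbols,
                                         int minRank, int maxRank) {
        double symbolTotal = sumOverRanks(symbols, minRank, maxRank);
        double nonSymbolTotal = sumOverRanks(nonSymbols, minRank, maxRank);
        if (nonSymbolTotal == 0.0) {
            return symbolTotal == 0.0 ? 0.0 : Double.POSITIVE_INFINITY;
        }
        return symbolTotal / nonSymbolTotal;
    }
}
```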

getNonSymbolProbability

public double getNonSymbolProbability(int iMinRank,
                                      int iMaxRank,
                                      int iDistance)
Returns the probability of occurrence of a non-symbol given a range of n-gram ranks.

Parameters:
iMinRank - The minimum rank to take into account.
iMaxRank - The maximum rank to take into account.
iDistance - The distance (character range) within which we expect n-grams to be found.
Returns:
The probability of occurrence of a non-symbol.

getAllSymbolProbability

public double getAllSymbolProbability(int iMinRank,
                                      int iMaxRank,
                                      int iDistance)
Returns the probability that, for a given distance, all n-grams in it will be symbols, given a rank range.

Parameters:
iMinRank - The minimum rank to take into account.
iMaxRank - The maximum rank to take into account.
iDistance - The distance (character range) within which we expect n-grams to be found.
Returns:
The probability that all n-grams within the given distance are symbols.
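To make the idea of an "all symbols" probability concrete, the sketch below computes it under a simple independence model: if each n-gram is a non-symbol with probability p, and a window of the given distance contains one n-gram per position, then all of them are symbols with probability (1 - p) raised to the distance. Both the independence assumption and the one-n-gram-per-position count are assumptions of this sketch. The documentation does not state how the library computes the value, and AllSymbolProbabilitySketch is a hypothetical name.

```java
// Hypothetical sketch of an all-symbol probability under independence.
final class AllSymbolProbabilitySketch {

    // Probability that every n-gram in a window of `distance` positions
    // is a symbol, assuming each is independently a non-symbol with
    // probability `nonSymbolProbability`.
    static double allSymbolProbability(double nonSymbolProbability, int distance) {
        if (nonSymbolProbability < 0.0 || nonSymbolProbability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        if (distance < 0) {
            throw new IllegalArgumentException("distance must be non-negative");
        }
        return Math.pow(1.0 - nonSymbolProbability, distance);
    }
}
```

Under this model the probability shrinks geometrically as the distance grows, which is one reason a larger window is not automatically better.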

getSignalToNoise

public final double getSignalToNoise(int iMinRank,
                                     int iMaxRank,
                                     int iDistance)
Returns the signal to noise ratio for a given n-gram rank range and distance.

Parameters:
iMinRank - The minimum rank to take into account.
iMaxRank - The maximum rank to take into account.
iDistance - The distance (character range) within which we expect n-grams to be found.
Returns:
The signal to noise ratio.

getSignalToNoise

public final double getSignalToNoise(int iMinRank,
                                     int iMaxRank,
                                     int iDistance,
                                     int iCurNGramSize)
Returns the signal to noise ratio for a given n-gram rank range, distance, and n-gram size.

Parameters:
iMinRank - The minimum rank to take into account.
iMaxRank - The maximum rank to take into account.
iDistance - The distance (character range) within which we expect n-grams to be found.
iCurNGramSize - The n-gram size currently being examined.
Returns:
The signal to noise ratio.

getOptimalDistance

public int getOptimalDistance(int iMinDist,
                              int iMaxDist,
                              int iMinRank,
                              int iMaxRank)
Returns the distance corresponding to the highest signal to noise ratio for a given n-gram rank range, and a given distance range to examine. The distance range is examined exhaustively to find the best distance.

Parameters:
iMinDist - The minimum distance to examine.
iMaxDist - The maximum distance to examine.
iMinRank - The minimum rank to take into account.
iMaxRank - The maximum rank to take into account.
Returns:
The optimal distance.
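The exhaustive examination of the distance range described above can be sketched as a linear scan that scores every candidate distance and keeps the best one. This is a minimal illustration, not the library's code. The class OptimalDistanceSketch is hypothetical, the scoring function is passed in as a parameter instead of calling getSignalToNoise, and treating both bounds as inclusive with ties going to the smaller distance are assumptions.

```java
import java.util.function.IntToDoubleFunction;

// Hypothetical sketch of an exhaustive search for the best distance.
final class OptimalDistanceSketch {

    // Evaluates `signalToNoise` at every distance in minDist..maxDist
    // (inclusive) and returns the distance with the highest score.
    // On ties, the smallest such distance is returned.
    static int optimalDistance(int minDist, int maxDist,
                               IntToDoubleFunction signalToNoise) {
        if (minDist > maxDist) {
            throw new IllegalArgumentException("minDist must not exceed maxDist");
        }
        int bestDistance = minDist;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int d = minDist; d <= maxDist; d++) {
            double score = signalToNoise.applyAsDouble(d);
            if (score > bestScore) {
                bestScore = score;
                bestDistance = d;
            }
        }
        return bestDistance;
    }
}
```

A linear scan costs one score evaluation per candidate distance, which is reasonable when the range is small, and it makes no assumption about the shape of the score curve.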

getOptimalDistance

public int getOptimalDistance(int iMinDist,
                              int iMaxDist)
Returns the distance corresponding to the highest signal to noise ratio for a given distance range to examine, with respect to the ranks identified by the rank estimator. The distance range is examined exhaustively to find the best distance.

Parameters:
iMinDist - The minimum distance to examine.
iMaxDist - The maximum distance to examine.
Returns:
The optimal distance.