java.lang.Object
  gr.demokritos.iit.summarization.analysis.EntropyChunker
public class EntropyChunker
This class can separate a token sequence into chunks, based on the entropy of the following symbol.
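To make the "entropy of the following symbol" idea concrete, here is a minimal sketch that computes, for each character in a text, the Shannon entropy (in bits) of the character that directly follows it. The class NextCharEntropySketch and its method nextCharEntropy are hypothetical names introduced for illustration only; they are not part of this API, and the statistics EntropyChunker actually gathers may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration; not part of the documented EntropyChunker API. */
final class NextCharEntropySketch {

    /**
     * Maps each character that has at least one successor in the text to the
     * Shannon entropy (in bits) of the distribution of its next character.
     */
    static Map<Character, Double> nextCharEntropy(String text) {
        // counts.get(c).get(n) = number of times n directly follows c
        Map<Character, Map<Character, Integer>> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            counts.computeIfAbsent(text.charAt(i), k -> new HashMap<>())
                  .merge(text.charAt(i + 1), 1, Integer::sum);
        }

        Map<Character, Double> entropy = new TreeMap<>();
        for (Map.Entry<Character, Map<Character, Integer>> e : counts.entrySet()) {
            int total = 0;
            for (int c : e.getValue().values()) {
                total += c;
            }
            double h = 0.0;
            for (int c : e.getValue().values()) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2)); // log base 2
            }
            entropy.put(e.getKey(), h);
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Characters followed by many different symbols score higher.
        System.out.println(nextCharEntropy("the cat saw the dog and the end"));
    }
}
```

Under this reading, a character whose successor is hard to predict (high entropy) would be a plausible chunk delimiter, while a character that is always followed by the same symbol has entropy 0.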
Constructor Summary | |
---|---|
EntropyChunker(): Creates a new instance of EntropyChunker. |
Method Summary | |
---|---|
java.util.List | chunkString(java.lang.String sToChunk): Returns a list of string chunks, derived from a given string. |
void | clearDelimiters(): Clears the list of determined delimiters. |
protected int | determineImportantDelimiters(java.util.SortedMap smMap) |
java.util.SortedMap | getDelimiters(): Returns a sorted map of delimiters, based on their entropy of next character measure. |
static void | main(java.lang.String[] sArgs): Utility method. |
protected java.lang.Integer[] | splitPointsByDelimiterList(java.lang.String sStr, java.util.SortedMap lDelimiters) |
protected static java.lang.String[] | splitStringByDelimiterPoints(java.lang.String sStr, java.lang.Integer[] iRes): Returns the substrings defined by a string and a set of split points. |
void | train(java.util.Set<java.lang.String> sFileNames): Trains the statistics of the chunker from a given file set. |
void | train(java.lang.String sTrainingText): Trains the statistics of the chunker from a given text. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public EntropyChunker()
Method Detail |
---|
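public void train(java.util.Set<java.lang.String> sFileNames)
Trains the statistics of the chunker from a given file set.
Parameters:
sFileNames - The set of CategorizedFileEntry objects to use for training.

public void train(java.lang.String sTrainingText)
Trains the statistics of the chunker from a given text.
Parameters:
sTrainingText - The text that defines the statistics used by the chunker.

public void clearDelimiters()
Clears the list of determined delimiters.
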
public java.util.SortedMap getDelimiters()
Returns a sorted map of delimiters, based on their entropy of next character measure.
Returns:
A SortedMap of delimiters, where each delimiter is matched to its entropy measure.

public java.util.List chunkString(java.lang.String sToChunk)
Returns a list of string chunks, derived from a given string.
Specified by:
chunkString in interface IChunker
Parameters:
sToChunk - The string to chunk.
Returns:
A List of strings that are the chunks of the given string.

protected java.lang.Integer[] splitPointsByDelimiterList(java.lang.String sStr, java.util.SortedMap lDelimiters)

protected static java.lang.String[] splitStringByDelimiterPoints(java.lang.String sStr, java.lang.Integer[] iRes)
Returns the substrings defined by a string and a set of split points.
Parameters:
sStr - The string to split.
iRes - An array of integers, indicating the points at which the string is to be split.

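As an illustration of splitting a string at a set of split points, the sketch below cuts a string at each given index and returns the pieces in order. It is a hypothetical reimplementation (the class SplitPointsSketch and method splitByPoints are made-up names), and it assumes the split points are sorted in ascending order and lie within the string. How the actual method treats delimiters, empty chunks, or unsorted points is not stated in this documentation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration; not the library's implementation. */
final class SplitPointsSketch {

    /**
     * Cuts sStr at every index in iRes and returns the resulting substrings.
     * Assumes iRes is ascending and every value is between 0 and sStr.length().
     */
    static String[] splitByPoints(String sStr, Integer[] iRes) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int point : iRes) {
            parts.add(sStr.substring(start, point)); // chunk ends just before the split point
            start = point;
        }
        parts.add(sStr.substring(start)); // remainder after the last split point
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] chunks = splitByPoints("the cat sat", new Integer[] {4, 8});
        System.out.println(Arrays.toString(chunks)); // prints [the , cat , sat]
    }
}
```

In this sketch each chunk keeps its trailing space, because the cut is placed after the delimiter; placing the cut before it is an equally plausible convention.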
protected int determineImportantDelimiters(java.util.SortedMap smMap)
public static void main(java.lang.String[] sArgs)