java.lang.Object
  gr.demokritos.iit.summarization.analysis.EntropyChunker
public class EntropyChunker
This class can separate a token sequence into chunks, based on the entropy of the following symbol.
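
The page does not say how the per-symbol statistics are computed. As a rough illustration only, the sketch below assumes that "entropy of the following symbol" means the Shannon entropy of the distribution of characters that directly follow a given character in some training text. The class name `NextCharEntropySketch` and the method `nextCharEntropy` are hypothetical and are not part of EntropyChunker.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not the EntropyChunker implementation.
final class NextCharEntropySketch {

    /**
     * Maps each character that has at least one successor in the text to the
     * Shannon entropy (in bits) of the character that directly follows it.
     */
    static Map<Character, Double> nextCharEntropy(String text) {
        // followerCounts.get(c).get(d) = how many times d directly follows c
        Map<Character, Map<Character, Integer>> followerCounts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            followerCounts
                .computeIfAbsent(text.charAt(i), k -> new HashMap<>())
                .merge(text.charAt(i + 1), 1, Integer::sum);
        }

        Map<Character, Double> entropy = new TreeMap<>();
        for (Map.Entry<Character, Map<Character, Integer>> e : followerCounts.entrySet()) {
            int total = 0;
            for (int n : e.getValue().values()) {
                total += n;
            }
            double h = 0.0;
            for (int n : e.getValue().values()) {
                double p = (double) n / total;
                h -= p * Math.log(p) / Math.log(2.0); // log base 2
            }
            entropy.put(e.getKey(), h);
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Characters followed by many different symbols get a higher value.
        System.out.println(nextCharEntropy("a b a c a d"));
    }
}
```

Under this assumption, a character such as a space, which can be followed by almost any letter, would score high and so be a plausible chunk delimiter, while a character that is always followed by the same symbol scores zero.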
Constructor Summary | |
---|---|
EntropyChunker() | Creates a new instance of EntropyChunker. |
Method Summary | |
---|---|
java.util.List | chunkString(java.lang.String sToChunk): Returns a list of string chunks, derived from a given string. |
protected int | determineImportantDelimiters(java.util.SortedMap smMap) |
java.util.SortedMap | getDelimiters(): Returns a sorted map of delimiters, based on their entropy of next character measure. |
static void | main(java.lang.String[] sArgs): Utility method. |
protected java.lang.Integer[] | splitPointsByDelimiterList(java.lang.String sStr, java.util.SortedMap lDelimiters) |
protected static java.lang.String[] | splitStringByDelimiterPoints(java.lang.String sStr, java.lang.Integer[] iRes): Returns the substrings defined by a string and a set of split points. |
void | train(java.util.Set<java.lang.String> sFileNames): Train the statistics of the chunker from a given file set. |
void | train(java.lang.String sTrainingText): Train the statistics of the chunker from a given text. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public EntropyChunker()
Method Detail |
---|
public void train(java.util.Set<java.lang.String> sFileNames)
Parameters:
sFileNames - The set of file names to use for training.

public void train(java.lang.String sTrainingText)
Parameters:
sTrainingText - The text that defines the statistics used by the chunker.

public java.util.SortedMap getDelimiters()
Returns:
A SortedMap of delimiters, where each delimiter is matched to its entropy measure.

public java.util.List chunkString(java.lang.String sToChunk)
Parameters:
sToChunk - The string to chunk.
Returns:
A List of strings that are the chunks of the given string.

protected java.lang.Integer[] splitPointsByDelimiterList(java.lang.String sStr, java.util.SortedMap lDelimiters)

protected static java.lang.String[] splitStringByDelimiterPoints(java.lang.String sStr, java.lang.Integer[] iRes)
Parameters:
sStr - The string to split.
iRes - An array of integers, indicating the points at which the string is to be split.
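
To make the idea of splitting a string at a set of points concrete, here is a minimal sketch. It is not the library's code: the class `SplitByPointsSketch` and the method `splitAt` are hypothetical, and it assumes that each split point is a character index at which a new chunk begins. How the real method handles unsorted, duplicate, or out-of-range points is not documented on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the EntropyChunker implementation.
final class SplitByPointsSketch {

    /**
     * Splits s so that a new chunk starts at every index listed in points.
     * Assumption: points that are out of range or repeated are ignored.
     */
    static String[] splitAt(String s, Integer[] points) {
        Integer[] sorted = points.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);

        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int p : sorted) {
            if (p <= start || p >= s.length()) {
                continue; // would produce an empty chunk or fall outside s
            }
            parts.add(s.substring(start, p));
            start = p;
        }
        parts.add(s.substring(start)); // remainder after the last split point
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] chunks = splitAt("hello world", new Integer[] {5, 6});
        System.out.println(Arrays.toString(chunks));
    }
}
```

In this sketch the delimiter character stays inside the chunks rather than being dropped, so concatenating the returned strings reproduces the input.
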
protected int determineImportantDelimiters(java.util.SortedMap smMap)
public static void main(java.lang.String[] sArgs)