java.lang.Object
  gr.demokritos.iit.summarization.analysis.EntropyChunker
public class EntropyChunker
This class can separate a token sequence into chunks, based on the entropy of the following symbol.
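To make the "entropy of the following symbol" idea concrete, the sketch below computes, for each character of a training text, the Shannon entropy of the character that comes directly after it. This is only an illustration: the class `NextCharEntropySketch` and the method `nextCharEntropy` are hypothetical names, and nothing here is taken from the EntropyChunker source, whose actual statistics may be computed differently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; not part of the documented EntropyChunker API.
final class NextCharEntropySketch {

    /** Maps each character to the Shannon entropy (in bits) of its successor. */
    static Map<Character, Double> nextCharEntropy(String text) {
        // counts.get(c).get(n) = number of times character n directly follows c
        Map<Character, Map<Character, Integer>> counts = new HashMap<>();
        for (int i = 0; i + 1 < text.length(); i++) {
            counts.computeIfAbsent(text.charAt(i), k -> new HashMap<>())
                  .merge(text.charAt(i + 1), 1, Integer::sum);
        }

        Map<Character, Double> entropy = new TreeMap<>();
        for (Map.Entry<Character, Map<Character, Integer>> e : counts.entrySet()) {
            int total = 0;
            for (int c : e.getValue().values()) {
                total += c;
            }
            double h = 0.0;
            for (int c : e.getValue().values()) {
                double p = (double) c / total;
                h -= p * (Math.log(p) / Math.log(2)); // log base 2
            }
            entropy.put(e.getKey(), h);
        }
        return entropy;
    }

    public static void main(String[] args) {
        // Characters followed by many different symbols get a higher value.
        System.out.println(nextCharEntropy("one two three four five"));
    }
}
```

Under this reading, a character that can be followed by many different symbols (a space in ordinary text, for example) scores high and is a plausible chunk boundary, while a character with a predictable successor scores near zero.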
| Constructor Summary | |
|---|---|
| EntropyChunker() | Creates a new instance of EntropyChunker. |
| Method Summary | |
|---|---|
| java.util.List | chunkString(java.lang.String sToChunk) Returns a list of string chunks, derived from a given string. |
| protected int | determineImportantDelimiters(java.util.SortedMap smMap) |
| java.util.SortedMap | getDelimiters() Returns a sorted map of delimiters, based on their entropy of next character measure. |
| static void | main(java.lang.String[] sArgs) Utility method. |
| protected java.lang.Integer[] | splitPointsByDelimiterList(java.lang.String sStr, java.util.SortedMap lDelimiters) |
| protected static java.lang.String[] | splitStringByDelimiterPoints(java.lang.String sStr, java.lang.Integer[] iRes) Returns the substrings defined by a string and a set of split points. |
| void | train(java.util.Set<java.lang.String> sFileNames) Train the statistics of the chunker from a given file set. |
| void | train(java.lang.String sTrainingText) Train the statistics of the chunker from a given text. |
| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public EntropyChunker()
| Method Detail |
|---|
public void train(java.util.Set<java.lang.String> sFileNames)
Parameters: sFileNames - The set of file names to use for training.
public void train(java.lang.String sTrainingText)
Parameters: sTrainingText - The text that defines the statistics used by the chunker.
public java.util.SortedMap getDelimiters()
Returns: SortedMap of delimiters, where each delimiter is matched to its entropy measure.
public java.util.List chunkString(java.lang.String sToChunk)
Parameters: sToChunk - The string to chunk.
Returns: List of strings that are the chunks of the given string.
protected java.lang.Integer[] splitPointsByDelimiterList(java.lang.String sStr,
java.util.SortedMap lDelimiters)
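This method has no description, so the following is only a guess at the kind of work its name suggests: scanning a string and recording a split point wherever a delimiter occurs. The sketch takes a plain `Set<Character>` instead of the `SortedMap` used by the real signature, and it places each split point just after the delimiter. Both choices, along with the names `SplitPointsSketch` and `splitPoints`, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not part of the documented EntropyChunker API.
final class SplitPointsSketch {

    /** Returns the index just after every occurrence of a delimiter character. */
    static Integer[] splitPoints(String text, Set<Character> delimiters) {
        List<Integer> points = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            if (delimiters.contains(text.charAt(i))) {
                points.add(i + 1); // assumption: the delimiter closes its chunk
            }
        }
        return points.toArray(new Integer[0]);
    }

    public static void main(String[] args) {
        Set<Character> delimiters = new HashSet<>(Arrays.asList(' ', ','));
        System.out.println(Arrays.toString(splitPoints("a b,c", delimiters)));
    }
}
```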
protected static java.lang.String[] splitStringByDelimiterPoints(java.lang.String sStr,
java.lang.Integer[] iRes)
Parameters: sStr - The string to split.
iRes - An array of integers, indicating the points at which the string is to be split.
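As a rough illustration of cutting a string at a set of split points, the sketch below treats each point as the start index of the next substring. That interpretation, the handling of out-of-range points, and the names `SplitByPointsSketch` and `splitAt` are assumptions; the documented method may treat its `iRes` array differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the documented EntropyChunker API.
final class SplitByPointsSketch {

    /** Cuts text at the given ascending indices; each index starts a new part. */
    static String[] splitAt(String text, Integer[] points) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int point : points) {
            // Ignore points that would produce an empty part or fall outside the text.
            if (point <= start || point >= text.length()) {
                continue;
            }
            parts.add(text.substring(start, point));
            start = point;
        }
        parts.add(text.substring(start)); // remainder after the last split point
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitAt("a b,c", new Integer[]{2, 4})));
    }
}
```

Skipping invalid points rather than throwing keeps the sketch short; a stricter version could reject unsorted or out-of-range input instead.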
protected int determineImportantDelimiters(java.util.SortedMap smMap)
public static void main(java.lang.String[] sArgs)