Publications
G. Petasis, V. Karkaletsis, D. Farmakiotou, G. Samaritakis, I. Androutsopoulos,
C. D. Spyropoulos, "A Greek Morphological Lexicon and its Exploitation by a
Greek Controlled Language Checker". In Proceedings of the 8th
Panhellenic Conference on Informatics, 8 - 10 November 2001, Nicosia,
Cyprus.
This paper presents a large-scale Greek morphological
lexicon, developed by the Software & Knowledge Engineering Laboratory (SKEL)
of NCSR "Demokritos". The paper describes the lexicon architecture, the
procedure followed to develop it, and the functionality provided
for updating it. The morphological lexicon was used to
develop a lemmatiser and a morphological analyser that were included in
a controlled language checker for Greek. The paper discusses the current
coverage of the lexicon, as well as remaining issues and how we plan to
address them. Our long-term goal is to produce a wide-coverage
morphological lexicon of Greek that can be easily exploited in several
natural language processing applications.
G. Petasis, F. Vichot,
F. Wolinski, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, "Using Machine
Learning to Maintain Rule-based Named-Entity Recognition and Classification
Systems". In Proceedings of the 39th Conference of the Association
for Computational Linguistics (ACL-EACL 2001), pp. 418 - 425, July 9 - 11
2001, Toulouse, France.
This paper presents a method that
assists in maintaining a rule-based named-entity recognition and
classification system. The underlying idea is to use a separate system,
constructed with the use of machine learning, to monitor the performance
of the rule-based system. The training data for the second system is
generated with the use of the rule-based system, thus avoiding the need
for manual tagging. The disagreement of the two systems acts as a signal
for updating the rule-based system. The generality of the approach is
illustrated by applying it to large corpora in two different languages:
Greek and French. The results are very encouraging, showing that this
alternative use of machine learning can assist significantly in the
maintenance of rule-based systems.
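The monitoring idea in this abstract, where disagreement between the two systems flags cases for review, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function `find_disagreements`, the two toy taggers and the label names are hypothetical stand-ins, not the systems described in the paper, and the step of training the second system on the rule-based output is left out.

```python
from typing import Callable, List, Tuple

# A tagger maps a list of tokens to one label per token (hypothetical interface).
Tagger = Callable[[List[str]], List[str]]
Disagreement = Tuple[int, int, str, str, str]


def find_disagreements(
    sentences: List[List[str]],
    rule_tagger: Tagger,
    learned_tagger: Tagger,
) -> List[Disagreement]:
    """Collect every token on which the two taggers assign different labels.

    Each result is (sentence index, token index, token, rule label, learned label).
    """
    disagreements: List[Disagreement] = []
    for s_idx, tokens in enumerate(sentences):
        rule_labels = rule_tagger(tokens)
        learned_labels = learned_tagger(tokens)
        for t_idx, (token, rule_label, learned_label) in enumerate(
            zip(tokens, rule_labels, learned_labels)
        ):
            if rule_label != learned_label:
                disagreements.append(
                    (s_idx, t_idx, token, rule_label, learned_label)
                )
    return disagreements


# Toy stand-ins: a "rule-based" tagger with a small name list, and a
# "learned" tagger that happens to know one more location.
def toy_rule_tagger(tokens: List[str]) -> List[str]:
    known = {"Athens": "LOCATION"}
    return [known.get(token, "O") for token in tokens]


def toy_learned_tagger(tokens: List[str]) -> List[str]:
    known = {"Athens": "LOCATION", "Toulouse": "LOCATION"}
    return [known.get(token, "O") for token in tokens]


if __name__ == "__main__":
    corpus = [["Athens", "and", "Toulouse"]]
    for item in find_disagreements(corpus, toy_rule_tagger, toy_learned_tagger):
        print(item)
```

In this sketch the flagged tokens are only candidates for a human maintainer to inspect; which of the two systems is wrong in each case is not decided by the code.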
G. Petasis, S. Petridis, G. Paliouras, V. Karkaletsis, S. J.
Perantonis, C. D. Spyropoulos, "Symbolic and Neural Learning for
Named-Entity Recognition". In Proceedings of European Best
Practice Workshops and Symposium on Computational Intelligence and
Learning (COIL 2000), June 19 - 23 2000, Chios, Greece.
Named-entity recognition involves the identification and
classification of named entities in text. This is an important subtask
in most language engineering applications, in particular information
extraction, where different types of named entity are associated with
specific roles in events. The manual construction of rules for the
recognition of named entities is a tedious and time-consuming task. For
this reason, we present in this paper two approaches to learning
named-entity recognition rules from text. The first approach is a
decision-tree induction method and the second a multi-layered
feed-forward neural network. Particular emphasis is placed on the
selection of the appropriate feature set for each method and the
extraction of training examples from unstructured textual data. We
compare the performance of the two methods on a large corpus of English
text and present the results.
G. Petasis, A. Cucchiarelli, P. Velardi, G. Paliouras, V. Karkaletsis,
C. D. Spyropoulos, "Automatic adaptation of Proper Noun Dictionaries
through cooperation of machine learning and probabilistic methods".
In Proceedings of the 23rd Annual International ACM SIGIR Conference
on Research and Development in Information Retrieval, July 24 - 28
2000, Athens, Greece.
The recognition of Proper Nouns (PNs) is considered an important task
in the area of Information Retrieval and Extraction. However, the high
performance of most existing PN classifiers heavily depends upon the
availability of large dictionaries of domain-specific Proper Nouns, and
a certain amount of manual work for rule writing or manual tagging.
Though it is not a heavy requirement to rely on some existing PN
dictionary (often these resources are available on the web), its
coverage of a domain corpus may be rather low, in the absence of manual
updating. In this paper we propose a technique for the automatic
updating of a PN Dictionary through the cooperation of an inductive and
a probabilistic classifier. In our experiments we show that, whenever an
existing PN Dictionary allows the identification of 50% of the proper
nouns within a corpus, our technique allows, without additional manual
effort, the successful recognition of about 90% of the remaining 50%.
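As a rough reading of these figures, assuming both percentages refer to the same set of proper nouns in the corpus and taking "about 90%" as exactly 0.90 (the abstract itself does not state the combined figure), the overall recognition rate works out to roughly 95%:

```latex
\[
\text{overall recognition} \approx 0.50 + 0.90 \times (1 - 0.50) = 0.50 + 0.45 = 0.95
\]
```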
G. Paliouras, V. Karkaletsis, G. Petasis and C. D.
Spyropoulos, "Learning Decision Trees for Named-Entity Recognition
and Classification". In Proceedings of the 14th
European Conference on Artificial Intelligence (ECAI 2000), August
20 - 25 2000, Berlin, Germany.
We propose the use of decision tree induction as a solution to the
problem of customising a named-entity recognition and classification (NERC)
system to a specific domain. A NERC system assigns semantic tags to
phrases that correspond to named entities, e.g. persons, locations and
organisations. Typically, such a system makes use of two language
resources: a recognition grammar and a lexicon of known names,
classified by the corresponding named-entity types. NERC systems have
been shown to achieve good results when the domain of application is
very specific. However, the construction of the grammar and the lexicon
for a new domain is a hard and time-consuming process. We propose the
use of decision trees as NERC “grammars” and the construction of these
trees using machine learning. In order to validate our approach, we
tested C4.5 on the identification of person and organisation names
involved in management succession events, using data from the sixth
Message Understanding Conference. The results of the evaluation are very
encouraging, showing that the induced tree can outperform a grammar that
was constructed manually.
G. Petasis, "Machine Learning and Named-Entity
Recognition". Presentation in the 8th ELSNET European
Summer School on Language and Speech Communication on the subject of
Text and Speech Triggered Information Access (TeSTIA 2000), July 15
- 30 2000, Chios, Greece.
[Not Available]
G. Petasis, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos,
I. Androutsopoulos, "Resolving Part-Of-Speech Ambiguity in the Greek
Language Using Learning Techniques". In Proceedings of the ECCAI
Advanced Course on Artificial Intelligence (ACAI '99), July 5 - 16
1999, Chania, Greece.
This article investigates the use of Transformation-Based
Error-Driven learning for resolving part-of-speech ambiguity in the
Greek language. The aim is not only to study the performance, but also
to examine its dependence on different thematic domains. Results are
presented here for two different test cases: a corpus on “management
succession events” and a general-theme corpus. The two experiments show
that the performance of this method does not depend on the thematic
domain of the corpus, and its accuracy for the Greek language is around
95%.
G. Petasis, "Exploiting Learning in Bilingual Named
Entity Recognition". In Proceedings of the ECCAI Advanced Course
on Artificial Intelligence (ACAI '99), July 5 - 16 1999, Chania,
Greece.
[Not Available]
G. Petasis, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos,
I. Androutsopoulos,
"Using Machine Learning Techniques for Part-Of-Speech Tagging
in the Greek Language". In Proceedings of the 7th
Hellenic Conference on Informatics, August 26 - 29 1999, Ioannina,
Greece.
This article investigates the use of Transformation-Based
Error-Driven learning for resolving part-of-speech ambiguity in the
Greek language. The aim is not only to study the performance, but also
to examine its dependence on different thematic domains. Results are
presented here for two different test cases: a corpus on "management
succession events" and a general-theme corpus. The two experiments show
that the performance of this method does not depend on the thematic
domain of the corpus, and its accuracy for the Greek language is around
95%.
V. Karkaletsis, G. Paliouras, G. Petasis, N. Manousopoulou
and C. D. Spyropoulos, "Named-Entity Recognition from Greek and
English Texts". Journal of Intelligent and Robotic Systems v.
26, n. 2, pp. 123 - 135, 1999.
[Not Available]
V. Karkaletsis, C. D. Spyropoulos and G. Petasis, "Named
Entity Recognition from Greek texts: the GIE Project". In
"Advances in Intelligent Systems: Concepts, Tools and Applications",
ed. S. Tzafestas, Kluwer Academic Publishers, Part II Chapter 12, pp.
131 - 142. (Presented at the 3rd European Robotics
Intelligent Systems & Control Conference (EURISCON '98), June 22 - 25
1998, Athens, Greece.)
[Not Available]