Georgios Petasis - Publications...

Publication Abstracts

PCI 2001

G. Petasis, V. Karkaletsis, D. Farmakiotou, G. Samaritakis, I. Androutsopoulos, C. D. Spyropoulos, "A Greek Morphological Lexicon and its Exploitation by a Greek Controlled Language Checker". In Proceedings of the 8th Panhellenic Conference on Informatics, 8-10 November 2001, Nicosia, Cyprus.

This paper presents a large-scale Greek morphological lexicon, developed by the Software & Knowledge Engineering Laboratory (SKEL) of NCSR "Demokritos". The paper describes the lexicon architecture, the procedure followed to develop it, and the functionality provided for updating it. The morphological lexicon was used to develop a lemmatiser and a morphological analyser that were included in a controlled language checker for Greek. The paper discusses the current coverage of the lexicon, as well as remaining issues and how we plan to address them. Our long-term goal is to produce a wide-coverage morphological lexicon of Greek that can be easily exploited in several natural language processing applications.

ACL-EACL 2001

G. Petasis, F. Vichot, F. Wolinski, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, "Using Machine Learning to Maintain Rule-based Named-Entity Recognition and Classification Systems". In Proceedings of the 39th Conference of the Association for Computational Linguistics (ACL-EACL 2001), pp. 418-425, July 9-11, 2001, Toulouse, France.

This paper presents a method that assists in maintaining a rule-based named-entity recognition and classification system. The underlying idea is to use a separate system, constructed with the use of machine learning, to monitor the performance of the rule-based system. The training data for the second system are generated by the rule-based system, thus avoiding the need for manual tagging. Disagreement between the two systems acts as a signal for updating the rule-based system. The generality of the approach is illustrated by applying it to large corpora in two different languages: Greek and French. The results are very encouraging, showing that this alternative use of machine learning can assist significantly in the maintenance of rule-based systems.

COIL 2000

G. Petasis, S. Petridis, G. Paliouras, V. Karkaletsis, S. J. Perantonis, C. D. Spyropoulos, "Symbolic and Neural Learning for Named-Entity Recognition". In Proceedings of European Best Practice Workshops and Symposium on Computational Intelligence and Learning (COIL 2000), June 19-23, 2000, Chios, Greece.

Named-entity recognition involves the identification and classification of named entities in text. This is an important subtask in most language engineering applications, in particular information extraction, where different types of named entities are associated with specific roles in events. The manual construction of rules for the recognition of named entities is a tedious and time-consuming task. For this reason, this paper presents two approaches to learning named-entity recognition rules from text. The first approach is a decision-tree induction method and the second a multi-layered feed-forward neural network. Particular emphasis is placed on the selection of the appropriate feature set for each method and the extraction of training examples from unstructured textual data. We compare the performance of the two methods on a large corpus of English text and present the results.

SIGIR 2000

G. Petasis, A. Cucchiarelli, P. Velardi, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, "Automatic adaptation of Proper Noun Dictionaries through cooperation of machine learning and probabilistic methods". In Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 24-28, 2000, Athens, Greece.

The recognition of Proper Nouns (PNs) is considered an important task in the area of Information Retrieval and Extraction. However, the high performance of most existing PN classifiers depends heavily on the availability of large dictionaries of domain-specific Proper Nouns, and on a certain amount of manual work for rule writing or manual tagging. Although relying on an existing PN dictionary is not a heavy requirement (such resources are often available on the web), its coverage of a domain corpus may be rather low in the absence of manual updating. In this paper we propose a technique for the automatic updating of a PN dictionary through the cooperation of an inductive and a probabilistic classifier. Our experiments show that, whenever an existing PN dictionary allows the identification of 50% of the proper nouns within a corpus, our technique allows, without additional manual effort, the successful recognition of about 90% of the remaining 50%.

ECAI 2000

G. Paliouras, V. Karkaletsis, G. Petasis and C. D. Spyropoulos, "Learning Decision Trees for Named-Entity Recognition and Classification". In Proceedings of the 14th European Conference on Artificial Intelligence (ECAI 2000), August 20-25, 2000, Berlin, Germany.

We propose the use of decision tree induction as a solution to the problem of customising a named-entity recognition and classification (NERC) system to a specific domain. A NERC system assigns semantic tags to phrases that correspond to named entities, e.g. persons, locations and organisations. Typically, such a system makes use of two language resources: a recognition grammar and a lexicon of known names, classified by the corresponding named-entity types. NERC systems have been shown to achieve good results when the domain of application is very specific. However, constructing the grammar and the lexicon for a new domain is a hard and time-consuming process. We propose the use of decision trees as NERC “grammars” and the construction of these trees using machine learning. To validate our approach, we tested C4.5 on the identification of person and organisation names involved in management succession events, using data from the sixth Message Understanding Conference. The results of the evaluation are very encouraging, showing that the induced tree can outperform a grammar that was constructed manually.

TeSTIA 2000

G. Petasis, "Machine Learning and Named-Entity Recognition". Presentation at the 8th ELSNET European Summer School on Language and Speech Communication on the subject of Text and Speech Triggered Information Access (TeSTIA 2000), July 15-30, 2000, Chios, Greece.

[Not Available]

ACAI 1999

G. Petasis, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, I. Androutsopoulos, "Resolving Part-Of-Speech Ambiguity in the Greek language Using Learning Techniques". In Proceedings of the ECCAI Advanced Course on Artificial Intelligence (ACAI '99), July 5-16, 1999, Chania, Greece.

This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the method's performance, but also to examine its dependence on different thematic domains. Results are presented for two different test cases: a corpus on “management succession events” and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and that its accuracy for the Greek language is around 95%.

ACAI 1999

G. Petasis, "Exploiting Learning in Bilingual Named Entity Recognition". In Proceedings of the ECCAI Advanced Course on Artificial Intelligence (ACAI '99), July 5-16, 1999, Chania, Greece.

[Not Available]

PCI 1999

G. Petasis, G. Paliouras, V. Karkaletsis, C. D. Spyropoulos, I. Androutsopoulos, "Using Machine Learning Techniques for Part-Of-Speech Tagging in the Greek Language". In Proceedings of the 7th Hellenic Conference on Informatics, August 26-29, 1999, Ioannina, Greece.

This article investigates the use of Transformation-Based Error-Driven learning for resolving part-of-speech ambiguity in the Greek language. The aim is not only to study the method's performance, but also to examine its dependence on different thematic domains. Results are presented for two different test cases: a corpus on "management succession events" and a general-theme corpus. The two experiments show that the performance of this method does not depend on the thematic domain of the corpus, and that its accuracy for the Greek language is around 95%.

JIRS 1999

V. Karkaletsis, G. Paliouras, G. Petasis, N. Manousopoulou and C. D. Spyropoulos, "Named-Entity Recognition from Greek and English Texts". Journal of Intelligent and Robotic Systems, vol. 26, no. 2, pp. 123-135, 1999.

[Not Available]

EURISCON 1998

V. Karkaletsis, C. D. Spyropoulos and G. Petasis, "Named Entity Recognition from Greek texts: the GIE Project". In "Advances in Intelligent Systems: Concepts, Tools and Applications", ed. S. Tzafestas, Kluwer Academic Publishers, Part II, Chapter 12, pp. 131-142. (Presented at the 3rd European Robotics Intelligent Systems & Control Conference (EURISCON '98), June 22-25, 1998, Athens, Greece.)

[Not Available]