ICDAR 2013 DISEC

The detection and correction of document skew is one of the most important steps in document image analysis. Some degree of skew is unavoidably introduced when a document is scanned, whether manually or mechanically. Skew can seriously degrade the performance of the subsequent segmentation and recognition stages, and a skew of as little as 0.1° may be apparent to a human observer. For these reasons, almost all document analysis and recognition systems must apply a reliable skew estimation and correction technique to scanned documents as a pre-processing stage. According to recent research, however, skew detection remains an interesting and challenging problem, especially for documents containing graphics, charts, figures or various font sizes. Among the factors that limit the efficiency of skew estimation methods, and consequently of OCR, are the unknown layout of the document image and the range of skew angles over which a method can reliably estimate the skew.

In this first international skew estimation contest, the general objectives are to compare skew estimation techniques, in order to record the maturity of this research area, and to provide a well-established generic dataset that could serve as the benchmark currently missing from the literature. To this end, we organize DISEC'13 to record recent advances in this scientific field and to determine whether existing techniques can meet the modern needs of document image skew estimation.

We will provide a dataset of over 200 document images covering most realistic cases. The document images will contain figures, tables, diagrams, block diagrams, architectural plans and electrical circuits, and will be obtained from newspapers, scientific journals, scientific books, literature books, poetry anthologies, course books, dictionaries, travel guides, museum guides, museum tickets, menus, comic books, official state documents and various other sources. The documents will be written in English, Chinese, Greek, Japanese, Bulgarian, Russian, Danish, Italian, Turkish and ancient Greek, and the dataset will include representative cases of various image sizes, all kinds of mixed content, vertical and horizontal writing, and multiple font sizes and different numbers of columns within the same document. Part of this dataset (the sample set) will be provided to contest participants after registration.

The submitted skew estimation techniques will be evaluated with well-established metrics that take into account, for each algorithm, the average error deviation of the estimated skew angles and the number of correct estimations.
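To illustrate the kind of computation these two metrics involve, the following is a minimal sketch under stated assumptions, not the official contest evaluation. It assumes that the average error deviation is the mean absolute difference, in degrees, between estimated and ground-truth skew angles, and that an estimation counts as correct when its absolute error does not exceed a tolerance; the function name `evaluate_skew` and the default tolerance of 0.1° are hypothetical choices made for this example.

```python
from statistics import mean


def evaluate_skew(estimated, ground_truth, tolerance=0.1):
    """Hypothetical evaluation sketch for one algorithm.

    Assumptions (not taken from the contest rules):
    - average error deviation = mean absolute angle error in degrees;
    - an estimation is "correct" if its absolute error <= tolerance degrees.
    Returns a tuple (average_error_deviation, number_of_correct_estimations).
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("expected two non-empty lists of equal length")

    # Absolute angular error for every document image.
    errors = [abs(e - g) for e, g in zip(estimated, ground_truth)]

    # Mean absolute error over the whole dataset.
    average_error = mean(errors)

    # Count estimations that fall within the (assumed) tolerance.
    correct = sum(1 for err in errors if err <= tolerance)
    return average_error, correct


if __name__ == "__main__":
    aed, ce = evaluate_skew([0.5, -2.0, 10.3], [0.45, -2.0, 9.8])
    print(f"average error deviation: {aed:.4f} deg, correct estimations: {ce}")
```

With the sample angles above, the first two estimates are within 0.1° of the ground truth and the third is off by 0.5°, so two estimations count as correct. Ranking algorithms on both numbers, rather than on the mean error alone, keeps a method with a few large failures from looking equivalent to one that is consistently slightly off.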

We invite all researchers in the field of Document Normalization and Skew Correction to register and participate in the ICDAR2013 Document Image Skew Estimation Contest. Participants and organizers will sign an agreement to protect the Intellectual Property Rights (IPR) of the submitted software. The method descriptions and evaluation scores will be presented during a dedicated ICDAR2013 contest session, and a report on the competition will be published in the ICDAR2013 conference proceedings.