Home Page of Nikolaos Stamatopoulos

Document image segmentation is the process of dividing a document image into its base text components (blocks, text lines, words, characters). One of the most important and challenging tasks of document image analysis is the segmentation of handwritten document images into text lines and words. The overall performance of a character recognition or a word spotting system strongly relies on the results of the text line and word segmentation process. Although text line and word segmentation for machine-printed documents, especially modern ones, is usually considered a solved problem, segmentation of handwritten document images still presents significant challenges and remains an open problem. Different types of challenges are encountered in the handwritten text line and word segmentation processes. Concerning text line segmentation, the challenging issues include differences in the skew angle between text lines, curvilinear text lines, variation in inter-line gaps, and the overlapping and touching text lines that frequently appear in handwritten document images. Furthermore, the appearance of accents in some languages (e.g. Greek) increases segmentation complexity. Regarding word segmentation, the challenges include the appearance of skew along a single text line, the existence of slant, the non-uniform spacing of words as well as the existence of punctuation marks. Over the last decade, a wide variety of segmentation methods for handwritten document images has been reported in the literature. Moreover, four handwriting segmentation competitions have been organized in the context of the ICDAR and ICFHR conferences (ICDAR2007, ICDAR2009, ICFHR2010 and ICDAR2013) in order to address the need for objective, comparative and detailed evaluation under realistic circumstances and standard datasets.
In this chapter, the evaluation results of the handwriting segmentation competitions are summarized, together with a detailed description of the benchmarking datasets and the evaluation protocol used. Moreover, a brief description of the participating methods is presented, complemented by recently published methods that report results on the competitions' data.

Writer identification is a behavioral handwriting-based recognition problem which proceeds by matching unknown handwritings against a database of samples with known authorship. From the document image analysis perspective, writer identification can be defined as the retrieval of handwritten samples of the same writer from a database using a handwritten sample as a graphical query. The large number of recent publications, as well as the organization of several competitions, shows that writer identification is a very active and promising area of research. The identification of the writer of a handwritten document has a wide variety of applications. For example, analysis of handwritten documents has great bearing on the criminal justice system. As stated by Srihari et al. “Numerous cases over the years have dealt with evidence provided by handwritten documents such as wills and ransom notes.” Other application areas include security, financial activity, forensic analysis and access control. The main challenges of a writer identification system, as described by Schomaker et al., include the variability and variation of handwritten patterns even among documents of the same writer, the limited amount of image data and the presence of noise patterns. Another challenge concerns the large number of classes (writers) among which the final decision should be taken. In this chapter, we summarize the results of the writer identification competition series for Latin documents presented in the ICDAR and ICFHR conferences, including the benchmarking datasets, the evaluation protocol and the participating methods, together with several recently published methods which made use of the benchmarking datasets. Finally, we draw some comments and conclusions.

Historical manuscript collections can be considered an important source of original information for providing access to historical data and developing cultural documentation over the years. This chapter reports on recent advances and ongoing developments in historical handwritten document processing. It outlines the main challenges involved, the different tasks that have to be implemented as well as practices and technologies that currently exist in the literature. The focus is on the most promising techniques as well as on existing datasets and competitions that can prove useful to historical handwritten document processing research. The main tasks that have to be implemented in the historical document image recognition pipeline include preprocessing for image enhancement and binarization, segmentation for the detection of main page elements, text lines and words and, finally, recognition. In cases where optical recognition is expected to give poor results, keyword spotting has been proposed as a substitute for full-text recognition. The organization of this chapter is as follows. Section “Preprocessing” gives an overview of document image enhancement and binarization methods while section “Segmentation” presents state-of-the-art layout analysis, text line and word segmentation techniques for historical handwritten documents. In section “Handwritten Text Recognition (HTR)” the focus is on the pure recognition task, which can be accomplished at text line, word or character level. Finally, in section “Keyword spotting” recent advances in searching for a keyword directly on the historical document images are presented.

G. Mühlberger, L. Seaward, ... , N. Stamatopoulos, ... , H. Wurster and K. Zagoris, “Transforming Scholarship in the Archives Through Handwritten Text Recognition: Transkribus as a Case Study”, Journal of Documentation, vol. 75, no. 5, pp. 954-976, 2019. impact factor: 0.853

Archives are increasingly investing in the digitisation of their manuscript collections but until recently the textual content of the resulting digital images has only been available to those who have the time to study and transcribe individual passages. The use of computers to process and search images of historical papers using Handwritten Text Recognition (HTR) has the potential to transform access to our written past for the use of researchers, institutions and the general public. This paper reports on the Recognition and Enrichment of Archival Documents (READ) European Union Horizon 2020 project which is developing advanced text recognition technology on the basis of artificial neural networks and resulting in a publicly available infrastructure: the Transkribus platform. Users of Transkribus (whether institutional or individual) are able to extract data from handwritten and printed texts via HTR, while simultaneously contributing to the improvement of the same technology thanks to machine learning principles. The automated recognition of a wide variety of historical texts has significant implications for the accessibility of the written records of global cultural heritage. This paper uses the Transkribus platform as a case study, focusing on the development, application and impact of HTR technology. It demonstrates that HTR has the capacity to make a significant contribution to the archival mission by making it easier for anyone to read, transcribe, process and mine historical documents. It shows that the technology fits neatly into the archival workflow, making direct use of growing repositories of digitised images of historical texts. By providing examples of institutions and researchers who are generating new resources with Transkribus, the paper shows how HTR can extend the existing research infrastructure of the archives, libraries and humanities domain. 
Looking to the future, this paper argues that this form of machine learning has the potential to change the nature and scope of historical research. Finally, it suggests that a cooperative approach from the archives, library and humanities community is the best way to support and sustain the benefits of the technology offered through Transkribus.

G. Retsinas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Efficient Learning-Free Keyword Spotting”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1587-1600, 2019. impact factor: 8.329

In this article, a method for segmentation-based learning-free Query by Example (QbE) keyword spotting on handwritten documents is proposed. The method consists of three steps, namely preprocessing, feature extraction and matching, which address critical variations of text images (e.g. skew, translation, different writing styles). During the feature extraction step, a sequence of descriptors is generated using a combination of a zoning scheme and a novel appearance descriptor, referred to as modified Projections of Oriented Gradients. The preprocessing step, which includes contrast normalization and main-zone detection, aims to overcome the shortcomings of the appearance descriptor. Moreover, an uneven zoning scheme is introduced by applying a denser zoning only on query images for a more detailed representation. This leads to a significant reduction in storage requirements of a document collection. The distance between the query and word sequences is efficiently computed by the proposed Selective Matching algorithm. This algorithm is further extended to handle an augmented set of images originating from a single query image. The efficiency of the proposed method is demonstrated by experimentation conducted on seven publicly available datasets. In these experiments, the proposed method significantly outperforms all state-of-the-art learning-free techniques.

L. Kopeykina and A.V. Savchenko, “Automatic Privacy Detection in Scanned Document Images Based on Deep Neural Networks”, International Russian Automation Conference (RusAutoCon'19), 2019
V. Thakur and H. Sikarwar, “Deep Learning Feature Extraction for Handwritten Keyword Spotting in Historical Documents”, 2nd International Conference on Emerging Trends in Engineering & Applied Science (ICETEAS'19), vol. 5, no. 1, pp. 11-15, 2019

N. Stamatopoulos, B. Gatos and I. Pratikakis, “Performance Evaluation Methodology for Document Image Dewarping Techniques”, IET Image Processing, vol. 6, no. 6, pp. 738-745, 2012. impact factor: 0.895

The performance evaluation of dewarping techniques is currently addressed by concentrating on visually pleasing impressions or by using OCR as a means for indirect evaluation. In this paper, we present a performance evaluation methodology that calculates a comprehensive evaluation measure which reflects the entire performance of a dewarping technique in a concise quantitative manner. The proposed evaluation measure takes into account the deviation of the dewarped text lines from a horizontal straight reference which is considered to be the optimal result. This measure is expressed by the integral over the dewarped text line curves. To reduce the manual effort for identifying the text lines in the dewarped image, we propose a point-to-point matching procedure that finds the correspondence between the manually marked warped document image and its dewarped counterpart. This enables the evaluation of an unlimited number of methodologies with a marking procedure which is applied only once. The validity of the proposed performance evaluation methodology is demonstrated by a concise experimental work that comprises four state-of-the-art dewarping techniques as well as the involvement of different users in the interactive part of the procedure.
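
As a rough illustration of the idea of measuring how far a dewarped text line departs from a horizontal reference, the following Python sketch integrates the absolute deviation of a sampled line curve. It is not the measure defined in the paper: the function name line_deviation, the piecewise-linear curve model, the choice of the mean height as the reference and the normalization by line width are all assumptions made for this example.

```python
def line_deviation(points):
    """Mean absolute deviation of a text-line curve from a horizontal reference.

    points: list of (x, y) samples along one dewarped text line, sorted by
    strictly increasing x. The curve is treated as piecewise linear.
    """
    if len(points) < 2:
        raise ValueError("need at least two samples")
    width = points[-1][0] - points[0][0]
    if width <= 0:
        raise ValueError("x values must increase")

    # Assumption: the horizontal reference is the mean height of the curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    y_ref = area / width

    # Integrate |y(x) - y_ref| exactly on each linear segment.
    total = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0, d1, dx = y0 - y_ref, y1 - y_ref, x1 - x0
        if d0 * d1 >= 0:
            # Segment stays on one side of the reference: a trapezoid.
            total += 0.5 * (abs(d0) + abs(d1)) * dx
        else:
            # Segment crosses the reference: two triangles.
            total += 0.5 * dx * (d0 * d0 + d1 * d1) / (abs(d0) + abs(d1))
    return total / width
```

Under these assumptions a perfectly horizontal line scores 0, and larger values indicate more residual curvature after dewarping.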

C. Hong, S. Colburn and A. Majumdar, “Flat metaform near-eye visor”, Applied Optics, vol. 56, no. 31, pp. 8822-8827, 2017
P. Yang, A. Antonacopoulos, C. Clausner, S. Pletschacher and J. Qi, “Effective geometric restoration of distorted historical document for large-scale digitisation”, IET Image Processing, vol. 11, no. 10, pp. 841-853, 2017
M. Rahnemoonfar and B. Plale, “Automatic performance evaluation of dewarping methods in large scale digitization of historical documents”, 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 331-334, Indiana, USA, 2013.

N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, “Goal-oriented Rectification of Camera-Based Document Images”, IEEE Transactions on Image Processing, vol. 20, no. 4, pp. 910-920, 2011. impact factor: 3.042

Document digitization with either flatbed scanners or camera-based systems results in document images which often suffer from warping and perspective distortions that deteriorate the performance of current OCR approaches. In this paper, we present a goal-oriented rectification methodology to compensate for undesirable document image distortions aiming to improve the OCR result. Our approach relies upon a coarse-to-fine strategy. First, a coarse rectification is accomplished with the aid of a computationally low-cost transformation which addresses the projection of a curved surface to a 2-D rectangular area. The projection of the curved surface on the plane is guided only by the textual content's appearance in the document image while incorporating a transformation which does not depend on specific model primitives or camera setup parameters. Second, pose normalization is applied on the word level aiming to restore all the local distortions of the document image. Experimental results on various document images with a variety of distortions demonstrate the robustness and effectiveness of the proposed rectification methodology using a consistent evaluation methodology that considers OCR accuracy and a newly introduced measure using a semi-automatic procedure.

A. Garai and S. Biswas, “Dewarping of Single-Folded Camera Captured Bangla Document Images”, Computational Intelligence in Pattern Recognition (CIPR'19), pp. 647-656, 2019
K.M. Hung, C.H. Yih and C.H. Yeh, “A Reading Assistant System Based on Restoring Warped Document Image”, Journal of Applied Science and Engineering, vol. 21, no. 3, pp. 475-484, 2018
G. Meng, Y. Su, Y. Wu, S. Xiang and C. Pan, “Exploiting Vector Fields for Geometric Rectification of Distorted Document Images”, European Conference on Computer Vision (ECCV'18), pp. 172-187, 2018
V.V. Vashi and D.G. Jani, “Review Paper based on Different Technologies to Read Text using Optical Character Recognition”, International Journal of Management, Technology And Engineering, vol. 8, no. V, pp. 7-10, 2018
L. Galarza, H. Martin and M. Adjouadi, “Integrating low-resolution depth maps to high-resolution images in the development of a book reader design for persons with visual impairment and blindness”, International Journal of Innovative Computing, Information and Control (ICIC), vol. 14, no. 3, pp. 797-816, 2018
R. Sun, S. Wang, L. Ji and Z. Wang, “Multi-scale document image rectification utilising text-features”, Electronics Letters, vol. 54, no. 8, pp. 502-503, 2018
C. Yan, J. Hu and C. Zhang, “Deep Transformer: A Framework For 2D Text Image Rectification From Planar Transformations”, Neurocomputing, https://doi.org/10.1016/j.neucom.2018.02.015, 2018
S. You, Y. Matsushita, S. Sinha, Y. Bou and K. Ikeuchi, “Multiview Rectification of Folded Documents”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 2, pp. 505-511, 2018
R. Sun, N. Li, S. Wang, L. Ji and Z. Wang, “The rectification of document images using text-features”, 7th International Conference on Virtual Reality and Visualization, (ICVRV'17), pp. 223-228, 2017
A. Garai, S. Biswas, S. Mandal and B.B. Chaudhuri, “Automatic dewarping of Camera Captured Born-Digital Bangla Document Images”, 9th International Conference on Advances in Pattern Recognition (ICAPR'17), pp. 94-99, 2017
W.T. Dar and M.N.A Khan, “Click-Free, Video-Based Document Capture - Methodology and Evaluation”, 7th International Workshop on Camera-Based Document Analysis and Recognition, (CBDAR'17), pp. 21-26, 2017
P. Yang, A. Antonacopoulos, C. Clausner, S. Pletschacher and J. Qi, “Effective geometric restoration of distorted historical document for large-scale digitisation”, IET Image Processing, vol. 11, no. 10, pp. 841-853, 2017
S. Das, G. Mishra, A. Sudharshana and R. Shilkrot, "The Common Fold: Utilizing the Four-Fold to Dewarp Printed Documents from a Single Image", ACM Symposium on Document Engineering (DocEng'17), pp. 125-128, 2017
H. Eslami, A.A. Raie and K. Faez, "Precise vehicle speed measurement based on a hierarchical homographic transform estimation for law enforcement applications", IEICE Transactions on Information and Systems, vol. E99D, no. 6, pp. 1635-1644, 2016.
G. Meng, S. Xiang, C. Pan and N. Zheng, "Active Rectification of Curved Document Images Using Structured Beams", International Journal of Computer Vision, DOI: 10.1007/s11263-016-0952-z, 2016
S. Kumar, K. Kumar, R.K. Mishra, "Scene Text Recognition using Artificial Neural Network: A Survey", International Journal of Computer Applications, vol. 137, no. 6, pp. 40-50, 2016
S. Calarasanu, S. Dubuisson and J. Fabrizio, "Towards the rectification of highly distorted texts", 11th International Conference on Computer Vision Theory and Applications (VISAPP'15), pp. 1-8, 2016
Z. Huang, J. Gu, G. Meng and C. Pan, "Text line extraction of curved document images using hybrid metric", 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Malaysia, pp. 251-255, 2016
C. Crovato, D. Torok, R. Heidrich, B. Cerqueira and E. Velho, “Preparing for OCR of Books Handled by Visually Impaired”, 10th International Conference Ubiquitous Computing and Ambient Intelligence (UCAmI'16), pp. 419-430, 2016
Q. Ye and D. Doermann, "Text Detection and Recognition in Imagery: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 7, pp. 1480-1500, 2015
G. Meng, Z. Huang, Y. Song, S. Xiang and C. Pan, "Extraction of Virtual Baselines from Distorted Document Images Using Curvilinear Projection", International Conference on Computer Vision (ICCV'15), pp. 3925-3933, 2015
M.K. Alqudah, M.F. Bin Nasrudin, B. Bataineh, M. Alqudah and A. Alkhatatneh, "Investigation of binarization techniques for unevenly illuminated document images acquired via handheld cameras", 2nd International Conference on Computer, Communications, and Control Technology (I4CT'15), pp. 524-529, 2015
M. Fawzi, M.A. Rashwan, H. Ahmed, S. Samir, S.M. Abdou, H.M. Al-Barhamtoshy and K.M. Jambi, "Rectification of Camera Captured Document Images for Camera-Based OCR Technology", 6th International Workshop on Camera Based Document Analysis and Recognition (CBDAR'15), pp. 1226-1230, Nancy, France, 2015
B.S Kim, H.I. Koo and N.I. Cho, “Document Dewarping via Text-line based Optimization”, Pattern Recognition, doi:10.1016/j.patcog.2015.04.026, 2015
M. P. Nevetha and A. Baskar, "Applications of Text Detection and its Challenges: A Review", 3rd International Symposium on Women in Computing and Informatics (WCI '15), pp. 712-721, 2015
Y.S. Lin, K.H. Lo, H.T. Chen and J.H. Chuang, "Vanishing point-based image transforms for enhancement of probabilistic occupancy map-based people localization", IEEE Transactions on Image Processing, vol. 23, no. 12, pp. 5586-5598, 2014
G. Meng, Y. Wang, S. Qu, S. Xiang, C. Pan, "Active flattening of curved document images via two structured beams", Conference on Computer Vision and Pattern Recognition (CVPR'14), Columbus, USA, pp. 3890-3897, 2014
C. Liu, Y. Zhang, B. Wang and X. Ding, “Restoring camera-captured distorted document images”, International Journal on Document Analysis and Recognition (IJDAR), vol. 18, no. 2, pp. 111–124, 2014
Q. Ye, "Text Detection and Recognition in Imagery: A Survey", IEEE Transactions on Pattern Analysis and Machine Intelligence, DOI: 10.1109/TPAMI.2014.2366765, 2014
W. Pan, Z. Lian, R. Sun, Y. Tang, and J. Xiao, "FlexiFont: a flexible system to generate personal font libraries", In Proceedings of the 2014 ACM symposium on Document engineering (DocEng '14), Colorado, USA, pp. 17-20, 2014
L. Zhang, Q. Fan, Y. Li, Y. Uchimura and S. Serikawa, “An Implementation of Document Image Reconstruction System on A Smart Device Using a 1D Histogram Calibration Algorithm”, Mathematical Problems in Engineering, article number 313452, 2014
S. Xie, Y. He, P. Pan, J. Sun and S. Naoi, “Book Inner Boundary Extraction with Modified Active Shape Model”, Pattern Recognition Letters, vol. 45, no. 1, pp. 121–128, 2014
C. Merino-Gracia, M. Mirmehdi and J. Sigut, “Fast Perspective Recovery of Text in Natural Scenes”, Image and Vision Computing, vol. 31, no. 10, pp. 714-724, 2013
L. Tong, Q. Peng, S. Li, H. Zhao and G. Zhan, “Vector constraint and Ncc based Chinese document image mosaic”, Journal of Applied Sciences, vol. 13, no. 9, pp. 1537-1543, 2013
M. Rahnemoonfar and B. Plale, “Automatic performance evaluation of dewarping methods in large scale digitization of historical documents”, 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 331-334, Indiana, USA, July 2013
L. Tong, G. Zhan, Q. Peng, Y. Li and Y. Li, “Warped Document Image Mosaicing Method Based on Inflection Point Detection and Registration”, 4th International Conference on Multimedia Information Networking and Security (MINES'12), pp. 306-310, Nanjing, Jiangsu, China, November 2012
A.M. Abdu and M.M. Mokji, “A novel approach to a dynamic template generation algorithm for multiple-choice forms”, International Conference on Control System, Computing and Engineering (ICCSCE'12), pp. 216-221, Batu Ferringhi, Penang, November 2012
P. Yang, A. Antonacopoulos, C. Clausner and S. Pletschacher, “Grid-based modelling and correction of arbitrarily warped historical document images for large-scale digitisation”, 1st Workshop on Historical Document Imaging and Processing (HIP'11), pp. 106-111, Beijing, China, September 2011
I. Kastelan, M. Katona, D. Marijan and J. Zloh, “Automated optical inspection system for digital TV sets”, EURASIP Journal on Advances in Signal Processing, pp. 140-140, 2011

B. Gatos, N. Stamatopoulos and G. Louloudis, “ICDAR2009 Handwriting Segmentation Contest”, International Journal on Document Analysis and Recognition (IJDAR), vol. 14, no. 1, pp. 25-33, 2011. impact factor: 1.03

The ICDAR 2009 Handwriting Segmentation Contest was organized in the context of the ICDAR2009 conference in order to record recent advances in off-line handwriting segmentation. The contest includes handwritten document images produced by many writers in several languages (English, French, German and Greek). These images are manually annotated in order to produce the ground truth which corresponds to the correct text line and word segmentation result. For the evaluation, a well-established approach is used based on counting the number of matches between the entities detected by the segmentation algorithm and the entities in the ground truth. This paper describes the contest details including the dataset, the ground truth and the evaluation criteria and presents the results of the 12 participating methods as well as of two state-of-the-art algorithms. A description of the winning algorithms is also given.
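
To make the idea of evaluation by counting matches concrete, here is a minimal Python sketch. The abstract does not state the matching criterion, so everything specific here is an assumption for illustration: regions are modelled as sets of pixel identifiers, a ground-truth region and a detected region count as a one-to-one match when their intersection-over-union reaches a threshold, and the scores are a detection rate, a recognition accuracy and their harmonic mean. The name evaluate_segmentation and the default threshold are hypothetical.

```python
def evaluate_segmentation(ground_truth, detected, threshold=0.9):
    """Count one-to-one matches between ground-truth and detected regions.

    ground_truth, detected: lists of sets, each set holding the pixel ids
    (or any hashable ids) that belong to one text line or word.
    Returns (matches, detection_rate, recognition_accuracy, f_measure).
    """
    used = set()      # indices of detected regions that are already matched
    matches = 0
    for gt_region in ground_truth:
        for j, det_region in enumerate(detected):
            if j in used:
                continue
            union = len(gt_region | det_region)
            if union == 0:
                continue
            overlap = len(gt_region & det_region) / union
            if overlap >= threshold:
                matches += 1
                used.add(j)
                break

    # Share of ground-truth entities found, and share of detections that are correct.
    detection_rate = matches / len(ground_truth) if ground_truth else 0.0
    recognition_accuracy = matches / len(detected) if detected else 0.0
    total = detection_rate + recognition_accuracy
    f_measure = 2 * detection_rate * recognition_accuracy / total if total else 0.0
    return matches, detection_rate, recognition_accuracy, f_measure
```

For example, with two ground-truth lines where the second one is split into two detections, only the first line is matched: one match out of two ground-truth entities and three detections. With a threshold above 0.5 and non-overlapping ground-truth regions, a detected region can match at most one ground-truth region, so the greedy loop is sufficient for this sketch.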

G.M. Binmakhashen and S.A. Mahmoud, “Document Layout Analysis: A Comprehensive Survey”, ACM Computing Surveys (CSUR), vol. 52, no. 6, 2019
B.M.K. Sharma and V.S. Dhaka, “Segmentation of handwritten words using structured support vector machine”, Pattern Analysis and Applications, DOI: https://doi.org/10.1007/s10044-019-00843-x, 2019
B. Barakat, A. Droby, M. Kassis and J. El-Sana, “Text Line Segmentation for Challenging Handwritten Document Images Using Fully Convolutional Network”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 374-379, 2018
M.S. Deshmukh, M.P. Patil and S.R. Kolhe, “A hybrid text line segmentation approach for the ancient handwritten unconstrained freestyle Modi script documents”, Imaging Science Journal, vol. 66, no. 7, pp. 433-44, 2018
D. Aldavert and M. Rusinol, “Manuscript text line detection and segmentation using second-order derivatives”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 293-298, 2018
T. Gruning, R. Labahn, M. Diem, F. Kleber and S. Fiel, “READ-BAD: A new dataset and evaluation scheme for baseline detection in archival documents”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 351-356, 2018
A. Vij and J. Pruthi, “An automated Psychometric Analyzer based on Sentiment Analysis and Emotion Recognition for healthcare”, International Conference on Computational Intelligence and Data Science (ICCIDS'18), pp. 1184-1191, 2018
T. Gruning, G. Leifert, T. Strauss and R. Labahn, “A Robust and Binarization-Free Approach for Text Line Detection in Historical Documents”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 236-241, 2017
H.E. Bahi and A. Zatni, “Segmentation and recognition of text images acquired by a mobile phone”, International Journal of Tomography and Simulation, vol. 30, no. 4, pp. 95-107, 2017
J.L. Pach and P. Bilski, “A Robust Binarization and Text Line Detection in Historical Handwritten Documents Analysis”, International Journal of Computing, vol. 3, no. 15, pp. 154-161, 2016
J.P. Pellicer, M.Z. Afzal, M. Liwicki and M.J. Castro-Bleda, “Complete Text Line Extraction with Convolutional Neural Networks and Watershed Transform”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 30-35, Santorini, Greece, 2016
O. Biller, I. Rabaev, K. Kedem, I. Dinstein and J.J. El-Sana, “Evolution maps and applications”, PeerJ, vol. 2016, no. 1, art. no. e39, 2016
Y. Lin, Y. Song, Y. Li, F. Wang and K. He, “Multilingual corpus construction based on printed and handwritten character separation”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 826-830, Nancy, France, 2015
A. Asi, R. Cohen, K. Kedem and J. El-Sana, “Simplifying the Reading of Historical Manuscripts”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 826-830, Nancy, France, 2015
L. Wang, W. Fan, J. Sun, S. Naoi and T. Hiroshi, “Text Line Extraction in Document Images”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 191-195, Nancy, France, 2015
W. Swaileh, K.A. Mohand and T. Paquet, "Multi-script Iterative Steerable Directional Filtering For Handwritten Text Line Extraction", 5th International Workshop on Multilingual OCR (MOCR'15), pp. 1241-1245, Nancy, France, 2015
S. Chandna, D. Tonne, T. Jejkal, R. Stotzka, C. Krause, P. Vanscheidt, H. Busch and A. Prabhune, “Software Workflow for the Automatic Tagging of Medieval Manuscript Images (SWATI)”, Document Recognition and Retrieval XXII, Vol. 9402, 940206, 2015
M.A. Ramírez-Ortegón, L.L. Ramírez-Ramírez, I.B. Messaoud, V. Märgner, E. Cuevas and R. Rojas, “Document Retrieval with Unlimited Vocabulary”, IEEE Winter Conference on Applications of Computer Vision, Waikoloa Beach, USA 2015
R. Cohen, I. Dinstein, J. El-Sana and K. Kedem, “Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents”, 11th International Conference on Image Analysis and Recognition (ICIAR'14), pp. 349-358, 2014
M.A. Ramírez-Ortegón, L.L. Ramírez-Ramírez, I.B. Messaoud, V. Märgner, E. Cuevas and R. Rojas, “A model for the gray-intensity distribution of historical handwritten documents and its application for binarization”, International Journal on Document Analysis and Recognition, vol. 17, no. 2, pp. 139-160, 2014
D. Brodic, Z.N. Milivojevic and D.R. Milivojevic, “Comparison of Two Goal-Oriented Methods for the Evaluation of the Text-Line Segmentation Algorithms”, Przegląd Elektrotechniczny, ISSN 0033-2097, R. 89 NR 6/2013, 2013
D. Brodic, “Methodology for the Evaluation of the Algorithms for Text Line Segmentation Based on Extended Binary Classification”, Measurement Science Review, vol. 11, no. 3, pp. 71-78, 2011

N. Nikolaou, M. Makridis, B. Gatos, N. Stamatopoulos and N. Papamarkos, “Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths”, Image and Vision Computing, vol. 28, no. 4, pp. 590-604, 2010. impact factor: 1.525

In this paper, we strive towards the development of efficient techniques in order to segment document pages resulting from the digitization of historical machine-printed sources. Documents of this kind often suffer from low quality and local skew, several degradations due to the old printing matrix quality or ink diffusion, and exhibit complex and dense layout. To face these problems, we introduce the following innovative aspects: (i) use of a novel Adaptive Run Length Smoothing Algorithm (ARLSA) in order to face the problem of complex and dense document layout, (ii) detection of noisy areas and punctuation marks that are usual in historical machine-printed documents, (iii) detection of possible obstacles formed from background areas in order to separate neighboring text columns or text lines, and (iv) use of skeleton segmentation paths in order to isolate possible connected characters. Comparative experiments using several historical machine-printed documents prove the efficiency of the proposed technique.
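
For readers unfamiliar with run length smoothing, the following Python sketch shows the classic fixed-threshold horizontal RLSA, which the adaptive variant builds on. It is not the ARLSA proposed in the paper, whose adaptation rules are not given in this abstract. The function names and the convention that 1 marks foreground (ink) and 0 marks background are assumptions for this example.

```python
def rlsa_row(row, max_gap):
    """Fill short background runs that lie between two foreground pixels.

    row: list of 0/1 values, where 1 is foreground (ink).
    max_gap: longest background run, in pixels, that gets filled.
    """
    out = list(row)
    last_fg = None                      # index of the previous foreground pixel
    for i, value in enumerate(row):
        if value:
            if last_fg is not None:
                gap = i - last_fg - 1
                if 0 < gap <= max_gap:
                    for k in range(last_fg + 1, i):
                        out[k] = 1
            last_fg = i
    return out


def rlsa_horizontal(image, max_gap):
    """Apply horizontal run length smoothing to every row of a binary image."""
    return [rlsa_row(row, max_gap) for row in image]
```

With max_gap set to 2, the row 1 0 0 1 0 0 0 1 0 becomes 1 1 1 1 0 0 0 1 0: the gap of two pixels is bridged, while the gap of three and the trailing background are left alone. A single global threshold like this tends to merge neighbouring columns in dense layouts, which is the kind of problem an adaptive threshold is meant to address.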

G.M. Binmakhashen and S.A. Mahmoud, “Document Layout Analysis: A Comprehensive Survey”, ACM Computing Surveys (CSUR), vol. 52, no. 6, 2019
R. Goyal, R.K. Narula and M. Kumar Jindal, "An experimental technique for ocr line and word segmentation using probability distribution estimation", International Journal of Recent Technology and Engineering, vol. 8, no. 2, pp. 1484-1494
S.R. Narang, M.K. Jindal and M. Kumar, "Line Segmentation of Devanagari Ancient Manuscripts", National Academy of Sciences, India Section A: Physical Sciences, pp. 1-8, 2019
M. Liulei, K. Moydin, A. Dawut and A. Hamdulla, "The Algorithms for Segmentation of Text-Lines in Handwriting Images", 3rd International Conference on Smart City and Systems Engineering (ICSCSE'18), pp. 919-922, 2018
J. Zheng, X. Miao, S.H. Fang, J. Chen and H Jiang, "Enhanced Character Segmentation for Multi-Language Data Plate in Substation Transformer Based on Connected Component Analysis", 15th International Conference on Control, Automation, Robotics and Vision (ICARCV'18), Singapore, pp. 180-185, 2018
E. Kamalanaban, M. Gopinath and S. Premkumar, "Medicine box: Doctor's prescription recognition using deep machine learning", International Journal of Engineering and Technology (UAE), vol. 7, no. 3.34, pp. 114-117, 2018
B. Arizanović and V. Vučković, "Efficient Compression and Decompression Algorithms for OCR Systems", Facta Universitatis, Series: Electronics and Energetics, vol. 31, no. 3, pp. 461-485, 2018
M. Daldali and A. Souhar, "Handwritten Arabic Documents Segmentation into Text Lines using Seam Carving", International Journal of Interactive Multimedia and Artificial Intelligence (IJIMAI), DOI: 10.9781/ijimai.2018.06.002, 2018
P. Sahare and S.B. Dhok, "Multilingual Character Segmentation and Recognition Schemes for Indian Document Images", IEEE Access, DOI: 10.1109/ACCESS.2018.2795104, 2018
L. Melinda, R. Ghanapuram and C. Bhagvati, “Document Layout Analysis Using Multigaussian Fitting”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 747-752, 2017
F. Simistira, M. Bouillon, M. Seuret, M. Wursch, M. Alberti, R. Ingold and M. Liwicki, “ICDAR2017 Competition on Layout Analysis for Challenging Medieval Manuscripts”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 1361-1370, 2017
V. Vučković and B. Arizanović, "General Character Segmentation Approach for Machine-Typed Documents", 4th International Conference on Electrical, Electronic and Computing Engineering (ETRAN'17), pp. RTI2.2.1-6, 2017
H. Jain and A.P. Kumar, "A Bottom Up Procedure for Text Line Segmentation of Latin Script", International Conference on Advances in Computing, Communications and Informatics (ICACCI'17), 2017
A. Souhar, Y. Boulid, E. Ameur and M.M. Ouagague, "Watershed transform for text lines extraction on binary Arabic handwritten documents", 2nd International Conference on Big Data Cloud and Applications (BDCA'17), 2017
A. Souhar, Y. Boulid, E. Ameur and M.M. Ouagague, "Segmentation of Arabic Handwritten Documents into Text Lines using Watershed Transform", International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 6, pp. 96-102, 2017
N.R. Soora and P.S. Deshpande, "A novel local skew correction and segmentation approach for printed multilingual Indian documents", Alexandria Engineering Journal, https://doi.org/10.1016/j.aej.2017.06.010, 2017
V. Vučković and B. Arizanović, "Efficient Character Segmentation Approach for Machine-Typed Documents", Expert Systems with Applications, http://dx.doi.org/10.1016/j.eswa.2017.03.027, 2017
M. Mehri, P. Héroux, P. Gomez-Krämer and R. Mullot, “Texture feature benchmarking and evaluation for historical document image analysis”, International Journal on Document Analysis and Recognition (IJDAR), DOI: 10.1007/s10032-016-0278-y, 2017
K. Jindal and R. Kumar, “A Novel Shape-Based Character Segmentation Method for Devanagari Script”, Arabian Journal for Science and Engineering, vol. 42, no. 8, pp. 3221-3228, 2017
J. Mtimet and H. Amiri, “A Combined Layer-Based Approach for the Segmentation of Document Images”, Journal of Circuits, Systems, and Computers, vol. 26, no. 10, 2017
P. Sahare and S.B. Dhok, "Review of Text Extraction Algorithms for Scene-text and Document Images", IETE Technical Review, vol. 34, no. 2, pp. 144-164, 2017
Y. Yang, R. Pintus, E. Gobbetti and H. Rushmeier, "Automatic single page-based algorithms for medieval manuscript analysis", Journal on Computing and Cultural Heritage, vol. 10, no. 2, art. no. 9, 2017
A.S. Kavitha, P. Shivakumara, G.H. Kumar and T. Lu, “A New Watershed Model based System for Character Segmentation in Degraded Text Lines”, International Journal of Electronics and Communications, DOI: 10.1016/j.aeue.2016.11.007, 2016
S. Dey, J. Mukherjee and S. Sural, “Consensus-based clustering for document image segmentation”, International Journal on Document Analysis and Recognition (IJDAR), DOI: 10.1007/s10032-016-0275-1, 2016
R. Sharma, “Page Blocks Classification Using Rough Sets”, International Journal of Electrical Electronics & Computer Science Engineering, vol. 3, no. 2, pp. 25-28, 2016
T. Mondal, N. Ragot, J.Y. Ramel and U. Pal, “Flexible Sequence Matching Technique: An Effective Learning-free Approach For word-spotting”, Pattern Recognition, vol. 60, pp. 596-612, 2016
M. Kassis and J. El-Sana, “Complete Text Line Extraction with Convolutional Neural Networks and Watershed Transform”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 239-244, Santorini, Greece, 2016
J.P. Pellicer, M.Z. Afzal, M. Liwicki and M.J. Castro-Bleda, “Complete Text Line Extraction with Convolutional Neural Networks and Watershed Transform”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 30-35, Santorini, Greece, 2016
Z. Liu, F. Cheng and H. Hong, “Identification of Impurities in Fresh Shrimp Using Improved Majority Scheme-Based Classifier”, Journal of Food Analytical Methods, DOI: 10.1007/s12161-016-0497-3, 2016
C. Grouin, “Text segmentation of digitized clinical texts”, 10th edition of the Language Resources and Evaluation Conference (LREC'16), pp. 3592-3599, Portorož, Slovenia, 2016
A. Baig, S. Al-Maadeed, A. Bouridane and M. Cheriet, “Direct Unsupervised Text Line Extraction from Colored Historical Manuscript Images Using DCT”, 13th International Conference on Image Analysis and Recognition (ICIAR'16), pp. 753-762, Portugal, 2016
K. Tanaka and K. Terasawa, "Character recognition of medieval English manuscripts supported by a word frequency table", 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Malaysia, pp. 700-704, 2016
R.R. Nair, B.U. Kota, I. Nwogu and V. Govindaraju, “Segmentation of highly unstructured handwritten documents using a neural network technique”, 23rd International Conference on Pattern Recognition (ICPR'16), pp. 1291-1296, 2016
N. Venkata Rao, A.S.C.S. Sastry, A.S.N. Chakravarthy and A.V. Srinivasa Rao, “Analysis of canonical character segmentation technique for ancient Telugu text documents”, Journal of Theoretical and Applied Information Technology, vol. 82, no. 2, pp. 311-320, 2015
Y. Lin, Y. Li, Y. Song and F. Wang, “Fast document image comparison in multilingual corpus without OCR”, Multimedia Systems, pp. 1-10, DOI: 10.1007/s00530-015-0484-3, 2015
J. Puigcerver, A.H. Toselli and E. Vidal, “ICDAR2015 Competition on Keyword Spotting for Handwritten Documents”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 1176-1180, Nancy, France, 2015
M. Javed, P. Nagabhushan and B.B. Chaudhuri, “A Direct Approach for Word and Character Segmentation in Run-Length Compressed Documents with an Application to Word Spotting”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 216-220, Nancy, France, 2015
J. Wu, F. Da, C. Wang and S. Gai, “Handwritten Character Recognition Based on Weighted Integral Image and Probability Model”, 8th International Conference on Image and Graphics (ICIG'15), China, 2015
R.K. Mohapatra, B. Majhi and S.K. Jena, “Printed Odia Digit Recognition Using Finite Automaton”, 3rd International Conference on Advanced Computing, Networking, and Informatics (ICACNI'15), KIIT University, Orissa, India, 2015
A.B. Shinde and Y.H. Dandawate, “Shirorekha extraction in Character Segmentation for printed devanagri text in Document Image Processing”, 11th IEEE India Conference: Emerging Trends and Innovation in Technology (INDICON'14), Article number 7030535, 2015
M. Mehri, P. Gomez-Krämer, P. Héroux and A. Boucher, “A texture-based pixel labeling approach for historical books”, Pattern Analysis and Applications, DOI: 10.1007/s10044-015-0451-9, 2015
N. Arvanitopoulos and S. Süsstrunk, "Binarization-free Text Line Extraction for Historical Manuscripts", 25th Conference on Digital Humanities (DH2014), pp. 83-85, 2014
P. Duygulu, D. Arifoglu and M. Kalpakli, “Cross-document word matching for segmentation and retrieval of Ottoman divans”, Pattern Analysis and Applications, DOI: 10.1007/s10044-014-0420-8, 2014
N. Arvanitopoulos and S. Süsstrunk, "Seam Carving for Text Line Extraction on Color and Grayscale Historical Manuscripts", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 726-731, Crete, Greece, September 2014
G. Kamola, M. Spytkowski, M. Paradowski and U. Markowska-Kaczmar, “Image-based logical document structure recognition”, Pattern Analysis and Applications, DOI: 10.1007/s10044-014-0412-8, 2014
K. Khankasikam, “Restoration of Degraded Historical Document Image: An Adaptive Multilayer-Information Binarization Technique”, Journal of Information Science and Engineering, vol. 30, no. 5, pp. 1321-1338, 2014
J. Ji, L. Peng and B. Li, “Graph Model Optimization Based Historical Chinese Character Segmentation Method”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 282-286, 2014
A. Fischer, M. Baechler, A. Garz, M. Liwicki and R. Ingold, “A Combined System for Text Line Extraction and Handwriting Recognition in Historical Documents”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 71-75, 2014
G.F. Chen and J.S. Sheu, “An optical music recognition system for traditional Chinese Kunqu Opera scores written in Gong-Che Notation”, Eurasip Journal on Audio, Speech, and Music Processing, vol. 2014, March 2014, Article number 7, 2014
M.A. Ramírez-Ortegón, L.L. Ramírez-Ramírez, I.B. Messaoud, V. Märgner, E. Cuevas and R. Rojas, “A model for the gray-intensity distribution of historical handwritten documents and its application for binarization”, International Journal on Document Analysis and Recognition, vol. 17, no. 2, pp. 139-160, 2014
L. Huang, F. Yin and Q. Chen, “Graph-based ensemble method for text line segmentation in offline Chinese handwritten documents”, Journal of Huazhong University of Science and Technology (Natural Science Edition), vol. 42, no. 3, pp. 33-36, 2014
G.F. Chen, “Intangible cultural heritage preservation: An exploratory study of digitization of the historical literature of Chinese Kunqu opera librettos”, Journal on Computing and Cultural Heritage (JOCCH), vol. 7, no. 1, Article No. 4, 2014
J. Ramya and B. Parvathavarthini, “Feed forward back propagation neural network based character recognition system for tamil palm leaf manuscripts”, Journal of Computer Science, vol. 10, no. 4, pp. 660-670, 2014
V.K. Koppula and A. Negi, “Segmentation of closely set and touching lines in handwritten document images using fringe maps”, International Conference for Convergence of Technology (I2CT'14), Pune, India, 2014
N. Audenaert and N.M. Houston, “VisualPage: Towards large scale analysis of nineteenth-century print culture”, International Conference on Big Data, pp. 9-16, Santa Clara, USA, 2013
M. Javed, P. Nagabhushan and B.B. Chaudhuri, “Extraction of line-word-character segments directly from run-length compressed printed text-documents”, 4th National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG 2013), art. no. 6776195, Rajasthan, India, 2013
M. Baechler, M. Liwicki and R. Ingold, “Text Line Extraction using DMLP Classifiers for Historical Manuscripts”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 1029-1033, Washington DC, USA, August 2013
Y. Mei, X. Wang and J. Wang, “A Chinese Character Segmentation Algorithm for Complicated Printed Documents”, International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 6, no. 3, pp. 91-100, 2013
M.A. Ramírez-Ortegón, V. Märgner, E. Cuevas and R. Rojas, “An optimization for binarization methods by removing binary artifacts”, Pattern Recognition Letters, vol. 34, no. 11, pp. 1299-1306, 2013
Y. Mei, X. Wang and J. Wang, “An Efficient Character Segmentation Algorithm for Printed Chinese Documents”, Ubiquitous Computing and Multimedia Applications, vol. 22, pp. 183-189, 2013
I.B. Messaoud, H. Amiri, H.E. Abed and V. Märgner, “A multilevel text line segmentation framework for handwritten historical documents”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 515-520, Bari, Italy, September 2012
G. Chen, W. Zhang and H. Cui, “Extracting Notes from Chinese Gong-che Notation Musical Score Image Using a Self-adaptive Smoothing and Connected Component Labeling Algorithm”, International Journal of Advancements in Computing Technology, vol. 4, no. 1, pp. 86-95, 2012
S. Dey, J. Mukhopadhyay, S. Sural and P. Bhowmick, “Margin Noise Removal from Printed Document Images”, Workshop on Document Analysis and Recognition (DAR'12), Mumbai, ACM Press, 2012
E. Matthaiou and E. Kavallieratou, “An information extraction system from patient historical documents”, 27th Annual ACM Symposium on Applied Computing (SAC'12), pp. 787-791, Italy, March 2012
G. Dang and X. Cheng, “Research on the robustness and integral performance optimization of PI control system”, Journal of Convergence Information Technology, vol. 7, no. 11, pp. 209-216, 2012
Y. Huang, “Research on the line loss rate prediction technology based on the kernel partial least squares”, Journal of Convergence Information Technology, vol. 7, no. 11, pp. 376-383, 2012
A. Garz, A. Fischer, R. Sablatnig and H. Bunke, "Binarization-Free Text Line Segmentation for Historical Documents Based on Interest Point Clustering", 10th IAPR International Workshop on Document Analysis Systems (DAS'12), pp. 95-99, Gold Coast, Queensland, Australia, 2012
A.O. Rait and K.S. Venkatesh, “Automatic language-independent indexing of documents using image processing”, 7th International Conference on MEMS, NANO and Smart Systems (ICMENS'11), pp. 817-822, Kuala Lumpur, Malaysia, November 2011
M. Rais, N.A. Goussies and M. Mejail, “Using adaptive run length smoothing algorithm for accurate text localization in images”, 16th Iberoamerican Congress on Pattern Recognition (CIARP'11), pp. 149-156, Pucón, Chile, November 2011
P. Soujanya, V.K. Koppula, K. Gaddam and P. Sruthi, “Comparative Study of Text Line Segmentation Algorithms on Low Quality Documents”, International Journal of Computer Science & Informatics, vol. II, pp. 110-116, 2011
C. Neudecker, Z.M. Dogan, S. Schlarb, P. Missier, S. Sufi, A. Williams and K. Wolstencroft, “An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis”, 1st Workshop on Historical Document Imaging and Processing (HIP'11), pp. 161-168, Beijing, China, September 2011
M. Zhao, S. Li and J. Kwok, “Text detection in images using sparse representation with discriminative dictionaries”, Image and Vision Computing, vol. 28, no. 12, pp. 1590-1599, 2010.

N. Stamatopoulos, B. Gatos and S.J. Perantonis, “A Method for Combining Complementary Techniques for Document Image Segmentation”, Pattern Recognition Journal, vol. 42, no. 12, pp. 3158-3168, 2009. impact factor: 2.554

Image segmentation is a major task of handwritten document image processing. Many of the proposed techniques for image segmentation are complementary in the sense that each of them using a different approach can solve different difficult problems such as overlapping, touching components, influence of author or font style etc. In this paper, a combination method of different segmentation techniques is presented. Our goal is to exploit the segmentation results of complementary techniques and specific features of the initial image so as to generate improved segmentation results. Experimental results on line segmentation methods for handwritten documents demonstrate the effectiveness of the proposed combination method.

G.A. Farulla, N. Murru and R. Rossini, “A Fuzzy Approach to Segment Touching Characters”, Expert Systems with Applications, vol. 88, no. 1, pp. 1-13, 2017.
F. Drira and F. LeBourgeois, “Mean-Shift segmentation and PDE-based nonlinear diffusion: toward a common variational framework for foreground/background document image segmentation”, International Journal on Document Analysis and Recognition (IJDAR), DOI: 10.1007/s10032-017-0285-7, 2017.
S. Eskenazi, P. Gomez-Krämer and J.M. Ogier, “A comprehensive survey of mostly textual document segmentation algorithms since 2008”, Pattern Recognition, vol. 67, pp. 1-14, 2017.
N.V. Borse and I.R. Shaikh, “Text Extraction from Handwritten Documents”, International Journal of Engineering, Education and Technology (ARDIJEET), vol. 3, no. 2, 2015.
L. Huang, F. Yin and Q. Chen, “Graph-based ensemble method for text line segmentation in offline Chinese handwritten documents”, Journal of Huazhong University of Science and Technology (Natural Science Edition), vol. 42, no. 3, pp. 33-36, 2014.
T. Kathirvalavakumar and M.K. Selvi, “Efficient touching text line segmentation in Tamil script using horizontal projection”, 1st International Conference on Mining Intelligence and Knowledge Exploration (MIKE'13), pp. 279-288, Tamil Nadu, India, December 2013.
N. Modi and K. Jindal, “Text Line detection and Segmentation in Handwritten Gurumukhi Scripts”, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 5, pp. 1075-1080, 2013.
Y. Zhang and L. Wu, “Fast document image binarization based on an improved adaptive Otsu's method and destination word accumulation”, Journal of Computational Information Systems, vol. 7, no. 6, pp. 1886-1892, 2011.

G. Retsinas, G. Louloudis, N. Stamatopoulos, G. Sfikas and B. Gatos, “An Alternative Deep Feature Approach to Line Level Keyword Spotting”, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR'19), pp. 12658-12666, California, USA, 2019.

Keyword spotting (KWS) is defined as the problem of detecting all instances of a given word, provided by the user either as a query word image (Query-by-Example, QbE) or a query word string (Query-by-String, QbS) in a body of digitized documents. Keyword detection is typically preceded by a preprocessing step where the text is segmented into text lines (line-level KWS). Methods following this paradigm are monopolized by test-time computationally expensive handwritten text recognition (HTR)-based approaches; furthermore, they typically cannot handle image queries (QbE). In this work, we propose a time and storage-efficient, deep feature-based approach that enables both the image and textual search options. Three distinct components, all modeled as neural networks, are combined: normalization, feature extraction and representation of image and textual input into a common space. These components, even if designed on word level image representations, collaborate in order to achieve an efficient line level keyword spotting system. The experimental results indicate that the proposed system is on par with state-of-the-art KWS methods.

G. Retsinas, G. Sfikas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Compact Deep Descriptors for Keyword Spotting”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 315-320, Niagara Falls, USA, 2018.

In this work, we present a novel approach for the extraction of deep features from a Convolutional Neural Network (CNN), designed for the task of Keyword Spotting (KWS). The main novelty of our work concerns the generation of a compact descriptor able to simulate the existence/absence of unigrams or bigrams. This is accomplished using a binary, attribute-based representation of a word string together with an appropriate training procedure. Deep features are extracted from the output of the last convolutional layer and are organized into zones in order to incorporate spatial information of the detected attributes. In addition, a novel optimization scheme is proposed which relies on a very effective initialization of the network generating the compact descriptors. Experiments conducted on the IAM dataset prove the efficiency of the novel compact descriptor since the proposed system’s performance is on par with the state-of-the-art.

G. Retsinas, G. Sfikas, N. Stamatopoulos, G. Louloudis and B. Gatos, “Exploring critical aspects of CNN-based Keyword Spotting. A PHOCNet study”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 13-18, Vienna, Austria, 2018.

Deep convolutional neural networks are today the new baseline for a wide range of machine vision tasks. The problem of keyword spotting is no exception to this rule. Many successful network architectures and learning strategies have been adapted from other vision tasks to create successful keyword spotting systems. In this paper, we argue that various details concerning this adaptation could be reexamined, to the end of building stronger spotting models. In particular, we examine the usefulness of a pyramidal spatial pooling layer versus a simpler approach, and show that a zoning strategy combined with fixed-size inputs can be just as effective while less computationally expensive. We also examine the usefulness of augmentation, class balancing and ensemble learning strategies and propose an improved network. Our hypotheses are tested with numerical experiments on the IAM document collection, where the proposed network outperforms all other existing models.

G. Dinelli, G. Meoni, E. Rapuano, G. Benelli and L. Fanucci, "An FPGA-Based Hardware Accelerator for CNNs Using On-Chip Memories Only: Design and Benchmarking with Intel Movidius Neural Compute Stick", International Journal of Reconfigurable Computing, https://doi.org/10.1155/2019/7218758, 2019
X. Wang, S. Sun, and L. Xie, "Virtual adversarial training for ds-cnn based small-footprint keyword spotting", IEEE Automatic Speech Recognition and Understanding Workshop (ASRU'19), 2019
X. Wang, S. Sun, C. Shan, J. Hou, L. Xie, S. Li and X. Lei, "Adversarial Examples for Improving End-to-end Attention-based Small-footprint Keyword Spotting", IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6366-6370, 2019

G. Retsinas, G. Louloudis, N. Stamatopoulos, G. Sfikas and B. Gatos, “Nonlinear Manifold Embedding on Keyword Spotting using t-SNE”, 14th International Conference on Document Analysis and Recognition (ICDAR'17), pp. 487-492, Kyoto, Japan, 2017.

Nonlinear manifold embedding has attracted considerable attention due to its highly-desired property of efficiently encoding local structure, i.e. intrinsic space properties, into a low-dimensional space. The benefit of such an approach is twofold: it leads to compact representations while addressing the often-encountered curse of dimensionality. The latter plays an important role in retrieval applications, such as keyword spotting, where a sorted list of retrieved objects with respect to a distance metric is required. In this work, we explore the efficiency of the popular manifold embedding method t-distributed Stochastic Neighbor Embedding (t-SNE) on the Query-by-Example keyword spotting task. The main contribution of this work is the extension of t-SNE in order to support out-of-sample (OOS) embedding which is essential for mapping query images to the embedding space. The experimental results demonstrate a significant increase in keyword spotting performance when the word similarity is calculated on the embedding space.

H. Wei, H. Zhang and G. Gao, "Word Image Representation Based on Visual Embeddings and Spatial Constraints for Keyword Spotting on Historical Documents", International Conference on Pattern Recognition (ICPR'18), pp. 3616-3621, 2018

S. Fiel, F. Kleber, M. Diem, V. Christlein, G. Louloudis, N. Stamatopoulos and B. Gatos, “ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)”, 14th International Conference on Document Analysis and Recognition (ICDAR'17), pp. 1377-1382, Kyoto, Japan, 2017.

The ICDAR 2017 Competition on Historical Document Writer Identification is dedicated to recording the most recent advances made in the field of writer identification. The goal of the writer identification task is the retrieval of pages which have been written by the same author. The test dataset used in this competition consists of 3600 handwritten pages originating from the 13th to the 20th century. It contains manuscripts from 720 different writers where each writer contributed five pages. This paper describes the dataset, as well as the details of the competition. Five different institutions submitted six methods which were ranked using identification and retrieval metrics. The paper describes the competition details including the dataset, the evaluation measures used as well as a short description of each submitted method.

S. Das, "A statistical tool based binarization method for document images", Multimedia Tools and Applications, vol. 78, no. 19, pp. 27449–27462, 2019
M. Dahllöf, "Automatic Scribe Attribution for Medieval Manuscripts", Digital Medievalist, vol. 11, no. 1, pp. 6, 2018
G. Abdeljalil, I. Siddiqi, C. Djeddi and S. Al-Maadeed, “Writer Identification on Historical Documents Using Oriented Basic Image Features”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 369-373, 2018

G. Louloudis, G. Sfikas, N. Stamatopoulos and B. Gatos, “Word Segmentation using the Student’s-t Distribution”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 78-83, Santorini, Greece, 2016.

Word segmentation refers to the process of defining the word regions of a text line. It is a critical stage towards word and character recognition as well as word spotting and mainly concerns three basic stages, namely preprocessing, distance computation and gap classification. In this paper, we propose a novel word segmentation method which uses the Student’s-t distribution for the gap classification stage. The main advantage of the Student’s-t distribution concerns its robustness to the existence of outliers. In order to test the efficiency of the proposed method we used the four benchmarking datasets of the ICDAR/ICFHR Handwriting Segmentation Contests as well as a historical typewritten dataset of Greek polytonic text. It is observed that the use of mixtures of Student’s-t distributions for word segmentation outperforms other gap classification methods in terms of Recognition Accuracy and F-Measure. Also, in terms of all examined benchmarks, the Student's-t is shown to produce a perfect segmentation result in significantly more cases than the state-of-the-art Gaussian mixture model.

G. Axler and L. Wolf, "Toward a Dataset-Agnostic Word Segmentation Method", 25th IEEE International Conference on Image Processing (ICIP'18), pp. 2635-2639, 2018

G. Retsinas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Keyword Spotting in Handwritten Documents using Projections of Oriented Gradients”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 411-416, Santorini, Greece, 2016.

In this paper, we present a novel approach for segmentation-based handwritten keyword spotting. The proposed approach relies upon the extraction of a simple yet efficient descriptor which is based on projections of oriented gradients. To this end, a global and a local word image descriptor, together with their combination, are proposed. Retrieval is performed using the Euclidean distance between the descriptors of a query image and the segmented word images. The proposed methods have been evaluated on the dataset of the ICFHR 2014 Competition on handwritten keyword spotting. Experimental results prove the efficiency of the proposed methods compared to several state-of-the-art techniques.
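
The retrieval step described above can be illustrated with a minimal sketch. It assumes that each word image has already been reduced to a fixed-length descriptor; the function name `rank_by_euclidean` and the toy 3-dimensional vectors are hypothetical and stand in for the actual descriptors, which the abstract does not specify in detail.

```python
import math


def rank_by_euclidean(query, candidates):
    """Rank candidate descriptors by increasing Euclidean distance to the query.

    Returns a list of (candidate_index, distance) pairs, closest first.
    """
    scored = [(i, math.dist(query, c)) for i, c in enumerate(candidates)]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical toy descriptors (real descriptors would be much longer).
query = [1.0, 0.0, 0.0]
words = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 2.0, 0.0]]
ranking = rank_by_euclidean(query, words)
```

The sorted list plays the role of the retrieval result: segmented word images whose descriptors lie closest to the query descriptor come first.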

H. El Bahi and A. Zatni, “Text recognition in document images obtained by a smartphone based on deep convolutional and recurrent neural network”, Multimedia Tools and Applications, vol. 78, no. 18, pp. 26453–2648, 2019
V. Thakur and H. Sikarwar, “Deep Learning Feature Extraction for Handwritten Keyword Spotting in Historical Documents”, 2nd International Conference on Emerging Trends in Engineering & Applied Science (ICETEAS'19), vol. 5, no. 1, pp. 11-15, 2019
P. Shivakumara, S. Roy, H.A. Jalab, R.W. Ibrahim, U. Pal, T. Lu, V. Khare and A. Wahab, “Fractional means based method for multi-oriented keyword spotting in video/scene/license plate images”, Expert Systems with Applications, https://doi.org/10.1016/j.eswa.2018.08.015
R. Ahmed, W.G. Al-Khatib and S. Mahmoud, “A Survey on handwritten documents word spotting”, International Journal of Multimedia Information Retrieval, DOI: 10.1007/s13735-016-0110-y, 2016

G. Retsinas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Efficient Document Image Segmentation Representation by Approximating Minimum-Link Polygons”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 293-298, Santorini, Greece, 2016.

The result of a document image segmentation task, e.g. text line or word segmentation, is usually a labeled image with each label corresponding to a different segmented region. For many applications, the segmented regions need to be stored and represented in an efficient way, using simple geometric shapes. A challenging task is to restrict all pixels corresponding to a specific label inside a polygon with a minimum number of vertices. Such a polygon promotes the description simplicity and the storage efficiency, while providing a much more user-friendly representation that can be edited easily. The proposed method is a cost-effective approximation of the minimum-edges polygon problem, computing a contour enclosing only pixels of a certain label and using a greedy algorithm in order to reduce the contour into a minimum-link polygon that retains the separability property between the labeled set of pixels.

G. Sfikas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Bayesian mixture models on connected components for Newspaper article segmentation”, ACM Symposium on Document Engineering (DocEng'16), pp. 143-146, Vienna, Austria, 2016.

In this paper we propose a new method for automated segmentation of scanned newspaper pages into articles. Article regions are produced as a result of merging sub-article level content and title regions. We use a Bayesian Gaussian mixture model to model page Connected Component information and cluster input into sub-article components. The Bayesian model is conditioned on a prior distribution over region features, aiding classification into titles and content. Using a Dirichlet prior, we are able to automatically and correctly estimate the number of title and article regions. The method is tested on a dataset of digitized historical newspapers, where visual experimental results are very promising.

N. Stamatopoulos, G. Louloudis and B. Gatos, “Goal-Oriented Performance Evaluation Methodology for Page Segmentation Techniques”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 281-285, Nancy, France, 2015.

Document image segmentation is a fundamental step in the document image analysis pipeline as it affects the accuracy of subsequent processing steps. An objective and realistic evaluation of page segmentation techniques is crucial for a quantitative comparison among them. In this paper, a goal-oriented performance evaluation methodology that calculates a comprehensive evaluation measure SR (Success Rate) is presented. The SR measure reflects the entire performance of a page segmentation technique in a concise quantitative manner. It is a pixel-based approach which avoids the dependence on a strictly defined ground-truth. The proposed evaluation measure SR deals only with text regions and is correlated with the percentage of the text information in which the subsequent processing (e.g. text line segmentation and recognition) can be applied successfully.

L. Quirós, C.D. Martínez-Hinarejos, A.H. Toselli and E. Vidal, “Interactive Layout Detection”, Iberian Conference on Pattern Recognition and Image Analysis (IbPRIA'17), pp. 161-168, Faro, Portugal, 2017
S. Eskenazi, P. Gomez-Krämer and J.M. Ogier, “Evaluation of the stability of four document segmentation algorithms”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 215-220, Santorini, Greece, 2016

B. Gatos, N. Stamatopoulos, G. Louloudis, G. Sfikas, G. Retsinas, V. Papavassiliou, F. Simistira and V. Katsouros, “GRPOLY-DB: An Old Greek Polytonic Document Image Database”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 646-650, Nancy, France, 2015.

Recognition of old Greek document images containing polytonic (multi accent) characters is a challenging task due to the large number of existing character classes (more than 270) which cannot be handled sufficiently by current OCR technologies. Taking into account that the Greek polytonic system was used from the late antiquity until recently, a large amount of scanned Greek documents still remains without full text search capabilities. In order to assist the progress of relevant research, this paper introduces the first publicly available old Greek polytonic database GRPOLY-DB for the evaluation of several document image processing tasks. It contains both machine-printed and handwritten documents as well as annotation with ground-truth information that can be used for training and evaluation of the most common document image processing tasks, i.e., text line and word segmentation, text recognition, isolated character recognition and word spotting. Results using several representative baseline technologies are also presented in order to help researchers evaluate their methods and advance the frontiers of old Greek document image recognition and word spotting.

P.P. Roy, A.K. Bhunia, A. Bhattacharyya and U. Pal, “Word searching in scene image and video frame in multi-script scenario using dynamic shape coding”, Multimedia Tools and Applications, https://doi.org/10.1007/s11042-018-6484-5, 2018
D. Aldavert and M. Rusinol, “Manuscript text line detection and segmentation using second-order derivatives”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 293-298, 2018
M. Mehri, P. Héroux, R. Mullot, J.P. Moreux, B. Coüasnon, B. Bertrand and B. Barrett, “HBA 1.0: A Pixel-based Annotated Dataset for Historical Book Analysis”, International Workshop on Historical Document Imaging and Processing (HIP'17), pp. 107-112, 2017
R. Ahmed, W.G. Al-Khatib and S. Mahmoud, “A Survey on handwritten documents word spotting”, International Journal of Multimedia Information Retrieval, DOI: 10.1007/s13735-016-0110-y, 2016

G. Retsinas, B. Gatos, N. Stamatopoulos and G. Louloudis, “Isolated Character Recognition using Projections of Oriented Gradients”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 336-340, Nancy, France, 2015.

In this paper, we present a new approach for off-line isolated character recognition. The proposed method relies upon the application of a projection-based feature extraction stage, which resembles the Radon transform, on both the original image and a set of generated images corresponding to different gradient orientations of the original image. For the classification stage, Support Vector Machines (SVMs) are used. The proposed method is evaluated using one typewritten (GRPOLY-DB - Historical Greek) and two handwritten (CIL - Greek, CEDAR - English) publicly available databases. Experimental results prove the efficiency of the proposed method compared to several state-of-the-art techniques.
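The feature extraction idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that only horizontal and vertical projections are taken (instead of Radon-like projections at several angles), that gradients are split into a small number of unsigned orientation bins, and that the function name `oriented_gradient_projections` and the parameter `n_orientations` are hypothetical. The resulting feature vector would then be passed to a classifier such as an SVM, which is not shown here.

```python
import math

def oriented_gradient_projections(image, n_orientations=4):
    """Return row/column projections of the image and of its
    orientation-specific gradient maps (simplified, assumed variant).

    `image` is a list of equal-length rows of grayscale values.
    """
    height, width = len(image), len(image[0])
    # One gradient-magnitude map per orientation bin (the "generated images").
    maps = [[[0.0] * width for _ in range(height)]
            for _ in range(n_orientations)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # central differences
            gy = image[y + 1][x] - image[y - 1][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue
            # Unsigned orientation in [0, pi), quantised into bins.
            angle = math.atan2(gy, gx) % math.pi
            bin_index = min(int(angle / math.pi * n_orientations),
                            n_orientations - 1)
            maps[bin_index][y][x] = magnitude

    features = []
    for channel in [image] + maps:
        # Horizontal projection: one sum per row.
        features.extend(sum(row) for row in channel)
        # Vertical projection: one sum per column.
        features.extend(sum(row[x] for row in channel) for x in range(width))
    return features
```

For a 5x5 image and the default four orientation bins, the vector has (1 + 4) x (5 + 5) = 50 entries: ten for the original image and ten for each orientation map.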

S. Kaur and S. Rani, “Isolated Curved Gurmukhi Character Recognition Using Projection of Gradient”, International Journal of Computational Intelligence Research (IJCIR), vol. 13, no. 6, pp. 1387-1396, 2017

G. Retsinas, B. Gatos, A. Antonacopoulos, G. Louloudis and N. Stamatopoulos, “Historical Typewritten Document Recognition Using Minimal User Interaction”, 3rd International Workshop on Historical Document Imaging and Processing (HIP’15), pp. 31-38, Nancy, France, 2015.

Recognition of low-quality historical typewritten documents can still be considered a challenging and difficult task due to several issues, such as the existence of faint and degraded characters, stains, tears and punch holes. In this paper, we exploit the unique characteristics of historical typewritten documents in order to propose an efficient recognition methodology that requires minimum user interaction. It is based on a pre-processing stage that enhances the quality and extracts connected components, on a semi-supervised clustering for detecting the most representative character samples and on a segmentation-free recognition stage based on a template matching and cross-correlation technique. Experimental results prove that even with minimum user interaction, the proposed method can lead to promising accuracy results.
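As an illustration of the template matching and cross-correlation step mentioned in this abstract, the following is a minimal sketch of zero-mean normalized cross-correlation with an exhaustive sliding window. It is a generic textbook formulation, not the method of the paper; the function names `normalized_cross_correlation` and `best_match` are hypothetical, and images are assumed to be plain lists of rows.

```python
import math

def normalized_cross_correlation(patch, template):
    """Zero-mean NCC between two equally sized 2-D lists; result in [-1, 1]."""
    a = [v for row in patch for v in row]
    b = [v for row in template for v in row]
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        # A flat patch or template carries no shape information.
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom

def best_match(image, template):
    """Slide the template over the image and return (score, row, col)
    of the highest-scoring top-left position."""
    th, tw = len(template), len(template[0])
    best = (-2.0, -1, -1)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = normalized_cross_correlation(patch, template)
            if score > best[0]:
                best = (score, r, c)
    return best
```

In a recognition setting of this kind, each representative character sample would serve as a template, and the label of the best-scoring template would be assigned to the matched region.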

N. Stamatopoulos, G. Louloudis and B. Gatos, “A Novel Transcript Mapping Technique for Handwritten Document Images”, 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 41-46, Crete, Greece, September 2014.

Transcript mapping refers to the process of aligning meaningful units of a handwritten document image (e.g. text lines, words, characters) with the corresponding transcription information. It has many applications such as (i) fast generation of ground truth at different granularity levels and (ii) indexing handwritten collections for document retrieval. In this paper, a novel transcript mapping technique is proposed which is guided by the number of words as well as the characters per word of a text line. The proposed method combines the results of a local and a global approach using a scoring algorithm. The efficiency of the proposed method is demonstrated by experimentation conducted on a known, publicly available dataset, achieving word level alignment accuracy of 99.48%.

R. Cohen, I. Rabaev, J. El-Sana, K. Kedem and I. Dinstein, “Aligning transcript of historical documents using energy minimization”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 266-270, Nancy, France, 2015

B. Gatos, G. Louloudis and N. Stamatopoulos, “Segmentation of Historical Handwritten Documents into Text Zones and Text Lines”, 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 464-469, Crete, Greece, September 2014.

In order to achieve accurate text recognition performance for historical handwritten document images, robust and efficient page segmentation is necessary. In this paper, we propose a text zone detection followed by a text line segmentation method suitable for historical handwritten documents. Our aim is to handle several challenging cases such as horizontal and vertical rule lines overlapping with the text, two column documents and characters of different text lines touching vertically. For text zone detection, we analyze vertical rule lines, connected components as well as vertical white runs while for text line segmentation, we enhance an existing approach based on Hough transform in order to better treat cases of vertical connected characters. Both methods have been proved very promising after an evaluation using a set of historical handwritten documents.

M. Pastor, "Text baseline detection, a single page trained system", Pattern Recognition, vol. 94, pp. 149-161, 2019
S.R. Narang, M.K. Jindal and M. Kumar, "Drop flow method: an iterative algorithm for complete segmentation of Devanagari ancient manuscripts", Multimedia Tools and Applications. DOI: 10.1007/s11042-019-7620-6, 2019
S. Capobianco, L. Scommegna and S. Marinai, "Historical handwritten document segmentation by using a weighted loss", 8th IAPR TC3 workshop on Artificial Neural Networks for Pattern Recognition (ANNPR'18), pp. 395-406, 2018
P. Kahle, S. Colutto, G. Hackl and G. Mühlberger, "Transkribus-a Service Platform for Transcription, Recognition and Retrieval of Historical Documents", 14th International Conference on Document Analysis and Recognition (ICDAR'17), pp. 19-24, 2017
A. Fawzi, M. Pastor and C.D. Martínez-Hinarejos, "Baseline Detection on Arabic Handwritten Documents", ACM Symposium on Document Engineering (DocEng'17), pp. 193-196, 2017
V. Vučković and B. Arizanović, "Efficient Character Segmentation Approach for Machine-Typed Documents", Expert Systems with Applications, doi.org/10.1016/j.eswa.2017.03.027, 2017
A.S. Kavitha, P. Shivakumara, G.H. Kumar and T. Lu, “Text segmentation in degraded historical document images”, Egyptian Informatics Journal, doi:10.1016/j.eij.2015.11.003, 2015
V. Romero, J.A. Sanchez, V. Bosch, K. Depuydt and J. de Does, “Influence of Text Line Segmentation in Handwritten Text Recognition”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 536-540, Nancy, France, 2015

I. Pratikakis, K. Zagoris, B. Gatos, G. Louloudis and N. Stamatopoulos, “ICFHR 2014 Competition on Handwritten KeyWord Spotting (H-KWS 2014)”, 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 814-819, Crete, Greece, September 2014.

H-KWS 2014 is the Handwritten Keyword Spotting Competition organized in conjunction with the ICFHR 2014 conference. The main objective of the competition is to record current advances in keyword spotting algorithms using established performance evaluation measures frequently encountered in the information retrieval literature. The competition comprises two distinct tracks, namely, a segmentation-based and a segmentation-free track. Five (5) distinct research groups have participated in the competition with three (3) methods for the segmentation-based track and four (4) methods for the segmentation-free track. The benchmarking datasets that were used in the contest contain both historical and modern documents from multiple writers. In this paper, the contest details are reported including the evaluation measures and the performance of the submitted methods along with a short description of each method.
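The abstract does not name the specific measures used. As one example of an evaluation measure common in the information retrieval literature, the sketch below computes average precision for a single ranked result list and the mean over several queries. This is a generic formulation under the assumption that each retrieved item is marked relevant or not; the function names are hypothetical and the code is not taken from the contest's evaluation tool.

```python
def average_precision(ranked_relevance, total_relevant=None):
    """Average precision of one ranked result list.

    `ranked_relevance` holds booleans in rank order (True = relevant hit).
    `total_relevant` is the number of relevant items in the collection;
    if omitted, the number of relevant items retrieved is used instead.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    denom = total_relevant if total_relevant is not None else hits
    return precision_sum / denom if denom else 0.0

def mean_average_precision(per_query_relevance):
    """Mean of the average precision over several queries."""
    scores = [average_precision(r) for r in per_query_relevance]
    return sum(scores) / len(scores) if scores else 0.0
```

For a query whose ranked results are relevant, not relevant, relevant, the average precision is (1/1 + 2/3) / 2.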

A. Hast and E. Vats, “Radial line fourier descriptor for historical handwritten text representation”, Journal of WSCG, vol. 26, no. 1, pp. 31-40, 2018
A.H. Toselli, E. Vidal, J. Puigcerver and E. Noya-Garcia, “Probabilistic multi-word spotting in handwritten text images”, Pattern Analysis and Applications, DOI https://doi.org/10.1007/s10044-018-0742-z, 2018
A. Santoro, C.D. Stefano and A. Marcelli, “Assisted Transcription of Historical Documents by Keyword Spotting: A Performance Model”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 971-976, 2017
M.L. Bouined, H. Nemmour and Y. Chibani, "New gradient descriptor for keyword spotting in handwritten documents", 3rd International Conference on Advanced Technologies for Signal and Image (ATSIP'17), 2017
P.P. Roy, A.K. Bhunia, A. Das, P. Dhar and U. Pal, "Keyword spotting in doctor's handwriting on medical prescriptions", Expert Systems with Applications, vol. 76, no. 15, pp. 113-128, 2017
A. Santoro, A. Parziale and A. Marcelli, “A human in the loop approach to historical handwritten documents transcription”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 222-227, 2016
R. Ahmed, W.G. Al-Khatib and S. Mahmoud, “A Survey on handwritten documents word spotting”, International Journal of Multimedia Information Retrieval, DOI: 10.1007/s13735-016-0110-y, 2016
A. Hast and A. Fornes, “A Segmentation-free Handwritten Word Spotting Approach by Relaxed Feature Matching”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 150-155, Santorini, Greece, 2016
J. Puigcerver, A.H. Toselli and E. Vidal, “ICDAR2015 Competition on Keyword Spotting for Handwritten Documents”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 1176-1180, Nancy, France, 2015
E. Vidal, A.H. Toselli and J. Puigcerver, “High Performance Query-by-Example Keyword Spotting Using Query-by-String Techniques”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 741-745, Nancy, France, 2015
S. Yao, Y. Wen and Y. Lu, “HoG based Two-Directional Dynamic Time Warping for Handwritten Word Spotting”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 161-165, Nancy, France, 2015

B. Gatos, N. Stamatopoulos, G. Louloudis and S.J. Perantonis, “H-DocPro: a document image processing platform for historical documents”, 1st International Conference on Digital Access to Textual Cultural Heritage (DATeCH '14), pp. 131-136, Madrid, Spain, May 2014.

In this paper, we introduce the H-DocPro platform which is a publicly available document image processing platform for historical documents. H-DocPro is a result of our recent and on-going research on historical document image processing and has been developed in order to monitor the successive application of several new or state-of-the-art document image processing methods. It is an open architecture software platform that permits several document image processing modules and methods (e.g. binarization, image enhancement, page split) to be utilized in an easy to define processing workflow. We provide detailed information on how to use H-DocPro, the available modules and methods as well as the way one can add his own components exploiting the open architecture form of the platform. Representative examples and experimental results using large sets of historical document images demonstrate the efficiency of H-DocPro methods.

N. Stamatopoulos, G. Louloudis, B. Gatos, U. Pal and A. Alaei, “ICDAR2013 Handwriting Segmentation Contest”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 1402-1406, Washington DC, USA, August 2013.

This paper presents the results of the Handwriting Segmentation Contest that was organized in the context of the ICDAR2013. The general objective of the contest was to use well established evaluation practices and procedures to record recent advances in off-line handwriting segmentation. Two benchmarking datasets, one for text line and one for word segmentation, were created in order to test and compare all submitted algorithms as well as some state-of-the-art methods for handwritten document image segmentation in realistic circumstances. Handwritten document images were produced by many writers in two Latin based languages (English and Greek) and in one Indian language (Bangla, the second most popular language in India). These images were manually annotated in order to produce the ground truth which corresponds to the correct text line and word segmentation results. The datasets of previously organized contests (ICDAR2007, ICDAR2009 and ICFHR2010 Handwriting Segmentation Contests) along with a dataset of Bangla document images were used as training dataset. Eleven methods are submitted in this competition. A brief description of the submitted algorithms, the evaluation criteria and the segmentation results obtained from the submitted methods are also provided in this manuscript.

B.M.K. Sharma and V.S. Dhaka, “Segmentation of handwritten words using structured support vector machine”, Pattern Analysis and Applications, DOI: https://doi.org/10.1007/s10044-019-00843-x, 2019
S. Kundu, S. Paul, S.K. Bera, A. Abraham and R. Sarkar, "Text-line Extraction from Handwritten Document Images using GAN", Expert Systems with Applications, vol. 140, 2020
F. Can and A. Yilmaz, “Hybrid handwriting character recognition with transfer deep learning”, 27th Signal Processing and Communications Applications Conference (SIU'19), Article number 8806364, 2019
M.A. Garcia-Calderon, R.A. Garcia-Hernandez and Y. Ledeneva, “Providing order to the handwritten TLS task: A complexity index”, Journal of Intelligent and Fuzzy Systems, vol. 36, no. 5, pp. 4621-4631, 2019
C. Adak, B.B. Chaudhuri and M. Blumenstein, "An empirical study on writer identification and verification from intra-variable individual handwriting", IEEE Access, vol. 7, pp. 24738-24758, 2019
M. Pastor, "Text baseline detection, a single page trained system", Pattern Recognition, vol. 94, pp. 149-161, 2019
G. Axler and L. Wolf, "Toward a Dataset-Agnostic Word Segmentation Method", 25th IEEE International Conference on Image Processing (ICIP'18), pp. 2635-2639, 2018
R. Saabni, “Robust and efficient text‐line extraction by local minimal sub-seams”, 2nd International Symposium on Computer Science and Intelligent Control (ISCSIC'18), 2018
M.W.A. Kesiman, D. Valy, J.C. Burie, E. Paulus, M. Suryani, S. Hadi, M. Verleysen, S. Chhun and J.M. Ogier, “ICFHR 2018 Competition On Document Image Analysis Tasks for Southeast Asian Palm Leaf Manuscripts”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 483-488, 2018
V. Bosch, V. Romero, A.H. Toselli and E. Vidal, “Text Line Extraction Based on Distance Map Features and Dynamic Programming”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 357-362, 2018
C. Adak, B.B. Chaudhuri and M. Blumenstein, “A Study on Idiosyncratic Handwriting with Impact on Writer Identification”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 193-198, 2018
B. Barakat, A. Droby, M. Kassis and J. El-Sana, “Text Line Segmentation for Challenging Handwritten Document Images Using Fully Convolutional Network”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 374-379, 2018
T. Gruning, R. Labahn, M. Diem, F. Kleber and S. Fiel, “READ-BAD: A new dataset and evaluation scheme for baseline detection in archival documents”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 351-356, 2018
B. Arizanović and V. Vučković, "Efficient Compression and Decompression Algorithms for OCR Systems", Facta Universitatis, Series: Electronics and Energetics, vol. 31, no. 3, pp. 461-485, 2018
G. Renton, Y. Soullard, C. Chatelain, S. Adam, C. Kermorvant and T. Paquet, "Fully Convolutional Network with dilated convolutions for Handwritten text line segmentation", International Journal on Document Analysis and Recognition (IJDAR), 2018
Q.N. Vo, S.H. Kim, H.J. Yang and G. Lee, "Text line segmentation using a fully convolutional network in handwritten document images", IET Image Processing, vol. 12, no. 3, pp. 438-446, 2018
M.W.A. Kesiman, D. Valy, J.C. Burie, E. Paulus, M. Suryani, S. Hadi, M. Verleysen, S. Chhun and J.M. Ogier, "Benchmarking of Document Image Analysis Tasks for Palm Leaf Manuscripts from Southeast Asia", Journal of Imaging, vol. 4, no. 2, article number 43, 2018
D. Valy, M. Verleysen and K. Sok, “Line segmentation for grayscale text images of Khmer palm leaf manuscripts”, 7th International Conference on Image Processing Theory, Tools and Applications (IPTA'17), pp. 1-6, 2017
A. Rehman, "Offline touched cursive script segmentation based on pixel intensity analysis: Character segmentation based on pixel intensity analysis", International Conference on Digital Information Management (ICDIM'17), pp. 324-327, 2017
I. Setitra, A. Meziane, Z. Hadjadj and N. Bengherbia, "Text line segmentation in handwritten documents based on connected components trajectory generation", 6th International Conference on Pattern Recognition Applications and Methods (ICPRAM'17), pp. 222-234, 2017
I.M. Amer, S. Hamdy and M.G.M. Mostafa, "Deep Arabic document layout analysis", 8th International Conference on Intelligent Computing and Information Systems (ICICIS'17), pp. 224-231, 2017
K. Thangairulappan and K. Mohan, "Efficient segmentation of printed Tamil script into characters using projection and structure", 4th International Conference on Image Information Processing (ICIIP'17), pp. 484-489, 2017
V. Chavan and F. Mehrotra, "Text line segmentation of multilingual handwritten documents using fourier approximation", 4th International Conference on Image Information Processing (ICIIP'17), pp. 250-255, 2017
W. Jia, L. Sun, Z. Zhong, X. Mo, G. Ma and Q. Huo, “A Robust Approach to Detecting Text from Images of Whiteboards and Handwritten Notes”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 813-818, 2017
C. Adak, B.B. Chaudhuri and M. Blumenstein, “Legibility and Aesthetic Analysis of Handwriting”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 175-182, 2017
T. Grüning, G. Leifert, T. Strauss and R. Labahn, “A Robust and Binarization-Free Approach for Text Line Detection in Historical Documents”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 236-241, 2017
K.C. Nguyen, C.T. Nguyen and M. Nakagawa, "A segmentation method of single- and multiple-touching characters in offline handwritten Japanese text recognition", IEICE Transactions on Information and Systems, vol. E100D, no. 12, pp. 2962-2972, 2017
B. Ahn, J. Ryu, H.I. Koo and N.I. Cho, "Textline detection in degraded historical document images", EURASIP Journal on Image and Video Processing, vol. 2017, no. 1, pp. 82, 2017
V. Vučković and B. Arizanović, "General Character Segmentation Approach for Machine-Typed Documents", 4th International Conference on Electrical, Electronic and Computing Engineering (ETRAN'17), pp. RTI2.2.1-6, 2017
Y. Li, L. Ma, L. Duan, J. Wu, J. Yang, Q. Hu, M.M. Cheng, L. Wang, Q. Liu, X. Bai and D. Meng, "A Text-Line Segmentation Method for Historical Tibetan Documents Based on Baseline Detection", Chinese Conference on Computer Vision (CCCV'17), pp. 356-367, 2017
R. Pramanik and S. Bag, "Linear Curve Fitting-Based Headline Estimation in Handwritten Words for Indian Scripts", International Conference on Pattern Recognition and Machine Intelligence (PReMI'17), pp. 116-123, 2017
R. Pramanik and S. Bag, "Shape Decomposition-based Handwritten Compound Character Recognition for Bangla OCR", Journal of Visual Communication and Image Representation, vol. 50, pp. 123-134, 2017
H. Jain and A.P. Kumar, "A Bottom Up Procedure for Text Line Segmentation of Latin Script", International Conference on Advances in Computing, Communications and Informatics (ICACCI'17), 2017
L.M. Francisa and N. Sreenatha, "TEDLESS-Text Detection using Least-Square SVM from Natural Scene", Journal of King Saud University - Computer and Information Sciences, https://doi.org/10.1016/j.jksuci.2017.09.001, 2017
A. Souhar, Y. Boulid, E. Ameur and M.M. Ouagague, "Watershed transform for text lines extraction on binary Arabic handwritten documents", 2nd International Conference on Big Data Cloud and Applications (BDCA'17), 2017
Q.N. Vo, S.H. Kim, H.J. Yang and G. Lee, "Binarization of Degraded Document Images based on Hierarchical Deep Supervised Network", Pattern Recognition, DOI: doi.org/10.1016/j.patcog.2017.08.025, 2017
A. Souhar, Y. Boulid, E. Ameur and M.M. Ouagague, "Segmentation of Arabic Handwritten Documents into Text Lines using Watershed Transform", International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 6, pp. 96-102, 2017
R. Amarnath and P. Nagabhushan, "Spotting Separator Points at Line Terminals in Compressed Document Images for Text-line Segmentation", International Journal of Computer Applications, vol. 172, no. 4, pp. 40-47, 2017
C. Adak, B.B. Chaudhuri and M. Blumenstein, "Impact of struck-out text on writer identification", International Joint Conference on Neural Networks (IJCNN'17), pp. 1465-1471, 2017
V. Vučković and B. Arizanović, "Efficient Character Segmentation Approach for Machine-Typed Documents", Expert Systems with Applications, http://dx.doi.org/10.1016/j.eswa.2017.03.027, 2017
S. Eskenazi, P. Gomez-Krämer and J.M. Ogier, “A comprehensive survey of mostly textual document segmentation algorithms since 2008”, Pattern Recognition, vol. 67, pp. 1-14, 2017
P. Sahare and S.B. Dhok, "Review of Text Extraction Algorithms for Scene-text and Document Images", IETE Technical Review, vol 34, no. 2, pp. 144-164, 2017
H.I. Koo, “Text-line Detection in Camera-captured Document Images using the State Estimation of Connected Components”, IEEE Transactions on Image Processing, vol. 25, no. 11, pp. 5358-5369, 2016
S. Hofmann, M. Gropp, D. Bernecker, C. Pollin, A. Maier and V. Christlein, “Vesselness for text detection in historical document images”, International Conference on Image Processing (ICIP'16), pp. 3259-3263, 2016
Y. Boulid, A. Souhar and M.Y. Elkettani, “Arabic handwritten text line extraction using connected component analysis from a multi agent perspective”, International Conference on Intelligent Systems Design and Applications (ISDA'16), pp. 80-87, 2016
B.B. Chaudhuri and C. Adak, “An Approach for Detecting and Cleaning of Struck-out Handwritten Text”, Pattern Recognition, doi:10.1016/j.patcog.2016.07.032, 2016
B. Biswas, U. Bhattacharya and B.B. Chaudhuri, “A Robust Scheme for Extraction of Text Lines from Handwritten Documents”, International Conference on Computer Vision & Image Processing (CVIP'16), pp. 107-116, 2016
K. Kadam, D. Phadatare, A. Mali, P. Nimbalkar and P. Gode, “Detection of Word by Inter - Intra Gap Technique for Handwritten Documents”, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 5, no. 4, pp. 936-939, 2016
P. Choudhary and N. Nain, “A Four-Tier Annotated Urdu Handwritten Text Image Dataset for Multidisciplinary Research on Urdu Script”, ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 4, article no. 26, 2016
Y. Boulid, A. Souhar and M.Y. Elkettani, “Segmentation approach of Arabic manuscripts text lines based on multi agent systems”, International Journal of Computer Information Systems and Industrial Management Applications, vol. 8, no. 1, pp. 173-183, 2016
Y. Boulid, A. Souhar and M.Y. Elkettani, “Detection of Text Lines of Handwritten Arabic Manuscripts using Markov Decision Processes”, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 1, pp. 31-36, 2016
Z. Huang, J. Gu, G. Meng and C. Pan, "Text line extraction of curved document images using hybrid metric", 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Malaysia, pp. 251-255, 2016
B. Moysse, J. Louradour, C. Kermorvant and C. Wolf, “Learning text-line localization with shared and local regression neural networks”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 1-6, 2016
M.W.A. Kesiman, J.C. Burie and J.M. Ogier, “A New Scheme for Text Line and Character Segmentation from Gray Scale Images of Palm Leaf Manuscript”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 325-330, 2016
C. Adak, B. B. Chaudhuri and M. Blumenstein, “Offline Cursive Bengali Word Recognition using CNNs with a Recurrent Model”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 429-434, 2016
A. Abliz, W. Simayi, K. Moydin and A. Hamdulla, “Survey on Methods for Basic Unit Segmentation in Off-Line Handwritten Text Recognition”, International Journal of Future Generation Communication and Networking, vol. 9, no. 11, pp. 137-152, 2016
X. Han, H. Yao and G. Zhong, “Handwritten Text Line Segmentation by Spectral Clustering”, Eighth International Conference on Graphic and Image Processing (ICGIP 2016), 102251A, 2016
T. Wilkinson and A. Brun, “A Novel Word Segmentation Method Based on Object Detection and Deep Learning”, Advances in Visual Computing, 9474, pp. 231-240, 2015
Z. Harbi, Y. Hicks, R. Setchi and A. Bayer, “Segmentation of Clock Drawings Based on Spatial and Temporal Features”, Procedia Computer Science, vol. 60, pp. 1640-1648, 2015
E. Kavallieratou, “Word Segmentation Using Wigner-Ville Distribution”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 701-705, Nancy, France, 2015
V. Romero, J.A. Sanchez, V. Bosch, K. Depuydt and J. de Does, “Influence of Text Line Segmentation in Handwritten Text Recognition”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 536-540, Nancy, France, 2015
W. Swaileh, K.A. Mohand and T. Paquet, "Multi-script Iterative Steerable Directional Filtering For Handwritten Text Line Extraction", 5th International Workshop on Multilingual OCR (MOCR'15), pp. 1241-1245, Nancy, France, 2015
M.K. Sharma and V.P. Dhaka, “Pixel plot and trace based segmentation method for bilingual handwritten scripts using feedforward neural network”, Neural Computing and Applications, DOI 10.1007/s00521-015-1972-2, 2015
K. Mullick, S. Banerjee and U. Bhattacharya, “An efficient line segmentation approach for handwritten Bangla document image”, 8th International Conference on Advances in Pattern Recognition (ICAPR'15), no. 7050679, 2015
R. Pintus, Y. Yang and H. Rushmeier, “ATHENA: Automatic text height extraction for the analysis of text lines in old handwritten manuscripts”, Journal of Computing and Cultural Heritage, vol. 8, no. 1, 2015
B.L. Davis, W.A. Barrett and S.D. Swingle, “Min-cut segmentation of cursive handwriting in tabular documents”, Document Recognition and Retrieval XXII, Vol. 940208, 2015
J. Ryu, H.I. Koo and N.I. Cho, “Word segmentation method for handwritten documents based on structured learning”, IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1161-1165, 2015
R. Cohen, I. Dinstein, J. El-Sana and K. Kedem, “Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents”, 11th International Conference on Image Analysis and Recognition (ICIAR'14), pp. 349-358, 2014
X. Zhang and C.L. Tan, "Text Line Segmentation for Handwritten Documents Using Constrained Seam Carving", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 98-103, Crete, Greece, September 2014
Y. Elarian, A. Zidouri and W. Al-Khatib, "Ground-truth and Metric for the Evaluation of Arabic Handwritten Character Segmentation", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 766-770, Crete, Greece, September 2014
D. Fernández-Mota, J. Almazán, N. Cirera, A. Fornés and J. Lladós, "BH2M: the Barcelona Historical Handwritten Marriages database", International Conference on Pattern Recognition, pp. 256-261, 2014
M. Diem, F. Kleber and R. Sablatnig, "Ruling analysis and classification of torn documents", In Proceedings of the 2014 ACM symposium on Document engineering (DocEng '14), Colorado, USA, pp. 63-72, 2014
J. Ryu, H.I. Koo and N.I. Cho, “Language-Independent Text-Line Extraction Algorithm for Handwritten Documents”, IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1115-1119, 2014
D. Fernández-Mota, J. Lladós and A. Fornés, “A graph-based approach for segmenting touching lines in historical handwritten documents”, International Journal on Document Analysis and Recognition, vol. 17, no. 3, pp. 293-312, 2014
A. Lemaitre, J. Camillerapp and B. Coüasnon, “Handwritten text segmentation using blurred image”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 9021, article number 90210D, Document Recognition and Retrieval XXI, San Francisco, United States, 2014

A. Papandreou, B. Gatos, G. Louloudis and N. Stamatopoulos, “ICDAR2013 Document Image Skew Estimation Contest (DISEC’13)”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 1444-1448, Washington DC, USA, August 2013.

The detection and correction of document skew is one of the most important document image analysis steps. The ICDAR2013 Document Image Skew Estimation Contest (DISEC’13) is the first contest which is dedicated to record advances in the field of skew estimation using well established evaluation performance measures on a variety of printed document images. The benchmarking dataset that is used contains 1550 images that were obtained from various sources such as newspapers, scientific books and travel guides. The document images contain figures, tables, diagrams, architectural plans, electrical circuits, and they are written in various languages such as English, Chinese and Greek. This paper describes the details of the contest including the evaluation measures used as well as the performance of the twelve methods submitted by ten different groups along with a short description of each method.

O. Boudraa, W.K. Hidouci and D. Michelucci, “Using skeleton and Hough transform variant to correct skew in historical documents”, Mathematics and Computers in Simulation, DOI: https://doi.org/10.1016/j.matcom.2019.05.009, 2019
M.A. Garcia-Calderon, R.A. Garcia-Hernandez and Y. Ledeneva, “Unsupervised multi-language handwritten text line segmentation”, Journal of Intelligent and Fuzzy Systems, vol. 34, no. 5, pp. 2901-2911, 2018
B. Sharada, S.N. Sushma and Bharathlal, “Keyword Spotting in Historical Devanagari Manuscripts by Word Matching”, Data Analytics and Learning (DAL'18), pp. 65, India, 2018.
O. Boudraa, W.K. Hidouci and D. Michelucci, “An improved skew angle detection and correction technique for historical scanned documents using morphological skeleton and progressive probabilistic hough transform”, 5th International Conference on Electrical Engineering - Boumerdes (ICEE-B'17), pp. 1-16, 2017
D. Brodic and Z.N. Milivojevic, “Text skew detection using combined entropy algorithm”, Information Technology and Control, vol. 46, no. 3, pp. 308-318, 2017
V. Vučković and B. Arizanović, “Automatic Document Skew Pre-processor for Character Segmentation Algorithm”, Facta Universitatis, Electronics and Energetics, vol. 30, no. 4, pp. 611-625, 2017
S. Eskenazi, P. Gomez-Kramer and J.M. Ogier, “Let’s be done with thresholds!”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 851-855, Nancy, France, 2015
F. Stahlberg and S. Vogel, “Document Skew Detection Based on Hough Space Derivatives”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 366-370, Nancy, France, 2015
R. Pintus, Y. Yang, E. Gobbetti and H. Rushmeier, “An Automatic Word-spotting Framework for Medieval Manuscripts”, 2nd Digital Heritage International Congress, pp. 5-12, Granada, Spain, 2015
D. Brodić, M. Jevtić, Z.N. Milivojević and V. Tasić, “Text Skew Estimation Based on the Horizontal Entropy Calculation”, International Convention on Information and Communication Technology, Electronics and Microelectronics, Adriatic Coast, Croatia, 2015
R. Pintus, Y. Yang, H. Rushmeier, “ATHENA: Automatic text height extraction for the analysis of text lines in old handwritten manuscripts”, Journal of Computing and Cultural Heritage, vol. 8, no. 1, 2015
R. Pintus, Y. Yang, E. Gobbetti and H. Rushmeier, "A TaLISMAN: Automatic Text and LIne Segmentation of historical MANuscripts", 12th Eurographics Workshop on Graphics and Cultural Heritage, Darmstadt, Germany, 2014
J. Fabrizio, "A precise skew estimation algorithm for document images using KNN clustering and fourier transform", International Conference on Image Processing (ICIP'14), pp. 2585-2588, 2014

G. Louloudis, B. Gatos, N. Stamatopoulos and A. Papandreou, “ICDAR2013 Writer Identification Contest”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 1397-1401, Washington DC, USA, August 2013.

Writer identification is important for forensic analysis, helping experts to deliberate on the authenticity of documents. The ICDAR2013 Competition on Writer Identification is part of a competition series (see also the ICDAR2011 and ICFHR2012 Writer Identification Contests) dedicated to recording recent advances in the field of writer identification for Latin scripts using established evaluation performance measures. The benchmarking dataset was created with the help of 250 writers who were asked to copy four parts of text in two Latin based languages (English and Greek). This paper describes the contest details, including the evaluation measures used, as well as the performance of the 12 methods submitted by 6 different groups, along with a short description of each method.

M.L. Bouibed, H. Nemmour and Y. Chibani, “Multiple writer retrieval systems based on language independent dissimilarity learning”, Expert Systems with Applications, DOI: https://doi.org/10.1016/j.eswa.2019.113023, 2019
A. Nicolaou, S. Dey, V. Christlein, A. Maier and D. Karatzas, “Non-deterministic Behavior of Ranking-Based Metrics When Evaluating Embeddings”, 2nd International Workshop on Reproducible Research in Pattern Recognition (RRPR'18), pp. 71-82, 2019
B. Riyadh, V. Eglin and C. Largeron, “Extraction of musical motifs from handwritten music score images”, 14th International Conference on Computer Vision Theory and Applications (VISAPP'19), pp. 428-435, 2019
S. Chen, Y. Wang, C.T. Lin, W. Ding and Z. Cao, “Semi-supervised feature learning for improving writer identification”, Information Sciences, vol. 482, pp. 156-170, 2019
M. Keglevic, S. Fiel and R. Sablatnig, “Learning Features for Writer Retrieval and Identification using Triplet CNNs”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 211-216, 2018
G. Abdeljalil, I. Siddiqi, C. Djeddi and S. Al-Maadeed, “Writer Identification on Historical Documents Using Oriented Basic Image Features”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 369-373, 2018
F. Wahlberg, “Gaussian process classification as metric learning for forensic writer identification”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 175-180, 2018
V. Christlein and A. Maier, “Encoding CNN activations for writer recognition”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 169-174, 2018
K. Ni, P. Callier, B. Hatch, J. Mastarone and J. Cline, "On noise reduction for handwritten writer identification", 51st Asilomar Conference on Signals, Systems and Computers (ACSSC'17), pp. 1984-1988, 2017
H. Mohammed, V. Maergner, T. Konidaris and H.S. Stiehl, “Normalised Local Naïve Bayes Nearest-Neighbour Classifier for Offline Writer Identification”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 1013-1018, 2017
G.J. Tan, G. Sulong and M.S.M. Rahim, "Writer Identification: A comparative study across three world major language", Forensic Science International, https://doi.org/10.1016/j.forsciint.2017.07.034, 2017
K. Ni, P. Callier and B. Hatch, "Writer Identification in Noisy Handwritten Documents", Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, USA, pp. 1177-1186, 2017
V. Christlein, D. Bernecker, F. Hönig, A. Maier and E. Angelopoulou, “Writer Identification Using GMM Supervectors and Exemplar-SVMs”, Pattern Recognition, doi.org/10.1016/j.patcog.2016.10.005, 2016
S. He and L. Schomaker, “Beyond OCR: Multi-faceted understanding of handwritten document characteristics”, Pattern Recognition, DOI: 10.1016/j.patcog.2016.09.017, 2016
D. Siegmund, T. Ebert and N. Damer, “Combining Low-Level Features of Offline Questionnaires for Handwriting Identification”, 13th International Conference on Image Analysis and Recognition (ICIAR'16), pp. 46-54, Portugal, 2016
S. He and L. Schomaker, “General Pattern Run-Length Transform for Writer Identification”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 60-65, Santorini, Greece, 2016
Y. Tang and X. Wu, “Text-independent Writer Identification via CNN Features and Joint Bayesian”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 566-571, 2016
S. He and L. Schomaker, “Co-occurrence features for writer identification”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 78-83, 2016
A. Parziale, A. Santoro and A. Marcelli, “Writer verification in forensic handwriting examination: a pilot study”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 447-452, 2016
V. Christlein, D. Bernecker, A. Maier and E. Angelopoulou, “Offline Writer Identification Using Convolutional Neural Network Activation Features”, 37th German Conference on Pattern Recognition (GCPR'15), vol. 9358, 2015
S. Fiel and R. Sablatnig, “Writer Identification and Retrieval Using a Convolutional Neural Network”, 16th International Conference on Computer Analysis of Images and Patterns (CAIP'15), pp. 26-37, Malta, 2015
C. Djeddi, S. Al-Maadeed, A. Gattal, I. Siddiqi, L. Souici-Meslati and H.E. Abed, “ICDAR2015 Competition on Multi-script Writer Identification and Gender Classification using ‘QUWI’ Database”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 1191-1195, Nancy, France, 2015
V. Christlein, D. Bernecker and E. Angelopoulou, “Writer Identification Using VLAD Encoded Contour-Zernike Moments”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 906-910, Nancy, France, 2015
A. Nicolaou, A.D. Bagdanov, M. Liwicki and D. Karatzas, “Sparse Radial Sampling LBP for Writer Identification”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 716-720, Nancy, France, 2015
Y.J. Xiong, Y. Wen, P.S.P Wang and Y. Lu, “Text-independent Writer Identification Using SIFT Descriptor and Contour-directional Feature”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 91-95, Nancy, France, 2015
C. Adak and B.B. Chaudhuri, “Writer Identification from offline isolated Bangla characters and numerals”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 486-490, Nancy, France, 2015
A. Marcelli, A. Parziale and C.D. Stefano, "Quantitative Evaluation of Features for Forensic Handwriting Examination", 4th International Workshop on Automated Forensic Handwriting Analysis (AFHA’15), pp. 1266-1271, Nancy, France, 2015
F. Wahlberg, L. Mårtensson and A. Brun, "Large scale style based dating of medieval manuscripts", 3rd International Workshop on Historical Document Imaging and Processing (HIP’15), pp. 107-114, Nancy, France, 2015
A. Garz, M. Wursch and R. Ingold, "Training-and Segmentation-Free Intuitive Writer Identification with Task-Adapted Interest Points", 17th Conference of the International Graphonomics Society (IGS'15), 2015
S. He and L. Schomaker, "Delta-n Hinge: Rotation-invariant features for writer identification", International Conference on Pattern Recognition, art. no. 6977065, pp. 2023-2028, 2014
R. Jain and D. Doermann, "Combining Local Features For Offline Writer Identification", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 583-588, Crete, Greece, September 2014
F. Wahlberg, L. Mårtensson and A. Brun, "Scribal Attribution using a Novel 3-D Quill-Curvature Feature Histogram", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 732-737, Crete, Greece, September 2014
V. Christlein, D. Bernecker, F. Hönig and E. Angelopoulou, “Writer identification and verification using GMM supervectors”, 2014 IEEE Winter Conference on Applications of Computer Vision (WACV'14), Steamboat Springs, USA, pp. 998-1005, 2014

G. Louloudis, B. Gatos and N. Stamatopoulos, “ICFHR2012 Competition on Writer Identification - Challenge 1: Latin/Greek Documents”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 825-830, Bari, Italy, September 2012.

Writer identification is important for forensic analysis, helping experts to deliberate on the authenticity of documents. The general objective of the ICFHR 2012 writer identification contest was to record recent advances in the field of writer identification using established evaluation performance measures. Challenge 1 of the contest dealt specifically with Latin scripts. The benchmarking dataset of Challenge 1 was created with the help of 100 writers who were asked to copy four parts of text in two languages (English and Greek). This paper describes the contest details for this challenge, including the evaluation measures used, as well as the performance of the 7 submitted methods along with a short description of each method.

W. Bouamra, C. Djeddi, B. Nini, M. Diaz and I. Siddiqi, “Towards the design of an offline signature verifier based on a small number of genuine samples for training”, Expert Systems with Applications, vol. 107, pp. 182-195, 2018
H. Mohammed, V. Maergner, T. Konidaris and H.S. Stiehl, “Normalised Local Naïve Bayes Nearest-Neighbour Classifier for Offline Writer Identification”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 1013-1018, 2017
G.J. Tan, G. Sulong and M.S.M. Rahim, "Writer Identification: A comparative study across three world major language", Forensic Science International, https://doi.org/10.1016/j.forsciint.2017.07.034, 2017
B. Sober and D. Levin, “Computer aided restoration of handwritten character strokes”, CAD Computer Aided Design, vol. 89, pp. 12-24, 2017
Y. Tang and X. Wu, “Text-independent Writer Identification via CNN Features and Joint Bayesian”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 566-571, 2016
V. Christlein, D. Bernecker, F. Hönig, A. Maier and E. Angelopoulou, “Writer Identification Using GMM Supervectors and Exemplar-SVMs”, Pattern Recognition, doi.org/10.1016/j.patcog.2016.10.005, 2016
A. Inamdar, “Offline Text-Independent Writer Identification”, International Journal of Engineering Applied Sciences and Technology, vol. 1, no. 9, pp. 90-94, 2016
C. Djeddi, I. Siddiqi, S. Al-Maadeed, L. Souici-Meslati, A. Gattal and A. Ennaji, “Signature Verification for Offline Skilled Forgeries Using Textural Features”, 11th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS'15), pp. 76-80, Bangkok, 2015
A. Nicolaou, A.D. Bagdanov, M. Liwicki and D. Karatzas, “Sparse Radial Sampling LBP for Writer Identification”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 716-720, Nancy, France, 2015
Y.J. Xiong, Y. Wen, P.S.P Wang and Y. Lu, “Text-independent Writer Identification Using SIFT Descriptor and Contour-directional Feature”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 91-95, Nancy, France, 2015
M.K. Sharma and V.P. Dhaka, "Offline scripting-free author identification based on speeded-up robust features", International Journal on Document Analysis and Recognition (IJDAR), vol 18, no. 4, pp. 303-316, 2015
K. Gayathri and J. Bhuvana, "Optimization of Signature Recognition in IAM Dataset", International Journal of Innovative Research in Engineering Science and Technology (IJIREST), vol 3, no. 2, pp. 89-93, 201
Y. Tang, W. Bu, X. Wu, "Text-independent writer identification using improved structural features", 9th Chinese Conference on Biometric Recognition (CCBR'14), pp. 404-411, Shenyang, China, 2014
F. Slimane, S. Awaida, A. Mezghani, M.T. Parvez, S. Kanoun, S.A. Mahmoud and V. Märgner, "ICFHR2014 Competition on Arabic Writer Identification Using AHTID/MW and KHATT Databases", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 797-802, Crete, Greece, September 2014
F. Wahlberg, L. Mårtensson and A. Brun, "Scribal Attribution using a Novel 3-D Quill-Curvature Feature Histogram", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 732-737, Crete, Greece, September 2014
C. Djeddi, L.S. Meslati, I. Siddiqi, A. Ennaji, H.E. Abed and A. Gattal, “Evaluation of Texture Features for Offline Arabic Writer Identification”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 106-110, 2014
X. Wu, Y. Tang and W. Bu, "Offline Text-independent Writer Identification Based on Scale Invariant Feature Transform", IEEE Transactions on Information Forensics and Security, vol. 9, no. 3, pp. 526-536, 2014
F. Kleber, S. Fiel, M. Diem and R. Sablatnig, “CVL-Database: An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 560-564, Washington DC, USA, August 2013
S. Fiel and R. Sablatnig, “Writer Identification and Writer Retrieval using the Fisher Vector on Visual Vocabularies”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 545-549, Washington DC, USA, August 2013
A. Nicolaou, M. Liwicki and R. Ingold, “Oriented Local Binary Patterns for Writer Identification”, 2nd International Workshop and Tutorial on Automated Forensic Handwriting Analysis (AFHA'13), Washington DC, USA, August 2013
C. Djeddi, I. Siddiqi, L. Souici-Meslati and A. Ennaji, “Text-Independent Writer Recognition Using Multi-script Handwritten Texts”, Pattern Recognition Letters, vol. 34, no. 10, pp. 1196-1202, 2013

B. Gatos, G. Louloudis and N. Stamatopoulos, “Greek Polytonic OCR based on Efficient Character Class Number Reduction”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 1155-1159, Beijing, China, September 2011.

Recognition of document images containing Greek polytonic (multi-accent) characters is a challenging task due to the large number of existing character classes (more than 270). In this paper, we propose a novel OCR framework for the recognition of machine-printed Greek polytonic documents that is based on combining five different recognition modules in order to have a small number of classes (around 30) in each module. One recognition module is used for accent recognition, while four recognition modules are used for the recognition of characters belonging to different horizontal text zones. The proposed system also includes the following stages: a) preprocessing, b) text dewarping, text line and text baseline detection, c) accent and character detection and d) combination of accent and character recognition results. Extensive experiments have been conducted in order to record the performance of the proposed OCR system, of all involved recognition modules and of the accent detection stage.

B. Robertson and F. Boschetti, "Large-Scale Optical Character Recognition of Ancient Greek", Mouseion, vol. 14, no. 3, pp. 341–359, 2017

G. Louloudis, N. Stamatopoulos and B. Gatos, “ICDAR 2011 - Writer Identification Contest”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 1475-1479, Beijing, China, September 2011.

The ICDAR 2011 Writer Identification Contest is the first contest dedicated to recording recent advances in the field of writer identification using established evaluation performance measures. The benchmarking dataset of the contest was created with the help of 26 writers who were asked to copy eight pages that contain text in several languages (English, French, German and Greek). This paper describes the contest details, including the evaluation measures used, as well as the performance of the 8 submitted methods along with a short description of each method.

M.L. Bouibed, H. Nemmour and Y. Chibani, “Multiple writer retrieval systems based on language independent dissimilarity learning”, Expert Systems with Applications, DOI: https://doi.org/10.1016/j.eswa.2019.113023, 2019
A. Nicolaou, S. Dey, V. Christlein, A. Maier and D. Karatzas, “Non-deterministic Behavior of Ranking-Based Metrics When Evaluating Embeddings”, 2nd International Workshop on Reproducible Research in Pattern Recognition (RRPR'18), pp. 71-82, 2019
A. Bennour, C. Djeddi, A. Gattal, I. Siddiqi and T. Mekhaznia, “Handwriting Based Writer Recognition Using Implicit Shape Codebook”, Forensic Science International, vol. 301, pp. 91-100, 2019
A. Chahi, Y. El Merabet, Y. Ruichek and R. Touahni, “Off-line Text-independent Writer Identification Using Local Convex Micro-Structure Patterns”, Second conference of The Moroccan Classification Society (SMC'18), 2019
S. Chen, Y. Wang, C.T. Lin, W. Ding and Z. Cao, “Semi-supervised feature learning for improving writer identification”, Information Sciences, vol. 482, pp. 156-170, 2019
M.L. Bouibed, H. Nemmour and Y. Chibani, “Evaluation of gradient descriptors and dissimilarity learning for writer retrieval”, 8th International Conference on Information Science and Technology (ICIST'18), pp. 252-256, 2018
F. Khan, F. Khelifi, M. Tahir and A. Bouridane, “Dissimilarity Gaussian Mixture Models for Efficient Offline Handwritten Text-Independent Identification using SIFT and RootSIFT Descriptors”, IEEE Transactions on Information Forensics and Security, 2018
W. Bouamra, C. Djeddi, B. Nini, M. Diaz and I. Siddiqi, “Towards the design of an offline signature verifier based on a small number of genuine samples for training”, Expert Systems with Applications, vol. 107, pp. 182-195, 2018
H. Mohammed, V. Maergner, T. Konidaris and H.S. Stiehl, “Normalised Local Naïve Bayes Nearest-Neighbour Classifier for Offline Writer Identification”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 1013-1018, 2017
A.A. Ahmed, H.R. Hasan, F.A. Hameed and O.I. Al-Sanjary, "Writer Identification on Multi-Script Handwritten Using Optimum Features", Kurdistan Journal of Applied Research - KJAR, vol. 2, no. 3, 2017
G.J. Tan, G. Sulong and M.S.M. Rahim, "Writer Identification: A comparative study across three world major language", Forensic Science International, https://doi.org/10.1016/j.forsciint.2017.07.034, 2017
K. Ni, P. Callier and B. Hatch, "Writer Identification in Noisy Handwritten Documents", Winter Conference on Applications of Computer Vision (WACV), Santa Rosa, USA, pp. 1177-1186, 2017
C. Adak, B.B. Chaudhuri and M. Blumenstein, “Writer identification by training on one script but testing on another”, 23rd International Conference on Pattern Recognition (ICPR'16), pp. 1153-1158, 2016
Y. Tang and X. Wu, “Text-independent Writer Identification via CNN Features and Joint Bayesian”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 566-571, 2016
V. Christlein, D. Bernecker, F. Hönig, A. Maier and E. Angelopoulou, “Writer Identification Using GMM Supervectors and Exemplar-SVMs”, Pattern Recognition, doi.org/10.1016/j.patcog.2016.10.005, 2016
A. Nicolaou, A.D. Bagdanov, L. Gomez-Bigorda and D. Karatzas, “Visual Script and Language Identification”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 393-398, Santorini, Greece, 2016
A. Inamdar, “Offline Text-Independent Writer Identification”, International Journal of Engineering Applied Sciences and Technology, vol. 1, no. 9, pp. 90-94, 2016
C. Djeddi, I. Siddiqi, S. Al-Maadeed, L. Souici-Meslati, A. Gattal and A. Ennaji, “Signature Verification for Offline Skilled Forgeries Using Textural Features”, 11th International Conference on Signal-Image Technology and Internet-Based Systems (SITIS'15), pp. 76-80, Bangkok, 2015
S. Fiel and R. Sablatnig, “Writer Identification and Retrieval Using a Convolutional Neural Network”, 16th International Conference on Computer Analysis of Images and Patterns (CAIP'15), pp. 26-37, Malta, 2015
M.K. Sharma and V.P. Dhaka, "Offline scripting-free author identification based on speeded-up robust features", International Journal on Document Analysis and Recognition (IJDAR), vol 18, no. 4, pp. 303-316, 2015
K. Gayathri and J. Bhuvana, "Optimization of Signature Recognition in IAM Dataset", International Journal of Innovative Research in Engineering Science and Technology (IJIREST), vol 3, no. 2, pp. 89-93, 201
A. Garz, M. Wursch and R. Ingold, "Training-and Segmentation-Free Intuitive Writer Identification with Task-Adapted Interest Points", 17th Conference of the International Graphonomics Society (IGS'15), 2015
S. Al-Maadeed, A. Hassaine and A. Bouridane, “Using codebooks generated from text skeletonization for forensic writer identification”, 11th IEEE/ACS International Conference on Computer Systems and Applications (AICCSA'14), pp. 729-733, Doha, Qatar, November 2014
Y. Tang, W. Bu, X. Wu, "Text-independent writer identification using improved structural features", 9th Chinese Conference on Biometric Recognition (CCBR'14), pp. 404-411, Shenyang, China, 2014
C. Djeddi, L.S. Meslati, I. Siddiqi, A. Ennaji, H.E. Abed and A. Gattal, “Evaluation of Texture Features for Offline Arabic Writer Identification”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 106-110, 2014
M.R. Welekar and M.V.S.D. Rao, “Survey on Existing Techniques for Writer Verification”, International journal of advanced computer technology (COMPUSOFT), vol. 3, no. 5, pp. 773-776, 2014
S. Fiel, F. Hollaus, M. Gau and R. Sablatnig, “Writer identification on historical Glagolitic documents”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 9021, article number 902102, Document Recognition and Retrieval XXI, San Francisco, United States, 2014
H. Ding, H. Wu, X. Zhang and JP. Chen, "Writer Identification Based on Local Contour Distribution Feature", International Journal of Signal Processing, Image Processing and Pattern Recognition, vol. 7, no. 1, pp. 169-180, 2014
X. Wu, Y. Tang and W. Bu, "Offline Text-independent Writer Identification Based on Scale Invariant Feature Transform", IEEE Transactions on Information Forensics and Security, vol. 9, no. 3, pp. 526-536, 2014
A.J. Newell and L.D. Griffin, "Writer identification using oriented basic image features and the delta encoding", Pattern Recognition, vol. 47, no. 6, pp. 2255-2265, 2013
J. Chen and D. Lopresti, “Alternatives for Page Skew Compensation in Writer Identification”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 927-931, Washington DC, USA, August 2013
F. Kleber, S. Fiel, M. Diem and R. Sablatnig, “CVL-Database: An Off-line Database for Writer Retrieval, Writer Identification and Word Spotting”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 560-564, Washington DC, USA, August 2013
S. Fiel and R. Sablatnig, “Writer Identification and Writer Retrieval using the Fisher Vector on Visual Vocabularies”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 545-549, Washington DC, USA, August 2013
Z.A. Daniels and H.S. Baird, “Discriminating Features for Writer Identification”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 1417-1421, Washington DC, USA, August 2013
A. Nicolaou, M. Liwicki and R. Ingold, “Oriented Local Binary Patterns for Writer Identification”, 2nd International Workshop and Tutorial on Automated Forensic Handwriting Analysis (AFHA'13), Washington DC, USA, August 2013
C. Djeddi, I. Siddiqi, L. Souici-Meslati and A. Ennaji, “Text-Independent Writer Recognition Using Multi-script Handwritten Texts”, Pattern Recognition Letters, vol. 34, no. 10, pp. 1196-1202, 2013
D. Hong, F.Y. Yang and X.F. Zhang, "Local fragment distribution features for text-independent writer identification", BioTechnology: An Indian Journal, vol. 8, no. 6, pp. 855-860, 2013
S. Al-Maadeed, W. Ayouby, A. Hassaine and J. Alja’am, “QUWI: An Arabic and English Handwriting Dataset for Offline Writer Identification”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 742-747, Bari, Italy, September 2012
A. Hassaine and S. Al-Maadeed, “ICFHR2012 competition on writer identification - Challenge 2: Arabic scripts”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 835-840, Bari, Italy, September 2012
C. Djeddi, I. Siddiqi, L. Souici-Meslati and A. Ennaji, “Multi-script Writer Identification Optimized With Retrieval Mechanism”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 507-512, Bari, Italy, September 2012
C. Djeddi, L. Souici-Meslati and A. Ennaji, “Writer recognition on arabic handwritten documents”, 5th International Conference on Image and Signal Processing (ICISP'12), pp. 493-501, Agadir, Morocco, 2012

N. Stamatopoulos, G. Louloudis and B. Gatos, “Efficient Transcript Mapping to Ease the Creation of Document Image Segmentation Ground Truth with Text-Image Alignment”, 12th International Conference on Frontiers in Handwriting Recognition (ICFHR'10), pp. 226-231, Kolkata, India, November 2010.

One of the major issues in document image processing is the efficient creation of ground truth for training and evaluation purposes. Since a large number of tools have to be trained and evaluated in realistic circumstances, a quick and low-cost way to create the corresponding ground truth is needed. Moreover, the specific need to have the correct text correlated with the corresponding image area at text line and word level makes ground truth creation a difficult, tedious and costly task. In this paper, we introduce an efficient transcript mapping technique to ease the construction of document image segmentation ground truth that includes text-image alignment. The proposed text line transcript mapping technique is based on a Hough transform that is guided by the number of text lines. For the word segmentation ground truth, a gap classification technique constrained by the number of words is used. Experimental results show that, using the proposed technique for handwritten documents, the percentage of time saved for ground truth creation and text-image alignment is more than 90%.
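
To illustrate the idea of a gap classification constrained by the number of words, the following is a minimal sketch and not the method of the paper. It assumes that each text line is already available as a list of horizontal extents of its connected components, and it applies the simplest possible rule: if the transcript says the line holds N words, the N-1 widest gaps are treated as word separators. The function name `split_line_into_words` and the input format are hypothetical.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (x_start, x_end) of a component on one text line


def split_line_into_words(components: List[Extent], num_words: int) -> List[Extent]:
    """Group components into words using the known word count of the line.

    Assumption for illustration: the (num_words - 1) widest horizontal gaps
    are the word separators; all other gaps are inside words.
    """
    if num_words < 1:
        raise ValueError("num_words must be at least 1")
    if not components:
        return []

    # Merge horizontally overlapping components so that gaps are well defined.
    ordered = sorted(components)
    merged = [list(ordered[0])]
    for start, end in ordered[1:]:
        if start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    # Each gap is (width, index of the component on its left).
    gaps = [(merged[i + 1][0] - merged[i][1], i) for i in range(len(merged) - 1)]

    # Keep only as many separators as the transcript allows.
    k = min(num_words - 1, len(gaps))
    cut_after = sorted(i for _, i in sorted(gaps, reverse=True)[:k])

    words = []
    first = 0
    for i in cut_after:
        words.append((merged[first][0], merged[i][1]))
        first = i + 1
    words.append((merged[first][0], merged[-1][1]))
    return words


if __name__ == "__main__":
    line = [(0, 10), (12, 20), (40, 50), (52, 60), (90, 100)]
    print(split_line_into_words(line, num_words=3))
```

Constraining the number of separators by the transcript means that no gap threshold has to be tuned per writer. A real system would still need to handle lines where punctuation or touching words break the assumption that the widest gaps fall between words.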

A. Vij and J. Pruthi, “An automated Psychometric Analyzer based on Sentiment Analysis and Emotion Recognition for healthcare”, International Conference on Computational Intelligence and Data Science (ICCIDS'18), pp. 1184-1191, 2018
A. Vij and J. Pruthi, “An automated Psychometric Analyzer based on Sentiment Analysis and Emotion Recognition for healthcare”, Procedia Computer Science, vol. 132, pp. 1184-1191, 2018
M. Kassis, J. Nassour and J. El-Sana, “Alignment of Historical Handwritten Manuscripts Using Siamese Neural Network”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 293-298, 2017
M. Seuret, R. Ingold, and M. Liwicki, “N-light-N: A Highly-Adaptable Java Library for Document Analysis with Convolutional Auto-Encoders and Related Architectures”, 15th International Conference on Frontiers in Handwriting Recognition (ICFHR'16), pp. 459-464, 2016
G. Sadeh, L. Wolf, T. Hassner, N. Dershowitz and D.S. Ben-Ezra, “Viral Transcript Alignment”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 711-715, Nancy, France, 2015
W. Swaileh, K.A. Mohand and T. Paquet, "Multi-script Iterative Steerable Directional Filtering For Handwritten Text Line Extraction", 5th International Workshop on Multilingual OCR (MOCR'15), pp. 1241-1245, Nancy, France, 2015
Y. Leydier, V. Églin, S. Bres and D. Stutzmann, "Learning-free text-image alignment for medieval manuscripts", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 363-368, Crete, Greece, September 2014
F. Yin, Q-F. Wang and C-L. Liu, “Transcript mapping for handwritten chinese documents by integrating character recognition model and geometric context”, Pattern Recognition, vol. 46, no. 10, pp. 2807-2818, 2013
X.D. Zhou, F. Yin, D.H. Wang, Q.F. Wang, M. Nakagawa and C.L. Liu, “Transcript Mapping for Handwritten Text Lines Using Conditional Random Fields”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 58-62, Beijing, China, September 2011
S. Vajda, A. Junaidi and G.A. Fink, “A Semi-Supervised Ensemble Learning Approach for Character Labeling with Minimal Human Effort”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 259-263, Beijing, China, September 2011
A. Fischer, V. Frinken, A. Fornés and H. Bunke, “Transcription Alignment of Latin Manuscripts Using Hidden Markov Models”, 1st Workshop on Historical Document Imaging and Processing (HIP'11), pp. 29-36, Beijing, China, September 2011
A. Junaidi, S. Vajda and G.A. Fink, “Lampung - a new handwritten character benchmark: database, labeling and recognition”, Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data (MOCR_AND '11), Beijing, China, September 2011

B. Gatos, N. Stamatopoulos and G. Louloudis, “ICFHR 2010 Handwriting Segmentation Contest”, 12th International Conference on Frontiers in Handwriting Recognition (ICFHR'10), pp. 737-742, Kolkata, India, November 2010.

The general objective of the ICFHR 2010 Handwriting Segmentation Contest, organized in the context of the ICFHR 2010 conference, was to use well-established evaluation practices and procedures in order to record recent advances in off-line handwriting segmentation. Two new benchmarking datasets, one for text line and one for word segmentation, were created in order to test and compare recent algorithms for handwritten document segmentation in realistic circumstances. Handwritten document images were produced by many writers in several languages (English, French, German and Greek). The dataset of the previously organized contest (ICDAR 2009 Handwriting Segmentation Contest) was used as the training dataset. This paper describes the contest details, including the datasets, the ground truth and the evaluation criteria, as well as the performance of the 7 submitted methods along with a short description of each method.

G.M. Binmakhashen and S.A. Mahmoud, “Document Layout Analysis: A Comprehensive Survey”, ACM Computing Surveys (CSUR), vol 52, no. 6, 2019
M. Pastor, "Text baseline detection, a single page trained system", Pattern Recognition, vol. 94, pp. 149-161, 2019
T. Grüning, R. Labahn, M. Diem, F. Kleber and S. Fiel, “READ-BAD: A new dataset and evaluation scheme for baseline detection in archival documents”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 351-356, 2018
T. Grüning, G. Leifert, T. Strauss and R. Labahn, “A Robust and Binarization-Free Approach for Text Line Detection in Historical Documents”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 236-241, 2017
H. Jain and A.P. Kumar, "A Bottom Up Procedure for Text Line Segmentation of Latin Script", International Conference on Advances in Computing, Communications and Informatics (ICACCI'17), 2017
P. Sahare and S.B. Dhok, "Review of Text Extraction Algorithms for Scene-text and Document Images", IETE Technical Review, vol 34, no. 2, pp. 144-164, 2017
Y. Boulid, A. Souhar and M.Y. Elkettani, “Arabic handwritten text line extraction using connected component analysis from a multi agent perspective”, International Conference on Intelligent Systems Design and Applications (ISDA'16), pp. 80-87, 2016
P. Choudhary and N. Nain, “A Four-Tier Annotated Urdu Handwritten Text Image Dataset for Multidisciplinary Research on Urdu Script”, ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 4, article no. 26, 2016
Y. Boulid, A. Souhar and M.Y. Elkettani, “Segmentation approach of Arabic manuscripts text lines based on multi agent systems”, International Journal of Computer Information Systems and Industrial Management Applications, vol. 8, no. 1, pp. 173-183, 2016
Y. Boulid, A. Souhar and M.Y. Elkettani, “Detection of Text Lines of Handwritten Arabic Manuscripts using Markov Decision Processes”, International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 1, pp. 31-36, 2016
W. Swaileh, K.A. Mohand and T. Paquet, "Multi-script Iterative Steerable Directional Filtering For Handwritten Text Line Extraction", 5th International Workshop on Multilingual OCR (MOCR'15), pp. 1241-1245, Nancy, France, 2015
R. Cohen, I. Dinstein, J. El-Sana and K. Kedem, “Using Scale-Space Anisotropic Smoothing for Text Line Extraction in Historical Documents”, 11th International Conference on Image Analysis and Recognition (ICIAR'14), pp. 349-358, 2014
S. Al-Maadeed, A. Hassaine and A. Bouridan, “Using codebooks generated from text skeletonization for forensic writer identification”, 11th IEEE/ACS International Conference on Computer Systems and Applications, (AICCSA'14), pp. 729-733, Doha, Qatar, November 2014
Y. Elarian, A. Zidouri and W. Al-Khatib, "Ground-truth and Metric for the Evaluation of Arabic Handwritten Character Segmentation", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 766-770, Crete, Greece, September 2014
Y. Tang, X. Wu, and W. Bu, “Text Line Segmentation Based on Matched Filtering and Top-Down Grouping for Handwritten Documents”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 365-369, 2014
A. Lemaitre, J. Camillerapp and B. Coüasnon, “Handwritten text segmentation using blurred image”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 9021, article number 90210D, Document Recognition and Retrieval XXI, San Francisco, United States, 2014
M. Diem, F. Kleber, S. Fiel, and R. Sablatnig, “Semi-automated document image clustering and retrieval”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 9021, article number 90210M, Document Recognition and Retrieval XXI, San Francisco, United States, 2014
M. Diem, F. Kleber and R. Sablatnig, “Text Line Detection for Heterogeneous Documents”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 743-747, Washington DC, USA, August 2013
A. Fischer, V. Frinken and H. Bunke, “Hidden markov models for off-line cursive handwriting recognition”, Handbook of Statistics, vol. 31, pp. 421-442, 2013
M. Haji, K.A. Sahoo, T.D. Bui, C.Y. Suen and D. Ponson, “Statistical hypothesis testing for handwritten word segmentation algorithms”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 114-119, Bari, Italy, September 2012
I.B. Messaoud, H. Amiri, H.E. Abed and V. Märgner, “A multilevel text line segmentation framework for handwritten historical documents”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 515-520, Bari, Italy, September 2012
F. Wahlberg and A. Brun, “Graph based line segmentation on cluttered handwritten manuscripts”, 21st International Conference on Pattern Recognition (ICPR 2012), pp. 1570-1573, Tsukuba, Japan, November 2012
F. Simistira, V. Papavassiliou, T. Stafylakis and V. Katsouros, “Enhancing Handwritten Word Segmentation by Employing Local Spatial Features”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 1314-1318, Beijing, China, September 2011

N. Stamatopoulos, B. Gatos and T. Georgiou, “Page Frame Detection for Double Page Document Images”, 9th International Workshop on Document Analysis Systems (DAS'10), pp. 401-408, Boston, MA, USA, June 2010.

Scanning two book pages at the same time helps to accelerate the scanning process but, on the other hand, introduces several difficulties if the user needs to have one page per image. A major difficulty is the appearance of noisy black borders around text areas as well as of noisy black stripes between the two pages. In this paper, we propose a novel algorithm for detecting the page frames on double page document images. Our aim is to split the image into the two pages as well as to remove noisy borders. First, we apply pre-processing that includes binarization, noise removal and image smoothing. Then, we detect the vertical zones of the two pages. In this stage, we introduce the vertical white run projections, which have proved efficient for detecting vertical zones of text areas. Finally, the horizontal zones of the two pages are detected based on horizontal white run projections. The experimental results on several double page document images from fifteen different books demonstrate the effectiveness of the proposed technique.
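
The abstract above does not define the white run projections, so the following Python fragment is only a minimal sketch under an assumed definition: for every image column, sum the lengths of the vertical runs of white pixels that are at least min_run pixels long. The function name, the min_run parameter and the 0 = white / 1 = black convention are illustrative assumptions, not taken from the paper.

```python
def vertical_white_run_projection(image, min_run):
    """Assumed definition: per column, total length of vertical white runs
    that are at least `min_run` pixels long.

    image: list of rows of a binarized page, 0 = white, 1 = black.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    projection = [0] * width
    for x in range(width):
        run = 0  # length of the white run currently being scanned
        for y in range(height):
            if image[y][x] == 0:
                run += 1
            else:
                # a black pixel closes the run; keep it only if long enough
                if run >= min_run:
                    projection[x] += run
                run = 0
        if run >= min_run:  # run that reaches the bottom of the image
            projection[x] += run
    return projection


page = [
    [0, 1, 0],
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0],
]
profile = vertical_white_run_projection(page, min_run=2)
```

Under this assumption, columns crossing text or noisy borders would tend to get low values, while clean margins and the gap between the two pages would get high ones, which is the kind of signal a vertical zone detector could threshold.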

A. Kordecki, “Fast document area detection for scanned images”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 11041, art. no. 1104120, 2019
M.M. Reza, M.A. Rakib, S.S. Bukhari and A. Dengel, “A Robust Page Frame Detection Method for Complex Historical Document Images”, 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM'19), 2019
C. Tensmeyer, B. Davis, C. Wigington, I. Lee and B. Barrett, “PageNet: Page boundary extraction in historical handwritten documents”, International Workshop on Historical Document Imaging and Processing (HIP'17), pp. 59-64, 2017
T. Mondal, N. Ragot, J.Y. Ramel and U. Pal, “Flexible Sequence Matching Technique: An Effective Learning-free Approach For word-spotting”, Pattern Recognition, DOI: 10.1016/j.patcog.2016.05.011, 2016
A. Chakraborty and M. Blumenstein, “Preserving Text Content from Historical Handwritten Documents”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 329-334, Santorini, Greece, 2016
A. Chakraborty and M. Blumenstein, “Marginal Noise Reduction in Historical Handwritten Documents - A Survey”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 323-328, Santorini, Greece, 2016
C. Crovato, D. Torok, R. Heidrich, B. Cerqueira and E. Velho, “Preparing for OCR of Books Handled by Visually Impaired”, 10th International Conference Ubiquitous Computing and Ambient Intelligence (UCAmI'16), pp. 419-430, 2016
M. Wagdy, I. Faye and D. Rohaya, “Border noise removal from the document image using X-Y cut and filtering technique based on morphological operation”, International Journal of Imaging and Robotics, vol.15, no. 3, pp. 88-105, 2015
L.P. Heras, D. Fernandez, A. Fornes, E. Valveny, G. Sanchez and J. Llados, “Perceptual Retrieval of Architectural Floor Plan Images”, 10th IAPR International Workshop on Graphics Recognition, 2013
M. Agrawal and D. Doermann, “Clutter noise removal in binary document images”, International Journal on Document Analysis and Recognition (IJDAR), vol. 16, no. 4, pp. 351-369, 2013
A. Gordo, F. Perronnin and E. Valveny, “Large-scale document image retrieval and classification with runlength histograms and binary embeddings”, Pattern Recognition, 2012
S.S. Bukhari, F. Shafait and T.M. Breuel, “Border Noise Removal of Camera-Captured Document Images using Page Frame Detection”, 4th International Workshop on Camera-Based Document Analysis and Recognition (CBDAR'11), Beijing, China, September 2011

G. Vamvakas, N. Stamatopoulos, B. Gatos and S.J. Perantonis, “Automatic Unsupervised Parameter Selection for Character Segmentation”, 9th International Workshop on Document Analysis Systems (DAS'10), pp. 409-415, Boston, MA, USA, June 2010.

A major difficulty in designing a document image segmentation methodology is the proper value selection for all involved parameters. This is usually done through experimentation or through a supervised training phase, which is a tedious process since the corresponding segmentation ground truth has to be created. In this paper, we propose a novel automatic unsupervised parameter selection methodology that can be applied to the character segmentation problem. It is based on clustering of the entities obtained as a result of the segmentation for different values of the parameters involved in the segmentation method. The clustering is performed using features extracted from the segmented entities based on zones and from the area that is formed from the projections of the upper/lower and left/right profiles. Optimization of an appropriate intra-class distance measure yields the optimal parameter vector. The method is evaluated on two segmentation algorithms, namely a recently proposed character segmentation technique based on skeleton segmentation paths, as well as the well-known RLSA technique. The proposed parameter selection method is capable of finding the segmentation parameters that correspond to the optimal or near optimal segmentation result, as this is determined by counting the number of matches between the entities detected by the segmentation algorithm and the entities in the ground truth.

R.D. Lins and C. Gomes, “Automatic Training Set Generation for Better Historic Document Transcription and Compression”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 277-281, 2014

N. Stamatopoulos, B. Gatos and I. Pratikakis, “A Methodology for Document Image Dewarping Techniques Performance Evaluation”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 956-960, Barcelona, Spain, July 2009.

One of the major challenges in camera document analysis is to deal with the page curl and perspective distortions. In spite of the prevalence of dewarping techniques, no standard method for their performance evaluation exists, and most evaluations concentrate on visually pleasing impressions. This paper presents an objective evaluation methodology for document image dewarping techniques. First, manually selected sets of points of the initial warped image are matched with the corresponding points of the dewarping result using the Scale Invariant Feature Transform (SIFT). Each set corresponds to a representative text line of the image. Then, based on cubic polynomial curves that fit to the selected text lines, a comprehensive measure which reflects the entire performance of a dewarping technique in a concise quantitative manner is calculated. Experiments applying the proposed performance evaluation methodology on two state-of-the-art dewarping techniques as well as a commercial package are presented.

J. Diaz-Escobar and V. Kober, “Optical character recognition of camera-captured images based on phase features”, Applications of Digital Image Processing XXXVIII, Article number 959903, 2015
S.S. Bukhari and A. Dengel, “Visual Appearance based Document Classification Methods: Performance Evaluation and Benchmarking”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 981-985, Nancy, France, 2015
T.V. Vidula and V.V. Nair, “A robust performance evaluation scheme for rectification algorithms in camera captured document images”, 1st International Conference on Computational Systems and Communications (ICCSC '14), pp. 162-166, Trivandrum, India, 2014
A. Pugliese, S. Pomes, S. Ferilli and D. Redavid, “A novel model-based dewarping technique for advanced digital library systems”, Italian Research Conference on Digital Libraries (IRCDL'14), pp. 108-115, Padova, Italy, 2014
M. Rahnemoonfar and B. Plale, “Automatic performance evaluation of dewarping methods in large scale digitization of historical documents”, 13th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'13), pp. 331-334, Indiana, USA, July 2013
L. Tong, G. Zhan, Q. Peng, Y. Li and Y. Li, “Warped document image correction method based on heterogeneous registration strategies”, 5th International Conference on Machine Vision (ICMV'12), 878308, Wuhan, China, October 2012
S.S. Bukhari, F. Shafait and T.M. Breuel, “An image based performance evaluation method for page dewarping algorithms using SIFT features”, 4th International Workshop on Camera-Based Document Analysis and Recognition (CBDAR'11), pp. 138-149, Beijing, China, September 2011
S. Pletschacher and A. Antonacopoulos, “The PAGE (Page Analysis and Ground-Truth Elements) Format Framework”, 20th International Conference on Pattern Recognition (ICPR'10), pp. 257-260, Istanbul, Turkey, August 2010

G. Louloudis, N. Stamatopoulos and B. Gatos, “A Novel Two Stage Evaluation Methodology for Word Segmentation Techniques”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 686-690, Barcelona, Spain, July 2009.

Word segmentation is a critical stage towards word and character recognition as well as word spotting and mainly concerns two basic aspects, distance computation and gap classification. In this paper, we propose a robust evaluation methodology that treats the distance computation and the gap classification stages independently. The detection rate calculated for every distance metric corresponds to the maximum detection rate that we could have achieved if we had a perfect classifier for the gap classification stage. The proposed evaluation framework has been applied to several state-of-the-art techniques using a handwritten as well as a historical typewritten document set. The best combination of distance metric computation and gap classification state-of-the-art techniques is proposed.

K. Thangairulappan and K. Mohan, "Efficient segmentation of printed Tamil script into characters using projection and structure", 4th International Conference on Image Information Processing (ICIIP'17), pp. 484-489, 2017
A. Abliz, W. Simayi, K. Moydin and A. Hamdulla, “Survey on Methods for Basic Unit Segmentation in Off-Line Handwritten Text Recognition”, International Journal of Future Generation Communication and Networking, vol. 9, no. 11, pp. 137-152, 2016
Y. Lin, Y. Li, Y. Song and F. Wang, “Fast document image comparison in multilingual corpus without OCR”, Multimedia Systems, pp. 1-10, DOI: 10.1007/s00530-015-0484-3, 2015
S. Pannirselvam and S. Ponmani, “A Novel Hybrid Model For Tamil Handwritten Character Segmentation”, International Journal of Scientific & Engineering Research, vol. 5, no. 11, pp. 271-275, 2014
S. Gomathi, R.U. Devi and S. Mohanavel, “Trimming approach for word segmentation with focus on overlapping characters”, International Conference on Computer Communications and Informatics (ICCCI'13), pp. 1-4, Coimbatore, India, 2013

B. Gatos, N. Stamatopoulos and G. Louloudis, “ICDAR2009 Handwriting Segmentation Contest”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 1393-1397, Barcelona, Spain, July 2009.

The Handwriting Segmentation Contest was organized in the context of ICDAR2009 conference in order to record recent advances in off-line handwriting segmentation. This paper describes the contest details including the dataset, the ground truth and the evaluation criteria and presents the results of the 12 participating methods. The contest includes handwritten document images produced by many writers in several languages (English, French, German and Greek). These images are manually annotated in order to produce the ground truth which corresponds to the correct text line and word segmentation result. For the evaluation, a well established approach is used based on counting the number of matches between the entities detected by the segmentation algorithm and the entities in the ground truth.
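
The match-counting idea described above can be illustrated with a short Python sketch. The abstract does not give the exact matching rule, so the intersection-over-union score, the 0.9 acceptance threshold, and the derived detection rate, recognition accuracy and F-measure below are stated assumptions for illustration rather than the contest's definitive protocol. Regions are modelled as sets of (row, column) pixel coordinates, and all names are hypothetical.

```python
def match_score(gt_region, result_region):
    """Assumed match score: shared pixels divided by the union of pixels."""
    union = gt_region | result_region
    return len(gt_region & result_region) / len(union) if union else 0.0


def evaluate(gt_regions, result_regions, threshold=0.9):
    """Count one-to-one matches and derive summary rates.

    threshold is an assumed acceptance level, not taken from the abstract.
    Returns (one_to_one, detection_rate, recognition_accuracy, f_measure).
    """
    used = set()  # result regions already paired with a ground-truth region
    one_to_one = 0
    for gt in gt_regions:
        for j, res in enumerate(result_regions):
            if j not in used and match_score(gt, res) >= threshold:
                used.add(j)
                one_to_one += 1
                break
    n, m = len(gt_regions), len(result_regions)
    detection_rate = one_to_one / n if n else 0.0
    recognition_accuracy = one_to_one / m if m else 0.0
    total = detection_rate + recognition_accuracy
    f_measure = 2 * detection_rate * recognition_accuracy / total if total else 0.0
    return one_to_one, detection_rate, recognition_accuracy, f_measure


ground_truth = [{(0, 0), (0, 1), (0, 2), (0, 3)}, {(1, 0), (1, 1)}]
detected = [{(0, 0), (0, 1), (0, 2), (0, 3)}, {(1, 0)}, {(2, 0)}]
matches, dr, ra, fm = evaluate(ground_truth, detected)
```

In this toy case only the first region is matched: the second ground-truth line is split by the detector, so it counts neither as detected nor as a correct result.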

B.M.K. Sharma and V.S. Dhaka, “Segmentation of handwritten words using structured support vector machine”, Pattern Analysis and Applications, DOI: https://doi.org/10.1007/s10044-019-00843-x, 2019
M.A. Garcia-Calderon, R.A. Garcia-Hernandez and Y. Ledeneva, “Providing order to the handwritten TLS task: A complexity index”, Journal of Intelligent and Fuzzy Systems, vol. 36, no. 5, pp. 4621-4631, 2019
M. Pastor, "Text baseline detection, a single page trained system", Pattern Recognition, vol. 94, pp. 149-161, 2019
G. Nagendar, V. Ranjan, G. Harit and C.V. Jawahar, "Efficient query specific dtw distance for document retrieval with unlimited vocabulary", Journal of Imaging, vol. 4, no. 2, 2018
A. Pradhan, S. Behera and P. Pujari, "Comparative study on recent text line segmentation methods of unconstrained handwritten scripts", International Conference on Energy, Communication, Data Analytics and Soft Computing (ICECDS'17), pp. 3853-385, pp. 877-884, 2017
M. Yashoda, S.K. Niranjan and V.N.M. Aradhya, "eLL: Enhanced Linked List---An Approach for Handwritten Text Segmentation", Fourth International Conference on Information Systems Design and Intelligent Applications (INDIA'17), pp. 877-884, 2017
Y. Akbari, M.J. Jalili, J. Sadri, K. Nouri, I. Siddiqi and C. Djeddi, "A novel database for automatic processing of Persian handwritten bank checks", Pattern Recognition, vol. 74, pp. 253-265, 2018
H. Jain and A.P. Kumar, "A Bottom Up Procedure for Text Line Segmentation of Latin Script", International Conference on Advances in Computing, Communications and Informatics (ICACCI'17), 2017
S.M. Obaidullah, C. Halder, K.C. Santosh, N. Das and K. Roy, “PHDIndic_11: page-level handwritten document image dataset of 11 official Indic scripts for script identification”, Multimedia Tools and Applications, DOI: 10.1007/s11042-017-4373-y, 2017
P. Sahare and S.B. Dhok, "Review of Text Extraction Algorithms for Scene-text and Document Images", IETE Technical Review, vol. 34, no. 2, pp. 144-164, 2017
A. Abliz, W. Simayi, K. Moydin and A. Hamdulla, “Survey on Methods for Basic Unit Segmentation in Off-Line Handwritten Text Recognition”, International Journal of Future Generation Communication and Networking, vol. 9, no. 11, pp. 137-152, 2016
K. Kadam, D. Phadatare, A. Mali, P. Nimbalkar and P. Gode, “Detection of Word by Inter - Intra Gap Technique for Handwritten Documents”, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), vol. 5, no. 4, pp. 936-939, 2016
P. Choudhary and N. Nain, “A Four-Tier Annotated Urdu Handwritten Text Image Dataset for Multidisciplinary Research on Urdu Script”, ACM Transactions on Asian and Low-Resource Language Information Processing, vol. 15, no. 4, article no. 26, 2016
N. Aouadi and A. Kacem, “A proposal for touching component segmentation in Arabic manuscripts”, Pattern Analysis and Applications, DOI: 10.1007/s10044-016-0543-1, 2016
P. Barlas, D. Hebert, C. Chatelain, S. Adam and T. Paquet, “Language identification in document images”, Journal of Imaging Science and Technology, vol. 60, no. 1, article number 010407, 2016
A. Joshi and D. Bharadwaj, “A segmentation approach based on structured learning for recognition preprocessing”, International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT'16), pp. 935-939, 2016
A. Masomi, H.R. Ghafari, K. Nouri, Y. Akbari, W. Bouamra and C. Djeddi, “A new database for writer demographics attributes detection based on off-line Persian and English handwriting”, 1st Mediterranean Conference on Pattern Recognition and Artificial Intelligence (MedPRAI'16), pp. 125-130, 2016
N. Aouadi, A.K. Echi and A. Belaid, “A Recognition based Approach for segmenting Touching Components in Arabic Manuscripts”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 21-25, Nancy, France, 2015
J. Ryu, H.I. Koo and N.I. Cho, “Word segmentation method for handwritten documents based on structured learning”, IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1161-1165, 2015
S. Pannirselvam and S. Ponmani, “A Novel Hybrid Model For Tamil Handwritten Character Segmentation”, International Journal of Scientific & Engineering Research, vol. 5, no. 11, pp. 271-275, 2014
N. Aouadi, A. Kacem and A. Belaïd, "Segmentation of Touching Component in Arabic Manuscripts", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 452-457, Crete, Greece, September 2014
D. Hebert, P. Barlas, C. Chatelain, S. Adam and T. Paquet, "Writing type and language identification in heterogeneous and complex documents", 14th International Conference on Frontiers in Handwriting Recognition (ICFHR'14), pp. 411-416, Crete, Greece, September 2014
Y. Tang, X. Wu, and W. Bu, “Text Line Segmentation Based on Matched Filtering and Top-Down Grouping for Handwritten Documents”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 365-369, 2014
A. Fischer, M. Baechler, A. Garz, M. Liwicki and R. Ingold, “A Combined System for Text Line Extraction and Handwriting Recognition in Historical Documents”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 71-75, 2014
J. Ryu, H.I. Koo and N.I. Cho, “Language-Independent Text-Line Extraction Algorithm for Handwritten Documents”, IEEE Signal Processing Letters, vol. 21, no. 9, pp. 1115-1119, 2014
D. Fernández-Mota, J. Lladós and A. Fornés, “A graph-based approach for segmenting touching lines in historical handwritten documents”, International Journal on Document Analysis and Recognition, vol. 17, no. 3, pp. 293-312, 2014
A. Lemaitre, J. Camillerapp and B. Coüasnon, “Handwritten text segmentation using blurred image”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 9021, article number 90210D, Document Recognition and Retrieval XXI, San Francisco, United States, 2014
Y. Wu, S. Zha, H. Cao, D. Liu, and P. Natarajan, “A Markov chain based line segmentation framework for handwritten character recognition”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 9021, article number 90210C, Document Recognition and Retrieval XXI, San Francisco, United States, 2014
M. Diem, F. Kleber, S. Fiel, and R. Sablatnig, “Semi-automated document image clustering and retrieval”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 9021, article number 90210M, Document Recognition and Retrieval XXI, San Francisco, United States, 2014
F. Cruz and O.R. Terrades, “Handwritten Line Detection via an EM algorithm”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 718-722, Washington DC, USA, August 2013
M. Diem, F. Kleber and R. Sablatnig, “Text Line Detection for Heterogeneous Documents”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 743-747, Washington DC, USA, August 2013
B. Moysset and C. Kermorvant, “On the evaluation of handwritten text line detection algorithms”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 185-189, Washington DC, USA, August 2013
I. Rabaev, O. Biller, J. El-Sana, K. Kedem and I. Dinstein, “Text Line Detection in Corrupted and Damaged Historical Manuscripts”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 812-816, Washington DC, USA, August 2013
X. Peng, H. Cao, S. Setlur, V. Govindaraju and P. Natarajan, “Multilingual OCR research and applications: an overview”, 4th International Workshop on Multilingual OCR (MOCR'13), Washington DC, USA, August 2013
L. Kang, J. Kumar, P. Ye and D. Doermann, “Learning text-line segmentation using codebooks and graph partitioning”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 63-68, Bari, Italy, September 2012
I.B. Messaoud, H. Amiri, H.E. Abed and V. Märgner, “A multilevel text line segmentation framework for handwritten historical documents”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 515-520, Bari, Italy, September 2012
C. Djeddi, I. Siddiqi, L. Souici-Meslati and A. Ennaji, “Multi-script Writer Identification Optimized With Retrieval Mechanism”, 13th International Conference on Frontiers in Handwriting Recognition (ICFHR'12), pp. 507-512, Bari, Italy, September 2012
A. Alaei, U. Pal and P. Nagabhushan, “Dataset and ground truth for handwritten text in four different scripts”, International Journal of Pattern Recognition and Artificial Intelligence, vol. 26, no. 4, Article number 1253001, 2012
R. Sarkar, N. Das, S. Basu, M. Kundu, M. Nasipuri and D.K. Basu, “CMATERdb1: a database of unconstrained handwritten Bangla and Bangla–English mixed script document image”, International Journal on Document Analysis and Recognition, vol. 15, no. 1, pp. 71-83, 2012
L. Kang and D. Doermann, “Template based Segmentation of Touching Components in Handwritten Text Lines”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 569-573, Beijing, China, September 2011
F. Simistira, V. Papavassiliou, T. Stafylakis and V. Katsouros, “Enhancing Handwritten Word Segmentation by Employing Local Spatial Features”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 1314-1318, Beijing, China, September 2011
V. Manohar, S.N. Vitaladevuni, H. Cao, R. Prasad and P. Natarajan, “Graph Clustering-based Ensemble Method for Handwritten Text Line Segmentation”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 574-578, Beijing, China, September 2011
Y. Gao, X. Ding and C. Liu, “A Multi-scale Text Line Segmentation Method in Freestyle Handwritten Documents”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 643-647, Beijing, China, September 2011
A. Alaei, P. Nagabhushan and U. Pal, “A Benchmark Kannada Handwritten Document Dataset and its Segmentation”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 141-145, Beijing, China, September 2011
J. Kumar, L. Kang, D. Doermann and W. Abd-Almageed, “Segmentation of Handwritten Textlines in Presence of Touching Components”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 109-113, Beijing, China, September 2011
A. Alaei, P. Nagabhushan and U. Pal, “Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents”, Pattern Analysis & Applications, vol. 14, no. 4, pp. 381-394, 2011
A. Alaei, P. Nagabhushan and U. Pal, “A new dataset of Persian handwritten documents and its segmentation”, 7th Iranian Conference on Machine Vision and Image Processing (MVIP'11), pp. 1-5, Tehran, Iran, November 2011
T.D. Nguyen and G. Lee, “Text line segmentation in handwritten document images using tensor voting”, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, vol. E94-A, no. 11, pp. 2434-2441, 2011
E. Kavallieratou and F. Daskas, “Text Line Detection and Segmentation: Uneven Skew Angles and Hill-and-Dale Writing”, Journal of Universal Computer Science, vol. 17, no. 1, pp. 16-29, 2011
A. Lemaitre, J. Camillerapp and B. Coüasnon, “A perceptive method for handwritten text segmentation”, Document Recognition and Retrieval XVIII - Electronic Imaging, San Francisco, United States, Article number 78740C, January 2011
A. Alaei, U. Pal and P. Nagabhushan, “A new scheme for unconstrained handwritten text-line segmentation”, Pattern Recognition, vol. 44, no. 4, pp. 917-928, 2011
P. Nagabhushan and A. Alaei, "Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique", International Journal on Computer Science and Engineering (IJCSE), vol. 2, no. 04, pp. 907-916, 2010
V. Papavassiliou, V. Katsouros and G. Carayannis, “A Morphological Approach for Text-Line Segmentation in Handwritten Documents”, 12th International Conference on Frontiers in Handwriting Recognition (ICFHR'10), pp. 19-24, Kolkata, India, November 2010

N. Stamatopoulos, G. Louloudis and B. Gatos, “A Comprehensive Evaluation Methodology for Noisy Historical Document Recognition Techniques”, 3rd Workshop on Analytics for Noisy Unstructured Text Data (AND'09), pp. 47-54, Barcelona, Spain, July 2009.

In this paper, we propose a new comprehensive methodology in order to evaluate the performance of noisy historical document recognition techniques. We aim to evaluate not only the final noisy recognition result but also the main intermediate stages of text line, word and character segmentation. For this purpose, we efficiently create the text line, word and character segmentation ground truth guided by the transcription of the historical documents. The proposed methodology consists of (i) a semi-automatic procedure in order to detect the text line, word and character segmentation ground truth regions making use of the correct document transcription, (ii) calculation of proper evaluation metrics in order to measure the performance of the final OCR result as well as of the intermediate segmentation stages. The semi-automatic procedure for detecting the ground truth regions has been evaluated and proved efficient and time saving. Experimental results prove that using the proposed technique, the percentage of time saved for the text line, word and character segmentation ground truth creation is more than 90%. An analytic experiment using a commercial OCR engine applied to a historical book is also presented.

C. Biswas, P.S. Mukherjee, K. Ghosh, U. Bhattacharya and S.K. Parui, “A Hybrid Deep Architecture for Robust Recognition of Text Lines of Degraded Printed Documents”, International Conference on Pattern Recognition (ICPR'18), pp. 3174-3179, 2018
F.H.F. Wu, “Applying Machine Learning in Optical Music Recognition of Numbered Music Notation”, International Journal of Multimedia Data Engineering and Management (IJMDEM), vol. 8, no. 3, 2017
R.C. Carrasco, “An open-source OCR evaluation tool”, First International Conference on Digital Access to Textual Cultural Heritage (DATeCH '14), pp. 179-184, 2014
T. Shima, K. Terasawa and T. Kawashima, “Image Processing for Historical Newspaper Archives”, 1st Workshop on Historical Document Imaging and Processing (HIP'11), pp. 127-132, Beijing, China, September 2011

N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, “A Two-Step Dewarping of Camera Document Images”, 8th International Workshop on Document Analysis Systems (DAS'08), pp. 209-216, Nara, Japan, September 2008.

Dewarping of camera document images has attracted a lot of interest over the last few years since warping not only reduces the document readability but also affects the accuracy of an OCR application. In this paper, a two-step approach for efficient dewarping of camera document images is presented. In the first step, a coarse dewarping is accomplished with the help of a transformation model which maps the projection of a curved surface to a 2D rectangular area. The projection of the curved surface is delimited by the two curved lines which fit the top and bottom text lines along with the two straight lines which fit the left and right text boundaries. In the second step, fine dewarping is achieved based on word detection. All words are pose-normalized guided by the lower and upper word baselines. Experimental results on several camera document images demonstrate the robustness and effectiveness of the proposed technique.
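
To make the coarse step more concrete, the Python sketch below maps a point lying between a top and a bottom curve, and between a left and a right boundary, into an upright rectangle. It is a deliberately simplified illustration and not the paper's transformation model: it assumes vertical left and right boundaries and a plain linear interpolation between the two curves, and every name (coarse_dewarp_point, top_curve, bottom_curve, out_width, out_height) is hypothetical.

```python
def coarse_dewarp_point(x, y, top_curve, bottom_curve,
                        x_left, x_right, out_width, out_height):
    """Map a point of the warped text area to rectangle coordinates (u, v).

    Simplifying assumptions (not from the paper): the left/right boundaries
    are the vertical lines x = x_left and x = x_right, and the vertical
    position is interpolated linearly between the two curves.
    top_curve and bottom_curve are callables returning the curve's y at x.
    """
    top = top_curve(x)
    bottom = bottom_curve(x)
    if bottom <= top or x_right <= x_left:
        raise ValueError("degenerate text area")
    u = (x - x_left) / (x_right - x_left) * out_width
    v = (y - top) / (bottom - top) * out_height
    return u, v


# Toy page curl: both curves sag by the same parabola (illustrative values).
def top(x):
    return 10 + (x - 50) ** 2 / 100


def bottom(x):
    return 110 + (x - 50) ** 2 / 100


centre = coarse_dewarp_point(50, 60, top, bottom, 0, 100, 200, 100)
corner = coarse_dewarp_point(0, 35, top, bottom, 0, 100, 200, 100)
```

A point on the top curve lands on the top edge of the rectangle and a point midway between the curves lands on its horizontal centre line, which is the flattening effect the coarse step aims for; the fine, word-level correction is outside the scope of this sketch.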

V.K.B. Ramanna, S. Bukhari and A. Dengel, “Document image dewarping using deep learning”, 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM'19), pp. 524-531, 2019
L.M. Laskov, “Methods for document image de-warping”, Astronomical and Astrophysical Transactions, vol. 30, no. 4, pp. 511-522, 2018
R. Sun, S. Wang, L. Ji and Z. Wang, “Multi-scale document image rectification utilising text-features”, Electronics Letters, vol. 54, no. 8, pp. 502-503, 2018
H.C. Vinod and S.K. Niranjan, “De-warping of camera captured document images”, 21st IEEE International Symposium on Consumer Electronics (ISCE'17), pp. 13-18, 2017
H.I. Koo and N.I. Cho, “Document image rectification using single-view or two-view camera input”, Computational Photography: Methods and Applications (Book Chapter), pp. 313-338, 2017
T. Kil, W. Seo, H.I. Koo and N.I. Cho, “Robust Document Image Dewarping Method Using Text-Lines and Line Segments”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 865-870, 2017
F. Bolelli, G. Borghi and C. Grana, "XDOCS: An Application to Index Historical Documents", Italian Research Conference on Digital Libraries and Multimedia Archives (IRCDL'18), 2018
S.H. Lee, D. Kim, S. Jadhav and S. Lee, “A restoration method for distorted comics to improve comic contents identification”, International Journal on Document Analysis and Recognition (IJDAR), DOI https://doi.org/10.1007/s1003, 2017
F. Bolelli, “Indexing of Historical Document Images: Ad Hoc Dewarping Technique for Handwritten Text”, 13th Italian Research Conference on Digital Libraries (IRCDL'17), pp. 45-55, 2017
B.S. Kim, H.I. Koo and N.I. Cho, “Document Dewarping via Text-line based Optimization”, Pattern Recognition, doi:10.1016/j.patcog.2015.04.026, 2015
L. Galarza, Z. Wang and M. Adjouadi, “Book spread correction using a time of flight imaging sensor”, International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV'14), pp. 250-254, Las Vegas, USA, August 2014
D. Oliveira, R. Lins, G. Torreão, J. Fan and M. Thielo, “An Efficient Algorithm for Segmenting Warped Text-lines in Document Images”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 250-254, Washington DC, USA, August 2013
Y. He, P. Pan, S. Xie, J. Sun and S. Naoi, “A book dewarping system by boundary-based 3D surface reconstruction”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 403-407, Washington DC, USA, August 2013
L. Tong, G. Zhan, Q. Peng, Y. Li and Y. Li, “Warped Document Image Mosaicing Method Based on Inflection Point Detection and Registration”, 4th International Conference on Multimedia Information Networking and Security (MINES'12), pp. 306-310, Nanjing, Jiangsu, China, November 2012
V. Kluzner and A. Tzadok, “Page Curling Correction for Scanned Books Using Local Distortion Information”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 890-894, Beijing, China, September 2011
M. Rahnemoonfar and A. Antonacopoulos, “Restoration of Arbitrarily Warped Historical Document Images Using Flow Lines”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 905-909, Beijing, China, September 2011
C. Neudecker, Z.M. Dogan, S. Schlarb, P. Missier, S. Sufi, A. Williams and K. Wolstencroft, “An Experimental Workflow Development Platform for Historical Document Digitisation and Analysis”, 1st Workshop on Historical Document Imaging and Processing (HIP'11), pp. 161-168, Beijing, China, September 2011
D.M. Oliveira, R.D. Lins, G. Torreão, J. Fan and M. Thielo, “A new algorithm for segmenting warped text-lines in document images”, ACM Symposium on Applied Computing (SAC'11), pp. 259-265, March 2011
S.S. Bukhari, F. Shafait and T.M. Breuel, “Performance Evaluation of Curled Textlines Segmentation Algorithms”, 9th International Workshop on Document Analysis Systems (DAS'10), (short paper), pp. 555-558, Boston, MA, USA, June 2010
R.D. Lins, D.M. Oliveira, G. Torreao, J. Fan and M. Thielo, “Correcting Book Binding Distortion in Scanned Documents”, 7th International Conference on Image Analysis and Recognition (ICIAR'10), pp. 355-365, Póvoa de Varzim, Portugal, June 2010
D.M. Oliveira, R.D. Lins, G. Torreao, J. Fan and M. Thielo, “A New Method for Text-Line Segmentation for Warped Documents”, 7th International Conference on Image Analysis and Recognition (ICIAR'10), pp. 398-408, Póvoa de Varzim, Portugal, June 2010
H.I. Koo and N.I. Cho, “State Estimation in a Document Image and Its Application in Text Block Identification and Text Line Extraction”, 11th European Conference on Computer Vision (ECCV'10), pp. 421-434, Heraklion, Crete, Greece, September 2010
S.S. Bukhari, T.M. Breuel and F. Shafait, “Textline information extraction from grayscale camera-captured document images”, 16th International Conference on Image Processing (ICIP'09), pp. 2013-2016, Cairo, November 2009
S.S. Bukhari, F. Shafait and T.M. Breuel, “Ridges based Curled Textline Region Detection from Grayscale Camera-Captured Document Images”, 13th International Conference on Computer Analysis of Images and Patterns (CAIP'09), pp. 173-180, Münster, Germany, September 2009
S.S. Bukhari, F. Shafait and T.M. Breuel, “Coupled Snakelet Model for Curled Textline Segmentation of Camera-Captured Document Images”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 61-65, Barcelona, Spain, July 2009
S.S. Bukhari, F. Shafait and T.M. Breuel, “Dewarping of document images using coupled-snakes”, International Workshop on Camera-Based Document Analysis and Recognition (CBDAR'09), pp. 34-41, Barcelona, Spain, July 2009

G. Vamvakas, B. Gatos, N. Stamatopoulos and S.J. Perantonis, “A Complete Optical Character Recognition Methodology for Historical Documents”, 8th International Workshop on Document Analysis Systems (DAS'08), pp. 525-532, Nara, Japan, September 2008.

In this paper, a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology consists of three steps: the first two steps refer to creating a database for training using a set of documents, while the third one refers to recognition of new document images. First, a pre-processing step that includes image binarization and enhancement takes place. In the second step, a top-down segmentation approach is used in order to detect text lines, words and characters. A clustering scheme is then adopted in order to group characters of similar shape. This is a semi-automatic procedure since the user is able to interact at any time in order to correct possible clustering errors and assign an ASCII label. After this step, a database is created in order to be used for recognition. Finally, in the third step, the above segmentation approach is applied to every new document image, while the recognition is based on the character database produced in the previous step.

S.K. Satapathy, S. Mishra, R.S. Sundeep, U.S.R. Teja, P.K. Mallick, M. Shruti and K. Shravya, "Deep learning based image recognition for vehicle number information", International Journal of Innovative Technology and Exploring Engineering, vol. 8, no. 8, pp. 52-55, 2019
J. Shentu and M. Zheng, "Mechanism design of data management system for nuclear power", Annals of Nuclear Energy, vol. 129, pp. 21-29, 2019
S.T. Deokate and N.J. Uke, "Devnagari Script Categorization by Utilizing CNN and KNN", International Journal of Innovative Technology and Exploring Engineering (IJITEE), vol. 8, no. 5, pp. 1136-1140, 2019
N. Babu and A. Soumya, "Character Recognition in Historical Handwritten Documents–A Survey", International Conference on Communication and Signal Processing (ICCSP'19), pp. 299-304, 2019
D.M. Kassa and H. Hagras, “An Adaptive Segmentation Technique for the Ancient Ethiopian Ge'ez Language Digital Manuscripts”, 10th Computer Science and Electronic Engineering Conference (CEEC'18), pp. 83-88, 2018
C. Biswas, P.S. Mukherjee, K. Ghosh, U. Bhattacharya and S.K. Parui, “A Hybrid Deep Architecture for Robust Recognition of Text Lines of Degraded Printed Documents”, International Conference on Pattern Recognition (ICPR'18), pp. 3174-3179, 2018
S. Choudhary, N.K. Singh and S. Chichadwani, "Text Detection and Recognition from Scene Images using MSER and CNN", 2nd International Conference on Advances in Electronics, Computers and Communications (ICAECC'18), no. 8479419, 2018
P. Sharma, "A Survey on Optical Character Recognition Techniques", International Journal of Management, Technology And Engineering, vol. 8, pp. 2889-2895, 2018
P. Chaturvedi, M. Saxena and B. Sharma, "A Bounding Box Approach for Performing Dynamic Optical Character Recognition in MATLAB", International Conference on Emerging Trends in Expert Applications & Security (ICETEAS 2018), pp. 117-123, Jaipur, India, 2018
P. Kumari and A. Kalia, "A Comparative study of GOCR, Tesseract and Improved Tesseract for Character Recognition", International Journal of Technical Innovation in Modern Engineering & Science (IJTIMES), vol. 4, no. 10, pp. 345-352, 2018
K. Kang and H. Xie, "Design and Implementation of Driver's License Recognition System", 13th International Conference on Computer Science & Education (ICCSE'18), pp. 140-143, 2018
A. Farhat, O. Hommos, A. Al-Zawqari, A. Al-Qahtani, F. Bensaali, A. Amira and X. Zhai, "Optical character recognition on heterogeneous SoC for HD automatic number plate recognition system", Eurasip Journal on Image and Video Processing, vol. 2018, no. 1, 2018
B. Arizanović and V. Vučković, "Efficient Compression and Decompression Algorithms for OCR Systems", Facta Universitatis, Series: Electronics and Energetics, vol. 31, no. 3, pp. 461-485, 2018
D. Khurana and M. Malik, "Number Plate Detection: A Complete Review", International Journal of Engineering Technology and Computer Research (IJETCR), vol. 6, no. 3, pp. 4-8, 2018
F.D. Nurzam and E.T. Luthfi, "Implementation of Real-Time Scanner Java Language Text with Mobile Vision Android Based", International Conference on Information and Communications Technology (ICOIACT'18), pp. 724-729, 2018
J. Neema, M.C. Merin, M.J. Niya and T. Tresa, "Panulat-An Automated Pen", International Journal of Current Engineering and Scientific Research (IJCESR), vol. 5, no. 3, pp. 22-26, 2018
G. Kotzé and F. Wolff, "Developing and evaluating a pipeline for Setswana OCR", Pattern Recognition Association of South Africa and Robotics and Mechatronics (PRASA-RobMech), pp. 236-241, 2017
V. Vučković and B. Arizanović, "General Character Segmentation Approach for Machine-Typed Documents", 4th International Conference on Electrical, Electronic and Computing Engineering (ETRAN'17), pp. RTI2.2.1-6, 2017
S. Garg and N. Mishra, “Pollution Check Control Using License Plate Extraction via Image Processing”, Soft Computing: Theories and Applications (SoCTA), pp. 133-146, 2017
H. Modi and M.C. Parikh, “A Review on Optical Character Recognition Techniques”, International Journal of Computer Applications, vol. 160, no. 6, pp. 20-24, 2017
A.A.H.O. Idris and I. Khirwar, “Number plate recognition: A brief overview”, International Journal For Technological Research In Engineering, vol. 4, no. 7, pp. 1023-1027, 2017
V. Vučković and B. Arizanović, "Efficient Character Segmentation Approach for Machine-Typed Documents", Expert Systems with Applications, http://dx.doi.org/10.1016/j.eswa.2017.03.027, 2017
V. Tumane, D. Chaurpagar, A. Somkuwar, G. Sonone and S. Marbade, “A novel approach for image cropping and automatic contact extraction from images”, International Journal of Research In Science & Engineering, vol. 3, no. 2, pp. 271-278, 2017
P. Satyanarayana, K. Sujitha, V.S.A. Kiron, P.A. Reddy and M. Ganesh, “Assistance Vision for Blind People Using k-NN Algorithm and Raspberry Pi”, 2nd International Conference on Micro-Electronics, Electromagnetics and Telecommunications (ICMEET'16), pp. 113-122, 2016
S. Deokate and N. Uke, “Various Traditional and Nature Inspired Approaches Used in Image Preprocessing”, International Conference on Advanced Technologies for Societal Applications (ICATSA'16), 2016
S. Chaudhary, R. Malhotra, M. Jaiswal, S. Gupta and R. Ahuja, “An Android app OCR+: for Text Translator, Document Editor, Business Card Reader & Equation Solver”, International Journal of Engineering Applied Sciences and Technology, vol. 1, no. 7, pp. 92-95, 2016
A. Farhat, A. Al-Zawqari, A. Al-Qahtani, O. Hommos, F. Bensaali, A. Amira and X. Zhai, “OCR based feature extraction and template matching algorithms for Qatari number plate”, International Conference on Industrial Informatics and Computer Systems (CIICS'16), Sharjah, pp. 1-5, 2016
G. Agre, J. Pimple, V. Bhavsar, V. Sarode and P. Dhande, “Optimized search engine to find image by providing keyword”, International Journal of Technical Research and Applications, vol. 4, no. 2, pp. 68-71, 2016
R. Hussain, A. Masood, H.A. Khan, K. Khurshid and I. Siddiqi, “Language Independent Keyword Based Information Retrieval System of Handwritten Documents using SVM Classifier and Converting Words into Shapes”, Pakistan Journal of Engineering and Applied Sciences, vol. 19, pp. 63-76, 2016
M.A. Agrawal and M.P. Brijpuria, “A Dynamic Object Identification Protocol for Intelligent Robotic Systems”, International Journal of Image, Graphics and Signal Processing (IJIGSP), vol. 7, no. 8, pp. 35-41, 2015
S.M. Aswatha, A.N. Talla, J. Mukhopadhyay and P. Bhowmick, “A method for extracting text from stone inscriptions using character spotting”, 12th Asian Conference on Computer Vision (ACCV'14), pp. 598-611, 2014
W. Pantke, A. Haak and V. Margner, “Color segmentation for historical documents using Markov random fields”, 6th International Conference on Soft Computing and Pattern Recognition (SoCPaR'14), pp. 151-156, 2014
F. Hollaus, S. Fiel, S. Saleem, R. Sablatnig and A. Camba, “Manuscript Investigation in the Sinai II Project”, Digital Presentation and Preservation of Cultural and Scientific Heritage, issue IV, pp. 200-205, 2014
S. Saleem, F. Hollaus and R. Sablatnig, “Recognition of degraded ancient characters based on dense SIFT”, First International Conference on Digital Access to Textual Cultural Heritage (DATeCH '14), pp. 15-20, 2014
K. Fouladi, B.N. Araabi and E. Kabir, “A fast and accurate contour-based method for writer-dependent offline handwritten Farsi/Arabic subwords recognition”, International Journal on Document Analysis and Recognition, vol. 17, no. 2, pp. 181-203, 2014
D.S. Patil and M.S. Patel, “Simple and Fast Method for Offline English Handwritten Word Recognition”, Transactions on Electrical and Electronics Engineering (ITSI - TEEE), vol. 1, no. 2, pp. 98-100, 2013
A. Ul-Hasan, S.S. Bukhari, S.F. Rashid, F. Shafait and T.M. Breuel, “Semi-automated OCR database generation for Nabataean scripts”, 21st International Conference on Pattern Recognition (ICPR 2012), pp. 1667-1670, Tsukuba, Japan, November 2012
Y. Chherawala and M. Cheriet, “W-TSV: Weighted topological signature vector for lexicon reduction in handwritten Arabic documents”, Pattern Recognition, vol. 45, no. 9, pp. 3277-3287, 2012
T. Blanke, M. Bryant and M. Hedges, “Open source optical character recognition for historical research”, Journal of Documentation, vol. 68, no. 5, pp. 659-683, 2012
M. Diem and R. Sablatnig, “Are Characters Objects?”, 12th International Conference on Frontiers in Handwriting Recognition (ICFHR'10), pp. 565-570, Kolkata, India, November 2010
C. Colutto, “Introducing a new image dissimilarity measure with an application to character image clustering in degraded historical documents”, 9th International Workshop on Document Analysis Systems (DAS'10), pp. 325-332, Boston, MA, USA, June 2010
D.R. Lee and S. Oh, “Minimum-Cost Path Algorithm for Separating Touching Characters”, 7th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA'10), pp. 164-168, Innsbruck, Austria, February 2010
M. Diem and R. Sablatnig, “Recognizing characters of ancient manuscripts”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 7531, article number 753106, January 2010

N. Stamatopoulos, B. Gatos and S.J. Perantonis, “A Method for Combining Complementary Techniques for Document Image Segmentation”, 11th International Conference on Frontiers in Handwriting Recognition (ICFHR'08), pp. 235-240, Montreal, Canada, August 2008.

Image segmentation is a major task of handwritten document processing. Many of the proposed techniques for image segmentation are complementary, in the sense that each of them using a different approach, can solve different difficult problems such as overlapping, touching components, influence of author style etc. In this paper a combination method of different segmentation techniques is presented. Our goal is to exploit the segmentation results of complementary techniques and specific features of the initial image so as to generate improved segmentation results. Experimental results on handwriting line segmentation methods demonstrate the effectiveness of the proposed combination method.

E. Kavallieratou and F. Daskas, “Text Line Detection and Segmentation: Uneven Skew Angles and Hill-and-Dale Writing”, Journal of Universal Computer Science, vol. 17, no. 1, pp. 16-29, 2011

N. Stamatopoulos, B. Gatos and A. Kesidis, "Automatic Borders Detection of Camera Document Images", 2nd International Workshop on Camera-Based Document Analysis and Recognition (CBDAR'07), pp. 71-78, Curitiba, Brazil, September 2007.

Document images captured with a digital camera are often framed by a noisy black border or include noisy text regions from neighbouring pages. In this paper, we present a novel technique for enhancing document images captured by a digital camera by automatically detecting the document borders and cutting out noisy black borders as well as noisy text regions appearing from neighbouring pages. Our methodology is based on projection profiles combined with a connected component labelling process. Signal cross-correlation is also used in order to verify the detected noisy text areas. Experimental results on several camera document images, mainly historical documents, indicate the effectiveness of the proposed technique.
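The projection-profile idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the method of the paper: it covers only black-border removal on a binary image, omits the connected component labelling and cross-correlation steps, and the function names and the 0.9 threshold are assumptions made for illustration.

```python
def projection_profiles(image):
    """Return (row_sums, col_sums) of dark pixels for a binary image.

    `image` is a list of equal-length rows; 1 marks a dark pixel, 0 a light one.
    """
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums


def border_extent(profile, full_length, threshold=0.9):
    """Return (start, end) of the profile after skipping nearly all-dark entries.

    An entry counts as border when its dark-pixel count is at least
    threshold * full_length. The threshold is an illustrative assumption.
    """
    limit = threshold * full_length
    start = 0
    while start < len(profile) and profile[start] >= limit:
        start += 1
    end = len(profile)
    while end > start and profile[end - 1] >= limit:
        end -= 1
    return start, end


def crop_black_border(image, threshold=0.9):
    """Crop the leading and trailing rows and columns flagged as black border."""
    row_sums, col_sums = projection_profiles(image)
    height, width = len(image), len(image[0])
    top, bottom = border_extent(row_sums, width, threshold)
    left, right = border_extent(col_sums, height, threshold)
    return [row[left:right] for row in image[top:bottom]]


page = [
    [1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
]
cropped = crop_black_border(page)
```

Here `cropped` keeps only the inner 3x4 area of `page`. Real camera images would need binarization first and a tolerance for borders that are not perfectly straight.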

A. Kordecki, “Fast document area detection for scanned images”, Proceedings of SPIE - The International Society for Optical Engineering, vol. 11041, art. no. 1104120, 2019
A. Zhu, C. Zhang, Z. Li and S. Xiong, “Coarse-to-fine document localization in natural scene image with regional attention and recursive corner refinement”, International Journal on Document Analysis and Recognition (IJDAR), DOI: https://doi.org/10.1007/s10032-019-00341-0, 2019
M.M. Reza, M.A. Rakib, S.S. Bukhari and A. Dengel, “A Robust Page Frame Detection Method for Complex Historical Document Images”, 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM-2019), 2019
A. Kordecki, "Fast document area detection for scanned images", Eleventh International Conference on Machine Vision (ICMV'18), 1104120, 2018
S.A. Jain, N.S. Rani and N. Chandan, “Image Enhancement of Complex Document Images Using Histogram of Gradient Features”, International Journal of Engineering & Technology, vol. 7, no. 4.36, pp. 780-783, 2018
K.M. Hung, C.H. Yih and C.H. Yeh, “A Reading Assistant System Based on Restoring Warped Document Image”, Journal of Applied Science and Engineering, vol. 21, no. 3, pp. 475-484, 2018
S. Dey, B. Mitra, J. Mukhopadhyay and S. Sural, “A Comparative Study of Margin Noise Removal Algorithms on MarNR: A Margin Noise Dataset of Document Images”, 1st International Workshop on Open Services and Tools for Document Analysis (ICDAR-OST'17), pp. 35-39, 2017
K. Javed and F. Shafait, “Real-Time Document Localization in Natural Images by Recursive Application of a CNN”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 105-110, 2017
C. Adak, B.B. Chaudhuri and M. Blumenstein, “Legibility and Aesthetic Analysis of Handwriting”, 14th IAPR International Conference on Document Analysis and Recognition (ICDAR'17), pp. 175-182, 2017
S. Prum, “Text-zone detection and rectification in document images captured by smartphone”, 1st EAI International Conference on Computer Science and Engineering (COMPSE'16), 2017
C. Adak, B.B. Chaudhuri and M. Blumenstein, “Writer identification by training on one script but testing on another”, 23rd International Conference on Pattern Recognition (ICPR'16), pp. 1153-1158, 2016
S. He and L. Schomaker, “Writer identification using curvature-free features”, Pattern Recognition, DOI: 10.1016/j.patcog.2016.09.044, 2016
Z. Huang, J. Gu, G. Meng and C. Pan, "Text line extraction of curved document images using hybrid metric", 3rd IAPR Asian Conference on Pattern Recognition (ACPR), Malaysia, pp. 251-255, 2016
D. Karatzas, V. Poulain d’Andecy, M. Rusinol, A. Chica and P.P. Vazquez, “Human-Document Interaction systems - a new frontier for document image analysis”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 369-374, Santorini, Greece, 2016
A. Chakraborty and M. Blumenstein, “Marginal Noise Reduction in Historical Handwritten Documents - A Survey”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 323-328, Santorini, Greece, 2016
T. Mondal, N. Ragot, J.Y. Ramel and U. Pal, “Performance Evaluation of DTW and its Variants for Word Spotting in Degraded Documents”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 1141-1145, Nancy, France, 2015
M. Villegas, J.A. Sanchez and E. Vidal, “Optical Modelling and Language Modelling Trade-off for Handwritten Text Recognition”, 13th International Conference on Document Analysis and Recognition (ICDAR'15), pp. 831-835, Nancy, France, 2015
M. Wagdy, I. Faye and D. Rohaya, “Border noise removal from the document image using X-Y cut and filtering technique based on morphological operation”, International Journal of Imaging and Robotics, vol. 15, no. 3, pp. 88-105, 2015
M. Liu, C. Li, W. Zhu and A. Lim, “A morphology-based border noise removal method for camera-captured label images”, 5th International Workshop on Camera-Based Document Analysis and Recognition (CBDAR'13), pp. 126-138, Washington DC, USA, August 2013
M. Wagdy, I. Faye and D. Rohaya, “Border Noise Removal and Clean Up Based on Retinex Theory”, 1st International Conference on Advanced Data and Information Engineering (DaEng-2013), Lecture Notes in Electrical Engineering, vol. 285, pp. 345-352, 2013
M. Agrawal and D. Doermann, “Clutter noise removal in binary document images”, International Journal on Document Analysis and Recognition (IJDAR), vol. 16, no. 4, pp. 351-369, 2013
S. Kaur and P.S. Mann, “Improved XY cut Page Segmentation Algorithm for Border Noise”, International Journal of Computer Science & Engineering Technology (IJCSET), vol. 3, no. 5, pp. 149-151, 2013
S. Kaur, P.S. Mann and S. Kaur, “Page Segmentation using XY Cut Algorithm in OCR System-A Review”, International Journal of Computers and Technology (IJCT), vol. 6, no. 3, pp. 436-440, 2013
S. Kaur, P.S. Mann and S. Khurana, “Page Segmentation in OCR System-A Review”, International Journal of Computer Science and Information Technologies (IJCSIT), vol. 4, no. 2, pp. 420-422, 2013
M. Shamqoli and H. Khosravi, “Border detection of document images scanned from large books”, 8th Iranian Conference on Machine Vision and Image Processing (MVIP 2013), Zanjan, Iran, pp. 84-88, 2013
M. Shamqoli and H. Khosravi, “Warped document restoration by recovering shape of the surface”, 8th Iranian Conference on Machine Vision and Image Processing (MVIP 2013), Zanjan, Iran, pp. 262-265, 2013
F. Shafait and T.M. Breuel, “The Effect of Border Noise on the Performance of Projection-Based Page Segmentation Methods”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 33, no. 4, pp. 846-851, 2011
S.S. Bukhari, F. Shafait and T.M. Breuel, “Border Noise Removal of Camera-Captured Document Images using Page Frame Detection”, 4th International Workshop on Camera-Based Document Analysis and Recognition (CBDAR'11), Beijing, China, September 2011
M.M. Haji, T.D. Bui and C.Y. Suen, “Simultaneous Document Margin Removal and Skew Correction Based on Corner Detection in Projection Profiles”, 15th International Conference on Image Analysis and Processing (ICIAP'09), pp. 1025-1034, Vietri sul Mare, Italy, September 2009
F. Shafait, D. Keysers and T.M. Breuel, “Response to "Projection Methods Require Black Border Removal"”, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), vol. 31, no. 4, pp. 763-764, 2009
F. Shafait, J. Beusekom, D. Keysers and T.M. Breuel, “Document cleanup using page frame detection”, International Journal on Document Analysis and Recognition (IJDAR), vol. 11, no. 2, pp. 81-96, 2008

B. Gatos, A. Antonacopoulos and N. Stamatopoulos, “ICDAR2007 Handwriting Segmentation Contest”, 9th International Conference on Document Analysis and Recognition (ICDAR'07), pp. 1284-1288, Curitiba, Brazil, September 2007.

This paper presents the results of the Handwriting Segmentation Contest that was organized in the context of the ICDAR2007 conference. The aim of this contest was to use well established evaluation practices and procedures in order to record recent advances in off-line handwriting segmentation. Two benchmarking datasets (one for text line and one for word segmentation) were used in a common evaluation platform in order to test and compare all submitted algorithms for handwritten document segmentation in realistic circumstances. The results of the evaluation of five algorithms submitted by participants as well as of two state-of-the-art algorithms are presented. The performance evaluation method is based on counting the number of matches between the text lines or words detected by the algorithms and the text lines or words of the ground truth.
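A match-counting evaluation of this kind can be sketched as follows. This is an illustration under stated assumptions rather than the contest's evaluation tool. Regions are modelled as sets of pixel coordinates, and a detected region counts as a match when its intersection-over-union with a ground-truth region reaches a threshold. The 0.9 threshold, the metric names and all function names are assumptions, not taken from the abstract.

```python
def match_score(region_a, region_b):
    """Intersection-over-union of two regions given as sets of (row, col) pixels."""
    union = region_a | region_b
    if not union:
        return 0.0
    return len(region_a & region_b) / len(union)


def evaluate(detected, ground_truth, threshold=0.9):
    """Count one-to-one matches and derive rates from the counts.

    Returns (matches, detection_rate, recognition_accuracy, f_measure), where
    detection_rate = matches / number of ground-truth regions and
    recognition_accuracy = matches / number of detected regions.
    """
    matched_truth = set()
    matches = 0
    for region in detected:
        for index, truth in enumerate(ground_truth):
            # Each ground-truth region may be matched at most once.
            if index not in matched_truth and match_score(region, truth) >= threshold:
                matched_truth.add(index)
                matches += 1
                break
    detection_rate = matches / len(ground_truth) if ground_truth else 0.0
    recognition_accuracy = matches / len(detected) if detected else 0.0
    total = detection_rate + recognition_accuracy
    f_measure = 2 * detection_rate * recognition_accuracy / total if total else 0.0
    return matches, detection_rate, recognition_accuracy, f_measure


# Two ground-truth text lines; the second one is split in two by the detector.
truth_lines = [{(0, c) for c in range(10)}, {(1, c) for c in range(10)}]
found_lines = [
    {(0, c) for c in range(10)},
    {(1, c) for c in range(5)},
    {(1, c) for c in range(5, 10)},
]
matches, dr, ra, fm = evaluate(found_lines, truth_lines)
```

In this toy case only the first line is matched, so the split second line lowers both rates: it is missed in the ground truth and adds two unmatched detections.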

S. Kundu, S. Paul, S.K. Bera, A. Abraham and R. Sarkar, "Text-line Extraction from Handwritten Document Images using GAN", Expert Systems with Applications, vol. 140, 2020
M.H.M. Dyla and F. Morain-Nicolier, "Text line segmentation and binarization of handwritten historical documents using the fast and adaptive bidimensional empirical mode decomposition", Optik, pp. 52-63, 2019
M. Pastor, "Text baseline detection, a single page trained system", Pattern Recognition, vol. 94, pp. 149-161, 2019
H. Jain and A.P. Kumar, "A Bottom Up Procedure for Text Line Segmentation of Latin Script", International Conference on Advances in Computing, Communications and Informatics (ICACCI'17), 2017
Renuka and S. Terdal, "Markov Random Field Region Based Text Detection and Segmentation by Stroke Width Transformation", International Journal for Innovative Research in Science & Technology, vol. 4, no. 2, pp. 195-200, 2017
P. Sahare and S.B. Dhok, "Review of Text Extraction Algorithms for Scene-text and Document Images", IETE Technical Review, DOI: 10.1080/02564602.2016.1160805, 2016
N.V. Borse and I.R. Shaikh, “Text Extraction from Handwritten Documents”, International Journal Of Engineering, Education And Technology (ARDIJEET), vol. 3, no. 2, 2015
T. Saba, A. Rehman, A. Altameem and M. Uddin, “Annotated comparisons of proposed preprocessing techniques for script recognition”, Neural Computing and Applications, DOI: 10.1007/s00521-014-1618-9, 2014
Y. Tang, X. Wu and W. Bu, “Text Line Segmentation Based on Matched Filtering and Top-Down Grouping for Handwritten Documents”, 11th IAPR International Workshop on Document Analysis Systems (DAS'14), Tours, France, pp. 365-369, 2014
S.S. Bukhari, F. Shafait and T.M. Breuel, “Towards Generic Text-Line Extraction”, 12th International Conference on Document Analysis and Recognition (ICDAR'13), pp. 748-752, Washington DC, USA, August 2013
N. Modi and K. Jindal, “Text Line detection and Segmentation in Handwritten Gurumukhi Scripts”, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 3, no. 5, pp. 1075-1080, 2013
R. Sarkar, S. Halder, S. Malakar, N. Das, S. Basu and M. Nasipuri, “Text line extraction from handwritten document pages based on line contour estimation”, 3rd International Conference on Computing, Communication and Networking Technologies (ICCCNT 2012), Article number 6395873, Coimbatore, India, 2012
S. Jindal and G.S. Lehal, “Line segmentation of handwritten Gurmukhi manuscripts”, Workshop on Document Analysis and Recognition (DAR 2012), pp. 74-78, Mumbai, 2012
A. Alaei, U. Pal and P. Nagabhushan, “Dataset and ground truth for handwritten text in four different scripts”, International Journal of Pattern Recognition and Artificial Intelligence, vol. 26, no. 4, Article number 1253001, 2012
A. Rehman and T. Saba, “Off-line cursive script recognition: current advances, comparisons and remaining problems”, Artificial Intelligence Review, vol. 37, no. 4, pp. 261-288, 2012
R. Saabni and J. El-Sana, “Language-Independent Text Lines Extraction Using Seam Carving”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 563-568, Beijing, China, September 2011
F. Simistira, V. Papavassiliou, T. Stafylakis and V. Katsouros, “Enhancing Handwritten Word Segmentation by Employing Local Spatial Features”, 11th International Conference on Document Analysis and Recognition (ICDAR'11), pp. 1314-1318, Beijing, China, September 2011
A. Alaei, P. Nagabhushan and U. Pal, “Piece-wise painting technique for line segmentation of unconstrained handwritten text: a specific study with Persian text documents”, Pattern Analysis & Applications, vol. 14, no. 4, pp. 381-394, 2011
A. Sánchez, C.A.B. Mello, P.D. Suárez and A. Lopes, “Automatic line and word segmentation applied to densely line-skewed historical handwritten document images”, Integrated Computer-Aided Engineering, vol. 18, no. 2, pp. 125-142, 2011
A. Rehman and T. Saba, “Performance analysis of character segmentation approach for cursive script recognition on benchmark database”, Digital Signal Processing, vol. 21, no. 3, pp. 486-490, 2011
E. Kavallieratou and F. Daskas, “Text Line Detection and Segmentation: Uneven Skew Angles and Hill-and-Dale Writing”, Journal of Universal Computer Science, vol. 17, no. 1, pp. 16-29, 2011
A. Alaei, U. Pal and P. Nagabhushan, “A new scheme for unconstrained handwritten text-line segmentation”, Pattern Recognition, vol. 44, no. 4, pp. 917-928, 2011
E. Kavallieratou, "Text line detection and segmentation: uneven skew angles and hill-and-dale writing", ACM Symposium on Applied Computing (SAC'10), Sierre, Switzerland, pp. 59-60, 2010
V. Papavassiliou, V. Katsouros and G. Carayannis, “A Morphological Approach for Text-Line Segmentation in Handwritten Documents”, 12th International Conference on Frontiers in Handwriting Recognition (ICFHR'10), pp. 19-24, Kolkata, India, November 2010
H.I. Koo and N.I. Cho, “State Estimation in a Document Image and Its Application in Text Block Identification and Text Line Extraction”, 11th European Conference on Computer Vision (ECCV'10), pp. 421-434, Heraklion, Crete, Greece, September 2010
P. Nagabhushan and A. Alaei, "Tracing and Straightening the Baseline in Handwritten Persian/Arabic Text-line: A New Approach Based on Painting-technique", International Journal on Computer Science and Engineering (IJCSE'10), vol. 2, no. 04, pp. 907-916, 2010
R. Doumat, E.E. Zsigmond and J.M. Pinon, “User Trace-Based Recommendation System for a Digital Archive”, 8th International Conference on Case-Based Reasoning (ICCBR'10), pp. 360-374, Alessandria, Italy, 2010
N. Ouwayed, A. Belaïd and F. Auger, “General text line extraction approach based on locally orientation estimation”, 17th Document Recognition and Retrieval Conference (DRR'10), San Jose, CA, United States, pp. 1-10, 2010
V. Papavassiliou, T. Stafylakis, V. Katsouros and G. Carayannis, “Handwritten document image segmentation into text lines and words”, Pattern Recognition Journal, vol. 43, no. 1, pp. 369-377, 2010
F. Kurniawan and D. Mohamad, “Performance Comparison between Contour-Based and Enhanced Heuristic-Based for Character Segmentation”, 5th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS'09), Marrakesh, Morocco, pp. 112-117, 2009
A. Khandelwal, P. Choudhury, R. Sarkar, S. Basu, M. Nasipuri and N. Das, “Text Line Segmentation for Unconstrained Handwritten Document Images Using Neighborhood Connected Component Analysis”, 3rd International Conference on Pattern Recognition and Machine Intelligence (PreMI'09), pp. 369-374, New Delhi, India, December 2009
S.S. Bukhari, F. Shafait and T.M. Breuel, “Script-Independent Handwritten Textlines Segmentation using Active Contours”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 446-450, Barcelona, Spain, July 2009
S.S. Bukhari, F. Shafait and T.M. Breuel, “Coupled Snakelet Model for Curled Textline Segmentation of Camera-Captured Document Images”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 61-65, Barcelona, Spain, July 2009
E. Saund, J. Lin and P. Sarkar, “PixLabeler: User Interface for Pixel-Level Labeling of Elements in Document Images”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 646-650, Barcelona, Spain, July 2009
R.P. Santos, G.S. Clemente, T.I. Ren and G.D. Cavalcanti, “Text Line Segmentation Based on Morphology and Histogram Projection”, 10th International Conference on Document Analysis and Recognition (ICDAR'09), pp. 651-655, Barcelona, Spain, July 2009
A. Rehman, D. Mohamad, F. Kurniawan and M. Ilays, “Performance analysis of segmentation approach for cursive handwriting on benchmark database”, International Conference on Computer Systems and Applications (AICCSA'09), pp. 265-270, Rabat, Morocco, May 2009
R. Doumat, E. Egyed-Zsigmond and J.M. Pinon, “Digitized ancient documents...What's next?”, Document Numerique, vol. 12, no. 1, pp. 31-51, 2009
T. Stafylakis, V. Papavassiliou, V. Katsouros and G. Carayiannis, “Robust text-line and word segmentation for handwritten documents images”, International Conference on Acoustics, Speech and Signal Processing, pp. 3393–3396, Las Vegas, USA, 2008
S.S. Bukhari, F. Shafait and T.M. Breuel, “Segmentation of Curled Textlines Using Active Contours”, 8th International Workshop on Document Analysis Systems (DAS'08), pp. 270-277, Nara, Japan, September 2008
R. Doumat, E.E. Zsigmond, J.M. Pinon and E. Csiszar, “Online ancient documents: Armarius”, 8th ACM symposium on Document engineering (DocEng'08), pp. 127-130, Sao Paulo, Brazil, 2008

G. Vamvakas, B. Gatos, S. Petridis and N. Stamatopoulos, “An Efficient Feature Extraction and Dimensionality Reduction Scheme for Isolated Greek Handwritten Character Recognition”, 9th International Conference on Document Analysis and Recognition (ICDAR'07), pp. 1073-1077, Curitiba, Brazil, September 2007.

In this paper, we present an off-line methodology for isolated Greek handwritten character recognition based on efficient feature extraction followed by a suitable feature vector dimensionality reduction scheme. Extracted features are based on (i) horizontal and vertical zones, (ii) the projections of the character profiles, (iii) distances from the character boundaries and (iv) profiles from the character edges. The combination of these types of features leads to a 325-dimensional feature vector. At the next step, a dimensionality reduction technique is applied, according to which the dimension of the feature space is lowered to comprise only the features pertinent to the discrimination of characters into the given set of letters. In this paper, we also present a new Greek handwritten database of 36,960 characters that we created in order to measure the performance of the proposed methodology.

R. Latypov and E. Stolov, “A New Method for Slant Calculation in Off-Line Handwriting Analysis”, 41st International Conference on Telecommunications and Signal Processing (TSP'18), 2018
M. Yağanoğlu and C. Köse, “Wearable Vibration Based Computer Interaction and Communication System for Deaf”, Applied Sciences, vol. 7, no. 12, 2017
V.L. Padmalatha and M. Sampoorna, “Optimized Voronoi Image Zoning for Handwritten Character Recognition based on Kohonen Neural Networks”, International Journal of Engineering Applied Sciences and Technology, vol. 2, no. 2, pp. 76-80, 2016
R. Hussain, A. Raza, I. Siddiqi, K. Khurshid and C. Djeddi, “A comprehensive survey of handwritten document benchmarks: structure, usage and evaluation”, EURASIP Journal on Image and Video Processing, DOI 10.1186/s13640-015-0102-5, 2015
L.P. Saxena, “A correlation coefficient based model to separate and classify noncursive (Grantha script) symbols”, International Journal on Electrical Engineering and Informatics, vol. 7, no. 3, pp. 531-540, 2015
D. Kumar Verma and A. Khatri, “An Improvement Study for Optical Character Recognition by using Inverse–SVM in Image Processing Technique”, International Journal of Advanced Research in Education Technology (IJARET), vol. 2, no. 2, pp. 101-105, 2015
S.P. Patil and M.P.P. Kulkarni, “Online Handwritten Sanskrit Character Recognition Using Support Vector Classification”, International Journal of Engineering Research and Applications, vol. 4, no. 5, pp. 82-91, 2015
P. Kulkarni, S. Patil and G. Dhanokar, “Review On Marathi And Sanskrit Word Recognition Using Genetic Algorithm”, International Journal of Informative & Futuristic Research, vol. 2, no. 7, pp. 2144-2152, 2015
R. Kaur and S. Gujral, “Recognition of similar shaped isolated handwritten Gurumukhi characters using machine learning”, 5th International Conference on Confluence The Next Generation Information Technology Summit, India, pp. 251-256, 2014
M. Kumar, M.K. Jindal and R.K. Sharma, “A Novel Hierarchical Technique for Offline Handwritten Gurmukhi Character Recognition”, National Academy Science Letters, vol. 37, no. 6, pp. 567-572, 2014
R. Kaur and S. Gujral, “Recognition of Similar Shaped Isolated Gurumukhi Characters Using ML Algorithms”, 2nd International Conference on Computer and Intelligent Systems (ICCIS’14), Bangkok, Thailand, pp. 41-46, 2014
S. Panwar and N. Nain, “An Efficient Feature Extraction Method for Segmented Cursive Characters Recognition”, International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO'14), Adriatic Coast, Croatia, pp. 1153-1158, 2014
D. Impedovo and G. Pirlo, “Zoning Methods for Handwritten Character Recognition: A Survey”, Pattern Recognition, vol. 47, no. 3, pp. 969-981, 2013
H. Bobade and A. Sahu, “Character Recognition Technique using Neural Network”, International Journal of Engineering Research and Applications (IJERA), vol. 3, no. 2, pp. 1778-1783, 2013
O.P. Sharma, M.K. Ghose, K.B. Shah and B.K. Thakur, “Recent Trends and Tools for Feature Extraction in OCR Technology”, International Journal of Soft Computing & Engineering, vol. 2, no. 6, pp. 220-223, 2013
S.A. Vaidya and B.R. Bombade, “A Novel Approach of Handwritten Character Recognition using Positional Feature Extraction”, International Journal of Computer Science and Mobile Computing, vol. 2, no. 6, pp. 179-186, 2013
G. Pirlo and D. Impedovo, “Adaptive membership functions for handwritten character recognition by Voronoi-based image zoning”, IEEE Transactions on Image Processing, vol. 21, no. 9, pp. 3827-3837, 2012
K.S. Siddharth, M. Jangid, R. Dhir and R. Rani, “Handwritten Gurmukhi Character Recognition Using Statistical and Background Directional Distribution”, International Journal on Computer Science and Engineering, vol. 3, no. 6, pp. 2332-2345, 2011
S. Dabra, S. Agrawal and R.K. Challa, “A novel feature set for recognition of similar shaped handwritten Hindi characters using machine learning”, 1st International Conference on Computer Science, Engineering and Applications (CCSEA'11), pp. 25-35, Chennai, India, July 2011

G. Vamvakas, N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, “Greek Handwritten Character Recognition”, 11th Panhellenic Conference on Informatics (PCI'07), pp. 343-352, Patras, Greece, May 2007.

In this paper, we present a database and methods for off-line isolated Greek handwritten character recognition. The Computational Intelligence Laboratory (CIL) Database consists of 35,000 isolated and labelled Greek handwritten characters. This database was tested with an existing structural approach for Greek handwritten characters as well as with a novel approach based on a hybrid feature extraction scheme. According to this approach, two types of features are combined in a hybrid fashion. The first one divides the character image into a set of zones and calculates the density of the character pixels in each zone. In the second type of features, the area that is formed from the projections of the upper and lower as well as of the left and right character profiles is calculated. For the classification step, Support Vector Machines (SVM) and Euclidean Minimum Distance Classifier (EMDC) are used.

G. Vamvakas, B. Gatos, I. Pratikakis, N. Stamatopoulos, A. Roniotis and S.J. Perantonis, “Hybrid Off-Line OCR for Isolated Handwritten Greek Characters”, 4th IASTED International Conference on Signal Processing, Pattern Recognition, and Applications (SPPRA'07), pp. 197-202, Innsbruck, Austria, February 2007.

In this paper, we present an off-line OCR methodology for isolated handwritten Greek characters mainly based on a robust hybrid feature extraction scheme. First, image pre-processing is performed in order to normalize the character images as well as to correct character slant. At the next step, two types of features are combined in a hybrid fashion. The first one divides the character image into a set of zones and calculates the density of the character pixels in each zone. In the second type of features, the area that is formed from the projections of the upper and lower as well as of the left and right character profiles is calculated. For the classification step, Support Vector Machines (SVM) are used. The performance of the proposed methodology is demonstrated after testing with the CIL database (handwritten Greek character database), which was created from 100 different writers.
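
As an illustration of the first feature type only, a minimal sketch of zone-density features might look as follows. The function name `zone_densities`, the 2x2 grid and the toy image are assumptions made for this example; the zone layout, pre-processing and profile-based features of the paper are not reproduced here.

```python
def zone_densities(image, rows, cols):
    """Split a binary character image into rows x cols zones and return
    the fraction of foreground (1) pixels in each zone, row by row."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(rows):
        # Integer boundaries spread any remainder across the zones.
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            ink = sum(image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            # Guard against empty zones when the grid is finer than the image.
            features.append(ink / area if area else 0.0)
    return features


# Hypothetical 4x4 binary image (1 = character pixel), split into 2x2 zones.
glyph = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
densities = zone_densities(glyph, 2, 2)
print(densities)
```

The resulting list can be concatenated with other feature types to form the input vector of a classifier.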

S. Fujino, T. Hasegawa, M. Ueno, N. Mori and K. Matsumoto, “The Convolutional Neural Network Model Based on an Evolutionary Approach For Interactive Picture Book”, 20th Asia Pacific Symposium - Intelligent and Evolutionary Systems (IES'16), pp. 103-106, Canberra, Australia, 2016
G. Pagare and K. Verma, “Associative Memory Model for Distorted On-Line Devanagari Character Recognition”, 5th International Conference on Advances in Computing and Communications (ICACC'15), pp. 46-49, Kerala, India, 2015
T.V. Thach, N.H. Phi and H. Trang, “Isolated Vietnamese handwriting recognition embedded system applied combined feature extraction method”, 8th International Conference on Advanced Technologies for Communications (ATC'15), Viet Nam, 2015
M. Ueno, K. Fukuda, A. Yasui, N. Mori, and K. Matsumoto, “Casook: Creative animating sketchbook”, 12th International Symposium on Distributed Computing and Artificial Intelligence (DCAI'15), Spain, 2015
D. Impedovo and G. Pirlo, “Zoning Methods for Handwritten Character Recognition: A Survey”, Pattern Recognition, vol. 47, no. 3, pp. 969-981, 2014
H. Pham-Van, H.T. Nguyen and S.J. Wu, “Vietnamese handwriting recognition for automatic data entry in enrollment forms”, 2nd International Conference on Information Technology and Electronic Commerce (ICITEC'14), pp. 141-145, 2014
D. Impedovo, F.M. Mangini and G. Pirlo, “A new adaptive zoning technique for handwritten digit recognition”, 17th International Conference on Image Analysis and Processing (ICIAP'13), pp. 91-100, Naples, Italy, September 2013
O.P. Sharma, M.K. Ghose, K.B. Shah and B.K. Thakur, “Recent Trends and Tools for Feature Extraction in OCR Technology”, International Journal of Soft Computing & Engineering, vol. 2, no. 6, pp. 220-223, 2013
A. Rehman and T. Saba, “Off-line cursive script recognition: current advances, comparisons and remaining problems”, Artificial Intelligence Review, vol. 37, no. 4, pp. 261-288, 2012
T. Saba, A. Rehman, and G. Sulong, “Off-line cursive script recognition: current advances, comparisons and remaining problems”, International Journal of Innovative Computing, Information and Control, vol. 7, no. 9, pp. 5211-5224, 2011
G. Paliouras, C.D. Spyropoulos and G. Tsatsaronis, “Bootstrapping Ontology Evolution with Multimedia Information Extraction”, Multimedia Information Extraction, LNAI 6050, pp. 1–17, 2011
H. Hamdi and M. Khemakhem, “Distributing Arabic Handwriting Recognition System Based on the Combination of Grid Meta-Scheduling and P2P Technologies (Omnivore)”, Universal Journal of Computer Science and Engineering Technology, vol. 1, no. 1, pp. 31-35, 2010
P.A. Phuong, N.Q. Tao and L.C. Mai, “An Efficient Model for Isolated Vietnamese Handwritten Recognition”, International Conference on Intelligent Information Hiding and Multimedia Signal Processing (IIH-MSP'08), pp. 358-361, 2008

Dr. Nikolaos Stamatopoulos

Research Associate

Short CV

Awards

R&D Projects

2015 - Present

2016 - 2019

2013 - 2016

DRVS

2013 - 2015

2014 - 2015

2012

CITO

2008 - 2012

2007 - 2008

POLYTIMO

Competitions

Nikolaos Stamatopoulos has co-organized the following competitions

International Conference on Document Analysis and Recognition 2017 (ICDAR2017)

International Conference on Frontiers in Handwriting Recognition 2014 (ICFHR2014)

International Conference on Document Analysis and Recognition 2013 (ICDAR2013)

International Conference on Frontiers in Handwriting Recognition 2012 (ICFHR2012)

International Conference on Document Analysis and Recognition 2011 (ICDAR2011)

International Conference on Frontiers in Handwriting Recognition 2010 (ICFHR2010)

International Conference on Document Analysis and Recognition 2009 (ICDAR2009)

International Conference on Document Analysis and Recognition 2007 (ICDAR2007)

Publications

N. Stamatopoulos, "Optical Process and Analysis of Historical Documents", 2011. (in Greek)

An extended abstract of the dissertation in English can be found here

N. Stamatopoulos, G. Louloudis and B. Gatos, Handwriting Segmentation, in the book “Document Analysis and Text Recognition: Benchmarking State-of-the-Art Systems”, World Scientific Publishing Co., ISBN: 978-981-3229-26-6, 2018.

G. Louloudis, N. Stamatopoulos and B. Gatos, Writer Identification, in the book “Document Analysis and Text Recognition: Benchmarking State-of-the-Art Systems”, World Scientific Publishing Co., ISBN: 978-981-3229-26-6, 2018.

B. Gatos, G. Louloudis, N. Stamatopoulos and G. Sfikas, Historical Document Processing, in the book “Handwriting: Recognition, Development and Analysis”, Nova Science Publishers, ISBN: 978-1-53611-957-2, 2017.

G. Mühlberger, L. Seaward, ... , N. Stamatopoulos, ... , H. Wurster and K. Zagoris, “Transforming Scholarship in the Archives Through Handwritten Text Recognition: Transkribus as a Case Study”, Journal of Documentation, vol. 75, no. 5, pp. 954-976, 2019. impact factor: 0.853

G. Retsinas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Efficient Learning-Free Keyword Spotting”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1587-1600, 2019. impact factor: 8.329

N. Stamatopoulos, B. Gatos and I. Pratikakis, “Performance Evaluation Methodology for Document Image Dewarping Techniques”, IET Image Processing, vol. 6, no. 6, pp. 738-745, 2012. impact factor: 0.895

N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, “Goal-oriented Rectification of Camera-Based Document Images”, IEEE Transactions on Image Processing, vol. 20, no. 4, pp. 910-920, 2011. impact factor: 3.042

B. Gatos, N. Stamatopoulos and G. Louloudis, “ICDAR2009 Handwriting Segmentation Contest”, International Journal on Document Analysis and Recognition (IJDAR), vol. 14, no. 1, pp. 25-33, 2011. impact factor: 1.03

N. Nikolaou, M. Makridis, B. Gatos, N. Stamatopoulos and N. Papamarkos, “Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths”, Image and Vision Computing, vol. 28, no. 4, pp. 590-604, 2010. impact factor: 1.525

N. Stamatopoulos, B. Gatos and S.J. Perantonis, “A Method for Combining Complementary Techniques for Document Image Segmentation”, Pattern Recognition Journal, vol. 42, no. 12, pp. 3158-3168, 2009. impact factor: 2.554

G. Retsinas, G. Louloudis, N. Stamatopoulos, G. Sfikas and B. Gatos, “An Alternative Deep Feature Approach to Line Level Keyword Spotting”, The IEEE Conference on Computer Vision and Pattern Recognition (CVPR'19), pp. 12658-12666, California, USA, 2019.

G. Retsinas, G. Sfikas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Compact Deep Descriptors for Keyword Spotting”, 16th International Conference on Frontiers in Handwriting Recognition (ICFHR'18), pp. 315-320, Niagara Falls, USA, 2018.

G. Retsinas, G. Sfikas, N. Stamatopoulos, G. Louloudis and B. Gatos, “Exploring critical aspects of CNN-based Keyword Spotting. A PHOCNet study”, 13th IAPR International Workshop on Document Analysis Systems (DAS'18), pp. 13-18, Vienna, Austria, 2018.

G. Retsinas, G. Louloudis, N. Stamatopoulos, G. Sfikas and B. Gatos, “Nonlinear Manifold Embedding on Keyword Spotting using t-SNE”, 14th International Conference on Document Analysis and Recognition (ICDAR'17), pp. 487-492, Kyoto, Japan, 2017.

S. Fiel, F. Kleber, M. Diem, V. Christlein, G. Louloudis, N. Stamatopoulos and B. Gatos, “ICDAR2017 Competition on Historical Document Writer Identification (Historical-WI)”, 14th International Conference on Document Analysis and Recognition (ICDAR'17), pp. 1377-1382, Kyoto, Japan, 2017.

G. Louloudis, G. Sfikas, N. Stamatopoulos and B. Gatos, “Word Segmentation using the Student’s-t Distribution”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 78-83, Santorini, Greece, 2016.

G. Retsinas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Keyword Spotting in Handwritten Documents using Projections of Oriented Gradients”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 411-416, Santorini, Greece, 2016.

G. Retsinas, G. Louloudis, N. Stamatopoulos and B. Gatos, “Efficient Document Image Segmentation Representation by Approximating Minimum-Link Polygons”, 12th Workshop on Document Analysis Systems (DAS'16), pp. 293-298, Santorini, Greece, 2016.
