H-DocPro v.1


Setup - Run - Document Image Processing Components - About


Setup:

Step 1: Download and install the H-DocPro v.1 application (send an email to bgat<at>iit<dot>demokritos<dot>gr to get a username and password).
  • H-DocPro v.1 Release 28/5/2013
Step 2: Download the document image processing modules and extract (unrar) them into the H-DocPro v.1 installation directory ([Install Dir]):
  • Binarization
  • Border Removal
  • Page Split
  • Dewarping
  • Enhancement
Step 3: [Optional] Download some image samples:
  • Image Samples



Run:

Step 1: Select the directory that contains your images, or copy your images to the directory [Install Dir]/images.
Step 2: Press the "Settings" button and select the directory for saving the results (default save directory: [Install Dir]/Results).

Step 3: Select one or more document images.
Step 4: Define a processing workflow.
  • To add a processing module: Click on it. Modules can be added in any order.
  • To remove a processing module: Click on it again (at the bottom module line), or right click on the module at the workflow line and select "Remove".
  • To change the module order: Right click on the module at the workflow line and select "Move Right" or "Move Left".
Step 5: Select the method for each processing module by pressing "<" or ">" on that module at the workflow line. To recalculate an existing result, right click on the module at the workflow line and deselect "Do not recalculate if result exists".

Step 6: Execute the workflow by pressing "Apply Processes".
Step 7: View the results in the preview window, or right click on any module at the workflow line and select "View Result". Right clicking on the right-most module shows the final result; right clicking on any other module shows its intermediate result.
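
The behaviour described in Steps 4 to 7 can be pictured as an ordered list of modules with cached results. The Python sketch below is only an illustration of that idea, not H-DocPro code or an H-DocPro API: the class Workflow, its method names, and the stand-in processing functions are all hypothetical.

```python
class Workflow:
    """Hypothetical model of the workflow line: an ordered list of modules."""

    def __init__(self):
        self.modules = []   # module names, left to right
        self.cache = {}     # (source, modules applied so far) -> stored result

    def toggle(self, name):
        """Clicking a module adds it; clicking it again removes it."""
        if name in self.modules:
            self.modules.remove(name)
        else:
            self.modules.append(name)

    def move(self, name, offset):
        """Move a module left (offset=-1) or right (offset=+1)."""
        i = self.modules.index(name)
        j = i + offset
        if 0 <= j < len(self.modules):
            self.modules[i], self.modules[j] = self.modules[j], self.modules[i]

    def apply(self, source, methods, reuse_existing=True):
        """Run the modules in order and return every intermediate result.

        reuse_existing mimics "Do not recalculate if result exists".
        """
        results = []
        current = source
        for k, name in enumerate(self.modules):
            key = (source, tuple(self.modules[:k + 1]))
            if not (reuse_existing and key in self.cache):
                self.cache[key] = methods[name](current)
            current = self.cache[key]
            results.append(current)
        return results


wf = Workflow()
for name in ("Binarization", "Dewarping", "Border Removal"):
    wf.toggle(name)
wf.move("Border Removal", -1)  # order: Binarization, Border Removal, Dewarping

# Stand-in "methods" that only record which module touched the image.
methods = {name: (lambda img, n=name: img + " > " + n) for name in wf.modules}
results = wf.apply("page.tif", methods)
```

Here results[-1] plays the role of the final result (the right-most module), and the earlier entries are the intermediate results.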

Document Image Processing Components:

Binarization
  • NCSR:  Based on "B. Gatos, I. Pratikakis and S. J. Perantonis, Adaptive Degraded Document Image Binarization, Pattern Recognition, Vol. 39, pp. 317-327, 2006"
  • FR8.1: From FineReader Engine v. 8.1. IMPORTANT NOTICES: (a) You must have the engine already installed. (b) You must edit the file [Install Dir]/temp/Binarization/FRkey.txt and add your FineReader license key code.
Border Removal
  • Auto: Based on projection profiles and connected component analysis.
  • Auto_Edit: Press inside the marked area and adjust it by dragging the black points.
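
The Auto method is described only as being based on projection profiles and connected component analysis. The Python sketch below illustrates just the projection-profile idea on a tiny binary image (1 = black pixel): rows and columns at the page edge that are almost entirely black are treated as border. It is not the H-DocPro implementation; the function names and the max_fill threshold are assumptions.

```python
def projection_profiles(image):
    """Count black pixels (1) in every row and every column."""
    rows = [sum(row) for row in image]
    cols = [sum(col) for col in zip(*image)]
    return rows, cols


def trim(profile, length, max_fill):
    """Skip leading and trailing lines whose black-pixel ratio exceeds max_fill."""
    start, end = 0, len(profile)
    while start < end and profile[start] / length > max_fill:
        start += 1
    while end > start and profile[end - 1] / length > max_fill:
        end -= 1
    return start, end


def content_box(image, max_fill=0.8):
    """Return (top, bottom, left, right) of the area left after trimming borders."""
    rows, cols = projection_profiles(image)
    height, width = len(image), len(image[0])
    top, bottom = trim(rows, width, max_fill)
    left, right = trim(cols, height, max_fill)
    return top, bottom, left, right


# A page with a black border along the top row and the left column.
page = [
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0],
]
box = content_box(page)
```

The end indices are exclusive, so the content area is page[top:bottom] restricted to columns left to right - 1.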
Page Split
  • Auto: Based on "N. Stamatopoulos, B. Gatos, T. Georgiou, Page frame detection for double page document images, 9th IAPR International Workshop on Document Analysis Systems (DAS 2010), pp. 401-408, Cambridge, MA, USA, June 2010"
  • Auto_Edit: Press inside the left or right marked area and adjust it by dragging the black points.
Dewarping
  • Auto: Based on "N. Stamatopoulos, B. Gatos, I. Pratikakis and S.J. Perantonis, Goal-oriented Rectification of Camera-Based Document Images, IEEE Transactions on Image Processing, vol. 20, no. 4, pp. 910-920, 2011." IMPORTANT NOTICE: It can be applied only to single column documents.
  • Auto_Edit: Manually correct the position of the two lines and the two curves that delimit the text area by dragging the corresponding black points. Press the ">" button to test the result.
Enhancement (gray)
  • Wiener: The Wiener filter is commonly used in filtering theory for image restoration. It can be applied to degraded, poor-quality grayscale documents to eliminate noisy areas, smooth the background texture, and enhance the contrast between background and text areas (A. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, Englewood Cliffs, NJ, 1989). The window size of the Wiener filter can be set in the settings window of the module (right click on the module at the workflow line and select "Settings").
  [Figure: Wiener filter]
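
To make the role of the window size concrete, here is a minimal Python sketch of an adaptive Wiener filter based on local mean and variance. The text above does not say which Wiener variant the module uses, so the formulation, the noise estimate (the average of the local variances), and the function name wiener_filter are all assumptions.

```python
def wiener_filter(image, window=3):
    """Adaptive Wiener filter on a grayscale image given as a list of rows."""
    h, w = len(image), len(image[0])
    r = window // 2
    means = [[0.0] * w for _ in range(h)]
    variances = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Pixels inside the window, clipped at the image border.
            vals = [image[j][i]
                    for j in range(max(0, y - r), min(h, y + r + 1))
                    for i in range(max(0, x - r), min(w, x + r + 1))]
            m = sum(vals) / len(vals)
            means[y][x] = m
            variances[y][x] = sum((v - m) ** 2 for v in vals) / len(vals)

    # Assumption: noise power is estimated as the average local variance.
    noise = sum(map(sum, variances)) / (h * w)

    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            var = variances[y][x]
            denom = max(var, noise)
            # Flat areas (low variance) are pulled towards the local mean;
            # high-variance areas such as text strokes keep more detail.
            gain = max(var - noise, 0.0) / denom if denom > 0 else 0.0
            out[y][x] = means[y][x] + gain * (image[y][x] - means[y][x])
    return out


page = [[100] * 5 for _ in range(5)]
page[2][2] = 200  # a single noisy pixel on a flat background
smoothed = wiener_filter(page, window=3)
```

A larger window averages over a wider neighbourhood, which smooths the background more strongly but can also blur thin strokes.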
 
Enhancement (binary)
  • Despeckle: Despeckling removes unwanted small components from the binary image. The maximum size of a noise component to be removed can be set in the settings window of the module.
  [Figure: initial image and despeckled image]
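
Despeckling as described above can be sketched in Python with a connected component search. This is an illustration rather than the module's code: it assumes 1 = black pixel, 8-connectivity, and a component size measured in pixels, none of which is stated in the text, and the function name despeckle is hypothetical.

```python
from collections import deque


def despeckle(image, max_size):
    """Remove connected components of black pixels (1) with at most max_size pixels."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Collect one 8-connected component with a breadth-first search.
            component, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                component.append((cy, cx))
                for ny in range(max(0, cy - 1), min(h, cy + 2)):
                    for nx in range(max(0, cx - 1), min(w, cx + 2)):
                        if image[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Small components are treated as noise and erased.
            if len(component) <= max_size:
                for cy, cx in component:
                    out[cy][cx] = 0
    return out


noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
clean = despeckle(noisy, max_size=2)
```

With max_size=2 the two isolated pixels are removed, while the 2x2 block (4 pixels) is kept.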


About:

Basilis Gatos
Nikolaos Stamatopoulos
Georgios Louloudis

Computational Intelligence Laboratory
Institute of Informatics and Telecommunications
National Center for Scientific Research (NCSR) "Demokritos"
Athens, Greece

The development of the document image processing platform and modules has
been partially funded by the European Community's Seventh Framework
Programme under grant agreements n° 215064 (project IMPACT) and
n° 600707 (project tranScriptorium).

Contact person:

Basilis Gatos
http://www.iit.demokritos.gr/~bgat
bgat<at>iit<dot>demokritos<dot>gr