Last change 3 August, 2007
XPLAB is a free pattern recognition program written for systems that support the C library and the X Window System. The graphical user interface is based on GTK 2. XPLAB focuses on OCR but is not limited to it. It works best with a database trained for one or a few media. Once initial samples are trained, most of the remaining patterns can be found automatically. The package offers no image manipulation of its own, because many other tools already do this well. After compiling XPLAB (see the README file), enter xplab -h to see all available options, or xplab -v for version and license information. A man page, xplab, is available.
The tutorial shows the main features of XPLAB, starting from scratch. All steps are shown for a typical use case, from a simple Training Phase up to the Recognition Phase. The only input needed is an image suitable for pattern recognition, preferably containing text. Alternatively, the tutorial image mbb143.gif (66KB) can be used. Because the web site uses the GIF format for images, that image has to be converted to the PNM format after copying (see next chapter); otherwise, simply download the PNM-formatted image mbb143.ppm (192KB).
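Before an image is passed to XPLAB, it can be useful to check that the file really is in PNM format. The following Python sketch is not part of XPLAB; it only inspects the two-byte magic number used by the Netpbm formats (P1 to P6). The function names pnm_kind and pnm_kind_of_file are hypothetical helpers introduced for this illustration.

```python
from typing import Optional

# Netpbm magic numbers: plain (ASCII) and raw (binary) variants.
PNM_MAGIC = {
    b"P1": "PBM (plain)",
    b"P2": "PGM (plain)",
    b"P3": "PPM (plain)",
    b"P4": "PBM (raw)",
    b"P5": "PGM (raw)",
    b"P6": "PPM (raw)",
}


def pnm_kind(data: bytes) -> Optional[str]:
    """Return the PNM variant for the leading bytes of a file, or None."""
    return PNM_MAGIC.get(data[:2])


def pnm_kind_of_file(path: str) -> Optional[str]:
    """Read the first two bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return pnm_kind(fh.read(2))
```

A GIF file starts with the bytes GIF87a or GIF89a, so pnm_kind returns None for it, which signals that a conversion is still needed.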
|2 Starting XPLAB|
XPLAB is started by entering xplab -D my_db my_image. The option -D is followed by the database file name my_db, and the last parameter my_image is the name of the image used as the data source in the current session. XPLAB requires image input in the PNM format. If the database doesn't exist, XPLAB creates an empty database internally. After starting, XPLAB displays its Main Log Window with more or less informative messages. All actions write their output to the Main Log. Together with its menu bar, the Main Log Window is the control center of XPLAB.
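For users who start XPLAB from a script, the call can be sketched in Python as follows. This is only an illustration: it uses the -D option exactly as described above and assumes that the xplab executable is on the PATH. The function names xplab_command and start_xplab are hypothetical.

```python
import shutil
import subprocess


def xplab_command(database: str, image: str) -> list:
    """Build the argument list for: xplab -D <database> <image>."""
    return ["xplab", "-D", database, image]


def start_xplab(database: str, image: str) -> subprocess.Popen:
    """Launch XPLAB; assumes the xplab executable is installed and on PATH."""
    if shutil.which("xplab") is None:
        raise FileNotFoundError("xplab executable not found on PATH")
    return subprocess.Popen(xplab_command(database, image))
```

Passing the arguments as a list avoids shell quoting problems if the database or image name contains special characters.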
XPLAB is controlled by many settings stored in a config file xplab.cfg. New users can use the default settings of XPLAB and change them later on.
|3 Training Phase|
The first step in a new pattern recognition session is the Training Phase. Its purpose is to build a database with a sufficient number of trained patterns.
XPLAB performs pattern training with built-in tools, which the user controls simply by pressing buttons. The intention is to avoid any direct interaction with pixels. If such interaction is required for some reason, the user has to use a real image processing tool.
The training session is started by choosing the training button in the menu bar of the Main Log Window. XPLAB prints some messages to the Main Log and then opens its Training Image Window. This window shows the result of binarizing the input image and of separating the binary image into bounding boxes, referred to below as raw boxes (RB). When the window is created, the biggest RB area, with a yellow background, is shown automatically. The window also displays the training results (if there are any). A few simple operations, such as changing the pixel size, can be performed in this window using the button bar; they always apply to the whole image.
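The idea of binarizing an image and separating it into bounding boxes can be illustrated with a short Python sketch. The tutorial does not describe how XPLAB implements these steps, so everything here is an assumption made for illustration: a fixed gray threshold of 128, 8-connected regions, and the function names binarize and raw_boxes.

```python
from collections import deque


def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0..255) to 1 = ink, 0 = background.

    Assumption: dark pixels are ink and a fixed threshold is good enough.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def raw_boxes(binary):
    """Return (top, left, bottom, right) for each 8-connected ink region."""
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill over one connected region.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and binary[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

With this approach an unconnected shape such as the letter i yields two separate boxes, one for the dot and one for the stem.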
Finally, the actual training session starts by choosing the Training Dialog in the menu bar of the Training Image Window. The Training Dialog comprises two pages. The first is labeled Select and is used for manual training. The second is labeled Trace and can be used for semi-automatic training if a trained database already exists. The action buttons are located below the pages.
By default the select-page is displayed, which is used throughout the tutorial. The Training Image Window has been updated and now shows some pixels in the default RB colored red.
The select-page has four frames: Class Attributes Dialog, Pattern Selection, Pattern Type Selection and Trained Attribute Lists.
|3.1 Training simple patterns|
For a simple case, an RB containing a single pattern is chosen. This can be done in the Pattern Selection frame by moving the RB slider with the mouse pointer. Alternatively, an RB index can be entered in the slider's text entry. Whenever the RB index changes, the Training Image and the Pattern Type Selection frame are updated accordingly.
For the tutorial's example, RB No.5 was chosen, which contains a bold capital B (the example contains a clipping from the Training Image). Red pixels always show the user that they are selected for training.
Before training, the user has to fill in the text entry mask in the Class Attributes Dialog frame at the top. Both the 'CLASS' entry and the 'TRANSLATION' entry contain a string. The translation string is used for the output of the recognition phase, while the class string is used to distinguish different patterns, even if they have an identical translation. It is up to the user how classes are filled with patterns. The recommendation is to create different classes for different patterns. A class can have only one translation, but different classes can have the same translation.
The 'COMMENT' entry is for user comments and is ignored entirely by all algorithms.
An important input is the 'ALIGNMENT' entry, which describes a feature of the class. Pure pattern matching alone cannot solve all problems of (text) recognition and sometimes needs the support of this feature. The reference point in text recognition is the baseline, and the alignment entry offers four choices that describe roughly where a text pattern is located relative to the baseline. For instance, a j should be trained with the alignment UNDERLINE, an a with BASELINE, a hyphen with MIDDLELINE and an apostrophe with UPPERLINE. This feature belongs to the class and therefore applies to all patterns of the class. This is one reason why patterns with different features should be trained into different classes, each holding the right feature.
The 'PRINTABLE' entry tells XPLAB, whether a pattern is printable. For instance, the umlaut ä comprises three patterns. Only one of them (a) is printable.
The last entry 'FIXED' is for future use only and currently not processed.
The text entries must not contain whitespace (space, line feed, etc.).
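The entries of the Class Attributes Dialog can be pictured as a small record with a few validation rules. The following Python sketch is not XPLAB's internal data structure; the field names, the boolean type of the FIXED entry, and the rule that CLASS and TRANSLATION must be non-empty are assumptions made for illustration. The four alignment values and the whitespace rule are taken from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class Alignment(Enum):
    UNDERLINE = "UNDERLINE"    # e.g. j
    BASELINE = "BASELINE"      # e.g. a
    MIDDLELINE = "MIDDLELINE"  # e.g. hyphen
    UPPERLINE = "UPPERLINE"    # e.g. apostrophe


@dataclass(frozen=True)
class ClassAttributes:
    class_name: str        # 'CLASS' entry
    translation: str       # 'TRANSLATION' entry
    alignment: Alignment   # 'ALIGNMENT' entry
    printable: bool = True  # 'PRINTABLE' entry
    comment: str = ""       # 'COMMENT' entry, ignored by all algorithms
    fixed: bool = False     # 'FIXED' entry; type assumed, not processed

    def __post_init__(self):
        entries = (
            ("CLASS", self.class_name),
            ("TRANSLATION", self.translation),
            ("COMMENT", self.comment),
        )
        for label, value in entries:
            if any(ch.isspace() for ch in value):
                raise ValueError(f"{label} must not contain whitespace")
        # Assumption: class and translation strings must be defined.
        if not self.class_name or not self.translation:
            raise ValueError("CLASS and TRANSLATION must not be empty")
```

For example, ClassAttributes("B-bold", "B", Alignment.BASELINE) describes the class used in this tutorial, while a class name such as "B bold" would be rejected because of the space.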
The two frames Class Attributes Dialog and Pattern Selection are
independent from each other. Changing a setting in one frame doesn't
influence the other frame.
The last step is clicking the action button 'parse' twice with the mouse pointer. After the first click the button's label changes to 'TRAIN'; after the second click the pattern has been trained as a reference in the database. If something goes wrong, XPLAB prints a corresponding message in the Class Attributes Dialog frame and blocks the training of the current pattern. If nothing goes wrong, XPLAB updates the Trained Attribute Lists frame. This frame displays two lists: the left list holds fragmented patterns, the right list holds the current database classes. Since this example trains a single pattern, only the right list is filled (with a single entry). The new entry contains the user input. Some values are colored to show that a new class has been trained.
The Training Image has been updated, too. All red selected pixels have been changed to blue to mark them as trained. Trained pixels cannot be selected again for training in this session.
The trained pattern is called an NB (named box), because the formerly unknown pattern is now known and has a name (class B-bold) with various features. Once a class is trained, its number of members should be increased. To train a pattern for an existing class, the user selects the right pattern as described above. The class attributes can then easily be copied to the Class Attributes Dialog by clicking the corresponding row in the Trained Attribute Lists frame with the mouse pointer.
In this chapter the Pattern Type Selection frame could be ignored completely. It is in fact used for objects that comprise more than one NB. In XPLAB, these objects are called Compound NB (CNB).
|3.2 Training compound patterns|
The term 'compound patterns' indicates that objects often consist of unconnected parts. Here the term CNB (Compound NB) is used for such an object. Its parts are called NNB (Neighbored NB). Training a CNB consists of two stages. In the first stage, every NNB is trained as described in Ch. 3.1. In the second stage, all NNBs together are trained as a CNB. The next steps show the details.
As an example, the character i will be trained, which consists of two
patterns. Continuing with the session the RB No.78 in the Pattern
Selection frame is chosen using the RB slider.
The Training Image displays RB No.78 containing the (red colored) lower part of the i, which isn't printable. Now the user has to bring all parts of the compound pattern (the NNBs) into the focus, the area with the yellow background. This is done in the Pattern Selection frame using the focus slider, next to the RB slider. The focus slider has to be adjusted until the i-dot lies completely within the focus. The value shown in the focus text entry is relative to the focus size and is of no interest except to show that something has changed.
In our example it is convenient to magnify the Training Image in order to get the right focus. XPLAB automatically selects the uppermost NNB, which is the i-dot and which has to be trained. However, the order in which the patterns are trained is unimportant. The user can select or deselect a pattern by pressing the middle mouse button while pointing at it with the mouse pointer.
The Pattern Type Selection frame has been updated and shows the correct
activated button (NNB). It is always activated by XPLAB whenever the focus
changes and contains more than one NB.
In the Class Attributes Dialog the user could update the mask as shown. Note that the i-dot is not printable, and hence its translation string can never be used for output in the recognition phase. Nevertheless, the translation string has to be defined.
The i-dot can now be trained with the action button 'parse' as described above. After that, in contrast to Ch. 3.1, the left list in the Trained Attribute Lists contains the trained i-dot. Also, its pattern is colored light blue in the Training Image instead of dark blue. The reason is that XPLAB can only update the database (displayed in the right list) with the complete set of a CNB. As long as the CNB isn't trained completely, the training results are shown in an intermediate state, in both coloring and listing.
The user then chooses the next NNB of the i with the left (<) or right (>) button in the Pattern Type Selection frame and repeats the same procedure as for the i-dot.
After training all NNBs, the last step is to activate the CNB button in
the Pattern Type Selection frame. All NNBs will be colored red,
indicating that they are ready for training as a CNB.
Once more the Class Attributes Dialog must be updated, this time for the CNB. The mask in the Class Attributes Dialog may be filled in with the input shown in the screen dump. Now the CNB can be trained (by activating the 'parse' action button and so on). The clippings show the listing in the Trained Attribute Lists frame before and after the last training step for the CNB. The right list in the Trained Attribute Lists is updated with the content of the left list, which is consequently emptied.
Before training the CNB.
After training the CNB. Note: there is no reason to put the CNB data into the left list.
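The interplay of the two lists during CNB training can be summarized in a short Python sketch. It models only the behavior described in this chapter and is not XPLAB's actual implementation: the class TrainingLists, its method names, the class names i-dot, i-stem and i, and the order of entries in the right list are assumptions made for illustration.

```python
class TrainingLists:
    """Model of the two lists in the Trained Attribute Lists frame."""

    def __init__(self):
        self.fragments = []  # left list: NNBs of a CNB still being trained
        self.database = []   # right list: current database classes

    def train_single(self, class_name):
        """A single pattern (Ch. 3.1) goes straight to the right list."""
        self.database.append(class_name)

    def train_nnb(self, class_name):
        """An NNB is held in the left list until its CNB is complete."""
        self.fragments.append(class_name)

    def train_cnb(self, class_name):
        """Move all pending NNBs plus the CNB itself to the right list."""
        if len(self.fragments) < 2:
            # A CNB comprises more than one NB.
            raise ValueError("a CNB needs at least two trained NNBs")
        self.database.extend(self.fragments)
        self.database.append(class_name)  # the CNB never enters the left list
        self.fragments.clear()
```

Training the letter i then amounts to two train_nnb calls followed by one train_cnb call, after which the left list is empty again.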
|Important: don't forget to save the training results using the save button at the bottom of the Training Dialog window. As long as training results are not saved, they are kept within the training phase and no other tools will see them. Leaving the training session without saving the new database will lose all new data. The maintenance phase provides visualization of the trained patterns of the saved database, see Maintenance Phase.|
|3.3 Saving/Restarting a training session|
In any state of the training dialog, the current session can be saved to the hard disk by activating the action button 'save'. This stores the current Training Image as the Session Image (e.g. 'my_image.xplab_session') together with the database. The Session Image is always stored automatically when the training phase is finished. The database, however, needs to be saved explicitly by the user, which should be done several times during a long session.
The Session Image contains only the marking of the trained and untrained patterns, nothing else. All other selections from the Training Image are therefore lost. This is done to avoid problems with untrained NNBs.
To restart a training session at the stage where it was interrupted, the user has to pass the Session Image instead of the original image. XPLAB reconstructs the original binary image from the Session Image.
Some requirements have to be met to achieve consistency, that is, to keep all trained patterns true to the whole medium. These points are
|3.5 Choosing the right pattern|
After scanning the document, the user has an image in a certain format. For the very first sampled NB, it is recommended to copy or crop a region from the image that shows the best-looking pattern, or the one with the fewest distortions. The copy has to be converted to the PNM format, which is the only format XPLAB supports. Any preprocessing may be applied to the image, but pattern consistency should be retained. If the scanned image has reasonable quality, XPLAB performs all the necessary preprocessing itself (binarization, noise elimination).
The following recommendations were found for the training phase, with the highest priority at the top:
|4 Recognition Phase|
Once a database has been created, XPLAB is able to perform the recognition phase, even in the same session. To use the training results above, select the matching button in the menu bar of the Main Log Window. A pop-up menu appears, from which the recall button has to be activated. XPLAB then creates the Recall Window. It contains an initially empty text area and, below it, three buttons with progress bars. The matching button starts the recognition, which can be stopped with the stop button. Once the recognition has started, its progress bar shows the processing progress. If the tutorial's current trained database is used, which contains only a few patterns, the recognition finishes quickly. The recognition result is written to the text area. The trained NBs are recognized, as are similar patterns. Since most of the patterns have not been trained yet, the output contains many @ characters, which stand for unrecognized patterns. The Main Log is filled with detailed XPLAB results.
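Why an almost empty database produces so many @ characters can be seen in a simplified Python sketch. The tutorial does not describe XPLAB's matching algorithm, so the code below swaps in a plain pixel-agreement score with a fixed acceptance threshold. The function names similarity and recognize and the threshold min_score are assumptions made for illustration; only the use of @ for unrecognized patterns comes from the description above.

```python
UNRECOGNIZED = "@"


def similarity(a, b):
    """Fraction of equal pixels between two equally sized binary patterns."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0  # different sizes never match in this simple model
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    equal = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return equal / total


def recognize(pattern, templates, min_score=0.9):
    """Return the translation of the best matching template, or '@'.

    templates is a list of (translation, binary pattern) pairs.
    """
    best_translation, best_score = UNRECOGNIZED, 0.0
    for translation, template in templates:
        score = similarity(pattern, template)
        if score > best_score:
            best_translation, best_score = translation, score
    return best_translation if best_score >= min_score else UNRECOGNIZED
```

Every pattern whose best score stays below the threshold is reported as @, so the share of @ characters shrinks as more classes are trained.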
The Recall Phase needs no further manual assistance. It can be used concurrently with the Training Phase, which makes it convenient for testing trained NBs.
For images bigger than the one used in this example, such as full page scans, XPLAB needs several minutes to produce the result, depending on the pattern size, the line spacing and, last but not least, the computer's capability.
|5 Maintenance Phase|
The Maintenance Phase lets the user inspect the database, manage the classes and all their settings, and run a check. Statistics show the quality of the database. For a few parameters (e.g. image resolution), the maintenance phase is the only way to get values into the database.
The Maintenance Phase is entered by activating the maintenance button in the menu bar of the Main Log Window. This creates the Maintenance Window, which shows the database as a tree with expandable nodes. By default the database is read-only and hence not editable. The read-only status is indicated by green color: as long as an item is colored green, it is not editable. To make the database editable, it must be locked via the pop-up menu Tree/lock DB in the menu bar of the Maintenance Window. However, some nodes remain non-editable even in the locked status. An editable node can be edited directly in its row by clicking on it with the mouse pointer.
As an example an edit session is shown for changing the Trained
Pattern Set ID (TPS ID), which has been set to its default string
IDDEF by XPLAB after creation of the database. After locking
the database, the user has to click with the mouse pointer to the ID
row, which then displays a text entry together with a drop-down list
button. The drop down list provides proposals and the history of the
Some items can be processed by actions like copy & paste, delete and create. For this the user selects the row with the mouse pointer and chooses Tools/edit (node-sensitive) in the pop-up menu.
The patterns can be visualized in separate windows if the Image/on button is enabled. It is useful to display the TPS image window during the training phase, because it gives the user a better overview of what has really been trained than the lists in the training dialog. To open the TPS image window, click with the middle mouse button on the TPS index row of the database tree. Only printable patterns are visualized. For the database trained in this tutorial, the TPS image shows the two printable patterns.
Whenever the database has been changed, it should be saved to the hard disc by activating the save button in the menu bar. The database can't be unlocked as long as this isn't done, unless the user reverts to the old settings or exits XPLAB.
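The lock, edit, save and unlock rules of the Maintenance Window can be summarized as a small state model. The Python sketch below is only an illustration of the rules described in this chapter, not XPLAB's implementation; the class MaintenanceDB, its method names and the value my_id are hypothetical.

```python
class MaintenanceDB:
    """Model of the read-only/locked states of the database tree."""

    def __init__(self, values, fixed_keys=()):
        self._saved = dict(values)     # state on the hard disk
        self._values = dict(values)    # state shown in the tree
        self._fixed = set(fixed_keys)  # nodes that are never editable
        self.locked = False            # read-only (green) by default

    @property
    def dirty(self):
        """True if the tree differs from the saved database."""
        return self._values != self._saved

    def lock(self):
        self.locked = True

    def edit(self, key, value):
        if not self.locked:
            raise PermissionError("database is read-only; lock it first")
        if key in self._fixed:
            raise PermissionError(f"node {key!r} is not editable")
        self._values[key] = value

    def save(self):
        self._saved = dict(self._values)

    def unlock(self):
        # Unlocking requires saving or reverting to the old settings.
        if self.dirty:
            raise RuntimeError("save or revert the changes before unlocking")
        self.locked = False
```

Because dirty compares the current values with the saved ones, typing the old value back into an edited row also allows unlocking, which mirrors the "reverts to the old settings" case.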
© 2008 Elmar Sack, firstname.lastname@example.org