Texts Tab

The options of the WiseImage Pro OCR module are set in the Texts tab.

Raster text processing takes two steps. First, WiseImage Pro searches for raster fragments that contain text. These fragments are called Text Areas.

Second, WiseImage Pro applies an operation to the raster texts it found. The operation is the one selected as an additional parameter of the Text Areas tool in the Recognition tab.
The most complex of these operations is raster text recognition. It is performed either by the built-in text recognition (OCR) module or by an additional third-party module.

The WiseImage Pro OCR module recognizes raster texts and creates WiseImage Pro text objects. It also calculates the height and rotation angle of each created text.

The standard shipment includes two OCR character template files, DEFAULT.OCR and CYRILLIC.OCR. With these files the program recognizes English letters, digits, punctuation marks, and special characters (the first half of the ASCII table), and CYRILLIC.OCR adds Russian characters. The OCR module can also be trained to recognize other text characters.

If the OCR cannot recognize a character, that character is replaced by "~" (tilde) in the text line. If none of a word's characters is recognized, the OCR does not generate a text object.
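The two rules above can be illustrated with a short Python sketch. It is not WiseImage Pro's implementation. The input format is a hypothetical one: a list of per-character results in which None marks a character the OCR could not recognize. The function name assemble_word is also hypothetical.

```python
from typing import Optional, Sequence


def assemble_word(chars: Sequence[Optional[str]]) -> Optional[str]:
    """Build a word's text from per-character OCR results.

    Each entry is a recognized character, or None where recognition failed
    (a hypothetical input format used only for this illustration).
    Returns None when no text object should be generated.
    """
    if all(c is None for c in chars):
        return None  # no character recognized: no text object
    # Each unrecognized character is replaced by a tilde.
    return "".join("~" if c is None else c for c in chars)


print(assemble_word(["R", "z", None, "0"]))  # Rz~0
print(assemble_word([None, None]))           # None
```

A word with even one recognized character still produces a text object, so partially recognized words remain visible and can be corrected later.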

Text Recognition Options

Orientation – defines which raster text orientations are accepted.

Horizontal only – Searches for horizontal text lines only. All text areas are horizontal.
Horizontal and Vertical – Searches for horizontal and vertical text lines. Text areas are either horizontal or vertical.
Arbitrary – Also searches for skewed text lines. Selecting this option may slow down the text area search.
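The following Python sketch shows how the three Orientation modes differ. For a text line at a given angle, it decides whether each mode would accept that line. It is an illustration, not WiseImage Pro's algorithm. The Orientation enum, the accepts function, and the 2° default tolerance are hypothetical. Angles are compared modulo 180° on the assumption that only the direction of the line matters, so 0° and 180° both count as horizontal and 90° and 270° both count as vertical.

```python
from enum import Enum


class Orientation(Enum):
    HORIZONTAL_ONLY = "Horizontal only"
    HORIZONTAL_AND_VERTICAL = "Horizontal and Vertical"
    ARBITRARY = "Arbitrary"


def _near(angle_deg: float, target_deg: float, tol_deg: float) -> bool:
    """True if the line direction is within tol_deg of target_deg (mod 180)."""
    d = abs((angle_deg - target_deg) % 180.0)
    return min(d, 180.0 - d) <= tol_deg


def accepts(mode: Orientation, angle_deg: float, tol_deg: float = 2.0) -> bool:
    """Decide whether a text line at angle_deg is accepted in the given mode."""
    if mode is Orientation.ARBITRARY:
        return True  # any angle, including skewed lines
    if _near(angle_deg, 0.0, tol_deg):
        return True  # horizontal lines are accepted by every mode
    return (mode is Orientation.HORIZONTAL_AND_VERTICAL
            and _near(angle_deg, 90.0, tol_deg))


print(accepts(Orientation.HORIZONTAL_ONLY, 90.0))          # False
print(accepts(Orientation.HORIZONTAL_AND_VERTICAL, 270.0))  # True
print(accepts(Orientation.ARBITRARY, 30.0))                 # True
```

The sketch only filters angles. In the program, the Arbitrary mode costs more because the search itself must consider every direction.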

Overlapped by Graphic

If this option is on, WiseImage Pro also searches for raster texts that cross other raster objects. Selecting this option may slow down the text area search.

Standalone Letters

Allows searching for standalone text characters. If this option is off, WiseImage Pro does not find single text characters. It also does not mistake graphic objects such as markers and dashes for text.

Patterns

To customize the OCR, specify a set of word patterns. A pattern is a rule that specifies an allowed sequence of characters within one recognized word. This list contains the definitions of accepted word patterns. When Patterns is checked in the Texts tab of the Conversion Options dialog, WiseImage Pro OCR generates only words that match one of the specified patterns.

The Add and Delete buttons

Use these buttons to add word pattern definitions to the list or to delete them from it.

A word pattern definition has the following formal syntax. Here || separates alternatives and … indicates that elements can repeat:
"[%[length]type] || [letter] …"

[%] Beginning of character sequence definition
[length] A decimal integer giving the exact number of characters; omitted if the length is variable
[type] Character type (D, E, e, N, n, S)
[letter] Standalone letter

Character type is specified in the following way:

D Digits
E Uppercase letters of the English alphabet
e Lowercase letters of the English alphabet
N Uppercase letters of national alphabets
n Lowercase letters of national alphabets
S Special characters (plus and minus signs, equals sign, etc.)
%% A standalone "%" character
[characters] Standalone characters, matched as written

For example:
The pattern "Rz%D" generates words that start with "Rz" followed by any sequence of digits, for example "Rz40", "Rz2.5", "Rz5000".
The pattern "%1N%n" generates national-alphabet words that start with a capital letter.
The pattern "%D %%" generates percentages such as "20 %", "1100 %", "12.50 %".
The pattern "%DV" generates voltage values such as "5V", "220V", "13.8V".
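The pattern syntax above can be illustrated with a small Python sketch. It translates a pattern into a regular expression and checks whether a recognized word fits any pattern in the list. This is not WiseImage Pro's implementation, and the names compile_pattern, matches, is_accepted, and TYPE_CLASSES are hypothetical. The concrete character sets are assumptions:

* D includes the decimal point, as the "Rz2.5" and "13.8V" examples suggest.
* N and n are Russian Cyrillic letters.
* S covers only plus, minus, and equals.
* A variable-length sequence needs at least one character.

```python
import re
from typing import Iterable

# Assumed character sets for each type code (illustrative, not the product's own).
TYPE_CLASSES = {
    "D": "[0-9.]",   # digits; '.' assumed from examples such as "Rz2.5"
    "E": "[A-Z]",    # uppercase English letters
    "e": "[a-z]",    # lowercase English letters
    "N": "[А-ЯЁ]",   # uppercase national letters (Russian assumed)
    "n": "[а-яё]",   # lowercase national letters (Russian assumed)
    "S": r"[+\-=]",  # special characters: plus, minus, equals (partial set)
}


def compile_pattern(pattern: str) -> re.Pattern:
    """Translate a word pattern such as "Rz%D" into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern[i] != "%":
            parts.append(re.escape(pattern[i]))  # standalone character, matched literally
            i += 1
            continue
        i += 1  # consume '%'
        if i < len(pattern) and pattern[i] == "%":
            parts.append(re.escape("%"))  # "%%" is a literal percent sign
            i += 1
            continue
        start = i
        while i < len(pattern) and pattern[i] in "0123456789":
            i += 1  # optional fixed length
        length = pattern[start:i]
        if i >= len(pattern) or pattern[i] not in TYPE_CLASSES:
            raise ValueError(f"missing or unknown character type in {pattern!r}")
        char_class = TYPE_CLASSES[pattern[i]]
        i += 1
        # Fixed length: exactly that many characters; otherwise one or more (assumed).
        parts.append(char_class + ("{" + length + "}" if length else "+"))
    return re.compile("".join(parts))


def matches(pattern: str, word: str) -> bool:
    """True if the whole word fits the pattern."""
    return compile_pattern(pattern).fullmatch(word) is not None


def is_accepted(word: str, patterns: Iterable[str]) -> bool:
    """With Patterns checked, a word is kept only if it fits at least one pattern."""
    return any(matches(p, word) for p in patterns)


patterns = ["Rz%D", "%1N%n", "%DV"]
print(is_accepted("Rz2.5", patterns))  # True
print(is_accepted("Rz~5", patterns))   # False: "~" fits no character type
```

The sketch uses full matching: a word is accepted only if the entire word fits a pattern, not just a prefix of it.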

Height Table

Use this box to specify the allowed text heights. If the checkbox is on, the OCR module creates text objects only with heights from this list. Each recognized height is rounded to the nearest listed value.
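The Height Table rounding amounts to choosing the list value closest to the recognized height, as in this minimal Python sketch. It is not the program's code. The function name snap_height and the sample table are hypothetical. Two behaviors are also assumptions: ties go to the value listed first, and an empty table leaves the height unchanged.

```python
from typing import Sequence


def snap_height(recognized: float, height_table: Sequence[float]) -> float:
    """Round a recognized text height to the nearest value in the Height Table.

    Ties go to the value listed first, and an empty table leaves the height
    unchanged; both behaviors are assumptions for this sketch.
    """
    if not height_table:
        return recognized
    return min(height_table, key=lambda h: abs(h - recognized))


heights = [2.5, 3.5, 5.0]          # hypothetical table, in drawing units
print(snap_height(2.7, heights))  # 2.5
print(snap_height(4.4, heights))  # 5.0
```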

Character template libraries

Specifies the character template libraries used during recognition. Character templates are topological models of text characters (letters, special symbols, etc.). The OCR compares raster text characters against these templates to recognize them.
The list contains the DEFAULT.OCR and CYRILLIC.OCR files included in the standard shipment. With DEFAULT.OCR, the OCR module recognizes English letters, digits, signs, punctuation marks, and special symbols (the first half of the ASCII table). CYRILLIC.OCR recognizes all of these plus Russian characters.
Users can also train the OCR to recognize other text characters. See "Training OCR".
During training, the OCR creates character templates and writes them to the library. The templates can be saved in a new or an existing library file.
If a user-created character template library is used, the OCR recognizes only the characters described in that file.

To place on layer

Use this box to set the name of the layer on which the texts produced by the OCR operation are placed.
