UiPath: How to use OCR in real time?
Hari Prasad
5 mins
Optical Character Recognition, commonly called OCR, is one of the most widely used activities today for extracting content from websites, images, scanned PDFs, handwritten text, and so on.
Extracting information or data from images, scanned documents, or PDFs is a tedious job, and the normal extraction activities are not recommended for these types of input. OCR uses a different approach to extract the information, and its advanced features allow users to transform paper documents and images into editable PDFs.
What are the types of OCRs?
There are mainly two types of OCRs available in UiPath Studio:
1. Microsoft OCR
2. Google OCR
These OCR engines are available as individual activities and are also used internally by the screen scraping tool. You can select the OCR engine that suits your purpose; this post discusses each of them in detail.
Microsoft's OCR is known as MODI, and Google's OCR is called Tesseract. You are not limited to these two engines: other OCR engines are available, for example as third-party activities.
OCR engines in UiPath
Microsoft OCR
Properties
Input
It accepts only image variables, on which OCR activities such as Get OCR Text are performed.
Options
Extract Words - If this check box is selected, the on-screen position of each detected word is extracted.
Language - Specifies the language used in the image for better extraction. It should be given by its full name, for example "english".
Profile - Specifies what kind of image is being processed. It has four options:
None - Does not apply a preprocessing profile.
Screen - Preprocessing suitable for remote desktop applications.
Scan - Preprocessing suitable for scanned files.
Legacy - Uses the engine's default settings for preprocessing images. This is the default option.
Scale - The scaling factor of the selected UI element or image. The higher the number is, the more you enlarge the image. This can provide a better OCR read and it is recommended with small images.
Output
Text - The extracted string. This field supports only String variables.
Result - The extracted words along with their on-screen positions. This field supports only KeyValuePair<Rectangle, String> variables.
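To make the two outputs concrete, here is a minimal Python sketch of how word-level results with on-screen positions might be modeled and queried. The names `Rect`, `full_text`, and `find_word`, as well as the sample data, are hypothetical and exist only for illustration; they are not part of the UiPath API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rect:
    """Hypothetical stand-in for an on-screen rectangle, in pixels."""
    x: int
    y: int
    width: int
    height: int


# Each entry pairs a bounding box with the word found inside it,
# mirroring the (rectangle, string) shape of the Result output.
WordResult = Tuple[Rect, str]


def full_text(words: List[WordResult]) -> str:
    """Join the detected words into one string, similar to the Text output."""
    return " ".join(word for _, word in words)


def find_word(words: List[WordResult], target: str) -> Optional[Rect]:
    """Return the position of the first case-insensitive match, if any."""
    for rect, word in words:
        if word.lower() == target.lower():
            return rect
    return None


# Made-up sample data for illustration only.
sample = [
    (Rect(10, 12, 48, 14), "Invoice"),
    (Rect(64, 12, 52, 14), "Number"),
    (Rect(122, 12, 40, 14), "10234"),
]
```

With this sample, `full_text(sample)` gives the plain string, while `find_word(sample, "number")` returns the rectangle that an activity such as Click OCR Text would need in order to act on that word.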
My experience
Multiple languages are supported by default.
It is suitable for extracting text from a large area and works well when the scale is increased.
Google OCR
Google's OCR is called Tesseract.
The properties of the Tesseract OCR are the same as those of the Microsoft OCR, with a few additional options.
Options
Allowed Characters - The OCR engine restricts extraction to the characters specified here.
Denied Characters - The OCR engine excludes the characters specified here from the extracted string.
Invert - If this check box is selected, the colors of the UI element are inverted before scraping. This is useful when the background is darker than the text color.
These additional options are available for Tesseract OCR but not for Microsoft OCR.
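The effect of these three options can be approximated in a few lines of Python. This is only a sketch of the idea: the function names are hypothetical, and the real engine applies character restrictions during recognition rather than as a filter on finished text, so its results may differ.

```python
from typing import Iterable, List, Optional


def filter_characters(text: str,
                      allowed: Optional[str] = None,
                      denied: Optional[str] = None) -> str:
    """Keep only allowed characters (if given), then drop denied ones."""
    kept = []
    for ch in text:
        if allowed is not None and ch not in allowed:
            continue  # not in the allowed set
        if denied is not None and ch in denied:
            continue  # explicitly denied
        kept.append(ch)
    return "".join(kept)


def invert_grayscale(pixels: Iterable[int]) -> List[int]:
    """Invert 8-bit grayscale values so light-on-dark becomes dark-on-light."""
    return [255 - p for p in pixels]
```

For example, `filter_characters("Total: 1,250.00 USD", allowed="0123456789.")` keeps only the digits and the decimal point, which is the typical reason to set Allowed Characters when reading amounts.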
My experience
Support for multiple languages can be added to Google OCR.
It is suitable for extracting text from a small area.
It has full support for color inversion.
It can restrict extraction to allowed characters only.
Microsoft Azure Computer Vision OCR
This OCR uses the Microsoft Azure Computer Vision OCR engine to extract the specified string from the image.
This OCR engine can extract text even from non-classified images, that is, images that contain handwritten text, graphs, pictures, and so on.
Logon
API Key - The API key that gives you access to the Microsoft Azure Computer Vision OCR. This OCR engine requires an Azure account to access the Computer Vision features.
End Point - The endpoint associated with your Microsoft Azure Computer Vision OCR API key. This field supports only strings and String variables.
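To show how an API key and an endpoint typically fit together, here is a hedged Python sketch that builds (but does not send) a request to the Azure Computer Vision Read REST API. The UiPath activity handles this for you; the `/vision/v3.2/read/analyze` route is an assumption based on the public REST API and may differ for your resource, and the function name is hypothetical.

```python
import urllib.request


def build_read_request(endpoint: str, api_key: str,
                       image_bytes: bytes) -> urllib.request.Request:
    """Build a POST request carrying the image and the subscription key."""
    # Assumed route and version; check what your Azure resource supports.
    url = endpoint.rstrip("/") + "/vision/v3.2/read/analyze"
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            # The key travels in this header, not in the URL.
            "Ocp-Apim-Subscription-Key": api_key,
            "Content-Type": "application/octet-stream",
        },
    )
```

The endpoint is the base URL shown for your Computer Vision resource in the Azure portal, and the API key is one of the two keys listed next to it.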
Options
HandWriting Recognition - A Boolean check box. If selected, the OCR engine extracts the handwritten text in the image; if cleared, it ignores the handwritten text.
My experience
It works without issues on classified images.
It also works reasonably well on non-classified images.
I used it to extract scanned handwritten text, and the results were accurate.
The Computer Vision features are available with an Azure account, and the API key and endpoint are then easy to obtain.
Microsoft Project Oxford Online OCR
It extracts a string and its information from an indicated UI element or image using the MODI Microsoft Cloud OCR engine. It can be used with other OCR activities, such as Click OCR Text, Hover OCR Text, and Get OCR Text.
Logon
API Key - The API key that gives you access to the Microsoft Cloud OCR.
This OCR connects to the Microsoft cloud to perform the extraction. It helps extract the text and its position more precisely.
Google Cloud Vision OCR
It extracts a string and its information from an indicated UI element or image using the Google Cloud OCR engine.
It is cloud-based and gives faster and more precise results than the Tesseract OCR engine.
Options
ResizeToMaxLimitIfNecessary - When selected, the engine attempts downsizing the target image so that it does not exceed the size limit of the Google Cloud Vision engine. By default, this check box is cleared.
It works in the same way as the Microsoft cloud OCR, performs better on smaller images, and is comparatively faster than the Microsoft OCR.
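The idea behind ResizeToMaxLimitIfNecessary can be sketched as a proportional downscale that only kicks in when an image is too large. The function below is a hypothetical illustration: it expresses the limit as a maximum side length in pixels (`max_side`), which is an assumption made for simplicity and not the actual limit used by the Google Cloud Vision engine.

```python
from typing import Tuple


def fit_within_limit(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Shrink (width, height) proportionally so neither side exceeds max_side."""
    if min(width, height, max_side) <= 0:
        raise ValueError("dimensions and limit must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already within the limit; leave untouched
    # Integer arithmetic keeps the longest side at exactly max_side
    # and rounds the other side down, never up.
    new_w = max(1, width * max_side // longest)
    new_h = max(1, height * max_side // longest)
    return new_w, new_h
```

A 4000 x 3000 image with a hypothetical limit of 2000 becomes 2000 x 1500, while an 800 x 600 image is returned unchanged.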
ABBYY OCR
This is a third-party OCR known for extracting text more accurately and faster than the other available OCR engines. It offers many options, including options for different kinds of documents.
Options
Correct Orientation - If selected, the page orientation is detected by the engine, and if needed, is corrected automatically. By default, this check box is selected.
Correct Skew - Detects whether the page is skewed and automatically corrects it. The drop-down contains three options:
Auto - Deskews only images that are detected as being skewed.
Yes - Forces deskew on all pages.
No - Does not automatically deskew pages.
By default, this property is set to Auto.
CustomRecognitionProfilePath - The full path to a custom built Recognition Profile. This field supports only strings and String variables.
FineReaderVersion - Specifies which version of the FineReader Engine is to be used. The options are FineReaderEngine 11 and FineReaderEngine 12. By default, this property is set to FineReaderEngine 12.
PredefinedRecognitionProfile - Specifies the Predefined Recognition Profile that is to be used when processing an image. This field supports only strings and String variables. The available profiles are listed in the ABBYY reference documentation under Predefined Recognition Profiles.
Output
Confidence - The resulting confidence score. This field supports only Int32 variables.
The other properties are similar to those of the other OCR engines available in UiPath.
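A common way to use the Confidence output is to accept high-scoring results automatically and send the rest for manual review. The sketch below is hypothetical: the function name, the 0 to 100 scale, and the threshold of 80 are all assumptions for illustration, so adjust them to the values you actually observe.

```python
def route_by_confidence(confidence: int, threshold: int = 80) -> str:
    """Return 'accept' or 'review' depending on the confidence score."""
    # Assumed scale: 0 (no confidence) to 100 (full confidence).
    if not 0 <= confidence <= 100:
        raise ValueError("confidence is assumed to be in the range 0-100")
    return "accept" if confidence >= threshold else "review"
```

In a workflow, the same decision would usually be an If activity comparing the Int32 variable with the chosen threshold.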
Advantages
This OCR gives accurate and fast results.
It can convert TIFF and JPEG files into searchable PDF and PDF/A, and extract data or text from photos or screenshots.
It supports multiple languages effectively and accurately.
NOTE:
ABBYY FineReader Engine SDK is required.
The engine works only with a license distributed by the UiPath sales department.
ABBYY Cloud OCR
This OCR is accessible only after you subscribe to ABBYY Cloud; you can then use the features provided by the ABBYY Cloud platform.
Logon
ApplicationID - The application ID provided when subscribing to the ABBYY Cloud OCR service.
Password - The password provided when subscribing to the ABBYY Cloud OCR service.
ServerUrl - The Server URL provided when subscribing to the ABBYY Cloud OCR service.
This OCR engine gives better results and has many options for working with different types of documents.
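The three Logon properties map naturally onto an authenticated HTTP request. The Python sketch below builds (but does not send) such a request, assuming the service accepts the application ID and password through HTTP Basic authentication. The function name and the `path` parameter are hypothetical, and the UiPath activity performs the real call for you.

```python
import base64
import urllib.request


def build_abbyy_request(server_url: str, application_id: str, password: str,
                        path: str, image_bytes: bytes) -> urllib.request.Request:
    """Build a POST request that authenticates with ApplicationID and Password."""
    # Basic authentication: base64 of "application_id:password".
    credentials = f"{application_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    # 'path' is a placeholder for whatever route your subscription documents.
    url = server_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(
        url,
        data=image_bytes,
        method="POST",
        headers={"Authorization": "Basic " + token},
    )
```

Keeping the server URL separate from the credentials, as the activity does, makes it easy to switch between service regions without touching the login details.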
Conclusion
Among all the OCR engines, the cloud OCR engines produce the most accurate results.
These OCR engines can also be used with other OCR activities (Click OCR Text, Hover OCR Text, Double Click OCR Text, Get OCR Text, Find OCR Text Position).
They are also used in recording wizards such as Screen Scraping and Citrix.
In my opinion, the best OCR engines, offering many options along with speed and accuracy, are the ABBYY OCR engine and the Microsoft Azure Computer Vision OCR engine.