UIPATH: HOW TO USE OCR (OPTICAL CHARACTER RECOGNITION) IN REAL TIME
OCR activities are among the most widely used activities for extracting content from websites, images, scanned PDFs, handwritten text, and so on.
Extracting information or data from images, scanned documents, or PDFs is a tedious job. Standard text-extraction activities are not suitable for these inputs when the text exists only as pixels in an image. OCR takes a different approach: it analyzes the image itself to recognize the characters.
What is OCR?
OCR (Optical Character Recognition) is a technology that converts various types of documents, such as scanned paper or images captured by a digital camera, into editable data and PDF files. OCR software isolates the individual letters in an image, groups them into words, and then forms sentences, which makes the content of the original document easy to access and edit. Combined with full-text search and document management capabilities, OCR lets enterprise content management solutions make scanned documents searchable.
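To make the "letters into words" step more concrete, here is a toy Python sketch. It assumes that the characters on one text line have already been recognized and are reported as hypothetical `Glyph` records (character, left edge, width). The function starts a new word wherever the horizontal gap to the next character exceeds `max_gap` pixels, an arbitrary illustrative value. Real OCR engines use much more robust layout analysis, so treat this only as an illustration of the idea.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    char: str   # the recognized character
    x: int      # left edge, in pixels
    width: int  # glyph width, in pixels


def group_into_words(glyphs: list[Glyph], max_gap: int = 4) -> list[str]:
    """Join the glyphs of one text line into words, left to right.

    A new word starts whenever the gap between the previous glyph's right
    edge and the next glyph's left edge is larger than max_gap pixels.
    """
    words: list[str] = []
    current = ""
    prev_right = None
    for g in sorted(glyphs, key=lambda glyph: glyph.x):
        if prev_right is not None and g.x - prev_right > max_gap:
            words.append(current)
            current = ""
        current += g.char
        prev_right = g.x + g.width
    if current:
        words.append(current)
    return words


line = [Glyph("O", 0, 8), Glyph("K", 9, 8), Glyph("G", 30, 8), Glyph("o", 39, 7)]
print(" ".join(group_into_words(line)))  # OK Go
```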
TYPES OF OCR ENGINES:
There are two main OCR engines available in UiPath Studio:
1. Microsoft OCR
2. Google OCR
These engines are available as individual activities, and they are also used internally by the Screen Scraping wizard. Select the engine that suits your purpose. Each one is discussed in detail later in this post.
Microsoft’s OCR engine is known as MODI, and Google’s is called Tesseract. You are not limited to these two: many other OCR engines are available, including cloud services and third-party activities.
Fig. – OCR engines in UiPath
Microsoft OCR:
Image: The image to be processed, on which OCR activities such as Get OCR Text operate. This field accepts only Image variables.
Extract Words: If this check box is selected, the on-screen position of each detected word is extracted.
Language: The language of the text in the image. Specifying it improves extraction. Enter the full language name, for example “English”.
Profile: The pre-processing profile to apply, chosen according to the type of image. There are four options:
- None: Does not apply a pre-processing profile.
- Screen: Pre-processing suitable for remote desktop applications.
- Scan: Pre-processing suitable for scanned files.
- Legacy: Uses the engine’s default pre-processing settings. This is the default option.
Scale: The scaling factor applied to the selected UI element or image. The higher the number, the more the image is enlarged. Enlarging can improve OCR accuracy and is recommended for small images.
Text: The extracted string. This field supports only String variables.
Result: The extracted words along with their on-screen positions. This field supports only IEnumerable<KeyValuePair<Rectangle, String>> variables.
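The Result output pairs every detected word with its bounding rectangle. The Python below is a rough sketch of how such pairs can be used. It models them as a list of `(Rect, str)` tuples, which serve as a hypothetical stand-in for the .NET `KeyValuePair<Rectangle, String>` items, and returns the center of the first matching word. That center is roughly the kind of point an activity such as Click OCR Text needs to target. This is an illustration, not UiPath's implementation.

```python
from typing import NamedTuple, Optional


class Rect(NamedTuple):
    x: int       # left edge
    y: int       # top edge
    width: int
    height: int


def find_word_center(words: list[tuple[Rect, str]], target: str) -> Optional[tuple[int, int]]:
    """Return the center point of the first word equal to target (case-insensitive), or None."""
    for rect, text in words:
        if text.casefold() == target.casefold():
            return (rect.x + rect.width // 2, rect.y + rect.height // 2)
    return None


# Made-up sample of what a word-position result might contain.
result = [(Rect(10, 20, 40, 12), "Invoice"), (Rect(60, 20, 30, 12), "Total")]
print(find_word_center(result, "total"))  # (75, 26)
```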
Key points:
- Multiple languages are supported by default.
- It is suitable for extracting text from large areas and works very well when the scale is increased.
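The sketch below shows what the Scale property does to the pixels. It enlarges a tiny grayscale image (a list of rows of values from 0 to 255) by an integer factor using nearest-neighbor sampling, so each pixel becomes a `factor` × `factor` block. This post does not cover which resampling method UiPath actually uses. Nearest-neighbor is simply the easiest technique for showing how scaling makes small characters cover more pixels for the engine to analyze.

```python
def upscale(image: list[list[int]], factor: int) -> list[list[int]]:
    """Enlarge a grayscale image by an integer factor with nearest-neighbor sampling.

    Every source pixel is repeated `factor` times horizontally, and every
    widened row is repeated `factor` times vertically.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out: list[list[int]] = []
    for row in image:
        wide_row = [px for px in row for _ in range(factor)]
        out.extend(list(wide_row) for _ in range(factor))
    return out


tiny = [[0, 255],
        [255, 0]]
print(upscale(tiny, 2))
# [[0, 0, 255, 255], [0, 0, 255, 255], [255, 255, 0, 0], [255, 255, 0, 0]]
```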
Google OCR:
Google’s OCR engine is called Tesseract. It has the same properties as Microsoft OCR, plus a few additional options:
Allowed Characters: The OCR engine recognizes only the characters specified here.
Denied Characters: The OCR engine ignores the characters specified here when extracting the string.
Invert: If this check box is selected, the colors of the UI element are inverted before scraping. This is useful when the background is darker than the text color.
Microsoft OCR does not offer these three options.
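The effect of Allowed Characters and Denied Characters can be approximated with a post-processing filter, sketched below in Python. There is one important difference. An engine such as Tesseract typically applies these lists during recognition, so it can choose the closest allowed character instead. This filter can only remove characters from text that has already been extracted. The function name and parameters are hypothetical.

```python
def filter_characters(text: str, allowed: str = "", denied: str = "") -> str:
    """Keep only characters in `allowed` (if it is non-empty), then drop any in `denied`."""
    allowed_set = set(allowed)
    denied_set = set(denied)
    return "".join(
        ch for ch in text
        if (not allowed_set or ch in allowed_set) and ch not in denied_set
    )


print(filter_characters("Total: 1,234.50 USD", allowed="0123456789.,"))  # 1,234.50
print(filter_characters("INV-00O42", denied="-"))  # INV00O42
```

The second call shows the limitation: the letter "O" among the digits survives because a post-filter cannot re-read it as a zero.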
Key points:
- Support for additional languages can be added to Google OCR.
- It is suitable for extracting text from small areas.
- It fully supports color inversion.
- It can restrict recognition to the allowed characters.
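Inversion itself is simple. In an 8-bit grayscale image, each pixel value `p` becomes `255 - p`, so light text on a dark background turns into dark text on a light background. The sketch below also includes a hypothetical `mostly_dark` heuristic (mean intensity below 128) to decide when to invert. In UiPath, you simply select the Invert check box.

```python
def invert(image: list[list[int]]) -> list[list[int]]:
    """Invert an 8-bit grayscale image: each pixel value p becomes 255 - p."""
    return [[255 - px for px in row] for row in image]


def mostly_dark(image: list[list[int]], threshold: int = 128) -> bool:
    """Hypothetical heuristic: the image counts as dark if its mean intensity is below threshold."""
    pixels = [px for row in image for px in row]
    if not pixels:
        return False
    return sum(pixels) / len(pixels) < threshold


# Light text (230) on a dark background (20).
screen = [[20, 20, 230],
          [20, 230, 20]]
if mostly_dark(screen):
    screen = invert(screen)
print(screen)  # [[235, 235, 25], [235, 25, 235]]
```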
Microsoft Azure Computer Vision OCR:
This activity uses the Microsoft Azure Computer Vision OCR engine to extract the specified string from the image.
This engine can extract text even when the image is not a clean document, for example when it contains handwritten text, graphs, or pictures.
API Key: The API key that gives you access to Microsoft Azure Computer Vision OCR. An Azure account is required to use the Computer Vision features.
End Point: The endpoint associated with your Microsoft Azure Computer Vision OCR API key. This field supports only strings and string variables.
Handwriting Recognition: A check box. If it is selected, the OCR engine extracts handwritten text from the image. If it is cleared, handwritten text is ignored.
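The API Key and End Point properties correspond to what a raw REST call to Azure needs. The Python sketch below only builds the request with the standard library and never sends it. It assumes the Computer Vision v3.2 paths: `/vision/v3.2/ocr` for the synchronous OCR operation, and `/vision/v3.2/read/analyze` for the asynchronous Read operation, which also handles handwriting. It also assumes the `Ocp-Apim-Subscription-Key` header. Mapping the Handwriting Recognition check box to the Read operation is our own assumption and not necessarily what the UiPath activity does internally.

```python
import urllib.request


def build_azure_ocr_request(endpoint: str, api_key: str, image_bytes: bytes,
                            handwriting: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request to Azure Computer Vision.

    Assumption: handwriting=True uses the asynchronous Read operation;
    otherwise the synchronous OCR operation is used.
    """
    path = "/vision/v3.2/read/analyze" if handwriting else "/vision/v3.2/ocr"
    return urllib.request.Request(
        endpoint.rstrip("/") + path,
        data=image_bytes,
        headers={
            "Ocp-Apim-Subscription-Key": api_key,        # the API Key property
            "Content-Type": "application/octet-stream",  # raw image bytes in the body
        },
        method="POST",
    )


# Placeholder endpoint and key: replace them with the values from your Azure resource.
req = build_azure_ocr_request("https://my-resource.cognitiveservices.azure.com/",
                              "YOUR-API-KEY", b"...image bytes...")
print(req.get_method(), req.full_url)
```

Sending the request with `urllib.request.urlopen(req)` requires a real Azure resource. The Read operation replies with an `Operation-Location` header that you then poll for the result.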
Key points:
- It works reliably on clean, well-structured images.
- It also performs decently on images that are not clean documents.
- I used it to extract scanned handwritten text, and the results were accurate.
- If you have an Azure account, the API key and endpoint are easy to obtain.
Microsoft Project Oxford Online OCR:
It extracts a string and its information from an indicated UI element or image using the Microsoft Cloud OCR engine. It can be used with other OCR activities, such as Click OCR Text, Hover OCR Text, and Get OCR Text.
API Key: The API key used to provide you access to the Microsoft Cloud OCR.
This engine connects to the Microsoft Cloud to perform the extraction, and it returns both the text and its position more precisely.
Google Cloud Vision OCR:
It extracts a string and its information from an indicated UI element or image using the Google Cloud OCR engine.
It runs in the cloud and gives faster, more precise results than the Tesseract OCR engine.
ResizeToMaxLimitIfNecessary: When selected, the engine attempts to downsize the target image so that it does not exceed the size limit of the Google Cloud Vision engine. By default, this check box is cleared.
It works the same way as the Microsoft Cloud OCR, performs better on smaller images, and is comparatively faster than Microsoft OCR.
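One plausible way to shrink an image to fit a size limit while keeping its aspect ratio is to scale both sides by the square root of (limit / current pixel count). The sketch below assumes a limit expressed as a maximum pixel count. The `max_pixels` value used here is hypothetical, so check Google Cloud Vision's documentation for the actual limits. The engine's own downsizing logic may also differ.

```python
import math


def fit_within_pixel_limit(width: int, height: int, max_pixels: int) -> tuple[int, int]:
    """Downscale (never upscale) width x height so that width * height <= max_pixels.

    The aspect ratio is preserved, and each side is kept at least 1 pixel.
    """
    if width * height <= max_pixels:
        return width, height
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, math.floor(width * scale)), max(1, math.floor(height * scale))


# Hypothetical limit for illustration only.
print(fit_within_pixel_limit(4000, 3000, max_pixels=3_000_000))  # (2000, 1500)
```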
ABBYY OCR:
ABBYY is a third-party OCR engine known for extracting text more accurately and faster than the other available OCR engines. It also offers many options for different kinds of documents.
Correct Orientation: If selected, the page orientation is detected by the engine, and if needed, is corrected automatically. By default, this check box is selected.
Correct Skew: Detects whether the page is skewed and, if so, corrects it automatically. The drop-down contains three options:
- Auto – deskews only images that are detected as being skewed.
- Yes – forces deskew on all pages.
- No – does not automatically deskew pages.
By default, this property is set to Auto.
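The three Correct Skew options come down to a simple decision for each page. In the sketch below, `skew_angle_deg` stands for the angle the engine detects. The `tolerance_deg` cutoff, which decides what counts as "skewed" in Auto mode, is a hypothetical value. ABBYY's own detection criteria are not described here.

```python
from enum import Enum


class CorrectSkew(Enum):
    AUTO = "Auto"
    YES = "Yes"
    NO = "No"


def should_deskew(mode: CorrectSkew, skew_angle_deg: float, tolerance_deg: float = 0.5) -> bool:
    """Decide whether one page should be deskewed.

    tolerance_deg is a hypothetical cutoff for "detected as skewed" in Auto mode.
    """
    if mode is CorrectSkew.YES:
        return True   # force deskew on every page
    if mode is CorrectSkew.NO:
        return False  # never deskew automatically
    return abs(skew_angle_deg) > tolerance_deg  # AUTO: only pages detected as skewed


for angle in (0.2, 3.0):
    print(angle, should_deskew(CorrectSkew.AUTO, angle))
# 0.2 False
# 3.0 True
```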
Custom Recognition Profile Path: The full path to a custom-built recognition profile. This field supports only strings and String variables.
FineReader Version: Specifies which version of the FineReader Engine to use. The options are FineReader Engine 11 and FineReader Engine 12. By default, this property is set to FineReader Engine 12.
Predefined Recognition Profile: Specifies the predefined recognition profile to use when processing an image. This field supports only strings and String variables. The predefined recognition profiles available in ABBYY are listed in the ABBYY FineReader Engine documentation.
Confidence: The resulting confidence score. This field supports only Int32 variables.
The remaining properties are similar to those of the other OCR engines available in UiPath.
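The Confidence output helps you decide which results can be trusted automatically. The sketch below assumes that a higher score means a more reliable result and uses a hypothetical threshold of 80. Results below the threshold are routed to manual review. The sample values are made up for illustration.

```python
def split_by_confidence(results: list[tuple[str, int]],
                        min_confidence: int = 80) -> tuple[list[str], list[str]]:
    """Split (text, confidence) pairs into accepted texts and texts needing manual review.

    Assumes a higher score means a more reliable result; min_confidence is hypothetical.
    """
    accepted: list[str] = []
    review: list[str] = []
    for text, confidence in results:
        (accepted if confidence >= min_confidence else review).append(text)
    return accepted, review


# Made-up sample values for illustration only.
ocr_results = [("INV-1042", 93), ("12/03/2021", 61), ("ACME Corp", 88)]
accepted, review = split_by_confidence(ocr_results)
print(accepted)  # ['INV-1042', 'ACME Corp']
print(review)    # ['12/03/2021']
```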
Key points:
- It gives accurate and fast results.
- It can convert TIFF and JPEG images into searchable PDF and PDF/A files, and it can extract data or text from photos and screenshots.
- It supports multiple languages accurately.
- The ABBYY FineReader Engine SDK is required.
- The engine works only with a license distributed by the UiPath sales department.
ABBYY Cloud OCR:
This engine is available only with an ABBYY Cloud subscription, which gives access to the features of the ABBYY Cloud platform.
ApplicationID – The application ID provided when subscribing to the ABBYY Cloud OCR service.
Password – The password provided when subscribing to the ABBYY Cloud OCR service.
ServerUrl – The Server URL provided when subscribing to the ABBYY Cloud OCR service.
This engine gives better results and offers many options for processing different types of documents.
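The ApplicationID and Password act as credentials for the ABBYY Cloud service. The Python sketch below assumes that they are sent with HTTP Basic authentication, with the ApplicationID as the user name. It also assumes that images are uploaded with a POST to a `processImage` path under the ServerUrl. The path and the placeholder URL are both assumptions, so check ABBYY's API reference before relying on them. The request is only built, never sent.

```python
import base64
import urllib.request


def build_abbyy_cloud_request(server_url: str, application_id: str, password: str,
                              image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) an image upload request that uses HTTP Basic auth.

    ApplicationID is used as the user name and Password as the password.
    The "processImage" path is an assumption.
    """
    credentials = f"{application_id}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        server_url.rstrip("/") + "/processImage",
        data=image_bytes,
        headers={"Authorization": f"Basic {token}"},
        method="POST",
    )


# Placeholder values: use the ones provided with your ABBYY Cloud subscription.
req = build_abbyy_cloud_request("https://cloud.example.com", "my-app-id", "my-password", b"...")
print(req.full_url)  # https://cloud.example.com/processImage
```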
Summary:
- Among all the OCR engines, the cloud OCR engines produce the most accurate results.
- These OCR engines can also be used with other OCR activities (Click OCR Text, Hover OCR Text, Double Click OCR Text, Get OCR Text, Find OCR Text Position).
- They are also used in recording wizards such as Screen Scraping and Citrix recording.
- Overall, the ABBYY OCR engine and the Microsoft Azure Computer Vision OCR engine offer the best combination of options, speed, and accuracy.