Algorithm for OCR Text Searching
Keywords:
Optical Character Recognition, Linear Algorithm, Mathematical Probability
Abstract
Optical character recognition (OCR) is a modern global trend in the digital world. However, because of errors made by the character recognition device, it can be almost impossible to obtain a user-friendly, readable version of the extracted data. For this reason, in this article we present an algorithm for effective text searching in extracted OCR output. The technique is based on a sliding-window algorithm with error correction, which can also be applied to dictionary data: a window is moved over the text to search for matches, and Tesseract OCR is used as the OCR engine. We also show that the algorithm is applicable only when the recognition errors are changed (substituted) characters. An overview of past work is given with respect to classical algorithms such as Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM). Finally, we show that our algorithm runs efficiently in linear time.
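The idea of sliding a window over OCR text and tolerating only changed characters can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not call Tesseract. It assumes a hypothetical function `find_with_substitutions` and a hypothetical per-window error budget `max_errors`. Every window of the pattern's length is compared position by position, and a window is reported when its number of mismatched characters does not exceed the budget. Insertions and deletions are deliberately not handled, which mirrors the substitution-only restriction stated above.

```python
from typing import List


def find_with_substitutions(text: str, pattern: str, max_errors: int) -> List[int]:
    """Return start indices where `pattern` occurs in `text` with at most
    `max_errors` substituted characters (hypothetical helper, for illustration)."""
    m = len(pattern)
    matches: List[int] = []
    if m == 0 or m > len(text):
        return matches
    # Slide a window of length m over the text, one position at a time.
    for start in range(len(text) - m + 1):
        mismatches = 0
        for i in range(m):
            if text[start + i] != pattern[i]:
                mismatches += 1
                if mismatches > max_errors:
                    break  # early exit: this window already exceeds the error budget
        if mismatches <= max_errors:
            matches.append(start)
    return matches


if __name__ == "__main__":
    # OCR-style misreads: "1" for "i" and "0" for "o".
    ocr_text = "The qu1ck brown f0x"
    print(find_with_substitutions(ocr_text, "quick", 1))  # [4]
    print(find_with_substitutions(ocr_text, "fox", 1))    # [16]
```

This naive sketch performs O(n·m) character comparisons for text length n and pattern length m. That is linear in the text length only when the pattern length is treated as fixed. The early exit helps in practice but does not change the worst-case bound.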
Copyright (c) 2021 ADVANCED TECHNOLOGIES AND COMPUTER SCIENCE
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.