Algorithm for OCR Text Searching

Authors

  • Mirzakhmet Syzdykov, al-Farabi Kazakh National University, Almaty, Kazakhstan, https://orcid.org/0000-0002-8086-775X
  • Santanu Kumar Patro, Berhampur University, Odisha, India

Keywords:

Optical Character Recognition, Linear Algorithm, Mathematical Probability

Abstract

Optical character recognition (OCR) is a modern global trend in the digital world. However, because of errors in the character recognition device, it can be almost impossible to obtain a user-friendly and readable version of the extracted data. For this reason, in this article we present an algorithm for effective text searching in extracted OCR output. The technique is based on a sliding-window algorithm with error correction, which can also be applied to dictionary data: a window is moved over the text to search for matches, and Tesseract OCR is used as the OCR engine. We also show that the algorithm is applicable only when the errors are character substitutions (one character changed into another). An overview of past work is given with respect to classical algorithms such as Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM). We also show that our algorithm runs efficiently in linear time.
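
The sliding-window idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name find_with_substitutions and the parameter max_errors are hypothetical, and the sketch assumes that only substitution errors occur, so each window has the same length as the pattern. For a fixed pattern length the running time grows linearly with the length of the text.

```python
def find_with_substitutions(text, pattern, max_errors=1):
    """Return start indices where `pattern` matches a window of `text`
    with at most `max_errors` substituted characters.

    Hypothetical helper for illustration only; insertions and
    deletions are deliberately not handled.
    """
    m = len(pattern)
    if m == 0 or m > len(text):
        return []

    hits = []
    # Slide a window of length m over the text, one position at a time.
    for start in range(len(text) - m + 1):
        errors = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                errors += 1
                if errors > max_errors:
                    break  # too many mismatches, abandon this window
        else:
            # The inner loop finished without exceeding the error budget.
            hits.append(start)
    return hits


# Example: OCR output in which "i" was read as "1" and "o" as "0".
ocr_text = "The qu1ck brown f0x"
quick_hits = find_with_substitutions(ocr_text, "quick", max_errors=1)
fox_hits = find_with_substitutions(ocr_text, "fox", max_errors=1)
exact_hits = find_with_substitutions(ocr_text, "quick", max_errors=0)
```

With an error budget of one substitution, the misrecognized words "qu1ck" and "f0x" are still found, while an exact search (max_errors=0) for "quick" finds nothing.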

Author Biography

Mirzakhmet Syzdykov, al-Farabi Kazakh National University, Almaty, Kazakhstan

Born 11/09/84. From 2006 to 2009, postgraduate student (aspirant) at the Institute of Problems in Informatics and Control.

Published

2021-12-20

How to Cite

Syzdykov, M., & Kumar Patro, S. (2021). Algorithm for OCR Text Searching. ADVANCED TECHNOLOGIES AND COMPUTER SCIENCE, 1(4), 4–13. Retrieved from https://atcs.iict.kz/index.php/atcs/article/view/73

Section

Applied mathematics, computer science and control theory