Algorithm for OCR Text Searching

Mirzakhmet Syzdykov; Santanu Kumar Patro

Authors

Mirzakhmet Syzdykov al-Farabi Kazakh National University, Almaty, Kazakhstan https://orcid.org/0000-0002-8086-775X
Santanu Kumar Patro Berhampur University, Odisha, India

Keywords:

Optical Character Recognition, Linear Algorithm, Mathematical Probability

Abstract

Optical character recognition (OCR) is a major trend in the digital world. However, because of errors made by the recognition engine, it can be almost impossible to obtain a user-friendly, readable version of the extracted data. For this reason, this article presents an algorithm for effective text searching in extracted OCR text. The technique is based on a sliding algorithm with error correction: a window is moved over the text to search for matches, and Tesseract OCR is used as the OCR engine. The same technique can also be applied to dictionary data. We show that the algorithm is applicable only when the recognition errors are character changes, that is, one character replaced by another. An overview of past work is given with respect to classical algorithms such as Knuth-Morris-Pratt (KMP) and Boyer-Moore (BM). We also show that the algorithm runs efficiently, in linear time.
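
The windowed search described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it is a naive sliding-window comparison that tolerates a bounded number of changed characters, and its running time grows with text length times pattern length rather than linearly. The function name `find_with_substitutions` and the parameter `max_errors` are hypothetical names introduced only for this example.

```python
def find_with_substitutions(text, pattern, max_errors):
    """Return the start indices of windows in `text` that differ from
    `pattern` in at most `max_errors` character positions.

    Only character changes (substitutions) are tolerated; inserted or
    deleted characters shift the window and are not handled.
    """
    m = len(pattern)
    hits = []
    if m == 0 or m > len(text):
        return hits
    # Slide a window of the pattern's length over the text.
    for start in range(len(text) - m + 1):
        mismatches = 0
        for offset in range(m):
            if text[start + offset] != pattern[offset]:
                mismatches += 1
                if mismatches > max_errors:
                    break  # too many changed characters in this window
        if mismatches <= max_errors:
            hits.append(start)
    return hits


# Hypothetical OCR output in which "i" was misread as "1".
ocr_text = "the qu1ck brown fox"
print(find_with_substitutions(ocr_text, "quick", 1))
```

Because the window has a fixed length, a missing or extra character in the OCR output misaligns every later position, which is consistent with the abstract's restriction to changed characters.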

Author Biography

Mirzakhmet Syzdykov, al-Farabi Kazakh National University, Almaty, Kazakhstan

Born 11.09.84. 2006-2009, postgraduate student at the Institute of Problems of Informatics and Control

Published

2021-12-20

How to Cite

Syzdykov, M., & Kumar Patro, S. (2021). Algorithm for OCR Text Searching. ADVANCED TECHNOLOGIES AND COMPUTER SCIENCE, 1(4), 4–13. Retrieved from https://atcs.iict.kz/index.php/atcs/article/view/73