A survey of OCR evaluation tools and metrics

HIP@ICDAR, 2021
The millions of pages of historical documents that are digitized in libraries are increasingly used in contexts that have more specific requirements for OCR quality than keyword search.
Clemens Neudecker   +5 more

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding

Conference on Empirical Methods in Natural Language Processing
Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for Visual Document Understanding are equipped with text recognition ability ...
Anwen Hu   +10 more

DeepSeek-OCR: Contexts Optical Compression

arXiv.org
We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder.
Haoran Wei, Yaofeng Sun, Yukun Li

Ocean-OCR: Towards General OCR Application via a Vision-Language Model

arXiv.org
Multimodal large language models (MLLMs) have shown impressive capabilities across various domains, excelling in processing and understanding information from multiple modalities.
Song Chen   +12 more

TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document

arXiv.org
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks. Our approach introduces enhancement across several dimensions: By adopting Shifted Window Attention with zero-initialization, we achieve cross-window connectivity at ...
Yuliang Liu   +6 more

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

arXiv.org
Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters.
Haoran Wei   +11 more

Digitization of Data from Invoice using OCR

International Conference Computing Methodologies and Communication, 2022
Optical Character Recognition (OCR) is a predominant aspect to transmute scanned images and other visuals into text. Computer vision technology is extrapolated onto the system to enhance the text inside the digitized image.
Venkata Naga Sai Rakesh Kamisetty   +5 more

Improving OCR-based Image Captioning by Incorporating Geometrical Relationship

Computer Vision and Pattern Recognition, 2021
OCR-based image captioning aims to automatically describe images based on all the visual entities (both visual objects and scene text) in images. Compared with conventional image captioning, the reasoning of scene text is required for OCR-based image ...
Jing Wang   +4 more

Beyond OCR + VQA: Involving OCR into the Flow for Robust and Accurate TextVQA

ACM Multimedia, 2021
Text-based visual question answering (TextVQA) requires analyzing both the visual contents and texts in an image to answer a question, which is more practical than general visual question answering (VQA). Existing efforts tend to regard optical character ...
Gangyan Zeng   +3 more

Leveraging LLMs for Post-OCR Correction of Historical Newspapers

LT4HALA
Poor OCR quality continues to be a major obstacle for humanities scholars seeking to make use of digitised primary sources such as historical newspapers.
Alan Thomas, R. Gaizauskas, Haiping Lu