Some of the following articles may not be open access.
A survey of OCR evaluation tools and metrics
HIP@ICDAR, 2021
The millions of pages of historical documents that are digitized in libraries are increasingly used in contexts that have more specific requirements for OCR quality than keyword search.
Clemens Neudecker +5 more
semanticscholar +1 more source
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding
Conference on Empirical Methods in Natural Language Processing
Structure information is critical for understanding the semantics of text-rich images, such as documents, tables, and charts. Existing Multimodal Large Language Models (MLLMs) for Visual Document Understanding are equipped with text recognition ability ...
Anwen Hu +10 more
semanticscholar +1 more source
DeepSeek-OCR: Contexts Optical Compression
arXiv.org
We present DeepSeek-OCR as an initial investigation into the feasibility of compressing long contexts via optical 2D mapping. DeepSeek-OCR consists of two components: DeepEncoder and DeepSeek3B-MoE-A570M as the decoder.
Haoran Wei, Yaofeng Sun, Yukun Li
semanticscholar +1 more source
Ocean-OCR: Towards General OCR Application via a Vision-Language Model
arXiv.org
Multimodal large language models (MLLMs) have shown impressive capabilities across various domains, excelling in processing and understanding information from multiple modalities.
Song Chen +12 more
semanticscholar +1 more source
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
arXiv.org
We present TextMonkey, a large multimodal model (LMM) tailored for text-centric tasks. Our approach introduces enhancement across several dimensions: By adopting Shifted Window Attention with zero-initialization, we achieve cross-window connectivity at ...
Yuliang Liu +6 more
semanticscholar +1 more source
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
arXiv.org
Traditional OCR systems (OCR-1.0) are increasingly unable to meet people's usage due to the growing demand for intelligent processing of man-made optical characters.
Haoran Wei +11 more
semanticscholar +1 more source
Digitization of Data from Invoice using OCR
International Conference Computing Methodologies and Communication, 2022
Optical Character Recognition (OCR) is a predominant aspect to transmute scanned images and other visuals into text. Computer vision technology is extrapolated onto the system to enhance the text inside the digitized image.
Venkata Naga Sai Rakesh Kamisetty +5 more
semanticscholar +1 more source
Improving OCR-based Image Captioning by Incorporating Geometrical Relationship
Computer Vision and Pattern Recognition, 2021
OCR-based image captioning aims to automatically describe images based on all the visual entities (both visual objects and scene text) in images. Compared with conventional image captioning, the reasoning of scene text is required for OCR-based image ...
Jing Wang +4 more
semanticscholar +1 more source
Beyond OCR + VQA: Involving OCR into the Flow for Robust and Accurate TextVQA
ACM Multimedia, 2021
Text-based visual question answering (TextVQA) requires analyzing both the visual contents and texts in an image to answer a question, which is more practical than general visual question answering (VQA). Existing efforts tend to regard optical character ...
Gangyan Zeng +3 more
semanticscholar +1 more source
Leveraging LLMs for Post-OCR Correction of Historical Newspapers
LT4HALA
Poor OCR quality continues to be a major obstacle for humanities scholars seeking to make use of digitised primary sources such as historical newspapers.
Alan Thomas, R. Gaizauskas, Haiping Lu
semanticscholar +1 more source

