Some of the following articles may not be open access.
OCRBench: on the hidden mystery of OCR in large multimodal models
Science China Information Sciences, 2023. Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual tasks remains relatively unexplored. In this paper, we conducted a comprehensive
Yuliang Liu +9 more
semanticscholar +1 more source
OCR-Free Document Understanding Transformer
European Conference on Computer Vision, 2021. Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of
Geewook Kim +9 more
semanticscholar +1 more source
Exploring OCR Capabilities of GPT-4V(ision): A Quantitative and In-depth Evaluation
arXiv.org, 2023. This paper presents a comprehensive evaluation of the Optical Character Recognition (OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model (LMM).
Yongxin Shi +7 more
semanticscholar +1 more source
OCR performance prediction using cross-OCR alignment
2015 13th International Conference on Document Analysis and Recognition (ICDAR), 2015. Since 2006, the National Library of France (BnF) has carried out many mass-digitization projects on its collections. Digital documents on Gallica (the BnF's digital library) are indexed through their textual content, which is obtained from service providers that use Optical Character Recognition (OCR) software. The modern technologies of OCR
Ahmed Ben Salah +3 more
openaire +2 more sources
OCR-VQA: Visual Question Answering by Reading Text in Images
IEEE International Conference on Document Analysis and Recognition, 2019. The problem of answering questions about an image is popularly known as visual question answering (VQA). It is a well-established problem in computer vision.
Anand Mishra +3 more
semanticscholar +1 more source

