Workshop on Document Intelligence Understanding [PDF]
Document understanding and information extraction include different tasks to understand a document and extract valuable information automatically. Recently, there has been a rising demand for developing document understanding among different domains, including business, law, and medicine, to boost the efficiency of work that is associated with a large ...
arxiv
Term-Specific Eigenvector-Centrality in Multi-Relation Networks [PDF]
Fuzzy matching and ranking are two information retrieval techniques widely used in web search. Their application to structured data, however, remains an open problem.
Bry, François +3 more
core +2 more sources
PDF-VQA: A New Dataset for Real-World VQA on PDF Documents [PDF]
Document-based Visual Question Answering examines the document understanding of document images in conditions of natural language questions. We proposed a new document-based VQA dataset, PDF-VQA, to comprehensively examine the document understanding from various aspects, including document element recognition, document layout structural understanding ...
arxiv
Documenting software systems using types
We show how hypertext-based program understanding tools can achieve new levels of abstraction by using inferred type information for cases where the subject software system is written in a weakly typed language. We propose TypeExplorer, a tool for browsing Cobol legacy systems based on these types. The paper addresses (1) how types, an invented
Arie van Deursen, Leon Moonen
openaire +1 more source
An overview of the question-response system in American English conversation [PDF]
This article, part of a 10 language comparative project on question–response sequences, discusses these sequences in American English conversation.
Stivers, T.
core +2 more sources
Fourier Document Restoration for Robust Document Dewarping and Recognition [PDF]
State-of-the-art document dewarping techniques learn to predict 3-dimensional information of documents which are prone to errors while dealing with documents with irregular distortions or large variations in depth. This paper presents FDRNet, a Fourier Document Restoration Network that can restore documents with different distortions and improve ...
arxiv
Automata-based Static Analysis of XML Document Adaptation [PDF]
The structure of an XML document can be optionally specified by means of XML Schema, thus enabling the exploitation of structural information for efficient document handling. Upon schema evolution, or when exchanging documents among different collections exploiting related but not identical schemas, the need may arise of adapting a document, known to ...
arxiv +1 more source
Analysis of the scientific output of the Universidad de Salamanca indexed in SCOPUS (2010-2015)
The aim of this article is to analyze the scientific output of the teaching and research staff of the Universidad de Salamanca during the period 2010-2015.
Alejandro Medina-González +3 more
doaj +1 more source
Towards Just-Enough Documentation for Agile Effort Estimation: What Information Should Be Documented? [PDF]
Effort estimation is an integral part of activities planning in Agile iterative development. An Agile team estimates the effort of a task based on the available information which is usually conveyed through documentation. However, as documentation has a lower priority in Agile, little is known about how documentation effort can be optimized while ...
arxiv
OCR with Tesseract, Amazon Textract, and Google Document AI: a benchmarking experiment
Optical Character Recognition (OCR) can open up understudied historical documents to computational analysis, but the accuracy of OCR software varies. This article reports a benchmarking experiment comparing the performance of Tesseract, Amazon Textract ...
Thomas Hegghammer
semanticscholar +1 more source