Assessing the alignment between the information needs of developers and the documentation of programming languages: A case study on Rust
Programming language documentation refers to the set of technical documents that provide application developers with a description of the high-level concepts of a language. Such documentation is essential to support application developers in the effective use of a programming language.
arxiv
Document-level Neural Machine Translation with Document Embeddings
Standard neural machine translation (NMT) rests on the assumption that sentences can be translated independently of document-level context. Most existing document-level NMT methods settle for a shallow use of brief document-level information, while this work focuses on exploiting detailed document-level context in the form of multiple kinds of document embeddings, which is ...
arxiv
Self-Supervised Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference
We present a novel model for the problem of ranking a collection of documents according to their semantic similarity to a source (query) document. While the problem of document-to-document similarity ranking has been studied, most modern methods are limited to relatively short documents or rely on the existence of "ground-truth" similarity labels. Yet, ...
arxiv
Improving Document-Level Sentiment Classification Using Importance of Sentences
Previous researchers have treated sentiment analysis as a document classification task, in which input documents are classified into predefined sentiment classes. Although some sentences in a document provide important evidence for sentiment analysis and others do not, these researchers have treated the document as a bag of sentences. In other ...
arxiv
Cross-Document Pattern Matching
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a ...
A. Andersson+14 more
core
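The cross-document string matching problem can be made concrete with a small sketch. This is not the data structure from the paper; it assumes a deliberately naive approach: build one suffix array over all documents joined by a separator, record which document each position belongs to, and answer a query (pattern, selected document) by binary-searching the suffix-array range for the pattern and keeping only hits in the selected document. The class name `CrossDocumentIndex` and its methods are hypothetical. Efficient solutions aim for query time that does not depend on occurrences in other documents; this sketch does not achieve that.

```python
class CrossDocumentIndex:
    """Naive cross-document pattern matching index (illustrative sketch only).

    Hypothetical design: a single suffix array over all documents joined by
    a separator, plus a map from text position to document id.
    """

    SEP = "\x00"  # assumed not to occur in any document or pattern

    def __init__(self, docs):
        self.text = ""
        self.doc_of = []   # doc_of[i] = id of the document containing text[i]
        self.starts = []   # offset of each document inside self.text
        for d, doc in enumerate(docs):
            self.starts.append(len(self.text))
            self.text += doc + self.SEP
            self.doc_of.extend([d] * (len(doc) + 1))
        # Quadratic-time construction by sorting suffixes; fine for a sketch.
        self.sa = sorted(range(len(self.text)), key=lambda i: self.text[i:])

    def _range(self, pattern):
        """Return [lo, hi) bounds of suffixes that start with pattern."""
        m = len(pattern)
        lo, hi = 0, len(self.sa)
        while lo < hi:  # first suffix whose m-prefix is >= pattern
            mid = (lo + hi) // 2
            if self.text[self.sa[mid]:self.sa[mid] + m] < pattern:
                lo = mid + 1
            else:
                hi = mid
        first, hi = lo, len(self.sa)
        while lo < hi:  # first suffix whose m-prefix is > pattern
            mid = (lo + hi) // 2
            if self.text[self.sa[mid]:self.sa[mid] + m] <= pattern:
                lo = mid + 1
            else:
                hi = mid
        return first, lo

    def find(self, pattern, doc_id):
        """Sorted offsets (relative to document doc_id) where pattern occurs."""
        if not pattern:
            return []
        lo, hi = self._range(pattern)
        base = self.starts[doc_id]
        return sorted(
            self.sa[k] - base
            for k in range(lo, hi)
            if self.doc_of[self.sa[k]] == doc_id  # keep only the selected document
        )


idx = CrossDocumentIndex(["abracadabra", "cadabra", "banana"])
print(idx.find("abra", 0))  # occurrences of "abra" in document 0 only
```

Because the pattern is assumed not to contain the separator, every match starts and ends inside a single document, so filtering by the start position's document id is sufficient.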
Document AI: Benchmarks, Models and Applications
Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents. It is an important research direction for natural language processing and computer vision.
arxiv
Human assessments of document similarity
Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA).
Belkin+28 more
core
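The snippet does not say which n-gram measure the automatic text analysis (ATA) used, so the sketch below assumes one common choice: lower-cased character trigram frequency profiles compared by cosine similarity. The function names are hypothetical; the point is only to show how an n-gram score between two documents can be computed and then compared against human ratings.

```python
import math
from collections import Counter


def ngram_profile(text, n=3):
    """Counts of overlapping character n-grams in the lower-cased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors (Counters)."""
    dot = sum(count * b[gram] for gram, count in a.items())  # missing grams count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a text shorter than n has no n-grams
    return dot / (norm_a * norm_b)


def ngram_similarity(doc1, doc2, n=3):
    """Hypothetical ATA-style score in [0, 1] between two documents."""
    return cosine_similarity(ngram_profile(doc1, n), ngram_profile(doc2, n))


print(ngram_similarity("the cat sat on the mat", "the cat sat on the hat"))
```

Identical texts score 1.0 (up to floating-point rounding) and texts sharing no trigrams score 0.0; a reliability study would then correlate such scores with the human ratings.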
Cross-Domain Document Layout Analysis Using Document Style Guide
Document layout analysis (DLA) aims to decompose document images into high-level semantic areas (i.e., figures, tables, texts, and background). Creating a DLA framework with strong generalization capabilities is challenging because document objects vary widely in layout, size, aspect ratio, texture, etc.
arxiv
Coherence-Based Distributed Document Representation Learning for Scientific Documents
Distributed document representation is one of the basic problems in natural language processing. Current distributed document representation methods mainly consider the context information of words or sentences. They do not take into account the coherence of the document as a whole, e.g., the relation between a paper's title and abstract ...
arxiv
A semi-automatic method for document classification in the shipping industry
In the shipping industry, document classification plays a crucial role in ensuring that the necessary documents are properly identified and processed for customs clearance. OCR technology is being used to automate the process of document classification, which involves identifying important documents such as Commercial Invoices, Packing Lists, Export ...
arxiv