
Assessing the alignment between the information needs of developers and the documentation of programming languages: A case study on Rust [PDF]

open access: yes, ACM Transactions on Software Engineering and Methodology, 2022
Programming language documentation refers to the set of technical documents that provide application developers with a description of the high-level concepts of a language. Such documentation is essential to support application developers in the effective use of a programming language.
arxiv   +1 more source

Document-level Neural Machine Translation with Document Embeddings [PDF]

open access: yes, 2020
Standard neural machine translation (NMT) assumes independence from document-level context. Most existing document-level NMT methods settle for a cursory sense of brief document-level information, while this work focuses on exploiting detailed document-level context in terms of multiple forms of document embeddings, which is ...
arxiv   +1 more source

Self-Supervised Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference [PDF]

open access: yes, 2021
We present a novel model for the problem of ranking a collection of documents according to their semantic similarity to a source (query) document. While the problem of document-to-document similarity ranking has been studied, most modern methods are limited to relatively short documents or rely on the existence of "ground-truth" similarity labels. Yet,
arxiv   +1 more source

Improving Document-Level Sentiment Classification Using Importance of Sentences [PDF]

open access: yes, Entropy, Vol. 22(12), pp. 1-11, 2020.11, 2021
Previous researchers have treated sentiment analysis as a document classification task, in which input documents are classified into predefined sentiment classes. Although some sentences in a document provide important evidence for sentiment analysis and others do not, they have treated the document as a bag of sentences. In other
arxiv   +1 more source

Cross-Document Pattern Matching [PDF]

open access: yes, 2012
We study a new variant of the string matching problem called cross-document string matching, which is the problem of indexing a collection of documents to support an efficient search for a pattern in a selected document, where the pattern itself is a ...
A. Andersson   +14 more
core   +6 more sources

Document AI: Benchmarks, Models and Applications [PDF]

open access: yes, arXiv, 2021
Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents. It is an important research direction for natural language processing and computer vision.
arxiv  

Human assessments of document similarity [PDF]

open access: yes, 2010
Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA).
Belkin   +28 more
core   +1 more source

Cross-Domain Document Layout Analysis Using Document Style Guide [PDF]

open access: yes, arXiv, 2022
Document layout analysis (DLA) aims to decompose document images into high-level semantic areas (i.e., figures, tables, texts, and background). Creating a DLA framework with strong generalization capabilities is a challenge because document objects are diverse in layout, size, aspect ratio, texture, etc.
arxiv  

Coherence-Based Distributed Document Representation Learning for Scientific Documents [PDF]

open access: yes, arXiv, 2022
Distributed document representation is one of the basic problems in natural language processing. Current distributed document representation methods mainly consider the context information of words or sentences. These methods do not take into account the coherence of the document as a whole, e.g., a relation between the paper title and abstract ...
arxiv  

A semi-automatic method for document classification in the shipping industry [PDF]

open access: yes, Proceedings of Neptune's conference, Samudramanthan 2023 IIT Kharagpur, 2023
In the shipping industry, document classification plays a crucial role in ensuring that the necessary documents are properly identified and processed for customs clearance. OCR technology is being used to automate the process of document classification, which involves identifying important documents such as Commercial Invoices, Packing Lists, Export ...
arxiv  
