LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking [PDF]
Self-supervised pre-training techniques have achieved remarkable progress in Document AI. Most multimodal pre-trained models use a masked language modeling objective to learn bidirectional representations on the text modality, but they differ in pre ...
Yupan Huang+4 more
semanticscholar +1 more source
LinkBERT: Pretraining Language Models with Document Links [PDF]
Language model (LM) pretraining captures various knowledge from text corpora, helping downstream tasks. However, existing methods such as BERT model a single document, and do not capture dependencies or knowledge that span across documents. In this work,
Michihiro Yasunaga+2 more
semanticscholar +1 more source
Document-Level Event Argument Extraction by Conditional Generation [PDF]
Event extraction has long been treated as a sentence-level task in the IE community. We argue that this setting does not match human information seeking behavior and leads to incomplete and uninformative extraction results.
Sha Li, Heng Ji, Jiawei Han
semanticscholar +1 more source
The semantic geometry of defamation of God and the role of ontology in its drawing [PDF]
Slandering God means attributing something to God Almighty unjustly or without knowledge, or ascribing to Him attributes that He does not have. The purpose of this research is to discover the semantic system of defamation of God in the Holy Quran with the ...
Hossein Hassanzadeh
doaj +1 more source
Efficient Attentions for Long Document Summarization [PDF]
The quadratic computational and memory complexities of large Transformers have limited their scalability for long document summarization. In this paper, we propose Hepos, a novel efficient encoder-decoder attention with head-wise positional strides to ...
L. Huang+4 more
semanticscholar +1 more source
A Neural Corpus Indexer for Document Retrieval [PDF]
Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to optimize directly for the final retrieval target. In this paper, we aim to show that an end-to-end deep neural network unifying
Yujing Wang+15 more
semanticscholar +1 more source
Research on Crack Resistance of Semi-flexible Pavement Materials [PDF]
Semi-flexible pavement has been widely used in China's road construction due to its excellent rutting resistance. Due to the large difference in volume stability between the matrix asphalt mixture and the cement mortar, the internal stress of the semi ...
Jiahong Wu
doaj +1 more source
DocFormer: End-to-End Transformer for Document Understanding [PDF]
We present DocFormer, a multi-modal transformer-based architecture for the task of Visual Document Understanding (VDU). VDU is a challenging problem which aims to understand documents in their varied formats (forms, receipts, etc.) and layouts.
Srikar Appalaraju+4 more
semanticscholar +1 more source
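Kommunikativ-pragmatische Determiniertheit des internationalen Dokuments „Charta“ (am Beispiel der Europäischen Charta der Regional- oder Minderheitensprachen) (English: Communicative-pragmatic determinacy of the international document "Charter", using the example of the European Charter for Regional or Minority Languages) [PDF]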
The traditional interpretation of the term “non-fiction/official” includes a number of features that are often mandatory for such texts. In modern linguistics the non-fiction/official prose is interpreted differently, which can be traced back to the ...
Svitlana Ivanenko
doaj +1 more source
Error correction of semantic mathematical expressions based on Bayesian algorithm
The semantic information of mathematical expressions plays an important role in information retrieval and similarity calculation. However, a large number of presentational expressions in the presentation MathML format contained in electronic scientific ...
Xue Wang+3 more
doaj +1 more source