Lawformer: A pre-trained language model for Chinese legal long documents [PDF]
Legal artificial intelligence (LegalAI) aims to benefit legal systems with the technology of artificial intelligence, especially natural language processing (NLP).
Chaojun Xiao +4 more
exaly +2 more sources
OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents [PDF]
Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks.
Hugo Laurençon +11 more
semanticscholar +1 more source
Named Entity Recognition and Classification in Historical Documents: A Survey [PDF]
After decades of massive digitisation, an unprecedented number of historical documents are available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it ...
Maud Ehrmann +4 more
semanticscholar +1 more source
A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents [PDF]
Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a ...
Arman Cohan +6 more
semanticscholar +1 more source
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents [PDF]
We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms. The dataset comprises 199 real, fully annotated, scanned forms.
Guillaume Jaume, H. K. Ekenel, J. Thiran
semanticscholar +1 more source
SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents [PDF]
We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art.
Ramesh Nallapati +2 more
semanticscholar +1 more source
TLDR: Extreme Summarization of Scientific Documents [PDF]
We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study ...
Isabel Cachola +3 more
semanticscholar +1 more source
Key-Value Memory Networks for Directly Reading Documents [PDF]
Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective ...
Alexander H. Miller +5 more
semanticscholar +1 more source
Objectives. To analyse the existence and content of collection policy documents and selection criteria for Spanish digital heritage collections.
Estivill Rius, Assumpció +2 more
doaj +1 more source
Constructing Datasets for Multi-hop Reading Comprehension Across Documents [PDF]
Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods ...
Johannes Welbl +2 more
semanticscholar +1 more source

