Lawformer: A pre-trained language model for Chinese legal long documents

open access: yes · AI Open, 2021
Legal artificial intelligence (LegalAI) aims to benefit legal systems with the technology of artificial intelligence, especially natural language processing (NLP).
Chaojun Xiao   +4 more

OBELISC: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents

open access: yes · Neural Information Processing Systems, 2023
Large multimodal models trained on natural documents, which interleave images and text, outperform models trained on image-text pairs on various multimodal benchmarks.
Hugo Laurençon   +11 more

Named Entity Recognition and Classification in Historical Documents: A Survey

open access: yes · ACM Computing Surveys, 2021
After decades of massive digitisation, an unprecedented number of historical documents are available in digital format, along with their machine-readable texts. While this represents a major step forward with respect to preservation and accessibility, it ...
Maud Ehrmann   +4 more

A Discourse-Aware Attention Model for Abstractive Summarization of Long Documents

open access: yes · North American Chapter of the Association for Computational Linguistics, 2018
Neural abstractive summarization models have led to promising results in summarizing relatively short documents. We propose the first model for abstractive summarization of single, longer-form documents (e.g., research papers). Our approach consists of a ...
Arman Cohan   +6 more

FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents

open access: yes · 2019 International Conference on Document Analysis and Recognition Workshops (ICDARW), 2019
We present a new dataset for form understanding in noisy scanned documents (FUNSD) that aims at extracting and structuring the textual content of forms. The dataset comprises 199 real, fully annotated, scanned forms.
Guillaume Jaume, H. K. Ekenel, J. Thiran

SummaRuNNer: A Recurrent Neural Network Based Sequence Model for Extractive Summarization of Documents

open access: yes · AAAI Conference on Artificial Intelligence, 2016
We present SummaRuNNer, a Recurrent Neural Network (RNN) based sequence model for extractive summarization of documents and show that it achieves performance better than or comparable to state-of-the-art.
Ramesh Nallapati   +2 more

TLDR: Extreme Summarization of Scientific Documents

open access: yes · Findings of EMNLP, 2020
We introduce TLDR generation, a new form of extreme summarization, for scientific papers. TLDR generation involves high source compression and requires expert background knowledge and understanding of complex domain-specific language. To facilitate study ...
Isabel Cachola   +3 more

Key-Value Memory Networks for Directly Reading Documents

open access: yes · Conference on Empirical Methods in Natural Language Processing, 2016
Directly reading documents and being able to answer questions from them is an unsolved challenge. To avoid its inherent difficulty, question answering (QA) has been directed towards using Knowledge Bases (KBs) instead, which has proven effective ...
Alexander H. Miller   +5 more

Les col·leccions digitals patrimonials espanyoles : polítiques de col·lecció i presentació de la col·lecció [Spanish digital heritage collections: collection policies and presentation of the collection]

open access: yes · BiD: Textos Universitaris de Biblioteconomia i Documentació, 2010
Objectives. To analyze the existence and content of collection policy documents and the selection criteria of Spanish digital heritage collections.
Estivill Rius, Assumpció   +2 more

Constructing Datasets for Multi-hop Reading Comprehension Across Documents

open access: yes · Transactions of the Association for Computational Linguistics, 2017
Most Reading Comprehension methods limit themselves to queries which can be answered using a single sentence, paragraph, or document. Enabling models to combine disjoint pieces of textual evidence would extend the scope of machine comprehension methods ...
Johannes Welbl   +2 more
