Challenges for the Multilingual Web of Data
The Web has witnessed an enormous growth in the amount of semantic information published in recent years. This growth has been stimulated to a large extent by the emergence of Linked Data. Although this brings us a big step closer to the vision of a Semantic Web, it also raises new issues such as the need for dealing with information expressed in ...
Gracia del Río, Jorge +5 more
openaire +5 more sources
Bounding the Number of Minimal Transversals in Tripartite 3-Uniform Hypergraphs
We focus on the maximum number of minimal transversals in 3-partite 3-uniform hypergraphs on n vertices. Those hypergraphs (and their minimal transversals) are commonly found in database applications.
Alexandre Bazin +3 more
doaj +1 more source
Opportunities and risks of disaster data from social media: a systematic review of incident information
Compiling and disseminating information about incidents and disasters are key to disaster management and relief. But due to inherent limitations of the acquisition process, the required information is often incomplete or missing altogether. To fill these
M. Wiegmann +6 more
doaj +1 more source
Visual Summary Identification From Scientific Publications via Self-Supervised Learning
The exponential growth of scientific literature creates the need to help users analyze and understand the body of research work both effectively and efficiently.
Shintaro Yamamoto +4 more
doaj +1 more source
Using logical constraints to validate statistical information about disease outbreaks in collaborative knowledge graphs: the case of COVID-19 epidemiology in Wikidata
Urgent global research demands real-time dissemination of precise data. Wikidata, a collaborative and openly licensed knowledge graph available in RDF format, provides an ideal forum for exchanging structured data that can be verified and consolidated ...
Houcemeddine Turki +10 more
doaj +2 more sources
The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
Large language models are commonly trained on a mixture of filtered web data and curated high-quality corpora, such as social media conversations, books, or technical papers.
Guilherme Penedo +8 more
semanticscholar +1 more source
Predicting Master’s students’ academic performance: an empirical study in Germany
The tremendous growth in electronic educational data creates the need to have meaningful information extracted from it. Educational Data Mining (EDM) is an exciting research area that can reveal valuable knowledge from educational databases.
Sarah Alturki +2 more
doaj +1 more source
A Pretrainer’s Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Pretraining data design is critically under-documented and often guided by empirically unsupported intuitions. We pretrain models on data curated (1) at different collection times, (2) with varying toxicity and quality filters, and (3) with different ...
S. Longpre +10 more
semanticscholar +1 more source
Collecting a Large Scale Dataset for Classifying Fake News Tweets Using Weak Supervision
The problem of automatic detection of fake news in social media, e.g., on Twitter, has recently drawn some attention. Although, from a technical perspective, it can be regarded as a straightforward binary classification problem, the major challenge is ...
Stefan Helmstetter, Heiko Paulheim
doaj +1 more source
Learning Human Activity From Visual Data Using Deep Learning
Advances in wearable technologies have the ability to revolutionize and improve people’s lives. The gains go beyond the personal sphere, encompassing business and, by extension, the global economy.
Taha Alhersh +3 more
doaj +1 more source

