
Quantifying the visual concreteness of words and topics in multimodal datasets [PDF]

open access: green, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018
NAACL HLT 2018, 14 pages, 6 figures, data available at http://www.cs.cornell.edu/~jhessel/concreteness/concreteness ...
Jack Hessel, David Mimno, Lillian Lee
openalex   +5 more sources

MMT: A Multilingual and Multi-Topic Indian Social Media Dataset [PDF]

open access: green, Proceedings of the First Workshop on Cross-Cultural Considerations in NLP (C3NLP), 2023
Social media plays a significant role in cross-cultural communication. A vast amount of this occurs in code-mixed and multilingual form, posing a significant challenge to Natural Language Processing (NLP) tools for processing such information, like language identification, topic modeling, and named-entity recognition.
Dwip Dalal   +2 more
  +6 more sources

TACO: Topics in Algorithmic COde generation dataset

open access: green, 2023
We introduce TACO, an open-source, large-scale code generation dataset, with a focus on the topics of algorithms, designed to provide a more challenging training dataset and evaluation benchmark in the field of code generation models. TACO includes competition-level programming questions that are more challenging, to enhance or evaluate problem ...
Rongao Li   +8 more
openalex   +4 more sources

Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining Datasets

open access: green, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2022
The task of Argument Mining, that is extracting and classifying argument components for a specific topic from large document sources, is an inherently difficult task for machine learning models and humans alike, as large Argument Mining datasets are rare and recognition of argument components requires expert knowledge.
Benjamin Schiller   +2 more
openalex   +4 more sources

Multilingual Topic Classification in X: Dataset and Analysis [PDF]

open access: green, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
In the dynamic realm of social media, diverse topics are discussed daily, transcending linguistic boundaries. However, the complexities of understanding and categorising this content across various languages remain an important challenge, with traditional techniques like topic modelling often struggling to accommodate this multilingual diversity.
Dimosthenis Antypas   +3 more
openalex   +3 more sources

Topic modeling for cluster analysis of large biological and medical datasets [PDF]

open access: gold, BMC Bioinformatics, 2014
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors.
Weizhong Zhao, Wen Zou, James J. Chen
openalex   +5 more sources

A social and news media benchmark dataset for topic modeling

open access: yes, Data in Brief, 2022
Topic modeling is an active research area with several unanswered questions. The focus of recent research in this area is on the use of a vector embedding representation of the input text with both generative and evolutionary topic modeling techniques ...
Samuel Miles   +4 more
doaj   +3 more sources

Introducing a global dataset on conflict forecasts and news topics [PDF]

open access: gold, Data & Policy
This article provides a structured description of openly available news topics and forecasts for armed conflict at the national and grid cell level starting January 2010. The news topics, as well as the forecasts, are updated monthly at conflictforecast.org and provide coverage for more than 170 countries and about 65,000 grid cells of size 55 ×
Hannes Mueller   +2 more
openalex   +5 more sources

Topic-Conversation Relevance (TCR) Dataset and Benchmarks [PDF]

open access: green
To be published in 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and ...
Yaran Fan   +3 more
openalex   +3 more sources

Detecting Similar Linked Datasets Using Topic Modelling

open access: bronze, 2016
The Web of data is growing continuously with respect to both the size and number of the datasets published. Porting a dataset to five-star Linked Data however requires the publisher of this dataset to link it with the already available linked datasets.
Michael Röder   +3 more
openalex   +3 more sources
