Quantifying the visual concreteness of words and topics in multimodal datasets [PDF]
NAACL HLT 2018, 14 pages, 6 figures, data available at http://www.cs.cornell.edu/~jhessel/concreteness/concreteness ...
Jack Hessel, David Mimno, Lillian Lee
MMT: A Multilingual and Multi-Topic Indian Social Media Dataset [PDF]
Social media plays a significant role in cross-cultural communication. A vast amount of this occurs in code-mixed and multilingual form, posing a significant challenge to Natural Language Processing (NLP) tools for processing such information, like language identification, topic modeling, and named-entity recognition.
Dwip Dalal+2 more
TACO: Topics in Algorithmic COde generation dataset
We introduce TACO, an open-source, large-scale code generation dataset, with a focus on the topics of algorithms, designed to provide a more challenging training dataset and evaluation benchmark in the field of code generation models. TACO includes competition-level programming questions that are more challenging, to enhance or evaluate problem ...
Rongni Li+8 more
Diversity Over Size: On the Effect of Sample and Topic Sizes for Argument Mining Datasets
The task of Argument Mining, that is, extracting and classifying argument components for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike, as large Argument Mining datasets are rare and recognizing argument components requires expert knowledge.
Benjamin Schiller+2 more
Multilingual Topic Classification in X: Dataset and Analysis [PDF]
In the dynamic realm of social media, diverse topics are discussed daily, transcending linguistic boundaries. However, the complexities of understanding and categorising this content across various languages remain an important challenge with traditional techniques like topic modelling often struggling to accommodate this multilingual diversity.
Dimosthenis Antypas+3 more
Topic modeling for cluster analysis of large biological and medical datasets [PDF]
The big data moniker is nowhere better deserved than to describe the ever-increasing prodigiousness and complexity of biological and medical datasets. New methods are needed to generate and test hypotheses, foster biological interpretation, and build validated predictors.
Weizhong Zhao, Wen Zou, James J. Chen
A social and news media benchmark dataset for topic modeling
Topic modeling is an active research area with several unanswered questions. The focus of recent research in this area is on the use of a vector embedding representation of the input text with both generative and evolutionary topic modeling techniques ...
Samuel Miles+4 more
Introducing a global dataset on conflict forecasts and news topics [PDF]
This article provides a structured description of openly available news topics and forecasts for armed conflict at the national and grid cell level starting January 2010. The news topics, as well as the forecasts, are updated monthly at conflictforecast.org and provide coverage for more than 170 countries and about 65,000 grid cells of size 55 ×
Hannes Mueller+2 more
Topic-Conversation Relevance (TCR) Dataset and Benchmarks [PDF]
To be published in 38th Conference on Neural Information Processing Systems (NeurIPS 2024) Track on Datasets and ...
Yaran Fan+3 more
Detecting Similar Linked Datasets Using Topic Modelling
The Web of data is growing continuously with respect to both the size and number of the datasets published. Porting a dataset to five-star Linked Data however requires the publisher of this dataset to link it with the already available linked datasets.
Michael Röder+3 more