NaturalConv: A Chinese Dialogue Dataset Towards Multi-turn Topic-driven Conversation [PDF]
In this paper, we propose a Chinese multi-turn topic-driven conversation dataset, NaturalConv, which allows the participants to chat about anything they want as long as any element from the topic is mentioned and the topic shift is smooth. Our corpus contains 19.9K conversations from six domains, and 400K utterances with an average turn number of 20.1. These ...
Xiaoyang Wang+3 more
arxiv +3 more sources
Clustering and topic modeling over tweets: A comparison over a health dataset
Twitter has become the most popular form of social interaction in the healthcare domain. Thus, various teams have evaluated Twitter as an additional source where patients share information about their healthcare, with the potential goal of improving their outcomes.
Juan Antonio Lossio-Ventura+4 more
openalex +5 more sources
WET: Word embedding-topic distribution vectors for MOOC video lectures dataset
In this article, we present a dataset containing word embeddings and document topic distribution vectors generated from MOOC video lecture transcripts. Transcripts of 12,032 video lectures from 200 courses were collected from the Coursera learning platform.
Zenun Kastrati+2 more
openalex +7 more sources
Twitter’s widespread popularity has made it a prime target for malicious actors exploiting trending hashtags to disseminate harmful content. This study marks the first systematic exploration of semantic consistency in tweets to detect trending ...
Insaf Kraidia+2 more
doaj +2 more sources
Topic Concentration in Query Focused Summarization Datasets
Query-Focused Summarization (QFS) summarizes a document cluster in response to a specific input query. QFS algorithms must combine query relevance assessment, central content identification, and redundancy avoidance. Frustratingly, state-of-the-art algorithms designed for QFS do not significantly improve upon generic summarization ...
Tal Baumel+2 more
openalex +3 more sources
Unsupervised Text Topic-Related Gene Extraction for Large Unbalanced Datasets
There is a common notion that traditional unsupervised feature extraction algorithms follow the assumption that the distribution of the different clusters in a dataset is balanced.
Li Jing-Ming+5 more
doaj +3 more sources
Topic-driven Clustering for Document Datasets [PDF]
Ying Zhao, George Karypis
openalex +3 more sources
Topic selection for text classification using ensemble topic modeling with grouping, scoring, and modeling approach [PDF]
TextNetTopics (Yousef et al. in Front Genet 13:893378, 2022. https://doi.org/10.3389/fgene.2022.893378) is a recently developed approach that performs text classification based on topics (a topic is a group of terms or words) extracted from a Latent ...
Daniel Voskergian+2 more
doaj +2 more sources
INTERACTIVE TOOL FOR VISUALIZATION OF TOPIC MODELS [PDF]
Digital data are all around us and occur in various forms, such as videos, pictures, or texts. Digital documents represent the vast majority of such data. These can be e-news, social media contributions, and so on.
Miroslav SMATANA+3 more
doaj +1 more source
Cancer care is complex and exists within the broader healthcare system. The CanIMPACT team sought to enhance primary cancer care capacity and improve integration between primary and cancer specialist care, focusing on breast cancer.
Patti Ann Groome+7 more
doaj +1 more source