
Enhancement of Short Text Clustering by Iterative Classification [PDF]

open access: yes | Natural Language Processing and Information Systems: 25th International Conference on Applications of Natural Language to Information Systems, 2020
Short text clustering is a challenging task because such short texts carry little signal. In this work, we propose iterative classification as a method to boost the clustering quality (e.g., accuracy) of short texts. Given a clustering of short texts obtained using an arbitrary clustering algorithm, iterative classification applies outlier ...
Rakib M, Zeh N, Jankowska M, Milios E.

TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information [PDF]

open access: yes | Frontiers in Genetics, 2023
With the exponential growth in the daily publication of scientific articles, automatic classification and categorization can assist in assigning articles to a predefined category.
Daniel Voskergian and 3 co-authors

Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis [PDF]

open access: yes | Frontiers in Artificial Intelligence, 2020
With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages.
Rania Albalawi and 2 co-authors

Efficient Long-Text Understanding with Short-Text Models

open access: yes | Transactions of the Association for Computational Linguistics, 2023
Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles, and long documents due to their quadratic complexity. While a myriad of ...
Maor Ivgi, Uri Shaham, Jonathan Berant

Representation Learning for Short Text Clustering [PDF]

open access: yes | arXiv, 2021
Effective representation learning is critical for short text clustering because short-text corpora are sparse, high-dimensional, and noisy. Existing pre-trained models (e.g., Word2vec and BERT) have greatly improved the expressiveness of short text representations, yielding more condensed, low-dimensional, and continuous features compared to the ...
Hui Yin and 4 co-authors

Learning-based short text compression using BERT models [PDF]

open access: yes | PeerJ Computer Science
Learning-based data compression methods have gained significant attention in recent years. Although these methods achieve higher compression ratios compared to traditional techniques, their slow processing times make them less suitable for compressing ...
Emir Öztürk, Altan Mesut

LCSTS: A Large Scale Chinese Short Text Summarization Dataset [PDF]

open access: hybrid | arXiv, 2015
Automatic text summarization is widely regarded as a highly difficult problem, partly because of the lack of large text summarization datasets. Because constructing large-scale summaries for full texts is very challenging, in this paper we introduce a large corpus for Chinese short text summarization, constructed from the Chinese ...
Baotian Hu, Qingcai Chen, Fangze Zhu

End-to-end Learning for Short Text Expansion [PDF]

open access: green | arXiv, 2017
Effectively making sense of short texts is a critical task for many real-world applications such as search engines, social media services, and recommender systems. The task is particularly challenging because a short text contains very sparse information, often too sparse for a machine learning algorithm to pick up useful signals.
Jian Tang and 3 co-authors

Topic Modeling over Short Texts by Incorporating Word Embeddings [PDF]

open access: green | arXiv, 2016
Inferring topics from the overwhelming volume of short texts has become a critical but challenging problem for many content analysis tasks, such as content characterization, user interest profiling, and emerging topic detection. Existing methods such as probabilistic latent semantic analysis (PLSA) and latent Dirichlet allocation (LDA) cannot solve this prob ...
Jipeng Qiang and 3 co-authors

Short Modern Winnebago Text with Song [PDF]

open access: yes | Kansas Working Papers in Linguistics, 1982
D. Hymes (1981) discusses a number of cases of hitherto overlooked implicit structuring in Amerindian narratives and song texts. His principles of analysis themselves remain largely implicit, but in general the approach seems to be to search for organizing principles that are multiply justified.
Miner, Kenneth L.
