Deduplication Methods Using Levenshtein Distance Algorithm
The study aimed to propose methods for improving the data integrity of relational databases such as MS SQL, MySQL and PostgreSQL through duplicate-record detection. The FODORS and ZAGAT restaurant benchmark datasets were used to support the preparation and delivery of high-quality data.
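As a rough illustration of the idea behind this entry (not the study's own method), the sketch below computes the Levenshtein distance with the standard dynamic-programming recurrence and flags record pairs whose normalized similarity reaches a threshold. The function names, the lowercase/whitespace normalization and the 0.8 threshold are hypothetical choices made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible (distance is symmetric)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def find_duplicates(records, threshold=0.8):
    """Return index pairs whose normalized similarity >= threshold.

    The normalization and default threshold are hypothetical assumptions.
    """
    norm = [" ".join(r.lower().split()) for r in records]
    pairs = []
    for i in range(len(norm)):
        for j in range(i + 1, len(norm)):
            if similarity(norm[i], norm[j]) >= threshold:
                pairs.append((i, j))
    return pairs


print(find_duplicates(["Cafe Luna", "cafe  luna", "Blue Door"]))
```

Comparing every pair is quadratic in the number of records; practical deduplication usually adds a blocking or filtering step before computing exact distances.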
One-Gapped q-Gram Filters for Levenshtein Distance
We have recently shown that q-gram filters based on gapped q-grams instead of the usual contiguous q-grams can provide orders of magnitude faster and/or more efficient filtering for the Hamming distance. In this paper, we extend the results to the Levenshtein distance, which is more problematic for gapped q-grams because an insertion or deletion in a ...
Burkhardt, S., Kärkkäinen, J.
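The abstract concerns gapped q-grams, which are not reproduced here. For orientation only, the sketch below shows the classical filter based on contiguous q-grams (the q-gram lemma): if two strings are within Levenshtein distance k, they share at least max(|a|, |b|) - q + 1 - kq q-grams, counted with multiplicity, so pairs sharing fewer can be discarded without computing the distance. The function names and the default q = 2 are hypothetical.

```python
from collections import Counter


def qgrams(s: str, q: int) -> Counter:
    """Multiset of contiguous q-grams of s (empty if len(s) < q)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Number of q-grams common to a and b, counted with multiplicity."""
    return sum((qgrams(a, q) & qgrams(b, q)).values())


def may_be_within(a: str, b: str, k: int, q: int = 2) -> bool:
    """Contiguous q-gram count filter for Levenshtein distance <= k.

    False means the pair can be safely discarded; True only means the
    pair must still be verified with an exact distance computation.
    """
    threshold = max(len(a), len(b)) - q + 1 - k * q
    if threshold <= 0:
        return True  # the lemma gives no information for this (k, q)
    return shared_qgrams(a, b, q) >= threshold


print(may_be_within("kitten", "sitten", k=1))      # True: candidate kept
print(may_be_within("abcdefgh", "zyxwvuts", k=1))  # False: pair filtered out
```

The filter never rejects a pair that is truly within distance k, but it can keep pairs that are not; the gapped q-grams studied in the paper aim to make this kind of filtering faster and/or more efficient.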
Matching health information seekers' queries to medical terms
Background: The Internet is a major source of health information, but most seekers are not familiar with medical vocabularies. Hence, their searches fail because of poor query formulation.
Soualmia Lina F +4 more
A Hybrid Approach to Typo Correction in Indonesian Documents Using Levenshtein Distance
This study developed a typo correction system for the Indonesian language by integrating the Levenshtein distance algorithm with empirical methods. The system is designed to improve the accuracy of typo detection and correction in Indonesian texts, which ...
Joseph Teguh Santoso, Song Yan
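A minimal sketch of the Levenshtein part of dictionary-based typo correction: each word is replaced by the closest dictionary entry if that entry lies within a maximum distance. The tiny word list, the `max_dist` parameter and the alphabetical tie-breaking are hypothetical; the empirical methods the study combines with Levenshtein distance are not represented.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


# Hypothetical mini-dictionary of Indonesian words (illustration only).
DICTIONARY = ["makan", "minum", "rumah", "sekolah"]


def correct(word: str, dictionary=DICTIONARY, max_dist: int = 2) -> str:
    """Return the closest dictionary word within max_dist, else the word unchanged."""
    # Ties on distance are broken alphabetically so the result is deterministic.
    best = min(dictionary, key=lambda w: (levenshtein(word, w), w))
    return best if levenshtein(word, best) <= max_dist else word


def correct_text(text: str) -> str:
    """Lowercase, split on whitespace and correct each token independently."""
    return " ".join(correct(w) for w in text.lower().split())


print(correct_text("Makan di rumha"))
```

Words with no dictionary entry within `max_dist` (here "di", which the toy dictionary lacks) are left untouched rather than forced onto a distant candidate.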
PLD2flex: Establishing the Phonological Levenshtein Distance for Pairs or Groups of (Pseudo)Words
Establishing the phonological Levenshtein distance (PLD) of words and pseudowords is useful for various psycholinguistic research applications, such as generating stimuli for experiments on language processing or analysing the PLD between erroneous and ...
Helena Wedig +3 more
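The phonological variant applies the same edit-distance recurrence to sequences of phoneme symbols instead of letters. The sketch below assumes words have already been transcribed into lists of phoneme symbols; the SAMPA-style transcriptions are hand-written for illustration, and how PLD2flex itself obtains transcriptions or handles groups of words is not shown.

```python
from typing import Hashable, Sequence


def sequence_edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance over arbitrary symbol sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]


# Hand-written, SAMPA-style transcriptions (illustrative assumption).
knight = ["n", "aI", "t"]
night = ["n", "aI", "t"]
cat = ["k", "{", "t"]
cut = ["k", "V", "t"]

print(sequence_edit_distance("knight", "night"))  # orthographic distance: 1
print(sequence_edit_distance(knight, night))      # phonological distance: 0
print(sequence_edit_distance(cat, cut))           # one phoneme substituted: 1
```

Treating multi-character symbols such as "aI" as single list elements is what makes the distance count phonemes rather than letters.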
Online gambling in Indonesia is increasingly widespread and has negative socio-economic and cybersecurity impacts. One of the methods used by online gambling operators is inserting gambling backlinks into websites, particularly ...
Ismail Puji Saputra
A method and a tool for geocoding and record linkage
For many years, researchers have presented the geocoding of postal addresses as a challenge, and several research works have been devoted to the geocoding process.
CHARIF Omar +4 more
Efficiency and Penalty Factors on Monoids of Strings
In information theory, linguistics and computer science, metrics for measuring the similarity between two given strings (sequences) are important. In this article we introduce the efficiency (a measure of similarity) and the penalty for given parallel decompositions ...
Mitrofan Choban, Ivan Budanaev
Indonesia has diverse art, cultures, and languages. Linguistically, Indonesia has many local languages, which makes it a diverse country, with Javanese being the regional language with the highest number of entries in the Kamus Besar Bahasa Indonesia ...
Musthofa Galih Pradana +4 more
Difference Measure for Controlled Random Tests
The task of constructing difference characteristics for test sequences was studied. Its relevance to generating controlled random tests, and the complexity of finding difference measures for symbolic tests, were substantiated. The limitations of using ...
V. N. Yarmolik +2 more

