Some of the following articles may not be open access.
String similarity search and join: a survey
Frontiers of Computer Science, 2015
String similarity search and join are two important operations in data cleaning and integration, which extend traditional exact search and exact join operations in databases by tolerating the errors and inconsistencies in the data. They have many real-world applications, such as spell checking, duplicate detection, entity resolution, and webpage ...
Minghe Yu 0001 +3 more
openaire +1 more source
An Efficient Similarity Join Algorithm with Cosine Similarity Predicate
2010
Given a large collection of objects, finding all pairs of similar objects, namely similarity join, is widely used to solve various problems in many application domains. Computation time of similarity join is a critical issue, since similarity join requires computing similarity values for all possible pairs of objects.
Dongjoo Lee +3 more
openaire +1 more source
Parallelizing String Similarity Join Algorithms
2018
A key operation in data cleaning and integration is the use of string similarity join (SSJ) algorithms to identify and remove duplicates or similar records within data sets. With the advent of big data, a natural question is how to parallelize SSJ algorithms.
Ling-Chih Yao, Lipyeow Lim
openaire +1 more source
String Similarity Join with Different Thresholds
2015
String similarity join is an essential operation of many applications that need to find all similar string pairs from given two collections. The existing approaches are using the uniform and predefined similarity thresholds. While in real applications, regarding that the longer string pairs typically tolerate many more typos, it is necessary to apply ...
Chuitian Rong, Xiangling Zhang
openaire +1 more source
MinJoin++: a fast algorithm for string similarity joins under edit distance
VLDB Journal, 2023
Haoyu Zhang, Qin Zhang
exaly
An efficient algorithm for approximated self-similarity joins in metric spaces
Information Systems, 2020
Sebastian Ferrada +2 more
exaly
Privacy preserving similarity joins using MapReduce
Information Sciences, 2019
Xiaofeng Ding +2 more
exaly
VChunkJoin: An Efficient Algorithm for Edit Similarity Joins
IEEE Transactions on Knowledge and Data Engineering, 2013
Jianbin Qin, Chuan Xiao, Xuemin Lin
exaly
Output-Optimal Massively Parallel Algorithms for Similarity Joins
ACM Transactions on Database Systems, 2019
Ke Yi, Yufei Tao
exaly
An Experimental Survey of MapReduce-Based Similarity Joins
Lecture Notes in Computer Science, 2016
Yasin N Silva, Chuitian Rong
exaly

