High-dimensional similarity joins
IEEE Transactions on Knowledge and Data Engineering, 2002
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε tree, for fast spatial similarity joins on high-dimensional points.
Kyuseok Shim +2 more
openaire +1 more source
Solving similarity joins and range queries in metric spaces with the list of twin clusters
The metric space model abstracts many proximity or similarity problems, where the most frequently considered primitives are range and k-nearest neighbor search, leaving out the similarity join, an extremely important primitive. In fact, despite the great
Rodrigo Paredes, Nora Reyes
exaly +2 more sources
Proceedings of the VLDB Endowment, 2014
String similarity join is an important operation in data integration and cleansing that finds similar string pairs from two collections of strings. More than ten algorithms have been proposed to address this problem in the past two decades. However, existing algorithms have not been thoroughly compared under the same experimental framework.
Yu Jiang +3 more
openaire +1 more source
Similarity Join in Metric Spaces
2003
Similarity join in distance spaces constrained by the metric postulates is the necessary complement of the more famous similarity range and nearest neighbor search primitives. However, the quadratic computational complexity of similarity joins prevents their application to large data collections.
Dohnal V, Gennaro C, Savino P, Zezula P
openaire +3 more sources
Efficient SimRank-Based Similarity Join
ACM Transactions on Database Systems, 2017
Graphs have been widely used to model complex data in many real-world applications. Answering vertex join queries over large graphs is meaningful and interesting, and can benefit applications such as friend recommendation in social networks and link prediction.
Zheng, Weiguo +3 more
openaire +2 more sources
Document Similarity Self-Join with MapReduce
2010 IEEE International Conference on Data Mining, 2010
Given a collection of objects, the Similarity Self-Join problem requires discovering all pairs of objects whose similarity is above a user-defined threshold. In this paper we focus on document collections, which are characterized by a sparseness that allows effective pruning strategies.
Ranieri Baraglia +2 more
openaire +1 more source
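
The self-join problem stated in the abstract above can be illustrated with a brute-force sketch. This is not the paper's MapReduce algorithm and uses none of its pruning strategies; the Jaccard measure on token sets and the names `jaccard` and `self_join` are assumptions chosen for illustration only.

```python
from itertools import combinations

def jaccard(x, y):
    """Jaccard similarity of two token collections (assumed measure)."""
    x, y = set(x), set(y)
    union = x | y
    return len(x & y) / len(union) if union else 0.0

def self_join(docs, threshold, sim=jaccard):
    """Return index pairs (i, j), i < j, whose similarity exceeds threshold.

    Compares every pair once, so the cost is quadratic in len(docs).
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(docs), 2)
        if sim(a, b) > threshold
    ]

# Hypothetical toy collection of tokenized documents.
docs = ["a b c d".split(), "a b c e".split(), "x y z".split()]
result = self_join(docs, threshold=0.5)
```

Only the first two documents share enough tokens (3 of 5 distinct tokens) to pass the 0.5 threshold in this toy collection.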
Proceedings of the 21st ACM international conference on Information and knowledge management, 2012
Location-based services have attracted significant attention due to modern mobile phones equipped with GPS devices. These services generate large amounts of spatio-textual data which contain both spatial location and textual descriptions. Since a spatio-textual object may have different representations, possibly because of deviations of GPS or ...
Sitong Liu +2 more
openaire +1 more source
Performance Enhanced Multiset Similarity Joins
2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), 2016
The amount of data produced on a daily basis is growing at an exponential rate. One method of filtering through this data is the use of similarity joins, or methods that are used to identify similar data. Such algorithms are used for a variety of applications ranging from plagiarism detection to marketing.
Jahnavi Yalamanchili +3 more
openaire +1 more source
Trie-based similarity search and join
Proceedings of the Joint EDBT/ICDT 2013 Workshops, 2013
Driven by the increasing demands from applications such as data cleansing, integration, and bioinformatics, approximate string matching queries have gained much attention recently. In this paper, we present the design and implementation of a trie-based system which supports both string similarity search and join based on our recent work [23].
Jianbin Qin +3 more
openaire +1 more source
Quicker Similarity Joins in Metric Spaces
2013
We consider the join operation in metric spaces. Given two sets A and B of objects drawn from some universe $\mathbb U$, we want to compute the set $A \Join B = \{(a,b) \in A \times B \mid d(a,b) \leq r\}$ efficiently, where $d : \mathbb U \times \mathbb U \to \mathbb R^+$ is a metric distance function and $r \in \mathbb R^+$ is a user-supplied query ...
Braithwaite Billy, Fredriksson Kimmo
openaire +1 more source
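
The join definition in the abstract above can be read as a quadratic baseline that tests every pair in A × B against the radius r. The sketch below is that naive baseline only, not the paper's faster method; the function name `similarity_join` and the choice of Euclidean distance for d are assumptions for illustration.

```python
from itertools import product
from math import dist  # Euclidean distance, available since Python 3.8

def similarity_join(A, B, r, d=dist):
    """Return all pairs (a, b) in A x B with d(a, b) <= r.

    Evaluates d on every pair, so it costs |A| * |B| distance computations.
    Any metric can be passed as d; Euclidean distance is an assumed default.
    """
    return [(a, b) for a, b in product(A, B) if d(a, b) <= r]

# Hypothetical toy point sets in the plane.
A = [(0.0, 0.0), (5.0, 5.0)]
B = [(0.0, 1.0), (4.0, 5.0), (9.0, 9.0)]
pairs = similarity_join(A, B, r=1.0)
```

Index-based approaches such as those listed in these results aim to avoid most of these pairwise distance computations.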

