Results 251 to 260 of about 21,703 (289)
Ontology- and LLM-based data harmonization for federated learning in healthcare. [PDF]
Kokash N +7 more
europepmc +1 more source
The similarity join database operator
Similarity joins have been studied as key operations in multiple application domains, e.g., record linkage, data cleaning, multimedia and video applications, and phenomena detection on sensor networks. Multiple similarity join algorithms and implementation techniques have been proposed.
Silva, Yasin +2 more
openaire +3 more sources
Document Similarity Self-Join with MapReduce
Given a collection of objects, the Similarity Self-Join problem requires discovering all pairs of objects whose similarity is above a user-defined threshold. In this paper we focus on document collections, which are characterized by a sparseness that allows effective pruning strategies.
Ranieri Baraglia +2 more
openaire +2 more sources
Some of the following articles may not be open access.
String similarity search and join: a survey
Frontiers of Computer Science, 2015
String similarity search and join are two important operations in data cleaning and integration, which extend traditional exact search and exact join operations in databases by tolerating the errors and inconsistencies in the data. They have many real-world applications, such as spell checking, duplicate detection, entity resolution, and webpage ...
Guoliang Li, Dong Deng, Feng Jianhua
exaly +2 more sources
2008 IEEE 24th International Conference on Data Engineering, 2008
Similarity joins have attracted significant interest, with applications in geographical information systems, astronomy, marketing analyses, and anomaly detection. However, all the past algorithms, although highly fine-tuned, suffer an output explosion if the query range is even moderately large relative to the local data density.
Brent Bryan +2 more
openaire +1 more source
Similarity joins for uncertain strings
Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, 2014
A string similarity join finds all similar string pairs between two input string collections. It is an essential operation in many applications, such as data integration and cleaning, and has been extensively studied for deterministic strings. Increasingly, many applications have to deal with imprecise strings or strings with fuzzy information in them.
Manish Patil, Rahul Shah
openaire +1 more source
Streaming Set Similarity Joins
2021
We consider the problem of efficiently answering set similarity joins over streams. This problem is challenging both in terms of CPU cost, because similarity matching is computationally much more expensive than equality comparisons, and memory requirements, due to the unbounded nature of streams.
Lucas Pacífico +1 more
openaire +1 more source
High-dimensional similarity joins
IEEE Transactions on Knowledge and Data Engineering, 2002
Many emerging data mining applications require a similarity join between points in a high-dimensional domain. We present a new algorithm that utilizes a new index structure, called the ε tree, for fast spatial similarity joins on high-dimensional points.
Kyuseok Shim +2 more
openaire +1 more source

