2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2018
Duplicates arise when there are multiple representations of the same real-world entities. The process of identifying these duplicates is known as duplicate detection. Duplicates may cause a lot of problems and confusion while managing the database and thus need to be eliminated.
Nikita Medidar, Manik Chavan
openaire +1 more source
Data Preparation for Duplicate Detection
Journal of Data and Information Quality, 2020
Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal.
Koumarelas, Ioannis (Dr.) +2 more
openaire +1 more source
2010
The aim of duplicate detection is to group records in a relation which refer to the same entity in the real world such as a person or business. Most existing works require user specified parameters such as similarity threshold in order to conduct duplicate detection. These methods are called user-first in this paper.
Ke Deng +4 more
openaire +2 more sources
Industry-scale duplicate detection
Proceedings of the VLDB Endowment, 2008
Duplicate detection is the process of identifying multiple representations of a same real-world object in a data source. Duplicate detection is a problem of critical importance in many applications, including customer relationship management, personal information management, or data mining.
Melanie Weis +4 more
openaire +1 more source
Duplicate detection in click streams
Proceedings of the 14th international conference on World Wide Web - WWW '05, 2005
We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is utilized in various applications including fraud detection. We develop a solution based on Bloom Filters [9], and discuss the space and time requirements for running the proposed algorithm in both the contexts of sliding, and landmark stream windows ...
Ahmed Metwally +2 more
openaire +1 more source
Duplicate code detection algorithm
Proceedings of the 16th International Conference on Computer Systems and Technologies, 2015
In this paper we propose an algorithm for detecting duplicate fragments of source code based on call graphs. The complexity of the proposed algorithm is estimated and the practical performance is tested using several executions of the algorithm on three different versions of open source Ant.
Todor Cholakov, Dimitar Birov
openaire +1 more source
Detection of gene duplications and block duplications in eukaryotic genomes
Journal of Structural and Functional Genomics, 2003
Several eukaryotic genomes have been completely sequenced and this provides an opportunity to investigate the extent and characteristics (e.g., single gene duplication, block duplication, etc.) of gene duplication in a genome. Detecting duplicate genes in a genome, however, is not a simple problem because of several complications such as domain ...
Wen-Hsiung Li +3 more
openaire +2 more sources
Speed up duplicate/near-duplicate image detection
Proceedings of the Second International Conference on Internet Multimedia Computing and Service, 2010
Finding duplicate and near-duplicate images plays an important role on redundancy reduction for image storage, summarization and recommendation. This paper introduces how to speed up Duplicate/Near-Duplicate (D/ND) image detection. Image clustering was first applied to partition the images into multiple groups by using coarse visual features; pair-wise ...
Chunlei Yang +3 more
openaire +1 more source
The detection of duplicates in document image databases
Image and Vision Computing, 1998
Document imaging technology has developed to the point where it is not uncommon for organizations to scan large numbers of documents into databases with little or no index information. This may be done for archival purposes with an index as simple as a case number, or with the ultimate goal of automatically extracting index information for ...
David S. Doermann +2 more
openaire +1 more source
Probabilistic Iterative Duplicate Detection
2005
The problem of identifying approximately duplicate records between databases is known, among others, as duplicate detection or record linkage. To this end, typically either rules or a weighted aggregation of distances between the individual attributes of potential duplicates is used. However, choosing the appropriate rules, distance functions, weights,
Patrick Lehti, Peter Fankhauser
openaire +1 more source

