Data Duplicate Detection

2018 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2018
Duplicates arise when there are multiple representations of the same real-world entities. The process of identifying these duplicates is known as duplicate detection. Duplicates may cause a lot of problems and confusion while managing the database and thus need to be eliminated.
Nikita Medidar, Manik Chavan

Data Preparation for Duplicate Detection

Journal of Data and Information Quality, 2020
Data errors represent a major issue in most application workflows. Before any important task can take place, a certain data quality has to be guaranteed by eliminating a number of different errors that may appear in data. Typically, most of these errors are fixed with data preparation methods, such as whitespace removal.
Ioannis Koumarelas   +2 more

Active Duplicate Detection

2010
The aim of duplicate detection is to group records in a relation that refer to the same real-world entity, such as a person or business. Most existing works require user-specified parameters, such as a similarity threshold, in order to conduct duplicate detection. These methods are called user-first in this paper.
Ke Deng   +4 more

Industry-scale duplicate detection

Proceedings of the VLDB Endowment, 2008
Duplicate detection is the process of identifying multiple representations of the same real-world object in a data source. Duplicate detection is a problem of critical importance in many applications, including customer relationship management, personal information management, and data mining.
Melanie Weis   +4 more

Duplicate detection in click streams

Proceedings of the 14th international conference on World Wide Web - WWW '05, 2005
We consider the problem of finding duplicates in data streams. Duplicate detection in data streams is utilized in various applications, including fraud detection. We develop a solution based on Bloom Filters [9], and discuss the space and time requirements for running the proposed algorithm in the contexts of both sliding and landmark stream windows ...
Ahmed Metwally   +2 more
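As a rough illustration of the Bloom-filter idea mentioned in the abstract above, the following minimal sketch flags stream items that have probably been seen before. It is not the algorithm from the paper: the class name `BloomFilter`, the helper `flag_duplicates`, the SHA-256 based hashing, and the default sizes are all assumptions made for this example, and sliding or landmark windows are not handled.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (illustrative; parameters are assumptions)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def seen_before(self, item):
        """Return True if item was probably seen already, then record it."""
        positions = list(self._positions(item))
        already = all(self.bits[p // 8] & (1 << (p % 8)) for p in positions)
        for p in positions:
            self.bits[p // 8] |= 1 << (p % 8)
        return already


def flag_duplicates(stream, num_bits=1024, num_hashes=3):
    """Return the items of stream that the filter reports as repeats."""
    bloom = BloomFilter(num_bits, num_hashes)
    return [item for item in stream if bloom.seen_before(item)]
```

A repeated item is always flagged, while a new item may occasionally be flagged by mistake (a false positive); a larger `num_bits` makes such mistakes less likely at the cost of more memory.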

Duplicate code detection algorithm

Proceedings of the 16th International Conference on Computer Systems and Technologies, 2015
In this paper we propose an algorithm for detecting duplicate fragments of source code based on call graphs. The complexity of the proposed algorithm is estimated and the practical performance is tested using several executions of the algorithm on three different versions of open source Ant.
Todor Cholakov, Dimitar Birov

Detection of gene duplications and block duplications in eukaryotic genomes

Journal of Structural and Functional Genomics, 2003
Several eukaryotic genomes have been completely sequenced and this provides an opportunity to investigate the extent and characteristics (e.g., single gene duplication, block duplication, etc.) of gene duplication in a genome. Detecting duplicate genes in a genome, however, is not a simple problem because of several complications such as domain ...
Wen-Hsiung Li   +3 more

Speed up duplicate/near-duplicate image detection

Proceedings of the Second International Conference on Internet Multimedia Computing and Service, 2010
Finding duplicate and near-duplicate images plays an important role in redundancy reduction for image storage, summarization and recommendation. This paper introduces how to speed up Duplicate/Near-Duplicate (D/ND) image detection. Image clustering was first applied to partition the images into multiple groups by using coarse visual features; pair-wise ...
Chunlei Yang   +3 more

The detection of duplicates in document image databases

Image and Vision Computing, 1998
Document imaging technology has developed to the point where it is not uncommon for organizations to scan large numbers of documents into databases with little or no index information. This may be done for archival purposes with an index as simple as a case number, or with the ultimate goal of automatically extracting index information for ...
David S. Doermann   +2 more

Probabilistic Iterative Duplicate Detection

2005
The problem of identifying approximately duplicate records between databases is known, among others, as duplicate detection or record linkage. To this end, typically either rules or a weighted aggregation of distances between the individual attributes of potential duplicates is used. However, choosing the appropriate rules, distance functions, weights,
Patrick Lehti, Peter Fankhauser
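The "weighted aggregation of distances between the individual attributes" described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method of the paper: the function names, the example weights, the default threshold, and the use of `difflib.SequenceMatcher` as the per-attribute similarity are all choices made for this example.

```python
from difflib import SequenceMatcher


def attribute_similarity(a, b):
    """Similarity in [0, 1] between two attribute values (assumed measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def record_similarity(r1, r2, weights):
    """Weighted average of per-attribute similarities.

    weights maps attribute name -> non-negative weight (hypothetical values).
    """
    total = sum(weights.values())
    score = sum(w * attribute_similarity(r1[k], r2[k]) for k, w in weights.items())
    return score / total


def is_duplicate(r1, r2, weights, threshold=0.85):
    """Classify a pair as duplicates when the aggregate reaches the threshold."""
    return record_similarity(r1, r2, weights) >= threshold


# Usage with made-up records and weights:
a = {"name": "Jon Smith", "city": "Berlin"}
b = {"name": "John Smith", "city": "Berlin"}
print(is_duplicate(a, b, {"name": 2.0, "city": 1.0}))
```

The weights and the threshold are exactly the parameters the abstract describes as hard to choose, which is why they are left as explicit arguments here.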
