Duplicate detection - Open Access .click

Some of the following articles may not be open access.

Transforming Pairwise Duplicates to Entity Clusters for High-quality Duplicate Detection

Journal of Data and Information Quality, 2019
Duplicate detection algorithms produce clusters of database records, each cluster representing a single real-world entity. As most of these algorithms use pairwise comparisons, the resulting (transitive) clusters can be inconsistent: Not all records within a cluster are sufficiently similar to be classified as duplicate.
Draisbach, Uwe +2 more
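The inconsistency described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes that clusters are formed as the transitive closure of pairwise matches (here via a small union-find), and the function names `transitive_clusters` and `inconsistent_pairs` as well as the record ids are hypothetical.

```python
from itertools import combinations


def transitive_clusters(records, duplicate_pairs):
    """Cluster records by the transitive closure of pairwise duplicate decisions."""
    parent = {r: r for r in records}

    def find(x):
        # Follow parent links to the cluster representative, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return sorted(sorted(c) for c in clusters.values())


def inconsistent_pairs(cluster, duplicate_pairs):
    """Return pairs inside a cluster that were never classified as duplicates."""
    matched = {frozenset(p) for p in duplicate_pairs}
    return [
        (a, b)
        for a, b in combinations(sorted(cluster), 2)
        if frozenset((a, b)) not in matched
    ]


# Hypothetical pairwise decisions: r1~r2 and r2~r3, but r1 and r3 were not matched.
records = ["r1", "r2", "r3", "r4"]
pairs = [("r1", "r2"), ("r2", "r3")]
clusters = transitive_clusters(records, pairs)
conflicts = inconsistent_pairs(clusters[0], pairs)
```

In this toy input, `r1` and `r3` end up in the same cluster only through `r2`, which is the kind of pair the abstract calls not sufficiently similar.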

The Detection Algorithms for Similar Duplicate Data

2019 6th International Conference on Systems and Informatics (ICSAI), 2019
This paper studies and analyzes three algorithms for detecting similar duplicate data records. Two of them, the basic sorted-neighborhood method (SNM) and the priority queue algorithm, are commonly used, and both are based on the sort-and-merge idea. The third one is Density-Based Spatial Clustering of
Jin-Yu Song, Quan Yu, Ruo-yu Bao
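As a rough illustration of the sorted-neighborhood idea named in this abstract, the sketch below sorts records by a key and compares each record only with its neighbors inside a sliding window. The similarity measure (`difflib.SequenceMatcher`), the window size, the threshold, and the sample names are assumptions made for illustration, not details taken from the paper.

```python
from difflib import SequenceMatcher


def sorted_neighborhood(records, key, window=3, threshold=0.8):
    """Return candidate duplicate pairs found by a basic sorted-neighborhood pass."""
    # Step 1: sort by the blocking key so that similar records end up close together.
    ordered = sorted(records, key=key)
    pairs = []
    # Step 2: compare each record only with the next (window - 1) records.
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            similarity = SequenceMatcher(None, key(rec), key(other)).ratio()
            if similarity >= threshold:
                pairs.append((rec, other))
    return pairs


# Hypothetical sample data.
names = ["Jon Smith", "Mary Jones", "John Smith", "Alan Turing"]
candidates = sorted_neighborhood(names, key=str.lower)
```

The window keeps the number of comparisons roughly linear in the number of records, at the cost of missing duplicates whose keys sort far apart.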

Detecting Duplicate Pull-requests in GitHub

Proceedings of the 9th Asia-Pacific Symposium on Internetware, 2017
The widespread use of pull-requests boosts the development and evolution of many open source software projects. However, due to the parallel and uncoordinated nature of the development process in GitHub, duplicate pull-requests may be submitted by different contributors to solve the same problem.
Zhixing Li +4 more

Duplicates Detection, Counting, and Removal

1992
The need to detect and eliminate duplicate elements arises in many applications such as the processing of relational database operations, comparison of complex objects, transitive closure, and protocol verification. Given a multiset, the process of detecting duplicates, eliminating them, sorting the remaining distinct elements, and counting the number ...
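The steps this abstract lists for a multiset (detect duplicates, eliminate them, sort the distinct elements, count occurrences) can be sketched with Python's standard `collections.Counter`. This is only a simple in-memory illustration, not the technique the work itself proposes; the function name `dedupe_and_count` and the sample input are hypothetical.

```python
from collections import Counter


def dedupe_and_count(items):
    """Return the sorted distinct elements and the counts of elements seen more than once."""
    counts = Counter(items)  # counting: occurrences per element
    distinct_sorted = sorted(counts)  # elimination + sorting: one copy of each element
    duplicates = {x: n for x, n in counts.items() if n > 1}  # detection
    return distinct_sorted, duplicates


distinct, dups = dedupe_and_count([3, 1, 3, 2, 1, 3])
```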

Duplicate detection algorithms of bibliographic descriptions

Library Hi Tech, 2008
Purpose – The purpose of this paper is to focus on duplicate record detection algorithms used for detection in bibliographic databases. Design/methodology/approach – Individual algorithms, their application process for duplicate detection and their results are described based on available literature (published articles), information found at various ...
Anestis Sitas, Sarantos Kapidakis

Evaluating Indeterministic Duplicate Detection Results

2012
Duplicate detection is an important process for cleaning or integrating data. Since real-life data is often polluted, detecting duplicates usually comes along with uncertainty. To handle duplicate uncertainty in an appropriate way, indeterministic duplicate detection approaches, i.e.
Panse, Fabian, Ritter, Norbert

Unsupervised Duplicate Detection Using Sample Non-Duplicates

2008
The problem of identifying objects in databases that refer to the same real-world entity is known, among other names, as duplicate detection or record linkage. Objects may be duplicates even though they are not identical, due to errors and missing data. Traditional scenarios for duplicate detection are data warehouses, which are populated from several data

Duplicate document detection by template matching

Image and Vision Computing, 2000
We discuss some operational issues in detecting duplicates in databases of bitmapped binary document images, and argue that efficient and effective duplicate document detection probably requires combining an efficient primary detector with an effective subordinate detector.

Duplicate Journal Title Detection in References

2009
Our research efforts are oriented towards applying text mining techniques to help librarians make better-informed decisions when selecting learning resources to be included in the library’s offer. The proper selection of these resources is one of the key factors determining the overall usefulness of ...
Kovačević, A., Devedžić, Vladan

Does Deep Learning improve the performance of duplicate bug report detection? An empirical study

Journal of Systems and Software, 2023
Yuan Jiang, Xiaohong Su, Christoph Treude +2 more
exaly

deep learning
ddc:004
natural language processing

data cleaning
fos: computer and information sciences
deduplication

data quality