Duplicate detection - Open Access .click


Duplicate Detection in Probabilistic Data

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010), 2009
Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed.
Keijzer, Ander de +3 more
core +16 more sources

Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module.

Syst Rev, 2015
BACKGROUND: A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple
Rathbone J, Carter M, Hoffmann T, Glasziou P. +3 more
europepmc +3 more sources

A Novel Two-Step Classification Approach for Runtime Performance Improvement of Duplicate Bug Report Detection

Computer and Knowledge Engineering, 2023
Duplicate Bug Report Detection (DBRD) is a well-known problem in software triage systems such as Bugzilla. There are two main approaches to this problem: information retrieval and machine learning.
Behzad Soleimani Neysiani, Seyed Morteza Babamir +1 more
doaj +1 more source

Missing values compensation in duplicates detection using hot deck method

Journal of Big Data, 2021
Duplicate records are a common problem within data sets, especially in high-volume databases. The accuracy of duplicate detection determines the efficiency of the duplicate removal process.
Abdulrazzak Ali, Nurul A. Emran, Siti A. Asmai +2 more
doaj +1 more source

Progressive Duplicate Detection

IEEE Transactions on Knowledge and Data Engineering, 2015
Duplicate detection is the process of identifying multiple representations of the same real-world entities. Today, duplicate detection methods need to process ever larger datasets in ever shorter time: maintaining the quality of a dataset becomes increasingly difficult.
Thorsten Papenbrock, Arvid Heise, Felix Naumann +2 more
openaire +1 more source

A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension

Multimodal Technologies and Interaction, 2022
The data management process is characterised by a set of tasks where data quality management (DQM) is one of the core components. Data quality, however, is a multidimensional concept, where the nature of the data quality issues is very diverse.
Otmane Azeroual +5 more
doaj +1 more source

Forum Duplicate Question Detection by Domain Adaptive Semantic Matching

IEEE Access, 2020
Community Question Answering (CQA) forums, such as Stack Overflow, Stack Exchange and Massive Open Online Course (MOOC) forums, spend considerable manpower and time managing duplicate questions.
Zhuojia Xu, Hua Yuan
doaj +1 more source

HDL-ODPRs: A Hybrid Deep Learning Technique Based Optimal Duplication Detection for Pull-Requests in Open-Source Repositories

Applied Sciences, 2022
Recently, open-source repositories have grown rapidly due to volunteer contributions worldwide. Collaboration software platforms have gained popularity as thousands of external contributors have contributed to open-source repositories.
Saud S. Alotaibi
doaj +1 more source

Research on Stateless Address Auto-Configuration and Duplicate Address Detection in Space-Integrated-Ground Information Network

天地一体化信息网络, 2021
LEO (Low Earth Orbit) satellites with high mobility in the space-integrated-ground information network cause ground user terminals to change connected satellites continuously, so user terminals can't keep stable IPv6 addresses. SLAAC (Stateless Address Auto ...
Yazheng CHEN, Hewu LI
doaj +2 more sources

Detection Of Duplicate And Near-Duplicate Content For Web Crawlers

JISR on Computing, 2015
There is an abundance of duplicated web documents on the internet. For example, two documents online could be very similar to each other except for a very small portion, such as URLs and advertisements.
Hadi Hussain Khan, Husnain Mansoor Ali
doaj +1 more source
