Duplicate Detection in Probabilistic Data [PDF]
Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed.
Keijzer, Ander de +3 more
core +16 more sources
Better duplicate detection for systematic reviewers: evaluation of Systematic Review Assistant-Deduplication Module. [PDF]
BACKGROUND: A major problem arising from searching across bibliographic databases is the retrieval of duplicate citations. Removing such duplicates is an essential task to ensure systematic reviewers do not waste time screening the same citation multiple
Rathbone J +3 more
europepmc +3 more sources
A Novel Two-Step Classification Approach for Runtime Performance Improvement of Duplicate Bug Report Detection [PDF]
Duplicate Bug Report Detection (DBRD) is a well-known problem in software triage systems like Bugzilla. There are two main approaches to this problem: information retrieval and machine learning.
Behzad Soleimani Neysiani +1 more
doaj +1 more source
Missing values compensation in duplicates detection using hot deck method
Duplicate records are a common problem in data sets, especially in high-volume databases. The accuracy of duplicate detection determines the efficiency of the duplicate removal process.
Abdulrazzak Ali +2 more
doaj +1 more source
Progressive Duplicate Detection [PDF]
Duplicate detection is the process of identifying multiple representations of the same real-world entities. Today, duplicate detection methods need to process ever larger datasets in ever shorter time: maintaining the quality of a dataset becomes increasingly difficult.
Thorsten Papenbrock +2 more
openaire +1 more source
A Record Linkage-Based Data Deduplication Framework with DataCleaner Extension
The data management process is characterised by a set of tasks where data quality management (DQM) is one of the core components. Data quality, however, is a multidimensional concept, where the nature of the data quality issues is very diverse.
Otmane Azeroual +5 more
doaj +1 more source
Forum Duplicate Question Detection by Domain Adaptive Semantic Matching
Community Question Answering (CQA) forums, such as Stack Overflow, Stack Exchange and Massive Open Online Course (MOOC) forums, spend considerable manpower and time managing duplicate questions.
Zhuojia Xu, Hua Yuan
doaj +1 more source
Recently, open-source repositories have grown rapidly due to volunteer contributions worldwide. Collaboration software platforms have gained popularity as thousands of external contributors have contributed to open-source repositories.
Saud S. Alotaibi
doaj +1 more source
The high mobility of LEO (Low Earth Orbit) satellites in a space-integrated-ground information network causes ground user terminals to change connected satellites continuously, so user terminals cannot keep stable IPv6 addresses. SLAAC (Stateless Address Auto ...
Yazheng CHEN, Hewu LI
doaj +2 more sources
Detection Of Duplicate And Near-Duplicate Content For Web Crawlers
There is an abundance of duplicated web documents on the internet. For example, two documents online could be very similar to each other except for a very small portion, such as URLs and advertisements.
Hadi Hussain Khan, Husnain Mansoor Ali
doaj +1 more source

