Fast clustering using MapReduce [PDF]
Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming environment for processing large datasets. We focus on the practical and popular clustering problems, $k$-center and $k$
Ene, Alina +2 more
openaire +2 more sources
A Systematic Overview of Caching Mechanisms to Improve Hadoop Performance
In today's distributed computing environments, the rapid generation of large‐scale data from diverse sources poses significant challenges in terms of storage, management, and processing, particularly for traditional relational databases. Hadoop has emerged as a widely adopted framework for handling such data through parallel processing across ...
Rana Ghazali, Douglas G. Down
wiley +1 more source
Pre-Processing and Modeling Tools for Bigdata
Modeling tools and operators help the user or developer identify the processing field at the top of the sequence and send to the computing module only the data related to the requested result.
Hashem Hadi, Ranc Daniel
doaj +1 more source
Comparative Study Parallel Join Algorithms for MapReduce environment
The following techniques are used to analyze massive amounts of data: the MapReduce paradigm, parallel DBMSs, column-wise stores, and various combinations of these approaches. We focus on the MapReduce environment.
A. Yu. Pigul
doaj +1 more source
Behavioral simulations in MapReduce [PDF]
In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand real-world phenomena. While there has been a great deal of work on simulation tools from the high-performance computing community, behavioral simulations remain challenging to program and automatically scale in parallel environments.
Wang, Guozhang +7 more
openaire +2 more sources
A Pattern‐Referencing Model for Hourly Temperature Forecasting in Coastal Regions
A novel pattern‐referencing model forecasts hourly temperatures in Taiwan's southwestern coastal region. It robustly handles missing data, achieving high accuracy (MAE 0.323°C–0.539°C, RMSE 0.450°C–0.807°C) even during extreme weather, offering a practical solution for real‐time decision‐making.
Nan‐Jing Wu, Fan‐Hua Nan
wiley +1 more source
A successful deployment of Industry 5.0 is significantly dependent on the synergetic integration of several advanced technologies such as big data processing, Artificial Intelligence (AI) integration, and several effective digitization techniques that ...
Arnab Mitra
doaj +1 more source
Effective storage, processing, and analysis of power device condition monitoring data face enormous challenges. A framework is proposed that can support both MapReduce and Graph for massive monitoring data analysis at the same time based on Aliyun ...
Hongtao Shen, Peng Tao, Pei Zhao, Hao Ma
doaj +1 more source
A post‐translocation genetic analysis of an endemic wingless grasshopper in urban environments
Translocations are increasingly used in conservation, yet invertebrate outcomes remain understudied. This study assessed genetic diversity in a recently translocated flightless grasshopper (Vandiemenella viatica), finding declines in heterozygosity and nucleotide diversity, alongside increased homozygosity in the F2 generation.
Hiromi Yagui +3 more
wiley +1 more source
Hadoop MapReduce scheduling paradigms [PDF]
Apache Hadoop is one of the earliest and most prominent technologies for handling big data. Different scheduling algorithms have been developed within the Apache Hadoop framework over the last decade. In this paper, we attempt to provide a comprehensive overview of the different paradigms for scheduling in Apache Hadoop.
Johannessen, Roger +2 more
openaire +2 more sources

