
Fast clustering using MapReduce [PDF]

open access: yes
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, 2011
Clustering problems have numerous applications and are becoming more challenging as the size of the data increases. In this paper, we consider designing clustering algorithms that can be used in MapReduce, the most popular programming environment for processing large datasets. We focus on the practical and popular clustering problems, $k$-center and $k$
Ene, Alina   +2 more
openaire   +2 more sources

A Systematic Overview of Caching Mechanisms to Improve Hadoop Performance

open access: yes
Concurrency and Computation: Practice and Experience, Volume 37, Issue 25-26, 30 November 2025.
In today's distributed computing environments, the rapid generation of large‐scale data from diverse sources poses significant challenges in terms of storage, management, and processing, particularly for traditional relational databases. Hadoop has emerged as a widely adopted framework for handling such data through parallel processing across ...
Rana Ghazali, Douglas G. Down
wiley   +1 more source

Pre-Processing and Modeling Tools for Bigdata

open access: yes
Foundations of Computing and Decision Sciences, 2016
Modeling tools and operators help the user / developer to identify the processing field on the top of the sequence and to send into the computing module only the data related to the requested result.
Hashem Hadi, Ranc Daniel
doaj   +1 more source

Comparative Study Parallel Join Algorithms for MapReduce environment

open access: yes
Труды Института системного программирования РАН, 2018
The following techniques are used to analyze massive amounts of data: the MapReduce paradigm, parallel DBMSs, column-wise stores, and various combinations of these approaches. We focus on the MapReduce environment.
A. Yu. Pigul
doaj   +1 more source

Behavioral simulations in MapReduce [PDF]

open access: yes
Proceedings of the VLDB Endowment, 2010
In many scientific domains, researchers are turning to large-scale behavioral simulations to better understand real-world phenomena. While there has been a great deal of work on simulation tools from the high-performance computing community, behavioral simulations remain challenging to program and automatically scale in parallel environments.
Wang, Guozhang   +7 more
openaire   +2 more sources

A Pattern‐Referencing Model for Hourly Temperature Forecasting in Coastal Regions

open access: yes
Meteorological Applications, Volume 32, Issue 6, November/December 2025.
A novel pattern‐referencing model forecasts hourly temperatures in Taiwan's southwestern coastal region. It robustly handles missing data, achieving high accuracy (MAE 0.323°C–0.539°C, RMSE 0.450°C–0.807°C) even during extreme weather, offering a practical solution for real‐time decision‐making.
Nan‐Jing Wu, Fan‐Hua Nan
wiley   +1 more source

Cellular automata-based MapReduce design: Migrating a big data processing model from Industry 4.0 to Industry 5.0

open access: yes
e-Prime: Advances in Electrical Engineering, Electronics and Energy
A successful deployment of Industry 5.0 is significantly dependent on the synergetic integration of several advanced technologies such as big data processing, Artificial Intelligence (AI) integration, and several effective digitization techniques that ...
Arnab Mitra
doaj   +1 more source

Massive power device condition monitoring data feature extraction and clustering analysis using MapReduce and graph model

open access: yes
CES Transactions on Electrical Machines and Systems, 2019
Effective storage, processing, and analysis of power device condition monitoring data face enormous challenges. A framework is proposed that can support both MapReduce and Graph for massive monitoring data analysis at the same time based on Aliyun ...
Hongtao Shen, Peng Tao, Pei Zhao, Hao Ma
doaj   +1 more source

A post‐translocation genetic analysis of an endemic wingless grasshopper in urban environments

open access: yes
Conservation Science and Practice, Volume 7, Issue 10, October 2025.
Translocations are increasingly used in conservation, yet invertebrate outcomes remain understudied. This study assessed genetic diversity in a recently translocated flightless grasshopper (Vandiemenella viatica), finding declines in heterozygosity and nucleotide diversity, alongside increased homozygosity in the F2 generation.
Hiromi Yagui   +3 more
wiley   +1 more source

Hadoop MapReduce scheduling paradigms [PDF]

open access: yes
2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), 2017
Apache Hadoop is one of the earliest and most prominent technologies for handling big data. Different scheduling algorithms within the framework of Apache Hadoop were developed in the last decade. In this paper, we attempt to provide a comprehensive overview of the different paradigms for scheduling in Apache Hadoop.
Johannessen, Roger   +2 more
openaire   +2 more sources
