Mining Frequency of Drug Side Effects Over a Large Twitter Dataset Using Apache Spark [PDF]
Despite clinical trials by pharmaceutical companies as well as current FDA reporting systems, there are still drug side effects that have not been caught. To find a larger sample of reports, a possible way is to mine online social media. With its current
Hsu, Dennis
core +3 more sources
Time series analysis with apache spark and its applications to energy informatics
In the energy economy, forecasts of different time series are rudimentary. In this study, a prediction for the German day-ahead spot market is created with Apache Spark and R.
Cornelia Krome, Volker Sander
doaj +1 more source
SpaRC: scalable sequence clustering using Apache Spark [PDF]
Motivation: Whole genome shotgun based next-generation transcriptomics and metagenomics studies often generate 100–1000 GB sequence data derived from tens of thousands of different genes or microbial species.
Lizhen Shi +4 more
openaire +4 more sources
CLASSIFICATION OF BIG POINT CLOUD DATA USING CLOUD COMPUTING [PDF]
Point cloud data plays a significant role in various geospatial applications as it conveys plentiful information which can be used for different types of analysis.
K. Liu, J. Boehm
doaj +1 more source
Scientific Computing Meets Big Data Technology: An Astronomy Use Case
Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application.
Barbary, Kyle +7 more
core +2 more sources
Monitoring Data Integrity in Big Data Analytics Services [PDF]
Enabled by advances in Cloud technologies, Big Data Analytics Services (BDAS) can improve many processes and identify extra information from previously untapped data sources. As our experience with BDAS and its benefits grows and technology for obtaining
Kloukinas, C. +2 more
core +1 more source
Apache Calcite: A Foundational Framework for Optimized Query Processing Over Heterogeneous Data Sources [PDF]
Apache Calcite is a foundational software framework that provides query processing, optimization, and query language support to many popular open-source data processing systems such as Apache Hive, Apache Storm, Apache Flink, Druid, and MapD.
Begoli, Edmon +4 more
core +2 more sources
Diaspore: Diagnosing Performance Interference in Apache Spark
Apache Spark is being increasingly used to execute big data applications on cluster computing platforms. To increase system utilization, cluster operators often configure their clusters such that multiple co-located applications can simultaneously share ...
Sarah Shah +2 more
doaj +1 more source
Apache Spark is a popular system aimed at the analysis of large data sets, but recent studies have shown that certain computations (in particular, many linear algebra computations that are the basis for solving common machine learning problems) are ...
Gerhardt, Lisa +8 more
core +1 more source
A Tale of Two Data-Intensive Paradigms: Applications, Abstractions, and Architectures [PDF]
Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, co-placement and scheduling of data with compute resources, and storing and transferring large ...
Fox, Geoffrey C. +4 more
core +1 more source

