On the performance of SQL scalable systems on Kubernetes: a comparative study
Abstract
The popularization of Hadoop as the de facto standard platform for data analytics in the context of Big Data applications
has led to an upsurge of SQL-on-Hadoop systems, which provide scalable query execution engines that allow
SQL queries to be run on data stored in HDFS. In this context, Kubernetes appears as the leading choice to simplify the deployment
and scaling of containerized applications; however, there is a lack of studies on the performance of SQL-on-Hadoop
systems deployed on Kubernetes, and this is the gap we intend to fill in this paper. We present an experimental study
involving four representative SQL scalable platforms: Apache Drill, Apache Hive, Apache Spark SQL and Trino. Concretely,
we use the TPC-H benchmark to analyze the performance of these systems when they are deployed on a Hadoop cluster with Kubernetes.
The results of our study can help practitioners and users understand what to expect in terms
of performance if they plan to use the advantages of Kubernetes to deploy applications using the analyzed SQL scalable
platforms.
Bibliographic citation
Cardas, C., Aldana-Martín, J.F., Burgueño-Romero, A.M. et al. On the performance of SQL scalable systems on Kubernetes: a comparative study. Cluster Comput (2022). https://doi.org/10.1007/s10586-022-03718-9
Creative Commons license
Except where otherwise noted, this item's license is described as Attribution 4.0 International










