

Grant support

We thank all anonymous reviewers at Middleware'22, whose insightful feedback made this paper much stronger. This work has been partially supported by the EU (No. 825184) and the Spanish Government (No. PID2019-106774RB-C22). Marc Sanchez-Artigas is a Serra Hunter Fellow.

Analysis of institutional authors

Sanchez-Artigas, Marc (Corresponding Author); Eizaguirre, German T (Author)


Proceedings Paper

A Seer Knows Best: Optimized Object Storage Shuffling for Serverless Analytics

Published in: Proceedings of the Twenty-Third ACM/IFIP International Middleware Conference (Middleware 2022), pp. 148-160, 2022. DOI: 10.1145/3528535.3565241

Authors: Sanchez-Artigas, Marc; Eizaguirre, German T

Affiliations

Universitat Rovira i Virgili, Tarragona, Spain (Author)

Abstract

Serverless platforms offer high resource elasticity and pay-as-you-go billing, making them a compelling choice for data analytics. To craft a "pure" serverless solution, the common practice is to transfer intermediate data between serverless functions via serverless object storage (e.g., IBM COS, AWS S3). However, prior works have reached inconclusive results about the performance of object storage, as they left a large margin for optimization. To verify that object storage has been underrated, we design Seer, a novel shuffle manager for serverless data analytics. Specifically, Seer dynamically chooses between two shuffle algorithms to maximize performance. The choice is driven by predictive models and, importantly, does not require users to specify intermediate data sizes at job submission time. We integrate Seer with PyWren-IBM [31], a serverless analytics framework, and evaluate it against both serverful (e.g., Spark) and serverless systems (e.g., Google BigQuery). Our results confirm that our new shuffle manager can deliver performance improvements over them.
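The abstract does not describe Seer's predictive models or its two shuffle algorithms. Purely as an illustration of the general idea (picking a shuffle strategy from a predicted cost, with the intermediate size estimated from a sample rather than supplied by the user), here is a minimal sketch under stated assumptions. The strategy names `direct` and `two_level`, the latency/bandwidth cost model, the square-root grouping factor, and every function name are hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class ShuffleParams:
    """Hypothetical inputs to a toy shuffle cost model (not Seer's actual model)."""
    mappers: int             # M: number of map functions
    reducers: int            # R: number of reduce functions
    input_bytes: float       # total job input size
    sample_in_bytes: float   # input bytes processed by one sampled mapper
    sample_out_bytes: float  # intermediate bytes emitted by that sampled mapper
    latency_s: float         # assumed per-request object storage latency
    bandwidth_bps: float     # assumed per-function bandwidth, bytes/s


def predicted_intermediate_bytes(p: ShuffleParams) -> float:
    # Extrapolate the intermediate data size from a sample instead of asking the user.
    return p.input_bytes * (p.sample_out_bytes / p.sample_in_bytes)


def direct_time(p: ShuffleParams, s: float) -> float:
    # Hypothetical one-stage shuffle: each mapper writes R objects and each
    # reducer reads M objects, so request counts grow with M and R.
    requests = p.reducers + p.mappers
    transfer = s / p.mappers + s / p.reducers
    return requests * p.latency_s + transfer / p.bandwidth_bps


def two_level_time(p: ShuffleParams, s: float) -> float:
    # Hypothetical two-stage shuffle with G merge functions: mappers write G
    # objects, each merger reads M objects and writes ceil(R/G), each reducer
    # reads one. Fewer requests, but the data crosses object storage twice.
    g = max(1, math.isqrt(p.reducers))
    requests = g + p.mappers + math.ceil(p.reducers / g) + 1
    transfer = s / p.mappers + 2 * s / g + s / p.reducers
    return requests * p.latency_s + transfer / p.bandwidth_bps


def choose_shuffle(p: ShuffleParams) -> str:
    # Pick whichever strategy the toy model predicts to be faster.
    s = predicted_intermediate_bytes(p)
    return "direct" if direct_time(p, s) <= two_level_time(p, s) else "two_level"


if __name__ == "__main__":
    few_large = ShuffleParams(10, 10, 1e11, 1e6, 1e6, 0.05, 1e8)
    many_small = ShuffleParams(1000, 1000, 1e9, 1e6, 1e6, 0.05, 1e8)
    print(choose_shuffle(few_large), choose_shuffle(many_small))
```

In this toy model, the one-stage strategy wins when there are few functions and a lot of data (request latency is negligible next to transfer time), while the two-stage strategy wins when many functions exchange small objects and per-request latency dominates. A real shuffle manager would need calibrated models for its actual storage service.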

Keywords

I/O optimization; Object storage; Serverless computing; Shuffle

Quality index

Bibliometric impact. Analysis of the contribution and dissemination channel

Based on the normalized impact indicator derived from the Field Citation Ratio (FCR) in Dimensions, this work scores 2.9, meaning that it is cited above the average of works in the same discipline and publication year. (Source consulted: Dimensions, June 2025.)

As of 2025-06-23, this work has accumulated the following citation counts across indexing services:

  • WoS: 6
  • OpenCitations: 5

Impact and social visibility

Regarding influence and social adoption, alternative (social) metrics based on mentions and interactions show the following as of 2025-06-23:

  • PlumX "Captures" (bookmarks, code forks, additions to favorites lists, and similar saves): 3. Captures suggest that readers are using the publication as a basis for ongoing work, which may foreshadow future formal citations.

Leadership analysis of institutional authors

Authors from the institution hold leading positions in the author list: first author (Sanchez Artigas, Marc) and last author (Eizaguirre Suárez, Germán Telmo).

The corresponding author is Sanchez Artigas, Marc.