Building a Scale-Out Analytics Platform: YARN++ Experience

Abstract

Collecting massive amounts of data and deriving business value by running massively parallel jobs on commodity clusters has become commonplace in the industry. Building an analytics platform that provides a predictable execution substrate and also copes with an ever-growing workload poses several interesting challenges. Using Apache Hadoop YARN as a cluster resource management substrate, Sriram and his group have built working systems and contributed code to YARN.

In this talk, Sriram will provide an overview of some of the applied research work that he led. He will also describe how his group was able to translate research projects into impactful code contributions to an established open source project.

Bio

Sriram Rao works in the Data Warehouse team at Facebook Corp. Sriram is a hands-on engineer/researcher. He built KFS (Kosmos distributed filesystem) and Sailfish (scale-out distributed merge sort), and released them as open-source projects. Both KFS and Sailfish are deployed in Quantcast Corp.'s backend clusters.

Prior to Facebook, he led the Cloud and Information Services Lab (CISL) at Microsoft. At CISL, Sriram initiated several research projects and played a key role in shaping Microsoft's open source strategy around Apache YARN. Due to his efforts, Apache YARN is widely deployed within Microsoft's Cosmos compute clusters. Sriram has more than 20 publications in top-tier conferences such as NSDI, OSDI, SOSP, VLDB, SIGMOD, and SIGCOMM. Sriram obtained his Bachelor's, Master's, and PhD in Computer Sciences from the University of Texas at Austin.