Monday, November 16 • 14:55 - 15:40
Swarming Spark applications

We built Zoe, an open source, user-facing service that ties together Spark, a framework for data-intensive big data computation, and Swarm, the Docker clustering system. It targets data scientists who need to run their data analysis applications without having to worry about system details. Zoe can execute long-running Spark jobs, as well as Scala or IPython interactive notebooks and streaming applications, covering the full Spark development cycle. Because all processes run in Docker containers, resources are freed automatically when a computation finishes and become available for other uses.

In this talk we will present why Zoe, the Container Analytics as a Service, was born, its architecture, and the problems it tries to solve. Zoe would not exist without Swarm and Docker, and we will also talk about some of the stumbling blocks we encountered and the solutions we found, in particular in transparently connecting Docker hosts through a physical network. Zoe was born as a research prototype, but it is now stable and is being used to run real jobs from users in our research institution. Application scheduling on top of Swarm and optimized container placement will also be covered during the presentation.

Daniele Venzano

Research Engineer, EURECOM
Daniele Venzano has worked as a Research Engineer in the Distributed Systems Group at EURECOM in Sophia Antipolis, southern France, since 2013. His main focus is virtualization technologies, with an eye to optimizations for data-intensive frameworks like Spark and Hadoop. Previously, he was part of the Networked Systems Laboratory at EPFL in Lausanne, Switzerland, where he developed Nice, a testing framework for OpenFlow controller applications.

Monday November 16, 2015 14:55 - 15:40
Level 1, Room 114

