Wednesday, September 14, 2011

The Datacenter Needs an Operating System

Motivation:
With a large number of Internet, enterprise, and scientific services now deployed in datacenters, this paper motivates the need for a datacenter operating system to ease application development and enable efficient sharing.

Main ideas:
- There is a growing diversity of applications deployed in the datacenter, and, much as with the personal computer, this diversity creates the need for an operating system
- The paper envisions four main roles for a datacenter OS:
   a. Resource sharing: Efficiently share cluster resources among all jobs, providing fair and efficient scheduling (see the max-min fairness sketch after this list).
   b. Data sharing: Provide common data interfaces (analogous to Unix pipes) so that different applications can be chained together (see the generator-pipeline sketch below).
   c. Programming abstractions: Simplify development of new programming frameworks by providing standard APIs and services.
   d. Debugging and monitoring: Provide support for end-to-end debugging and monitoring; this could potentially involve a clean-slate approach that imposes some restrictions on applications.
- Finally, the paper also identifies how the academic research community can contribute to this goal, given that much of the current development is being driven by industry.
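
To make the fairness goal in (a) concrete, here is a minimal sketch of max-min fair allocation of a single resource across jobs. The function name and the example job demands are my own hypothetical illustration, not from the paper; real datacenter schedulers must handle multiple resource types and many other constraints.

```python
def max_min_fair(capacity, demands):
    """Allocate `capacity` units of one resource across jobs by max-min
    fairness: repeatedly give each unsatisfied job an equal share,
    capping jobs at their demand and redistributing the leftovers."""
    allocation = {job: 0.0 for job in demands}
    remaining = float(capacity)
    unsatisfied = set(demands)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        for job in list(unsatisfied):
            grant = min(share, demands[job] - allocation[job])
            allocation[job] += grant
            remaining -= grant
            if allocation[job] >= demands[job] - 1e-9:
                unsatisfied.remove(job)
    return allocation

# Example: 100 CPU slots, three jobs with unequal demands.
print(max_min_fair(100, {"webcrawl": 20, "mapreduce": 60, "mpi": 60}))
# -> webcrawl gets its full 20; the other two split the rest, 40 each.
```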
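The pipe analogy in (b) can be illustrated with Python generators, which compose the way Unix processes do. The stage functions below are hypothetical; a real datacenter data interface would have to work across machines and frameworks, but the composition idea is the same:

```python
# A toy "pipe" between frameworks: each stage consumes an iterator of
# records and yields records, so stages compose like Unix processes.
def extract(lines):
    # Hypothetical first stage: parse raw "key, value" lines.
    for line in lines:
        key, _, value = line.partition(",")
        yield key.strip(), int(value)

def aggregate(records):
    # Hypothetical second stage: sum values per key (a mini "reduce").
    totals = {}
    for key, value in records:
        totals[key] = totals.get(key, 0) + value
    yield from sorted(totals.items())

raw = ["a, 1", "b, 2", "a, 3"]
# Chaining stages mirrors `cat log | extract | aggregate` in a shell.
for key, total in aggregate(extract(raw)):
    print(key, total)
# -> a 4 / b 3
```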

Trade-offs/influence:
The main trade-off I see here is between the difficulty of agreeing on standard interfaces and the benefits we would gain from them. The paper notes that there are many different software stacks today performing some of the functions above, e.g., Google's GFS/MapReduce stack, the Hadoop stack, etc. However, since each datacenter is owned by a single organization, and most of the software is developed in-house at places like Google and Amazon, there is less pressure to converge on standard interfaces.
Further, PC operating systems are driven by a commodity business model in which a user can assemble a computer from components bought from different vendors and run a large number of applications. This encourages a data- and resource-sharing model that application developers can build on. Similarly, a standard sharing model for datacenters will develop once users and application developers need to run multiple programs at the same time.
Finally, I think it is important to have an operating system for the datacenter because it would prevent vendors from locking customers into their proprietary solutions. For example, if one wants to use both Amazon AWS and Microsoft Azure (to provide, say, primary-backup failure handling), one cannot run the same application across both platforms today. Having a more standard model for describing data and resources could help users switch easily between cluster computing providers.