Monday, November 7, 2011

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric


Motivation:
Datacenter networks need to support VM migration without dropping
open connections. Layer 2 networks support this, since a VM keeps its
IP address when it moves, but they do not scale to many thousands of
nodes. Layer 3 networks scale better, but they carry a configuration
overhead. This paper proposes a new scalable layer 2 network fabric
for datacenters.

Main points
- The paper assumes a datacenter topology that is a multi-rooted tree.
  This covers fat-trees and some of the other topologies developed in
  recent years.
- The main idea in the paper is the introduction of a Pseudo MAC (PMAC)
  address, which allows end hosts to be named hierarchically at layer 2.
  The edge switches translate between PMAC addresses and actual MAC
  addresses in both directions.
- The network fabric is coordinated by a centralized fabric manager.
  The fabric manager helps avoid broadcasting ARP requests and assists
  with fault-tolerant routing.
- The paper also describes a Location Discovery Protocol (LDP), which
  helps switches bootstrap automatically and discover their role in the
  multi-rooted tree.
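
To make the PMAC idea concrete, here is a minimal Python sketch of how
a hierarchical PMAC could be packed into 48 bits and how an edge switch
and the fabric manager might use it. The pod.position.port.vmid layout
with 16/8/8/16-bit fields is an assumption based on my reading of the
paper; the class and method names (PMAC, EdgeSwitch, FabricManager,
learn, register, resolve) are hypothetical and are not the authors'
implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Assumed field widths (bits) for pod.position.port.vmid; 48 bits total.
POD_BITS, POSITION_BITS, PORT_BITS, VMID_BITS = 16, 8, 8, 16


@dataclass(frozen=True)
class PMAC:
    """Hierarchical pseudo MAC. Fields are assumed to fit their widths."""
    pod: int
    position: int
    port: int
    vmid: int

    def to_int(self) -> int:
        # Pack the fields from most significant (pod) to least (vmid).
        value = self.pod
        for field, bits in ((self.position, POSITION_BITS),
                            (self.port, PORT_BITS),
                            (self.vmid, VMID_BITS)):
            value = (value << bits) | field
        return value

    @classmethod
    def from_int(cls, value: int) -> "PMAC":
        # Unpack in the reverse order, least significant field first.
        fields = []
        for bits in (VMID_BITS, PORT_BITS, POSITION_BITS, POD_BITS):
            fields.append(value & ((1 << bits) - 1))
            value >>= bits
        vmid, port, position, pod = fields
        return cls(pod, position, port, vmid)

    def __str__(self) -> str:
        # Render as a conventional colon-separated MAC string.
        return ":".join(f"{b:02x}" for b in self.to_int().to_bytes(6, "big"))


class EdgeSwitch:
    """Hypothetical edge switch holding the actual-MAC <-> PMAC tables."""

    def __init__(self, pod: int, position: int) -> None:
        self.pod = pod
        self.position = position
        self.amac_to_pmac: Dict[str, PMAC] = {}
        self.pmac_to_amac: Dict[PMAC, str] = {}
        self._next_vmid: Dict[int, int] = {}  # per-port vmid counter

    def learn(self, amac: str, port: int) -> PMAC:
        # Assign a PMAC the first time an actual MAC is seen on a port.
        if amac not in self.amac_to_pmac:
            vmid = self._next_vmid.get(port, 1)
            self._next_vmid[port] = vmid + 1
            pmac = PMAC(self.pod, self.position, port, vmid)
            self.amac_to_pmac[amac] = pmac
            self.pmac_to_amac[pmac] = amac
        return self.amac_to_pmac[amac]


class FabricManager:
    """Hypothetical central IP -> PMAC registry used to answer ARP."""

    def __init__(self) -> None:
        self.ip_to_pmac: Dict[str, PMAC] = {}

    def register(self, ip: str, pmac: PMAC) -> None:
        self.ip_to_pmac[ip] = pmac

    def resolve(self, ip: str) -> Optional[PMAC]:
        # None means the mapping is unknown; the real system would then
        # have to fall back to some form of broadcast.
        return self.ip_to_pmac.get(ip)
```

For example, a host first seen on port 0 of the edge switch at pod 2,
position 1 would get the PMAC 00:02:01:00:00:01 under this layout.
Because the pod and position sit in the high-order bits, switches
higher in the tree can forward on a prefix of the address instead of
keeping one entry per host.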

Trade-offs
- This paper tries to get the benefits of Layer 3 (hierarchical
  namespace, better routing etc.) and the benefits of Layer 2 (migrate
  VMs without losing connections).
- The major insight in this paper is that since the datacenter network
  topology is hierarchical and known in advance, a new level of
  indirection (PMAC) can be used to get the benefits of both Layer 2
  and Layer 3.
- The centralized manager simplifies the design but is a scalability
  bottleneck. The authors propose that a small cluster could be used,
  but it's not clear if this would affect the other properties of the
  system.