Saturday, September 24, 2011

Kindergarten QoS of UCSM

With quite a bit of over-subscription and with FCoE and Ethernet sharing the same physical media, QoS tends to be quite complicated in the UCS platform. Still, in the management plane the QoS configuration is presented in such a simplified manner that it was internally referred to as "kindergarten QoS". Let's take the FCoE part out of the picture first. Fibre Channel is fundamentally a loss-less L2 technology; Ethernet, on the contrary, makes no such promise. However, things changed with the per-priority pause (PPP) and backward congestion notification features of Data Center Ethernet (DCE). FCoE uses per-priority pause to simulate loss-less media over Ethernet. In the UCS networking components (the Fabric Interconnect aka the switch, the Fabric Extender and the adapters), a dedicated no-drop queue is used for FCoE to ensure end-to-end per-priority-pause behavior. That takes care of FCoE.

Now, let's get back to how UCSM presents the QoS configuration to the administrator. There are two main parts:
  1. System QoS classes
  2. VNIC (egress) QoS policies
Following is the global QoS class configuration:



There are six global QoS classes defined:
  1. Default Ethernet class (best-effort)
  2. FCoE
  3. Platinum
  4. Gold
  5. Silver
  6. Bronze
Only the first two are enabled by default (and they can't be disabled). There's one more hidden control class defined internally, but it's not exposed to users. The following are the salient characteristics of the UCSM system classes:
  • Classification and marking are combined into one. Classification is based only on the L2 CoS value (with CoS 7 reserved for control traffic). The adapters mark the traffic by referring to the system class; the FI trusts the marking and classifies the frames accordingly.
  • A class-based weighted round robin (CBWRR) queuing strategy is used on all ports of the FI, so per-port policy application is not required.
  • Bandwidth allocation per class is done using relative weights. An explicit bandwidth percentage is not exposed, to avoid a user (or script) configuration error of exceeding 100% of the interface bandwidth. Once the user chooses weights, the system displays the resulting percentage for every class (a rough sketch of this mapping follows the list).
  • Per-interface MTU is not supported; the MTU is specified per class for the entire system.
  • Notably missing are priority queuing and weighted class-based random early drop (WRED).
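
As a rough illustration of that weight-to-percentage mapping, here is a small sketch in Python. The class weights below are made up for illustration, and UCSM's exact rounding may differ.

# Rough sketch: relative weights of enabled classes -> displayed bandwidth percentages.
# Class names and weight values are illustrative; UCSM's exact arithmetic may differ.
def weights_to_percentages(weights):
    """weights: dict of enabled system class -> relative weight."""
    total = sum(weights.values())
    return {cls: round(100.0 * w / total, 1) for cls, w in weights.items()}

enabled = {"platinum": 10, "gold": 9, "silver": 8, "bronze": 7, "fcoe": 6, "best-effort": 5}
print(weights_to_percentages(enabled))
# {'platinum': 22.2, 'gold': 20.0, 'silver': 17.8, 'bronze': 15.6, 'fcoe': 13.3, 'best-effort': 11.1}
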
The following figure shows what a VNIC QoS policy looks like:

As you can see, not much configuration is available here. You refer to a system class and specify shaping parameters. The CoS specified by the system class is used to mark untagged egress packets from the host. The "Host control" setting dictates the system behavior for packets already marked by the host: if "full" is specified, packets tagged by the host are trusted; otherwise, any packet that is marked by the host and doesn't match the CoS of the referenced system class is dropped by the Cisco adapter. The shaping parameters are enforced by the adapter on egress (from the host's perspective) traffic.
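
To make the "Host control" behavior concrete, here is a minimal sketch of the adapter-side decision as described above. The function and parameter names are mine, not any Cisco API.

# Conceptual sketch of adapter egress marking for a vNIC QoS policy.
# Function and parameter names are illustrative, not a Cisco API.
def adapter_egress_cos(host_cos, class_cos, host_control):
    """Return (cos_used, dropped) for an egress frame from the host.

    host_cos:     CoS already marked by the host, or None if untagged.
    class_cos:    CoS of the system class referenced by the vNIC QoS policy.
    host_control: "none" (default) or "full".
    """
    if host_cos is None:
        return class_cos, False       # untagged frames get the class CoS
    if host_control == "full":
        return host_cos, False        # host marking is trusted as-is
    if host_cos == class_cos:
        return host_cos, False        # marked frame matches the class CoS
    return None, True                 # mismatching marked frames are dropped

print(adapter_egress_cos(None, 5, "none"))   # (5, False)
print(adapter_egress_cos(3, 5, "none"))      # (None, True)
print(adapter_egress_cos(3, 5, "full"))      # (3, False)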

That's it! Once you have defined various VNIC QoS policies, the VNICs in a service profile can refer to them by name. The named policy reference works as per the policy resolution described in my previous post.

Just for comparison, the above UCSM configuration expands to the following MQC in NX-OS:

UCS-A(nxos)# show class-map


Type qos class-maps
===================

class-map type qos match-any class-fcoe
match cos 3

class-map type qos match-all class-gold
match cos 4

class-map type qos match-all class-bronze
match cos 1

class-map type qos match-all class-silver
match cos 2

class-map type qos match-any class-default
match any

class-map type qos match-all class-platinum
match cos 5

class-map type qos match-any class-all-flood
match all flood

class-map type qos match-any class-ip-multicast
match ip multicast


Type queuing class-maps
=======================

class-map type queuing class-fcoe
match qos-group 1

class-map type queuing class-gold
match qos-group 3

class-map type queuing class-bronze
match qos-group 5

class-map type queuing class-silver
match qos-group 4

class-map type queuing class-default
match qos-group 0

class-map type queuing class-platinum
match qos-group 2

class-map type queuing class-all-flood
match qos-group 2

class-map type queuing class-ip-multicast
match qos-group 2



Type network-qos class-maps
==============================

class-map type network-qos class-fcoe
match qos-group 1

class-map type network-qos class-gold
match qos-group 3

class-map type network-qos class-bronze
match qos-group 5

class-map type network-qos class-silver
match qos-group 4

class-map type network-qos class-default
match qos-group 0

class-map type network-qos class-platinum
match qos-group 2

class-map type network-qos class-all-flood
match qos-group 2

class-map type network-qos class-ip-multicast
match qos-group 2

UCS-A(nxos)# show policy-map


Type qos policy-maps
====================

policy-map type qos system_qos_policy
class type qos class-platinum
set qos-group 2
class type qos class-silver
set qos-group 4
class type qos class-bronze
set qos-group 5
class type qos class-gold
set qos-group 3
class type qos class-fcoe
set qos-group 1
class type qos class-default
set qos-group 0

Type queuing policy-maps
========================

policy-map type queuing system_q_in_policy
class type queuing class-platinum
bandwidth percent 22
class type queuing class-gold
bandwidth percent 20
class type queuing class-silver
bandwidth percent 18
class type queuing class-bronze
bandwidth percent 15
class type queuing class-fcoe
bandwidth percent 14
class type queuing class-default
bandwidth percent 11
policy-map type queuing system_q_out_policy
class type queuing class-platinum
bandwidth percent 22
class type queuing class-gold
bandwidth percent 20
class type queuing class-silver
bandwidth percent 18
class type queuing class-bronze
bandwidth percent 15
class type queuing class-fcoe
bandwidth percent 14
class type queuing class-default
bandwidth percent 11
policy-map type queuing org-root/ep-qos-HTTP
class type queuing class-fcoe
bandwidth percent 50
class type queuing class-default
bandwidth percent 50
shape 10000 kbps 10240
policy-map type queuing org-root/ep-qos-Streaming
class type queuing class-fcoe
bandwidth percent 50
class type queuing class-default
bandwidth percent 50
shape 100000 kbps 10240


Type network-qos policy-maps
===============================

policy-map type network-qos system_nq_policy
class type network-qos class-platinum
mtu 1500
pause no-drop
class type network-qos class-silver
mtu 1500
pause drop
class type network-qos class-bronze
mtu 1500
pause drop
class type network-qos class-gold
mtu 9000
pause drop
class type network-qos class-fcoe
pause no-drop
mtu 2158
class type network-qos class-default
pause drop
mtu 1500

Friday, September 9, 2011

UCS management paradigm

UCS Manager (UCSM) exhibits a unique and interesting set of features for ease of deployment in the cloud. UCSM stores configuration, device information, statistics and policies in an object-oriented data model. The "management brain" of UCS is completely data driven. The only interface to UCSM is the XML API; both the GUI and the CLI internally use XML to communicate with the core UCSM process. With that high-level background, let me get into the details of some interesting characteristics of UCSM.
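
To make the XML-only interface concrete, here is a minimal client session sketched in Python. The aaaLogin, configResolveClass and aaaLogout methods are part of the documented XML API; the host name and credentials are placeholders, and error handling and TLS verification are omitted.

# Minimal sketch of a UCSM XML API session; host and credentials are placeholders.
import requests
import xml.etree.ElementTree as ET

UCSM = "https://ucsm.example.com/nuova"        # UCSM XML API endpoint

# 1. Log in and obtain a session cookie.
resp = requests.post(UCSM, data='<aaaLogin inName="admin" inPassword="password" />', verify=False)
cookie = ET.fromstring(resp.text).attrib["outCookie"]

# 2. Query all service profiles (lsServer objects) by class.
query = '<configResolveClass cookie="%s" classId="lsServer" inHierarchical="false" />' % cookie
resp = requests.post(UCSM, data=query, verify=False)
for sp in ET.fromstring(resp.text).iter("lsServer"):
    print(sp.attrib.get("dn"), sp.attrib.get("assocState"))

# 3. Log out to free the session.
requests.post(UCSM, data='<aaaLogout inCookie="%s" />' % cookie, verify=False)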

Named References

UCSM makes maximum use of named references. Templates, pools and policies are used to loosely bind the configuration and to easily share common data. Policies dictate behavior, and multiple configuration end-points can share the same behavior; a change in a policy does not require revisiting all the end-points that refer to it. There are standalone policies for global configuration, for example the chassis discovery policy, the VM life cycle policy, etc. VLANs are also referred to by name. For example, all database servers can be in a "dbNet" VLAN with VLAN id 10, and all the service profiles corresponding to database servers would have VNICs referring to the VLAN by the name "dbNet". Once this is in place and the servers are up and running, if the network admin changes the network architecture and the VLAN id changes from 10 to 20, he/she wouldn't have to revisit all the servers; the change is made in only one place.
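
A toy model of the idea, nothing UCSM-specific, just to show why a single change propagates to every referrer:

# Toy model of reference-by-name: vNICs store the VLAN name, not the id.
vlans = {"dbNet": {"id": 10}}                       # the VLAN is defined once, globally
vnic_vlan_refs = ["dbNet", "dbNet", "dbNet"]        # many service profiles refer to it by name

vlans["dbNet"]["id"] = 20                           # network admin renumbers the VLAN in one place
print([vlans[name]["id"] for name in vnic_vlan_refs])   # [20, 20, 20] - every referrer follows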

Policy resolution in hierarchical org structure

UCSM allows you to reflect the hierarchical organizational structure of a company (or of tenants, in the case of a cloud service provider) in the managed object model. For example, you can have classic "coke" and "pepsi" top-level orgs. Under "coke", you can have "operations", "research", "legal" and "marketing" orgs. Now let's say there are "streaming" and "http" QoS policies defined at the "coke" level, where "http" restricts the bandwidth to 1 Mbps. But in "research" there's a requirement to let web traffic flow up to line rate, so the administrator can create a QoS policy with the same name at the "research" org level. When a policy is referred to by name, it resolves to the closest org level that has a policy with that name. So a port-profile defined at the "research" level or in its sub-orgs would enjoy line rate if it refers to the "http" QoS policy.


Also, a policy reference gets re-resolved if a policy with the same name is added or removed in the org hierarchy. In the previous example, if the "http" QoS policy is deleted from the "research" level, the port-profiles referring to it would automatically resolve to the "http" policy at the "coke" level.
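
A minimal sketch of that closest-match resolution, reusing the coke/research example. The DN format and the rate values are approximations for illustration only.

# Sketch of closest-match policy resolution in the org hierarchy.
# DN spelling and rate values are approximations for illustration.
policies = {
    ("org-root/org-coke", "http"):               {"rate": "1Mbps"},
    ("org-root/org-coke", "streaming"):          {"rate": "10Mbps"},
    ("org-root/org-coke/org-research", "http"):  {"rate": "line-rate"},
}

def resolve(org_dn, name):
    """Walk from the referencing org up to org-root and return the closest match."""
    parts = org_dn.split("/")
    while parts:
        key = ("/".join(parts), name)
        if key in policies:
            return key, policies[key]
        parts.pop()                              # go one level up in the org tree
    return None, None

print(resolve("org-root/org-coke/org-research", "http"))     # resolves to the line-rate policy
print(resolve("org-root/org-coke/org-operations", "http"))   # resolves to the 1Mbps policy at "coke"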

Loose Referential Integrity

Most management systems would not let you delete a policy if it is referred to by other configuration. UCSM, however, does not enforce such strict referential integrity. UCSM's referential integrity works in the following loose manner:

  • For every policy/pool type, there exists a predefined policy at the root level with the name "default".
  • If the system doesn't find the named policy anywhere in the org scope of the configuration that refers to it, then the default policy is used.
  • If a referred policy gets deleted (and no other policy with that name exists at any other org level in scope), then the configuration referring to it resolves to the default policy.
  • If a more relevant (closer) named policy is defined with respect to the configuration, then the reference gets re-resolved to the "closer" policy in the org hierarchy.
Of course, when the system dynamically resolves a policy, it must have a mechanism to inform the administrator about the resolution. Two mechanisms are used for this (a short sketch extending the resolution example above follows this list):
  1. An operational property is defined for every named reference, which specifies the distinguished name of the policy that the system actually resolved to.
  2. If the specific named policy is not found and the default is used, then a fault is raised.
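
Continuing the resolution sketch above (reusing its policies dict and resolve() function), here is a rough picture of the default fallback and of recording the resolution. The operational-property and fault wording is illustrative, not the exact UCSM schema.

# Fallback to the root-level "default" policy, reusing resolve() from the earlier sketch.
# The operational-property and fault wording is illustrative, not the UCSM schema.
policies[("org-root", "default")] = {"rate": "best-effort"}   # predefined default at root

def resolve_with_default(org_dn, name):
    dn, policy = resolve(org_dn, name)
    fault = None
    if dn is None:                                            # nothing found in org scope
        dn, policy = ("org-root", "default"), policies[("org-root", "default")]
        fault = "named policy '%s' not found in scope; using default" % name
    oper_policy_dn = "%s/ep-qos-%s" % dn                      # e.g. "org-root/ep-qos-default"
    return policy, oper_policy_dn, fault

print(resolve_with_default("org-root/org-pepsi", "http"))
# ({'rate': 'best-effort'}, 'org-root/ep-qos-default', "named policy 'http' not found in scope; using default")
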
Asynchronous Configuration Deployment

In server management systems, many operations take a long time to finish - VM deployment, server reboot, etc. - so the north-bound APIs cannot block for such extended periods of time. UCSM provides a completely asynchronous north-bound experience. It differs from Cisco's networking gear in this regard: for example, when a user issues a command to create a VLAN, the management system checks the VLAN range, the maximum number of VLANs and so on; if the user input is good, it immediately unblocks the user, effectively saying "consider it done". Later, it deploys the VLAN on the switch, which would only fail if there are other serious issues, like the control plane running out of memory, and if that happens, faults are raised.
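
A rough sketch of that accept-then-deploy pattern, purely conceptual and not UCSM code: the create call only validates the input and returns immediately, the deployment runs in the background, and failures surface later as faults.

# Conceptual sketch of asynchronous deployment: validate, return immediately,
# deploy in the background, and report deployment failures as faults.
import threading
import time

faults = []

def deploy_vlan_on_switch(vlan_id):
    time.sleep(1)                              # pretend this is the slow deployment step
    if vlan_id == 4000:                        # pretend this particular deployment fails
        faults.append("failed to deploy VLAN %d: out of hardware resources" % vlan_id)

def create_vlan(vlan_id):
    if not 1 <= vlan_id <= 4094:
        raise ValueError("invalid VLAN id")    # only input validation blocks the caller
    threading.Thread(target=deploy_vlan_on_switch, args=(vlan_id,)).start()
    return "ok"                                # effectively "consider it done"

print(create_vlan(10))      # returns "ok" immediately
print(create_vlan(4000))    # also returns "ok" right away; the failure shows up later
time.sleep(2)
print(faults)               # ['failed to deploy VLAN 4000: out of hardware resources']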

Putting it all together


UCSM is designed with the cloud and data-center virtualization in mind. It is extremely friendly to automation given its XML APIs, data-driven model and asynchronous nature. Loosely coupled policies and maximum usage of pools, policies and templates make it an ideal fit for server provisioning in the cloud, especially in multi-tenant systems. The field has welcomed these characteristics very warmly. Just as a sample, here is a blog that talks about integrating the XML APIs with PowerShell.

Wednesday, August 31, 2011

GNS3 with Qemu

Thank God GNS3 exists! I used GNS3 extensively for my Cisco career certification exams. So far I had only played with routers in GNS3, which was enough to experiment with routing protocols and other relevant stuff for CCNP/CCIE; loopback interfaces substituted for actual hosts. However, I wanted to go to the next step and actually run small Linux images on Qemu and integrate them into GNS3 labs.

It turned out that setting up Qemu was a piece of cake. Finding a small Linux image for Qemu was a bit of a fight, but I found a Linux Microcore 2.10 image that worked very well with Qemu on a Windows 7 host OS. And voila! My first GNS3 lab with 3 hosts, one router and a switch was up and running in no time. Following is a screenshot of the network. I initiated a ping to "host3" from "host1", which is visible on host1's console. Host3 was running tcpdump to capture the ICMP packets. On the router, "show interface stats" shows the increase in octets and packets. Instant gratification! :)


(Please click on the image to see the larger version).

This has added a lot of interesting stuff to the "TODO" list, for example: compiling my own small Linux image so that I can add a few more tools to it, an IPv6 network with hosts, ASA and PIX firewalls in GNS3, etc. Let's see how it goes...

Monday, August 22, 2011

Network Virtualization in Cisco's UCS

I would like to start this blog with my favorite project - network virtualization in Cisco's Unified Computing System (UCS). In particular, I would like to write about the integration of UCS with VMware's vCenter to achieve seamless network virtualization in data centers. Before I go into the details of the "solution", let me go over the "problem" first.

Issues with virtualization in data center network

As soon as virtualization is added on a host, a software switch or "soft switch" has to be added in the host kernel. Refer to the diagram on the left. As soon as you have multiple VMs running on a host (an ESX host in the case of VMware), switching must happen in the host kernel if one VM wants to communicate with another VM, because of Ethernet switching 101: a frame is never sent back on the same interface it came from (it is either sent to the specific interface where the destination MAC address was learned, or, in the case of unknown unicast, broadcast and multicast, flooded to all interfaces except the one it came from). So a soft switch was mandated from day one on the VMware ESX host.

However, as soon as the soft switch was added to the host, another issue surfaced in the way data centers are managed. The line in the middle of the diagram is the boundary between the server admin team and the network admin team, and now a switch is lying in the server admin's domain. In the good old days before virtualization, the port connected to the host was typically put in access mode, with one access VLAN assigned to it and specific QoS and access-control policies attached to it. Now, because of the soft switch in the host, the trust boundary has to be extended to the soft switch, multiple VLANs have to be allowed on the port connected to the ESX host, the QoS trust boundary has to be extended to the soft switch (or more expensive NBAR-based classification has to be performed), and access-control policy enforcement becomes complex. The link between the switch and the ESX host becomes a dumb pipe (that's why I have drawn it that way), and network policy enforcement becomes the responsibility of the soft switch - which is typically managed by the server admin.

Cisco's UCS solves this problem and streamlines data center virtualization by offering a hardware-based VN-Tag solution.

UCS Network Virtualization

In Cisco's Unified Computing System (UCS), customers have the choice to replace the ESX host's soft switch with hardware-based switching in the UCS access switch, aka the UCS Fabric Interconnect. In the ESX host, a Cisco kernel module - the Virtual Ethernet Module, or VEM - is added.

The VEM works as a replacement for the soft switch: instead of performing switching in the host, for every north-bound virtual NIC created on the VEM, a south-bound VNIC instance is created in Cisco's M81KR virtualized adapter. In the current releases, up to 56 such VNICs can be created on the M81KR adapter. The adapter in turn requests the attached fabric interconnect (the access switch) to create dynamic virtual Ethernet interfaces (vEths). vEths are logical entities created in the ASIC and enjoy the same status as physical interfaces, so frames from one VM to another are now switched in the ASIC on the access switch. The rich set of switching features of the Nexus 5000 switch is now available on the vEths, which are managed by the network admin team. M81KR is also the industry's first implementation of VMDirectPath technology, which essentially bypasses the switching module in the hypervisor and provides near bare-metal performance to VM VNICs. In addition to solving the domain issue and providing faster switching in hardware, UCS provides smooth integration with vCenter and port-profile based network management.

UCSM / vCenter integration and port-profiles

Another benefit of using UCS in the data center is its integration with VMware's vCenter and port-profile based VM network management. UCSM first registers itself with vCenter as a management extension, and it can then create multiple distributed virtual switches in vCenter. A distributed virtual switch is a VEM that spans multiple ESX hosts. At this point, the network admin can create multiple port-profiles. Port-profiles are groups of network policies and configuration identified by a name. For example, "finance" and "hr" port-profiles would contain the VLANs, QoS policies, L2 security policies, etc. for those two groups. UCSM pushes only the profile names and descriptions to vCenter, as the server admin wouldn't be interested in the nitty-gritty of the network configuration.

When the server admin deploys a VM, she creates VNICs for the VM and assigns a port-profile to each VNIC. When the VM instantiates on the hypervisor, the VEM and the M81KR adapter request the access switch to dynamically create vEth interfaces, providing the corresponding port-profile names. As the access switch has the complete configuration of the port-profiles, the vEths inherit the configuration from the corresponding port-profiles, and thus the whole loop closes. As you can see, this architecture has several benefits:
  • It decouples the server and network admins' workflows and eliminates inter-dependency. For example, the network admin can easily change a VLAN id in a port-profile without bothering the server admin team to make any changes on the server side.
  • When vMotion happens, the VM-VNIC's configuration also moves smoothly from one host to another. The vEth interface is detached from one access port and attached to the access port connected to the destination hypervisor. All the state and stats of the vEth are preserved after vMotion, and no external script or entity needs to change the configuration of access ports when vMotion happens (a toy sketch of this flow follows the list).
  • UCS (which includes the access switch) is aware of the virtualization. You can see the association of physical servers, their service profiles, hypervisors, VM instances, VNICs and port-profiles in the UCSM UI.
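
Here is a toy model of that dynamic vEth flow. The data structures and names are mine, purely to illustrate the sequence; the real work happens in the fabric interconnect.

# Toy model of dynamic vEths and vMotion; names and structures are illustrative only.
port_profiles = {"finance": {"vlan": "financeNet", "qos": "gold"},
                 "hr":      {"vlan": "hrNet",      "qos": "silver"}}

veths = {}                 # vEth id -> {"profile", "uplink", "stats"}
next_veth_id = [1000]

def vm_nic_powered_on(profile_name, host_uplink):
    """Adapter asks the fabric interconnect for a dynamic vEth bound to a port-profile."""
    veth_id = next_veth_id[0]; next_veth_id[0] += 1
    veths[veth_id] = {"profile": port_profiles[profile_name],
                      "uplink": host_uplink,
                      "stats": {"rx": 0, "tx": 0}}
    return veth_id

def vmotion(veth_id, new_host_uplink):
    """On vMotion the vEth (config and stats) simply re-attaches to the new host's uplink."""
    veths[veth_id]["uplink"] = new_host_uplink

v = vm_nic_powered_on("finance", "Eth1/1")
vmotion(v, "Eth1/9")
print(veths[v])            # same profile and stats, new uplink after the move
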
You can find white papers describing more technical details of this solution on Cisco's website; my goal here was to go over this latest end-to-end virtualization technology as simply as possible.

Disclaimer: Anything explained/expressed here is not an official form of communication by/from Cisco Systems, Inc.