Pseudowire FAT Interoperability

I usually don’t think much about pseudowire interface parameter sub-TLVs, at least not until I ran into two IOS-XR boxes that used different values for one of them and didn’t forward any packets. There is a corner case with Flow Aware Transport (FAT) pseudowires that can cause unexpected behavior, and if you don’t watch out you might drop traffic. In this post I’ll go over the details of using FAT with different IOS-XR versions and what can go wrong.

Flow Aware Transport pseudowire (RFC6391) is a type of L2VPN that operates over MPLS. Its main benefit is a mechanism that lets you load-balance a single pseudowire over multiple equal-cost paths (ECMP). ECMP for a pseudowire becomes an advantage when transporting a large amount of traffic, such as 10Gbps or more. FAT capability is signaled through a dedicated interface parameter sub-TLV that is negotiated between the two PEs.
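
To make the mechanism concrete: in RFC6391 the ingress PE pushes an extra flow label at the bottom of the label stack, and core routers that hash on the label stack then spread flows across paths. Below is a minimal sketch of that idea. The hash function, the choice of MAC addresses as the flow key, and the function names are all assumptions for illustration, not how IOS-XR computes anything.

```python
import hashlib

RESERVED_LABEL_MAX = 15   # MPLS label values 0-15 are reserved
LABEL_SPACE = 1 << 20     # an MPLS label is 20 bits wide


def flow_label(src_mac: str, dst_mac: str) -> int:
    """Map a flow key to a non-reserved 20-bit label (hypothetical hash choice)."""
    digest = hashlib.sha256(f"{src_mac}|{dst_mac}".encode()).digest()
    value = int.from_bytes(digest[:4], "big")
    usable = LABEL_SPACE - (RESERVED_LABEL_MAX + 1)
    return RESERVED_LABEL_MAX + 1 + value % usable


def pick_path(label_stack: list[int], num_paths: int) -> int:
    """Toy ECMP decision of a core router that hashes the whole label stack."""
    data = b"".join(label.to_bytes(3, "big") for label in label_stack)
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big") % num_paths


# Label stack: transport label, PW label, then the per-flow label at the bottom.
# 24000 and 16001 are made-up label values.
fl = flow_label("00:11:22:33:44:55", "66:77:88:99:aa:bb")
path = pick_path([24000, 16001, fl], num_paths=4)
```

Without the flow label, every packet of the pseudowire carries the same two labels and hashes to the same path. With it, each flow gets its own stack and its own path, while packets of one flow stay in order.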

The problem affects FAT pseudowires where the terminating router on one side runs IOS-XR 4.3.2 and the router on the other side runs any version up to 4.3.1. The symptom is that tunneled packets are not forwarded: both sides show the PW as up and operational, but no traffic passes over it. The problem stems from Cisco’s implementation of the FAT feature, in particular the sub-TLV type value that identifies FAT.

Originally it was implemented from version 5 of the draft, written in 2010 (http://tools.ietf.org/html/draft-ietf-pwe3-fat-pw-05). In it, the Pseudowire Interface Parameters Sub-TLV type for the Flow Label indicator was defined as “17”. Cisco interpreted that as decimal 17, which is 0x11 in hex, and implemented it as 0x11 in IOS-XR versions up to 4.3.1. What is even more interesting is IANA’s allocation for that value. IANA’s Pseudowire Name Spaces (PWE3) registry lists hex value 0x11 as “TDMoIP AAL2 Options”, referencing RFC5287, which was written in 2006, well before draft 5. So even though the draft didn’t spell the value out as 0x17, decimal 17 was already used by another RFC written before the FAT feature.

Draft version 06 (http://tools.ietf.org/html/draft-ietf-pwe3-fat-pw-06#section-11) finally specified the 17 as a hex value (0x17) instead of decimal. It wasn’t until IOS-XR 4.3.2 that Cisco changed the value. It doesn’t look like any of the IOS-XR release notes to date were updated, so few users would read about the change when researching the new release.
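
The two readings of “17” are easy to check. This small snippet only does the base conversion. The variable names are mine.

```python
DRAFT_TEXT = "17"  # the bare value as printed in the draft

as_decimal = int(DRAFT_TEXT, 10)  # reading used by IOS-XR up to 4.3.1
as_hex = int(DRAFT_TEXT, 16)      # reading used from IOS-XR 4.3.2

print(f"decimal reading: {as_decimal} = {as_decimal:#04x}")  # 17 = 0x11
print(f"hex reading:     {as_hex} = {as_hex:#04x}")          # 23 = 0x17
```

Same two characters on the page, two different bytes on the wire.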

But why does the mismatch cause packet loss? Shouldn’t the pseudowire go down if there is a mismatch? Consider RFC4447 Section 5.5:

“Processing of the interface parameters should continue when unknown interface parameters are encountered, and they MUST be silently ignored.”

Remember that FAT uses an interface parameter sub-TLV, so that statement should apply to it as well. That appears to be why the PW didn’t go down despite the mismatch. But why the packet loss? That’s where I think a Cisco “bug” comes in. When two pseudowire endpoints use different code points for FAT, the expected behavior is that neither side detects the other’s FAT capability, the flow label is not used, and the PW forwards traffic without load-balancing. What actually occurs is that the PW comes up and no packets are tunneled through it. That is not the expected behavior, which is why I would call it a bug.
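
Here is a rough sketch of the behavior I would expect under the “silently ignore” rule. It assumes each PE recognizes only the MTU sub-TLV (0x01) and its own FAT code point. The function names and the sub-TLV values are placeholders, not a real LDP implementation.

```python
OLD_FAT_TYPE = 0x11  # advertised by IOS-XR up to 4.3.1 (per this post)
NEW_FAT_TYPE = 0x17  # advertised by IOS-XR 4.3.2 (per this post)
MTU_TYPE = 0x01      # Interface MTU sub-TLV


def known_sub_tlvs(received: dict[int, bytes], understood: set[int]) -> dict[int, bytes]:
    """Keep the sub-TLVs this PE understands; silently ignore the rest."""
    return {t: v for t, v in received.items() if t in understood}


def flow_label_enabled(local_fat_type: int, received: dict[int, bytes]) -> bool:
    """FAT is used only if the peer advertised the type this PE calls FAT."""
    accepted = known_sub_tlvs(received, {MTU_TYPE, local_fat_type})
    return local_fat_type in accepted


# Placeholder advertisements: MTU 1500 plus an empty FAT value.
old_pe_adv = {MTU_TYPE: b"\x05\xdc", OLD_FAT_TYPE: b""}
new_pe_adv = {MTU_TYPE: b"\x05\xdc", NEW_FAT_TYPE: b""}

print(flow_label_enabled(NEW_FAT_TYPE, old_pe_adv))  # False
print(flow_label_enabled(OLD_FAT_TYPE, new_pe_adv))  # False
```

In this model both ends conclude “no flow label”, so they agree with each other and the PW should simply forward without load-balancing. That is the outcome I did not see.
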

Cisco came up with a little-documented workaround: the command flow-label code 17 under the pw-class. This command instructs the newer IOS-XR code to use the old value, 17 in decimal (0x11). It does fix the interoperability issue when applied on the newer code. My problem with this solution is that I think Cisco should have taken a different approach.

IOS-XR originally implemented a draft of the standard and then changed the value without any logic to deal with the resulting interoperability issue within its own software. Some logic should have been added when the default value changed from 0x11 to 0x17. Some of the options they could have taken: detect the mismatch and bring down the PW, print an error message that the two PW endpoints are using incompatible FAT sub-TLV values, or translate the values upon detection. Any of these would have sufficed. Someone might say that this would break the original intent of ignoring unknown interface parameters as specified in Section 5.5 of RFC4447, but I would argue that since Cisco released software based on a draft and that caused interoperability issues within their own OS, fixing that outweighs it.
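
The first two options could look something like the sketch below. This is purely hypothetical logic, not anything IOS-XR does. Because 0x11 is also registered as “TDMoIP AAL2 Options”, a real check would have to take the PW type into account before calling it a mismatch, which this toy version does not.

```python
from enum import Enum

FAT_TYPES = {0x11, 0x17}  # old and new code points named in this post


class Action(Enum):
    USE_FLOW_LABEL = "use flow label"
    NO_FLOW_LABEL = "forward without flow label"
    MISMATCH = "code point mismatch: log an error or bring the PW down"


def check_peer(local_fat_type: int, received_types: set[int]) -> Action:
    """Hypothetical check a PE could run on the sub-TLV types it received."""
    if local_fat_type in received_types:
        return Action.USE_FLOW_LABEL
    other_fat_types = FAT_TYPES - {local_fat_type}
    if other_fat_types & received_types:
        # The peer speaks FAT, just with the other code point.
        return Action.MISMATCH
    return Action.NO_FLOW_LABEL


print(check_peer(0x17, {0x01, 0x11}).value)
```

The point is only that the newer code knows both values exist, so it has enough information to tell the operator something is wrong instead of silently dropping traffic.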

One document I found that discusses the workaround:

https://supportforums.cisco.com/docs/DOC-26687#FAT_Pseudowire_TLV

The only thing you can do is to know about the issue and carefully track IOS-XR versions when operating pseudowires with the Flow Label option.
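
If you keep an inventory of your PEs, the tracking can be scripted. The sketch below assumes that releases after 4.3.2 behave like 4.3.2, that versions are plain dotted numbers, and that you record yourself whether the workaround is configured. The PE names and the data layout are made up.

```python
from typing import NamedTuple


class PE(NamedTuple):
    name: str
    version: str               # IOS-XR version such as "4.3.1"
    legacy_code: bool = False  # True if "flow-label code 17" is configured


def fat_code(pe: PE) -> int:
    """FAT sub-TLV type this PE is expected to signal (assumption, see above)."""
    version = tuple(int(part) for part in pe.version.split("."))
    if pe.legacy_code or version < (4, 3, 2):
        return 0x11
    return 0x17


def mismatched(pairs: list[tuple[PE, PE]]) -> list[tuple[str, str]]:
    """Return the PW endpoint pairs whose FAT code points differ."""
    return [(a.name, b.name) for a, b in pairs if fat_code(a) != fat_code(b)]


pws = [
    (PE("pe1", "4.3.1"), PE("pe2", "4.3.2")),
    (PE("pe1", "4.3.1"), PE("pe3", "4.3.2", legacy_code=True)),
]
print(mismatched(pws))  # [('pe1', 'pe2')]
```

Run something like this before an upgrade and you at least know which pseudowires need the workaround first.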