PWE3                                                        Y(J). Stein
Internet-Draft                                  RAD Data Communications
Intended status: Informational                               B. Briscoe
Expires: January 3, 2013                                             BT
                                                              DL. Black
                                                        EMC Corporation
                                                           July 2, 2012

                      PW Congestion Considerations
                      draft-stein-pwe3-congcons-00

Abstract

Pseudowires (PWs) have become a common mechanism for tunneling traffic, and may be found competing for network resources with non-PW traffic, such as TCP/IP flows. Furthermore, some common PW types are constant bit-rate, and can not adapt their throughput to network conditions. It is thus worthwhile studying under what conditions PWs compete fairly, i.e., contribute to congestion no more than their fair share.

Status of this Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 3, 2013.

Copyright Notice

Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Stein, et al.            Expires January 3, 2013                [Page 1]
Internet-Draft                PW-CONGESTION                    July 2012
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

Table of Contents

   1.  Introduction
   2.  Elastic PWs
   3.  Inelastic PWs
   4.  Security Considerations
   5.  IANA Considerations
   6.  Informative References
   Authors' Addresses

1.  Introduction

A pseudowire (PW) is a construct for tunneling a native service over a Packet Switched Network (PSN) (see [RFC3985]), such as IPv4, IPv6, or MPLS. The PW packet encapsulates a unit of native service information by prepending the headers required for transport in the particular PSN (which must include a demultiplexer field to distinguish the different PWs) and preferably the 4-byte PWE3 control word.

PWs have no IntServ QoS mechanism, meaning that when multiple PWs are transported in parallel there is no defined means for guaranteeing network resources for any particular PW. This competition for resources may translate to a particular PW not being able to deliver the QoS required to emulate the native service. For example, MPLS-TE enables achieving a particular desired allocation of resources between multiple LSPs; however, when multiple Ethernet PWs are placed in a single MPLS tunnel, there is no way to similarly divide resources amongst them. DiffServ QoS prioritization may be available for PWs.

While most PWs are placed in MPLS tunnels, there are several mechanisms that enable transporting PWs over an IP infrastructure.
These include:

   TDM PWs ([RFC4553][RFC5086][RFC5087]) that define UDP/IP encapsulations,
   L2TPv3 PWs,
   MPLS PWs directly over IP according to RFC 4023,
   MPLS PWs over GRE over IP according to RFC 4023.

Whenever PWs are transported over IP, they may compete with non-PW flows (e.g., TCP flows), and hence in order to prevent congestion collapse and maintain fairness, they must be TCP-friendly to some extent.

At first glance one may think that fairness requires a PW transported over IP to be considered a single flow, on a par with a single TCP flow. Were we to accept this tenet, we would require a PW to back off under congestion to consume no more bandwidth than a TCP flow under such conditions (see [RFC5348]). However, since PWs may carry traffic from many users, it may make more sense to consider each PW to be equivalent to multiple TCP flows. We will discuss the optimal back-off strategy for elastic PWs in Section 2.

TDM PWs ([RFC4553][RFC5086][RFC5087]) represent inelastic constant bit-rate (CBR) flows that are not able to respond to congestion in a TCP-friendly manner. On the other hand, the total bandwidth they consume remains constant and does not increase to consume additional bandwidth as TCP rates back off. However, if the increase in bandwidth percentage taken by TDM PWs is considered detrimental, the only available remedy may be to completely shut down the PW. Such a shutdown would impact multiple users, and the service restoration time would in general be lengthy. We will discuss when shutdown of inelastic PWs can be avoided in Section 3.

2.  Elastic PWs

In this section we consider ATM, frame relay, and Ethernet PWs that ultimately carry user TCP flows. We will show that we automatically obtain the desired congestion avoidance behavior, and that no additional mechanisms are needed. For simplicity, we will solely treat Ethernet PWs, although the other cases are similar.

Each Ethernet PW packet carries a single Ethernet frame that carries a single IP packet. Thus, when congestion is signaled by an intermediate router dropping a packet, a single end-user TCP/IP packet is dropped, eventually leading to the TCP sender retransmitting the packet and multiplicatively reducing its sending rate. Without loss of generality we will further assume that there are some number of TCP flows transported alongside a PW containing some number of TCP flows.

We commence by positing the following principle.

   The Fairness Preservation Principle (FPP): When user traffic needs to avail itself of a desirable specific mechanism, that mechanism must neither reward nor penalize the user traffic it services.

A desirable specific mechanism is any mechanism that we wish to promote to solve a given networking problem or problems, but that should not usually be used otherwise. Pseudowires are an example of such a mechanism, being a standardized way of enabling transport of legacy services or Ethernet over PSNs, but are not needed for merely transporting TCP flows. Assume that such a mechanism penalizes end-user traffic, e.g., by treating its TCP flows less fairly, forcing them to more aggressively reduce throughput during congestion. This would motivate users to seek ways to circumvent the mechanism, thus impeding its use despite our goal of promoting it as a desirable mechanism. Conversely, were the mechanism to reward end-user traffic, e.g., by enabling its TCP flows to avoid decreasing bandwidth, this would encourage users to adopt the mechanism without cause, despite our goal of promoting it as a specific mechanism only for its legitimate purpose.

Since, as we mentioned before, pseudowires are a required desirable mechanism, a corollary to the FPP is that elastic PWs ultimately transporting end-user TCP flows must neither reward nor penalize these flows. We shall now show that this is precisely the case, and
that this means that the aggregate PW behaves as N flows from the fairness point of view.

First let us consider the case of an Ethernet PW carrying N TCP flows, each with the same average bandwidth and the same average number of packets per second. When a PW packet is discarded by an intermediate router due to congestion, the probability that the dropped packet belonged to a particular TCP flow is 1/N. The drop causes a Multiplicative-Decrease in sending bandwidth for that user TCP flow only, while the neighboring user flows in the same PW are unaffected. The net effect is that the individual TCP flow inside the PW experiences the same drop probability, and thus exhibits the same back-off, as it would were it not in the PW. On the other hand, the single dropped packet causes the PW as an aggregate to reduce its bandwidth by only 1/N as much as a single TCP flow.

What if the N TCP flows in the PW do not send the same average number of packets per second? Then the higher rate flows have a higher probability of experiencing a packet drop, but once again the probability, and hence the throughput behavior, will mimic that of TCP flows in the open. Once again, the PW bandwidth will behave as an aggregate of many flows, not as a single flow expected to be TCP friendly on its own.

So, for the general case the individual TCP flows are neither rewarded nor penalized for being carried over the PW, in accordance with the FPP.

There have been suggestions to add additional TCP-friendly mechanisms to PWs, for example by carrying PWs over DCCP. In light of the above arguments, it is clear that this would attempt to force the PW to behave as a single flow, rather than N flows, in contradiction to the FPP. In addition, the individual TCP flows will still back off due to the behavior of their end points that are oblivious to the fact that they are carried over a PW.
This will further degrade the flow's throughput as compared to a flow in the open. Thus, such additional mechanisms contradict the desirable behavior as described by the FPP.

3.  Inelastic PWs

TDM PWs ([RFC4553][RFC5086][RFC5087]) are more problematic than the elastic PWs of the previous section. Being constant bit-rate (CBR) they can not perform bandwidth back-off in the presence of congestion. On the other hand, being CBR they also do not attempt to capture additional bandwidth when TCP flows back off.

Since a TDM PW continuously consumes a constant amount of bandwidth, if the bandwidth occupied by a TDM PW endangers the network as a whole, the only recourse is to shut it down, denying service to all customers of the TDM native service. We should mention in passing that under certain conditions it may be possible to reduce the bandwidth consumption of a TDM PW. A prevalent case is that of a TDM native service that carries voice channels that may not all be active. Using the AAL2 mode of [RFC5087] (perhaps along with connection admission control) can enable bandwidth adaptation, at the expense of more sophisticated native service processing (NSP).

In the following we will show that for most network parameter values of interest, TDM PWs will behave in a TCP friendly manner without additional mechanisms. The important network parameters are one-way delay and packet loss ratio (PLR).

The one-way delay of a native TDM service consists of the geographical time-of-flight plus 125 microseconds for each TDM switch traversed. This is very small as compared to PSN network-crossing delays. Many protocols and applications running over TDM circuits thus assume low delay, and we thus need only consider delays of up to 32 milliseconds.

The TDM PW RFCs specify the behavior of the egress PE upon experiencing packet loss.
Structure agnostic transport has no alternative to outputting an AIS pattern towards the TDM AC, which is recognized by the receiving TDM device as a fault indication. ITU-T Recommendation G.826 places stringent limits on the number of such faults tolerated, and it is possible to derive an upper limit on PLR of a fraction of one percent. Structure aware transport regenerates frame alignment signals, thus hiding fault indications resulting from infrequent packet loss. For TDM circuits carrying voice channels the use of packet loss concealment algorithms is possible (such algorithms have been previously described for TDM PWs). However, even structure aware transport ceases to provide a useful service at about 2 percent PLR. We thus need only consider PLRs of up to a percent or two.

RFC 5348 supplies a simplified formula that gives the maximum bandwidth that is allowed to be consumed by a "TCP friendly" flow as a function of round trip delay and PLR.

                                  S
   X_Bps = ------------------------------------------------
            R ( sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2) )

Here S is the segment size, R is the round trip delay, and p is the PLR.

We can use this formula to determine when a TDM PW consumes only as much bandwidth (or less) as a TCP flow under the same conditions. Replacing the round-trip delay with twice the one-way delay D, setting the bandwidth to that of the TDM service BW, and the segment size to be the TDM fragment size TDM plus 4 Bytes to account for the PW control word, results in the following TCP friendliness condition.

            (TDM + 4)
   D  <  -------------
           2 f(p) BW

where f(p) = sqrt(2p/3) + 12 sqrt(3p/8) p (1+32p^2).

The results are displayed in the accompanying figures (available only in the PDF version of this document). TCP friendly behavior is obtained for the area under curves appropriate for each TDM fragment size.
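
As a rough numerical illustration of the TCP friendliness condition, the following Python sketch evaluates f(p) and the resulting upper bound on the one-way delay D. It is only a sketch and not part of the analysis itself: the function names are hypothetical, and it assumes that BW is expressed in bytes per second (to match X_Bps), so a service rate given in bits per second is divided by 8 before use.

```python
import math


def f(p):
    """Loss-dependent factor f(p) of the simplified RFC 5348 formula."""
    return math.sqrt(2 * p / 3) + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)


def tcp_friendly_rate(segment_bytes, rtt_s, p):
    """X_Bps: allowed rate in bytes per second for segment size S,
    round trip delay R (seconds) and packet loss ratio p."""
    return segment_bytes / (rtt_s * f(p))


def max_one_way_delay(tdm_bytes, service_bps, p, control_word_bytes=4):
    """Upper bound on D (seconds) from D < (TDM + 4) / (2 f(p) BW).

    Assumption: service_bps is the native service rate in bits per
    second and is converted to bytes per second before use.
    """
    bw_bytes_per_s = service_bps / 8
    return (tdm_bytes + control_word_bytes) / (2 * f(p) * bw_bytes_per_s)


if __name__ == "__main__":
    # E1 native service with several TDM payload sizes at 0.3 percent PLR.
    for tdm in (64, 128, 256, 512):
        d = max_one_way_delay(tdm, 2.048e6, 0.003)
        print(f"E1, {tdm:4d} bytes per packet: D < {d * 1000:.2f} ms")
```

Under these assumptions, an E1 (2.048 Mbps) with 256 Bytes of TDM data per packet and a PLR of 0.3 percent yields a bound of roughly 11 milliseconds on the one-way delay. By construction, feeding the bound back into tcp_friendly_rate with R = 2D returns the service rate in bytes per second.
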

We see that a TDM PW carrying an E1 native service (2.048 Mbps) will consume no more bandwidth than a TCP flow for all parameters of interest if each packet carries at least 512 Bytes of TDM data. For the SAToP default of 256 Bytes, as long as the one-way delay is less than 10 milliseconds, the PLR can exceed 0.3 percent. For packets containing 128 or 64 Bytes the constraints are more troublesome, but there are still parameter ranges where the TDM PW consumes less than a TCP flow under similar conditions. Similarly, an E3 native service (34.368 Mbps) with the SAToP default of 1024 Bytes of TDM per packet will be TCP friendly for delays up to about 5 milliseconds.

4.  Security Considerations

This document does not introduce any new congestion-specific mechanisms and thus does not introduce any new security considerations above those present for PWs in general.

5.  IANA Considerations

This document is purely informational and requires no IANA actions.

6.  Informative References

   [RFC3985]  Bryant, S. and P. Pate, "Pseudo Wire Emulation Edge-to-Edge (PWE3) Architecture", RFC 3985, March 2005.

   [RFC4553]  Vainshtein, A. and YJ. Stein, "Structure-Agnostic Time Division Multiplexing (TDM) over Packet (SAToP)", RFC 4553, June 2006.

   [RFC5086]  Vainshtein, A., Sasson, I., Metz, E., Frost, T., and P. Pate, "Structure-Aware Time Division Multiplexed (TDM) Circuit Emulation Service over Packet Switched Network (CESoPSN)", RFC 5086, December 2007.

   [RFC5087]  Stein, Y(J)., Shashoua, R., Insler, R., and M. Anavi, "Time Division Multiplexing over IP (TDMoIP)", RFC 5087, December 2007.

   [RFC5348]  Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, September 2008.

Authors' Addresses

   Yaakov (Jonathan) Stein
   RAD Data Communications
   24 Raoul Wallenberg St., Bldg C
   Tel Aviv  69719
   ISRAEL

   Phone: +972 (0)3 645-5389
   Email: yaakov_s@rad.com

   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/

   David L. Black
   EMC Corporation
   176 South St.
   Hopkinton, MA  01748
   USA

   Phone: +1 (508) 293-7953
   Email: david.black@emc.com