This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.
This Internet-Draft will expire on February 6, 2010.
Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents in effect on the date of publication of this document (http://trustee.ietf.org/license-info). Please review these documents carefully, as they describe your rights and restrictions with respect to this document.
This document gives guidelines on the encapsulation of IP explicit congestion notification by any outer header, whether encapsulated in a tunnel or in a lower layer header. Following these guidelines should ensure interworking between new encapsulations of congestion notification, whether specified by the IETF or other standards bodies.
Table of Contents
1. Introduction
1.1. Scope
2. Terminology
3. Design Guidelines for Adding Congestion Notification to Protocols that Encapsulate IP
3.1. Indication of ECN Support in the Wire Protocol
3.2. Encapsulation Guidelines
3.3. Decapsulation Guidelines
3.4. Reframing and Congestion Markings
4. IANA Considerations
5. Security Considerations
6. Conclusions
7. Acknowledgements
8. Comments Solicited
9. References
9.1. Normative References
9.2. Informative References
1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] was defined in the IP header to allow a congested resource to signal the onset of congestion without having to drop packets, by explicitly marking a proportion of packets with the congestion experienced (CE) codepoint.
Some subnetwork technologies (e.g. Frame Relay) have always supported explicit notification of congestion and it is gradually being added to others. The IETF would like to encourage this trend. Of course, the IETF does not have standards authority over every link or tunnel protocol. So this document gives guidelines for designing propagation of congestion notification across the interface between IP and protocols that may encapsulate IP (i.e. that can be layered beneath IP). Each lower layer technology will exhibit slightly different issues and compromises, so the IETF or the relevant standards body must be free to define the specifics of each lower layer congestion notification scheme. But if the guidelines below are followed, congestion notification should interwork between different technologies, using IP in its role as a 'lingua franca' or 'portability layer'.
Often link and physical layer resources are 'non-blocking' by design. In these cases congestion notification does not need to be implemented at the lower layer; ECN in IP is sufficient. A degenerate example is a point-to-point Ethernet link. Excess loading of the link merely causes the queue from the higher layer to back up, while the lower layer remains immune to congestion. Even a whole meshed subnetwork can be made immune to interior congestion by careful network design; interior links can be sufficiently provisioned relative to the edge capacities to absorb even worst-case patterns of load.
Nonetheless, other subnetworks can involve a mesh of links where traffic can converge on interior nodes and cause congestion. If interior nodes do not participate in the Internet Protocol, perhaps using MPLS instead or one of the flavours of Ethernet, they should not burrow into payloads in order to find an IP header and mark the ECN field within. A correct way to signal congestion without drop would be to standardise explicit notification of congestion in the protocol at the logical link layer. Then it will also be necessary to define how this explicit signal propagates up to IP at the internetwork layer.
Many logical link technologies are based on self-contained protocol data units (PDUs). In these typical cases, at each decapsulation of an outer (lower layer) header, any congestion marking will have to be arranged to propagate into the forwarded (upper layer) header. It can then continue forwards, possibly picking up further congestion signals from congested resources along the way until it finally reaches the destination transport. Then typically the destination will feed this congestion notification back to the source transport.
The purpose of this document is to guide the design of the interfaces between layers, so that explicit congestion signals in PDUs at one layer will propagate consistently into PDUs of an adjacent layer.
1.1. Scope
This document only concerns wire protocol processing of explicit notification of congestion and makes no changes or recommendations concerning algorithms for congestion marking or congestion response.
This document focuses on the congestion notification interface below IP. However, it is likely that the guidelines will also be useful when a protocol is encapsulated by other protocols, including by itself (e.g. IEEE 802.1ah, known as MAC-in-MAC).
ECN support in MPLS has already been defined, including its interface to IP [RFC5129] (Davie, B., Briscoe, B., and J. Tay, “Explicit Congestion Marking in MPLS,” January 2008.). So a label switched router can mark the outer MPLS shim header to signal congestion and, when the packets reach the decapsulator at the edge of the MPLS network, the marks can be propagated into the IP header and onward to the destination transport.
But the congestion notification interface has not been defined between IP and many other protocols that can encapsulate IP, e.g. IEEE 802.3 (Ethernet) frames. Some protocols that encapsulate IP are maintained within the IETF, such as L2TP [RFC2661] (Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, G., and B. Palter, “Layer Two Tunneling Protocol "L2TP",” August 1999.), GRE [RFC1701] (Hanks, S., Li, T., Farinacci, D., and P. Traina, “Generic Routing Encapsulation (GRE),” October 1994.) or PPTP [RFC2637] (Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, W., and G. Zorn, “Point-to-Point Tunneling Protocol,” July 1999.) and §9.3 of RFC3168 pointed out that the IETF might in future want to define how ECN should be added to them.
In some layer 2 technologies, explicit congestion notification has been defined for use internally within the subnet, but the interface with ECN in IP has not been defined. If the lower layer has its own feedback and load regulation, there is no need to propagate congestion signalling up the layers. For instance, EFCI (explicit forward congestion indication) has been present in ATM [ITU‑T.I.371] (ITU-T, “Traffic Control and Congestion Control in B-ISDN,” March 2004.) for a long time, but it has been used for internal control and management rather than being propagated to endpoint transports for them to control end-to-end congestion. FECN (forward ECN) was included in Frame Relay standards but Frame Relay has no internal rate control mechanisms. Therefore, as no interface to IP ECN has ever been defined, FECN is only used to detect where more capacity should be provisioned [Buck00] (Buckwalter, J., “Frame Relay: Technology and Practice,” 2000.).
In another example, backward congestion notification (BCN) is being defined for Ethernet [IEEE802.1au] (IEEE, “IEEE Standard for Local and Metropolitan Area Networks—Virtual Bridged Local Area Networks - Amendment 10: Congestion Notification,” 2008.). BCN avoids the need to define a congestion notification interface with IP by limiting itself to confined scenarios where all endpoints are directly attached by the same layer 2 technology, such as in server area networks. One aim of the guidelines below is to avoid an outcome where one congestion notification scheme has been defined for internal use within a subnetwork technology, but then another has to be defined for interfacing to IP.
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
Further terminology used within this document:
- Protocol data unit (PDU): Information that is delivered as a unit among peer entities of a layered network consisting of protocol control information (typically a header) and possibly user data of that layer. The scope of this document includes layer 2 and layer 3 networks, where the PDU is respectively termed a frame or a packet. PDU is a general term for either.
- Encapsulator: The link or tunnel endpoint function that adds an outer header to a PDU (also termed the 'link ingress', the 'ingress tunnel endpoint' or just the 'ingress' where the context is clear).
- Decapsulator: The link or tunnel endpoint function that removes an outer header from a PDU (also termed the 'link egress', the 'egress tunnel endpoint' or just the 'egress' where the context is clear).
- Incoming header: The header of an arriving PDU before encapsulation.
- Outer header: The header added to encapsulate a PDU.
- Inner header: The header encapsulated by the outer header.
- Outgoing header: The header forwarded by the decapsulator using logic that combines the fields in the outer and inner headers.
- ECN-PDU: A PDU destined for an ECN-capable transport (i.e. a transport that will understand explicit congestion notifications). This is intended to be a general term for a PDU at any layer, not just an IP PDU. An IP packet with a non-zero ECN field would be an ECN-PDU, but the term is intended to also be used to describe PDUs of protocols that encapsulate IP packets.
- Not-ECN-PDU: A PDU destined for a transport that is not ECN-capable.
- Load Regulator: For each flow of PDUs, the transport function that is capable of controlling the data rate. Typically located at the data source, but in-path nodes can regulate load in some congestion control arrangements (e.g. admission control or policing nodes). Note the term "a function capable of controlling the load" deliberately includes a transport that doesn't actually control the load but ought to (e.g. an application without congestion control that uses UDP).
- Congestion Baseline: The function that created (or most recently reset) the congestion notification field.
3. Design Guidelines for Adding Congestion Notification to Protocols that Encapsulate IP
These guidelines are compatible with the guidelines on the design of alternate schemes for IP tunnelling of the ECN field [I‑D.ietf‑tsvwg‑ecn‑tunnel] (Briscoe, B., “Tunnelling of Explicit Congestion Notification,” July 2009.) and the more general best current practice for the design of alternate ECN schemes given in [RFC4774] (Floyd, S., “Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field,” November 2006.).
The term 'SHOULD (NOT)' has been used in preference to 'MUST (NOT)' because it is difficult to know the compromises that will be necessary in each protocol design. If a particular protocol design chooses to contradict a 'SHOULD (NOT)' given in the advice below, it MUST include a sound justification.
3.1. Indication of ECN Support in the Wire Protocol
An active queue management (AQM) scheme SHOULD NOT apply explicit congestion notifications to PDUs that are destined for legacy transports that will not understand them. Therefore the lower layer wire protocol needs to be able to distinguish whether PDUs are destined for an ECN-capable transport or not. We use the term ECN-PDU for a PDU that is destined for an ECN-capable transport, and Not-ECN-PDU for one destined for a transport that will not understand ECN.
In IP, if the ECN field in each PDU is cleared to the Not-ECT (not ECN-capable transport) codepoint, it indicates that the transport will not understand congestion markings. The mechanism a lower layer protocol uses to distinguish the ECN-capability of PDUs need not mimic that of IP, but it should achieve the same outcome. For instance, ECN-capable transports might only be allowed to use PDUs identified by a particular set of labels or tags. Alternatively, logical link protocols that use flow state might determine whether a PDU should be congestion marked by checking for ECN-support in the flow state. Whatever mechanisms might be invented, it will be possible for lower layer queues to only apply congestion markings to those PDUs with labels, tags or flow identifiers associated with ECN support (ECN-PDUs).
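The distinction can be illustrated with a minimal sketch in Python. The two-bit IP ECN codepoints are those of [RFC3168]. Everything else is a hypothetical illustration rather than part of any specified protocol: the tag set ECN_CAPABLE_TAGS, its values, the function names, and the assumption that a queue falls back to drop for a Not-ECN-PDU when its AQM signals congestion.

```python
# IP ECN field codepoints as defined in RFC 3168.
NOT_ECT = 0b00  # not ECN-capable transport
ECT_1 = 0b01    # ECN-capable transport, ECT(1)
ECT_0 = 0b10    # ECN-capable transport, ECT(0)
CE = 0b11       # congestion experienced

# Hypothetical: tags or labels reserved for traffic from ECN-capable
# transports. The values are made up for this illustration.
ECN_CAPABLE_TAGS = frozenset({100, 101})


def ip_is_ecn_pdu(ecn_field: int) -> bool:
    """An IP packet with a non-zero ECN field is an ECN-PDU."""
    return (ecn_field & 0b11) != NOT_ECT


def tag_is_ecn_pdu(tag: int) -> bool:
    """A tagged lower layer PDU is an ECN-PDU if its tag is in the set."""
    return tag in ECN_CAPABLE_TAGS


def congestion_action(is_ecn_pdu: bool, aqm_signals_congestion: bool) -> str:
    """Return 'forward', 'mark' or 'drop' for one PDU at a queue.

    Assumption: a Not-ECN-PDU is dropped, never marked, when the AQM
    decides to signal congestion on it.
    """
    if not aqm_signals_congestion:
        return "forward"
    return "mark" if is_ecn_pdu else "drop"
```

The point of separating the classifier from the action is that the lower layer is free to choose any classifier (codepoint, tag set or flow state) as long as the resulting action matches what IP would have done.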
The per-domain checking of ECN support in MPLS [RFC5129] (Davie, B., Briscoe, B., and J. Tay, “Explicit Congestion Marking in MPLS,” January 2008.) is a good example of a way to avoid sending congestion markings to transports that will not understand them. In MPLS, header space is extremely limited, so no field in the MPLS header is used to indicate that the PDU is destined for an ECN-capable transport. Instead, interior nodes in a domain are allowed to set explicit congestion indications without checking whether the PDU is destined for a transport that will understand them. This is made safe by requiring that all the decapsulating edges of a whole domain have to be upgraded at once. Therefore, on decapsulation there will always be a check that the higher layer transport is ECN-capable (by checking for the Not-ECT codepoint in the inner IP header). If the decapsulator discovers that the higher layer (inner header) indicates the transport will not understand ECN, it drops the packet on behalf of the earlier congested node (see Decapsulation Guideline Paragraph 1 in Section 3.3).
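The egress check described above might be sketched as follows in Python. This is a simplified illustration of the behaviour summarised in this section, not a rendering of the normative text of [RFC5129]; the function name decapsulate and the boolean outer_congestion_marked are hypothetical.

```python
from typing import Optional

NOT_ECT = 0b00  # RFC 3168: not ECN-capable transport
CE = 0b11       # RFC 3168: congestion experienced


def decapsulate(outer_congestion_marked: bool, inner_ecn: int) -> Optional[int]:
    """Return the outgoing IP ECN field, or None if the packet is dropped.

    outer_congestion_marked: hypothetical flag saying whether an interior
    node set a congestion indication in the outer (lower layer) header.
    inner_ecn: the two-bit ECN field of the inner IP header.
    """
    if not outer_congestion_marked:
        # Nothing to propagate; the inner field is forwarded unchanged.
        return inner_ecn
    if inner_ecn == NOT_ECT:
        # The transport would not understand a mark, so the egress drops
        # the packet on behalf of the earlier congested node.
        return None
    # The transport is ECN-capable; propagate the congestion signal.
    return CE
```

The check happens only at the egress, which is why every decapsulating edge of the domain has to support it before interior nodes start marking.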
Note that it was only appropriate to define such an incremental deployment strategy because MPLS is targeted solely at professional operators, who can be expected to ensure that a whole subnetwork is consistently configured. This strategy might not be appropriate for other link technologies targeted at zero-configuration deployment by the general public (e.g. IEEE 802.3 Ethernet). For such 'plug-and-play' environments it would be necessary to invent a failsafe approach that ensured congestion markings would never fall into black holes however inconsistently a system was put together. Alternatively, congestion notification relying on correct system configuration could be confined to flavours of Ethernet intended only for professional network operators, such as IEEE 802.1ah Provider Backbone Bridges (PBB).
3.2. Encapsulation Guidelines
3.3. Decapsulation Guidelines
Congestion notification SHOULD NOT simply be copied from outer headers to the forwarded header. The outgoing congestion notification field SHOULD be calculated from the inner and outer headers, using the following rules. If there is any conflict, rules earlier in the list take precedence over rules later in the list:
3.4. Reframing and Congestion Markings
Where framing boundaries are different between two layers, congestion indications SHOULD be propagated on the basis that a congestion indication on a PDU applies to all the octets in the PDU. On average, an encapsulator or decapsulator SHOULD approximately preserve the number of marked octets arriving and leaving (counting the size of inner headers, but not added encapsulating headers).
An algorithm for reframing congestion indications over different sized PDUs SHOULD NOT hold any marked octets back to be signalled in later frames. For instance, a reframing algorithm might maintain a count of marked octets arriving and departing. Such an algorithm SHOULD propagate a congestion indication in the next departing PDU if there have been more arriving marked octets than departing, even if, after marking the next PDU, the count of departing marked octets will exceed the count of arriving marked octets.
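One possible shape for such a counting algorithm is sketched below in Python. The class name Reframer and its methods are hypothetical, and the sketch assumes the caller passes PDU sizes that exclude any added encapsulating headers.

```python
class Reframer:
    """Tracks marked octets across a boundary where framing changes."""

    def __init__(self) -> None:
        self.marked_in = 0   # marked octets that have arrived so far
        self.marked_out = 0  # marked octets that have departed so far

    def arrive(self, size_octets: int, marked: bool) -> None:
        """Record an arriving PDU; a mark applies to all of its octets."""
        if marked:
            self.marked_in += size_octets

    def depart(self, size_octets: int) -> bool:
        """Decide whether the next departing PDU carries a mark.

        The PDU is marked whenever more marked octets have arrived than
        departed, even if marking it overshoots the arriving count, so
        that no congestion signal is held back for a later frame.
        """
        mark = self.marked_in > self.marked_out
        if mark:
            self.marked_out += size_octets
        return mark
```

For example, one marked 1500 octet PDU reframed into 500 octet PDUs yields three marked departures and then an unmarked one. In the other direction, a marked 100 octet PDU causes the next 1500 octet departure to be marked immediately, and the resulting overshoot is absorbed by leaving later departures unmarked until the arriving count catches up.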
4. IANA Considerations
This memo includes no request to IANA.
5. Security Considerations
{TBA}
6. Conclusions
{TBA}
7. Acknowledgements
Bob Briscoe is partly funded by Trilogy, a research project (ICT-216372) supported by the European Community under its Seventh Framework Programme. The views expressed here are those of the author only.
8. Comments Solicited
Comments and questions are encouraged and very welcome. They can be addressed to the IETF Transport Area working group mailing list <tsvwg@ietf.org>, and/or to the authors.
9. References
9.1. Normative References
[RFC2119] Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, “The Addition of Explicit Congestion Notification (ECN) to IP,” RFC 3168, September 2001.
[RFC4774] Floyd, S., “Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field,” BCP 124, RFC 4774, November 2006.
9.2. Informative References
Author's Address
Bob Briscoe
BT
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Phone: +44 1473 645196
EMail: bob.briscoe@bt.com
URI: http://bobbriscoe.net/