< draft-briscoe-tsvwg-ecn-encap-guidelines-02.txt | draft-briscoe-tsvwg-ecn-encap-guidelines-03a.txt > | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
Internet-Draft BT | Internet-Draft BT | |||
Updates: 3819 (if approved) J. Kaippallimalil | Updates: 3819 (if approved) J. Kaippallimalil | |||
Intended status: BCP Huawei | Intended status: BCP Huawei | |||
Expires: August 28, 2013 P. Thaler | Expires: March 9, 2014 P. Thaler | |||
Broadcom Corporation | Broadcom Corporation | |||
February 24, 2013 | September 05, 2013 | |||
Guidelines for Adding Congestion Notification to Protocols that | Guidelines for Adding Congestion Notification to Protocols that | |||
Encapsulate IP | Encapsulate IP | |||
draft-briscoe-tsvwg-ecn-encap-guidelines-02 | draft-briscoe-tsvwg-ecn-encap-guidelines-03 | |||
Abstract | Abstract | |||
The purpose of this document is to guide the design of congestion | The purpose of this document is to guide the design of congestion | |||
notification in any lower layer or tunnelling protocol that | notification in any lower layer or tunnelling protocol that | |||
encapsulates IP. The aim is for explicit congestion signals to | encapsulates IP. The aim is for explicit congestion signals to | |||
propagate consistently from lower layer protocols into IP. Then the | propagate consistently from lower layer protocols into IP. Then the | |||
IP internetwork layer can act as a portability layer to carry | IP internetwork layer can act as a portability layer to carry | |||
congestion notification from non-IP-aware congested nodes up to the | congestion notification from non-IP-aware congested nodes up to the | |||
transport layer (L4). Following these guidelines should assure | transport layer (L4). Following these guidelines should assure | |||
skipping to change at page 1, line 42 | skipping to change at page 1, line 42 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on August 28, 2013. | This Internet-Draft will expire on March 9, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 19 | skipping to change at page 2, line 19 | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
3. Modes of Operation . . . . . . . . . . . . . . . . . . . . . . 7 | 3. Modes of Operation . . . . . . . . . . . . . . . . . . . . . . 7 | |||
3.1. Feed-Forward-and-Up Mode . . . . . . . . . . . . . . . . . 7 | 3.1. Feed-Forward-and-Up Mode . . . . . . . . . . . . . . . . . 8 | |||
3.2. Feed-Up-and-Forward Mode . . . . . . . . . . . . . . . . . 9 | 3.2. Feed-Up-and-Forward Mode . . . . . . . . . . . . . . . . . 9 | |||
3.3. Feed-Backward Mode . . . . . . . . . . . . . . . . . . . . 10 | 3.3. Feed-Backward Mode . . . . . . . . . . . . . . . . . . . . 10 | |||
3.4. Null Mode . . . . . . . . . . . . . . . . . . . . . . . . 12 | 3.4. Null Mode . . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | 4. Feed-Forward-and-Up Mode: Guidelines for Adding Congestion | |||
Notification . . . . . . . . . . . . . . . . . . . . . . . . . 12 | Notification . . . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
4.1. IP-in-IP Tunnels with Tightly Coupled Shim Headers . . . . 13 | 4.1. IP-in-IP Tunnels with Tightly Coupled Shim Headers . . . . 13 | |||
4.2. Wire Protocol Design: Indication of ECN Support . . . . . 13 | 4.2. Wire Protocol Design: Indication of ECN Support . . . . . 13 | |||
4.3. Encapsulation Guidelines . . . . . . . . . . . . . . . . . 15 | 4.3. Encapsulation Guidelines . . . . . . . . . . . . . . . . . 15 | |||
4.4. Decapsulation Guidelines . . . . . . . . . . . . . . . . . 16 | 4.4. Decapsulation Guidelines . . . . . . . . . . . . . . . . . 17 | |||
4.5. Sequences of Similar Tunnels or Subnets . . . . . . . . . 18 | 4.5. Sequences of Similar Tunnels or Subnets . . . . . . . . . 18 | |||
4.6. Reframing and Congestion Markings . . . . . . . . . . . . 18 | 4.6. Reframing and Congestion Markings . . . . . . . . . . . . 19 | |||
5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | |||
Notification . . . . . . . . . . . . . . . . . . . . . . . . . 19 | Notification . . . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
6. Feed-Backward Mode: Guidelines for Adding Congestion | 6. Feed-Backward Mode: Guidelines for Adding Congestion | |||
Notification . . . . . . . . . . . . . . . . . . . . . . . . . 20 | Notification . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
7. IANA Considerations (to be removed by RFC Editor) . . . . . . 21 | 7. IANA Considerations (to be removed by RFC Editor) . . . . . . 21 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 21 | |||
9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21 | 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 21 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 22 | 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 23 | |||
12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
12.1. Normative References . . . . . . . . . . . . . . . . . . . 22 | 12.1. Normative References . . . . . . . . . . . . . . . . . . . 23 | |||
12.2. Informative References . . . . . . . . . . . . . . . . . . 22 | 12.2. Informative References . . . . . . . . . . . . . . . . . . 24 | |||
Appendix A. Outstanding Document Issues . . . . . . . . . . . . . 25 | Appendix A. Outstanding Document Issues . . . . . . . . . . . . . 27 | |||
Appendix B. Changes in This Version (to be removed by RFC | Appendix B. Changes in This Version (to be removed by RFC | |||
Editor) . . . . . . . . . . . . . . . . . . . . . . . 25 | Editor) . . . . . . . . . . . . . . . . . . . . . . . 27 | |||
1. Introduction | 1. Introduction | |||
Explicit Congestion Notification (ECN [RFC3168]) is defined in the IP | Explicit Congestion Notification (ECN [RFC3168]) is defined in the IP | |||
header (v4 & v6) to allow a resource to notify the onset of queue | header (v4 & v6) to allow a resource to notify the onset of queue | |||
build-up without having to drop packets, by explicitly marking a | build-up without having to drop packets, by explicitly marking a | |||
proportion of packets with the congestion experienced (CE) codepoint. | proportion of packets with the congestion experienced (CE) codepoint. | |||
ECN removes nearly all congestion loss and it cuts delays for two | ECN removes nearly all congestion loss and it cuts delays for two | |||
main reasons: i) it avoids the delay when recovering from congestion | main reasons: i) it avoids the delay when recovering from congestion | |||
skipping to change at page 5, line 9 | skipping to change at page 5, line 9 | |||
then in the following sections separate guidelines are given for each | then in the following sections separate guidelines are given for each | |||
mode. | mode. | |||
This document updates the advice to subnetwork designers about ECN in | This document updates the advice to subnetwork designers about ECN in | |||
Section 13 of [RFC3819]. | Section 13 of [RFC3819]. | |||
1.1. Scope | 1.1. Scope | |||
This document only concerns wire protocol processing of explicit | This document only concerns wire protocol processing of explicit | |||
notification of congestion and makes no changes or recommendations | notification of congestion and makes no changes or recommendations | |||
concerning algorithms for congestion marking or congestion response | concerning algorithms for congestion marking or for congestion | |||
(algorithm issues should be independent of the layer the algorithm | response (algorithm issues should be independent of the layer the | |||
operates in). | algorithm operates in). | |||
The question of congestion notification signals with different | The question of congestion notification signals with different | |||
semantics to those of ECN in IP is touched on in a couple of specific | semantics to those of ECN in IP is touched on in a couple of specific | |||
cases (e.g. QCN [IEEE802.1Qau]) and for schemes with multiple | cases (e.g. QCN [IEEE802.1Qau]) and for schemes with multiple | |||
severity levels such as PCN [RFC6660]. However, no attempt is made | severity levels such as PCN [RFC6660]. However, no attempt is made | |||
to give guidelines about schemes with different semantics that are | to give guidelines about schemes with different semantics that are | |||
yet to be invented. | yet to be invented. | |||
The semantics of congestion signals can be relative to the traffic | ||||
class. Therefore correct propagation of congestion signals could | ||||
depend on correct propagation of any traffic class field between the | ||||
layers. In this document, correct propagation of traffic class | ||||
information is assumed, while what 'correct' means and how it is | ||||
achieved are covered elsewhere (e.g. [RFC2983]) and are outside the | ||||
scope of the present document. | ||||
Note that these guidelines do not require the subnet wire protocol to | Note that these guidelines do not require the subnet wire protocol to | |||
be changed to accommodate congestion notification. Another way to | be changed to accommodate congestion notification. Another way to | |||
add congestion notification without consuming header space in the | add congestion notification without consuming header space in the | |||
subnet protocol might be to use a parallel control plane protocol. | subnet protocol might be to use a parallel control plane protocol. | |||
This document focuses on the congestion notification interface | This document focuses on the congestion notification interface | |||
between IP and lower layer protocols that can encapsulate IP, where | between IP and lower layer protocols that can encapsulate IP, where | |||
the term 'IP' includes v4 or v6, unicast, multicast or anycast. | the term 'IP' includes v4 or v6, unicast, multicast or anycast. | |||
However, it is likely that the guidelines will also be useful when a | However, it is likely that the guidelines will also be useful when a | |||
lower layer protocol or tunnel encapsulates itself (e.g. Ethernet | lower layer protocol or tunnel encapsulates itself (e.g. Ethernet | |||
MAC in MAC [IEEE802.1Qah]) or when it encapsulates other protocols. | MAC in MAC [IEEE802.1Qah]) or when it encapsulates other protocols. | |||
In the feed-backward mode, propagation of congestion signals for | ||||
multicast and anycast packets is out-of-scope (because it would be so | ||||
complicated that it is hoped no-one would attempt such an | ||||
abomination). | ||||
2. Terminology | 2. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
Further terminology used within this document: | Further terminology used within this document: | |||
Protocol data unit (PDU): Information that is delivered as a unit | Protocol data unit (PDU): Information that is delivered as a unit | |||
skipping to change at page 13, line 36 | skipping to change at page 13, line 36 | |||
o GRE [RFC1701, RFC2784] | o GRE [RFC1701, RFC2784] | |||
o PPTP [RFC2637] | o PPTP [RFC2637] | |||
o GTP [GTPv1, GTPv1-U, GTPv2-C] | o GTP [GTPv1, GTPv1-U, GTPv2-C] | |||
o VXLAN [vxlan]. | o VXLAN [vxlan]. | |||
4.2. Wire Protocol Design: Indication of ECN Support | 4.2. Wire Protocol Design: Indication of ECN Support | |||
This section is intended to guide the redesign of any lower layer | ||||
protocol that encapsulates IP to add native ECN support at the lower | ||||
layer. It reflects the approaches used in [RFC6040] and in | ||||
[RFC5129]. Therefore IP-in-IP tunnels or IP-in-MPLS or MPLS-in-MPLS | ||||
encapsulations that already comply with [RFC6040] or [RFC5129] will | ||||
already satisfy this guidance. | ||||
A lower layer (or subnet) congestion notification system: | A lower layer (or subnet) congestion notification system: | |||
1. SHOULD NOT apply explicit congestion notifications to PDUs that | 1. SHOULD NOT apply explicit congestion notifications to PDUs that | |||
are destined for legacy layer-4 transport implementations that | are destined for legacy layer-4 transport implementations that | |||
will not understand ECN, and | will not understand ECN, and | |||
2. SHOULD NOT apply explicit congestion notifications to PDUs if the | 2. SHOULD NOT apply explicit congestion notifications to PDUs if the | |||
egress of the subnet might not propagate congestion notifications | egress of the subnet might not propagate congestion notifications | |||
onward into the higher layer. | onward into the higher layer. | |||
skipping to change at page 15, line 22 | skipping to change at page 15, line 29 | |||
(PBB). | (PBB). | |||
QCN [IEEE802.1Qau] provides another example of how to indicate to | QCN [IEEE802.1Qau] provides another example of how to indicate to | |||
lower layer devices that the end-points will not understand ECN. An | lower layer devices that the end-points will not understand ECN. An | |||
operator can define certain 802.1p classes of service to indicate | operator can define certain 802.1p classes of service to indicate | |||
non-QCN frames and an ingress bridge is required to map arriving not- | non-QCN frames and an ingress bridge is required to map arriving not- | |||
QCN-capable IP packets to one of these non-QCN 802.1p classes. | QCN-capable IP packets to one of these non-QCN 802.1p classes. | |||
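As an illustration only, the QCN-style ingress mapping above, combined with guidelines 1 and 2, might be sketched as follows. The 802.1p class numbers in NON_CN_CLASSES, DEFAULT_NON_CN_CLASS and the function names are hypothetical operator choices, not taken from [IEEE802.1Qau]; the sketch also assumes that a congested node falls back to dropping whenever it may not mark.

```python
from enum import IntEnum

class ECN(IntEnum):
    """Two-bit ECN field of an IP header (codepoints from RFC 3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11

# Hypothetical operator configuration: 802.1p priorities (0-7) reserved
# for traffic whose end-points will not understand congestion notification.
NON_CN_CLASSES = frozenset({0, 1})
DEFAULT_NON_CN_CLASS = 0

def ingress_class(ip_ecn: ECN, requested_class: int) -> int:
    """At the subnet ingress, steer not-ECN-capable packets into a non-CN class."""
    if ip_ecn == ECN.NOT_ECT and requested_class not in NON_CN_CLASSES:
        return DEFAULT_NON_CN_CLASS
    return requested_class

def congestion_action(frame_class: int, egress_propagates: bool) -> str:
    """Action of a congested interior node on a frame it must signal on."""
    # Guideline 1: do not mark PDUs whose L4 transport will not understand ECN.
    # Guideline 2: do not mark if the egress might not propagate the mark.
    if frame_class in NON_CN_CLASSES or not egress_propagates:
        return "drop"
    return "mark"
```

The design choice here is that the ECN-capability of the end-points is encoded once, at ingress, in a field interior nodes already inspect, so interior nodes never need to parse the encapsulated IP header.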
4.3. Encapsulation Guidelines | 4.3. Encapsulation Guidelines | |||
This section is intended to guide the redesign of any node that | ||||
encapsulates IP with a lower layer header when adding native ECN | ||||
support to the lower layer protocol. It reflects the approaches used | ||||
in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or IP-in- | ||||
MPLS or MPLS-in-MPLS encapsulations that already comply with | ||||
[RFC6040] or [RFC5129] will already satisfy this guidance. | ||||
1. Egress Capability Check: A subnet ingress needs to be sure that | 1. Egress Capability Check: A subnet ingress needs to be sure that | |||
the corresponding egress of a subnet will propagate any | the corresponding egress of a subnet will propagate any | |||
congestion notification added to the outer header across the | congestion notification added to the outer header across the | |||
subnet. This is necessary in addition to checking that an | subnet. This is necessary in addition to checking that an | |||
incoming PDU indicates an ECN-capable (L4) transport. Examples | incoming PDU indicates an ECN-capable (L4) transport. Examples | |||
of how this guarantee might be provided include: | of how this guarantee might be provided include: | |||
* by configuration (e.g. if any label switches in a domain | * by configuration (e.g. if any label switches in a domain | |||
support ECN marking, [RFC5129] requires all egress nodes to | support ECN marking, [RFC5129] requires all egress nodes to | |||
have been configured to propagate ECN) | have been configured to propagate ECN) | |||
skipping to change at page 16, line 38 | skipping to change at page 17, line 7 | |||
Most information can be extracted if the Congestion Baseline is | Most information can be extracted if the Congestion Baseline is | |||
standardised at the node that is regulating the load (the Load | standardised at the node that is regulating the load (the Load | |||
Regulator--typically the data source). Then the operator can | Regulator--typically the data source). Then the operator can | |||
measure both congestion since the Load Regulator, and congestion | measure both congestion since the Load Regulator, and congestion | |||
since the subnet ingress. The latter might be measurable by | since the subnet ingress. The latter might be measurable by | |||
subtracting the level of CE markings on inner headers from that | subtracting the level of CE markings on inner headers from that | |||
on outer headers (see Appendix C of [RFC6040]). | on outer headers (see Appendix C of [RFC6040]). | |||
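A minimal sketch of that measurement, assuming the ingress copies the inner ECN field into the outer header (as in RFC 6040 normal mode), so that an outer CE mark reflects congestion both upstream of and within the subnet. The class and method names are hypothetical, and counting packets rather than octets is a simplifying assumption. The plain subtraction follows the text above; Appendix C of [RFC6040] should be consulted for the precise method.

```python
from dataclasses import dataclass

@dataclass
class EgressCeCounters:
    """CE counts gathered at a subnet egress over one measurement window."""
    packets: int = 0
    inner_ce: int = 0  # arriving inner header already CE (upstream congestion)
    outer_ce: int = 0  # arriving outer header CE (upstream plus subnet)

    def record(self, inner_is_ce: bool, outer_is_ce: bool) -> None:
        self.packets += 1
        self.inner_ce += int(inner_is_ce)
        self.outer_ce += int(outer_is_ce)

    def since_load_regulator(self) -> float:
        """Fraction of packets marked anywhere since the data source."""
        return self.outer_ce / self.packets if self.packets else 0.0

    def since_subnet_ingress(self) -> float:
        """Outer CE level minus inner CE level: congestion added in the subnet."""
        if not self.packets:
            return 0.0
        return (self.outer_ce - self.inner_ce) / self.packets
```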
4.4. Decapsulation Guidelines | 4.4. Decapsulation Guidelines | |||
This section is intended to guide the redesign of any node that | ||||
decapsulates IP from within a lower layer header when adding native | ||||
ECN support to the lower layer protocol. It reflects the approaches | ||||
used in [RFC6040] and in [RFC5129]. Therefore IP-in-IP tunnels or | ||||
IP-in-MPLS or MPLS-in-MPLS encapsulations that already comply with | ||||
[RFC6040] or [RFC5129] will already satisfy this guidance. | ||||
A subnet egress SHOULD NOT simply copy congestion notification from | A subnet egress SHOULD NOT simply copy congestion notification from | |||
outer headers to the forwarded header. It SHOULD calculate the | outer headers to the forwarded header. It SHOULD calculate the | |||
outgoing congestion notification field from the inner and outer | outgoing congestion notification field from the inner and outer | |||
headers using the following guidelines. If there is any conflict, | headers using the following guidelines. If there is any conflict, | |||
rules earlier in the list take precedence over rules later in the | rules earlier in the list take precedence over rules later in the | |||
list: | list: | |||
1. If the arriving inner header is a Not-ECN-PDU it implies the L4 | 1. If the arriving inner header is a Not-ECN-PDU it implies the L4 | |||
transport will not understand explicit congestion markings. | transport will not understand explicit congestion markings. | |||
Then: | Then: | |||
skipping to change at page 18, line 12 | skipping to change at page 18, line 31 | |||
currently unused combinations are not precluded from future use | currently unused combinations are not precluded from future use | |||
through new standards actions. | through new standards actions. | |||
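For the IP-in-IP case, the combinations that [RFC6040] specifies for a decapsulator can be sketched as a lookup table. This is a reading of the RFC 6040 decapsulation table offered for illustration; the names are hypothetical and None stands for dropping the packet. UNUSED lists the combinations that RFC 6040 flags as currently unused, which an implementation might log.

```python
from enum import IntEnum
from typing import Optional

class ECN(IntEnum):
    """Two-bit ECN field (codepoints from RFC 3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11

# Columns: arriving outer header, in this order.
OUTER_ORDER = (ECN.NOT_ECT, ECN.ECT0, ECN.ECT1, ECN.CE)

# Rows: arriving inner header. None means the packet is dropped.
DECAP_TABLE = {
    ECN.NOT_ECT: (ECN.NOT_ECT, ECN.NOT_ECT, ECN.NOT_ECT, None),
    ECN.ECT0:    (ECN.ECT0,    ECN.ECT0,    ECN.ECT1,    ECN.CE),
    ECN.ECT1:    (ECN.ECT1,    ECN.ECT1,    ECN.ECT1,    ECN.CE),
    ECN.CE:      (ECN.CE,      ECN.CE,      ECN.CE,      ECN.CE),
}

# (inner, outer) pairs that RFC 6040 marks as currently unused.
UNUSED = {
    (ECN.NOT_ECT, ECN.ECT0), (ECN.NOT_ECT, ECN.ECT1),
    (ECN.NOT_ECT, ECN.CE), (ECN.CE, ECN.ECT1),
}

def decapsulate(inner: ECN, outer: ECN) -> Optional[ECN]:
    """ECN field for the forwarded header, or None to drop the packet."""
    return DECAP_TABLE[inner][OUTER_ORDER.index(outer)]
```

Note that the egress never simply copies the outer field: a CE outer on a Not-ECT inner leads to a drop, because the L4 transport would not understand the mark.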
4.5. Sequences of Similar Tunnels or Subnets | 4.5. Sequences of Similar Tunnels or Subnets | |||
In some deployments, particularly in 3GPP networks, an IP packet may | In some deployments, particularly in 3GPP networks, an IP packet may | |||
traverse two or more IP-in-IP tunnels in sequence that all use | traverse two or more IP-in-IP tunnels in sequence that all use | |||
identical technology (e.g. GTP). | identical technology (e.g. GTP). | |||
In such cases, it would be sufficient for every encapsulation and | In such cases, it would be sufficient for every encapsulation and | |||
decapsulation in the chain to comply with RFC6040. Alternatively, as | decapsulation in the chain to comply with RFC 6040. Alternatively, | |||
an optimisation, a node that decapsulates a packet and immediately | as an optimisation, a node that decapsulates a packet and immediately | |||
re-encapsulates it for the next tunnel MAY copy the incoming outer | re-encapsulates it for the next tunnel MAY copy the incoming outer | |||
ECN field directly to the outgoing outer and the incoming inner ECN | ECN field directly to the outgoing outer and the incoming inner ECN | |||
field directly to the outgoing inner. Then the overall behavior | field directly to the outgoing inner. Then the overall behavior | |||
across the sequence of tunnel segments would still be consistent with | across the sequence of tunnel segments would still be consistent with | |||
RFC 6040. | RFC 6040. | |||
Appendix C of RFC6040 describes how a tunnel egress can monitor how | Appendix C of RFC 6040 describes how a tunnel egress can monitor how | |||
much congestion has been introduced within a tunnel. A network | much congestion has been introduced within a tunnel. A network | |||
operator might want to monitor how much congestion had been | operator might want to monitor how much congestion had been | |||
introduced within a whole sequence of tunnels. Using the technique | introduced within a whole sequence of tunnels. Using the technique | |||
in Appendix C of RFC6040 at the final egress, the operator could | in Appendix C of RFC 6040 at the final egress, the operator could | |||
monitor the whole sequence of tunnels, but only if the above | monitor the whole sequence of tunnels, but only if the above | |||
optimisation were used consistently along the sequence of tunnels, in | optimisation were used consistently along the sequence of tunnels, in | |||
order to make it appear as a single tunnel. Therefore, tunnel | order to make it appear as a single tunnel. Therefore, tunnel | |||
endpoint implementations SHOULD allow the operator to configure | endpoint implementations SHOULD allow the operator to configure | |||
whether this optimisation is enabled. | whether this optimisation is enabled. | |||
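The optimisation can be sketched as a relay function with an operator-configurable flag (copy_through, a hypothetical name). Here decap is a condensed restatement of the RFC 6040 decapsulation rules, re-encapsulation is assumed to use RFC 6040 normal mode (copying the inner field to the outer), and the traverse helper, which models a chain of tunnels, is likewise illustrative.

```python
from enum import IntEnum
from typing import Optional, Tuple

class ECN(IntEnum):
    """Two-bit ECN field (codepoints from RFC 3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11

def decap(inner: ECN, outer: ECN) -> Optional[ECN]:
    """Condensed RFC 6040 decapsulation rules; None means drop."""
    if inner == ECN.NOT_ECT:
        return None if outer == ECN.CE else ECN.NOT_ECT
    if ECN.CE in (inner, outer):
        return ECN.CE
    if outer == ECN.ECT1:
        return ECN.ECT1
    return inner

def relay(inner: ECN, outer: ECN,
          copy_through: bool) -> Optional[Tuple[ECN, ECN]]:
    """Decapsulate then immediately re-encapsulate; returns (inner, outer)."""
    if copy_through:
        # The optional optimisation: carry both fields across unchanged.
        return inner, outer
    new_inner = decap(inner, outer)
    if new_inner is None:
        return None
    # RFC 6040 normal-mode encapsulation copies the inner field to the outer.
    return new_inner, new_inner

def traverse(start: ECN, marks: Tuple[bool, ...],
             copy_through: bool) -> Optional[ECN]:
    """Model a chain of tunnels; marks[i] says whether tunnel i marks CE."""
    inner, outer = start, start  # first ingress, normal mode
    for i, mark in enumerate(marks):
        if mark and outer != ECN.NOT_ECT:
            outer = ECN.CE  # a congested node inside tunnel i
        if i < len(marks) - 1:
            hop = relay(inner, outer, copy_through)
            if hop is None:
                return None
            inner, outer = hop
    return decap(inner, outer)
```

Either setting delivers the same field at the final egress. With copy_through enabled, however, the final egress still sees the original inner field alongside an outer field that has accumulated marking from every segment, which is what lets the Appendix C monitoring technique treat the whole chain as one tunnel.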
When ECN support is added to a subnet technology, consideration | When ECN support is added to a subnet technology, consideration | |||
SHOULD be given to a similar optimisation between subnets in sequnce | SHOULD be given to a similar optimisation between subnets in sequence | |||
if they all use the same technology. | if they all use the same technology. | |||
4.6. Reframing and Congestion Markings | 4.6. Reframing and Congestion Markings | |||
The guidance in this section is worded in terms of framing | ||||
boundaries, but it applies equally whether the protocol data units | ||||
are frames, cells or packets. | ||||
Where framing boundaries are different between two layers, congestion | Where framing boundaries are different between two layers, congestion | |||
indications SHOULD be propagated on the basis that a congestion | indications SHOULD be propagated on the basis that a congestion | |||
indication on a PDU applies to all the octets in the PDU. On | indication on a PDU applies to all the octets in the PDU. On | |||
average, an encapsulator or decapsulator SHOULD approximately | average, an encapsulator or decapsulator SHOULD approximately | |||
preserve the number of marked octets arriving and leaving (counting | preserve the number of marked octets arriving and leaving (counting | |||
the size of inner headers, but not added encapsulating headers). | the size of inner headers, but not added encapsulating headers). | |||
The next departing frame SHOULD be immediately marked even if only | The next departing frame SHOULD be immediately marked even if only | |||
enough incoming marked octets have arrived for part of the departing | enough incoming marked octets have arrived for part of the departing | |||
frame. This ensures that any outstanding congestion marked octets | frame. This ensures that any outstanding congestion marked octets | |||
skipping to change at page 19, line 12 | skipping to change at page 19, line 36 | |||
For instance, an algorithm for marking departing frames could | For instance, an algorithm for marking departing frames could | |||
maintain a counter representing the balance of arriving marked octets | maintain a counter representing the balance of arriving marked octets | |||
minus departing marked octets. It adds the size of every marked | minus departing marked octets. It adds the size of every marked | |||
frame that arrives and if the counter is positive it marks the next | frame that arrives and if the counter is positive it marks the next | |||
frame to depart and subtracts its size from the counter. This will | frame to depart and subtracts its size from the counter. This will | |||
often leave a negative remainder in the counter, which is deliberate. | often leave a negative remainder in the counter, which is deliberate. | |||
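A minimal sketch of that counter, assuming frame sizes are supplied in octets that include inner headers but exclude added encapsulating headers. The class and method names are hypothetical.

```python
class ReframingMarker:
    """Balance of arriving marked octets minus departing marked octets."""

    def __init__(self) -> None:
        self.balance = 0

    def on_arrival(self, size: int, marked: bool) -> None:
        """Credit the balance with the size of every marked arriving frame."""
        if marked:
            self.balance += size

    def on_departure(self, size: int) -> bool:
        """Return True if the departing frame should carry a congestion mark."""
        if self.balance > 0:
            # Mark the whole frame even if only part of it is "owed";
            # the balance may go negative, which is deliberate.
            self.balance -= size
            return True
        return False
```

For instance, one marked 1500-octet arrival re-framed into 500-octet departures marks the next three departures. A later 100-octet marked arrival then marks one whole 500-octet departure and leaves a balance of -400, so the following 400 marked octets are absorbed before another departure is marked. On average the number of marked octets is therefore preserved.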
5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | 5. Feed-Up-and-Forward Mode: Guidelines for Adding Congestion | |||
Notification | Notification | |||
The guidance in this section is primarily applicable to encapsulation | ||||
of IP packets in Ethernet headers. However, it generalises to | ||||
encapsulation by other subnet technologies with no native support for | ||||
explicit congestion notification. It is unlikely to be applicable or | ||||
necessary for IP-in-IP encapsulation, where feed-forward-and-up mode | ||||
based on [RFC6040] would be more appropriate. | ||||
Marking the IP header while switching at layer-2 (by using a layer-3 | Marking the IP header while switching at layer-2 (by using a layer-3 | |||
switch) seems to represent a layering violation. However, it can be | switch) seems to represent a layering violation. However, it can be | |||
considered as a benign optimisation if the guidelines below are | considered as a benign optimisation if the guidelines below are | |||
followed. Feed-up-and-forward is certainly not a general alternative | followed. Feed-up-and-forward is certainly not a general alternative | |||
to implementing feed-forward congestion notification in the lower | to implementing feed-forward congestion notification in the lower | |||
layer, because: | layer, because: | |||
o IPv4 and IPv6 are not the only layer-3 protocols that might be | o IPv4 and IPv6 are not the only layer-3 protocols that might be | |||
encapsulated by lower layer protocols | encapsulated by lower layer protocols | |||
skipping to change at page 20, line 30 | skipping to change at page 21, line 11 | |||
layer congestion notification. Therefore no detailed protocol design | layer congestion notification. Therefore no detailed protocol design | |||
guidelines are appropriate. Nonetheless, a more general guideline is | guidelines are appropriate. Nonetheless, a more general guideline is | |||
appropriate: | appropriate: | |||
1. A subnetwork technology intended to eventually interface to IP | 1. A subnetwork technology intended to eventually interface to IP | |||
SHOULD NOT be designed using only the feed-backward mode, which | SHOULD NOT be designed using only the feed-backward mode, which | |||
is certainly best for a stand-alone subnet, but would need to be | is certainly best for a stand-alone subnet, but would need to be | |||
modified to work efficiently as part of the wider Internet, | modified to work efficiently as part of the wider Internet, | |||
because IP uses feed-forward-and-up mode. | because IP uses feed-forward-and-up mode. | |||
The feed-backward approach does at least work beneath IP, but it can | The feed-backward approach at least works beneath IP, although only | |||
result in very inefficient and sluggish congestion control--except if | in a narrow functional sense: feed-backward can result in very | |||
it is confined to the subnet directly connected to the original data | inefficient and sluggish congestion control--except if it is confined | |||
source, when it is faster than feed-forward. It would be valid to | to the subnet directly connected to the original data source, where | |||
design a protocol that could work in feed-backward mode for paths | it is faster than feed-forward. It would be valid to design a | |||
that only cross one subnet, and in feed-forward-and-up mode for paths | protocol that could work in feed-backward mode for paths that only | |||
that cross subnets. | cross one subnet, and in feed-forward-and-up mode for paths that | |||
cross subnets. | ||||
In the early days of TCP/IP, a similar feed-backward approach was | In the early days of TCP/IP, a similar feed-backward approach was | |||
tried for explicit congestion signalling, using source-quench (SQ) | tried for explicit congestion signalling, using source-quench (SQ) | |||
ICMP control packets. However, SQ fell out of favour and is now | ICMP control packets. However, SQ fell out of favour and is now | |||
formally deprecated [RFC6633]. The main problem was that it is hard | formally deprecated [RFC6633]. The main problem was that it is hard | |||
for a data source to tell the difference between a spoofed SQ message | for a data source to tell the difference between a spoofed SQ message | |||
and a quench request from a genuine buffer on the path. It is also | and a quench request from a genuine buffer on the path. It is also | |||
hard for a lower layer buffer to address an SQ message to the | hard for a lower layer buffer to address an SQ message to the | |||
original source port number, which may be buried within many layers | original source port number, which may be buried within many layers | |||
of headers, and possibly encrypted. | of headers, and possibly encrypted. | |||
skipping to change at page 21, line 14 | skipping to change at page 21, line 45 | |||
technology. If a QCN subnet were later connected into a wider IP- | technology. If a QCN subnet were later connected into a wider IP- | |||
based internetwork (e.g. when attempting to interconnect multiple | based internetwork (e.g. when attempting to interconnect multiple | |||
data centres) it would suffer the inefficiency shown in Figure 3. | data centres) it would suffer the inefficiency shown in Figure 3. | |||
7. IANA Considerations (to be removed by RFC Editor) | 7. IANA Considerations (to be removed by RFC Editor) | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
8. Security Considerations | 8. Security Considerations | |||
{ToDo} | If a lower layer wire protocol is redesigned to include explicit | |||
congestion signalling in-band in the protocol header, care SHOULD be | ||||
taken to ensure that the field used is specified as mutable during | ||||
transit. Otherwise interior nodes signalling congestion would | ||||
invalidate any authentication protocol applied to the lower layer | ||||
header--by altering a header field that had been assumed to be | ||||
immutable. | ||||
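One established precedent is IPsec AH (RFC 4302). AH treats the IPv4 TOS byte, which contains the ECN field, as mutable and zeroes it before computing or checking its integrity check value. The sketch below applies the same idea to a lower layer header. The 4-byte header, the position of its congestion field, the key and all function names are hypothetical and chosen only for illustration.

```python
import hashlib
import hmac

# Hypothetical lower layer header layout (illustration only):
# the two least significant bits of byte 1 carry the congestion field.
CONGESTION_BYTE = 1
CONGESTION_MASK = 0b0000_0011


def _immutable_view(header: bytes) -> bytes:
    """Copy of the header with the mutable congestion field zeroed."""
    view = bytearray(header)
    view[CONGESTION_BYTE] &= ~CONGESTION_MASK & 0xFF
    return bytes(view)


def compute_icv(key: bytes, header: bytes, payload: bytes) -> bytes:
    """HMAC-SHA256 over header (congestion field excluded) plus payload."""
    return hmac.new(key, _immutable_view(header) + payload,
                    hashlib.sha256).digest()


def verify_icv(key: bytes, header: bytes, payload: bytes, icv: bytes) -> bool:
    """Constant-time check of a received integrity check value."""
    return hmac.compare_digest(compute_icv(key, header, payload), icv)


def mark_congestion(header: bytes) -> bytes:
    """What a congested interior node does: set the congestion field."""
    view = bytearray(header)
    view[CONGESTION_BYTE] |= CONGESTION_MASK
    return bytes(view)


key = b"shared-secret"
hdr = bytes([0x45, 0x02, 0x00, 0x10])
icv = compute_icv(key, hdr, b"payload")
# A congestion mark in transit does not invalidate the ICV...
print(verify_icv(key, mark_congestion(hdr), b"payload", icv))  # True
# ...but a change to any immutable byte does.
print(verify_icv(key, bytes([0x46]) + hdr[1:], b"payload", icv))  # False
```

If the congestion field were instead covered by the HMAC, the first check would fail as soon as any node marked congestion. That failure is the problem the paragraph above warns against.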
The redesign of protocols that encapsulate IP in order to propagate | ||||
congestion signals between layers raises potential signal integrity | ||||
concerns. Experimental or proposed approaches exist for assuring the | ||||
end-to-end integrity of in-band congestion signals, e.g.: | ||||
o Congestion exposure (ConEx) for networks to audit that their | ||||
congestion signals are not being suppressed by other networks or | ||||
by receivers, and for networks to police that senders are | ||||
responding sufficiently to the signals, irrespective of the | ||||
transport protocol used [I-D.ietf-conex-abstract-mech]. | ||||
o The ECN nonce [RFC3540] for a TCP sender to detect whether a | ||||
network or the receiver is suppressing congestion signals. | ||||
o A test with the same goals as the ECN nonce, but without the need | ||||
for the receiver to co-operate with the protocol | ||||
[I-D.moncaster-tcpm-rcv-cheat]. | ||||
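The ECN nonce relies on a simple property. The sender sets ECT(0) (nonce 0) or ECT(1) (nonce 1) at random, and a CE mark erases which of the two was sent. A receiver that hides CE marks therefore has to guess the missing nonce bits when it reports its running 1-bit nonce sum. The simulation below is a simplified model. It assumes one running sum is reported per acknowledgement and ignores RFC 3540's resynchronisation after genuine congestion feedback. All names in it are hypothetical.

```python
import random
from typing import Callable, List, Set

ECT0, ECT1, CE = "ECT(0)", "ECT(1)", "CE"


def sender_marks(n: int, rng: random.Random) -> List[str]:
    """Each packet randomly carries ECT(0) (nonce 0) or ECT(1) (nonce 1)."""
    return [rng.choice([ECT0, ECT1]) for _ in range(n)]


def network_marks(sent: List[str], congested: Set[int]) -> List[str]:
    """Congested queues set CE, which erases the nonce in ECT(0)/ECT(1)."""
    return [CE if i in congested else cp for i, cp in enumerate(sent)]


def running_nonce_sums(received: List[str],
                       guess: Callable[[], int]) -> List[int]:
    """Running 1-bit sum (XOR) of nonces, one value per acknowledgement.

    A receiver concealing CE marks has no nonce for those packets and
    must call guess() instead.
    """
    sums, total = [], 0
    for cp in received:
        if cp == CE:
            total ^= guess()
        elif cp == ECT1:
            total ^= 1
        sums.append(total)
    return sums


def sender_detects_cheat(sent: List[str], reported: List[int]) -> bool:
    """Sender recomputes the sums it expects and compares them per ACK."""
    expected = running_nonce_sums(sent, guess=lambda: 0)  # sent has no CE
    return any(e != r for e, r in zip(expected, reported))


rng = random.Random(1)
trials, caught = 2000, 0
for _ in range(trials):
    sent = sender_marks(10, rng)
    received = network_marks(sent, congested={3, 7})
    reported = running_nonce_sums(received, guess=lambda: rng.randint(0, 1))
    caught += sender_detects_cheat(sent, reported)
print(caught / trials)  # should be close to 1 - 2**-2 = 0.75
```

Each concealed mark gives the cheat only a one-in-two chance of going unnoticed, so sustained suppression of congestion signals becomes unlikely.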
Given that these end-to-end approaches are already being specified, | ||||
it would make little sense to attempt to design hop-by-hop | ||||
congestion signal integrity into a new lower layer protocol, because | ||||
end-to-end integrity inherently achieves hop-by-hop integrity. | ||||
9. Conclusions | 9. Conclusions | |||
Following the guidance in the document enables ECN support to be | Following the guidance in this document enables ECN support to be | |||
extended to numerous protocols that encapsulate IP (v4 & v6) in a | extended to numerous protocols that encapsulate IP (v4 & v6) in a | |||
consistent way, so that IP continues to fulfil its role as an end-to- | consistent way, so that IP continues to fulfil its role as an end-to- | |||
end interoperability layer. This includes: | end interoperability layer. This includes: | |||
o A wide range of tunnelling protocols with various forms of shim | o A wide range of tunnelling protocols with various forms of shim | |||
header between two IP headers; | header between two IP headers; | |||
skipping to change at page 21, line 50 | skipping to change at page 23, line 15 | |||
10. Acknowledgements | 10. Acknowledgements | |||
Thanks to Gorry Fairhurst for extensive initial reviews. Michael | Thanks to Gorry Fairhurst for extensive initial reviews. Michael | |||
Welzl pointed out that lower layer congestion notification signals | Welzl pointed out that lower layer congestion notification signals | |||
may have different semantics to those in IP. | may have different semantics to those in IP. | |||
Bob Briscoe was part-funded by the European Community under its | Bob Briscoe was part-funded by the European Community under its | |||
Seventh Framework Programme through the Trilogy project (ICT-216372) | Seventh Framework Programme through the Trilogy project (ICT-216372) | |||
for initial drafts and through the Reducing Internet Transport | for initial drafts and through the Reducing Internet Transport | |||
Latency (RITE) project (ICT-317700) subsequently. The views | Latency (RITE) project (ICT-317700) subsequently. The views | |||
expressed here are solely those of the author. | expressed here are solely those of the authors. | |||
11. Comments Solicited | 11. Comments Solicited | |||
Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
<tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
12. References | 12. References | |||
12.1. Normative References | 12.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to | [RFC2119] Bradner, S., "Key words for use in | |||
Indicate Requirement Levels", BCP 14, | RFCs to Indicate Requirement Levels", | |||
RFC 2119, March 1997. | BCP 14, RFC 2119, March 1997. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, | [RFC3168] Ramakrishnan, K., Floyd, S., and D. | |||
"The Addition of Explicit Congestion | Black, "The Addition of Explicit | |||
Notification (ECN) to IP", RFC 3168, | Congestion Notification (ECN) to IP", | |||
September 2001. | RFC 3168, September 2001. | |||
[RFC3819] Karn, P., Bormann, C., Fairhurst, G., | [RFC3819] Karn, P., Bormann, C., Fairhurst, G., | |||
Grossman, D., Ludwig, R., Mahdavi, J., | Grossman, D., Ludwig, R., Mahdavi, | |||
Montenegro, G., Touch, J., and L. Wood, | J., Montenegro, G., Touch, J., and L. | |||
"Advice for Internet Subnetwork Designers", | Wood, "Advice for Internet Subnetwork | |||
BCP 89, RFC 3819, July 2004. | Designers", BCP 89, RFC 3819, | |||
July 2004. | ||||
[RFC4774] Floyd, S., "Specifying Alternate Semantics | [RFC4774] Floyd, S., "Specifying Alternate | |||
for the Explicit Congestion Notification | Semantics for the Explicit Congestion | |||
(ECN) Field", BCP 124, RFC 4774, | Notification (ECN) Field", BCP 124, | |||
November 2006. | RFC 4774, November 2006. | |||
[RFC5129] Davie, B., Briscoe, B., and J. Tay, | ||||
"Explicit Congestion Marking in | ||||
MPLS", RFC 5129, January 2008. | ||||
[RFC6040] Briscoe, B., "Tunnelling of Explicit | ||||
Congestion Notification", RFC 6040, | ||||
November 2010. | ||||
12.2. Informative References | 12.2. Informative References | |||
[ATM-TM-ABR] Cisco, "Understanding the Available Bit Rate | [ATM-TM-ABR] Cisco, "Understanding the Available | |||
(ABR) Service Category for ATM VCs", Design | Bit Rate (ABR) Service Category for | |||
Technote 10415, June 2005. | ATM VCs", Design Technote 10415, | |||
June 2005. | ||||
[Buck00] Buckwalter, J., "Frame Relay: Technology and | [Buck00] Buckwalter, J., "Frame Relay: | |||
Practice", Pub. Addison Wesley ISBN-13: 978- | Technology and Practice", Pub. | |||
Addison Wesley ISBN-13: 978- | ||||
0201485240, 2000. | 0201485240, 2000. | |||
[DCTCP] Alizadeh, M., Greenberg, A., Maltz, D., | [DCTCP] Alizadeh, M., Greenberg, A., Maltz, | |||
Padhye, J., Patel, P., Prabhakar, B., | D., Padhye, J., Patel, P., Prabhakar, | |||
Sengupta, S., and M. Sridharan, "Data Center | B., Sengupta, S., and M. Sridharan, | |||
TCP (DCTCP)", ACM SIGCOMM CCR 40(4)63--74, | "Data Center TCP (DCTCP)", ACM | |||
SIGCOMM CCR 40(4)63--74, | ||||
October 2010, <http://portal.acm.org/ | October 2010, <http://portal.acm.org/ | |||
citation.cfm?id=1851192>. | citation.cfm?id=1851192>. | |||
[GTPv1] 3GPP, "GPRS Tunnelling Protocol (GTP) across | [GTPv1] 3GPP, "GPRS Tunnelling Protocol (GTP) | |||
the Gn and Gp interface", Technical | across the Gn and Gp interface", | |||
Specification TS 29.060. | Technical Specification TS 29.060. | |||
[GTPv1-U] 3GPP, "General Packet Radio System (GPRS) | [GTPv1-U] 3GPP, "General Packet Radio System | |||
Tunnelling Protocol User Plane (GTPv1-U)", | (GPRS) Tunnelling Protocol User Plane | |||
Technical Specification TS 29.281. | (GTPv1-U)", Technical | |||
Specification TS 29.281. | ||||
[GTPv2-C] 3GPP, "Evolved General Packet Radio Service | [GTPv2-C] 3GPP, "Evolved General Packet Radio | |||
(GPRS) Tunnelling Protocol for Control plane | Service (GPRS) Tunnelling Protocol | |||
(GTPv2-C)", Technical Specification TS | for Control plane (GTPv2-C)", | |||
29.274. | Technical Specification TS 29.274. | |||
[I-D.ietf-conex-abstract-mech] Mathis, M. and B. Briscoe, | ||||
"Congestion Exposure (ConEx) Concepts | ||||
and Abstract Mechanism", | ||||
draft-ietf-conex-abstract-mech-07 | ||||
(work in progress), July 2013. | ||||
[I-D.moncaster-tcpm-rcv-cheat] Moncaster, T., "A TCP Test to Allow | ||||
Senders to Identify Receiver Non- | ||||
Compliance", | ||||
draft-moncaster-tcpm-rcv-cheat-01 | ||||
(work in progress), June 2007. | ||||
[IEEE802.1Qah] IEEE, "IEEE Standard for Local and | [IEEE802.1Qah] IEEE, "IEEE Standard for Local and | |||
Metropolitan Area Networks--Virtual Bridged | Metropolitan Area Networks--Virtual | |||
Local Area Networks--Amendment 6: Provider | Bridged Local Area Networks-- | |||
Backbone Bridges", IEEE Std 802.1Qah-2008, | Amendment 6: Provider Backbone | |||
August 2008, <http://www.ieee802.org/1/ | Bridges", IEEE Std 802.1Qah-2008, | |||
pages/802.1ah.html>. | August 2008, <http://www.ieee802.org/ | |||
1/pages/802.1ah.html>. | ||||
(Access Controlled link within page) | (Access Controlled link within page) | |||
[IEEE802.1Qau] Finn, N., Ed., "IEEE Standard for Local and | [IEEE802.1Qau] Finn, N., Ed., "IEEE Standard for | |||
Metropolitan Area Networks--Virtual Bridged | Local and Metropolitan Area | |||
Local Area Networks - Amendment 13: | Networks--Virtual Bridged Local Area | |||
Congestion Notification", IEEE Std 802.1Qau- | Networks - Amendment 13: Congestion | |||
Notification", IEEE Std 802.1Qau- | ||||
2010, March 2010, <http:// | 2010, March 2010, <http:// | |||
ieeexplore.ieee.org/xpl/ | ieeexplore.ieee.org/xpl/ | |||
mostRecentIssue.jsp?punumber=5454061>. | mostRecentIssue.jsp?punumber=5454061>. | |||
(Access Controlled link within page) | (Access Controlled link within page) | |||
[ITU-T.I.371] ITU-T, "Traffic Control and Congestion | [ITU-T.I.371] ITU-T, "Traffic Control and | |||
Control in B-ISDN", ITU-T Rec. I.371 | Congestion Control in B-ISDN", ITU-T | |||
(03/04), March 2004. | Rec. I.371 (03/04), March 2004. | |||
[RFC1323] Jacobson, V., Braden, B., and D. Borman, | [RFC1323] Jacobson, V., Braden, B., and D. | |||
"TCP Extensions for High Performance", | Borman, "TCP Extensions for High | |||
RFC 1323, May 1992. | Performance", RFC 1323, May 1992. | |||
[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. | [RFC1701] Hanks, S., Li, T., Farinacci, D., and | |||
Traina, "Generic Routing Encapsulation | P. Traina, "Generic Routing | |||
(GRE)", RFC 1701, October 1994. | Encapsulation (GRE)", RFC 1701, | |||
October 1994. | ||||
[RFC2003] Perkins, C., "IP Encapsulation within IP", | [RFC2003] Perkins, C., "IP Encapsulation within | |||
RFC 2003, October 1996. | IP", RFC 2003, October 1996. | |||
[RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, | [RFC2637] Hamzeh, K., Pall, G., Verthein, W., | |||
J., Little, W., and G. Zorn, "Point-to-Point | Taarud, J., Little, W., and G. Zorn, | |||
Tunneling Protocol", RFC 2637, July 1999. | "Point-to-Point Tunneling Protocol", | |||
RFC 2637, July 1999. | ||||
[RFC2661] Townsley, W., Valencia, A., Rubens, A., | [RFC2661] Townsley, W., Valencia, A., Rubens, | |||
Pall, G., Zorn, G., and B. Palter, "Layer | A., Pall, G., Zorn, G., and B. | |||
Two Tunneling Protocol "L2TP"", RFC 2661, | Palter, "Layer Two Tunneling Protocol | |||
August 1999. | "L2TP"", RFC 2661, August 1999. | |||
[RFC2784] Farinacci, D., Li, T., Hanks, S., Meyer, D., | [RFC2784] Farinacci, D., Li, T., Hanks, S., | |||
and P. Traina, "Generic Routing | Meyer, D., and P. Traina, "Generic | |||
Encapsulation (GRE)", RFC 2784, March 2000. | Routing Encapsulation (GRE)", | |||
RFC 2784, March 2000. | ||||
[RFC2884] Hadi Salim, J. and U. Ahmed, "Performance | [RFC2884] Hadi Salim, J. and U. Ahmed, | |||
Evaluation of Explicit Congestion | "Performance Evaluation of Explicit | |||
Notification (ECN) in IP Networks", | Congestion Notification (ECN) in IP | |||
RFC 2884, July 2000. | Networks", RFC 2884, July 2000. | |||
[RFC4301] Kent, S. and K. Seo, "Security Architecture | [RFC2983] Black, D., "Differentiated Services | |||
for the Internet Protocol", RFC 4301, | and Tunnels", RFC 2983, October 2000. | |||
December 2005. | ||||
[RFC5129] Davie, B., Briscoe, B., and J. Tay, | [RFC3540] Spring, N., Wetherall, D., and D. | |||
"Explicit Congestion Marking in MPLS", | Ely, "Robust Explicit Congestion | |||
RFC 5129, January 2008. | Notification (ECN) Signaling with | |||
Nonces", RFC 3540, June 2003. | ||||
[RFC6040] Briscoe, B., "Tunnelling of Explicit | [RFC4301] Kent, S. and K. Seo, "Security | |||
Congestion Notification", RFC 6040, | Architecture for the Internet | |||
November 2010. | Protocol", RFC 4301, December 2005. | |||
[RFC6633] Gont, F., "Deprecation of ICMP Source Quench | [RFC6633] Gont, F., "Deprecation of ICMP Source | |||
Messages", RFC 6633, May 2012. | Quench Messages", RFC 6633, May 2012. | |||
[RFC6660] Briscoe, B., Moncaster, T., and M. Menth, | [RFC6660] Briscoe, B., Moncaster, T., and M. | |||
"Encoding Three Pre-Congestion Notification | Menth, "Encoding Three Pre-Congestion | |||
(PCN) States in the IP Header Using a Single | Notification (PCN) States in the IP | |||
Diffserv Codepoint (DSCP)", RFC 6660, | Header Using a Single Diffserv | |||
Codepoint (DSCP)", RFC 6660, | ||||
July 2012. | July 2012. | |||
[trill-rbridge-options] Eastlake, D., Ghanwani, A., Manral, V., and | [trill-rbridge-options] Eastlake, D., Ghanwani, A., Manral, | |||
C. Bestler, "RBridges: Further TRILL Header | V., and C. Bestler, "RBridges: | |||
Extensions", | Further TRILL Header Extensions", | |||
draft-ietf-trill-rbridge-options-07 (work in | draft-ietf-trill-rbridge-options-07 | |||
progress), June 2012. | (work in progress), June 2012. | |||
[vxlan] Mahalingam, M., Dutt, D., Duda, K., Agarwal, | [vxlan] Mahalingam, M., Dutt, D., Duda, K., | |||
P., Kreeger, L., Sridhar, T., Bursell, M., | Agarwal, P., Kreeger, L., Sridhar, | |||
and C. Wright, "VXLAN: A Framework for | T., Bursell, M., and C. Wright, | |||
Overlaying Virtualized Layer 2 Networks over | "VXLAN: A Framework for Overlaying | |||
Virtualized Layer 2 Networks over | ||||
Layer 3 Networks", | Layer 3 Networks", | |||
draft-mahalingam-dutt-dcops-vxlan-03 (work | draft-mahalingam-dutt-dcops-vxlan-04 | |||
in progress), February 2013. | (work in progress), May 2013. | |||
Appendix A. Outstanding Document Issues | Appendix A. Outstanding Document Issues | |||
1. [GF] Concern that certain guidelines warrant a MUST (NOT) rather | 1. [GF] Concern that certain guidelines warrant a MUST (NOT) rather | |||
than a SHOULD (NOT). Given the guidelines say that if any SHOULD | than a SHOULD (NOT). Given the guidelines say that if any SHOULD | |||
(NOT)s are not followed, a strong justification will be needed, | (NOT)s are not followed, a strong justification will be needed, | |||
they have been left as SHOULD (NOT) pending further list | they have been left as SHOULD (NOT) pending further list | |||
discussion. In particular: | discussion. In particular: | |||
* If inner is a Not-ECN-PDU and Outer is CE (or highest severity | * If inner is a Not-ECN-PDU and Outer is CE (or highest severity | |||
congestion level), MUST (not SHOULD) drop? | congestion level), MUST (not SHOULD) drop? | |||
2. [GF] Impact of Diffserv on alternate marking schemes (referring | 2. Consider whether an IETF Standard Track doc will be needed to | |||
to RFC3168, RFC4774 & RFC2983) | ||||
3. Consider whether an IETF Standard Track doc will be needed to | ||||
Update the IP-in-IP protocols listed in Section 4.1--at least | Update the IP-in-IP protocols listed in Section 4.1--at least | |||
those that the IETF controls--and which Area it should sit under. | those that the IETF controls--and which Area it should sit under. | |||
4. Guidelines referring to subnet technologies should also refer to | Appendix B. Changes in This Version (to be removed by RFC Editor) | |||
tunnels and vice versa. | ||||
5. Check that guidelines allow for multicast as well as unicast. | From briscoe-02 to 03: | |||
6. Security Considerations | * Scope section: | |||
Appendix B. Changes in This Version (to be removed by RFC Editor) | + Added dependence on correct propagation of traffic class | |||
information | ||||
+ For the feed-backward mode, deemed multicast and anycast out | ||||
of scope | ||||
* Ensured all guidelines referring to subnet technologies also | ||||
refer to tunnels and vice versa by adding applicability | ||||
sentences at the start of sections 4.1, 4.2, 4.3, 4.4, 4.6 and | ||||
5. | ||||
* Added Security Considerations on ensuring congestion signal | ||||
fields are classed as mutable and on using end-to-end | ||||
congestion signal integrity technologies rather than hop-by- | ||||
hop. | ||||
From briscoe-01 to 02: | From briscoe-01 to 02: | |||
* Added authors: JK & PT | * Added authors: JK & PT | |||
* Added | * Added | |||
+ Section 4.1 "IP-in-IP Tunnels with Tightly Coupled Shim | + Section 4.1 "IP-in-IP Tunnels with Tightly Coupled Shim | |||
Headers" | Headers" | |||
End of changes. 56 change blocks. | ||||
134 lines changed or deleted | 251 lines changed or added | |||