draft-ietf-tsvwg-ecn-tunnel-04.txt | draft-ietf-tsvwg-ecn-tunnel-05.txt | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
Internet-Draft BT | Internet-Draft BT | |||
Updates: 3168, 4301 October 24, 2009 | Updates: 3168, 4301 December 18, 2009 | |||
(if approved) | (if approved) | |||
Intended status: Standards Track | Intended status: Standards Track | |||
Expires: April 27, 2010 | Expires: June 21, 2010 | |||
Tunnelling of Explicit Congestion Notification | Tunnelling of Explicit Congestion Notification | |||
draft-ietf-tsvwg-ecn-tunnel-04 | draft-ietf-tsvwg-ecn-tunnel-05 | |||
Abstract | ||||
This document redefines how the explicit congestion notification | ||||
(ECN) field of the IP header should be constructed on entry to and | ||||
exit from any IP in IP tunnel. On encapsulation it updates RFC3168 | ||||
to bring all IP in IP tunnels (v4 or v6) into line with RFC4301 IPsec | ||||
ECN processing. On decapsulation it updates both RFC3168 and RFC4301 | ||||
to add new behaviours for previously unused combinations of inner and | ||||
outer header. The new rules ensure the ECN field is correctly | ||||
propagated across a tunnel whether it is used to signal one or two | ||||
severity levels of congestion, whereas before only one severity level | ||||
was supported. Tunnel endpoints can be updated in any order without | ||||
affecting pre-existing uses of the ECN field (backward compatible). | ||||
Nonetheless, operators wanting to support two severity levels (e.g. | ||||
for pre-congestion notification--PCN) can require compliance with | ||||
this new specification. A thorough analysis of the reasoning for | ||||
these changes and the implications is included. In the unlikely | ||||
event that the new rules do not meet a specific need, RFC4774 gives | ||||
guidance on designing alternate ECN semantics and this document | ||||
extends that to include tunnelling issues. | ||||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted to IETF in full conformance with the | This Internet-Draft is submitted to IETF in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
Drafts. | Drafts. | |||
skipping to change at page 1, line 34 | skipping to change at page 2, line 9 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on April 27, 2010. | This Internet-Draft will expire on June 21, 2010. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2009 IETF Trust and the persons identified as the | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents in effect on the date of | Provisions Relating to IETF Documents | |||
publication of this document (http://trustee.ietf.org/license-info). | (http://trustee.ietf.org/license-info) in effect on the date of | |||
Please review these documents carefully, as they describe your rights | publication of this document. Please review these documents | |||
and restrictions with respect to this document. | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | ||||
Abstract | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | ||||
This document redefines how the explicit congestion notification | described in the BSD License. | |||
(ECN) field of the IP header should be constructed on entry to and | ||||
exit from any IP in IP tunnel. On encapsulation it updates RFC3168 | ||||
to bring all IP in IP tunnels (v4 or v6) into line with RFC4301 IPsec | ||||
ECN processing. On decapsulation it updates both RFC3168 and RFC4301 | ||||
to add new behaviours for previously unused combinations of inner and | ||||
outer header. The new rules propagate the ECN field whether it is | ||||
used to signal one or two severity levels of congestion, whereas | ||||
before they propagated only one. Tunnel endpoints can be updated in | ||||
any order without affecting pre-existing uses of the ECN field | ||||
(backward compatible). Nonetheless, operators wanting to support two | ||||
severity levels (e.g. for pre-congestion notification--PCN) can | ||||
require compliance with this new specification. A thorough analysis | ||||
of the reasoning for these changes and the implications is included. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | 1.1. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |||
3. Summary of Pre-Existing RFCs . . . . . . . . . . . . . . . . . 11 | 3. Summary of Pre-Existing RFCs . . . . . . . . . . . . . . . . . 11 | |||
3.1. Encapsulation at Tunnel Ingress . . . . . . . . . . . . . 11 | 3.1. Encapsulation at Tunnel Ingress . . . . . . . . . . . . . 12 | |||
3.2. Decapsulation at Tunnel Egress . . . . . . . . . . . . . . 12 | 3.2. Decapsulation at Tunnel Egress . . . . . . . . . . . . . . 13 | |||
4. New ECN Tunnelling Rules . . . . . . . . . . . . . . . . . . . 13 | 4. New ECN Tunnelling Rules . . . . . . . . . . . . . . . . . . . 14 | |||
4.1. Default Tunnel Ingress Behaviour . . . . . . . . . . . . . 14 | 4.1. Default Tunnel Ingress Behaviour . . . . . . . . . . . . . 14 | |||
4.2. Default Tunnel Egress Behaviour . . . . . . . . . . . . . 14 | 4.2. Default Tunnel Egress Behaviour . . . . . . . . . . . . . 15 | |||
4.3. Encapsulation Modes . . . . . . . . . . . . . . . . . . . 16 | 4.3. Encapsulation Modes . . . . . . . . . . . . . . . . . . . 17 | |||
4.4. Single Mode of Decapsulation . . . . . . . . . . . . . . . 18 | 4.4. Single Mode of Decapsulation . . . . . . . . . . . . . . . 18 | |||
5. Updates to Earlier RFCs . . . . . . . . . . . . . . . . . . . 18 | 5. Updates to Earlier RFCs . . . . . . . . . . . . . . . . . . . 19 | |||
5.1. Changes to RFC4301 ECN processing . . . . . . . . . . . . 18 | 5.1. Changes to RFC4301 ECN processing . . . . . . . . . . . . 19 | |||
5.2. Changes to RFC3168 ECN processing . . . . . . . . . . . . 19 | 5.2. Changes to RFC3168 ECN processing . . . . . . . . . . . . 20 | |||
5.3. Motivation for Changes . . . . . . . . . . . . . . . . . . 20 | 5.3. Motivation for Changes . . . . . . . . . . . . . . . . . . 20 | |||
5.3.1. Motivation for Changing Encapsulation . . . . . . . . 20 | 5.3.1. Motivation for Changing Encapsulation . . . . . . . . 21 | |||
5.3.2. Motivation for Changing Decapsulation . . . . . . . . 21 | 5.3.2. Motivation for Changing Decapsulation . . . . . . . . 22 | |||
6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 23 | 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 24 | |||
6.1. Non-Issues Updating Decapsulation . . . . . . . . . . . . 23 | 6.1. Non-Issues Updating Decapsulation . . . . . . . . . . . . 24 | |||
6.2. Non-Update of RFC4301 IPsec Encapsulation . . . . . . . . 24 | 6.2. Non-Update of RFC4301 IPsec Encapsulation . . . . . . . . 25 | |||
6.3. Update to RFC3168 Encapsulation . . . . . . . . . . . . . 24 | 6.3. Update to RFC3168 Encapsulation . . . . . . . . . . . . . 25 | |||
7. Design Principles for Future Non-Default Schemes . . . . . . . 25 | 7. Design Principles for Alternate ECN Tunnelling Semantics . . . 26 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 26 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 28 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 26 | 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 29 | |||
10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 28 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 28 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 30 | |||
12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 29 | 11.1. Normative References . . . . . . . . . . . . . . . . . . . 30 | |||
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29 | 11.2. Informative References . . . . . . . . . . . . . . . . . . 31 | |||
13.1. Normative References . . . . . . . . . . . . . . . . . . . 29 | Appendix A. Early ECN Tunnelling RFCs . . . . . . . . . . . . . . 33 | |||
13.2. Informative References . . . . . . . . . . . . . . . . . . 29 | Appendix B. Design Constraints . . . . . . . . . . . . . . . . . 33 | |||
Appendix A. Early ECN Tunnelling RFCs . . . . . . . . . . . . . . 31 | B.1. Security Constraints . . . . . . . . . . . . . . . . . . . 33 | |||
Appendix B. Design Constraints . . . . . . . . . . . . . . . . . 32 | B.2. Control Constraints . . . . . . . . . . . . . . . . . . . 35 | |||
B.1. Security Constraints . . . . . . . . . . . . . . . . . . . 32 | B.3. Management Constraints . . . . . . . . . . . . . . . . . . 37 | |||
B.2. Control Constraints . . . . . . . . . . . . . . . . . . . 34 | Appendix C. Contribution to Congestion across a Tunnel . . . . . 37 | |||
B.3. Management Constraints . . . . . . . . . . . . . . . . . . 35 | Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN . . . 38 | |||
Appendix C. Contribution to Congestion across a Tunnel . . . . . 36 | Appendix E. Why Resetting ECN on Encapsulation Impedes PCN . . . 39 | |||
Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN . . . 37 | ||||
Appendix E. Why Resetting ECN on Encapsulation Impedes PCN . . . 38 | ||||
Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) | Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) | |||
Outer . . . . . . . . . . . . . . . . . . . . . . . . 39 | Outer . . . . . . . . . . . . . . . . . . . . . . . . 40 | |||
Appendix G. Open Issues . . . . . . . . . . . . . . . . . . . . . 40 | Appendix G. Open Issues . . . . . . . . . . . . . . . . . . . . . 41 | |||
Request to the RFC Editor (to be removed on publication): | Request to the RFC Editor (to be removed on publication): | |||
In the RFC index, RFC3168 should be identified as an update to | In the RFC index, RFC3168 should be identified as an update to | |||
RFC2003. RFC4301 should be identified as an update to RFC3168. | RFC2003. RFC4301 should be identified as an update to RFC3168. | |||
Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
Full text differences between IETF draft versions are available at | Full text differences between IETF draft versions are available at | |||
<http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-ecn-tunnel/>, and | <http://tools.ietf.org/wg/tsvwg/draft-ietf-tsvwg-ecn-tunnel/>, and | |||
between earlier individual draft versions at | between earlier individual draft versions at | |||
<http://www.briscoe.net/pubs.html#ecn-tunnel> | <http://www.briscoe.net/pubs.html#ecn-tunnel> | |||
From ietf-03 to ietf-04 (current): | From ietf-04 to ietf-05 (current): | |||
* Functional changes: | ||||
+ Section 4.2: ECT(1) outer with Not-ECT inner: reverted to | ||||
forwarding as Not-ECT (as in RFC3168 & RFC4301), rather than | ||||
dropping. | ||||
+ Altered rationale in bullet 3 of Section 5.3.2 to justify | ||||
this. | ||||
+ Distinguished alarms for dangerous and invalid combinations | ||||
and allowed combinations that are valid in some tunnel | ||||
configurations but dangerous in others to be alarmed at the | ||||
discretion of the implementer and/or operator. | ||||
+ Altered advice on designing alternate ECN tunnelling | ||||
semantics to reflect the above changes. | ||||
* Textual changes: | ||||
+ Changed "Future non-default schemes" to "Alternate ECN | ||||
Tunnelling Semantics" throughout. | ||||
+ Cut down Appendix D and Appendix E for brevity. | ||||
+ A number of clarifying edits & updated refs. | ||||
From ietf-03 to ietf-04: | ||||
* Functional changes: none | * Functional changes: none | |||
* Structural changes: | * Structural changes: | |||
+ Added "Open Issues" appendix | + Added "Open Issues" appendix | |||
* Textual changes: | * Textual changes: | |||
+ Section title: "Changes from Earlier RFCs" -> "Updates to | + Section title: "Changes from Earlier RFCs" -> "Updates to | |||
skipping to change at page 7, line 47 | skipping to change at page 8, line 27 | |||
Roadmap), added new Introductory subsection on "Scope" and | Roadmap), added new Introductory subsection on "Scope" and | |||
improved clarity; | improved clarity; | |||
* Added Design Guidelines for New Encapsulations of Congestion | * Added Design Guidelines for New Encapsulations of Congestion | |||
Notification; | Notification; | |||
* Considerably clarified the Backward Compatibility section | * Considerably clarified the Backward Compatibility section | |||
(Section 6); | (Section 6); | |||
* Considerably extended the Security Considerations section | * Considerably extended the Security Considerations section | |||
(Section 9); | (Section 8); | |||
* Summarised the primary rationale much better in the | * Summarised the primary rationale much better in the | |||
conclusions; | conclusions; | |||
* Added numerous extra acknowledgements; | * Added numerous extra acknowledgements; | |||
* Added Appendix E. "Why resetting CE on encapsulation harms | * Added Appendix E. "Why resetting CE on encapsulation harms | |||
PCN", Appendix C. "Contribution to Congestion across a Tunnel" | PCN", Appendix C. "Contribution to Congestion across a Tunnel" | |||
and Appendix D. "Ideal Decapsulation Rules"; | and Appendix D. "Ideal Decapsulation Rules"; | |||
skipping to change at page 9, line 18 | skipping to change at page 9, line 45 | |||
the ECN field, and the other which blocked all propagation of ECN | the ECN field, and the other which blocked all propagation of ECN | |||
changes. | changes. | |||
Unfortunately, this entirely reasonable sequence of standards actions | Unfortunately, this entirely reasonable sequence of standards actions | |||
resulted in a perverse outcome; non-IPsec tunnels (RFC3168) blocked | resulted in a perverse outcome; non-IPsec tunnels (RFC3168) blocked | |||
the 2-bit covert channel, while IPsec tunnels (RFC4301) did not--at | the 2-bit covert channel, while IPsec tunnels (RFC4301) did not--at | |||
least not at the ingress. At the egress, both IPsec and non-IPsec | least not at the ingress. At the egress, both IPsec and non-IPsec | |||
tunnels still partially restricted propagation of the full ECN field. | tunnels still partially restricted propagation of the full ECN field. | |||
The trigger for the changes in this document was the introduction of | The trigger for the changes in this document was the introduction of | |||
pre-congestion notification (PCN [I-D.ietf-pcn-marking-behaviour]) to | pre-congestion notification (PCN [RFC5670]) to the IETF standards | |||
the IETF standards track. PCN needs the ECN field to be copied at a | track. PCN needs the ECN field to be copied at a tunnel ingress and | |||
tunnel ingress and it needs four states of congestion signalling to | it needs four states of congestion signalling to be propagated at the | |||
be propagated at the egress, but pre-existing tunnels only propagate | egress, but pre-existing tunnels only propagate three in the ECN | |||
three in the ECN field. | field. | |||
This document draws on currently unused (CU) combinations of inner | This document draws on currently unused (CU) combinations of inner | |||
and outer headers to add tunnelling of four-state congestion | and outer headers to add tunnelling of four-state congestion | |||
signalling to RFC3168 and RFC4301. Operators of tunnels who | signalling to RFC3168 and RFC4301. Operators of tunnels who | |||
specifically want to support four states can require that all their | specifically want to support four states can require that all their | |||
tunnels comply with this specification. Nonetheless, all tunnel | tunnels comply with this specification. Nonetheless, all tunnel | |||
endpoint implementations (RFC4301, RFC3168, RFC2481, RFC2401, | endpoint implementations (RFC4301, RFC3168, RFC2481, RFC2401, | |||
RFC2003) can safely be updated to this new specification as part of | RFC2003) can safely be updated to this new specification as part of | |||
general code maintenance. This will gradually add support for four | general code maintenance. This will gradually add support for four | |||
congestion states to the Internet. Existing three state schemes will | congestion states to the Internet. Existing three state schemes will | |||
skipping to change at page 11, line 28 | skipping to change at page 12, line 4 | |||
Resetting ECN: On encapsulation, setting the ECN field of the new | Resetting ECN: On encapsulation, setting the ECN field of the new | |||
outer header to be a copy of the ECN field in the incoming header | outer header to be a copy of the ECN field in the incoming header | |||
except the outer ECN field is set to the ECT(0) codepoint if the | except the outer ECN field is set to the ECT(0) codepoint if the | |||
incoming ECN field is CE ("11"). | incoming ECN field is CE ("11"). | |||
3. Summary of Pre-Existing RFCs | 3. Summary of Pre-Existing RFCs | |||
This section is informative not normative, as it recaps pre-existing | This section is informative not normative, as it recaps pre-existing | |||
RFCs. Earlier relevant RFCs that were either experimental or | RFCs. Earlier relevant RFCs that were either experimental or | |||
incomplete with respect to ECN tunnelling (RFC2481, RFC2401 and | incomplete with respect to ECN tunnelling (RFC2481, RFC2401 and | |||
RFC2003) are briefly outlined inAppendix A. The question of whether | RFC2003) are briefly outlined in Appendix A. The question of whether | |||
tunnel implementations used in the Internet comply with any of these | tunnel implementations used in the Internet comply with any of these | |||
RFCs is not discussed. | RFCs is not discussed. | |||
3.1. Encapsulation at Tunnel Ingress | 3.1. Encapsulation at Tunnel Ingress | |||
At the encapsulator, the controversy has been over whether to | At the encapsulator, the controversy has been over whether to | |||
propagate information about congestion experienced on the path so far | propagate information about congestion experienced on the path so far | |||
into the outer header of the tunnel. | into the outer header of the tunnel. | |||
Specifically, RFC3168 says that, if a tunnel fully supports ECN | Specifically, RFC3168 says that, if a tunnel fully supports ECN | |||
skipping to change at page 13, line 45 | skipping to change at page 14, line 18 | |||
Inappropriate changes were not specifically enumerated. RFC4301 did | Inappropriate changes were not specifically enumerated. RFC4301 did | |||
not mention inappropriate ECN changes. | not mention inappropriate ECN changes. | |||
4. New ECN Tunnelling Rules | 4. New ECN Tunnelling Rules | |||
The standards actions below in Section 4.1 (ingress encapsulation) | The standards actions below in Section 4.1 (ingress encapsulation) | |||
and Section 4.2 (egress decapsulation) define new default ECN tunnel | and Section 4.2 (egress decapsulation) define new default ECN tunnel | |||
processing rules for any IP packet (v4 or v6) with any Diffserv | processing rules for any IP packet (v4 or v6) with any Diffserv | |||
codepoint. | codepoint. | |||
If absolutely necessary, an alternate congestion encapsulation | If unavoidable, an alternate congestion encapsulation behaviour can | |||
behaviour can be introduced as part of the definition of an alternate | be introduced as part of the definition of an alternate congestion | |||
congestion marking scheme used by a specific Diffserv PHB (see S.5 of | marking scheme used by a specific Diffserv PHB (see S.5 of [RFC3168] | |||
[RFC3168] and [RFC4774]). When designing such new encapsulation | and [RFC4774]). When designing such new encapsulation schemes, the | |||
schemes, the principles in Section 7 should be followed. However, | principles in Section 7 should be followed. However, alternate ECN | |||
alternate ECN tunnelling schemes are NOT RECOMMENDED as the | tunnelling schemes are NOT RECOMMENDED as the deployment burden of | |||
deployment burden of handling exceptional PHBs in implementations of | handling exceptional PHBs in implementations of all affected tunnels | |||
all affected tunnels should not be underestimated. There is no | should not be underestimated. There is no requirement for a PHB | |||
requirement for a PHB definition to state anything about ECN | definition to state anything about ECN tunnelling behaviour if the | |||
tunnelling behaviour if the default behaviour in the present | default behaviour in the present specification is sufficient. | |||
specification is sufficient. | ||||
4.1. Default Tunnel Ingress Behaviour | 4.1. Default Tunnel Ingress Behaviour | |||
Two modes of encapsulation are defined here; `normal mode' and | Two modes of encapsulation are defined here; `normal mode' and | |||
`compatibility mode', which is for backward compatibility with tunnel | `compatibility mode', which is for backward compatibility with tunnel | |||
decapsulators that do not understand ECN. Section 4.3 explains why | decapsulators that do not understand ECN. Section 4.3 explains why | |||
two modes are necessary and specifies the circumstances in which it | two modes are necessary and specifies the circumstances in which it | |||
is sufficient to solely implement normal mode. Note that these are | is sufficient to solely implement normal mode. Note that these are | |||
modes of the ingress tunnel endpoint only, not the whole tunnel. | modes of the ingress tunnel endpoint only, not the whole tunnel. | |||
skipping to change at page 15, line 11 | skipping to change at page 15, line 38 | |||
intersection of the appropriate incoming inner header (row) and outer | intersection of the appropriate incoming inner header (row) and outer | |||
header (column) in Figure 4 (the IPv4 header checksum also changes | header (column) in Figure 4 (the IPv4 header checksum also changes | |||
whenever the ECN field is changed). There is no need for more than | whenever the ECN field is changed). There is no need for more than | |||
one mode of decapsulation, as these rules cater for all known | one mode of decapsulation, as these rules cater for all known | |||
requirements. | requirements. | |||
+---------+------------------------------------------------+ | +---------+------------------------------------------------+ | |||
|Incoming | Incoming Outer Header | | |Incoming | Incoming Outer Header | | |||
| Inner +---------+------------+------------+------------+ | | Inner +---------+------------+------------+------------+ | |||
| Header | Not-ECT | ECT(0) | ECT(1) | CE | | | Header | Not-ECT | ECT(0) | ECT(1) | CE | | |||
+---------+---------+------------+------------+------------+ | +---------+---------+------------+------------+------------+ | |||
| Not-ECT | Not-ECT |Not-ECT(!!!)| drop(!!!)| drop(!!!)| | | Not-ECT | Not-ECT |Not-ECT(!!!)|Not-ECT(!!!)| drop(!!!)| | |||
| ECT(0) | ECT(0) | ECT(0) | ECT(1)(!!!)| CE | | | ECT(0) | ECT(0) | ECT(0) | ECT(1) | CE | | |||
| ECT(1) | ECT(1) | ECT(1)(!!!)| ECT(1) | CE | | | ECT(1) | ECT(1) | ECT(1) (!) | ECT(1) | CE | | |||
| CE | CE | CE | CE(!!!)| CE | | | CE | CE | CE | CE(!!!)| CE | | |||
+---------+---------+------------+------------+------------+ | +---------+---------+------------+------------+------------+ | |||
| Outgoing Header | | | Outgoing Header | | |||
+------------------------------------------------+ | +------------------------------------------------+ | |||
Unexpected combinations are indicated by '(!!!)' | Unexpected combinations are indicated by '(!!!)' | |||
Figure 4: New IP in IP Decapsulation Behaviour | Figure 4: New IP in IP Decapsulation Behaviour | |||
This table for decapsulation behaviour is derived from the following | This table for decapsulation behaviour is derived from the following | |||
logic: | logic: | |||
o If the inner ECN field is Not-ECT the decapsulator MUST NOT | o If the inner ECN field is Not-ECT the decapsulator MUST NOT | |||
propagate any other ECN codepoint onwards. This is because the | propagate any other ECN codepoint onwards. This is because the | |||
inner Not-ECT marking is set by transports that use drop as an | inner Not-ECT marking is set by transports that use drop as an | |||
indication of congestion and would not understand or respond to | indication of congestion and would not understand or respond to | |||
any other ECN codepoint [RFC4774]. In addition: | any other ECN codepoint [RFC4774]. In addition: | |||
* If the inner ECN field is Not-ECT and the outer ECN field is | * If the inner ECN field is Not-ECT and the outer ECN field is CE | |||
ECT(1) or CE the decapsulator MUST drop the packet. | the decapsulator MUST drop the packet. | |||
* If the inner ECN field is Not-ECT and the outer ECN field is | * If the inner ECN field is Not-ECT and the outer ECN field is | |||
ECT(0) or Not-ECT the decapsulator MUST forward the outgoing | Not-ECT, ECT(0) or ECT(1) the decapsulator MUST forward the | |||
packet with the ECN field cleared to Not-ECT. | outgoing packet with the ECN field cleared to Not-ECT. | |||
* This specification mandates that any future standards action | ||||
SHOULD NOT use the ECT(0) codepoint as an indication of | ||||
congestion, without giving strong reasons, given the above rule | ||||
forwards an ECT(0) outer as Not-ECT. | ||||
o In all other cases where the inner supports ECN, the outgoing ECN | o In all other cases where the inner supports ECN, the outgoing ECN | |||
field is set to the more severe marking of the outer and inner ECN | field is set to the more severe marking of the outer and inner ECN | |||
fields, where the ranking of severity from highest to lowest is | fields, where the ranking of severity from highest to lowest is | |||
CE, ECT(1), ECT(0), Not-ECT. This in no way precludes cases where | CE, ECT(1), ECT(0), Not-ECT. This in no way precludes cases where | |||
ECT(1) and ECT(0) have the same severity; | ECT(1) and ECT(0) have the same severity; | |||
o Certain combinations of inner and outer ECN fields cannot result | o Certain combinations of inner and outer ECN fields cannot result | |||
from any currently used transition in any current or previous ECN | from any currently used transition in any current or previous ECN | |||
tunneling specification. These cases are indicated in Figure 4 by | tunneling specification. These cases are indicated in Figure 4 by | |||
'(!!!)'). In these cases, the decapsulator SHOULD log the event | '(!!!)' or '(!)', where '(!!!)' means the combination is both | |||
and MAY also raise an alarm. Alarms should be rate-limited so | invalid and always potentially dangerous, while '(!)' means it is | |||
that the illegal combinations will not amplify into a flood of | invalid and possibly dangerous. In these cases, particularly the | |||
alarm messages. It MUST be possible to suppress alarms or | more dangerous ones, the decapsulator SHOULD log the event and MAY | |||
logging, e.g. if it becomes apparent that a combination that | also raise an alarm. Just because the highlighted combinations | |||
previously was not used has started to be used for legitimate | are always invalid, does not mean that all the other combinations | |||
purposes such as a new standards action. An example is an ECT(0) | are always valid. Some are only valid if they have arrived from a | |||
inner combined with an ECT(1) outer, which is proposed as a legal | particular type of legacy ingress, and dangerous otherwise. | |||
combination for PCN [I-D.ietf-pcn-3-in-1-encoding], so an operator | Therefore an implementation MAY allow an operator to configure | |||
that deploys support for PCN should turn off logging and alarms in | logging and alarms for such additional header combinations known | |||
this case. | to be dangerous or invalid for the particular configuration of | |||
tunnel endpoints deployed at run-time. | ||||
Alarms should be rate-limited so that the illegal combinations | ||||
will not amplify into a flood of alarm messages. It MUST be | ||||
possible to suppress alarms or logging, e.g. if it becomes | ||||
apparent that a combination that previously was not used has | ||||
started to be used for legitimate purposes such as a new standards | ||||
action. | ||||
The above logic allows for ECT(0) and ECT(1) to both represent the | The above logic allows for ECT(0) and ECT(1) to both represent the | |||
same severity of congestion marking (e.g. "not congestion marked"). | same severity of congestion marking (e.g. "not congestion marked"). | |||
But it also allows future schemes to be defined where ECT(1) is a | But it also allows future schemes to be defined where ECT(1) is a | |||
more severe marking than ECT(0), in particular enabling the simplest | more severe marking than ECT(0), in particular enabling the simplest | |||
possible encoding for PCN [I-D.ietf-pcn-3-in-1-encoding]. This | possible encoding for PCN [I-D.ietf-pcn-3-in-1-encoding]. This | |||
approach is discussed in Appendix D and in the discussion of the ECN | approach is discussed in Appendix D and in the discussion of the ECN | |||
nonce [RFC3540] in Section 9, which in turn refers to Appendix F. | nonce [RFC3540] in Section 8, which in turn refers to Appendix F. | |||
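The decapsulation logic in the right-hand (-05) column of Figure 4 can be sketched as a small lookup. This is only an illustrative sketch, not code from either draft: the names `ECN`, `SEVERITY`, `FLAGGED` and `decapsulate` are hypothetical, the two-bit codepoint values are taken from RFC3168, and a dropped packet is represented here by `None`.

```python
from enum import Enum


class ECN(Enum):
    # Two-bit ECN codepoints as assigned in RFC3168.
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


# Severity ranking from the text: CE > ECT(1) > ECT(0) > Not-ECT.
# Kept separate from the bit values, which do not follow this order.
SEVERITY = {ECN.NOT_ECT: 0, ECN.ECT0: 1, ECN.ECT1: 2, ECN.CE: 3}

# (inner, outer) combinations highlighted in the -05 column of Figure 4.
FLAGGED = {
    (ECN.NOT_ECT, ECN.ECT0): "!!!",
    (ECN.NOT_ECT, ECN.ECT1): "!!!",
    (ECN.NOT_ECT, ECN.CE): "!!!",
    (ECN.CE, ECN.ECT1): "!!!",
    (ECN.ECT1, ECN.ECT0): "!",
}


def decapsulate(inner, outer):
    """Return (outgoing codepoint, flag); an outgoing value of None means drop."""
    flag = FLAGGED.get((inner, outer))
    if inner is ECN.NOT_ECT:
        # A Not-ECT inner never propagates another codepoint onwards;
        # only a CE outer causes a drop in the -05 rules.
        outgoing = None if outer is ECN.CE else ECN.NOT_ECT
    else:
        # Otherwise forward the more severe of the inner and outer markings.
        outgoing = max(inner, outer, key=SEVERITY.__getitem__)
    return outgoing, flag
```

A caller would typically log, and perhaps raise a rate-limited alarm, whenever the returned flag is not `None`. How that is wired up, and how it is suppressed, is left to the implementation as the text describes.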
4.3. Encapsulation Modes | 4.3. Encapsulation Modes | |||
Section 4.1 introduces two encapsulation modes, normal mode and | Section 4.1 introduces two encapsulation modes, normal mode and | |||
compatibility mode, defining their encapsulation behaviour (i.e. | compatibility mode, defining their encapsulation behaviour (i.e. | |||
header copying or zeroing respectively). Note that these are modes | header copying or zeroing respectively). Note that these are modes | |||
of the ingress tunnel endpoint only, not the tunnel as a whole. | of the ingress tunnel endpoint only, not the tunnel as a whole. | |||
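As a rough illustration of the ingress side, the two encapsulation modes differ only in what is written into the outer ECN field. The sketch below is hypothetical (the function name `encapsulate` and the mode strings are assumptions, not taken from the draft) and takes "zeroing" to mean setting the outer field to Not-ECT. The "reset" branch is not one of the two modes; it shows the "Resetting ECN" behaviour defined in the Terminology section, included only for contrast.

```python
# Two-bit ECN codepoints as assigned in RFC3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn, mode="normal"):
    """Return the ECN field for the new outer header (illustrative only)."""
    if mode == "normal":
        # Normal mode: copy the incoming ECN field to the outer header.
        return inner_ecn
    if mode == "compatibility":
        # Compatibility mode: zero the outer ECN field (Not-ECT).
        return NOT_ECT
    if mode == "reset":
        # "Resetting ECN": copy, except that CE becomes ECT(0).
        return ECT0 if inner_ecn == CE else inner_ecn
    raise ValueError(f"unknown encapsulation mode: {mode!r}")
```

In every case the inner header is left untouched. Only the outer header constructed by the ingress differs.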
A tunnel ingress MUST at least implement `normal mode' and, if it | A tunnel ingress MUST at least implement `normal mode' and, if it | |||
might be used with legacy tunnel egress nodes (RFC2003, RFC2401 or | might be used with legacy tunnel egress nodes (RFC2003, RFC2401 or | |||
skipping to change at page 18, line 44 | skipping to change at page 19, line 27 | |||
5. Updates to Earlier RFCs | 5. Updates to Earlier RFCs | |||
5.1. Changes to RFC4301 ECN processing | 5.1. Changes to RFC4301 ECN processing | |||
Ingress: An RFC4301 IPsec encapsulator is not changed at all by the | Ingress: An RFC4301 IPsec encapsulator is not changed at all by the | |||
present specification. | present specification. | |||
Egress: The new decapsulation behaviour in Figure 4 updates RFC4301. | Egress: The new decapsulation behaviour in Figure 4 updates RFC4301. | |||
However, it solely updates combinations of inner and outer that | However, it solely updates combinations of inner and outer that | |||
have never been used on the Internet, even though they were | would never result from any protocol defined in the RFC series so | |||
defined in RFC4301 for completeness. Therefore, the present | far, even though they were catered for in RFC4301 for | |||
specification adds new behaviours to RFC4301 decapsulation without | completeness. Therefore, the present specification adds new | |||
altering existing behaviours. The following specific updates have | behaviours to RFC4301 decapsulation without altering existing | |||
been made: | behaviours. The following specific updates have been made: | |||
* The outer, not the inner, is propagated when the outer is | * The outer, not the inner, is propagated when the outer is | |||
ECT(1) and the inner is ECT(0); | ECT(1) and the inner is ECT(0); | |||
* A packet with Not-ECT in the inner and an outer of ECT(1) or CE | * A packet with Not-ECT in the inner and an outer of CE is | |||
is dropped rather than forwarded as Not-ECT; | dropped rather than forwarded as Not-ECT; | |||
* Certain combinations of inner and outer ECN field have been | * Certain combinations of inner and outer ECN field have been | |||
identified as currently unused. These can trigger logging | identified as currently unused. These can trigger logging | |||
and/or raise alarms. | and/or raise alarms. | |||
Modes: RFC4301 does not need modes and is not updated by the modes | Modes: RFC4301 does not need modes and is not updated by the modes | |||
in the present specification. The normal mode of encapsulation is | in the present specification. The normal mode of encapsulation is | |||
unchanged from RFC4301 encapsulation and an RFC4301 IPsec ingress | unchanged from RFC4301 encapsulation and an RFC4301 IPsec ingress | |||
will never need compatibility mode as explained in Section 4.3 | will never need compatibility mode as explained in Section 4.3 | |||
(except in one corner-case described below). | (except in one corner-case described below). | |||
One corner case can exist where an RFC4301 ingress does not use | One corner case can exist where an RFC4301 ingress does not use | |||
IKEv2, but uses manual keying instead. Then an RFC4301 ingress | IKEv2, but uses manual keying instead. Then an RFC4301 ingress | |||
could conceivably be configured to tunnel to an egress with | could conceivably be configured to tunnel to an egress with | |||
limited functionality ECN handling. Strictly, for this corner- | limited functionality ECN handling. Strictly, for this corner- | |||
case, the requirement to use compatibility mode in this | case, the requirement to use compatibility mode in this | |||
specification updates RFC4301. However, this is such a remote | specification updates RFC4301. However, this is such a remote | |||
possibility that in general RFC4301 IPsec implementations are NOT | possibility that RFC4301 IPsec implementations are NOT REQUIRED to | |||
REQUIRED to implement compatibility mode. | implement compatibility mode. | |||
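The egress updates listed above, as they read in the -05 (right-hand) column, can be illustrated with the minimal sketch below. It is not the normative Figure 4, which is not reproduced in this excerpt. The names ECN and decapsulate are hypothetical. Two points are assumptions drawn from the surrounding text rather than stated rules: that an ECN-capable inner is resolved by taking the more severe of inner and outer, ordered ECT(0) < ECT(1) < CE, and that a Not-ECT outer carries no marking. Only the combinations with a Not-ECT inner and a different outer are flagged as unexpected, because those are the only unused combinations this excerpt names.

```python
from enum import Enum
from typing import Optional, Tuple


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field (values per RFC3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


# Assumed severity ordering for this sketch: ECT(0) < ECT(1) < CE.
# A Not-ECT outer is treated as carrying no marking at all.
_SEVERITY = {ECN.NOT_ECT: 0, ECN.ECT0: 1, ECN.ECT1: 2, ECN.CE: 3}


def decapsulate(inner: ECN, outer: ECN) -> Tuple[Optional[ECN], bool]:
    """Return (outgoing codepoint, unexpected).

    The outgoing codepoint is None when the packet is dropped.
    `unexpected` is True for a combination an implementation might
    choose to log or raise an alarm for.
    """
    if inner is ECN.NOT_ECT:
        # The transport only understands drop as a congestion signal.
        unexpected = outer is not ECN.NOT_ECT
        if outer is ECN.CE:
            return None, True          # drop rather than forward as Not-ECT
        return ECN.NOT_ECT, unexpected  # forward the inner unchanged

    # ECN-capable inner: propagate the more severe of inner and outer,
    # so an outer of ECT(1) replaces an inner of ECT(0).
    outgoing = max(inner, outer, key=_SEVERITY.__getitem__)
    return outgoing, False
```

For example, decapsulate(ECN.ECT0, ECN.ECT1) yields ECT(1) as the outgoing codepoint, whereas decapsulate(ECN.NOT_ECT, ECN.CE) signals a drop. Under the -04 text in the left-hand column, a Not-ECT inner with an ECT(1) outer would also have been dropped.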
5.2. Changes to RFC3168 ECN processing | 5.2. Changes to RFC3168 ECN processing | |||
Ingress: On encapsulation, the new rule in Figure 3 that a normal | Ingress: On encapsulation, the new rule in Figure 3 that a normal | |||
mode tunnel ingress copies any ECN field into the outer header | mode tunnel ingress copies any ECN field into the outer header | |||
updates the ingress behaviour of RFC3168. Nonetheless, the new | updates the ingress behaviour of RFC3168. Nonetheless, the new | |||
compatibility mode is identical to the limited functionality mode | compatibility mode is identical to the limited functionality mode | |||
of RFC3168. | of RFC3168. | |||
Egress: The new decapsulation behaviour in Figure 4 updates RFC3168. | Egress: The new decapsulation behaviour in Figure 4 updates RFC3168. | |||
However, the present specification solely updates combinations of | However, the present specification solely updates combinations of | |||
inner and outer that have never been used on the Internet, even | inner and outer that would never result from any protocol defined | |||
though they were defined in RFC3168 for completeness. Therefore, | in the RFC series so far, even though they were catered for in | |||
the present specification adds new behaviours to RFC3168 | RFC3168 for completeness. Therefore, the present specification | |||
decapsulation without altering existing behaviours. The following | adds new behaviours to RFC3168 decapsulation without altering | |||
specific updates have been made: | existing behaviours. The following specific updates have been | |||
made: | ||||
* The outer, not the inner, is propagated when the outer is | * The outer, not the inner, is propagated when the outer is | |||
ECT(1) and the inner is ECT(0); | ECT(1) and the inner is ECT(0); | |||
* A packet with Not-ECT in the inner and an outer of ECT(1) is | ||||
dropped rather than forwarded as Not-ECT; | ||||
* Certain combinations of inner and outer ECN field have been | * Certain combinations of inner and outer ECN field have been | |||
identified as currently unused. These can trigger logging | identified as currently unused. These can trigger logging | |||
and/or raise alarms. | and/or raise alarms. | |||
Modes: RFC3168 defines a (required) limited functionality mode and | Modes: RFC3168 defines a (required) limited functionality mode and | |||
an (optional) full functionality mode for a tunnel. In RFC3168, | an (optional) full functionality mode for a tunnel. In RFC3168, | |||
modes applied to both ends of the tunnel, while in the present | modes applied to both ends of the tunnel, while in the present | |||
specification, modes are only used at the ingress--a single egress | specification, modes are only used at the ingress--a single egress | |||
behaviour covers all cases. The normal mode of encapsulation | behaviour covers all cases. The normal mode of encapsulation | |||
updates the encapsulation behaviour of the full functionality mode | updates the encapsulation behaviour of the full functionality mode | |||
skipping to change at page 20, line 38 | skipping to change at page 21, line 19 | |||
both RFC4301 IPsec [RFC4301] and IP in MPLS or MPLS in MPLS | both RFC4301 IPsec [RFC4301] and IP in MPLS or MPLS in MPLS | |||
encapsulation [RFC5129] construct the ECN field. | encapsulation [RFC5129] construct the ECN field. | |||
Compatibility mode has also been defined so a non-RFC4301 ingress can | Compatibility mode has also been defined so a non-RFC4301 ingress can | |||
still switch to using drop across a tunnel for backwards | still switch to using drop across a tunnel for backwards | |||
compatibility with legacy decapsulators that do not propagate ECN | compatibility with legacy decapsulators that do not propagate ECN | |||
correctly. | correctly. | |||
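The two ingress modes can be sketched as below: normal mode copies the inner ECN field into the outer header, and compatibility mode zeroes it. This is an illustration only. The names ECN, IngressMode and encapsulate are hypothetical, and the choice of mode is taken as an input because this excerpt does not cover how an ingress decides between them.

```python
from enum import Enum


class ECN(Enum):
    """The four codepoints of the 2-bit ECN field (values per RFC3168)."""
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


class IngressMode(Enum):
    """Modes of the tunnel ingress only, not of the tunnel as a whole."""
    NORMAL = "normal"
    COMPATIBILITY = "compatibility"


def encapsulate(inner: ECN, mode: IngressMode) -> ECN:
    """Return the ECN field to write into the outer header."""
    if mode is IngressMode.COMPATIBILITY:
        # Legacy egress may not propagate ECN correctly, so the outer
        # is cleared and congestion is signalled by drop across the tunnel.
        return ECN.NOT_ECT
    # Normal mode: copy any inner ECN field, including CE, so the outer
    # reveals congestion experienced so far on the whole path.
    return inner
```

In either mode the inner header is left untouched by the ingress; only the outer field differs.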
The trigger that motivated this update to RFC3168 encapsulation was a | The trigger that motivated this update to RFC3168 encapsulation was a | |||
standards track proposal for pre-congestion notification (PCN | standards track proposal for pre-congestion notification (PCN | |||
[I-D.ietf-pcn-marking-behaviour]). PCN excess rate marking only | [RFC5670]). PCN excess rate marking only works correctly if the ECN | |||
works correctly if the ECN field is copied on encapsulation (as in | field is copied on encapsulation (as in RFC4301 and RFC5129); it does | |||
RFC4301 and RFC5129); it does not work if ECN is reset (as in | not work if ECN is reset (as in RFC3168). This is because PCN excess | |||
RFC3168). This is because PCN excess rate marking depends on the | rate marking depends on the outer header revealing any congestion | |||
outer header revealing any congestion experienced so far on the whole | experienced so far on the whole path, not just since the last tunnel | |||
path, not just since the last tunnel ingress (see Appendix E for a | ingress (see Appendix E for a full explanation). | |||
full explanation). | ||||
PCN allows a network operator to add flow admission and termination | PCN allows a network operator to add flow admission and termination | |||
for inelastic traffic at the edges of a Diffserv domain, but without | for inelastic traffic at the edges of a Diffserv domain, but without | |||
any per-flow mechanisms in the interior and without the generous | any per-flow mechanisms in the interior and without the generous | |||
provisioning typical of Diffserv, aiming to significantly reduce | provisioning typical of Diffserv, aiming to significantly reduce | |||
costs. The PCN architecture [RFC5559] states that RFC3168 IP in IP | costs. The PCN architecture [RFC5559] states that RFC3168 IP in IP | |||
tunnelling of the ECN field cannot be used for any tunnel ingress in | tunnelling of the ECN field cannot be used for any tunnel ingress in | |||
a PCN domain. Prior to the present specification, this left a stark | a PCN domain. Prior to the present specification, this left a stark | |||
choice between not being able to use PCN for inelastic traffic | choice between not being able to use PCN for inelastic traffic | |||
control or not being able to use the many tunnels already deployed | control or not being able to use the many tunnels already deployed | |||
skipping to change at page 21, line 45 | skipping to change at page 22, line 26 | |||
preferable. | preferable. | |||
o From the traffic security perspective (enforcing congestion | o From the traffic security perspective (enforcing congestion | |||
control, mitigating denial of service etc.) copying is preferable. | control, mitigating denial of service etc.) copying is preferable. | |||
o From the information security perspective resetting is preferable, | o From the information security perspective resetting is preferable, | |||
but the IETF Security Area now considers copying acceptable given | but the IETF Security Area now considers copying acceptable given | |||
the bandwidth of a 2-bit covert channel can be managed. | the bandwidth of a 2-bit covert channel can be managed. | |||
Therefore there are two points against resetting CE on ingress while | Therefore there are two points against resetting CE on ingress while | |||
copying CE causes no harm (other than opening a 2-bit covert channel | copying CE causes no significant harm. | |||
that is deemed manageable). | ||||
5.3.2. Motivation for Changing Decapsulation | 5.3.2. Motivation for Changing Decapsulation | |||
The specification for decapsulation in Section 4 fixes three problems | The specification for decapsulation in Section 4 fixes three problems | |||
with the pre-existing behaviours of both RFC3168 and RFC4301: | with the pre-existing behaviours of both RFC3168 and RFC4301: | |||
1. The pre-existing rules prevented the introduction of alternate | 1. The pre-existing rules prevented the introduction of alternate | |||
ECN semantics to signal more than one severity level of | ECN semantics to signal more than one severity level of | |||
congestion [RFC4774], [RFC5559]. The four states of the 2-bit | congestion [RFC4774], [RFC5559]. The four states of the 2-bit | |||
ECN field provide room for signalling two severity levels in | ECN field provide room for signalling two severity levels in | |||
skipping to change at page 23, line 8 | skipping to change at page 23, line 36 | |||
the box was deployed, often on the grounds that anything | the box was deployed, often on the grounds that anything | |||
unexpected might be an attack. This tends to bar future use of | unexpected might be an attack. This tends to bar future use of | |||
CU values. The new decapsulation rules specify optional logging | CU values. The new decapsulation rules specify optional logging | |||
and/or alarms for specific combinations of inner and outer header | and/or alarms for specific combinations of inner and outer header | |||
that are currently unused. The aim is to give implementers a | that are currently unused. The aim is to give implementers a | |||
recourse other than drop if they are concerned about the security | recourse other than drop if they are concerned about the security | |||
of CU values. It recognises legitimate security concerns about | of CU values. It recognises legitimate security concerns about | |||
CU values but still eases their future use. If the alarms are | CU values but still eases their future use. If the alarms are | |||
interpreted as an attack (e.g. by a management system) the | interpreted as an attack (e.g. by a management system) the | |||
offending packets can be dropped. But alarms can be turned off | offending packets can be dropped. But alarms can be turned off | |||
if these combinations come into use (e.g. through a future | if these combinations come into regular use (e.g. through a | |||
standards action). | future standards action). | |||
3. While reviewing currently unused combinations of inner and outer, | 3. While reviewing currently unused combinations of inner and outer, | |||
the opportunity was taken to define a single consistent behaviour | the opportunity was taken to define a single consistent behaviour | |||
for the cases with a Not-ECT inner header but a different outer. | for the three cases with a Not-ECT inner header but a different | |||
RFC3168 and RFC4301 had diverged in this respect. These | outer. RFC3168 and RFC4301 had diverged in this respect. None | |||
combinations should not result from known Internet protocols. | of these combinations should result from Internet protocols in | |||
So, for safety, it was decided to drop a packet if the outer | the RFC series, but future standards actions might put any or all | |||
carries codepoints CE or ECT(1) that respectively signal | of them to good use. Therefore it was decided that a | |||
congestion or could potentially signal congestion in a scheme | decapsulator must forward a Not-ECT inner unchanged, even if the | |||
progressing through the IETF [I-D.ietf-pcn-3-in-1-encoding]. | arriving outer was ECT(0) or ECT(1). But for safety it should | |||
Given an inner of Not-ECT implies the transport only understands | drop the Not-ECT inner if the arriving outer was CE. Then, if | |||
drop as a signal of congestion, this was the safest course of | some unfortunate misconfiguration resulted in a congested router | |||
action. | marking CE on a packet that was originally Not-ECT, drop would be | |||
the only appropriate signal for the egress to propagate--the only | ||||
signal a non-ECN-capable transport (Not-ECT) would understand. | ||||
ECT(1) is being proposed as an intermediate level of congestion | ||||
in a scheme progressing through the IETF | ||||
[I-D.ietf-pcn-3-in-1-encoding]. But it was decided that it would | ||||
still be safe to mandate forwarding as Not-ECT for a Not-ECT | ||||
inner with an ECT(1) outer, thus keeping this combination | ||||
available for future use. The rationale was as follows: if any | ||||
misconfiguration led to ECT(1) congestion signals with a Not-ECT | ||||
inner, it would be safe for the egress to suppress these signals. | ||||
This is because the congestion would then escalate to CE marking, | ||||
which the egress would drop, thus avoiding any risk of congestion | ||||
collapse. | ||||
Problems 2 & 3 alone would not warrant a change to decapsulation, but | Problems 2 & 3 alone would not warrant a change to decapsulation, but | |||
it was decided they are worth fixing and making consistent at the | it was decided they are worth fixing and making consistent at the | |||
same time as decapsulation code is changed to fix problem 1 (two | same time as decapsulation code is changed to fix problem 1 (two | |||
congestion severity-levels). | congestion severity-levels). | |||
6. Backward Compatibility | 6. Backward Compatibility | |||
A tunnel endpoint compliant with the present specification is | A tunnel endpoint compliant with the present specification is | |||
backward compatible when paired with any tunnel endpoint compliant | backward compatible when paired with any tunnel endpoint compliant | |||
skipping to change at page 25, line 13 | skipping to change at page 26, line 7 | |||
ECN (limited functionality mode) if it is paired with a legacy egress | ECN (limited functionality mode) if it is paired with a legacy egress | |||
(RFC2481, RFC2401 or RFC2003), which would not propagate ECN | (RFC2481, RFC2401 or RFC2003), which would not propagate ECN | |||
correctly. The present specification carries forward those rules | correctly. The present specification carries forward those rules | |||
(Section 4.3). It uses compatibility mode whenever RFC3168 would | (Section 4.3). It uses compatibility mode whenever RFC3168 would | |||
have used limited functionality mode, and their per-packet behaviours | have used limited functionality mode, and their per-packet behaviours | |||
are identical. Therefore, all other things being equal, an ingress | are identical. Therefore, all other things being equal, an ingress | |||
using the new rules will interwork with any legacy tunnel egress in | using the new rules will interwork with any legacy tunnel egress in | |||
exactly the same way as an RFC3168 ingress (still black-box backward | exactly the same way as an RFC3168 ingress (still black-box backward | |||
compatible). | compatible). | |||
7. Design Principles for Future Non-Default Schemes | 7. Design Principles for Alternate ECN Tunnelling Semantics | |||
This section is informative, not normative. | This section is informative, not normative. | |||
S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to | S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to | |||
'switch in' alternative behaviours for marking the ECN field, just as | 'switch in' alternative behaviours for marking the ECN field, just as | |||
it switches in different per-hop behaviours (PHBs) for scheduling. | it switches in different per-hop behaviours (PHBs) for scheduling. | |||
[RFC4774] gives best current practice for designing such alternative | [RFC4774] gives best current practice for designing such alternative | |||
ECN semantics and very briefly mentions that tunnelling should be | ECN semantics and very briefly mentions in section 5.4 that | |||
considered. Here we give additional guidance on designing alternate | tunnelling should be considered. The guidance below extends RFC4774, | |||
ECN semantics that would also require alternate tunnelling semantics. | giving additional guidance on designing any alternate ECN semantics | |||
that would also require alternate tunnelling semantics. | ||||
In one word the guidance is "Don't". If a scheme requires tunnels to | The overriding guidance is: "Avoid designing alternate ECN tunnelling | |||
semantics, if at all possible." If a scheme requires tunnels to | ||||
implement special processing of the ECN field for certain DSCPs, it | implement special processing of the ECN field for certain DSCPs, it | |||
is highly unlikely that every implementer of every tunnel will want | will be hard to guarantee that every implementer of every tunnel will | |||
to add the required exception and that operators will want to deploy | have added the required exception or that operators will have | |||
the required configuration options. Therefore it is highly likely | ubiquitously deployed the required updates. It is unlikely a single | |||
that some tunnels within a network will not implement the required | authority is even aware of all the tunnels in a network, which may | |||
special case. Therefore, designers of new protocols should avoid | include tunnels set up by applications between endpoints, or | |||
non-default tunnelling schemes if at all possible. | dynamically created in the network. Therefore it is highly likely | |||
that some tunnels within a network or on hosts connected to it will | ||||
not implement the required special case. | ||||
That said, if a non-default scheme for tunnelling the ECN field is | That said, if a non-default scheme for tunnelling the ECN field is | |||
really required, the following guidelines may prove useful in its | really required, the following guidelines may prove useful in its | |||
design: | design: | |||
On encapsulation in any new scheme: | On encapsulation in any new scheme: | |||
1. The ECN field of the outer header should be cleared to Not-ECT | 1. The ECN field of the outer header should be cleared to Not-ECT | |||
("00") unless it is guaranteed that the corresponding tunnel | ("00") unless it is guaranteed that the corresponding tunnel | |||
egress will correctly propagate congestion markings introduced | egress will correctly propagate congestion markings introduced | |||
skipping to change at page 26, line 25 | skipping to change at page 27, line 23 | |||
Then the code module doing encapsulation can keep to the | Then the code module doing encapsulation can keep to the | |||
copying rule and the load regulator module can reset | copying rule and the load regulator module can reset | |||
congestion, without any code in either module being | congestion, without any code in either module being | |||
conditional on whether the other is there. | conditional on whether the other is there. | |||
On decapsulation in any new scheme: | On decapsulation in any new scheme: | |||
1. If the arriving inner header is Not-ECT it implies the | 1. If the arriving inner header is Not-ECT it implies the | |||
transport will not understand other ECN codepoints. If the | transport will not understand other ECN codepoints. If the | |||
outer header carries an explicit congestion marking, the | outer header carries an explicit congestion marking, the | |||
packet should be dropped--the only indication of congestion | alternate scheme will probably need to drop the packet--the | |||
the transport will understand. If the outer carries any other | only indication of congestion the transport will understand. | |||
ECN codepoint the packet can be forwarded, but only as Not- | If the outer carries any other ECN codepoint that does not | |||
ECT. | indicate congestion, the alternate scheme can forward the | |||
packet, but probably only as Not-ECT. | ||||
2. If the arriving inner header is other than Not-ECT, the ECN | 2. If the arriving inner header is other than Not-ECT, the ECN | |||
field that the tunnel egress forwards should reflect the more | field that the alternate decapsulation scheme forwards should | |||
severe congestion marking of the arriving inner and outer | reflect the more severe congestion marking of the arriving | |||
headers. | inner and outer headers. | |||
3. If a combination of inner and outer headers is encountered | 3. Any alternate scheme MUST define a behaviour for all | |||
that is not currently used in known standards, this event | combinations of inner and outer headers, even those that would | |||
should be logged and an alarm raised. This is a preferable | not be expected to result from standards known at the time or | |||
approach to dropping currently unused combinations in case | from the expected behaviour of the tunnel ingress paired with | |||
they represent an attack. The new scheme should try to define | the egress at run-time. Consideration should be given to | |||
a way to forward such packets, but only if a safe outgoing | logging such unexpected combinations and raising an alarm, | |||
codepoint can be defined. | particularly if there is a danger that the invalid combination | |||
implies congestion signals are not being propagated correctly. | ||||
The presence of currently unused combinations may represent an | ||||
attack, but the new scheme should try to define a way to | ||||
forward such packets, at least if a safe outgoing codepoint | ||||
can be defined. Raising an alarm to warn of the possibility | ||||
of an attack is a preferable approach to dropping that ensures | ||||
these combinations can be usable in future standards actions. | ||||
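Guideline 3 in the -05 column requires an alternate scheme to define a behaviour for every combination of inner and outer header. A designer could check a draft decapsulation table for gaps with a helper along the following lines. The table representation (a dictionary keyed by inner and outer codepoint names) and the function name missing_combinations are assumptions made for this sketch.

```python
from itertools import product
from typing import Dict, List, Tuple

# The four ECN codepoints, named as in the text.
CODEPOINTS = ("Not-ECT", "ECT(0)", "ECT(1)", "CE")

# A table maps (inner, outer) to an outcome: an outgoing codepoint
# name, or the string "drop". The outcome values are not checked here.
Table = Dict[Tuple[str, str], str]


def missing_combinations(table: Table) -> List[Tuple[str, str]]:
    """List every (inner, outer) pair for which no behaviour is defined."""
    return [
        (inner, outer)
        for inner, outer in product(CODEPOINTS, repeat=2)
        if (inner, outer) not in table
    ]


# Usage: a deliberately incomplete table that only covers a Not-ECT inner.
partial: Table = {("Not-ECT", outer): "Not-ECT" for outer in CODEPOINTS}
partial[("Not-ECT", "CE")] = "drop"
gaps = missing_combinations(partial)  # the 12 pairs with an ECN-capable inner
```

An empty result means all 16 combinations have a defined behaviour, including those not expected to result from standards known at the time.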
8. IANA Considerations | IANA Considerations (to be removed on publication): | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
9. Security Considerations | 8. Security Considerations | |||
Appendix B.1 discusses the security constraints imposed on ECN tunnel | Appendix B.1 discusses the security constraints imposed on ECN tunnel | |||
processing. The new rules for ECN tunnel processing (Section 4) | processing. The new rules for ECN tunnel processing (Section 4) | |||
trade off between information security (covert channels) and | trade off between information security (covert channels) and | |||
congestion monitoring & control. In fact, ensuring congestion | congestion monitoring & control. In fact, ensuring congestion | |||
markings are not lost is itself another aspect of security, because | markings are not lost is itself another aspect of security, because | |||
if we allowed congestion notification to be lost, any attempt to | if we allowed congestion notification to be lost, any attempt to | |||
enforce a response to congestion would be much harder. | enforce a response to congestion would be much harder. | |||
Specialist security issues: | Specialist security issues: | |||
skipping to change at page 28, line 8 | skipping to change at page 29, line 19 | |||
'I' will set all ECN fields in outer headers to Not-ECT, 'M' could | 'I' will set all ECN fields in outer headers to Not-ECT, 'M' could | |||
still toggle CE or ECT(1) on and off to communicate covertly with | still toggle CE or ECT(1) on and off to communicate covertly with | |||
'B', because we have specified that 'E' only has one mode | 'B', because we have specified that 'E' only has one mode | |||
regardless of what mode it says it has negotiated. We could have | regardless of what mode it says it has negotiated. We could have | |||
specified that 'E' should have a limited functionality mode and | specified that 'E' should have a limited functionality mode and | |||
check for such behaviour. But we decided not to add the extra | check for such behaviour. But we decided not to add the extra | |||
complexity of two modes on a compliant tunnel egress merely to | complexity of two modes on a compliant tunnel egress merely to | |||
cater for an historic security concern that is now considered | cater for an historic security concern that is now considered | |||
manageable. | manageable. | |||
10. Conclusions | 9. Conclusions | |||
This document uses previously unused combinations of inner and outer | This document uses previously unused combinations of inner and outer | |||
header to augment the rules for calculating the ECN field when | header to augment the rules for calculating the ECN field when | |||
decapsulating IP packets at the egress of IPsec (RFC4301) and non- | decapsulating IP packets at the egress of IPsec (RFC4301) and non- | |||
IPsec (RFC3168) tunnels. In this way it allows tunnels to propagate | IPsec (RFC3168) tunnels. In this way it allows tunnels to propagate | |||
an extra level of congestion severity. | an extra level of congestion severity. | |||
This document also updates the ingress tunnelling encapsulation of | This document also updates the ingress tunnelling encapsulation of | |||
RFC3168 ECN to bring all IP in IP tunnels into line with the new | RFC3168 ECN to bring all IP in IP tunnels into line with the new | |||
behaviour in the IPsec architecture of RFC4301, which copies rather | behaviour in the IPsec architecture of RFC4301, which copies rather | |||
skipping to change at page 28, line 33 | skipping to change at page 29, line 44 | |||
standards track. Operators wanting to support PCN or other alternate | standards track. Operators wanting to support PCN or other alternate | |||
ECN schemes that use an extra severity level can require that their | ECN schemes that use an extra severity level can require that their | |||
tunnels comply with the present specification. Nonetheless, as part | tunnels comply with the present specification. Nonetheless, as part | |||
of general code maintenance, any tunnel can safely be updated to | of general code maintenance, any tunnel can safely be updated to | |||
comply with this specification, because it is backward compatible | comply with this specification, because it is backward compatible | |||
with all previous tunnelling behaviours which will continue to work | with all previous tunnelling behaviours which will continue to work | |||
as before--just using one severity level. | as before--just using one severity level. | |||
The new rules propagate changes to the ECN field across tunnel end- | The new rules propagate changes to the ECN field across tunnel end- | |||
points that previously blocked them to restrict the bandwidth of a | points that previously blocked them to restrict the bandwidth of a | |||
potential covert channel. But limiting the channel's bandwidth to 2 | potential covert channel. Limiting the channel's bandwidth to 2 bits | |||
bits per packet is now considered sufficient. | per packet is now considered sufficient. | |||
At the same time as removing these legacy constraints, the | At the same time as removing these legacy constraints, the | |||
opportunity has been taken to draw together diverging tunnel | opportunity has been taken to draw together diverging tunnel | |||
specifications into a single consistent behaviour. Then any tunnel | specifications into a single consistent behaviour. Then any tunnel | |||
can be deployed unilaterally, and it will support the full range of | can be deployed unilaterally, and it will support the full range of | |||
congestion control and management schemes without any modes or | congestion control and management schemes without any modes or | |||
configuration. Further, any host or router can expect the ECN field | configuration. Further, any host or router can expect the ECN field | |||
to behave in the same way, whatever type of tunnel might intervene in | to behave in the same way, whatever type of tunnel might intervene in | |||
the path. This new certainty could enable new uses of the ECN field | the path. This new certainty could enable new uses of the ECN field | |||
that would otherwise be confounded by ambiguity. | that would otherwise be confounded by ambiguity. | |||
11. Acknowledgements | 10. Acknowledgements | |||
Thanks to Anil Agarwal for pointing out a case where it's safe for a | Thanks to Anil Agarwal for pointing out a case where it's safe for a | |||
tunnel decapsulator to forward a combination of headers it does not | tunnel decapsulator to forward a combination of headers it does not | |||
understand. Thanks to David Black for explaining a better way to | understand. Thanks to David Black for explaining a better way to | |||
think about function placement. Also thanks to Arnaud Jacquet for | think about function placement. Also thanks to Arnaud Jacquet for | |||
the idea for Appendix C. Thanks to Michael Menth, Bruce Davie, Toby | the idea for Appendix C. Thanks to Michael Menth, Bruce Davie, Toby | |||
Moncaster, Gorry Fairhurst, Sally Floyd, Alfred Hoenes, Gabriele | Moncaster, Gorry Fairhurst, Sally Floyd, Alfred Hoenes, Gabriele | |||
Corliano, Ingemar Johansson, David Black and Phil Eardley for their | Corliano, Ingemar Johansson, David Black and Phil Eardley for their | |||
thoughts and careful review comments. | thoughts and careful review comments. | |||
Bob Briscoe is partly funded by Trilogy, a research project (ICT- | Bob Briscoe is partly funded by Trilogy, a research project (ICT- | |||
216372) supported by the European Community under its Seventh | 216372) supported by the European Community under its Seventh | |||
Framework Programme. The views expressed here are those of the | Framework Programme. The views expressed here are those of the | |||
author only. | author only. | |||
12. Comments Solicited | Comments Solicited (to be removed by the RFC Editor): | |||
Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
<tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
13. References | 11. References | |||
13.1. Normative References | 11.1. Normative References | |||
[RFC2003] Perkins, C., "IP Encapsulation | [RFC2003] Perkins, C., "IP Encapsulation | |||
within IP", RFC 2003, October 1996. | within IP", RFC 2003, October 1996. | |||
[RFC2119] Bradner, S., "Key words for use in | [RFC2119] Bradner, S., "Key words for use in | |||
RFCs to Indicate Requirement | RFCs to Indicate Requirement | |||
Levels", BCP 14, RFC 2119, | Levels", BCP 14, RFC 2119, | |||
March 1997. | March 1997. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. | [RFC3168] Ramakrishnan, K., Floyd, S., and D. | |||
Black, "The Addition of Explicit | Black, "The Addition of Explicit | |||
Congestion Notification (ECN) to | Congestion Notification (ECN) to | |||
IP", RFC 3168, September 2001. | IP", RFC 3168, September 2001. | |||
[RFC4301] Kent, S. and K. Seo, "Security | [RFC4301] Kent, S. and K. Seo, "Security | |||
Architecture for the Internet | Architecture for the Internet | |||
Protocol", RFC 4301, December 2005. | Protocol", RFC 4301, December 2005. | |||
13.2. Informative References | 11.2. Informative References | |||
[I-D.ietf-pcn-3-in-1-encoding] Briscoe, B. and T. Moncaster, "PCN | [I-D.ietf-pcn-3-in-1-encoding] Briscoe, B. and T. Moncaster, "PCN | |||
3-State Encoding Extension in a | 3-State Encoding Extension in a | |||
single DSCP", | single DSCP", | |||
draft-ietf-pcn-3-in-1-encoding-00 | draft-ietf-pcn-3-in-1-encoding-00 | |||
(work in progress), July 2009. | (work in progress), July 2009. | |||
[I-D.ietf-pcn-3-state-encoding] Moncaster, T., Briscoe, B., and M. | [I-D.ietf-pcn-3-state-encoding] Moncaster, T., Briscoe, B., and M. | |||
Menth, "A PCN encoding using 2 | Menth, "A PCN encoding using 2 | |||
DSCPs to provide 3 or more states", | DSCPs to provide 3 or more states", | |||
draft-ietf-pcn-3-state-encoding-00 | ||||
(work in progress), April 2009. | (work in progress), April 2009. | |||
[I-D.ietf-pcn-baseline-encoding] Moncaster, T., Briscoe, B., and M. | ||||
Menth, "Baseline Encoding and | ||||
Transport of Pre-Congestion | ||||
Information", | ||||
draft-ietf-pcn-baseline-encoding-07 | ||||
(work in progress), September 2009. | ||||
[I-D.ietf-pcn-marking-behaviour] Eardley, P., "Metering and marking | ||||
behaviour of PCN-nodes", | ||||
draft-ietf-pcn-marking-behaviour-05 | ||||
(work in progress), August 2009. | ||||
[I-D.ietf-pcn-psdm-encoding] Menth, M., Babiarz, J., Moncaster, | [I-D.ietf-pcn-psdm-encoding] Menth, M., Babiarz, J., Moncaster, | |||
T., and B. Briscoe, "PCN Encoding | T., and B. Briscoe, "PCN Encoding | |||
for Packet-Specific Dual Marking | for Packet-Specific Dual Marking | |||
(PSDM)", | (PSDM)", | |||
draft-ietf-pcn-psdm-encoding-00 | draft-ietf-pcn-psdm-encoding-00 | |||
(work in progress), June 2009. | (work in progress), June 2009. | |||
[I-D.ietf-pcn-sm-edge-behaviour] Charny, A., Karagiannis, G., Menth, | [I-D.ietf-pcn-sm-edge-behaviour] Charny, A., Karagiannis, G., Menth, | |||
M., and T. Taylor, "PCN Boundary | M., and T. Taylor, "PCN Boundary | |||
Node Behaviour for the Single | Node Behaviour for the Single | |||
Marking (SM) Mode of Operation", | Marking (SM) Mode of Operation", | |||
draft-ietf-pcn-sm-edge-behaviour-00 | draft-ietf-pcn-sm-edge-behaviour-01 | |||
(work in progress), July 2009. | (work in progress), October 2009. | |||
[I-D.satoh-pcn-st-marking] Satoh, D., Ueno, H., Maeda, Y., and | [I-D.satoh-pcn-st-marking] Satoh, D., Ueno, H., Maeda, Y., and | |||
O. Phanachet, "Single PCN Threshold | O. Phanachet, "Single PCN Threshold | |||
Marking by using PCN baseline | Marking by using PCN baseline | |||
encoding for both admission and | encoding for both admission and | |||
termination controls", | termination controls", | |||
draft-satoh-pcn-st-marking-02 (work | draft-satoh-pcn-st-marking-02 (work | |||
in progress), September 2009. | in progress), September 2009. | |||
[RFC2401] Kent, S. and R. Atkinson, "Security | [RFC2401] Kent, S. and R. Atkinson, "Security | |||
Architecture for the Internet | Architecture for the Internet | |||
Protocol", RFC 2401, November 1998. | Protocol", RFC 2401, November 1998. | |||
[RFC2474] Nichols, K., Blake, S., Baker, F., | [RFC2474] Nichols, K., Blake, S., Baker, F., | |||
and D. Black, "Definition of the | and D. Black, "Definition of the | |||
skipping to change at page 31, line 35 | skipping to change at page 32, line 34 | |||
November 2006. | November 2006. | |||
[RFC5129] Davie, B., Briscoe, B., and J. Tay, | [RFC5129] Davie, B., Briscoe, B., and J. Tay, | |||
"Explicit Congestion Marking in | "Explicit Congestion Marking in | |||
MPLS", RFC 5129, January 2008. | MPLS", RFC 5129, January 2008. | |||
[RFC5559] Eardley, P., "Pre-Congestion | [RFC5559] Eardley, P., "Pre-Congestion | |||
Notification (PCN) Architecture", | Notification (PCN) Architecture", | |||
RFC 5559, June 2009. | RFC 5559, June 2009. | |||
[RFC5670] Eardley, P., "Metering and Marking | ||||
Behaviour of PCN-Nodes", RFC 5670, | ||||
November 2009. | ||||
[RFC5696] Moncaster, T., Briscoe, B., and M. | ||||
Menth, "Baseline Encoding and | ||||
Transport of Pre-Congestion | ||||
Information", RFC 5696, | ||||
November 2009. | ||||
[VCP] Xia, Y., Subramanian, L., Stoica, | [VCP] Xia, Y., Subramanian, L., Stoica, | |||
I., and S. Kalyanaraman, "One more | I., and S. Kalyanaraman, "One more | |||
bit is enough", Proc. SIGCOMM'05, | bit is enough", Proc. SIGCOMM'05, | |||
ACM CCR 35(4)37--48, 2005, <http:// | ACM CCR 35(4)37--48, 2005, <http:// | |||
doi.acm.org/10.1145/ | doi.acm.org/10.1145/ | |||
1080091.1080098>. | 1080091.1080098>. | |||
Appendix A. Early ECN Tunnelling RFCs | Appendix A. Early ECN Tunnelling RFCs | |||
IP in IP tunnelling was originally defined in [RFC2003]. On | IP in IP tunnelling was originally defined in [RFC2003]. On | |||
encapsulation, the incoming header was copied to the outer and on | encapsulation, the incoming header was copied to the outer and on | |||
decapsulation the outer was simply discarded. Initially, IPsec | decapsulation the outer was simply discarded. Initially, IPsec | |||
tunnelling [RFC2401] followed the same behaviour. | tunnelling [RFC2401] followed the same behaviour. | |||
When ECN was introduced experimentally in [RFC2481], legacy (RFC2003 | When ECN was introduced experimentally in [RFC2481], legacy (RFC2003 | |||
or RFC2401) tunnels would have discarded any congestion markings | or RFC2401) tunnels would have discarded any congestion markings | |||
added to the outer header, so RFC2481 introduced rules for | added to the outer header, so RFC2481 introduced rules for | |||
calculating the outgoing header from a combination of the inner and | calculating the outgoing header from a combination of the inner and | |||
outer on decapsulation. RFC2481 also introduced a second mode for | outer on decapsulation. RFC2481 also introduced a second mode for | |||
IPsec tunnels, which turned off ECN processing in the outer header | IPsec tunnels, which turned off ECN processing (Not-ECT) in the outer | |||
(Not-ECT) on encapsulation because an RFC2401 decapsulator would | header on encapsulation because an RFC2401 decapsulator would discard | |||
discard the outer on decapsulation. For RFC2401 IPsec this had the | the outer on decapsulation. For RFC2401 IPsec this had the side- | |||
side-effect of completely blocking the covert channel. | effect of completely blocking the covert channel. | |||
In RFC2481 the ECN field was defined as two separate bits. But when | In RFC2481 the ECN field was defined as two separate bits. But when | |||
ECN moved from the experimental to the standards track [RFC3168], the | ECN moved from the experimental to the standards track [RFC3168], the | |||
ECN field was redefined as four codepoints. This required a | ECN field was redefined as four codepoints. This required a | |||
different calculation of the ECN field from that used in RFC2481 on | different calculation of the ECN field from that used in RFC2481 on | |||
decapsulation. RFC3168 also had two modes; a 'full functionality | decapsulation. RFC3168 also had two modes; a 'full functionality | |||
mode' that restricted the covert channel as much as possible but | mode' that restricted the covert channel as much as possible but | |||
still allowed ECN to be used with IPsec, and another that completely | still allowed ECN to be used with IPsec, and another that completely | |||
turned off ECN processing across the tunnel. This 'limited | turned off ECN processing across the tunnel. This 'limited | |||
functionality mode' both offered a way for operators to completely | functionality mode' both offered a way for operators to completely | |||
skipping to change at page 33, line 42 | skipping to change at page 35, line 4 | |||
will have allowed a covert channel from 'M' to 'B'. | will have allowed a covert channel from 'M' to 'B'. | |||
ECN at the IP layer is designed to carry information about congestion | ECN at the IP layer is designed to carry information about congestion | |||
from a congested resource towards downstream nodes. Typically a | from a congested resource towards downstream nodes. Typically a | |||
downstream transport might feed the information back somehow to the | downstream transport might feed the information back somehow to the | |||
point upstream of the congestion that can regulate the load on the | point upstream of the congestion that can regulate the load on the | |||
congested resource, but other actions are possible (see [RFC3168] | congested resource, but other actions are possible (see [RFC3168] | |||
S.6). In terms of the above unicast scenario, ECN effectively | S.6). In terms of the above unicast scenario, ECN effectively | |||
intends to create an information channel (for congestion signalling) | intends to create an information channel (for congestion signalling) | |||
from 'M' to 'B' (for 'B' to feed back to 'A'). Therefore the goals | from 'M' to 'B' (for 'B' to feed back to 'A'). Therefore the goals | |||
of IPsec and ECN are mutually incompatible. | of IPsec and ECN are mutually incompatible, requiring some | |||
compromise. | ||||
With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | |||
"controls are provided to manage the bandwidth of this [covert] | "controls are provided to manage the bandwidth of this [covert] | |||
channel". Using the ECN processing rules of RFC4301, the channel | channel". Using the ECN processing rules of RFC4301, the channel | |||
bandwidth is two bits per datagram from 'A' to 'M' and one bit per | bandwidth is two bits per datagram from 'A' to 'M' and one bit per | |||
datagram from 'M' to 'A' (because 'E' limits the combinations of the | datagram from 'M' to 'A' (because 'E' limits the combinations of the | |||
2-bit ECN field that it will copy). In both cases the covert channel | 2-bit ECN field that it will copy). In both cases the covert channel | |||
bandwidth is further reduced by noise from any real congestion | bandwidth is further reduced by noise from any real congestion | |||
marking. RFC4301 implies that these covert channels are sufficiently | marking. RFC4301 implies that these covert channels are sufficiently | |||
limited to be considered a manageable threat. However, with respect | limited to be considered a manageable threat. However, with respect | |||
skipping to change at page 37, line 22 | skipping to change at page 38, line 24 | |||
| | 12 | = 17% | | | 12 | = 17% | |||
0 +-----+---------+---> | 0 +-----+---------+---> | |||
0 30% 100% inner header marking | 0 30% 100% inner header marking | |||
Figure 7: Tunnel Marking of Packets Already Marked at Ingress | Figure 7: Tunnel Marking of Packets Already Marked at Ingress | |||
Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN | Appendix D. Why Losing ECT(1) on Decapsulation Impedes PCN | |||
Congestion notification with two severity levels is currently on the | Congestion notification with two severity levels is currently on the | |||
IETF's standards track agenda in the Congestion and Pre-Congestion | IETF's standards track agenda in the Congestion and Pre-Congestion | |||
Notification (PCN) working group. The PCN working group requires | Notification (PCN) working group. PCN needs all four possible states | |||
four congestion states (not PCN-enabled, not marked and two | of congestion signalling in the 2-bit ECN field to be propagated at | |||
increasingly severe levels of congestion marking--see [RFC5559]). | the egress, but pre-existing tunnels only propagate three. The four | |||
The aim is for the less severe level of marking to stop admitting new | PCN states are: not PCN-enabled, not marked and two increasingly | |||
traffic and the more severe level to terminate sufficient existing | severe levels of congestion marking. The less severe marking means | |||
flows to bring a network back to its operating point after a link | 'stop admitting new traffic' and the more severe marking means | |||
failure. | 'terminate some existing flows', which may be needed after reroutes | |||
(see [RFC5559] for more details). (Note on terminology: wherever | ||||
this document counts four congestion states, the PCN working group | ||||
would count this as three PCN states plus a not-PCN-enabled state.) | ||||
(Note on terminology: wherever this document counts four congestion | Figure 2 (Section 3.2) shows that pre-existing decapsulation | |||
states, the PCN working group would count this as three PCN states | behaviour would have discarded any ECT(1) markings in outer headers | |||
plus a not-PCN-enabled state.) | if the inner was ECT(0). This prevented the PCN working group from | |||
using ECT(1) -- if a PCN node used ECT(1) to indicate one of the | ||||
severity levels of congestion, any later tunnel egress would revert | ||||
the marking to ECT(0) as if nothing had happened. Effectively the | ||||
decapsulation rules of RFC4301 and RFC3168 waste one ECT codepoint; | ||||
they treat the ECT(0) and ECT(1) codepoints as a single codepoint. | ||||
Although the ECN field gives sufficient codepoints for four states, | A number of work-rounds to this problem were proposed in the PCN w-g; | |||
pre-existing ECN tunnelling RFCs prevented the PCN working group from | to add the fourth state another way or avoid needing it. Without | |||
using four ECN states in case any tunnel decapsulations occur within | wishing to disparage the ingenuity of these work-rounds, none were | |||
a PCN region. If a node in a tunnel changes the ECN field to ECT(0) | chosen for the standards track because they were either somewhat | |||
or ECT(1), this change would be discarded by a tunnel egress | wasteful, imprecise or complicated: | |||
compliant with RFC4301 or RFC3168. This can be seen in Figure 2 | ||||
(Section 3.2), where ECT values in the outer header are ignored | ||||
unless the inner header is the same. Effectively the decapsulation | ||||
rules of RFC4301 and RFC3168 waste one ECT codepoint; they treat the | ||||
ECT(0) and ECT(1) codepoints as a single codepoint. | ||||
As a consequence, the PCN w-g initially took the approach of a | o One uses a pair of Diffserv codepoints in place of each PCN DSCP | |||
standards track baseline encoding for three states | to encode the extra state [I-D.ietf-pcn-3-state-encoding], using | |||
[I-D.ietf-pcn-baseline-encoding] and a number of experimental | up the rapidly exhausting DSCP space while leaving an ECN | |||
alternatives to add or avoid the fourth state. Without wishing to | codepoint unused. | |||
disparage the ingenuity of these work-rounds, none were chosen for | ||||
the standards track because they were either somewhat wasteful, | ||||
imprecise or complicated. One uses a pair of Diffserv codepoints | | |||
in place of each PCN DSCP to encode the extra state | ||||
[I-D.ietf-pcn-3-state-encoding], using up the rapidly exhausting DSCP | o Another survives tunnelling without an extra DSCP | |||
space while leaving an ECN codepoint unused. Another PCN encoding | [I-D.ietf-pcn-psdm-encoding], but it requires the PCN edge | |||
has been proposed that would survive tunnelling without an extra DSCP | gateways to share the initial state of a packet out of band. | |||
[I-D.ietf-pcn-psdm-encoding], but it requires the PCN edge gateways | ||||
to share state out of band so the egress edge can know which marking | o Another proposes a more involved marking algorithm in forwarding | |||
a packet started with at the ingress edge. Yet another work-round to | elements to encode the three congestion notification states using | |||
the ECN tunnelling problem proposes a more involved marking algorithm | only two ECN codepoints [I-D.satoh-pcn-st-marking]. | |||
in forwarding elements to encode the three congestion notification | ||||
states using only two ECN codepoints [I-D.satoh-pcn-st-marking]. One | o Another takes a different approach; it compromises the precision | |||
work-round takes a different approach; it compromises the precision | of the admission control mechanism in some network scenarios, but | |||
of the admission control mechanism in some network scenarios, but | manages to work with just three encoding states and a single | |||
manages to work with just three encoding states and a single marking | marking algorithm [I-D.ietf-pcn-sm-edge-behaviour]. | |||
algorithm [I-D.ietf-pcn-sm-edge-behaviour]. | ||||
Rather than require the IETF to bless any of these experimental | Rather than require the IETF to bless any of these experimental | |||
encoding work-rounds, the present specification fixes the root cause | encoding work-rounds, the present specification fixes the root cause | |||
of the problem so that operators deploying PCN can simply require | of the problem so that operators deploying PCN can simply require | |||
that tunnel end-points within a PCN region should comply with this | that tunnel end-points within a PCN region should comply with this | |||
new ECN tunnelling specification. Universal compliance is feasible | new ECN tunnelling specification. On the public Internet it would | |||
for PCN, because it is intended to be deployed in a controlled | not be possible to know whether all tunnels complied with this new | |||
Diffserv region. Assuming tunnels within a PCN region will be | specification, but universal compliance is feasible for PCN, because | |||
required to comply with the present specification, the PCN w-g is | it is intended to be deployed in a controlled Diffserv region. | |||
progressing a trivially simple four-state ECN encoding | ||||
[I-D.ietf-pcn-3-in-1-encoding]. | Given the present specification, the PCN w-g could progress a | |||
trivially simple four-state ECN encoding | ||||
[I-D.ietf-pcn-3-in-1-encoding]. This would replace the interim | ||||
standards track baseline encoding of just three states [RFC5696] | ||||
which makes a fourth state available for any of the experimental | ||||
alternatives. | ||||
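The loss of ECT(1) described in this appendix can be illustrated with a minimal sketch. It assumes a simplified model of the pre-existing (RFC3168 / RFC4301) decapsulation rule in which the outer header only matters when it carries CE and the inner is ECN-capable; the function names are hypothetical, and the RFC3168 case that drops a packet with a Not-ECT inner and a CE outer is deliberately not modelled.

```python
# Illustrative sketch only; names are hypothetical, not taken from any RFC.
NOT_ECT, ECT1, ECT0, CE = "Not-ECT", "ECT(1)", "ECT(0)", "CE"
ALL_CODEPOINTS = (NOT_ECT, ECT0, ECT1, CE)


def legacy_decap(inner, outer):
    """Simplified pre-existing decapsulation: outgoing ECN field.

    Assumption: the outer is ignored unless it is CE and the inner
    is ECN-capable (ECT(0) or ECT(1)).
    """
    if outer == CE and inner in (ECT0, ECT1):
        return CE
    return inner


def states_seen_at_egress(inner):
    """Set of outgoing codepoints reachable for a given inner header."""
    return {legacy_decap(inner, outer) for outer in ALL_CODEPOINTS}


# A PCN node that re-marks the outer from ECT(0) to ECT(1) has no effect:
print(legacy_decap(ECT0, ECT1))
# Only two distinct signals survive for an ECT(0) inner:
print(sorted(states_seen_at_egress(ECT0)))
```

Under these assumptions an ECT(1) marking applied inside the tunnel is indistinguishable, after decapsulation, from no marking at all, which is the wasted codepoint this appendix refers to.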
Appendix E. Why Resetting ECN on Encapsulation Impedes PCN | Appendix E. Why Resetting ECN on Encapsulation Impedes PCN | |||
The PCN architecture says "...if encapsulation is done within the | The PCN architecture says "...if encapsulation is done within the | |||
PCN-domain: Any PCN-marking is copied into the outer header. Note: A | PCN-domain: Any PCN-marking is copied into the outer header. Note: A | |||
tunnel will not provide this behaviour if it complies with [RFC3168] | tunnel will not provide this behaviour if it complies with [RFC3168] | |||
tunnelling in either mode, but it will if it complies with [RFC4301] | tunnelling in either mode, but it will if it complies with [RFC4301] | |||
IPsec tunnelling." | IPsec tunnelling." | |||
The specific issue here concerns PCN excess rate marking | The specific issue here concerns PCN excess rate marking [RFC5670]. | |||
[I-D.ietf-pcn-marking-behaviour]. The purpose of excess rate marking | The purpose of excess rate marking is to provide a bulk mechanism for | |||
is to provide a bulk mechanism for interior nodes within a PCN domain | interior nodes within a PCN domain to mark traffic that is exceeding | |||
to mark traffic that is exceeding a configured threshold bit-rate, | a configured threshold bit-rate, perhaps after an unexpected event | |||
perhaps after an unexpected event such as a reroute, a link or node | such as a reroute, a link or node failure, or a more widespread | |||
failure, or a more widespread disaster. PCN is intended for | disaster. Reroutes are a common cause of QoS degradation in IP | |||
inelastic flows, so just removing marked packets would degrade every | networks. After reroutes it is common for multiple links in a | |||
flow to the point of uselessness. Instead, the edge nodes around a | network to become stressed at once. Therefore, PCN excess rate | |||
PCN domain terminate an equivalent amount of traffic, but at flow | marking has been carefully designed to ensure traffic marked at one | |||
granularity. As well as protecting the surviving inelastic flows, | queue will not be counted again for marking at subsequent queues (see | |||
this also protects the share of capacity set aside for elastic | the `Excess traffic meter function' of [RFC5670]). | |||
traffic. But users are very sensitive to their flows being | ||||
terminated while in progress, therefore no more flows should be | ||||
terminated than absolutely necessary. | ||||
Re-routes are a common cause of QoS degradation in IP networks. | ||||
After re-routes it is common for multiple links in a network to | ||||
become stressed at once. Therefore, PCN excess rate marking has been | ||||
carefully designed to ensure traffic marked at one queue will not be | ||||
counted again for marking at subsequent queues (see the `Excess | ||||
traffic meter function' of [I-D.ietf-pcn-marking-behaviour]). | ||||
However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | |||
field in all the outer headers. This will cause excess traffic to be | field in all the outer headers. This will cause excess traffic to be | |||
counted more than once, leading to many flows being removed that did | counted more than once, leading to many flows being removed that did | |||
not need to be removed at all. This is why an RFC3168 tunnel | not need to be removed at all. This is why an RFC3168 tunnel | |||
ingress cannot be used in a PCN domain. | ingress cannot be used in a PCN domain. | |||
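The double counting can be illustrated with a rough fluid-model sketch. It is not the metering algorithm of [RFC5670]; it assumes two stressed queues in series, each marking only the metered rate above a configured threshold, and it assumes that marks applied to a reset outer header fall independently of the marks already carried in the inner header. All names and numbers are hypothetical.

```python
# Illustrative fluid model only; names and figures are assumptions.

def excess_marked(metered_rate, threshold):
    """Rate newly marked by a queue: metered traffic above the threshold."""
    return max(0.0, metered_rate - threshold)


def marked_after_two_queues(total_rate, threshold1, threshold2, reset_between):
    """Total rate seen as marked at the egress after two queues in series."""
    marked1 = excess_marked(total_rate, threshold1)
    if not reset_between:
        # Already-marked traffic is not metered again at the second queue.
        marked2 = excess_marked(total_rate - marked1, threshold2)
        return marked1 + marked2
    # An encapsulator that resets the outer ECN field hides earlier marks,
    # so the second queue meters all of the traffic as if it were unmarked.
    marked2_outer = excess_marked(total_rate, threshold2)
    # Assumption: inner and outer marks are independent, so a packet
    # stays unmarked only if neither queue marked it.
    unmarked_fraction = ((1 - marked1 / total_rate)
                         * (1 - marked2_outer / total_rate))
    return total_rate * (1 - unmarked_fraction)


copying = marked_after_two_queues(100.0, 60.0, 60.0, reset_between=False)
resetting = marked_after_two_queues(100.0, 60.0, 60.0, reset_between=True)
print(copying, resetting)
```

With 100 units of traffic and both thresholds at 60, the copying case marks only the 40 units of genuine excess, whereas the resetting case marks noticeably more, so the edge nodes would terminate more flows than necessary.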
The original reason an RFC3168 encapsulator reset the ECN field was | ||||
to block a covert channel (Appendix B.1), with the overriding aim of | ||||
consistent behaviour between IPsec and non-IPsec tunnels. But later | ||||
RFC4301 IPsec encapsulation placed simplicity above the need to block | ||||
the covert channel, simply copying the ECN field. | ||||
The ECN reset in RFC3168 is no longer deemed necessary, it is | The ECN reset in RFC3168 is no longer deemed necessary, it is | |||
inconsistent with RFC4301, it is not as simple as RFC4301 and it is | inconsistent with RFC4301, it is not as simple as RFC4301 and it is | |||
impeding deployment of new protocols like PCN. The present | impeding deployment of new protocols like PCN. The present | |||
specification corrects this perverse situation. | specification corrects this perverse situation. | |||
Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) Outer | Appendix F. Compromise on Decap with ECT(1) Inner and ECT(0) Outer | |||
A packet with an ECT(1) inner and an ECT(0) outer should never arise | A packet with an ECT(1) inner and an ECT(0) outer should never arise | |||
from any known IETF protocol. Without giving a reason, RFC3168 and | from any known IETF protocol. Without giving a reason, RFC3168 and | |||
RFC4301 both say the outer should be ignored when decapsulating such | RFC4301 both say the outer should be ignored when decapsulating such | |||
End of changes. 61 change blocks. | ||||
269 lines changed or deleted | 314 lines changed or added | |||