draft-briscoe-tsvwg-ecn-tunnel-01.txt | draft-ietf-tsvwg-ecn-tunnel-00.txt | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
Internet-Draft BT | Internet-Draft BT | |||
Intended status: Standards Track July 14, 2008 | Intended status: Standards Track Oct 16, 2008 | |||
Expires: January 15, 2009 | Expires: April 19, 2009 | |||
Layered Encapsulation of Congestion Notification | Layered Encapsulation of Congestion Notification | |||
draft-briscoe-tsvwg-ecn-tunnel-01 | draft-ietf-tsvwg-ecn-tunnel-00 | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on January 15, 2009. | This Internet-Draft will expire on April 19, 2009. | |||
Abstract | Abstract | |||
This document redefines how the explicit congestion notification | This document redefines how the explicit congestion notification | |||
(ECN) field of the outer IP header of a tunnel should be constructed. | (ECN) field of the outer IP header of a tunnel should be constructed. | |||
It brings all IP in IP tunnels (v4 or v6) into line with the way | It brings all IP in IP tunnels (v4 or v6) into line with the way | |||
IPsec tunnels now construct the ECN field. It includes a thorough | IPsec tunnels now construct the ECN field. It includes a thorough | |||
analysis of the reasoning for this change and the implications. It | analysis of the reasoning for this change and the implications. It | |||
also gives guidelines on the encapsulation of IP congestion | also gives guidelines on the encapsulation of IP congestion | |||
notification by any outer header, whether encapsulated in an IP | notification by any outer header, whether encapsulated in an IP | |||
tunnel or in a lower layer header. Following these guidelines should | tunnel or in a lower layer header. Following these guidelines should | |||
help interworking, if the IETF or other standards bodies specify any | help interworking, if the IETF or other standards bodies specify any | |||
new encapsulation of congestion notification. | new encapsulation of congestion notification. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 4 | 1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 5 | |||
1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 6 | |||
1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8 | 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8 | |||
3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8 | |||
3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8 | 3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8 | |||
3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10 | 3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10 | |||
3.3. Management Constraints . . . . . . . . . . . . . . . . . . 11 | 3.3. Management Constraints . . . . . . . . . . . . . . . . . . 12 | |||
4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12 | 4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12 | |||
4.1. Design Guidelines for New Encapsulations of Congestion | 4.1. Design Guidelines for New Encapsulations of Congestion | |||
Notification . . . . . . . . . . . . . . . . . . . . . . . 13 | Notification . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15 | 5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15 | |||
6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16 | 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16 | |||
7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18 | 7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 | |||
10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21 | 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 22 | 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 23 | |||
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
13.1. Normative References . . . . . . . . . . . . . . . . . . . 22 | 13.1. Normative References . . . . . . . . . . . . . . . . . . . 23 | |||
13.2. Informative References . . . . . . . . . . . . . . . . . . 23 | 13.2. Informative References . . . . . . . . . . . . . . . . . . 23 | |||
Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25 | Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25 | |||
Appendix B. Contribution to Congestion across a Tunnel . . . . . 25 | Appendix B. Contribution to Congestion across a Tunnel . . . . . 26 | |||
Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27 | Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27 | |||
Appendix D. Non-Dependence of Tunnelling on In-path Load | Appendix D. Non-Dependence of Tunnelling on In-path Load | |||
Regulation . . . . . . . . . . . . . . . . . . . . . 28 | Regulation . . . . . . . . . . . . . . . . . . . . . 29 | |||
D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 29 | D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 30 | |||
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 32 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
Intellectual Property and Copyright Statements . . . . . . . . . . 34 | Intellectual Property and Copyright Statements . . . . . . . . . . 34 | |||
Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
| | From briscoe-01 to ietf-00 (current): | |||
| | * Re-wrote Appendix B giving much simpler technique to measure | |||
| | contribution to congestion across a tunnel. | |||
| | * Added discussion of backward compatibility of the ideal | |||
| | decapsulation scheme in Appendix C | |||
| | * Updated references. Minor corrections & clarifications | |||
| | throughout. | |||
From -00 to -01: | From -00 to -01: | |||
* Related everything conceptually to the uniform and pipe models | * Related everything conceptually to the uniform and pipe models | |||
of RFC2983 on Diffserv Tunnels, and completely removed the | of RFC2983 on Diffserv Tunnels, and completely removed the | |||
dependence of tunnelling behaviour on the presence of any in- | dependence of tunnelling behaviour on the presence of any in- | |||
path load regulation by using the [1 - Before] [2 - Outer] | path load regulation by using the [1 - Before] [2 - Outer] | |||
function placement concepts from RFC2983. | function placement concepts from RFC2983; | |||
* Added specifc cases where the existing standards limit new | * Added specific cases where the existing standards limit new | |||
proposals. | proposals, particularly Appendix A; | |||
* Added sub-structure to Introduction (Need for Rationalisation, | * Added sub-structure to Introduction (Need for Rationalisation, | |||
Roadmap), added new Introductory subsection on "Scope" and | Roadmap), added new Introductory subsection on "Scope" and | |||
improved clarity | improved clarity; | |||
* Added Design Guidelines for New Encapsulations of Congestion | * Added Design Guidelines for New Encapsulations of Congestion | |||
Notification | Notification (Section 4.1); | |||
* Considerably clarified the Backward Compatibility section | * Considerably clarified the Backward Compatibility section | |||
| | (Section 6); | |||
* Considerably extended the Security Considerations section | * Considerably extended the Security Considerations section | |||
| | (Section 9); | |||
* Summarised the primary rationale much better in the conclusions | * Summarised the primary rationale much better in the | |||
| | conclusions; | |||
* Added numerous extra acknowledgements | * Added numerous extra acknowledgements; | |||
* Added Appendix A. "Why resetting CE on encapsulation harms | * Added Appendix A. "Why resetting CE on encapsulation harms | |||
PCN", Appendix B. "Contribution to Congestion across a Tunnel" | PCN", Appendix B. "Contribution to Congestion across a Tunnel" | |||
and Appendix C. "Ideal Decapsulation Rules" | and Appendix C. "Ideal Decapsulation Rules"; | |||
* Changed Appendix A "In-path Load Regulation" to "Non-Dependence | * Re-wrote Appendix D, explaining how tunnel encapsulation no | |||
of Tunnelling on In-path Load Regulation" and added sub-section | longer depends on in-path load-regulation (changed title from | |||
on "Dependence of In-Path Load Regulation on Tunnelling" | "In-path Load Regulation" to "Non-Dependence of Tunnelling on | |||
| | In-path Load Regulation"), but explained how an in-path load | |||
| | regulation function must be carefully placed with respect to | |||
| | tunnel encapsulation (in a new sub-section entitled "Dependence | |||
| | of In-Path Load Regulation on Tunnelling"). | |||
1. Introduction | 1. Introduction | |||
This document redefines how the explicit congestion notification | This document redefines how the explicit congestion notification | |||
(ECN) field [RFC3168] of the outer IP header of a tunnel should be | (ECN) field [RFC3168] of the outer IP header of a tunnel should be | |||
constructed. It brings all IP in IP tunnels (v4 or v6) into line | constructed. It brings all IP in IP tunnels (v4 or v6) into line | |||
with the way IPsec tunnels [RFC4301] now construct the ECN field, | with the way IPsec tunnels [RFC4301] now construct the ECN field, | |||
ensuring that the outer header reveals any congestion experienced so | ensuring that the outer header reveals any congestion experienced so | |||
far on the whole path, not just since the last tunnel ingress. | far on the whole path, not just since the last tunnel ingress. | |||
skipping to change at page 5, line 38 | skipping to change at page 6, line 9 | |||
makes it harder to design networks and new protocols that work | makes it harder to design networks and new protocols that work | |||
predictably. | predictably. | |||
Already complicated constraints have had to be added to a standards | Already complicated constraints have had to be added to a standards | |||
track congestion marking proposal. The section of the pre-congestion | track congestion marking proposal. The section of the pre-congestion | |||
notification (PCN) architecture [I-D.ietf-pcn-architecture] on | notification (PCN) architecture [I-D.ietf-pcn-architecture] on | |||
tunnelling says PCN works correctly in the presence of RFC4301 IPsec | tunnelling says PCN works correctly in the presence of RFC4301 IPsec | |||
encapsulation (and RFC5129 MPLS encapsulation). However it doesn't | encapsulation (and RFC5129 MPLS encapsulation). However it doesn't | |||
work with RFC3168 IP in IP encapsulation (Appendix A explains why). | work with RFC3168 IP in IP encapsulation (Appendix A explains why). | |||
Section 3 assesses further security, control and management functions | To ensure we do not cause any unintended side-effects, Section 3 | |||
that cannot be achieved in each case (resetting vs copying CE | assesses whether copying or resetting CE would harm any security, | |||
markings). It finds that resetting CE makes life difficult in a | control or management functions. It finds that resetting CE makes | |||
number of directions, while copying CE harms nothing (other than | life difficult in a number of directions, while copying CE harms | |||
opening a low bit-rate covert channel vulnerability which the | nothing (other than opening a low bit-rate covert channel | |||
Security Area deems is manageable). | vulnerability which the IETF Security Area deems is manageable). | |||
1.2. Document Roadmap | 1.2. Document Roadmap | |||
Most of the document gives a thorough analysis of the knock-on | Most of the document gives a thorough analysis of the knock-on | |||
effects of the apparently minor change to tunnel encapsulation. The | effects of the apparently minor change to tunnel encapsulation. The | |||
reader may jump to Section 5 if only interested in standards actions | reader may jump to Section 5 if only interested in standards actions | |||
impacting implementation. The whole document is organised as | impacting implementation. The whole document is organised as | |||
follows: | follows: | |||
o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to | o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to | |||
'switch in' different behaviours for marking the ECN field, just | 'switch in' different behaviours for marking the ECN field, just | |||
as it switches in different per-hop behaviours (PHBs) for | as it switches in different per-hop behaviours (PHBs) for | |||
scheduling. Therefore we cannot only discuss the ECN protocol | scheduling. Therefore we cannot only discuss the ECN protocol | |||
that RFC3168 gives as a default. We need to also give guidance | that RFC3168 gives as a default. Instead, Section 3 lays out the | |||
for possible different marking schemes. Therefore in Section 3 we | design constraints when tunnelling congestion notification without | |||
lay out the design constraints when tunnelling congestion | assuming a particular congestion marking scheme. | |||
notification. | ||||
o Then in Section 4 we resolve the tensions between these | o Then in Section 4 we resolve the tensions between these | |||
constraints to give general design principles and guidelines on | constraints to give general design principles and guidelines on | |||
how a tunnel should process congestion notification; principles | how a tunnel should process congestion notification; principles | |||
that could apply to any marking behaviour for any PHB, not just | that could apply to any marking behaviour for any PHB, not just | |||
the default in RFC3168. In particular, we examine the underlying | the default in RFC3168. In particular, we examine the underlying | |||
principles behind whether CE should be reset or copied into the | principles behind whether CE should be reset or copied into the | |||
outer header at the ingress to a tunnel--or indeed at the ingress | outer header at the ingress to a tunnel--or indeed at the ingress | |||
of any layered encapsulation of headers with congestion | of any layered encapsulation of headers with congestion | |||
notification fields. We end this section with a bulleted list of | notification fields. We end this section with a bulleted list of | |||
more design guidelines for new encapsulations of congestion | design guidelines for new encapsulations of congestion | |||
notification. | notification. | |||
o Section 5 then uses precise standards terminology to confirm the | o Section 5 then uses precise standards terminology to confirm the | |||
rules for the default ECN tunnelling behaviour based on the above | rules for the default ECN tunnelling behaviour based on the above | |||
design principles. | design principles. | |||
o Extending the new IPsec tunnel ingress behaviour to all IP in IP | o Extending the new IPsec tunnel ingress behaviour to all IP in IP | |||
tunnels requires consideration of backwards compatibility, which | tunnels requires consideration of backwards compatibility, which | |||
is covered in Section 6 and changes from earlier RFCs are brought | is covered in Section 6 and changes from earlier RFCs are brought | |||
together in Section 7. | together in Section 7. | |||
skipping to change at page 7, line 34 | skipping to change at page 8, line 4 | |||
As well as guiding alternate IP in IP tunnelling schemes, the design | As well as guiding alternate IP in IP tunnelling schemes, the design | |||
guidelines of Section 4 are intended to be followed when IP packets | guidelines of Section 4 are intended to be followed when IP packets | |||
are encapsulated by any connectionless datagram/packet/frame where | are encapsulated by any connectionless datagram/packet/frame where | |||
the outer header is designed to support a congestion notification | the outer header is designed to support a congestion notification | |||
capability. [RFC5129] already deals with handling ECN for IP in MPLS | capability. [RFC5129] already deals with handling ECN for IP in MPLS | |||
and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in | and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in | |||
L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples | L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples | |||
where ECN may be added in future. | where ECN may be added in future. | |||
Of course, the IETF does not have standards authority over every link | Of course, the IETF does not have standards authority over every link | |||
or tunnel protocol, so this document merely aims to define the | or tunnel protocol, so this document merely aims to guide the | |||
interface between IP ECN and lower layer congestion notification. | interface between IP ECN and lower layer congestion notification. | |||
Then the IETF or the relevant standards body can be free to define | Then the IETF or the relevant standards body can be free to define | |||
the specifics of each lower layer scheme, but a common interface | the specifics of each lower layer scheme, but a common interface | |||
should ensure interworking across all technologies. | should ensure interworking across all technologies. | |||
Note that just because there is forward congestion notification in a | Note that just because there is forward congestion notification in a | |||
lower layer protocol, if the lower layer has its own feedback and | lower layer protocol, if the lower layer has its own feedback and | |||
load regulation, there is no need to propagate it up the layers. For | load regulation, there is no need to propagate it up the layers. For | |||
instance, FECN (forward ECN) has been present in Frame Relay and EFCI | instance, FECN (forward ECN) has been present in Frame Relay and EFCI | |||
(explicit forward congestion indication) in ATM [ITU-T.I.371] for a | (explicit forward congestion indication) in ATM [ITU-T.I.371] for a | |||
long time, but they have been used for internal management rather | long time. But so far they have been used for internal management | |||
than being propagated to endpoint transports for them to control end- | rather than being propagated to endpoint transports for them to | |||
to-end congestion. | control end-to-end congestion. | |||
[RFC2983] is a comprehensive primer on differentiated services and | [RFC2983] is a comprehensive primer on differentiated services and | |||
tunnels. Given ECN raises similar issues to differentiated services | tunnels. Given ECN raises similar issues to differentiated services | |||
when interacting with tunnels, useful concepts introduced in RFC2983 | when interacting with tunnels, useful concepts introduced in RFC2983 | |||
are used throughout, with brief recaps of the explanations where | are used throughout, with brief recaps of the explanations where | |||
necessary. | necessary. | |||
2. Requirements Language | 2. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
skipping to change at page 8, line 31 | skipping to change at page 8, line 49 | |||
Information security can be assured by using various end to end | Information security can be assured by using various end to end | |||
security solutions (including IPsec in transport mode [RFC4301]), but | security solutions (including IPsec in transport mode [RFC4301]), but | |||
a commonly used scenario involves the need to communicate between two | a commonly used scenario involves the need to communicate between two | |||
physically protected domains across the public Internet. In this | physically protected domains across the public Internet. In this | |||
case there are certain management advantages to using IPsec in tunnel | case there are certain management advantages to using IPsec in tunnel | |||
mode solely across the publicly accessible part of the path. The | mode solely across the publicly accessible part of the path. The | |||
path followed by a packet then crosses security 'domains'; the ones | path followed by a packet then crosses security 'domains'; the ones | |||
protected by physical or other means before and after the tunnel and | protected by physical or other means before and after the tunnel and | |||
the one protected by an IPsec tunnel across the otherwise unprotected | the one protected by an IPsec tunnel across the otherwise unprotected | |||
domain. We will use the scenario in Figure 1 where endpoints 'A' and | domain. We will use the scenario in Figure 1 where endpoints 'A' and | |||
'B' communicate through a tunnel with ingress 'I' and egress 'E' | 'B' communicate through a tunnel. The tunnel ingress 'I' and egress | |||
within physically protected edge domains across an unprotected | 'E' are within physically protected edge domains, while the tunnel | |||
internetwork where there may be 'men in the middle', M. | spans an unprotected internetwork where there may be 'men in the | |||
| | middle', M. | |||
physically unprotected physically | physically unprotected physically | |||
<-protected domain-><--domain--><-protected domain-> | <-protected domain-><--domain--><-protected domain-> | |||
+------------------+ +------------------+ | +------------------+ +------------------+ | |||
| | M | | | | | M | | | |||
| A-------->I=========>==========>E-------->B | | | A-------->I=========>==========>E-------->B | | |||
| | | | | | | | | | |||
+------------------+ +------------------+ | +------------------+ +------------------+ | |||
<----IPsec secured----> | <----IPsec secured----> | |||
tunnel | tunnel | |||
skipping to change at page 9, line 23 | skipping to change at page 9, line 42 | |||
of the inner header. And if 'E' copies these fields from the outer | of the inner header. And if 'E' copies these fields from the outer | |||
header to the inner, even if it validates authentication from 'I', it | header to the inner, even if it validates authentication from 'I', it | |||
will have allowed a covert channel from 'M' to 'B'. | will have allowed a covert channel from 'M' to 'B'. | |||
ECN at the IP layer is designed to carry information about congestion | ECN at the IP layer is designed to carry information about congestion | |||
from a congested resource towards downstream nodes. Typically a | from a congested resource towards downstream nodes. Typically a | |||
downstream transport might feed the information back somehow to the | downstream transport might feed the information back somehow to the | |||
point upstream of the congestion that can regulate the load on the | point upstream of the congestion that can regulate the load on the | |||
congested resource, but other actions are possible (see [RFC3168] | congested resource, but other actions are possible (see [RFC3168] | |||
S.6). In terms of the above unicast scenario, ECN is typically | S.6). In terms of the above unicast scenario, ECN is typically | |||
intended to create an information channel from 'M' to 'B', for 'B' to | intended to create an information channel from 'M' to 'B' (for 'B' to | |||
forward to 'A'. Therefore the goals of IPsec and ECN are mutually | feed back to 'A'). Therefore the goals of IPsec and ECN are mutually | |||
incompatible. | incompatible. | |||
With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | |||
"controls are provided to manage the bandwidth of this [covert] | "controls are provided to manage the bandwidth of this [covert] | |||
channel". Using the ECN processing rules of RFC4301, the channel | channel". Using the ECN processing rules of RFC4301, the channel | |||
bandwidth is two bits per datagram from 'A' to 'M' and one bit per | bandwidth is two bits per datagram from 'A' to 'M' and one bit per | |||
datagram from 'M' to 'A' (because 'E' limits the combinations of the | datagram from 'M' to 'A' (because 'E' limits the combinations of the | |||
2-bit ECN field that it will copy). In both cases the covert channel | 2-bit ECN field that it will copy). In both cases the covert channel | |||
bandwidth is further reduced by noise from any real congestion | bandwidth is further reduced by noise from any real congestion | |||
marking. RFC4301 therefore implies that these covert channels are | marking. RFC4301 therefore implies that these covert channels are | |||
skipping to change at page 10, line 12 | skipping to change at page 10, line 30 | |||
by copying into the outer header on encapsulation and copying from | by copying into the outer header on encapsulation and copying from | |||
the outer header on decapsulation. | the outer header on decapsulation. | |||
The pipe model: where the outer header is independent of that in the | The pipe model: where the outer header is independent of that in the | |||
inner header so it hides the Diffserv field of the inner header | inner header so it hides the Diffserv field of the inner header | |||
from any interaction with nodes along the tunnel. | from any interaction with nodes along the tunnel. | |||
However, for ECN, the new IPsec security architecture in RFC4301 only | However, for ECN, the new IPsec security architecture in RFC4301 only | |||
standardised one tunnelling model equivalent to the uniform model. | standardised one tunnelling model equivalent to the uniform model. | |||
It deemed that simplicity was more important than allowing | It deemed that simplicity was more important than allowing | |||
administrators the option of a tiny increment in security especially | administrators the option of a tiny increment in security, especially | |||
given not copying congestion indications could seriously harm | given not copying congestion indications could seriously harm | |||
everyone's network service. | everyone's network service. | |||
3.2. Control Constraints | 3.2. Control Constraints | |||
Congestion control requires that any congestion notification marked | Congestion control requires that any congestion notification marked | |||
into packets by a resource will be able to traverse a feedback loop | into packets by a resource will be able to traverse a feedback loop | |||
back to a node capable of controlling the load on that resource. To | back to a function capable of controlling the load on that resource. | |||
be precise, rather than calling this node the data source, we will | To be precise, rather than calling this function the data source, we | |||
call it the Load Regulator. This will allow us to deal with | will call it the Load Regulator. This will allow us to deal with | |||
exceptional cases where load is not regulated by the data source, but | exceptional cases where load is not regulated by the data source, but | |||
usually the two terms will be synonymous. Note the term "a node | usually the two terms will be synonymous. Note the term "a function | |||
_capable of_ controlling the load" deliberately includes a source | _capable of_ controlling the load" deliberately includes a source | |||
application that doesn't actually control the load but ought to (e.g. | application that doesn't actually control the load but ought to (e.g. | |||
an application without congestion control that uses UDP). | an application without congestion control that uses UDP). | |||
A--->R--->I=========>M=========>E-------->B | A--->R--->I=========>M=========>E-------->B | |||
Figure 2: Simple Tunnel Scenario | Figure 2: Simple Tunnel Scenario | |||
We now consider a similar tunnelling scenario to the IPsec one just | We now consider a similar tunnelling scenario to the IPsec one just | |||
described, but without the different security domains so we can just | described, but without the different security domains so we can just | |||
skipping to change at page 11, line 14 | skipping to change at page 11, line 37 | |||
congestion occurred across a tunnel or upstream of it. If outer | congestion occurred across a tunnel or upstream of it. If outer | |||
header congestion marking was reset by the tunnel ingress ('I'), at | header congestion marking was reset by the tunnel ingress ('I'), at | |||
the end of a tunnel ('E') the outer headers would indicate congestion | the end of a tunnel ('E') the outer headers would indicate congestion | |||
experienced across the tunnel ('I' to 'E'), while the inner header | experienced across the tunnel ('I' to 'E'), while the inner header | |||
would indicate congestion upstream of 'I'. But similar information | would indicate congestion upstream of 'I'. But similar information | |||
can be gleaned even if the tunnel ingress copies the inner to the | can be gleaned even if the tunnel ingress copies the inner to the | |||
outer headers. At the end of the tunnel ('E'), any packet with an | outer headers. At the end of the tunnel ('E'), any packet with an | |||
_extra_ mark in the outer header relative to the inner header | _extra_ mark in the outer header relative to the inner header | |||
indicates congestion across the tunnel ('I' to 'E'), while the inner | indicates congestion across the tunnel ('I' to 'E'), while the inner | |||
header would still indicate congestion upstream of ('I'). Appendix B | header would still indicate congestion upstream of ('I'). Appendix B | |||
gives a more precise method for inferring the congestion level | gives a simple and precise method for a tunnel egress to infer the | |||
introduced across a tunnel. | congestion level introduced across a tunnel. | |||
All this shows that 'E' can preserve the control loop irrespective of | All this shows that 'E' can preserve the control loop irrespective of | |||
whether 'I' copies congestion notification into the outer header or | whether 'I' copies congestion notification into the outer header or | |||
resets it. | resets it. | |||
That is the situation for existing control arrangements but, because | That is the situation for existing control arrangements but, because | |||
copying reveals more information, it would open up possibilities for | copying reveals more information, it would open up possibilities for | |||
better control system designs. For instance, Appendix A describes | better control system designs. For instance, Appendix A describes | |||
how resetting CE marking at a tunnel ingress confuses a proposed | how resetting CE marking at a tunnel ingress confuses a proposed | |||
congestion marking scheme on the standards track. It ends up | congestion marking scheme on the standards track. It ends up | |||
removing excessive amounts of traffic unnecessarily, whereas copying | removing excessive amounts of traffic unnecessarily, whereas copying | |||
CE markings at ingress leads to the correct control behaviour. | CE markings at ingress leads to the correct control behaviour. | |||
3.3. Management Constraints | 3.3. Management Constraints | |||
As well as control, there are also management constraints. | As well as control, there are also management constraints. | |||
Specifically, a management system may monitor congestion markings in | Specifically, a management system may monitor congestion markings in | |||
passing packets, perhaps at the border between networks as part of a | passing packets, perhaps at the border between networks as part of a | |||
service level agreement. For instance, monitors at the borders of | service level agreement. For instance, monitors at the borders of | |||
autonomous systems may need to measure how much congestion has | autonomous systems may need to measure how much congestion has | |||
accumulated since the original source to determine between them how | accumulated since the original source, perhaps to determine between | |||
much of the congestion is contributed by each domain. | them how much of the congestion is contributed by each domain. | |||
Therefore, when monitoring the middle of a path, it should be | Therefore, when monitoring the middle of a path, it should be | |||
possible to establish how far back in the path congestion markings | possible to establish how far back in the path congestion markings | |||
have accumulated from. In this document we term this the baseline of | have accumulated from. In this document we term this the baseline of | |||
congestion marking (or the Congestion Baseline), i.e. the source of | congestion marking (or the Congestion Baseline), i.e. the source of | |||
the layer that last reset (or created) the congestion notification | the layer that last reset (or created) the congestion notification | |||
field. Given some tunnels cross domain borders (e.g. suppose M in | field. Given some tunnels cross domain borders (e.g. suppose M in | |||
Figure 2 is monitoring a border), it would therefore be desirable for | Figure 2 is monitoring a border), it would therefore be desirable for | |||
'I' to copy congestion accumulated so far into the outer headers | 'I' to copy congestion accumulated so far into the outer headers | |||
exposed across the tunnel. | exposed across the tunnel. | |||
Appendix D discusses various scenarios where the Load Regulator lies | Appendix D discusses various scenarios where the Load Regulator lies | |||
in-path, not at the source host as we would typically expect. It | in-path, not at the source host as we would typically expect. It | |||
concludes that a Congestion Baseline is determined by where the Load | concludes that a Congestion Baseline is determined by where the Load | |||
Regulator function is, which should be identified in the transport | Regulator function is, which should be identified in the transport | |||
layer, not by addresses in network layer headers. This applies | layer, not by addresses in network layer headers. This applies | |||
whether the Load Regulator is at the source host or within the path. | whether the Load Regulator is at the source host or within the path. | |||
The appendix also discusses where a Load Regulator function should be | The appendix also discusses where a Load Regulator function should be | |||
located relative to a local encapsulation function. | located relative to a local tunnel encapsulation function. | |||
4. Design Principles | 4. Design Principles | |||
The constraints from the three perspectives of security, control and | The constraints from the three perspectives of security, control and | |||
management in Section 3 are somewhat in tension as to whether a | management in Section 3 are somewhat in tension as to whether a | |||
tunnel ingress should copy congestion markings into the outer header | tunnel ingress should copy congestion markings into the outer header | |||
it creates or reset them. From the control perspective either | it creates or reset them. From the control perspective either | |||
copying or resetting works for existing arrangements, but copying has | copying or resetting works for existing arrangements, but copying has | |||
more potential for simplifying control. From the management | more potential for simplifying control. From the management | |||
perspective copying is preferable. From the security perspective | perspective copying is preferable. From the security perspective | |||
skipping to change at page 15, line 45 | skipping to change at page 16, line 20 | |||
2-bit ECN field of the arriving IP header into the outer | 2-bit ECN field of the arriving IP header into the outer | |||
encapsulating IP header, for all types of IP in IP tunnel. This | encapsulating IP header, for all types of IP in IP tunnel. This | |||
encapsulation behaviour MUST only be used if the tunnel ingress is in | encapsulation behaviour MUST only be used if the tunnel ingress is in | |||
`normal state'. A `compatibility state' with a different | `normal state'. A `compatibility state' with a different | |||
encapsulation behaviour is also specified in Section 6 for backward | encapsulation behaviour is also specified in Section 6 for backward | |||
compatibility with legacy tunnel egresses that do not understand ECN. | compatibility with legacy tunnel egresses that do not understand ECN. | |||
To decapsulate the inner header at the tunnel egress, a compliant | To decapsulate the inner header at the tunnel egress, a compliant | |||
tunnel egress MUST set the outgoing ECN field to the codepoint at the | tunnel egress MUST set the outgoing ECN field to the codepoint at the | |||
intersection of the appropriate incoming inner header (row) and outer | intersection of the appropriate incoming inner header (row) and outer | |||
header (column) in Table 1. | header (column) in Figure 3. | |||
+--Incoming Outer Header--- | ||||
+---------------------------------------------+ | ||||
| Incoming Outer Header | | ||||
+---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | |||
| Header | | | | | | | Header | | | | | | |||
+---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | |||
| ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE | | | ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE | | |||
| ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE | | | ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE | | |||
| CE | CE | CE | CE (!!!) | CE | | | CE | CE | CE | CE (!!!) | CE | | |||
+---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| Outgoing Header | | ||||
+---------------------------------------------+ | ||||
+-----Outgoing Header------ | Figure 3: IP in IP Decapsulation | |||
Table 1: IP in IP Decapsulation | ||||
The exclamation marks '(!!!)' in Table 1 indicate that this | The exclamation marks '(!!!)' in Figure 3 indicate that this | |||
combination of inner and outer headers should not be possible if only | combination of inner and outer headers should not be possible if only | |||
legal transitions have taken place. So, the decapsulator should drop | legal transitions have taken place. So, the decapsulator should drop | |||
or mark the ECN field as the table specifies, but it MAY also raise | or mark the ECN field as the table specifies, but it MAY also raise | |||
an appropriate alarm. It MUST NOT raise an alarm so often that the | an appropriate alarm. It MUST NOT raise an alarm so often that the | |||
illegal combinations would amplify into a flood of alarm messages. | illegal combinations would amplify into a flood of alarm messages. | |||
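As an illustrative sketch only (it appears in neither revision of the draft), the decapsulation table above can be written as a small lookup function. The names `Ecn` and `decapsulate_ecn` are hypothetical. The second return value merely flags the '(!!!)' combinations so that a caller could raise an alarm; the rate limiting that the text requires for such alarms is assumed to live in the caller and is not shown.

```python
from enum import Enum


class Ecn(Enum):
    """The four codepoints of the 2-bit ECN field."""
    NOT_ECT = "Not-ECT"
    ECT0 = "ECT(0)"
    ECT1 = "ECT(1)"
    CE = "CE"


def decapsulate_ecn(inner, outer):
    """Return (outgoing, illegal) for one decapsulated packet.

    outgoing is the ECN codepoint to forward, or None if the packet
    is to be dropped.  illegal is True for the combinations the table
    marks with '(!!!)'.
    """
    if inner is Ecn.NOT_ECT:
        # A Not-ECT inner header must not carry any ECN-capable or CE
        # outer header: forward only if the outer is also Not-ECT.
        if outer is Ecn.NOT_ECT:
            return Ecn.NOT_ECT, False
        return None, True
    if inner is Ecn.CE:
        # CE is preserved; CE inner with ECT(1) outer is flagged.
        return Ecn.CE, outer is Ecn.ECT1
    # Inner is ECT(0) or ECT(1): an outer CE is propagated, any other
    # outer value is ignored and the inner codepoint is kept.
    if outer is Ecn.CE:
        return Ecn.CE, False
    return inner, False
```

A caller would typically drop the packet when the first element is `None` and count, rather than immediately report, packets for which the second element is `True`.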
6. Backward Compatibility | 6. Backward Compatibility | |||
Note: in RFC3168, a tunnel was in one of two modes: limited | Note: in RFC3168, a tunnel was in one of two modes: limited | |||
functionality or full functionality. Rather than working with modes | functionality or full functionality. Rather than working with modes | |||
skipping to change at page 22, line 24 | skipping to change at page 22, line 42 | |||
design of alternate forms of tunnel processing of congestion | design of alternate forms of tunnel processing of congestion | |||
notification, if required for specific Diffserv PHBs or for other | notification, if required for specific Diffserv PHBs or for other | |||
lower layer encapsulating protocols that might support congestion | lower layer encapsulating protocols that might support congestion | |||
notification in the future. | notification in the future. | |||
11. Acknowledgements | 11. Acknowledgements | |||
Thanks to David Black for explaining a better way to think about | Thanks to David Black for explaining a better way to think about | |||
function placement and to Louise Burness for a better way to think | function placement and to Louise Burness for a better way to think | |||
about multilayer transports and networks, having read | about multilayer transports and networks, having read | |||
[Patterns_Arch]. Also thanks to Arnaud Jacquet for ideas behind the | [Patterns_Arch]. Also thanks to Arnaud Jacquet for the idea for | |||
algorithms in Appendix B. Thanks to Bruce Davie, Toby Moncaster, | Appendix B. Thanks to Bruce Davie, Toby Moncaster, Gorry Fairhurst, | |||
Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for | Sally Floyd, Alfred Hoenes and Gabriele Corliano for their thoughts | |||
their thoughts and careful review comments. | and careful review comments. | |||
Bob Briscoe is partly funded by Trilogy, a research project (ICT- | ||||
216372) supported by the European Community under its Seventh | ||||
Framework Programme. The views expressed here are those of the | ||||
author only. | ||||
12. Comments Solicited | 12. Comments Solicited | |||
Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
<tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
13. References | 13. References | |||
13.1. Normative References | 13.1. Normative References | |||
skipping to change at page 23, line 14 | skipping to change at page 23, line 35 | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
RFC 3168, September 2001. | RFC 3168, September 2001. | |||
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the | [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | |||
Internet Protocol", RFC 4301, December 2005. | Internet Protocol", RFC 4301, December 2005. | |||
13.2. Informative References | 13.2. Informative References | |||
[I-D.eardley-pcn-marking-behaviour] | ||||
Eardley, P., "Marking behaviour of PCN-nodes", | ||||
draft-eardley-pcn-marking-behaviour-01 (work in progress), | ||||
June 2008. | ||||
[I-D.ietf-pcn-architecture] | [I-D.ietf-pcn-architecture] | |||
Eardley, P., "Pre-Congestion Notification Architecture", | Eardley, P., "Pre-Congestion Notification (PCN) | |||
draft-ietf-pcn-architecture-03 (work in progress), | Architecture", draft-ietf-pcn-architecture-07 (work in | |||
February 2008. | progress), September 2008. | |||
[I-D.ietf-pcn-marking-behaviour] | ||||
Eardley, P., "Marking behaviour of PCN-nodes", | ||||
draft-ietf-pcn-marking-behaviour-00 (work in progress), | ||||
October 2008. | ||||
[I-D.ietf-pwe3-congestion-frmwk] | [I-D.ietf-pwe3-congestion-frmwk] | |||
Bryant, S., Davie, B., Martini, L., and E. Rosen, | Bryant, S., Davie, B., Martini, L., and E. Rosen, | |||
"Pseudowire Congestion Control Framework", | "Pseudowire Congestion Control Framework", | |||
draft-ietf-pwe3-congestion-frmwk-01 (work in progress), | draft-ietf-pwe3-congestion-frmwk-01 (work in progress), | |||
May 2008. | May 2008. | |||
[I-D.moncaster-pcn-3-state-encoding] | [I-D.moncaster-pcn-3-state-encoding] | |||
Moncaster, T., Briscoe, B., and M. Menth, "A three state | Moncaster, T., Briscoe, B., and M. Menth, "A three state | |||
extended PCN encoding scheme", | extended PCN encoding scheme", | |||
skipping to change at page 25, line 16 | skipping to change at page 25, line 39 | |||
(Expired) | (Expired) | |||
Appendix A. Why resetting CE on encapsulation harms PCN | Appendix A. Why resetting CE on encapsulation harms PCN | |||
Regarding encapsulation, the section of the PCN architecture | Regarding encapsulation, the section of the PCN architecture | |||
[I-D.ietf-pcn-architecture] on tunnelling says that header copying | [I-D.ietf-pcn-architecture] on tunnelling says that header copying | |||
(RFC4301) allows PCN to work correctly. However, resetting CE | (RFC4301) allows PCN to work correctly. However, resetting CE | |||
markings confuses PCN marking. | markings confuses PCN marking. | |||
The specific issue here concerns PCN excess rate marking | The specific issue here concerns PCN excess rate marking | |||
[I-D.eardley-pcn-marking-behaviour], i.e. the bulk marking of traffic | [I-D.ietf-pcn-marking-behaviour], i.e. the bulk marking of traffic | |||
that exceeds a configured threshold rate. One of the goals of excess | that exceeds a configured threshold rate. One of the goals of excess | |||
rate marking is to enable the speedy removal of excess admission | rate marking is to enable the speedy removal of excess admission | |||
controlled traffic following re-routes caused by link failures or | controlled traffic following re-routes caused by link failures or | |||
other disasters. This maintains a share of the capacity for | other disasters. This maintains a share of the capacity for | |||
competing admission controlled traffic and for traffic in lower | competing admission controlled traffic and for traffic in lower | |||
priority classes. After failures, traffic re-routed onto remaining | priority classes. After failures, traffic re-routed onto remaining | |||
links can often stress multiple links along a path. Therefore, | links can often stress multiple links along a path. Therefore, | |||
traffic can arrive at a link under stress with some proportion | traffic can arrive at a link under stress with some proportion | |||
already marked for removal by a previous link. By design, marked | already marked for removal by a previous link. By design, marked | |||
traffic will be removed by the overall system in subsequent round | traffic will be removed by the overall system in subsequent round | |||
trips. So when the excess rate marking algorithm decides how much | trips. So when the excess rate marking algorithm decides how much | |||
traffic to mark for removal, it doesn't include traffic already | traffic to mark for removal, it doesn't include traffic already | |||
marked for removal by another node upstream (the `Excess traffic | marked for removal by another node upstream (the `Excess traffic | |||
meter function' of [I-D.eardley-pcn-marking-behaviour]). | meter function' of [I-D.ietf-pcn-marking-behaviour]). | |||
However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | |||
field in all the outer headers, hiding all the evidence of problems | field in all the outer headers, hiding all the evidence of problems | |||
upstream. Thus, although excess rate marking works fine with RFC4301 | upstream. Thus, although excess rate marking works fine with RFC4301 | |||
IPsec tunnels, with RFC3168 tunnels it typically removes large | IPsec tunnels, with RFC3168 tunnels it typically removes large | |||
volumes of traffic that it didn't need to remove at all. | volumes of traffic that it didn't need to remove at all. | |||
Appendix B. Contribution to Congestion across a Tunnel | Appendix B. Contribution to Congestion across a Tunnel | |||
This specification mandates that a tunnel ingress determines the ECN | This specification mandates that a tunnel ingress determines the ECN | |||
field of each new outer tunnel header by copying the arriving header. | field of each new outer tunnel header by copying the arriving header. | |||
If instead the outer ECN field were reset at a tunnel ingress (as it | Concern has been expressed that this will make it difficult for the | |||
was for the full functionality mode of RFC3168), it would be possible | tunnel egress to monitor congestion introduced along a tunnel, which | |||
for the tunnel egress to measure: | is easy if the outer ECN field is reset at a tunnel ingress (RFC3168 | |||
full functionality mode). However, in fact copying CE marks at | ||||
o congestion marking before the tunnel ingress (fraction of inner | ingress will still make it easy for the egress to measure congestion | |||
header markings, p_i); | introduced across a tunnel, as illustrated below. | |||
o congestion marking across the tunnel (fraction of outer header | ||||
markings, p_t); | ||||
o congestion marking after the tunnel egress (fraction of departing | Consider 100 packets measured at the egress. It measures that 30 are | |||
header markings, p_o). | CE marked in the inner and outer headers and 12 have additional CE | |||
marks in the outer but not the inner. This means packets arriving at | ||||
the ingress had already experienced 30% congestion. However, it does | ||||
not mean there was 12% congestion across the tunnel. The correct | ||||
calculation of congestion across the tunnel is p_t = 12/(100-30) = | ||||
12/70 = 17%. This is easy for the egress to measure. It is the | |||
packets with additional CE marking in the outer header (12) as a | ||||
proportion of packets not marked in the inner header (70). | ||||
Although the newly mandated copying behaviour at ingress gains the | Figure 4 illustrates this in a combinatorial probability diagram. | |||
advantages described in the body of this specification, this one | The square represents 100 packets. The 30% division along the bottom | |||
advantage of the resetting behaviour of RFC3168 seems to have been | represents marking before the ingress, and the p_t division up the | |||
lost: on first impressions, it seems that the egress can no longer | side represents marking along the tunnel. | |||
accurately measure congestion contributed along the tunnel (p_t). | ||||
The egress could _estimate_ the contribution along the tunnel by | ||||
measuring which packets carry only a mark in the outer header (not the | ||||
inner). But this is not precisely the same as the congestion | ||||
contributed along the tunnel; tunnel nodes may have tried to mark | ||||
some packets that already had a marking in both the inner and outer | ||||
header. Measuring only additional outer markings will miss these. | ||||
Nonetheless, with the newly proposed scheme, a tunnel egress can | ||||
derive a precise estimate of marking introduced across a tunnel (p_t) | ||||
as follows. | ||||
The combined fraction of markings at the tunnel egress will be p_o = | +-----+---------+100% | |||
1 - (1 - p_i)(1 - p_t). Explanation: this is (1 - the probability a | | | | | |||
departing packet is not marked), which is (1 - (prob not marked | | 30 | | | |||
before tunnel)(prob not marked along tunnel)). Therefore, | | | | The large square | |||
rearranging, the egress can infer the fraction of marks introduced | | +---------+p_t represents 100 packets | |||
across the tunnel as p_t = (p_o - p_i)/(1 - p_i). If arriving | | | 12 | | |||
congestion is low (p_i <<1), then the approximation p_t ~ (p_o - p_i) | +-----+---------+0 | |||
should be good enough. This is the estimate we advised originally; | 0 30% 100% | |||
i.e. measuring only the extra markings in the outer header that are | inner header marking | |||
not present in the inner header. If a better approximation is needed | ||||
p_t ~ (p_o - p_i)(1 + p_i), which removes the division, but still | ||||
assumes p_i<<1. | ||||
Using any of these formulae (including the precise one), it would be | Figure 4: Tunnel Marking of Packets Already Marked at Ingress | |||
possible for a tunnel egress to calculate a moving average of the | ||||
fraction of packets being marked by tunnel nodes, including those | ||||
already marked in the inner header. Alternatively, it should even be | ||||
possible for a tunnel egress to reverse engineer which packets would | ||||
have been marked across the tunnel if CE was reset on ingress even if | ||||
CE was actually copied on ingress.[[anchor3: Note from Bob: I've | ||||
worked out an algorithm so the tunnel egress can reverse engineer | ||||
marking as if CE was reset at the ingress even though CE was copied | ||||
at the ingress. It typically consumes 2 cycles / pkt, occasionally 4 | ||||
and very occasionally 8. {ToDo: On testing an implementation just now | ||||
it still has a wrinkle in it, but with a little more development I | ||||
believe it would work well. I'll write it into the next revision if | ||||
I get it working.}]] | ||||
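The egress calculation in this appendix can be sketched in a few lines. This is a minimal illustration, not text from either revision. The function name `tunnel_congestion` and its parameter names are hypothetical. It assumes the egress counts, over some measurement window, the packets that are CE-marked in the inner header and the packets that are CE-marked in the outer header only. It also checks the relation p_o = 1 - (1 - p_i)(1 - p_t) given in the earlier revision.

```python
def tunnel_congestion(total, inner_marked, outer_only_marked):
    """Return (p_i, p_t, p_o) from packet counts seen at the egress.

    total             -- packets observed in the measurement window
    inner_marked      -- packets with CE in the inner header
    outer_only_marked -- packets with CE in the outer header only
    """
    if total <= 0 or not 0 <= inner_marked <= total:
        raise ValueError("inconsistent packet counts")
    unmarked_at_ingress = total - inner_marked
    if not 0 <= outer_only_marked <= unmarked_at_ingress:
        raise ValueError("inconsistent packet counts")
    if unmarked_at_ingress == 0:
        # Every packet arrived already marked, so marking across the
        # tunnel cannot be observed.
        raise ValueError("p_t is undefined when all packets are marked")

    p_i = inner_marked / total                        # before the ingress
    p_t = outer_only_marked / unmarked_at_ingress     # across the tunnel
    p_o = (inner_marked + outer_only_marked) / total  # leaving the egress
    return p_i, p_t, p_o


# Worked example from the text: 100 packets, 30 marked in the inner
# header, 12 more marked only in the outer header.
p_i, p_t, p_o = tunnel_congestion(100, 30, 12)
# The combined marking fraction satisfies p_o = 1 - (1 - p_i)(1 - p_t).
assert abs(p_o - (1 - (1 - p_i) * (1 - p_t))) < 1e-12
```

Dividing by the packets not marked at the ingress (70), rather than by all packets (100), is what gives 12/70 instead of 12%.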
Appendix C. Ideal Decapsulation Rules | Appendix C. Ideal Decapsulation Rules | |||
Compliance with this appendix is NOT REQUIRED for compliance with the | This appendix is not normative. Compliance with this appendix is NOT | |||
present specification. | REQUIRED for compliance with the present specification. | |||
If the default ECN encapsulation behaviour does not offer suitable | If the default ECN encapsulation behaviour does not offer suitable | |||
trade offs, procedures exist for associating a new behaviour with a | trade offs, procedures exist for associating a new behaviour with a | |||
new Diffserv PHB. However, it is unrealistic to expect vendors of | new Diffserv PHB. However, it is unrealistic to expect vendors of | |||
all IPsec and all IP in IP tunnel endpoints to cater for the | all IPsec and all IP in IP tunnel endpoints to cater for the | |||
exceptional behaviour of PHB XXX. If all tunnels did require XXX- | exceptional behaviour of PHB XXX. If all tunnels did require XXX- | |||
specific behaviour, the resulting patchy and error-prone deployment | specific behaviour, the resulting patchy and error-prone deployment | |||
would probably cause XXX to suffer byzantine feature interactions | would probably cause XXX to suffer byzantine feature interactions | |||
with poorly implemented tunnels. The default rules for tunnel | with poorly implemented tunnels. The default rules for tunnel | |||
endpoints to handle both the Diffserv field and the ECN field should | endpoints to handle both the Diffserv field and the ECN field should | |||
skipping to change at page 27, line 42 | skipping to change at page 28, line 7 | |||
marking) [I-D.ietf-pcn-architecture]. The aim is for the first level | marking) [I-D.ietf-pcn-architecture]. The aim is for the first level | |||
of marking to stop admitting new traffic and the second level to | of marking to stop admitting new traffic and the second level to | |||
terminate sufficient existing flows to bring a network back to its | terminate sufficient existing flows to bring a network back to its | |||
operating point after a serious failure. | operating point after a serious failure. | |||
Although the ECN field gives sufficient codepoints for these three | Although the ECN field gives sufficient codepoints for these three | |||
states, the PCN working group cannot use them in case any tunnel | states, the PCN working group cannot use them in case any tunnel | |||
decapsulations occur within a PCN region. If a node in a tunnel sets | decapsulations occur within a PCN region. If a node in a tunnel sets | |||
the ECN field to ECT(0) or ECT(1), this change will be discarded by a | the ECN field to ECT(0) or ECT(1), this change will be discarded by a | |||
tunnel egress compliant with RFC4301 and RFC3168. This can be seen | tunnel egress compliant with RFC4301 and RFC3168. This can be seen | |||
in Table 1, where the ECT values in the outer header are ignored | in Figure 3, where the ECT values in the outer header are ignored | |||
unless the inner header is the same. Effectively the ECT(0) and | unless the inner header is the same. Effectively the ECT(0) and | |||
ECT(1) codepoints have to be treated as just one codepoint when they | ECT(1) codepoints have to be treated as just one codepoint when they | |||
could otherwise have been used for their intended purpose of | could otherwise have been used for their intended purpose of | |||
congestion notification. Instead, the PCN w-g has had to propose | congestion notification. Instead, the PCN w-g has had to propose | |||
using extra Diffserv codepoint(s) to encode the extra states | using extra Diffserv codepoint(s) to encode the extra states | |||
[I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting | [I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting | |||
DSCP space while leaving ECN codepoints unused. | DSCP space while leaving ECN codepoints unused. | |||
Although this is currently most pressing for the PCN working group, | Although this is currently most pressing for the PCN working group, | |||
the issue is more general. Under Security Considerations (Section 9) | the issue is more general. Under Security Considerations (Section 9) | |||
skipping to change at page 28, line 17 | skipping to change at page 28, line 31 | |||
More generally, the currently standardised tunnel decapsulation | More generally, the currently standardised tunnel decapsulation | |||
behaviour unnecessarily wastes a quarter of two bits (i.e. half a | behaviour unnecessarily wastes a quarter of two bits (i.e. half a | |||
bit) in the IP (v4 & v6) header. As explained in Section 3.1, the | bit) in the IP (v4 & v6) header. As explained in Section 3.1, the | |||
original reason for not copying down outer ECT codepoints for onward | original reason for not copying down outer ECT codepoints for onward | |||
forwarding was to limit the covert channel across a decapsulator to 1 | forwarding was to limit the covert channel across a decapsulator to 1 | |||
bit per packet. However, now that the IETF Security Area has deemed | bit per packet. However, now that the IETF Security Area has deemed | |||
that a 2-bit covert channel through an encapsulator is a manageable | that a 2-bit covert channel through an encapsulator is a manageable | |||
risk, the same should be true for a decapsulator. | risk, the same should be true for a decapsulator. | |||
Table 2 proposes a more ideal layered decapsulation behaviour. Note: | Figure 5 proposes a more ideal layered decapsulation behaviour. | |||
this table is only to support discussion. It is not currently | Note: this table is only to support discussion. It is not currently | |||
proposed for standards action. The only difference from Table 1 | proposed for standards action. The only difference from Figure 3 | |||
(that is proposed for standards action), is the swapping of the cells | (that is proposed for standards action), is the swapping of the cells | |||
highlighted as *ECT(X)*. | highlighted as *ECT(X)*. | |||
+--Incoming Outer Header--- | +---------------------------------------------+ | |||
| Incoming Outer Header | | ||||
+---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | |||
| Header | | | | | | | Header | | | | | | |||
+---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | |||
| ECT(0) | ECT(0) | ECT(0) | *ECT(1)* | CE | | | ECT(0) | ECT(0) | ECT(0) | *ECT(1)* | CE | | |||
| ECT(1) | ECT(1) | *ECT(0)* | ECT(1) | CE | | | ECT(1) | ECT(1) | *ECT(0)* | ECT(1) | CE | | |||
| CE | CE | CE | CE (!!!) | CE | | | CE | CE | CE | CE (!!!) | CE | | |||
+---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| Outgoing Header | | ||||
+---------------------------------------------+ | ||||
+-----Outgoing Header------ | Figure 5: Ideal IP in IP Decapsulation (currently informative, not | |||
normative) | ||||
Table 2: Ideal IP in IP Decapsulation (currently NOT REQUIRED) | ||||
Note that, if this ideal proposal were taken up, extra backwards | Note that, if this ideal proposal were taken up, a tunnel egress | |||
compatibility issues would have to be resolved. | complying with it would be backwards compatible with all previous | |||
specifications for encapsulation of ECN at the ingress (RFC4301, both | ||||
modes of RFC3168, both modes of RFC2481 and RFC2003). In comparison | ||||
with an RFC3168 or RFC4301 tunnel egress, it would require no | ||||
additional configuration at the ingress nor any additional | ||||
negotiation with the ingress. The only new issue would be the burden | ||||
of an extra standard to be compliant with, adding to the already | ||||
complex history of ECN tunnelling RFCs. | ||||
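To make the difference between the two decapsulation tables concrete, the following sketch encodes both and lists the cells in which they disagree. It is only an illustration under stated assumptions: the function `decap` and its `ideal` flag are hypothetical names, codepoints are represented as plain strings, and a return value of `None` stands for "drop".

```python
CODEPOINTS = ("Not-ECT", "ECT(0)", "ECT(1)", "CE")
ECT = ("ECT(0)", "ECT(1)")


def decap(inner, outer, ideal=False):
    """Outgoing ECN codepoint for (inner, outer), or None to drop.

    ideal=False follows the decapsulation table proposed for
    standards action; ideal=True follows the informative 'ideal'
    table of this appendix.
    """
    if inner == "Not-ECT":
        return "Not-ECT" if outer == "Not-ECT" else None
    if "CE" in (inner, outer):
        return "CE"
    # Inner is ECT(0) or ECT(1); outer is Not-ECT, ECT(0) or ECT(1).
    if ideal and outer in ECT:
        return outer  # ideal behaviour: outer ECT value propagates
    return inner      # default behaviour: outer ECT value is ignored


# Cells (inner, outer) where the two tables give different results.
changed = [(i, o) for i in CODEPOINTS for o in CODEPOINTS
           if decap(i, o) != decap(i, o, ideal=True)]
```

Under these assumptions `changed` contains exactly the two cells highlighted as *ECT(X)*, i.e. inner ECT(0) with outer ECT(1) and inner ECT(1) with outer ECT(0).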
Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation | Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation | |||
We have said that at any point in a network, the Congestion Baseline | We have said that at any point in a network, the Congestion Baseline | |||
(where congestion notification starts from zero) should be the | (where congestion notification starts from zero) should be the | |||
previous upstream Load Regulator. We have also said that the ingress | previous upstream Load Regulator. We have also said that the ingress | |||
of an IP in IP tunnel must copy congestion indications to the | of an IP in IP tunnel must copy congestion indications to the | |||
encapsulating outer headers it creates. If the Load Regulator is in- | encapsulating outer headers it creates. If the Load Regulator is in- | |||
path rather than at the source, and also a tunnel ingress, these two | path rather than at the source, and also a tunnel ingress, these two | |||
requirements seem to be contradictory. A tunnel ingress must not | requirements seem to be contradictory. A tunnel ingress must not | |||
reset incoming congestion, but a Load Regulator must be the | reset incoming congestion, but a Load Regulator must be the | |||
Congestion Baseline, implying it needs to reset incoming congestion. | Congestion Baseline, implying it needs to reset incoming congestion. | |||
In fact, the two requirements are not contradictory, because a Load | In fact, the two requirements are not contradictory, because a Load | |||
Regulator and a tunnel ingress are functions within a node that occur | Regulator and a tunnel ingress are functions within a node that | |||
in sequence on a stream of packets, not at the same point. Figure 3 | typically occur in sequence on a stream of packets, not at the same | |||
is borrowed from [RFC2983] (which was making a similar point about | point. Figure 6 is borrowed from [RFC2983] (which was making a | |||
the location of Diffserv traffic conditioning relative to the | similar point about the location of Diffserv traffic conditioning | |||
encapsulation function of a tunnel). An in-path Load Regulator can | relative to the encapsulation function of a tunnel). An in-path Load | |||
act on packets either at [1 - Before] encapsulation or at [2 - Outer] | Regulator can act on packets either at [1 - Before] encapsulation or | |||
after encapsulation. Load Regulation does not ever need to be | at [2 - Outer] after encapsulation. Load Regulation does not ever | |||
integrated with the [Encapsulate] function (but it can be for | need to be integrated with the [Encapsulate] function (but it can be | |||
efficiency). Therefore we can still maintain that the [Encapsulate] | for efficiency). Therefore we can still mandate that the | |||
function always copies CE into the outer header. | [Encapsulate] function always copies CE into the outer header. | |||
>>-----[1 - Before]--------[Encapsulate]----[3 - Inner]------------>> | >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]---------->> | |||
\ | \ | |||
\ | \ | |||
+--------[2 - Outer]--------->> | +--------[2 - Outer]------->> | |||
Figure 3: Placement of In-Path Load Regulator Relative to Tunnel | Figure 6: Placement of In-Path Load Regulator Relative to Tunnel | |||
Ingress | Ingress | |||
Then separately, if there is a Load Regulator at location [2 - | Then separately, if there is a Load Regulator at location [2 - | |||
Outer], it might reset CE to ECT(0), say. Then the Congestion | Outer], it might reset CE to ECT(0), say. Then the Congestion | |||
Baseline for the lower layer (outer) will be [2 - Outer], while the | Baseline for the lower layer (outer) will be [2 - Outer], while the | |||
Congestion Baseline of the inner layer will be unchanged. But how | Congestion Baseline of the inner layer will be unchanged. But how | |||
encapsulation works has nothing to do with whether a Load Regulator | encapsulation works has nothing to do with whether a Load Regulator | |||
is present or where it is. | is present or where it is. | |||
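
The separation of functions described above can be sketched in code. The following Python is only a minimal illustration and is not part of either draft: the names `Packet`, `encapsulate` and `load_regulator_reset` are hypothetical, and resetting CE to ECT(0) is just the example behaviour mentioned in the text. It shows that [Encapsulate] copies the ECN field in both cases, and that only the position of the Load Regulator determines which headers carry the reset value.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass(frozen=True)
class Packet:
    ecn: int                          # ECN field of this (outermost) header
    inner: Optional["Packet"] = None  # encapsulated packet, if any


def encapsulate(pkt: Packet) -> Packet:
    """[Encapsulate]: always copy the arriving ECN field into the outer header."""
    return Packet(ecn=pkt.ecn, inner=pkt)


def load_regulator_reset(pkt: Packet) -> Packet:
    """Hypothetical Load Regulator: reset CE to ECT(0) in the header it sees."""
    return replace(pkt, ecn=ECT0) if pkt.ecn == CE else pkt


arriving = Packet(ecn=CE)

# Load Regulator at [2 - Outer]: acts after encapsulation, on the outer header only.
at_outer = load_regulator_reset(encapsulate(arriving))

# Load Regulator at [1 - Before]: acts first; the encapsulator copies what it is given.
at_before = encapsulate(load_regulator_reset(arriving))
```

In the [2 - Outer] case the inner header still carries CE while the outer header has been reset; in the [1 - Before] case both headers carry the reset value. In neither case does `encapsulate` need to know whether a Load Regulator is present.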
If on the other hand a Load Regulator resets CE at [1 - Before], the | If on the other hand a Load Regulator resets CE at [1 - Before], the | |||
skipping to change at page 30, line 12 | skipping to change at page 30, line 47 | |||
desirable or practical for a node part way along the path to regulate | desirable or practical for a node part way along the path to regulate | |||
the load. However, various reasonable proposals for in-path load | the load. However, various reasonable proposals for in-path load | |||
regulation have been made from time to time (e.g. fair queuing, | regulation have been made from time to time (e.g. fair queuing, | |||
traffic engineering, flow admission control). The IETF has recently | traffic engineering, flow admission control). The IETF has recently | |||
chartered a working group to standardise admission control across a | chartered a working group to standardise admission control across a | |||
part of a path using pre-congestion notification (PCN) [PCNcharter]. | part of a path using pre-congestion notification (PCN) [PCNcharter]. | |||
This is of particular relevance here because it involves congestion | This is of particular relevance here because it involves congestion | |||
notification with an in-path Load Regulator, it can involve | notification with an in-path Load Regulator, it can involve | |||
tunnelling and it certainly involves encapsulation more generally. | tunnelling and it certainly involves encapsulation more generally. | |||
We will use the more complex scenario in Figure 4 to tease out all | We will use the more complex scenario in Figure 7 to tease out all | |||
the issues that arise when combining congestion notification and | the issues that arise when combining congestion notification and | |||
tunnelling with various possible in-path load regulation schemes. In | tunnelling with various possible in-path load regulation schemes. In | |||
this case 'I1' and 'E2' break up the path into three separate | this case 'I1' and 'E2' break up the path into three separate | |||
congestion control loops. The feedback for these loops is shown | congestion control loops. The feedback for these loops is shown | |||
going right to left across the top of the figure. The 'V's are arrow | going right to left across the top of the figure. The 'V's are arrow | |||
heads representing the direction of feedback, not letters. But there | heads representing the direction of feedback, not letters. But there | |||
are also two tunnels within the middle control loop: 'I1' to 'E1' and | are also two tunnels within the middle control loop: 'I1' to 'E1' and | |||
'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS | 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS | |||
core networks. M is a congestion monitoring point, perhaps between | core networks. M is a congestion monitoring point, perhaps between | |||
two border routers where the same tunnel continues unbroken across | two border routers where the same tunnel continues unbroken across | |||
the border. | the border. | |||
______ _______________________________________ _____ | ______ _______________________________________ _____ | |||
/ \ / \ / \ | / \ / \ / \ | |||
V \ V M \ V \ | V \ V M \ V \ | |||
A--->R--->I1===========>E1----->I2=========>==========>E2------->B | A--->R--->I1===========>E1----->I2=========>==========>E2------->B | |||
Figure 4: Complex Tunnel Scenario | Figure 7: Complex Tunnel Scenario | |||
The question is, should the congestion markings in the outer exposed | The question is, should the congestion markings in the outer exposed | |||
headers of a tunnel represent congestion only since the tunnel | headers of a tunnel represent congestion only since the tunnel | |||
ingress or over the whole upstream path from the source of the inner | ingress or over the whole upstream path from the source of the inner | |||
header (whatever that may mean)? Or put another way, should 'I1' and | header (whatever that may mean)? Or put another way, should 'I1' and | |||
'I2' copy or reset CE markings? | 'I2' copy or reset CE markings? | |||
Based on the design principles in Section 4, the answer is that the | Based on the design principles in Section 4, the answer is that the | |||
Congestion Baseline should be the nearest upstream interface designed | Congestion Baseline should be the nearest upstream interface designed | |||
to regulate traffic load--the Load Regulator. In Figure 4 'A', 'I1' | to regulate traffic load--the Load Regulator. In Figure 7 'A', 'I1' | |||
and 'E2' are all Load Regulators. We have shown the feedback loops | and 'E2' are all Load Regulators. We have shown the feedback loops | |||
returning to each of these nodes so that they can regulate the load | returning to each of these nodes so that they can regulate the load | |||
causing the congestion notification. So the Congestion Baseline | causing the congestion notification. So the Congestion Baseline | |||
exposed to M should be 'I1' (the Load Regulator), not 'I2'. | exposed to M should be 'I1' (the Load Regulator), not 'I2'. | |||
Therefore I1 should reset any arriving CE markings. In this case, | Therefore I1 should reset any arriving CE markings. In this case, | |||
'I1' knows the tunnel to 'E1' is unrelated to its load regulation | 'I1' knows the tunnel to 'E1' is unrelated to its load regulation | |||
function. So the load regulation function within 'I1' should be | function. So the load regulation function within 'I1' should be | |||
placed at [1 - Before] tunnel encapsulation within 'I1' (using the | placed at [1 - Before] tunnel encapsulation within 'I1' (using the | |||
terminology of Figure 3). Then the Congestion Baseline all across | terminology of Figure 6). Then the Congestion Baseline all across | |||
the networks from 'I1' to 'E2' in both inner and outer headers will | the networks from 'I1' to 'E2' in both inner and outer headers will | |||
be 'I1'. | be 'I1'. | |||
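
As a rough illustration of this answer, the path from 'A' to 'M' in the complex tunnel scenario can be modelled as a sequence of steps on the inner and outer ECN fields. This is a sketch under stated assumptions, not behaviour specified by the draft: the packet is assumed to leave 'A' as ECT(0), both tunnels are assumed to be ECN-capable, decapsulation at 'E1' is simplified to propagating an outer CE into the inner header, and the function and parameter names are hypothetical.

```python
# ECN codepoints (RFC 3168).
ECT0, CE = 0b10, 0b11


def congest(header, congested):
    """A forwarding hop marks CE on the header it sees if it is congested."""
    return CE if congested else header


def outer_seen_at_m(upstream, tunnel1, between, tunnel2_to_m):
    """Return the outer ECN field seen at M for an ECT(0) packet sent by A.

    Each boolean argument says whether that path segment is congested.
    """
    inner = congest(ECT0, upstream)        # A -> R -> I1
    # I1, Load Regulator at [1 - Before]: reset CE to become the baseline.
    if inner == CE:
        inner = ECT0
    outer = inner                          # I1 [Encapsulate]: copy
    outer = congest(outer, tunnel1)        # I1 ==> E1
    # E1 decapsulation (assumed simplification): outer CE propagates inward.
    if outer == CE:
        inner = CE
    inner = congest(inner, between)        # E1 -> I2
    outer = inner                          # I2 [Encapsulate]: copy, never reset
    outer = congest(outer, tunnel2_to_m)   # I2 ==> M
    return outer
```

Under these assumptions, congestion upstream of 'I1' is invisible at 'M', while congestion on any segment from 'I1' onwards shows up as CE in the outer header at 'M', even though 'I2' is the ingress of the tunnel that 'M' sits in.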
The following further examples illustrate how this answer might be | The following further examples illustrate how this answer might be | |||
applied: | applied: | |||
o We argued in Appendix A that resetting CE on encapsulation could | o We argued in Appendix A that resetting CE on encapsulation could | |||
harm PCN excess rate marking, which marks excess traffic for | harm PCN excess rate marking, which marks excess traffic for | |||
removal in subsequent round trips. This marking relies on not | removal in subsequent round trips. This marking relies on not | |||
marking packets if another node upstream has already marked them | marking packets if another node upstream has already marked them | |||
for removal. If there were a tunnel ingress between the two which | for removal. If there were a tunnel ingress between the two which | |||
reset CE markings, it would confuse the downstream node into | reset CE markings, it would confuse the downstream node into | |||
marking far too much traffic for removal. So why do we say that | marking far too much traffic for removal. So why do we say that | |||
'I1' should reset CE, while a tunnel ingress shouldn't? The | 'I1' should reset CE, while a tunnel ingress shouldn't? The | |||
answer is that it is the Load Regulator function at 'I1' that is | answer is that it is the Load Regulator function at 'I1' that is | |||
resetting CE, not the tunnel encapsulator. The Load Regulator | resetting CE, not the tunnel encapsulator. The Load Regulator | |||
needs to set itself as the Congestion Baseline, so the feedback it | needs to set itself as the Congestion Baseline, so the feedback it | |||
gets will only be about congestion on links it can relieve itself | gets will only be about congestion on links it can relieve itself | |||
by regulating the load into them. When it resets CE markings, it | (by regulating the load into them). When it resets CE markings, | |||
knows that something else upstream will have dealt with the | it knows that something else upstream will have dealt with the | |||
congestion notifications it removes, given it is part of an end- | congestion notifications it removes, given it is part of an end- | |||
to-end admission control signalling loop. It therefore knows that | to-end admission control signalling loop. It therefore knows that | |||
previous hops will be covered by other Load Regulators. | previous hops will be covered by other Load Regulators. | |||
Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should | Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should | |||
follow the new rule for any tunnel ingress and copy congestion | follow the new rule for any tunnel ingress and copy congestion | |||
marking into the outer tunnel header. The ingress at 'I1' will | marking into the outer tunnel header. The ingress at 'I1' will | |||
happen to copy headers that have already been reset just | happen to copy headers that have already been reset just | |||
beforehand. But it doesn't need to know that. | beforehand. But it doesn't need to know that. | |||
o [Shayman] suggested feedback of ECN accumulated across an MPLS | o [Shayman] suggested feedback of ECN accumulated across an MPLS | |||
skipping to change at page 32, line 4 | skipping to change at page 32, line 41 | |||
headers. Again, the tunnel encapsulation function at 'I' simply | headers. Again, the tunnel encapsulation function at 'I' simply | |||
copies incoming headers, unaware that the load regulator will | copies incoming headers, unaware that the load regulator will | |||
subsequently reset its outer headers. | subsequently reset its outer headers. | |||
o The PWE3 working group of the IETF is considering the problem of | o The PWE3 working group of the IETF is considering the problem of | |||
how and whether an aggregate edge-to-edge pseudo-wire emulation | how and whether an aggregate edge-to-edge pseudo-wire emulation | |||
should respond to congestion [I-D.ietf-pwe3-congestion-frmwk]. | should respond to congestion [I-D.ietf-pwe3-congestion-frmwk]. | |||
Although the study is still at the requirements stage, some | Although the study is still at the requirements stage, some | |||
(controversial) solution proposals include in-path load regulation | (controversial) solution proposals include in-path load regulation | |||
at the ingress to the tunnel that could lead to tunnel | at the ingress to the tunnel that could lead to tunnel | |||
arrangements with similar complexity to that of Figure 4. | arrangements with similar complexity to that of Figure 7. | |||
These are not contrived scenarios--they could be a lot worse. For | These are not contrived scenarios--they could be a lot worse. For | |||
instance, a host may create a tunnel for IPsec which is placed inside | instance, a host may create a tunnel for IPsec which is placed inside | |||
a tunnel for Mobile IP over a remote part of its path. And around | a tunnel for Mobile IP over a remote part of its path. And around | |||
this all we may have MPLS labels being pushed and popped as packets | this all we may have MPLS labels being pushed and popped as packets | |||
pass across different core networks. Similarly, it is possible that | pass across different core networks. Similarly, it is possible that | |||
subnets could be built from link technology (e.g. future Ethernet | subnets could be built from link technology (e.g. future Ethernet | |||
switches) so that link headers being added and removed could involve | switches) so that link headers being added and removed could involve | |||
congestion notification in future Ethernet link headers with all the | congestion notification in future Ethernet link headers with all the | |||
same issues as with IP in IP tunnels. | same issues as with IP in IP tunnels. | |||
End of changes. 65 change blocks. | ||||
162 lines changed or deleted | 172 lines changed or added | |||