draft-briscoe-tsvwg-ecn-tunnel-01.txt   draft-ietf-tsvwg-ecn-tunnel-00.txt 
Transport Area Working Group B. Briscoe Transport Area Working Group B. Briscoe
Internet-Draft BT Internet-Draft BT
Intended status: Standards Track July 14, 2008 Intended status: Standards Track Oct 16, 2008
Expires: January 15, 2009 Expires: April 19, 2009
Layered Encapsulation of Congestion Notification Layered Encapsulation of Congestion Notification
draft-briscoe-tsvwg-ecn-tunnel-01 draft-ietf-tsvwg-ecn-tunnel-00
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 34 skipping to change at page 1, line 34
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 15, 2009. This Internet-Draft will expire on April 19, 2009.
Abstract Abstract
This document redefines how the explicit congestion notification This document redefines how the explicit congestion notification
(ECN) field of the outer IP header of a tunnel should be constructed. (ECN) field of the outer IP header of a tunnel should be constructed.
It brings all IP in IP tunnels (v4 or v6) into line with the way It brings all IP in IP tunnels (v4 or v6) into line with the way
IPsec tunnels now construct the ECN field. It includes a thorough IPsec tunnels now construct the ECN field. It includes a thorough
analysis of the reasoning for this change and the implications. It analysis of the reasoning for this change and the implications. It
also gives guidelines on the encapsulation of IP congestion also gives guidelines on the encapsulation of IP congestion
notification by any outer header, whether encapsulated in an IP notification by any outer header, whether encapsulated in an IP
tunnel or in a lower layer header. Following these guidelines should tunnel or in a lower layer header. Following these guidelines should
help interworking, if the IETF or other standards bodies specify any help interworking, if the IETF or other standards bodies specify any
new encapsulation of congestion notification. new encapsulation of congestion notification.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 4 1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 5
1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 5 1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 6
1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8
3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8 3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8 3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8
3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10 3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10
3.3. Management Constraints . . . . . . . . . . . . . . . . . . 11 3.3. Management Constraints . . . . . . . . . . . . . . . . . . 12
4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12 4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12
4.1. Design Guidelines for New Encapsulations of Congestion 4.1. Design Guidelines for New Encapsulations of Congestion
Notification . . . . . . . . . . . . . . . . . . . . . . . 13 Notification . . . . . . . . . . . . . . . . . . . . . . . 14
5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15 5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15
6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16
7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18 7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19
10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21
11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22
12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 22 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 23
13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23
13.1. Normative References . . . . . . . . . . . . . . . . . . . 22 13.1. Normative References . . . . . . . . . . . . . . . . . . . 23
13.2. Informative References . . . . . . . . . . . . . . . . . . 23 13.2. Informative References . . . . . . . . . . . . . . . . . . 23
Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25 Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25
Appendix B. Contribution to Congestion across a Tunnel . . . . . 25 Appendix B. Contribution to Congestion across a Tunnel . . . . . 26
Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27 Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27
Appendix D. Non-Dependence of Tunnelling on In-path Load Appendix D. Non-Dependence of Tunnelling on In-path Load
Regulation . . . . . . . . . . . . . . . . . . . . . 28 Regulation . . . . . . . . . . . . . . . . . . . . . 29
D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 29 D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 30
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 32 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 33
Intellectual Property and Copyright Statements . . . . . . . . . . 34 Intellectual Property and Copyright Statements . . . . . . . . . . 34
Changes from previous drafts (to be removed by the RFC Editor) Changes from previous drafts (to be removed by the RFC Editor)
From briscoe-01 to ietf-00 (current):
* Re-wrote Appendix B giving much simpler technique to measure
contribution to congestion across a tunnel.
* Added discussion of backward compatibility of the ideal
decapsulation scheme in Appendix C
* Updated references. Minor corrections & clarifications
throughout.
From -00 to -01: From -00 to -01:
* Related everything conceptually to the uniform and pipe models * Related everything conceptually to the uniform and pipe models
of RFC2983 on Diffserv Tunnels, and completely removed the of RFC2983 on Diffserv Tunnels, and completely removed the
dependence of tunnelling behaviour on the presence of any in- dependence of tunnelling behaviour on the presence of any in-
path load regulation by using the [1 - Before] [2 - Outer] path load regulation by using the [1 - Before] [2 - Outer]
function placement concepts from RFC2983. function placement concepts from RFC2983;
* Added specifc cases where the existing standards limit new * Added specific cases where the existing standards limit new
proposals. proposals, particularly Appendix A;
* Added sub-structure to Introduction (Need for Rationalisation, * Added sub-structure to Introduction (Need for Rationalisation,
Roadmap), added new Introductory subsection on "Scope" and Roadmap), added new Introductory subsection on "Scope" and
improved clarity improved clarity;
* Added Design Guidelines for New Encapsulations of Congestion * Added Design Guidelines for New Encapsulations of Congestion
Notification Notification (Section 4.1);
* Considerably clarified the Backward Compatibility section * Considerably clarified the Backward Compatibility section
(Section 6);
* Considerably extended the Security Considerations section * Considerably extended the Security Considerations section
(Section 9);
* Summarised the primary rationale much better in the conclusions * Summarised the primary rationale much better in the
conclusions;
* Added numerous extra acknowledgements * Added numerous extra acknowledgements;
* Added Appendix A. "Why resetting CE on encapsulation harms * Added Appendix A. "Why resetting CE on encapsulation harms
PCN", Appendix B. "Contribution to Congestion across a Tunnel" PCN", Appendix B. "Contribution to Congestion across a Tunnel"
and Appendix C. "Ideal Decapsulation Rules" and Appendix C. "Ideal Decapsulation Rules";
* Changed Appendix A "In-path Load Regulation" to "Non-Dependence * Re-wrote Appendix D, explaining how tunnel encapsulation no
of Tunnelling on In-path Load Regulation" and added sub-section longer depends on in-path load-regulation (changed title from
on "Dependence of In-Path Load Regulation on Tunnelling" "In-path Load Regulation" to "Non-Dependence of Tunnelling on
In-path Load Regulation"), but explained how an in-path load
regulation function must be carefully placed with respect to
tunnel encapsulation (in a new sub-section entitled "Dependence
of In-Path Load Regulation on Tunnelling").
1. Introduction 1. Introduction
This document redefines how the explicit congestion notification This document redefines how the explicit congestion notification
(ECN) field [RFC3168] of the outer IP header of a tunnel should be (ECN) field [RFC3168] of the outer IP header of a tunnel should be
constructed. It brings all IP in IP tunnels (v4 or v6) into line constructed. It brings all IP in IP tunnels (v4 or v6) into line
with the way IPsec tunnels [RFC4301] now construct the ECN field, with the way IPsec tunnels [RFC4301] now construct the ECN field,
ensuring that the outer header reveals any congestion experienced so ensuring that the outer header reveals any congestion experienced so
far on the whole path, not just since the last tunnel ingress. far on the whole path, not just since the last tunnel ingress.
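As a minimal sketch of the two ingress behaviours contrasted in this document, the following compares copying the inner ECN field into the outer header (as RFC4301 does) with resetting a CE marking (the full-functionality option of RFC3168). The codepoint values are those of RFC3168; the function names are hypothetical and chosen only for illustration.

```python
# ECN codepoints of the 2-bit ECN field, as defined in RFC 3168.
NOT_ECT = 0b00  # not ECN-capable transport
ECT_1 = 0b01    # ECN-capable transport, ECT(1)
ECT_0 = 0b10    # ECN-capable transport, ECT(0)
CE = 0b11       # congestion experienced


def outer_ecn_copy(inner_ecn: int) -> int:
    """Ingress that copies: the outer header gets the inner ECN field as-is."""
    return inner_ecn


def outer_ecn_reset(inner_ecn: int) -> int:
    """Ingress that resets: CE becomes ECT(0) in the outer header, else copy."""
    return ECT_0 if inner_ecn == CE else inner_ecn


if __name__ == "__main__":
    names = {NOT_ECT: "Not-ECT", ECT_1: "ECT(1)", ECT_0: "ECT(0)", CE: "CE"}
    for inner in (NOT_ECT, ECT_1, ECT_0, CE):
        print(
            f"inner={names[inner]:8} "
            f"copy->{names[outer_ecn_copy(inner)]:8} "
            f"reset->{names[outer_ecn_reset(inner)]}"
        )
```

Only the CE row differs between the two functions: with copying, congestion marked upstream of the ingress stays visible in the outer header, whereas with resetting it is hidden from nodes along the tunnel.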
skipping to change at page 5, line 38 skipping to change at page 6, line 9
makes it harder to design networks and new protocols that work makes it harder to design networks and new protocols that work
predictably. predictably.
Already complicated constraints have had to be added to a standards Already complicated constraints have had to be added to a standards
track congestion marking proposal. The section of the pre-congestion track congestion marking proposal. The section of the pre-congestion
notification (PCN) architecture [I-D.ietf-pcn-architecture] on notification (PCN) architecture [I-D.ietf-pcn-architecture] on
tunnelling says PCN works correctly in the presence of RFC4301 IPsec tunnelling says PCN works correctly in the presence of RFC4301 IPsec
encapsulation (and RFC5129 MPLS encapsulation). However it doesn't encapsulation (and RFC5129 MPLS encapsulation). However it doesn't
work with RFC3168 IP in IP encapsulation (Appendix A explains why). work with RFC3168 IP in IP encapsulation (Appendix A explains why).
Section 3 assesses further security, control and management functions To ensure we do not cause any unintended side-effects, Section 3
that cannot be achieved in each case (resetting vs copying CE assesses whether copying or resetting CE would harm any security,
markings). It finds that resetting CE makes life difficult in a control or management functions. It finds that resetting CE makes
number of directions, while copying CE harms nothing (other than life difficult in a number of directions, while copying CE harms
opening a low bit-rate covert channel vulnerability which the nothing (other than opening a low bit-rate covert channel
Security Area deems is manageable). vulnerability which the IETF Security Area deems is manageable).
1.2. Document Roadmap 1.2. Document Roadmap
Most of the document gives a thorough analysis of the knock-on Most of the document gives a thorough analysis of the knock-on
effects of the apparently minor change to tunnel encapsulation. The effects of the apparently minor change to tunnel encapsulation. The
reader may jump to Section 5 if only interested in standards actions reader may jump to Section 5 if only interested in standards actions
impacting implementation. The whole document is organised as impacting implementation. The whole document is organised as
follows: follows:
o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to
'switch in' different behaviours for marking the ECN field, just 'switch in' different behaviours for marking the ECN field, just
as it switches in different per-hop behaviours (PHBs) for as it switches in different per-hop behaviours (PHBs) for
scheduling. Therefore we cannot only discuss the ECN protocol scheduling. Therefore we cannot only discuss the ECN protocol
that RFC3168 gives as a default. We need to also give guidance that RFC3168 gives as a default. Instead, Section 3 lays out the
for possible different marking schemes. Therefore in Section 3 we design constraints when tunnelling congestion notification without
lay out the design constraints when tunnelling congestion assuming a particular congestion marking scheme.
notification.
o Then in Section 4 we resolve the tensions between these o Then in Section 4 we resolve the tensions between these
constraints to give general design principles and guidelines on constraints to give general design principles and guidelines on
how a tunnel should process congestion notification; principles how a tunnel should process congestion notification; principles
that could apply to any marking behaviour for any PHB, not just that could apply to any marking behaviour for any PHB, not just
the default in RFC3168. In particular, we examine the underlying the default in RFC3168. In particular, we examine the underlying
principles behind whether CE should be reset or copied into the principles behind whether CE should be reset or copied into the
outer header at the ingress to a tunnel--or indeed at the ingress outer header at the ingress to a tunnel--or indeed at the ingress
of any layered encapsulation of headers with congestion of any layered encapsulation of headers with congestion
notification fields. We end this section with a bulleted list of notification fields. We end this section with a bulleted list of
more design guidelines for new encapsulations of congestion design guidelines for new encapsulations of congestion
notification. notification.
o Section 5 then uses precise standards terminology to confirm the o Section 5 then uses precise standards terminology to confirm the
rules for the default ECN tunnelling behaviour based on the above rules for the default ECN tunnelling behaviour based on the above
design principles. design principles.
o Extending the new IPsec tunnel ingress behaviour to all IP in IP o Extending the new IPsec tunnel ingress behaviour to all IP in IP
tunnels requires consideration of backwards compatibility, which tunnels requires consideration of backwards compatibility, which
is covered in Section 6 and changes from earlier RFCs are brought is covered in Section 6 and changes from earlier RFCs are brought
together in Section 7. together in Section 7.
skipping to change at page 7, line 34 skipping to change at page 8, line 4
As well as guiding alternate IP in IP tunnelling schemes, the design As well as guiding alternate IP in IP tunnelling schemes, the design
guidelines of Section 4 are intended to be followed when IP packets guidelines of Section 4 are intended to be followed when IP packets
are encapsulated by any connectionless datagram/packet/frame where are encapsulated by any connectionless datagram/packet/frame where
the outer header is designed to support a congestion notification the outer header is designed to support a congestion notification
capability. [RFC5129] already deals with handling ECN for IP in MPLS capability. [RFC5129] already deals with handling ECN for IP in MPLS
and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in
L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples
where ECN may be added in future. where ECN may be added in future.
Of course, the IETF does not have standards authority over every link Of course, the IETF does not have standards authority over every link
or tunnel protocol, so this document merely aims to define the or tunnel protocol, so this document merely aims to guide the
interface between IP ECN and lower layer congestion notification. interface between IP ECN and lower layer congestion notification.
Then the IETF or the relevant standards body can be free to define Then the IETF or the relevant standards body can be free to define
the specifics of each lower layer scheme, but a common interface the specifics of each lower layer scheme, but a common interface
should ensure interworking across all technologies. should ensure interworking across all technologies.
Note that just because there is forward congestion notification in a Note that just because there is forward congestion notification in a
lower layer protocol, if the lower layer has its own feedback and lower layer protocol, if the lower layer has its own feedback and
load regulation, there is no need to propagate it up the layers. For load regulation, there is no need to propagate it up the layers. For
instance, FECN (forward ECN) has been present in Frame Relay and EFCI instance, FECN (forward ECN) has been present in Frame Relay and EFCI
(explicit forward congestion indication) in ATM [ITU-T.I.371] for a (explicit forward congestion indication) in ATM [ITU-T.I.371] for a
long time, but they have been used for internal management rather long time. But so far they have been used for internal management
than being propagated to endpoint transports for them to control end- rather than being propagated to endpoint transports for them to
to-end congestion. control end-to-end congestion.
[RFC2983] is a comprehensive primer on differentiated services and [RFC2983] is a comprehensive primer on differentiated services and
tunnels. Given ECN raises similar issues to differentiated services tunnels. Given ECN raises similar issues to differentiated services
when interacting with tunnels, useful concepts introduced in RFC2983 when interacting with tunnels, useful concepts introduced in RFC2983
are used throughout, with brief recaps of the explanations where are used throughout, with brief recaps of the explanations where
necessary. necessary.
2. Requirements Language 2. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
skipping to change at page 8, line 31 skipping to change at page 8, line 49
Information security can be assured by using various end to end Information security can be assured by using various end to end
security solutions (including IPsec in transport mode [RFC4301]), but security solutions (including IPsec in transport mode [RFC4301]), but
a commonly used scenario involves the need to communicate between two a commonly used scenario involves the need to communicate between two
physically protected domains across the public Internet. In this physically protected domains across the public Internet. In this
case there are certain management advantages to using IPsec in tunnel case there are certain management advantages to using IPsec in tunnel
mode solely across the publicly accessible part of the path. The mode solely across the publicly accessible part of the path. The
path followed by a packet then crosses security 'domains'; the ones path followed by a packet then crosses security 'domains'; the ones
protected by physical or other means before and after the tunnel and protected by physical or other means before and after the tunnel and
the one protected by an IPsec tunnel across the otherwise unprotected the one protected by an IPsec tunnel across the otherwise unprotected
domain. We will use the scenario in Figure 1 where endpoints 'A' and domain. We will use the scenario in Figure 1 where endpoints 'A' and
'B' communicate through a tunnel with ingress 'I' and egress 'E' 'B' communicate through a tunnel. The tunnel ingress 'I' and egress
within physically protected edge domains across an unprotected 'E' are within physically protected edge domains, while the tunnel
internetwork where there may be 'men in the middle', M. spans an unprotected internetwork where there may be 'men in the
middle', M.
physically unprotected physically physically unprotected physically
<-protected domain-><--domain--><-protected domain-> <-protected domain-><--domain--><-protected domain->
+------------------+ +------------------+ +------------------+ +------------------+
| | M | | | | M | |
| A-------->I=========>==========>E-------->B | | A-------->I=========>==========>E-------->B |
| | | | | | | |
+------------------+ +------------------+ +------------------+ +------------------+
<----IPsec secured----> <----IPsec secured---->
tunnel tunnel
skipping to change at page 9, line 23 skipping to change at page 9, line 42
of the inner header. And if 'E' copies these fields from the outer of the inner header. And if 'E' copies these fields from the outer
header to the inner, even if it validates authentication from 'I', it header to the inner, even if it validates authentication from 'I', it
will have allowed a covert channel from 'M' to 'B'. will have allowed a covert channel from 'M' to 'B'.
ECN at the IP layer is designed to carry information about congestion ECN at the IP layer is designed to carry information about congestion
from a congested resource towards downstream nodes. Typically a from a congested resource towards downstream nodes. Typically a
downstream transport might feed the information back somehow to the downstream transport might feed the information back somehow to the
point upstream of the congestion that can regulate the load on the point upstream of the congestion that can regulate the load on the
congested resource, but other actions are possible (see [RFC3168] congested resource, but other actions are possible (see [RFC3168]
S.6). In terms of the above unicast scenario, ECN is typically S.6). In terms of the above unicast scenario, ECN is typically
intended to create an information channel from 'M' to 'B', for 'B' to intended to create an information channel from 'M' to 'B' (for 'B' to
forward to 'A'. Therefore the goals of IPsec and ECN are mutually feed back to 'A'). Therefore the goals of IPsec and ECN are mutually
incompatible. incompatible.
With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
"controls are provided to manage the bandwidth of this [covert] "controls are provided to manage the bandwidth of this [covert]
channel". Using the ECN processing rules of RFC4301, the channel channel". Using the ECN processing rules of RFC4301, the channel
bandwidth is two bits per datagram from 'A' to 'M' and one bit per bandwidth is two bits per datagram from 'A' to 'M' and one bit per
datagram from 'M' to 'A' (because 'E' limits the combinations of the datagram from 'M' to 'A' (because 'E' limits the combinations of the
2-bit ECN field that it will copy). In both cases the covert channel 2-bit ECN field that it will copy). In both cases the covert channel
bandwidth is further reduced by noise from any real congestion bandwidth is further reduced by noise from any real congestion
marking. RFC4301 therefore implies that these covert channels are marking. RFC4301 therefore implies that these covert channels are
skipping to change at page 10, line 12 skipping to change at page 10, line 30
by copying into the outer header on encapsulation and copying from by copying into the outer header on encapsulation and copying from
the outer header on decapsulation. the outer header on decapsulation.
The pipe model: where the outer header is independent of that in the The pipe model: where the outer header is independent of that in the
inner header so it hides the Diffserv field of the inner header inner header so it hides the Diffserv field of the inner header
from any interaction with nodes along the tunnel. from any interaction with nodes along the tunnel.
However, for ECN, the new IPsec security architecture in RFC4301 only However, for ECN, the new IPsec security architecture in RFC4301 only
standardised one tunnelling model equivalent to the uniform model. standardised one tunnelling model equivalent to the uniform model.
It deemed that simplicity was more important than allowing It deemed that simplicity was more important than allowing
administrators the option of a tiny increment in security especially administrators the option of a tiny increment in security, especially
given not copying congestion indications could seriously harm given not copying congestion indications could seriously harm
everyone's network service. everyone's network service.
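The two RFC2983 models recapped above can be illustrated with a small sketch for the Diffserv field. This is only an illustration under stated assumptions: the function names and the `tunnel_ds` parameter (a value chosen by the tunnel ingress) are hypothetical, and real tunnel endpoints apply further policy that is omitted here.

```python
def encap_uniform(inner_ds: int) -> int:
    """Uniform model: the outer field is copied from the inner on encapsulation."""
    return inner_ds


def decap_uniform(inner_ds: int, outer_ds: int) -> int:
    """Uniform model: the forwarded field is copied from the outer on decapsulation."""
    return outer_ds


def encap_pipe(inner_ds: int, tunnel_ds: int) -> int:
    """Pipe model: the outer field is independent of the inner (hypothetical tunnel_ds)."""
    return tunnel_ds


def decap_pipe(inner_ds: int, outer_ds: int) -> int:
    """Pipe model: the inner field emerges untouched by nodes along the tunnel."""
    return inner_ds


if __name__ == "__main__":
    inner, changed_in_tunnel = 0b101110, 0b000000
    # Uniform: a change made along the tunnel propagates to the forwarded packet.
    print(decap_uniform(inner, changed_in_tunnel))
    # Pipe: the same change is discarded at the egress.
    print(decap_pipe(inner, changed_in_tunnel))
```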
3.2. Control Constraints 3.2. Control Constraints
Congestion control requires that any congestion notification marked Congestion control requires that any congestion notification marked
into packets by a resource will be able to traverse a feedback loop into packets by a resource will be able to traverse a feedback loop
back to a node capable of controlling the load on that resource. To back to a function capable of controlling the load on that resource.
be precise, rather than calling this node the data source, we will To be precise, rather than calling this function the data source, we
call it the Load Regulator. This will allow us to deal with will call it the Load Regulator. This will allow us to deal with
exceptional cases where load is not regulated by the data source, but exceptional cases where load is not regulated by the data source, but
usually the two terms will be synonymous. Note the term "a node usually the two terms will be synonymous. Note the term "a function
_capable of_ controlling the load" deliberately includes a source _capable of_ controlling the load" deliberately includes a source
application that doesn't actually control the load but ought to (e.g. application that doesn't actually control the load but ought to (e.g.
an application without congestion control that uses UDP). an application without congestion control that uses UDP).
A--->R--->I=========>M=========>E-------->B A--->R--->I=========>M=========>E-------->B
Figure 2: Simple Tunnel Scenario Figure 2: Simple Tunnel Scenario
We now consider a similar tunnelling scenario to the IPsec one just We now consider a similar tunnelling scenario to the IPsec one just
described, but without the different security domains so we can just described, but without the different security domains so we can just
skipping to change at page 11, line 14 skipping to change at page 11, line 37
congestion occurred across a tunnel or upstream of it. If outer congestion occurred across a tunnel or upstream of it. If outer
header congestion marking was reset by the tunnel ingress ('I'), at header congestion marking was reset by the tunnel ingress ('I'), at
the end of a tunnel ('E') the outer headers would indicate congestion the end of a tunnel ('E') the outer headers would indicate congestion
experienced across the tunnel ('I' to 'E'), while the inner header experienced across the tunnel ('I' to 'E'), while the inner header
would indicate congestion upstream of 'I'. But similar information would indicate congestion upstream of 'I'. But similar information
can be gleaned even if the tunnel ingress copies the inner to the can be gleaned even if the tunnel ingress copies the inner to the
outer headers. At the end of the tunnel ('E'), any packet with an outer headers. At the end of the tunnel ('E'), any packet with an
_extra_ mark in the outer header relative to the inner header _extra_ mark in the outer header relative to the inner header
indicates congestion across the tunnel ('I' to 'E'), while the inner indicates congestion across the tunnel ('I' to 'E'), while the inner
header would still indicate congestion upstream of 'I'. Appendix B header would still indicate congestion upstream of 'I'. Appendix B
gives a more precise method for inferring the congestion level gives a simple and precise method for a tunnel egress to infer the
introduced across a tunnel. congestion level introduced across a tunnel.
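The comparison of inner and outer headers at the egress described above can be sketched as follows. This is an illustrative sketch only, not the Appendix B method: it assumes the ingress copies the inner ECN field into the outer header, and it represents each arriving packet as a hypothetical `(inner_ecn, outer_ecn)` pair using the RFC3168 codepoint values.

```python
from typing import Iterable, Optional, Tuple

ECT_0 = 0b10  # ECN-capable transport, ECT(0), per RFC 3168
CE = 0b11     # congestion experienced, per RFC 3168


def classify(inner_ecn: int, outer_ecn: int) -> str:
    """Say where a packet's congestion mark was applied, as seen at the egress.

    Assumes the ingress copied the inner ECN field to the outer header.
    """
    if inner_ecn == CE:
        # Marked before encapsulation. Whether the tunnel would also have
        # marked this packet cannot be told from the packet alone.
        return "upstream"
    if outer_ecn == CE:
        # An extra mark in the outer header relative to the inner header.
        return "tunnel"
    return "none"


def tunnel_marking_fraction(
    packets: Iterable[Tuple[int, int]]
) -> Optional[float]:
    """Fraction of packets that entered the tunnel unmarked and left it marked."""
    unmarked_at_ingress = [(i, o) for i, o in packets if i != CE]
    if not unmarked_at_ingress:
        return None  # nothing to measure against
    marked = sum(1 for _, o in unmarked_at_ingress if o == CE)
    return marked / len(unmarked_at_ingress)


if __name__ == "__main__":
    sample = [(ECT_0, ECT_0), (ECT_0, CE), (CE, CE), (ECT_0, ECT_0), (ECT_0, CE)]
    print([classify(i, o) for i, o in sample])
    print(tunnel_marking_fraction(sample))
```

Restricting the fraction to packets that arrived at the ingress without a CE mark avoids counting upstream congestion as if it had been introduced across the tunnel.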
All this shows that 'E' can preserve the control loop irrespective of All this shows that 'E' can preserve the control loop irrespective of
whether 'I' copies congestion notification into the outer header or whether 'I' copies congestion notification into the outer header or
resets it. resets it.
That is the situation for existing control arrangements but, because That is the situation for existing control arrangements but, because
copying reveals more information, it would open up possibilities for copying reveals more information, it would open up possibilities for
better control system designs. For instance, Appendix A describes better control system designs. For instance, Appendix A describes
how resetting CE marking at a tunnel ingress confuses a proposed how resetting CE marking at a tunnel ingress confuses a proposed
congestion marking scheme on the standards track. It ends up congestion marking scheme on the standards track. It ends up
removing excessive amounts of traffic unnecessarily. Whereas copying removing excessive amounts of traffic unnecessarily. Whereas copying
CE markings at ingress leads to the correct control behaviour. CE markings at ingress leads to the correct control behaviour.
3.3. Management Constraints 3.3. Management Constraints
As well as control, there are also management constraints. As well as control, there are also management constraints.
Specifically, a management system may monitor congestion markings in Specifically, a management system may monitor congestion markings in
passing packets, perhaps at the border between networks as part of a passing packets, perhaps at the border between networks as part of a
service level agreement. For instance, monitors at the borders of service level agreement. For instance, monitors at the borders of
autonomous systems may need to measure how much congestion has autonomous systems may need to measure how much congestion has
accumulated since the original source to determine between them how accumulated since the original source, perhaps to determine between
much of the congestion is contributed by each domain. them how much of the congestion is contributed by each domain.
Therefore, when monitoring the middle of a path, it should be Therefore, when monitoring the middle of a path, it should be
possible to establish how far back in the path congestion markings possible to establish how far back in the path congestion markings
have accumulated from. In this document we term this the baseline of have accumulated from. In this document we term this the baseline of
congestion marking (or the Congestion Baseline), i.e. the source of congestion marking (or the Congestion Baseline), i.e. the source of
the layer that last reset (or created) the congestion notification the layer that last reset (or created) the congestion notification
field. Given some tunnels cross domain borders (e.g. consider M in field. Given some tunnels cross domain borders (e.g. consider M in
Figure 2 is monitoring a border), it would therefore be desirable for Figure 2 is monitoring a border), it would therefore be desirable for
'I' to copy congestion accumulated so far into the outer headers 'I' to copy congestion accumulated so far into the outer headers
exposed across the tunnel. exposed across the tunnel.
Appendix D discusses various scenarios where the Load Regulator lies Appendix D discusses various scenarios where the Load Regulator lies
in-path, not at the source host as we would typically expect. It in-path, not at the source host as we would typically expect. It
concludes that a Congestion Baseline is determined by where the Load concludes that a Congestion Baseline is determined by where the Load
Regulator function is, which should be identified in the transport Regulator function is, which should be identified in the transport
layer, not by addresses in network layer headers. This applies layer, not by addresses in network layer headers. This applies
whether the Load Regulator is at the source host or within the path. whether the Load Regulator is at the source host or within the path.
The appendix also discusses where a Load Regulator function should be The appendix also discusses where a Load Regulator function should be
located relative to a local encapsulation function. located relative to a local tunnel encapsulation function.
4. Design Principles 4. Design Principles
The constraints from the three perspectives of security, control and The constraints from the three perspectives of security, control and
management in Section 3 are somewhat in tension as to whether a management in Section 3 are somewhat in tension as to whether a
tunnel ingress should copy congestion markings into the outer header tunnel ingress should copy congestion markings into the outer header
it creates or reset them. From the control perspective either it creates or reset them. From the control perspective either
copying or resetting works for existing arrangements, but copying has copying or resetting works for existing arrangements, but copying has
more potential for simplifying control. From the management more potential for simplifying control. From the management
perspective copying is preferable. From the security perspective perspective copying is preferable. From the security perspective
skipping to change at page 15, line 45 skipping to change at page 16, line 20
2-bit ECN field of the arriving IP header into the outer 2-bit ECN field of the arriving IP header into the outer
encapsulating IP header, for all types of IP in IP tunnel. This encapsulating IP header, for all types of IP in IP tunnel. This
encapsulation behaviour MUST only be used if the tunnel ingress is in encapsulation behaviour MUST only be used if the tunnel ingress is in
`normal state'. A `compatibility state' with a different `normal state'. A `compatibility state' with a different
encapsulation behaviour is also specified in Section 6 for backward encapsulation behaviour is also specified in Section 6 for backward
compatibility with legacy tunnel egresses that do not understand ECN. compatibility with legacy tunnel egresses that do not understand ECN.
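As an illustration only, the normal-state copying rule can be sketched as
below. The function name and the outer_dscp parameter are hypothetical and
do not come from this document. The sketch assumes the ECN field occupies
the two least significant bits of the former IPv4 TOS octet (or IPv6
Traffic Class octet), as defined in RFC3168. Compatibility-state behaviour
is not shown.

```python
ECN_MASK = 0b11  # ECN field: two least significant bits of the TOS / Traffic Class octet


def outer_tos_normal_state(inner_tos: int, outer_dscp: int) -> int:
    """Build the outer TOS octet for a tunnel ingress in normal state.

    The 2-bit ECN field is copied unchanged from the arriving (inner)
    header. How the outer DSCP is chosen is outside the scope of this
    sketch, so the caller supplies it.
    """
    if not 0 <= inner_tos <= 0xFF:
        raise ValueError("inner_tos must fit in one octet")
    if not 0 <= outer_dscp <= 0x3F:
        raise ValueError("outer_dscp must fit in six bits")
    return (outer_dscp << 2) | (inner_tos & ECN_MASK)
```

For example, an arriving packet marked CE (binary 11) leaves the ingress
with CE in its outer header as well, whatever DSCP the outer header uses.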
To decapsulate the inner header at the tunnel egress, a compliant To decapsulate the inner header at the tunnel egress, a compliant
tunnel egress MUST set the outgoing ECN field to the codepoint at the tunnel egress MUST set the outgoing ECN field to the codepoint at the
intersection of the appropriate incoming inner header (row) and outer intersection of the appropriate incoming inner header (row) and outer
header (column) in Table 1. header (column) in Figure 3.
+--Incoming Outer Header---
+---------------------------------------------+
| Incoming Outer Header |
+---------------------+---------+-----------+-----------+-----------+ +---------------------+---------+-----------+-----------+-----------+
| Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE |
| Header | | | | | | Header | | | | |
+---------------------+---------+-----------+-----------+-----------+ +---------------------+---------+-----------+-----------+-----------+
| Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) |
| ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE | | ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE |
| ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE | | ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE |
| CE | CE | CE | CE (!!!) | CE | | CE | CE | CE | CE (!!!) | CE |
+---------------------+---------+-----------+-----------+-----------+ +---------------------+---------+-----------+-----------+-----------+
| Outgoing Header |
+---------------------------------------------+
+-----Outgoing Header------ Figure 3: IP in IP Decapsulation
Table 1: IP in IP Decapsulation
The exclamation marks '(!!!)' in Table 1 indicate that this The exclamation marks '(!!!)' in Figure 3 indicate that this
combination of inner and outer headers should not be possible if only combination of inner and outer headers should not be possible if only
legal transitions have taken place. So, the decapsulator should drop legal transitions have taken place. So, the decapsulator should drop
or mark the ECN field as the table specifies, but it MAY also raise or mark the ECN field as the table specifies, but it MAY also raise
an appropriate alarm. It MUST NOT raise an alarm so often that the an appropriate alarm. It MUST NOT raise an alarm so often that the
illegal combinations would amplify into a flood of alarm messages. illegal combinations would amplify into a flood of alarm messages.
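The decapsulation table above can be expressed as a small lookup. The
following is a minimal sketch, not a reference implementation. The
function and constant names are hypothetical. The codepoint values are
those of RFC3168. Alarm rate limiting is left to the caller and is not
shown.

```python
from typing import Optional, Tuple

# ECN codepoints as defined in RFC3168
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# (inner, outer) combinations flagged '(!!!)' in the table
ILLEGAL = {(NOT_ECT, ECT0), (NOT_ECT, ECT1), (NOT_ECT, CE), (CE, ECT1)}


def decapsulate(inner: int, outer: int) -> Tuple[Optional[int], bool]:
    """Return (outgoing ECN codepoint, alarm_candidate).

    The outgoing value is None when the table says the packet is dropped.
    alarm_candidate is True for combinations that should not occur if only
    legal transitions have taken place.
    """
    illegal = (inner, outer) in ILLEGAL
    if inner == NOT_ECT:
        # A Not-ECT inner is forwarded only if the outer is also Not-ECT.
        return (NOT_ECT if outer == NOT_ECT else None), illegal
    if inner == CE or outer == CE:
        # Congestion experienced at either layer propagates onward.
        return CE, illegal
    # Otherwise the inner ECT codepoint is preserved and the outer ignored.
    return inner, illegal
```

A caller would drop the packet when the first element is None and could
pass the second element to a rate-limited alarm mechanism.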
6. Backward Compatibility 6. Backward Compatibility
Note: in RFC3168, a tunnel was in one of two modes: limited Note: in RFC3168, a tunnel was in one of two modes: limited
functionality or full functionality. Rather than working with modes functionality or full functionality. Rather than working with modes
skipping to change at page 22, line 24 skipping to change at page 22, line 42
design of alternate forms of tunnel processing of congestion design of alternate forms of tunnel processing of congestion
notification, if required for specific Diffserv PHBs or for other notification, if required for specific Diffserv PHBs or for other
lower layer encapsulating protocols that might support congestion lower layer encapsulating protocols that might support congestion
notification in the future. notification in the future.
11. Acknowledgements 11. Acknowledgements
Thanks to David Black for explaining a better way to think about Thanks to David Black for explaining a better way to think about
function placement and to Louise Burness for a better way to think function placement and to Louise Burness for a better way to think
about multilayer transports and networks, having read about multilayer transports and networks, having read
[Patterns_Arch]. Also thanks to Arnaud Jacquet for ideas behind the [Patterns_Arch]. Also thanks to Arnaud Jacquet for the idea for
algorithms in Appendix B. Thanks to Bruce Davie, Toby Moncaster, Appendix B. Thanks to Bruce Davie, Toby Moncaster, Gorry Fairhurst,
Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for Sally Floyd, Alfred Hoenes and Gabriele Corliano for their thoughts
their thoughts and careful review comments. and careful review comments.
Bob Briscoe is partly funded by Trilogy, a research project (ICT-
216372) supported by the European Community under its Seventh
Framework Programme. The views expressed here are those of the
author only.
12. Comments Solicited 12. Comments Solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
addressed to the IETF Transport Area working group mailing list addressed to the IETF Transport Area working group mailing list
<tsvwg@ietf.org>, and/or to the authors. <tsvwg@ietf.org>, and/or to the authors.
13. References 13. References
13.1. Normative References 13.1. Normative References
skipping to change at page 23, line 14 skipping to change at page 23, line 35
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001. RFC 3168, September 2001.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005. Internet Protocol", RFC 4301, December 2005.
13.2. Informative References 13.2. Informative References
[I-D.eardley-pcn-marking-behaviour]
Eardley, P., "Marking behaviour of PCN-nodes",
draft-eardley-pcn-marking-behaviour-01 (work in progress),
June 2008.
[I-D.ietf-pcn-architecture] [I-D.ietf-pcn-architecture]
Eardley, P., "Pre-Congestion Notification Architecture", Eardley, P., "Pre-Congestion Notification (PCN)
draft-ietf-pcn-architecture-03 (work in progress), Architecture", draft-ietf-pcn-architecture-07 (work in
February 2008. progress), September 2008.
[I-D.ietf-pcn-marking-behaviour]
Eardley, P., "Marking behaviour of PCN-nodes",
draft-ietf-pcn-marking-behaviour-00 (work in progress),
October 2008.
[I-D.ietf-pwe3-congestion-frmwk] [I-D.ietf-pwe3-congestion-frmwk]
Bryant, S., Davie, B., Martini, L., and E. Rosen, Bryant, S., Davie, B., Martini, L., and E. Rosen,
"Pseudowire Congestion Control Framework", "Pseudowire Congestion Control Framework",
draft-ietf-pwe3-congestion-frmwk-01 (work in progress), draft-ietf-pwe3-congestion-frmwk-01 (work in progress),
May 2008. May 2008.
[I-D.moncaster-pcn-3-state-encoding] [I-D.moncaster-pcn-3-state-encoding]
Moncaster, T., Briscoe, B., and M. Menth, "A three state Moncaster, T., Briscoe, B., and M. Menth, "A three state
extended PCN encoding scheme", extended PCN encoding scheme",
skipping to change at page 25, line 16 skipping to change at page 25, line 39
(Expired) (Expired)
Appendix A. Why resetting CE on encapsulation harms PCN Appendix A. Why resetting CE on encapsulation harms PCN
Regarding encapsulation, the section of the PCN architecture Regarding encapsulation, the section of the PCN architecture
[I-D.ietf-pcn-architecture] on tunnelling says that header copying [I-D.ietf-pcn-architecture] on tunnelling says that header copying
(RFC4301) allows PCN to work correctly. However, resetting CE (RFC4301) allows PCN to work correctly. However, resetting CE
markings confuses PCN marking. markings confuses PCN marking.
The specific issue here concerns PCN excess rate marking The specific issue here concerns PCN excess rate marking
[I-D.eardley-pcn-marking-behaviour], i.e. the bulk marking of traffic [I-D.ietf-pcn-marking-behaviour], i.e. the bulk marking of traffic
that exceeds a configured threshold rate. One of the goals of excess that exceeds a configured threshold rate. One of the goals of excess
rate marking is to enable the speedy removal of excess admission rate marking is to enable the speedy removal of excess admission
controlled traffic following re-routes caused by link failures or controlled traffic following re-routes caused by link failures or
other disasters. This maintains a share of the capacity for other disasters. This maintains a share of the capacity for
competing admission controlled traffic and for traffic in lower competing admission controlled traffic and for traffic in lower
priority classes. After failures, traffic re-routed onto remaining priority classes. After failures, traffic re-routed onto remaining
links can often stress multiple links along a path. Therefore, links can often stress multiple links along a path. Therefore,
traffic can arrive at a link under stress with some proportion traffic can arrive at a link under stress with some proportion
already marked for removal by a previous link. By design, marked already marked for removal by a previous link. By design, marked
traffic will be removed by the overall system in subsequent round traffic will be removed by the overall system in subsequent round
trips. So when the excess rate marking algorithm decides how much trips. So when the excess rate marking algorithm decides how much
traffic to mark for removal, it doesn't include traffic already traffic to mark for removal, it doesn't include traffic already
marked for removal by another node upstream (the `Excess traffic marked for removal by another node upstream (the `Excess traffic
meter function' of [I-D.eardley-pcn-marking-behaviour]). meter function' of [I-D.ietf-pcn-marking-behaviour]).
However, if an RFC3168 tunnel ingress intervenes, it resets the ECN However, if an RFC3168 tunnel ingress intervenes, it resets the ECN
field in all the outer headers, hiding all the evidence of problems field in all the outer headers, hiding all the evidence of problems
upstream. Thus, although excess rate marking works fine with RFC4301 upstream. Thus, although excess rate marking works fine with RFC4301
IPsec tunnels, with RFC3168 tunnels it typically removes large IPsec tunnels, with RFC3168 tunnels it typically removes large
volumes of traffic that it didn't need to remove at all. volumes of traffic that it didn't need to remove at all.
Appendix B. Contribution to Congestion across a Tunnel Appendix B. Contribution to Congestion across a Tunnel
This specification mandates that a tunnel ingress determines the ECN This specification mandates that a tunnel ingress determines the ECN
field of each new outer tunnel header by copying the arriving header. field of each new outer tunnel header by copying the arriving header.
If instead the outer ECN field were reset at a tunnel ingress (as it Concern has been expressed that this will make it difficult for the
was for the full functionality mode of RFC3168), it would be possible tunnel egress to monitor congestion introduced along a tunnel, which
for the tunnel egress to measure: is easy if the outer ECN field is reset at a tunnel ingress (RFC3168
full functionality mode). However, in fact copying CE marks at
o congestion marking before the tunnel ingress (fraction of inner ingress will still make it easy for the egress to measure congestion
header markings, p_i); introduced across a tunnel, as illustrated below.
o congestion marking across the tunnel (fraction of outer header
markings, p_t);
o congestion marking after the tunnel egress (fraction of departing Consider 100 packets measured at the egress. It measures that 30 are
header markings, p_o). CE marked in the inner and outer headers and 12 have additional CE
marks in the outer but not the inner. This means packets arriving at
the ingress had already experienced 30% congestion. However, it does
not mean there was 12% congestion across the tunnel. The correct
calculation of congestion across the tunnel is p_t = 12/(100-30) =
12/70 = 17%. This is easy for the egress to measure. It is the
packets with additional CE marking in the outer header (12) as a
proportion of packets not marked in the inner header (70).
Although the newly mandated copying behaviour at ingress gains the Figure 4 illustrates this in a combinatorial probability diagram.
advantages described in the body of this specification, this one The square represents 100 packets. The 30% division along the bottom
advantage of the resetting behaviour of RFC3168 seems to have been represents marking before the ingress, and the p_t division up the
lost: on first impressions, it seems that the egress can no longer side represents marking along the tunnel.
accurately measure congestion contributed along the tunnel (p_t).
The egress could _estimate_ the contribution along the tunnel by
measuring which packets carry only a mark in the outer header (not the
inner). But this is not precisely the same as the congestion
contributed along the tunnel; tunnel nodes may have tried to mark
some packets that already had a marking in both the inner and outer
header. Measuring only additional outer markings will miss these.
Nonetheless, with the newly proposed scheme, a tunnel egress can
derive a precise estimate of marking introduced across a tunnel (p_t)
as follows.
The combined fraction of markings at the tunnel egress will be p_o = +-----+---------+100%
1 - (1 - p_i)(1 - p_t). Explanation: this is (1 - the probability a | | |
departing packet is not marked), which is (1 - (prob not marked | 30 | |
before tunnel)(prob not marked along tunnel)). Therefore, | | | The large square
rearranging, the egress can infer the fraction of marks introduced | +---------+p_t represents 100 packets
across the tunnel as p_t = (p_o - p_i)/(1 - p_i). If arriving | | 12 |
congestion is low (p_i <<1), then the approximation p_t ~ (p_o - p_i) +-----+---------+0
should be good enough. This is the estimate we advised originally; 0 30% 100%
i.e. measuring only the extra markings in the outer header that are inner header marking
not present in the inner header. If a better approximation is needed
p_t ~ (p_o - p_i)(1 + p_i), which removes the division, but still
assumes p_i<<1.
Using any of these formulae (including the precise one), it would be Figure 4: Tunnel Marking of Packets Already Marked at Ingress
possible for a tunnel egress to calculate a moving average of the
fraction of packets being marked by tunnel nodes, including those
already marked in the inner header. Alternatively, it should even be
possible for a tunnel egress to reverse engineer which packets would
have been marked across the tunnel if CE was reset on ingress even if
CE was actually copied on ingress.[[anchor3: Note from Bob: I've
worked out an algorithm so the tunnel egress can reverse engineer
marking as if CE was reset at the ingress even though CE was copied
at the ingress. It typically consumes 2 cycles / pkt, occasionally 4
and very occasionally 8. {ToDo: On testing an implementation just now
it still has a wrinkle in it, but with a little more development I
believe it would work well. I'll write it into the next revision if
I get it working.}]]
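The egress calculation in this appendix can be checked with a few lines
of arithmetic. The sketch below uses hypothetical names and takes the
packet counts from the 100-packet example (30 marked in the inner header,
12 marked in the outer header only). It also confirms that the count-based
ratio agrees with the formula p_t = (p_o - p_i)/(1 - p_i).

```python
def tunnel_congestion(total: int, inner_marked: int, outer_only_marked: int) -> float:
    """Fraction of CE marking introduced across the tunnel (p_t).

    outer_only_marked counts packets that are CE in the outer header but
    not in the inner header. The denominator is the number of packets that
    arrived at the ingress unmarked, since only those can show a new mark.
    """
    unmarked_at_ingress = total - inner_marked
    if unmarked_at_ingress <= 0:
        raise ValueError("no packets arrived unmarked, so p_t cannot be measured")
    return outer_only_marked / unmarked_at_ingress


p_t = tunnel_congestion(total=100, inner_marked=30, outer_only_marked=12)

# Cross-check against the formula expressed in marking fractions.
p_i = 30 / 100         # marking before the tunnel ingress
p_o = (30 + 12) / 100  # combined marking seen leaving the egress
p_t_from_fractions = (p_o - p_i) / (1 - p_i)

print(f"p_t = {p_t:.4f}")
```

Both routes give 12/70, which is about 17% and not the 12% that a naive
count of additional outer marks over all 100 packets would suggest.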
Appendix C. Ideal Decapsulation Rules Appendix C. Ideal Decapsulation Rules
Compliance with this appendix is NOT REQUIRED for compliance with the This appendix is not normative. Compliance with this appendix is NOT
present specification. REQUIRED for compliance with the present specification.
If the default ECN encapsulation behaviour does not offer suitable If the default ECN encapsulation behaviour does not offer suitable
trade-offs, procedures exist for associating a new behaviour with a trade-offs, procedures exist for associating a new behaviour with a
new Diffserv PHB. However, it is unrealistic to expect vendors of new Diffserv PHB. However, it is unrealistic to expect vendors of
all IPsec and all IP in IP tunnel endpoints to cater for the all IPsec and all IP in IP tunnel endpoints to cater for the
exceptional behaviour of PHB XXX. If all tunnels did require XXX- exceptional behaviour of PHB XXX. If all tunnels did require XXX-
specific behaviour, the resulting patchy and error-prone deployment specific behaviour, the resulting patchy and error-prone deployment
would probably cause XXX to suffer byzantine feature interactions would probably cause XXX to suffer byzantine feature interactions
with poorly implemented tunnels. The default rules for tunnel with poorly implemented tunnels. The default rules for tunnel
endpoints to handle both the Diffserv field and the ECN field should endpoints to handle both the Diffserv field and the ECN field should
skipping to change at page 27, line 42 skipping to change at page 28, line 7
marking) [I-D.ietf-pcn-architecture]. The aim is for the first level marking) [I-D.ietf-pcn-architecture]. The aim is for the first level
of marking to stop admitting new traffic and the second level to of marking to stop admitting new traffic and the second level to
terminate sufficient existing flows to bring a network back to its terminate sufficient existing flows to bring a network back to its
operating point after a serious failure. operating point after a serious failure.
Although the ECN field gives sufficient codepoints for these three Although the ECN field gives sufficient codepoints for these three
states, the PCN working group cannot use them in case any tunnel states, the PCN working group cannot use them in case any tunnel
decapsulations occur within a PCN region. If a node in a tunnel sets decapsulations occur within a PCN region. If a node in a tunnel sets
the ECN field to ECT(0) or ECT(1), this change will be discarded by a the ECN field to ECT(0) or ECT(1), this change will be discarded by a
tunnel egress compliant with RFC4301 and RFC3168. This can be seen tunnel egress compliant with RFC4301 and RFC3168. This can be seen
in Table 1, where the ECT values in the outer header are ignored in Figure 3, where the ECT values in the outer header are ignored
unless the inner header is the same. Effectively the ECT(0) and unless the inner header is the same. Effectively the ECT(0) and
ECT(1) codepoints have to be treated as just one codepoint when they ECT(1) codepoints have to be treated as just one codepoint when they
could otherwise have been used for their intended purpose of could otherwise have been used for their intended purpose of
congestion notification. Instead, the PCN w-g has had to propose congestion notification. Instead, the PCN w-g has had to propose
using extra Diffserv codepoint(s) to encode the extra states using extra Diffserv codepoint(s) to encode the extra states
[I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting [I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting
DSCP space while leaving ECN codepoints unused. DSCP space while leaving ECN codepoints unused.
Although this is currently most pressing for the PCN working group, Although this is currently most pressing for the PCN working group,
the issue is more general. Under Security Considerations (Section 9) the issue is more general. Under Security Considerations (Section 9)
skipping to change at page 28, line 17 skipping to change at page 28, line 31
More generally, the currently standardised tunnel decapsulation More generally, the currently standardised tunnel decapsulation
behaviour unnecessarily wastes a quarter of two bits (i.e. half a behaviour unnecessarily wastes a quarter of two bits (i.e. half a
bit) in the IP (v4 & v6) header. As explained in Section 3.1, the bit) in the IP (v4 & v6) header. As explained in Section 3.1, the
original reason for not copying down outer ECT codepoints for onward original reason for not copying down outer ECT codepoints for onward
forwarding was to limit the covert channel across a decapsulator to 1 forwarding was to limit the covert channel across a decapsulator to 1
bit per packet. However, now that the IETF Security Area has deemed bit per packet. However, now that the IETF Security Area has deemed
that a 2-bit covert channel through an encapsulator is a manageable that a 2-bit covert channel through an encapsulator is a manageable
risk, the same should be true for a decapsulator. risk, the same should be true for a decapsulator.
Table 2 proposes a more ideal layered decapsulation behaviour. Note: Figure 5 proposes a more ideal layered decapsulation behaviour.
this table is only to support discussion. It is not currently Note: this table is only to support discussion. It is not currently
proposed for standards action. The only difference from Table 1 proposed for standards action. The only difference from Figure 3
(that is proposed for standards action), is the swapping of the cells (that is proposed for standards action), is the swapping of the cells
highlighted as *ECT(X)*. highlighted as *ECT(X)*.
+--Incoming Outer Header--- +---------------------------------------------+
| Incoming Outer Header |
+---------------------+---------+-----------+-----------+-----------+ +---------------------+---------+-----------+-----------+-----------+
| Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE |
| Header | | | | | | Header | | | | |
+---------------------+---------+-----------+-----------+-----------+ +---------------------+---------+-----------+-----------+-----------+
| Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) |
| ECT(0) | ECT(0) | ECT(0) | *ECT(1)* | CE | | ECT(0) | ECT(0) | ECT(0) | *ECT(1)* | CE |
| ECT(1) | ECT(1) | *ECT(0)* | ECT(1) | CE | | ECT(1) | ECT(1) | *ECT(0)* | ECT(1) | CE |
| CE | CE | CE | CE (!!!) | CE | | CE | CE | CE | CE (!!!) | CE |
+---------------------+---------+-----------+-----------+-----------+ +---------------------+---------+-----------+-----------+-----------+
| Outgoing Header |
+---------------------------------------------+
+-----Outgoing Header------ Figure 5: Ideal IP in IP Decapsulation (currently informative, not
normative)
Table 2: Ideal IP in IP Decapsulation (currently NOT REQUIRED)
Note that, if this ideal proposal were taken up, extra backwards Note that, if this ideal proposal were taken up, a tunnel egress
compatibility issues would have to be resolved. complying with it would be backwards compatible with all previous
specifications for encapsulation of ECN at the ingress (RFC4301, both
modes of RFC3168, both modes of RFC2481 and RFC2003). In comparison
with an RFC3168 or RFC4301 tunnel egress, it would require no
additional configuration at the ingress nor any additional
negotiation with the ingress. The only new issue would be the burden
of an extra standard to be compliant with, adding to the already
complex history of ECN tunnelling RFCs.
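For discussion only, the ideal table can be sketched as follows. The
names are hypothetical and the codepoint values are those of RFC3168. The
only behavioural difference from the standards-track table is the last
branch: when both headers carry an ECT codepoint, the outer one is
forwarded, so a change between ECT(0) and ECT(1) made within the tunnel
survives decapsulation.

```python
from typing import Optional

# ECN codepoints as defined in RFC3168
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def decapsulate_ideal(inner: int, outer: int) -> Optional[int]:
    """Outgoing ECN codepoint under the ideal (informative) table.

    Returns None where the table says the packet is dropped.
    """
    if inner == NOT_ECT:
        # A Not-ECT inner is forwarded only if the outer is also Not-ECT.
        return NOT_ECT if outer == NOT_ECT else None
    if inner == CE or outer == CE:
        # Congestion experienced at either layer propagates onward.
        return CE
    if outer == NOT_ECT:
        # No ECT information in the outer header, so keep the inner value.
        return inner
    # Both inner and outer are ECT(0) or ECT(1): the outer value propagates.
    return outer
```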
Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation
We have said that at any point in a network, the Congestion Baseline We have said that at any point in a network, the Congestion Baseline
(where congestion notification starts from zero) should be the (where congestion notification starts from zero) should be the
previous upstream Load Regulator. We have also said that the ingress previous upstream Load Regulator. We have also said that the ingress
of an IP in IP tunnel must copy congestion indications to the of an IP in IP tunnel must copy congestion indications to the
encapsulating outer headers it creates. If the Load Regulator is in- encapsulating outer headers it creates. If the Load Regulator is in-
path rather than at the source, and also a tunnel ingress, these two path rather than at the source, and also a tunnel ingress, these two
requirements seem to be contradictory. A tunnel ingress must not requirements seem to be contradictory. A tunnel ingress must not
reset incoming congestion, but a Load Regulator must be the reset incoming congestion, but a Load Regulator must be the
Congestion Baseline, implying it needs to reset incoming congestion. Congestion Baseline, implying it needs to reset incoming congestion.
In fact, the two requirements are not contradictory, because a Load In fact, the two requirements are not contradictory, because a Load
Regulator and a tunnel ingress are functions within a node that occur Regulator and a tunnel ingress are functions within a node that
in sequence on a stream of packets, not at the same point. Figure 3 typically occur in sequence on a stream of packets, not at the same
is borrowed from [RFC2983] (which was making a similar point about point. Figure 6 is borrowed from [RFC2983] (which was making a
the location of Diffserv traffic conditioning relative to the similar point about the location of Diffserv traffic conditioning
encapsulation function of a tunnel). An in-path Load Regulator can relative to the encapsulation function of a tunnel). An in-path Load
act on packets either at [1 - Before] encapsulation or at [2 - Outer] Regulator can act on packets either at [1 - Before] encapsulation or
after encapsulation. Load Regulation does not ever need to be at [2 - Outer] after encapsulation. Load Regulation does not ever
integrated with the [Encapsulate] function (but it can be for need to be integrated with the [Encapsulate] function (but it can be
efficiency). Therefore we can still maintain that the [Encapsulate] for efficiency). Therefore we can still mandate that the
function always copies CE into the outer header. [Encapsulate] function always copies CE into the outer header.
>>-----[1 - Before]--------[Encapsulate]----[3 - Inner]------------>> >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]---------->>
\ \
\ \
+--------[2 - Outer]--------->> +--------[2 - Outer]------->>
Figure 3: Placement of In-Path Load Regulator Relative to Tunnel Figure 6: Placement of In-Path Load Regulator Relative to Tunnel
Ingress Ingress
Then separately, if there is a Load Regulator at location [2 - Then separately, if there is a Load Regulator at location [2 -
Outer], it might reset CE to ECT(0), say. Then the Congestion Outer], it might reset CE to ECT(0), say. Then the Congestion
Baseline for the lower layer (outer) will be [2 - Outer], while the Baseline for the lower layer (outer) will be [2 - Outer], while the
Congestion Baseline of the inner layer will be unchanged. But how Congestion Baseline of the inner layer will be unchanged. But how
encapsulation works has nothing to do with whether a Load Regulator encapsulation works has nothing to do with whether a Load Regulator
is present or where it is. is present or where it is.
If on the other hand a Load Regulator resets CE at [1 - Before], the If on the other hand a Load Regulator resets CE at [1 - Before], the
skipping to change at page 30, line 12 skipping to change at page 30, line 47
desirable or practical for a node part way along the path to regulate desirable or practical for a node part way along the path to regulate
the load. However, various reasonable proposals for in-path load the load. However, various reasonable proposals for in-path load
regulation have been made from time to time (e.g. fair queuing, regulation have been made from time to time (e.g. fair queuing,
traffic engineering, flow admission control). The IETF has recently traffic engineering, flow admission control). The IETF has recently
chartered a working group to standardise admission control across a chartered a working group to standardise admission control across a
part of a path using pre-congestion notification (PCN) [PCNcharter]. part of a path using pre-congestion notification (PCN) [PCNcharter].
This is of particular relevance here because it involves congestion This is of particular relevance here because it involves congestion
notification with an in-path Load Regulator, it can involve notification with an in-path Load Regulator, it can involve
tunnelling and it certainly involves encapsulation more generally. tunnelling and it certainly involves encapsulation more generally.
We will use the more complex scenario in Figure 4 to tease out all We will use the more complex scenario in Figure 7 to tease out all
the issues that arise when combining congestion notification and the issues that arise when combining congestion notification and
tunnelling with various possible in-path load regulation schemes. In tunnelling with various possible in-path load regulation schemes. In
this case 'I1' and 'E2' break up the path into three separate this case 'I1' and 'E2' break up the path into three separate
congestion control loops. The feedback for these loops is shown congestion control loops. The feedback for these loops is shown
going right to left across the top of the figure. The 'V's are arrow going right to left across the top of the figure. The 'V's are arrow
heads representing the direction of feedback, not letters. But there heads representing the direction of feedback, not letters. But there
are also two tunnels within the middle control loop: 'I1' to 'E1' and are also two tunnels within the middle control loop: 'I1' to 'E1' and
'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS
core networks. M is a congestion monitoring point, perhaps between core networks. M is a congestion monitoring point, perhaps between
two border routers where the same tunnel continues unbroken across two border routers where the same tunnel continues unbroken across
the border. the border.
______ _______________________________________ _____ ______ _______________________________________ _____
/ \ / \ / \ / \ / \ / \
V \ V M \ V \ V \ V M \ V \
A--->R--->I1===========>E1----->I2=========>==========>E2------->B A--->R--->I1===========>E1----->I2=========>==========>E2------->B
Figure 4: complex Tunnel Scenario Figure 7: complex Tunnel Scenario
The question is, should the congestion markings in the outer exposed The question is, should the congestion markings in the outer exposed
headers of a tunnel represent congestion only since the tunnel headers of a tunnel represent congestion only since the tunnel
ingress or over the whole upstream path from the source of the inner ingress or over the whole upstream path from the source of the inner
header (whatever that may mean)? Or put another way, should 'I1' and header (whatever that may mean)? Or put another way, should 'I1' and
'I2' copy or reset CE markings? 'I2' copy or reset CE markings?
Based on the design principles in Section 4, the answer is that the Based on the design principles in Section 4, the answer is that the
Congestion Baseline should be the nearest upstream interface designed Congestion Baseline should be the nearest upstream interface designed
to regulate traffic load--the Load Regulator. In Figure 4 'A', 'I1' to regulate traffic load--the Load Regulator. In Figure 7 'A', 'I1'
and 'E2' are all Load Regulators. We have shown the feedback loops and 'E2' are all Load Regulators. We have shown the feedback loops
returning to each of these nodes so that they can regulate the load returning to each of these nodes so that they can regulate the load
causing the congestion notification. So the Congestion Baseline causing the congestion notification. So the Congestion Baseline
exposed to M should be 'I1' (the Load Regulator), not 'I2'. exposed to M should be 'I1' (the Load Regulator), not 'I2'.
Therefore I1 should reset any arriving CE markings. In this case, Therefore I1 should reset any arriving CE markings. In this case,
'I1' knows the tunnel to 'E1' is unrelated to its load regulation 'I1' knows the tunnel to 'E1' is unrelated to its load regulation
function. So the load regulation function within 'I1' should be function. So the load regulation function within 'I1' should be
placed at [1 - Before] tunnel encapsulation within 'I1' (using the placed at [1 - Before] tunnel encapsulation within 'I1' (using the
terminology of Figure 3). Then the Congestion Baseline all across terminology of Figure 6). Then the Congestion Baseline all across
the networks from 'I1' to 'E2' in both inner and outer headers will the networks from 'I1' to 'E2' in both inner and outer headers will
be 'I1'. be 'I1'.
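The placement argument above can be sketched in a few lines of Python. This is only an illustrative model, not text or code from either draft: the names Packet, load_regulator_reset, encapsulate_copy, decapsulate, ingress_i1 and ingress_i2 are hypothetical, the choice of ECT(0) as the reset value follows the "say" in the earlier example, and decapsulation is simplified to "propagate an outer CE to an ECN-capable inner header".

```python
from dataclasses import dataclass, replace
from typing import Optional

# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass(frozen=True)
class Packet:
    ecn: int                           # ECN field of this header
    inner: Optional["Packet"] = None   # encapsulated packet, if any


def load_regulator_reset(pkt: Packet) -> Packet:
    """Load Regulator at [1 - Before]: make this node the Congestion Baseline."""
    return replace(pkt, ecn=ECT0) if pkt.ecn == CE else pkt


def encapsulate_copy(pkt: Packet) -> Packet:
    """Tunnel ingress: copy the arriving ECN field into the new outer header."""
    return Packet(ecn=pkt.ecn, inner=pkt)


def decapsulate(outer: Packet) -> Packet:
    """Simplified egress: an outer CE is propagated to an ECN-capable inner."""
    inner = outer.inner
    assert inner is not None, "packet is not encapsulated"
    if outer.ecn == CE and inner.ecn != NOT_ECT:
        return replace(inner, ecn=CE)
    return inner


def ingress_i1(pkt: Packet) -> Packet:
    """'I1' is a Load Regulator and a tunnel ingress: reset first, then copy."""
    return encapsulate_copy(load_regulator_reset(pkt))


def ingress_i2(pkt: Packet) -> Packet:
    """'I2' is only a tunnel ingress: copy, never reset."""
    return encapsulate_copy(pkt)


if __name__ == "__main__":
    arriving = Packet(ecn=CE)            # marked upstream of 'I1', e.g. at 'R'
    in_tunnel_1 = ingress_i1(arriving)   # both headers now have baseline 'I1'
    congested = replace(in_tunnel_1, ecn=CE)   # a node inside tunnel 1 marks
    at_i2 = decapsulate(congested)       # 'E1' hands the marking to the inner
    in_tunnel_2 = ingress_i2(at_i2)      # 'I2' copies, so 'M' still sees CE
    print(in_tunnel_1.ecn == ECT0, in_tunnel_2.ecn == CE)
```

In this sketch the encapsulation function is identical at both ingresses. Only the separate reset step at 'I1' changes the Congestion Baseline, which is the separation of functions the text argues for.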
The following further examples illustrate how this answer might be The following further examples illustrate how this answer might be
applied: applied:
o We argued in Appendix A that resetting CE on encapsulation could o We argued in Appendix A that resetting CE on encapsulation could
harm PCN excess rate marking, which marks excess traffic for harm PCN excess rate marking, which marks excess traffic for
removal in subsequent round trips. This marking relies on not removal in subsequent round trips. This marking relies on not
marking packets if another node upstream has already marked them marking packets if another node upstream has already marked them
for removal. If there were a tunnel ingress between the two which for removal. If there were a tunnel ingress between the two which
reset CE markings, it would confuse the downstream node into reset CE markings, it would confuse the downstream node into
marking far too much traffic for removal. So why do we say that marking far too much traffic for removal. So why do we say that
'I1' should reset CE, while a tunnel ingress shouldn't? The 'I1' should reset CE, while a tunnel ingress shouldn't? The
answer is that it is the Load Regulator function at 'I1' that is answer is that it is the Load Regulator function at 'I1' that is
resetting CE, not the tunnel encapsulator. The Load Regulator resetting CE, not the tunnel encapsulator. The Load Regulator
needs to set itself as the Congestion Baseline, so the feedback it needs to set itself as the Congestion Baseline, so the feedback it
gets will only be about congestion on links it can relieve itself gets will only be about congestion on links it can relieve itself
by regulating the load into them. When it resets CE markings, it (by regulating the load into them). When it resets CE markings,
knows that something else upstream will have dealt with the it knows that something else upstream will have dealt with the
congestion notifications it removes, given it is part of an end- congestion notifications it removes, given it is part of an end-
to-end admission control signalling loop. It therefore knows that to-end admission control signalling loop. It therefore knows that
previous hops will be covered by other Load Regulators. previous hops will be covered by other Load Regulators.
Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should
follow the new rule for any tunnel ingress and copy congestion follow the new rule for any tunnel ingress and copy congestion
marking into the outer tunnel header. The ingress at 'I1' will marking into the outer tunnel header. The ingress at 'I1' will
happen to copy headers that have already been reset just happen to copy headers that have already been reset just
beforehand. But it doesn't need to know that. beforehand. But it doesn't need to know that.
o [Shayman] suggested feedback of ECN accumulated across an MPLS o [Shayman] suggested feedback of ECN accumulated across an MPLS
skipping to change at page 32, line 4 skipping to change at page 32, line 41
headers. Again, the tunnel encapsulation function at 'I' simply headers. Again, the tunnel encapsulation function at 'I' simply
copies incoming headers, unaware that the load regulator will copies incoming headers, unaware that the load regulator will
subsequently reset its outer headers. subsequently reset its outer headers.
o The PWE3 working group of the IETF is considering the problem of o The PWE3 working group of the IETF is considering the problem of
how and whether an aggregate edge-to-edge pseudo-wire emulation how and whether an aggregate edge-to-edge pseudo-wire emulation
should respond to congestion [I-D.ietf-pwe3-congestion-frmwk]. should respond to congestion [I-D.ietf-pwe3-congestion-frmwk].
Although the study is still at the requirements stage, some Although the study is still at the requirements stage, some
(controversial) solution proposals include in-path load regulation (controversial) solution proposals include in-path load regulation
at the ingress to the tunnel that could lead to tunnel at the ingress to the tunnel that could lead to tunnel
arrangements with similar complexity to that of Figure 4. arrangements with similar complexity to that of Figure 7.
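The over-marking risk described in the first example above (PCN excess rate marking across a tunnel ingress) can be illustrated with a toy model. Everything in this sketch is an assumption made for illustration: the marker is a deliberately simplified token bucket with no depth limit, one packet arrives per tick, rates are expressed in tenths of a packet per tick, and the function names excess_rate_mark and through_tunnel are hypothetical. It is not the PCN working group's marking algorithm.

```python
from typing import List


def excess_rate_mark(marked_in: List[bool], rate_tenths: int) -> List[bool]:
    """Toy excess rate marker.

    One packet arrives per tick. The bucket gains `rate_tenths` tokens per
    tick and forwarding an unmarked packet costs 10 tokens. Packets that
    arrive already marked are not metered, so they consume no tokens.
    """
    tokens = 0
    marked_out = []
    for already_marked in marked_in:
        tokens += rate_tenths
        if already_marked:
            marked_out.append(True)      # leave upstream marking untouched
        elif tokens >= 10:
            tokens -= 10
            marked_out.append(False)     # within the configured rate
        else:
            marked_out.append(True)      # excess traffic: mark for removal
    return marked_out


def through_tunnel(inner_marks: List[bool], reset_on_encap: bool,
                   rate_tenths: int) -> List[bool]:
    """Encapsulate, cross one marker inside the tunnel, then decapsulate.

    The outer header either copies or resets the inner marking. At the
    egress a packet counts as marked if either header is marked.
    """
    if reset_on_encap:
        outer = [False] * len(inner_marks)
    else:
        outer = list(inner_marks)
    outer = excess_rate_mark(outer, rate_tenths)
    return [i or o for i, o in zip(inner_marks, outer)]


if __name__ == "__main__":
    # Upstream marker admits 8 packets in 10; tunnel marker admits 7 in 10.
    upstream = excess_rate_mark([False] * 100, rate_tenths=8)
    copied = through_tunnel(upstream, reset_on_encap=False, rate_tenths=7)
    reset = through_tunnel(upstream, reset_on_encap=True, rate_tenths=7)
    print(sum(upstream), sum(copied), sum(reset))
```

Under these assumptions, copying on encapsulation leaves 30 of the 100 packets marked, which matches the excess over the tighter downstream rate. Resetting leaves 40 marked, because the downstream marker re-meters traffic that the upstream node had already marked for removal.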
These are not contrived scenarios--they could be a lot worse. For These are not contrived scenarios--they could be a lot worse. For
instance, a host may create a tunnel for IPsec which is placed inside instance, a host may create a tunnel for IPsec which is placed inside
a tunnel for Mobile IP over a remote part of its path. And around a tunnel for Mobile IP over a remote part of its path. And around
all of this we may have MPLS labels being pushed and popped as packets all of this we may have MPLS labels being pushed and popped as packets
pass across different core networks. Similarly, it is possible that pass across different core networks. Similarly, it is possible that
subnets could be built from link technology (e.g. future Ethernet subnets could be built from link technology (e.g. future Ethernet
switches) so that link headers being added and removed could involve switches) so that link headers being added and removed could involve
congestion notification in future Ethernet link headers with all the congestion notification in future Ethernet link headers with all the
same issues as with IP in IP tunnels. same issues as with IP in IP tunnels.
 End of changes. 65 change blocks. 
162 lines changed or deleted 172 lines changed or added
