Diff: draft-briscoe-tsvwg-ecn-tunnel-00.txt - draft-briscoe-tsvwg-ecn-tunnel-01.txt

	draft-briscoe-tsvwg-ecn-tunnel-00.txt	draft-briscoe-tsvwg-ecn-tunnel-01.txt

	Transport Area Working Group B. Briscoe	Transport Area Working Group B. Briscoe
	Internet-Draft BT	Internet-Draft BT

	Intended status: Standards Track June 30, 2007	Intended status: Standards Track July 14, 2008
	Expires: January 1, 2008	Expires: January 15, 2009

	Layered Encapsulation of Congestion Notification	Layered Encapsulation of Congestion Notification

	draft-briscoe-tsvwg-ecn-tunnel-00	draft-briscoe-tsvwg-ecn-tunnel-01

	Status of this Memo	Status of this Memo

	By submitting this Internet-Draft, each author represents that any	By submitting this Internet-Draft, each author represents that any
	applicable patent or other IPR claims of which he or she is aware	applicable patent or other IPR claims of which he or she is aware
	have been or will be disclosed, and any of which he or she becomes	have been or will be disclosed, and any of which he or she becomes
	aware will be disclosed, in accordance with Section 6 of BCP 79.	aware will be disclosed, in accordance with Section 6 of BCP 79.

	Internet-Drafts are working documents of the Internet Engineering	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF), its areas, and its working groups. Note that	Task Force (IETF), its areas, and its working groups. Note that

	skipping to change at page 1, line 34	skipping to change at page 1, line 34
	and may be updated, replaced, or obsoleted by other documents at any	and may be updated, replaced, or obsoleted by other documents at any
	time. It is inappropriate to use Internet-Drafts as reference	time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."	material or to cite them other than as "work in progress."

	The list of current Internet-Drafts can be accessed at	The list of current Internet-Drafts can be accessed at
	http://www.ietf.org/ietf/1id-abstracts.txt.	http://www.ietf.org/ietf/1id-abstracts.txt.

	The list of Internet-Draft Shadow Directories can be accessed at	The list of Internet-Draft Shadow Directories can be accessed at
	http://www.ietf.org/shadow.html.	http://www.ietf.org/shadow.html.


	This Internet-Draft will expire on January 1, 2008.	This Internet-Draft will expire on January 15, 2009.

	Copyright Notice

	Copyright (C) The IETF Trust (2007).

	Abstract	Abstract

	This document redefines how the explicit congestion notification	This document redefines how the explicit congestion notification
	(ECN) field of the outer IP header of a tunnel should be constructed.	(ECN) field of the outer IP header of a tunnel should be constructed.
	It brings all IP in IP tunnels (v4 or v6) into line with the way	It brings all IP in IP tunnels (v4 or v6) into line with the way

	IPsec tunnels now construct the ECN field, ensuring that the outer	IPsec tunnels now construct the ECN field. It includes a thorough
	header reveals any congestion experienced so far on the path. It	analysis of the reasoning for this change and the implications. It
	specifies the default ECN tunneling behaviour for any Diffserv per-	also gives guidelines on the encapsulation of IP congestion
	hop behaviour (PHB), but also gives general principles to guide the	notification by any outer header, whether encapsulated in an IP
	design of alternate congestion marking behaviours for specific PHBs	tunnel or in a lower layer header. Following these guidelines should
	and for lower layer congestion notification schemes.	help interworking, if the IETF or other standards bodies specify any
		new encapsulation of congestion notification.

	Table of Contents	Table of Contents

	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3

	2. Requirements notation . . . . . . . . . . . . . . . . . . . . 5	1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 4
	3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 6	1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 5
	3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 6	1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6
	3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 7	2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8
	3.3. Management Constraints . . . . . . . . . . . . . . . . . . 8	3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8
	4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 9	3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8
	5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 11	3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10
	6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 12	3.3. Management Constraints . . . . . . . . . . . . . . . . . . 11
	7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 13	4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12
	8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14	4.1. Design Guidelines for New Encapsulations of Congestion
	9. Security Considerations . . . . . . . . . . . . . . . . . . . 14	Notification . . . . . . . . . . . . . . . . . . . . . . . 13
	10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 14	5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15
	11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 15	6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16
	12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 15	7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18
	13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15	8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
	13.1. Normative References . . . . . . . . . . . . . . . . . . . 15	9. Security Considerations . . . . . . . . . . . . . . . . . . . 19
	13.2. Informative References . . . . . . . . . . . . . . . . . . 16	10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21
	Appendix A. In-path Load Regulation . . . . . . . . . . . . . . . 17	11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22
	Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 20	12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 22
	Intellectual Property and Copyright Statements . . . . . . . . . . 21	13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22
		13.1. Normative References . . . . . . . . . . . . . . . . . . . 22
		13.2. Informative References . . . . . . . . . . . . . . . . . . 23
		Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25
		Appendix B. Contribution to Congestion across a Tunnel . . . . . 25
		Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27
		Appendix D. Non-Dependence of Tunnelling on In-path Load
		Regulation . . . . . . . . . . . . . . . . . . . . . 28
		D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 29
		Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 32
		Intellectual Property and Copyright Statements . . . . . . . . . . 34

		Changes from previous drafts (to be removed by the RFC Editor)

		From -00 to -01:

		* Related everything conceptually to the uniform and pipe models
		of RFC2983 on Diffserv Tunnels, and completely removed the
		dependence of tunnelling behaviour on the presence of any in-
		path load regulation by using the [1 - Before] [2 - Outer]
		function placement concepts from RFC2983.

		* Added specific cases where the existing standards limit new
		proposals.

		* Added sub-structure to Introduction (Need for Rationalisation,
		Roadmap), added new Introductory subsection on "Scope" and
		improved clarity

		* Added Design Guidelines for New Encapsulations of Congestion
		Notification

		* Considerably clarified the Backward Compatibility section

		* Considerably extended the Security Considerations section

		* Summarised the primary rationale much better in the conclusions

		* Added numerous extra acknowledgements

		* Added Appendix A. "Why resetting CE on encapsulation harms
		PCN", Appendix B. "Contribution to Congestion across a Tunnel"
		and Appendix C. "Ideal Decapsulation Rules"

		* Changed Appendix A "In-path Load Regulation" to "Non-Dependence
		of Tunnelling on In-path Load Regulation" and added sub-section
		on "Dependence of In-Path Load Regulation on Tunnelling"

	1. Introduction	1. Introduction

	This document redefines how the explicit congestion notification	This document redefines how the explicit congestion notification
	(ECN) field [RFC3168] of the outer IP header of a tunnel should be	(ECN) field [RFC3168] of the outer IP header of a tunnel should be
	constructed. It brings all IP in IP tunnels (v4 or v6) into line	constructed. It brings all IP in IP tunnels (v4 or v6) into line
	with the way IPsec tunnels [RFC4301] now construct the ECN field,	with the way IPsec tunnels [RFC4301] now construct the ECN field,
	ensuring that the outer header reveals any congestion experienced so	ensuring that the outer header reveals any congestion experienced so

	far on the path. Although this memo focuses on IP in IP tunnelling	far on the whole path, not just since the last tunnel ingress.
	it also gives generalised advice for any encapsulation by lower layer
	headers.

	ECN allows a congested resource to notify the onset of congestion	ECN allows a congested resource to notify the onset of congestion
	without having to drop packets, by explicitly marking a proportion of	without having to drop packets, by explicitly marking a proportion of

	packets with the congestion experienced (CE) codepoint. Congestion	packets with the congestion experienced (CE) codepoint. Because
	notification is unusual in that it propagates from the physical layer	congestion is exhaustion of a physical resource, if the transport
	upwards to the transport layer, because congestion is exhaustion of a	layer is to deal with congestion, congestion notification must
	physical resource. The transport layer can directly detect loss of a	propagate upwards; from the physical layer to the transport layer.
	packet (or frame) by a lower layer. But if a lower layer marks a	The transport layer can directly detect loss of a packet (or frame)
	packet (or frame) to notify incipient congestion, this marking has to	by a lower layer. But if a lower layer marks rather than drops a
	be explicitly copied up the layers at every header decapsulation.	forward-travelling data packet (or frame) in order to notify
	So, at each decapsulation of an outer (lower layer) header a	incipient congestion, this marking has to be explicitly copied up the
	congestion marking has to be arranged to propagate into the forwarded	layers at every header decapsulation. So, at each decapsulation of
	(upper layer) header. It must continue upwards until it reaches the	an outer (lower layer) header a congestion marking has to be arranged
	destination transport, which should feed congestion notification back	to propagate into the forwarded (upper layer) header. It must
	to the source transport.	continue upwards until it reaches the destination transport. Then
		typically the destination feeds this congestion notification back to
		the source transport. Given encapsulation by lower layer headers is
		functionally similar to tunnelling, it is necessary to arrange
		similar propagation of congestion notification up the layers. For
		instance, ECN and its propagation up the layers has recently been
		specified for MPLS [RFC5129].


	Note that often lower layer resources are arranged to be protected by	As packets pass up the layers, current specifications of
	higher layer buffers, so instead of blocking occurring at the lower	decapsulation behaviours are largely all consistent and correct.
	layer, it occurs when the higher layer queue overflows. Thus, non-	However, as packets pass down the layers, specifications of
	blocking link and physical layer technologies do not have to	encapsulation behaviours are not consistent. This document is
	implement congestion notification, which can be introduced solely in	primarily aimed at rationalising encapsulation. (Nevertheless,
	IP layer active queue management (AQM). However, if we want to use	Appendix C explains why the consistency of decapsulation solutions
	congestion notification, we have to arrange for it to be explicitly	will not last for long and proposes a fix to decapsulation rules as
	copied up the layers when IP is tunnelled in IP (and if a particular	well. The IETF can then discuss whether to rationalise decapsulation
	link layer technology isn't protected from blocking by network layer	at the same time as encapsulation.)
	queues).
		1.1. The Need for Rationalisation

	IPsec tunnel mode is a specific form of tunnelling that can hide the	IPsec tunnel mode is a specific form of tunnelling that can hide the
	inner headers. Because the ECN field has to be mutable, it cannot be	inner headers. Because the ECN field has to be mutable, it cannot be
	covered by IPsec encryption or authentication calculations.	covered by IPsec encryption or authentication calculations.
	Therefore concern has been raised in the past that the ECN field	Therefore concern has been raised in the past that the ECN field
	could be used as a low bandwidth covert channel to communicate with	could be used as a low bandwidth covert channel to communicate with
	someone on the unprotected public Internet even if an end-host is	someone on the unprotected public Internet even if an end-host is
	restricted to only communicate with the public Internet through an	restricted to only communicate with the public Internet through an

	IPsec gateway. However, the recently updated version of IPsec	IPsec gateway. However, the updated version of IPsec [RFC4301] chose
	[RFC4301] chose not to block this covert channel, deciding that the	not to block this covert channel, deciding that the threat could be
	threat could be managed given the channel bandwidth is so limited	managed given the channel bandwidth is so limited (ECN is a 2-bit
	(ECN is a 2-bit field).	field).

	An unfortunate sequence of standards actions leading up to this	An unfortunate sequence of standards actions leading up to this
	latest change in IPsec has left us with nearly the worst of all	latest change in IPsec has left us with nearly the worst of all
	possible combinations of outcomes, despite the best endeavours of	possible combinations of outcomes, despite the best endeavours of

	everyone concerned. Even though information about congestion	everyone concerned. The controversy has been over whether to reveal
	experienced on the upstream path has various uses if it is revealed	information about congestion experienced on the path upstream of the
		tunnel ingress. Even though this has various uses if it is revealed
	in the outer header of a tunnel, when ECN was standardised [RFC3168]	in the outer header of a tunnel, when ECN was standardised [RFC3168]

	it was decided that all IP in IP tunnels should hide upstream	it was decided that all IP in IP tunnels should hide this upstream
	congestion information simply to avoid the extra complexity of two	congestion simply to avoid the extra complexity of two different
	different mechanisms for IPsec and non-IPsec tunnels. However, now	mechanisms for IPsec and non-IPsec tunnels. However, now that
	that [RFC4301] IPsec tunnels deliberately no longer hide this	[RFC4301] IPsec tunnels deliberately no longer hide this information,
	information, we are left in the perverse position where non-IPsec	we are left in the perverse position where non-IPsec tunnels still
	tunnels still hide congestion information unnecessarily. This	hide congestion information unnecessarily. This document is designed
	document is designed to correct that anomaly.	to correct that anomaly.


	Specifically, RFC3168 says that, if a tunnel supports ECN (termed a	Specifically, RFC3168 says that, if a tunnel fully supports ECN
	'full-functionality' ECN tunnel), the tunnel ingress must not copy a	(termed a 'full-functionality' ECN tunnel in [RFC3168]), the tunnel
	CE marking from the inner header into the outer header that it	ingress must not copy a CE marking from the inner header into the
	creates. Instead the tunnel ingress has to set the ECN field of the	outer header that it creates. Instead the tunnel ingress has to set
	outer header to ECT(0) (i.e. codepoint 10). We term this 'resetting'	the ECN field of the outer header to ECT(0) (i.e. codepoint 10). We
	a CE codepoint. However, RFC4301 reverses this, stating that the	term this 'resetting' a CE codepoint. However, RFC4301 reverses
	tunnel ingress must simply copy the ECN field from the inner to the	this, stating that the tunnel ingress must simply copy the ECN field
	outer header. The main purpose of this document is to carry over	from the inner to the outer header. The main purpose of this
	this new relaxed attitude to covert channels from IPsec to all IP in	document is to carry the new behaviour of IPsec over to all IP in IP
	IP tunnels, so all tunnel ingress nodes consistently copy the ECN	tunnels, so all tunnel ingress nodes consistently copy the ECN field.
	field.
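The two ingress behaviours contrasted above can be sketched in a few lines of Python. This is an illustrative sketch only, not text from either draft revision: the names Ecn, encapsulate_reset and encapsulate_copy are hypothetical, and the treatment of codepoints other than CE under RFC3168 (copied unchanged) is an assumption drawn from the full-functionality option of RFC3168 rather than from the paragraph above.

```python
from enum import IntEnum


class Ecn(IntEnum):
    """The four codepoints of the 2-bit ECN field."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10   # "codepoint 10" in the text above
    CE = 0b11      # congestion experienced


def encapsulate_reset(inner: Ecn) -> Ecn:
    """RFC3168 full-functionality ingress: a CE marking is 'reset'.

    Assumption: every codepoint other than CE is copied unchanged.
    """
    return Ecn.ECT_0 if inner == Ecn.CE else inner


def encapsulate_copy(inner: Ecn) -> Ecn:
    """RFC4301 ingress: the ECN field is simply copied to the outer header."""
    return inner


# Outer ECN field produced by each behaviour for every inner codepoint.
table = {cp.name: (encapsulate_reset(cp).name, encapsulate_copy(cp).name)
         for cp in Ecn}
```

The two functions differ only for an inner CE codepoint: resetting hides congestion experienced upstream of the tunnel ingress from nodes inside the tunnel, whereas copying leaves it visible in the outer header.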


	The rest of the document deals with the knock-on effects of this	Why does it matter if we have different ECN encapsulation behaviours
	apparently minor change. It is organised as follows:	for IPsec and non-IPsec tunnels? The general argument is that
		gratuitous inconsistency constrains the available design space and
		makes it harder to design networks and new protocols that work
		predictably.

		Already complicated constraints have had to be added to a standards
		track congestion marking proposal. The section of the pre-congestion
		notification (PCN) architecture [I-D.ietf-pcn-architecture] on
		tunnelling says PCN works correctly in the presence of RFC4301 IPsec
		encapsulation (and RFC5129 MPLS encapsulation). However it doesn't
		work with RFC3168 IP in IP encapsulation (Appendix A explains why).

		Section 3 assesses further security, control and management functions
		that cannot be achieved in each case (resetting vs copying CE
		markings). It finds that resetting CE makes life difficult in a
		number of directions, while copying CE harms nothing (other than
		opening a low bit-rate covert channel vulnerability which the
		Security Area deems is manageable).

		1.2. Document Roadmap

		Most of the document gives a thorough analysis of the knock-on
		effects of the apparently minor change to tunnel encapsulation. The
		reader may jump to Section 5 if only interested in standards actions
		impacting implementation. The whole document is organised as
		follows:

	o S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to	o S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to
	'switch in' different behaviours for marking the ECN field, just	'switch in' different behaviours for marking the ECN field, just
	as it switches in different per-hop behaviours (PHBs) for	as it switches in different per-hop behaviours (PHBs) for
	scheduling. Therefore we cannot only discuss the ECN protocol	scheduling. Therefore we cannot only discuss the ECN protocol
	that RFC3168 gives as a default. We need to also give guidance	that RFC3168 gives as a default. We need to also give guidance
	for possible different marking schemes. Therefore in Section 3 we	for possible different marking schemes. Therefore in Section 3 we

	lay out the design constraints when tunneling congestion	lay out the design constraints when tunnelling congestion
	notification.	notification.

	o Then in Section 4 we resolve the tensions between these	o Then in Section 4 we resolve the tensions between these

	constraints to give general design principles on how a tunnel	constraints to give general design principles and guidelines on
	should process congestion notification; principles that could	how a tunnel should process congestion notification; principles
	apply to any marking behaviour for any PHB, not just the default	that could apply to any marking behaviour for any PHB, not just
	in RFC3168. In particular, we examine the underlying principles	the default in RFC3168. In particular, we examine the underlying
	behind whether CE should be reset or copied into the outer header	principles behind whether CE should be reset or copied into the
	at the ingress to a tunnel--or indeed at the ingress of any	outer header at the ingress to a tunnel--or indeed at the ingress
	layered encapsulation of headers with congestion notification	of any layered encapsulation of headers with congestion
	fields.	notification fields. We end this section with a bulleted list of
		more design guidelines for new encapsulations of congestion
		notification.


	o Section 5 then confirms the precise rules for the default ECN	o Section 5 then uses precise standards terminology to confirm the
	tunnelling behaviour based on the above design principles. These	rules for the default ECN tunnelling behaviour based on the above
	rules apply to all PHBs, unless stated otherwise in the	design principles.
	specification of a PHB. There is no requirement for a PHB to
	state anything about ECN behaviour if the default behaviour is
	sufficient.

	o Extending the new IPsec tunnel ingress behaviour to all IP in IP	o Extending the new IPsec tunnel ingress behaviour to all IP in IP

	tunnels causes one further knock-on effect that is dealt with in	tunnels requires consideration of backwards compatibility, which
	Section 6 on Backward Compatibility. If one end of an IPsec	is covered in Section 6 and changes from earlier RFCs are brought
	tunnel is compliant with [RFC4301], assuming IKEv2 key management	together in Section 7.
	is used, the other end can be guaranteed to also be [RFC4301]
	compliant. So there is no backward compatibility problem with
	IKEv2 RFC4301 IPsec tunnels. But once we extend our scope to any
	IP in IP tunnel, we have to cater for the possibility that a
	tunnel ingress compliant with this specification is sending to an
	egress that doesn't even understand ECN (e.g. a legacy [RFC2003]
	tunnel egress). If a tunnel ingress copied incoming ECN-capable
	headers into outer headers, then a legacy tunnel egress would
	discard any congestion markings added to the outer header within
	the tunnel. ECN-capable traffic sources would not see any
	congestion feedback and instead continually ratchet up their share
	of the bandwidth without realising that cross-flows from other ECN
	sources were continually having to ratchet down.


	The scope of this document is all IP in IP tunnelling, irrespective	o Finally, a number of security considerations are discussed and
	of whether IPv4 or IPv6 is used for either of the inner and outer	conclusions are drawn.
	headers. The document only concerns wire protocol processing at
	tunnel endpoints and makes no changes or recommendations concerning
	algorithms for congestion marking or congestion response. The
	general design principles of Section 4 may also be useful when any
	datagram/packet/frame with a congestion notification capability is
	encapsulated by a connectionless outer header [BBnet] that might also
	support a congestion notification capability in the future as
	discussed in S.9.3 of [RFC3168] (e.g. IP encapsulated in L2TP
	[RFC2661], GRE [RFC1701] or PPTP [RFC2637]). However, of course, the
	IETF does not have standards authority over every link or tunnel
	protocol, so this document focuses only on IP in IP.
	[I-D.ietf-tsvwg-ecn-mpls] applies these principles to IP in MPLS and
	to MPLS in MPLS.


	2. Requirements notation	1.3. Scope

		This document only concerns wire protocol processing at tunnel
		endpoints and makes no changes or recommendations concerning
		algorithms for congestion marking or congestion response.

		This document specifies a common, default congestion encapsulation
		for any IP in IP tunnelling, based on that now specified for IPsec.
		It applies irrespective of whether IPv4 or IPv6 is used for either of
		the inner and outer headers. It applies to all PHBs, unless stated
		otherwise in the specification of a PHB. It is intended to be a good
		trade-off between somewhat conflicting security, control and
		management requirements.

		Nonetheless, if necessary, an alternate congestion encapsulation
		behaviour can be introduced as part of the definition of an alternate
		congestion marking scheme used by a specific Diffserv PHB (see S.5 of
		[RFC3168] and [RFC4774]). When designing such new encapsulation
		schemes, the principles in Section 4 should be followed as closely as
		possible. There is no requirement for a PHB to state anything about
		ECN tunnelling behaviour if the default behaviour is sufficient.

		Often lower layer resources (e.g. a point-to-point Ethernet link) are
		arranged to be protected by higher layer buffers, so instead of
		congestion occurring at the lower layer, it merely causes the queue
		from the higher layer to overflow. Such non-blocking link and
		physical layer technologies do not have to implement congestion
		notification, which can be introduced solely in the active queue
		management (AQM) from the IP layer. However, not all link layer
		technologies are always protected from congestion by buffers at
		higher layers (e.g. a subnetwork of Ethernet links and switches can
		congest internally). In these cases, when adding congestion
		notification to lower layers, we have to arrange for it to be
		explicitly copied up the layers, just as when IP is tunnelled in IP.

		As well as guiding alternate IP in IP tunnelling schemes, the design
		guidelines of Section 4 are intended to be followed when IP packets
		are encapsulated by any connectionless datagram/packet/frame where
		the outer header is designed to support a congestion notification
		capability. [RFC5129] already deals with handling ECN for IP in MPLS
		and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in
		L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples
		where ECN may be added in future.

		Of course, the IETF does not have standards authority over every link
		or tunnel protocol, so this document merely aims to define the
		interface between IP ECN and lower layer congestion notification.
		Then the IETF or the relevant standards body can be free to define
		the specifics of each lower layer scheme, but a common interface
		should ensure interworking across all technologies.

		Note that forward congestion notification in a lower layer protocol
		need not be propagated up the layers if that lower layer has its own
		feedback and load regulation. For
		instance, FECN (forward ECN) has been present in Frame Relay and EFCI
		(explicit forward congestion indication) in ATM [ITU-T.I.371] for a
		long time, but they have been used for internal management rather
		than being propagated to endpoint transports for them to control end-
		to-end congestion.

		[RFC2983] is a comprehensive primer on differentiated services and
		tunnels. Given ECN raises similar issues to differentiated services
		when interacting with tunnels, useful concepts introduced in RFC2983
		are used throughout, with brief recaps of the explanations where
		necessary.

		2. Requirements Language

	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this

	document are to be interpreted as described in [RFC2119].	document are to be interpreted as described in RFC 2119 [RFC2119].

	3. Design Constraints	3. Design Constraints

	Tunnel processing of a congestion notification field has to meet	Tunnel processing of a congestion notification field has to meet
	congestion control needs without creating new information security	congestion control needs without creating new information security
	vulnerabilities (if information security is required).	vulnerabilities (if information security is required).

	3.1. Security Constraints	3.1. Security Constraints

	Information security can be assured by using various end to end	Information security can be assured by using various end to end

	skipping to change at page 6, line 48	skipping to change at page 9, line 10

	IPsec encryption is typically used to prevent 'M' seeing messages	IPsec encryption is typically used to prevent 'M' seeing messages
	from 'A' to 'B'. IPsec authentication is used to prevent 'M'	from 'A' to 'B'. IPsec authentication is used to prevent 'M'
	masquerading as the sender of messages from 'A' to 'B' or altering	masquerading as the sender of messages from 'A' to 'B' or altering
	their contents. But 'I' can also use IPsec tunnel mode to allow 'A'	their contents. But 'I' can also use IPsec tunnel mode to allow 'A'
	to communicate with 'B', but impose encryption to prevent 'A' leaking	to communicate with 'B', but impose encryption to prevent 'A' leaking
	information to 'M'. Or 'E' can insist that 'I' uses tunnel mode	information to 'M'. Or 'E' can insist that 'I' uses tunnel mode
	authentication to prevent 'M' communicating information to 'B'.	authentication to prevent 'M' communicating information to 'B'.
	Mutable IP header fields such as the ECN field (as well as the TTL/	Mutable IP header fields such as the ECN field (as well as the TTL/
	Hop Limit and DS fields) cannot be included in the cryptographic	Hop Limit and DS fields) cannot be included in the cryptographic

	calculations of IPsec. Therefore, if 'I' encrypts but copies these	calculations of IPsec. Therefore, if 'I' copies these mutable fields
	mutable fields into the outer header that is exposed across the	into the outer header that is exposed across the tunnel it will have
	tunnel it will have allowed a covert channel from 'A' to 'M'. And if	allowed a covert channel from 'A' to 'M' that bypasses its encryption
	'E' copies these fields from the outer header to the inner, even if	of the inner header. And if 'E' copies these fields from the outer
	it validates authentication from 'I', it will have allowed a covert	header to the inner, even if it validates authentication from 'I', it
	channel from 'M' to 'B'.	will have allowed a covert channel from 'M' to 'B'.

	ECN at the IP layer is designed to carry information about congestion	ECN at the IP layer is designed to carry information about congestion

	from a congested resource to some downstream node that will feed the	from a congested resource towards downstream nodes. Typically a
	information back somehow to the point upstream of the congestion that	downstream transport might feed the information back somehow to the
	can regulate the load on the congested resource. In terms of the	point upstream of the congestion that can regulate the load on the
	above scenario, ECN is effectively intended to create an information	congested resource, but other actions are possible (see [RFC3168]
	channel from 'M' to 'B', for 'B' to forward to 'A'. Therefore the	S.6). In terms of the above unicast scenario, ECN is typically
	goals of IPsec and ECN are mutually incompatible.	intended to create an information channel from 'M' to 'B', for 'B' to
		forward to 'A'. Therefore the goals of IPsec and ECN are mutually
		incompatible.

	With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,	With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
	"controls are provided to manage the bandwidth of this [covert]	"controls are provided to manage the bandwidth of this [covert]
	channel". Using the ECN processing rules of RFC4301, the channel	channel". Using the ECN processing rules of RFC4301, the channel
	bandwidth is two bits per datagram from 'A' to 'M' and one bit per	bandwidth is two bits per datagram from 'A' to 'M' and one bit per

	datagram from 'M' to 'A' because 'E' limits the combinations it will	datagram from 'M' to 'A' (because 'E' limits the combinations of the
	copy. In both cases the covert channel bandwidth is further reduced	2-bit ECN field that it will copy). In both cases the covert channel
	by noise from any real congestion marking. RFC4301 therefore implies	bandwidth is further reduced by noise from any real congestion
	that these covert channels are sufficiently limited to be considered	marking. RFC4301 therefore implies that these covert channels are
	a manageable threat. However, with respect to the larger (6b) DS	sufficiently limited to be considered a manageable threat. However,
	field, the same section of RFC4301 says not copying is the default,	with respect to the larger (6b) DS field, the same section of RFC4301
	but a configuration option can allow copying "to allow a local	says not copying is the default, but a configuration option can allow
	administrator to decide whether the covert channel provided by	copying "to allow a local administrator to decide whether the covert
	copying these bits outweighs the benefits of copying". Of course, an	channel provided by copying these bits outweighs the benefits of
	administrator considering copying of the DS field has to take into	copying". Of course, an administrator considering copying of the DS
	account that it could be concatenated with the ECN field giving an 8b	field has to take into account that it could be concatenated with the
	per datagram channel.	ECN field giving an 8b per datagram covert channel.

		Thus, for tunnelling the 6b Diffserv field two conceptual models have
		had to be defined so that administrators can trade off security
		against the needs of traffic conditioning [RFC2983]:

		The uniform model: where the Diffserv field is preserved end-to-end
		by copying into the outer header on encapsulation and copying from
		the outer header on decapsulation.

		The pipe model: where the outer header is independent of that in the
		inner header so it hides the Diffserv field of the inner header
		from any interaction with nodes along the tunnel.

		However, for ECN, the new IPsec security architecture in RFC4301 only
		standardised one tunnelling model equivalent to the uniform model.
		It deemed that simplicity was more important than allowing
		administrators the option of a tiny increment in security, especially
		given that not copying congestion indications could seriously harm
		everyone's network service.

	3.2. Control Constraints	3.2. Control Constraints

	Congestion control requires that any congestion notification marked	Congestion control requires that any congestion notification marked
	into packets by a resource will be able to traverse a feedback loop	into packets by a resource will be able to traverse a feedback loop
	back to a node capable of controlling the load on that resource. To	back to a node capable of controlling the load on that resource. To

	avoid ambiguity later rather than calling this node the data source	be precise, rather than calling this node the data source, we will
	we will call it the Load Regulator. This will allow us to deal with	call it the Load Regulator. This will allow us to deal with
	exceptional cases where load is not regulated by the data source, but	exceptional cases where load is not regulated by the data source, but

	usually the two will be synonymous. Note the term "a node _capable	usually the two terms will be synonymous. Note the term "a node
	of_ controlling the load" deliberately includes a source application	_capable of_ controlling the load" deliberately includes a source
	that doesn't actually control the load but ought to (e.g. an	application that doesn't actually control the load but ought to (e.g.
	application without congestion control that uses UDP).	an application without congestion control that uses UDP).

	A--->R--->I=========>M=========>E-------->B	A--->R--->I=========>M=========>E-------->B

	Figure 2: Simple Tunnel Scenario	Figure 2: Simple Tunnel Scenario


	We now consider a similar tunneling scenario to the IPsec one just	We now consider a similar tunnelling scenario to the IPsec one just
	described, but without the different security domains so we can just	described, but without the different security domains so we can just
	focus on ensuring the control loop and management monitoring can work	focus on ensuring the control loop and management monitoring can work
	(Figure 2). If we want resources in the tunnel to be able to	(Figure 2). If we want resources in the tunnel to be able to

	explicitly notify congestion and the feedback loop is from 'B' to	explicitly notify congestion and the feedback path is from 'B' to
	'A', it will certainly be necessary for 'E' to copy any CE marking	'A', it will certainly be necessary for 'E' to copy any CE marking
	from the outer header to the inner header for onward transmission to	from the outer header to the inner header for onward transmission to
	'B', otherwise congestion notification from resources like 'M' cannot	'B', otherwise congestion notification from resources like 'M' cannot
	be fed back to the Load Regulator ('A'). But it doesn't seem	be fed back to the Load Regulator ('A'). But it doesn't seem
	necessary for 'I' to copy CE markings from the inner to the outer	necessary for 'I' to copy CE markings from the inner to the outer
	header. For instance, if resource 'R' is congested, it can send	header. For instance, if resource 'R' is congested, it can send
	congestion information to 'B' using the congestion field in the inner	congestion information to 'B' using the congestion field in the inner
	header without 'I' copying the congestion field into the outer header	header without 'I' copying the congestion field into the outer header

	and 'E' copying it back to the inner header. 'E' can then write any	and 'E' copying it back to the inner header. 'E' can still write any
	additional congestion marking introduced across the tunnel into the	additional congestion marking introduced across the tunnel into the
	congestion field of the inner header.	congestion field of the inner header.


	Indeed, this arrangement can be extended to multi-level congestion
	marking (such as that proposed for PCN [PCN-arch]) as long as all the
	marks have unambiguously ranked values. For instance, if a
	hypothetical multi-level marking scheme for PCN had PCN-capable
	codepoints ranked 1, 2 and 3, then, if 'I' reset the outer congestion
	field to the lowest ranked value that is PCN-capable (1), 'E' would
	simply write the highest ranked of the inner and outer congestion
	markings into the forwarded header. For instance, if the inner
	marking on arrival at 'I' was 3 and 'I' reset the outer to 1, but 'M'
	subsequently set it to 2, then the header forwarded by 'E' would be
	max(3,2) = 3.

	It might be useful for the tunnel egress to be able to tell whether	It might be useful for the tunnel egress to be able to tell whether
	congestion occurred across a tunnel or upstream of it. If outer	congestion occurred across a tunnel or upstream of it. If outer

	header congestion marking was reset at the tunnel ingress ('I'), by	header congestion marking was reset by the tunnel ingress ('I'), at
	the end of a tunnel ('E') the outer headers would indicate congestion	the end of a tunnel ('E') the outer headers would indicate congestion
	experienced across the tunnel ('I' to 'E'), while the inner header	experienced across the tunnel ('I' to 'E'), while the inner header

	would indicate congestion upstream of 'I'. But the same information	would indicate congestion upstream of 'I'. But similar information
	could be gleaned even if the tunnel ingress copied the inner to the	can be gleaned even if the tunnel ingress copies the inner to the
	outer headers. By the end of the tunnel ('E'), any packet with an	outer headers. At the end of the tunnel ('E'), any packet with an
	_extra_ mark in the outer header relative to the inner header would	_extra_ mark in the outer header relative to the inner header
	indicate congestion across the tunnel ('I' to 'E'), while the inner	indicates congestion across the tunnel ('I' to 'E'), while the inner
	header would still indicate congestion upstream of 'I'.	header would still indicate congestion upstream of 'I'. Appendix B
		gives a more precise method for inferring the congestion level
		introduced across a tunnel.
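
The per-packet inference described above can be sketched as follows.
This is a minimal illustration, not part of the draft: it assumes the
ingress 'I' copied the inner ECN field into the outer header, and the
names ECN and congestion_location are hypothetical. The 2-bit
codepoint values are those of [RFC3168].

```python
from enum import Enum


class ECN(Enum):
    """2-bit ECN codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


def congestion_location(inner: ECN, outer: ECN) -> str:
    """Classify one packet arriving at the tunnel egress 'E'.

    Assumes the ingress 'I' copied the inner ECN field to the outer
    header, so any difference arose between 'I' and 'E'.
    """
    if inner is ECN.CE:
        # The mark was already present when the packet reached 'I'.
        return "upstream of I"
    if outer is ECN.CE:
        # An extra mark in the outer relative to the inner header.
        return "across tunnel"
    return "none"
```

Note that when the inner header is already CE, a single packet cannot
reveal whether the tunnel added congestion as well; only the proportion
of extra marks over many packets can.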

	All this shows that 'E' can preserve the control loop irrespective of	All this shows that 'E' can preserve the control loop irrespective of
	whether 'I' copies congestion notification into the outer header or	whether 'I' copies congestion notification into the outer header or
	resets it.	resets it.


		That is the situation for existing control arrangements but, because
		copying reveals more information, it would open up possibilities for
		better control system designs. For instance, Appendix A describes
		how resetting CE marking at a tunnel ingress confuses a proposed
		congestion marking scheme on the standards track. It ends up
		removing excessive amounts of traffic unnecessarily. Whereas copying
		CE markings at ingress leads to the correct control behaviour.

	3.3. Management Constraints	3.3. Management Constraints

	As well as control, there are also management constraints.	As well as control, there are also management constraints.
	Specifically, a management system may monitor congestion markings in	Specifically, a management system may monitor congestion markings in
	passing packets, perhaps at the border between networks as part of a	passing packets, perhaps at the border between networks as part of a
	service level agreement. For instance, monitors at the borders of	service level agreement. For instance, monitors at the borders of
	autonomous systems may need to measure how much congestion has	autonomous systems may need to measure how much congestion has
	accumulated since the original source to determine between them how	accumulated since the original source to determine between them how
	much of the congestion is contributed by each domain.	much of the congestion is contributed by each domain.


	Therefore it should be clear how far back in the path the congestion	Therefore, when monitoring the middle of a path, it should be
	markings have accumulated from. In this document we term this the	possible to establish how far back in the path congestion markings
	baseline of the congestion marking, i.e. the source of the layer that	have accumulated from. In this document we term this the baseline of
	last reset rather than copied the congestion notification field when	congestion marking (or the Congestion Baseline), i.e. the source of
	creating an outer header. Given some tunnels cross domain borders	the layer that last reset (or created) the congestion notification
	(e.g. consider M in Figure 2 is monitoring a border), it is therefore	field. Given some tunnels cross domain borders (e.g. consider M in
	desirable for 'I' to copy congestion accumulated so far into the	Figure 2 is monitoring a border), it would therefore be desirable for
	outer headers exposed across the tunnel.	'I' to copy congestion accumulated so far into the outer headers
		exposed across the tunnel.


	Appendix A discusses various scenarios where the Load Regulator lies	Appendix D discusses various scenarios where the Load Regulator lies
	in-path, not at the source host as we would typically expect. It	in-path, not at the source host as we would typically expect. It

	concludes that the baseline for congestion notification should be	concludes that a Congestion Baseline is determined by where the Load
	determined by where the Load Regulator function is, whether it is at	Regulator function is, which should be identified in the transport
	the source host or within the path. Therefore every tunnel ingress	layer, not by addresses in network layer headers. This applies
	should copy the ECN field into the outer header it creates unless it	whether the Load Regulator is at the source host or within the path.
	is also a Load Regulator, in which case it should reset any CE	The appendix also discusses where a Load Regulator function should be
	markings, which is an exception to the normal copying rule for a	located relative to a local encapsulation function.
	tunnel ingress.

	4. Design Principles	4. Design Principles

	The constraints from the three perspectives of security, control and	The constraints from the three perspectives of security, control and
	management in Section 3 are somewhat in tension as to whether a	management in Section 3 are somewhat in tension as to whether a
	tunnel ingress should copy congestion markings into the outer header	tunnel ingress should copy congestion markings into the outer header
	it creates or reset them. From the control perspective either	it creates or reset them. From the control perspective either

	copying or resetting works. From the management perspective copying	copying or resetting works for existing arrangements, but copying has
	is preferable (with the exception of an in-path load regulator).	more potential for simplifying control. From the management
	From the security perspective resetting is preferable but copying is	perspective copying is preferable. From the security perspective
	now considered acceptable given the bandwidth of a 2-bit covert	resetting is preferable but copying is now considered acceptable
	channel can be managed.	given the bandwidth of a 2-bit covert channel can be managed.

	Therefore an outer encapsulating header capable of carrying	Therefore an outer encapsulating header capable of carrying
	congestion markings SHOULD reflect accumulated congestion since the	congestion markings SHOULD reflect accumulated congestion since the
	last interface designed to regulate load (the Load Regulator). This	last interface designed to regulate load (the Load Regulator). This
	implies congestion notification SHOULD be copied into the outer	implies congestion notification SHOULD be copied into the outer

	header of each new encapsulating header that supports it--except at	header of each new encapsulating header that supports it.
	an in-path Load Regulator. An in-path Load Regulator knows its
	function is to regulate load, so if it also acts as the ingress to a	We have said that a tunnel ingress SHOULD (as opposed to MUST) copy
	tunnel, in every new outer header it creates it MUST reset any	incoming congestion notification into an outer encapsulating header
	congestion marking.	that supports it. In the case of 2-bit ECN, the IETF security area
		has deemed the benefit always outweighs the risk. Therefore for
		2-bit ECN we can and we will say 'MUST' (Section 5). But in this
		section where we are setting down general design principles, we leave
		it as a 'SHOULD'. This allows for future multi-bit congestion
		notification fields where the risk from the covert channel created by
		copying congestion notification might outweigh the congestion control
		benefit of copying.

	The Load Regulator is the node to which congestion feedback should be	The Load Regulator is the node to which congestion feedback should be

	returned by the next downstream node with a transport layer function	returned by the next downstream node with a transport layer feedback
	(typically but not always the data receiver). The Load Regulator is	function (typically but not always the data receiver). The Load
	not always (or even typically) the same thing as the node identified	Regulator is not always (or even typically) the same thing as the
	by the source address of the outermost exposed header. In general	node identified by the source address of the outermost exposed
	the addressing of the outermost encapsulation header says nothing	header. In general the addressing of the outermost encapsulation
	about the identifiers of either the upstream or the downstream	header says nothing about the identifiers of either the upstream or
	transport layer functions. As long as the transport functions know	the downstream transport layer functions. As long as the transport
	each other's addresses, they don't have to be identified in the	functions know each other's addresses, they don't have to be
	network layer or in any link layer. It was only a convenience that a	identified in the network layer or in any link layer. It was only a
	TCP receiver assumed that the address of the source transport is the	convenience that a TCP receiver assumed that the address of the
	same as the network layer source address of a packet it receives.	source transport is the same as the network layer source address of
		an IP packet it receives.


	More generally, the return transport address could be identified	More generally, the return transport address for feedback could be
	solely in the transport layer protocol. For instance, a signalling	identified solely in the transport layer protocol. For instance, a
	protocol like RSVP [RFC2205] breaks up a path into transport layer	signalling protocol like RSVP [RFC2205] breaks up a path into
	hops and informs each hop of the address of its transport layer	transport layer hops and informs each hop of the address of its
	neighbour without any need to identify these hops in the network	transport layer neighbour without any need to identify these hops in
	layer. RSVP can be arranged so that these transport layer hops are	the network layer. RSVP can be arranged so that these transport
	bigger than the underlying network layer hops. The host identity	layer hops are bigger than the underlying network layer hops. The
	protocol (HIP) architecture [RFC4423] also supports the same	host identity protocol (HIP) architecture [RFC4423] also supports the
	principled separation (for mobility amongst other things), where the	same principled separation (for mobility amongst other things), where
	transport layer receiver identifies the transport layer sender using	the transport layer sender identifies its transport address for
	an identifier provided by the transport layer, which gets mapped to a	feedback to be sent to, using an identifier provided by a shim below
	network layer address below the transport layer.	the transport layer.


	Note that this principle deliberately doesn't require a packet header	Keeping to this layering principle deliberately doesn't require a
	to reveal the origin address of the baseline that congestion	network layer packet header to reveal the origin address from where
	notification has accumulated from. It is not necessary for the	congestion notification accumulates (its Congestion Baseline). It is
	network and lower layers to know the address of the Load Regulator.	not necessary for the network and lower layers to know the address of
	Only the destination transport needs to know that. With congestion	the Load Regulator. Only the destination transport needs to know
	notification, the network and link layers only notify congestion	that. With forward congestion notification, the network and link
	forwards, they aren't involved in feeding it backwards. If they are,	layers only notify congestion forwards; they aren't involved in
	e.g. backward congestion notification (BCN) in Ethernet [802.1au],	feeding it backwards. If they are (e.g. backward congestion
	that should be considered as a transport function added to the lower	notification (BCN) in Ethernet [IEEE802.1au] or EFCI in ATM
	layer, which must sort out its own addressing. Indeed, this is one	[ITU-T.I.371]), that should be considered as a transport function
	reason why ICMP source quench is now deprecated [RFC1254]; when	added to the lower layer, which must sort out its own addressing.
	congestion occurs within a tunnel it is complex (particularly in the	Indeed, this is one reason why ICMP source quench is now deprecated
	case of IPsec tunnels) to return the ICMP messages beyond the tunnel	[RFC1254]; when congestion occurs within a tunnel it is complex
	ingress back to the Load Regulator .	(particularly in the case of IPsec tunnels) to return the ICMP
		messages beyond the tunnel ingress back to the Load Regulator.

	Similarly, if a management system is monitoring congestion and needs	Similarly, if a management system is monitoring congestion and needs

	to know the baseline of congestion notification, the management	to know the Congestion Baseline, the management system has to find
	system has to find this out from the transport; in general it cannot	this out from the transport; in general it cannot tell solely by
	tell solely by looking at the network or link layer headers.	looking at the network or link layer headers.


	We have said that a tunnel ingress that is not a Load Regulator	4.1. Design Guidelines for New Encapsulations of Congestion
	SHOULD (as opposed to MUST) copy incoming congestion notification	Notification
	into an outer encapsulating header that supports it. In the case of
	2-bit ECN, the IETF security area have deemed the benefit always	The following guidelines are for specifications of new schemes for
	outweighs the risk. Therefore for 2-bit ECN we can and we will say	encapsulating congestion notification (e.g. for specialised Diffserv
	'MUST' (Section 5). But in this section where we are setting down	PHBs in IP, or for lower layer technologies):
	general design principles, we leave it as a 'SHOULD'. This allows
	for future multi-bit congestion notification fields where the risk	1. Congestion notification in outer headers SHOULD be relative to a
	from the covert channel created by copying congestion notification	Congestion Baseline at the node expected to regulate the load on
	might outweigh the congestion control benefit of copying.	the link in question (the Load Regulator). This implies incoming
		congestion notifications from the higher layer SHOULD be copied
		into encapsulating headers. This guideline is particularly
		important where outer headers might cross trust boundaries, but
		less important otherwise.

		2. Congestion notification MUST NOT simply be copied from outer
		headers to the forwarded header on decapsulation. The forwarded
		congestion notification field SHOULD be calculated from the inner
		and outer headers, taking account of the following, in the order
		given:

		1. If the inner header does not support congestion notification,
		or indicates that the transport does not support congestion
		notification, any explicit congestion notifications in the
		outer header will not be understood if propagated further, so
		if the only way to indicate congestion to onward nodes is to
		drop the packet, it MUST be dropped.

		2. If the outer header does not support explicit congestion
		notification, but the inner header does, the inner header
		SHOULD be forwarded unchanged.

		3. Congestion indications may be ranked by strength. For
		instance no congestion would be the weakest indication, with
		possibly increasing levels of congestion given increasingly
		stronger indications.

		4. Where the inner and outer headers carry indications of
		congestion of different strengths, the stronger indication
		SHOULD be forwarded in preference to the weaker. Obviously,
		if the strengths in both inner and outer are the same, the
		same strength should be forwarded.

		5. If the outer header carries a weaker indication of congestion
		than the inner, it MAY be appropriate to raise a warning, as
		this would be an illegal combination if Guideline Paragraph 1
		had been followed.

		3. Where framing boundaries are different between the two layers,
		congestion indications SHOULD be propagated on the basis that a
		congestion indication in a packet or frame applies to all the
		octets in the frame/packet. On average, a tunnel endpoint SHOULD
		approximately preserve the number of marked octets arriving and
		leaving. An algorithm for spreading congestion indications over
		multiple smaller `fragments' SHOULD propagate congestion
		indications as soon as they arrive, and SHOULD NOT hold them back
		for later frames.

		4. Assumptions on incremental deployment MUST be stated.

		Regarding incremental deployment, the Per-Domain ECT Checking
		of [RFC5129] is a good example to follow. In this example, header
		space in the lower layer protocol (MPLS) was extremely limited, so no
		ECN-capable transport codepoint was added to the MPLS header.
		Interior nodes in a domain were allowed to set explicit congestion
		indications without checking whether the frame was destined for a
		transport that would understand them. This was made safe by
		emphasising repeatedly that all the decapsulating edges of a whole
		domain had to be upgraded at once, so there would always be a check
		that the higher layer transport was ECN-capable on decapsulation. If
		the decapsulator discovered that the higher layer showed the
		transport would not understand ECN, it dropped the packet on behalf
		of the earlier congestion node (see Guideline Paragraph 2.1).

		Note that such a deployment strategy, which assumes a savvy operator,
		was only appropriate because MPLS is targeted solely at professional
		operators. This strategy would not be appropriate for other link
		technologies (e.g. Ethernet) targeted at deployment by the general
		public.
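
As an illustration of Guideline 2 in the list above, the sketch below
combines inner and outer congestion indications on decapsulation. It
is only one possible reading, under stated assumptions: indications
are modelled as integer strengths with 0 meaning no congestion, None
means the header (or the transport it describes) does not support
congestion notification, and dropping is assumed to be the only way to
signal congestion to a non-capable transport. The names Result and
decapsulate_ranked are hypothetical.

```python
from typing import NamedTuple, Optional


class Result(NamedTuple):
    drop: bool                # True if the packet must be dropped
    forwarded: Optional[int]  # strength to forward; None = not capable
    warn: bool                # True if the combination looks illegal


def decapsulate_ranked(inner: Optional[int],
                       outer: Optional[int]) -> Result:
    """Apply Guideline 2.1 to 2.5 in the order given."""
    if inner is None:
        # 2.1: onward nodes cannot understand a marking, so a
        # congestion indication in the outer header becomes a drop.
        congested = outer is not None and outer > 0
        return Result(drop=congested, forwarded=None, warn=False)
    if outer is None:
        # 2.2: outer cannot carry marks; forward inner unchanged.
        return Result(drop=False, forwarded=inner, warn=False)
    # 2.3 and 2.4: forward the stronger of the two indications.
    # 2.5: an outer weaker than the inner is unexpected if the
    # ingress copied the inner marking (Guideline 1).
    return Result(drop=False, forwarded=max(inner, outer),
                  warn=outer < inner)
```

For example, decapsulate_ranked(3, 2) forwards strength 3 and sets the
warning flag, because an outer marking weaker than the inner could not
arise if the ingress had copied the inner marking.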

	5. Default ECN Tunnelling Rules	5. Default ECN Tunnelling Rules

	The following ECN tunnel processing rules are the default for a	The following ECN tunnel processing rules are the default for a

	packet with any DSCP. If required, different ECN processing rules	packet with any DSCP. If required, different ECN encapsulation rules
	MAY be defined for the appropriate Diffserv PHB using the guidelines	MAY be defined as part of the definition of an appropriate Diffserv
	in Section 4.	PHB using the guidelines in Section 4. However, the burden of
		handling exceptional PHBs in implementations of all affected tunnels
		and lower layer link protocols should not be underestimated.


	When a tunnel ingress creates an encapsulating IP header, the 2-bit	A tunnel ingress compliant with this specification MUST copy the
	ECN field of the inner IP header MUST be copied into the outer IP	2-bit ECN field of the arriving IP header into the outer
	header, for all types of IP in IP tunnel (except if the tunnel	encapsulating IP header, for all types of IP in IP tunnel. This
	ingress is in compatibility mode--see Section 6). If the tunnel	encapsulation behaviour MUST only be used if the tunnel ingress is in
	ingress is also a Load Regulator, it MUST instead reset the outer	`normal state'. A `compatibility state' with a different
	header to ECT(0).	encapsulation behaviour is also specified in Section 6 for backward
		compatibility with legacy tunnel egresses that do not understand ECN.


	To decapsulate the inner header at the tunnel egress, the outgoing	To decapsulate the inner header at the tunnel egress, a compliant
	inner header MUST be calculated from the combination of the incoming	tunnel egress MUST set the outgoing ECN field to the codepoint at the
	inner and outer headers setting the outgoing ECN field to the	intersection of the appropriate incoming inner header (row) and outer
	codepoints displayed in the body of Table 1.	header (column) in Table 1.

	+--Incoming Outer Header---	+--Incoming Outer Header---


	+--------------------+---------+------------+-----------+-----------+	+---------------------+---------+-----------+-----------+-----------+
	\| Incoming Inner \| Not-ECT \| ECT(0) \| ECT(1) \| CE \|	\| Incoming Inner \| Not-ECT \| ECT(0) \| ECT(1) \| CE \|
	\| Header \| \| \| \| \|	\| Header \| \| \| \| \|

	+--------------------+---------+------------+-----------+-----------+	+---------------------+---------+-----------+-----------+-----------+
	\| Not-ECT \| Not-ECT \| drop (!!!) \| drop(!!!) \| drop(!!!) \|	\| Not-ECT \| Not-ECT \| drop (!!!) \| drop(!!!) \| drop(!!!) \|
	\| ECT(0) \| ECT(0) \| ECT(0) \| ECT(0) \| CE \|	\| ECT(0) \| ECT(0) \| ECT(0) \| ECT(0) \| CE \|
	\| ECT(1) \| ECT(1) \| ECT(1) \| ECT(1) \| CE \|	\| ECT(1) \| ECT(1) \| ECT(1) \| ECT(1) \| CE \|

	\| CE \| CE \| CE (!!!) \| CE (!!!) \| CE \|	\| CE \| CE \| CE \| CE (!!!) \| CE \|
	+--------------------+---------+------------+-----------+-----------+	+---------------------+---------+-----------+-----------+-----------+

	+-----Outgoing Header------	+-----Outgoing Header------

	Table 1: IP in IP Decapsulation	Table 1: IP in IP Decapsulation

	The exclamation marks '(!!!)' in Table 1 indicate that this	The exclamation marks '(!!!)' in Table 1 indicate that this
	combination of inner and outer headers should not be possible if only	combination of inner and outer headers should not be possible if only
	legal transitions have taken place. So, the decapsulator should drop	legal transitions have taken place. So, the decapsulator should drop
	or mark the ECN field as the table specifies, but it MAY also raise	or mark the ECN field as the table specifies, but it MAY also raise
	an appropriate alarm. It MUST NOT raise an alarm so often that the	an appropriate alarm. It MUST NOT raise an alarm so often that the
	illegal combinations would amplify into a flood of alarm messages.	illegal combinations would amplify into a flood of alarm messages.
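
The decapsulation behaviour of Table 1 can be sketched as a small
lookup function. This is an illustrative sketch only: the names ECN,
Decap and decapsulate are hypothetical, a result of None stands for
"drop", and the '(!!!)' flags follow the right-hand (-01) column of
the table. Alarm rate limiting is left to the caller.

```python
from enum import Enum
from typing import NamedTuple, Optional


class ECN(Enum):
    """2-bit ECN codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class Decap(NamedTuple):
    outgoing: Optional[ECN]  # None means the packet is dropped
    illegal: bool            # True where Table 1 shows '(!!!)'


def decapsulate(inner: ECN, outer: ECN) -> Decap:
    """Return the outgoing ECN field for an inner/outer combination."""
    if inner is ECN.NOT_ECT:
        if outer is ECN.NOT_ECT:
            return Decap(ECN.NOT_ECT, False)
        # The transport is not ECN-capable, so any other outer value
        # results in a drop and is flagged as an illegal combination.
        return Decap(None, True)
    if inner is ECN.CE:
        # CE is never cleared; only an ECT(1) outer is flagged.
        return Decap(ECN.CE, outer is ECN.ECT_1)
    # Inner is ECT(0) or ECT(1): a CE outer propagates, otherwise
    # the inner codepoint is forwarded unchanged.
    if outer is ECN.CE:
        return Decap(ECN.CE, False)
    return Decap(inner, False)
```

A caller would drop the packet when outgoing is None and could feed
the illegal flag into whatever rate-limited alarm mechanism it uses.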

	6. Backward Compatibility	6. Backward Compatibility


	A legacy tunnel egress may not know how to process an ECN field, so	Note: in RFC3168, a tunnel was in one of two modes: limited
	it will most likely simply disregard all outer headers. Therefore,	functionality or full functionality. Rather than working with modes
	unless a compliant tunnel ingress has established that the tunnel	of the tunnel as a whole, this specification uses the term `state' to
	egress understands ECN processing, it MUST only send packets with the	refer separately to the state of each tunnel end point, which is how
	ECN field set to Not-ECT in the outer header. Otherwise, if ECN	implementations have to work.
	capable outer headers were sent towards a legacy egress, it would
	dangerously remove information about congestion experienced within
	the tunnel.


	A tunnel ingress may establish whether its tunnel egress will	If one end of an IPsec tunnel is compliant with [RFC4301], the other
	understand ECN processing by configuration or by negotiation. Note	end can be guaranteed to also be [RFC4301] compliant (there could be
	that a [RFC4301] tunnel ingress that has used IKEv2 key management	corner cases where manual keying is used, but they will be ignored
	[RFC4306] can guarantee that the tunnel egress is also RFC4301-	here). So there is no backward compatibility problem with IKEv2
	compliant and therefore need not negotiate ECN capabilities.	RFC4301 IPsec tunnels. But once we extend our scope to any IP in IP
		tunnel, we have to cater for the possibility that a legacy tunnel
		egress may not know how to process an ECN field, so if ECN capable
		outer headers were sent towards a legacy (e.g. [RFC2003]) egress, it
		would most likely simply disregard the outer headers, dangerously
		discarding information about congestion experienced within the
		tunnel. ECN-capable traffic sources would not see any congestion
		feedback and instead continually ratchet up their share of the
		bandwidth without realising that cross-flows from other ECN sources
		were continually having to ratchet down.

	To be compliant with this specification a tunnel ingress that does	To be compliant with this specification a tunnel ingress that does

	not know the egress ECN capability (e.g. by configuration) MUST	not always know the ECN capability of its tunnel egress MUST
	implement a 'normal' mode and a 'compatibility' mode, and it MUST	implement a 'normal' state and a 'compatibility' state, and it MUST
	initiate each negotiated tunnel in compatibility mode. On the other	initiate each negotiated tunnel in the compatibility state.
	hand, a compliant tunnel egress MUST merely implement the one
	behaviour in Section 5, which we term 'full-functionality' mode.


	Before switching to normal mode, a compliant tunnel ingress that does	However, a tunnel ingress can be compliant even if it only implements
	not know the egress ECN capability (e.g. by configuration) MUST	the 'normal state' of encapsulation behaviour, but only as long as it
	negotiate with the tunnel egress to establish whether the egress is	is designed or configured so that all possible tunnel egress nodes it
	in full functionality mode. If the egress is in full functionality	will ever talk to will have full ECN functionality (RFC3168 full
	mode, the ingress puts itself into normal mode. In normal mode the	functionality mode, RFC4301 and this present specification). The
	ingress follows the encapsulation rule in Section 5 (i.e. it copies	`normal state' is that defined in Section 5 (i.e. header copying).
	the inner ECN field into the outer header). If the egress is not in	Note that a [RFC4301] tunnel ingress that has used IKEv2 key
	full-functionality mode or doesn't understand the question, the	management [RFC4306] can guarantee that its tunnel egress is also
	tunnel ingress MUST remain in compatibility mode.	RFC4301-compliant and therefore need not further negotiate ECN
		capabilities.


	A tunnel ingress in compatibility mode MUST set all outer headers to	Before switching to normal state, a compliant tunnel ingress that
	Not-ECT.	does not know the egress ECN capability MUST negotiate with the
		tunnel egress. If the egress says it is in full functionality state
		(or mode), the ingress puts itself into normal state. In normal
		state the ingress follows the encapsulation rule in Section 5 (i.e.
		header copying). If the egress says it is not in full-functionality
		state/mode or doesn't understand the question, the tunnel ingress
		MUST remain in compatibility state.

		A tunnel ingress in compatibility state MUST set all outer headers to
		Not-ECT. This is the same per packet behaviour as the ingress end of
		RFC3168's limited functionality mode.

		A tunnel ingress that only implements compatibility state is at least
		safe with the ECN behaviour of any egress it may encounter (any of
		RFC2003, RFC2401, either mode of RFC2481 and RFC3168's limited
		functionality mode). But an ingress cannot claim compliance with
		this specification simply by disabling ECN processing across the
		tunnel. A compliant tunnel ingress MUST at least implement `normal
		state' and, if it might be used with arbitrary tunnel egress nodes,
		it MUST also implement `compatibility state'.

		A compliant tunnel egress on the other hand merely needs to implement
		the one behaviour in Section 5, which we term 'full-functionality'
		state, as it is the same as the egress end of the full-functionality
		mode of [RFC3168]. It is also the same as the [RFC4301] egress
		behaviour.

	The decapsulation rules for the egress of the tunnel in Section 5	The decapsulation rules for the egress of the tunnel in Section 5
	have been defined in such a way that congestion control will still	have been defined in such a way that congestion control will still
	work safely if any of the earlier versions of ECN processing are used	work safely if any of the earlier versions of ECN processing are used

	unilaterally at the encapsulating ingress of the tunnel. If a tunnel	unilaterally at the encapsulating ingress of the tunnel (any of
	ingress tries to negotiate to use limited functionality mode or full	RFC2003, RFC2401, either mode of RFC2481, either mode of RFC3168,
	functionality mode, a decapsulating tunnel egress compliant with this	RFC4301 and this present specification). If a tunnel ingress tries
	specification MUST agree to the request, even though its behaviour	to negotiate to use limited functionality mode or full functionality
	will be the same in both cases. For 'forward compatibility', a	mode [RFC3168], a decapsulating tunnel egress compliant with this
	compliant tunnel egress MUST raise a warning about any requests to	specification MUST agree to either request, as its behaviour will be
	enter modes it doesn't recognise, but it can continue operating. If	the same in both cases.
	no ECN-related mode is requested, no error or warning need be raised
	as the egress behaviour is compatible with all the legacy ingress
	behaviours that don't negotiate capabilities.


	Note that if a compliant node is the ingress for multiple tunnels, a	For 'forward compatibility', a compliant tunnel egress SHOULD raise a
	mode setting will need to be stored for each tunnel ingress.	warning about any requests to enter states or modes it doesn't
	However, if a node is the egress for multiple tunnels, none of the	recognise, but it can continue operating. If no ECN-related state or
	tunnels will need to store a mode setting, because a compliant egress	mode is requested, a compliant tunnel egress need not raise an error
	can only be in one mode.	or warning as its egress behaviour is compatible with all the legacy
		ingress behaviours that don't negotiate capabilities.

		Implementation note: if a compliant node is the ingress for multiple
		tunnels, a state setting will need to be stored for each tunnel
		ingress. However, if a node is the egress for multiple tunnels, none
		of the tunnels will need to store a state setting, because a
		compliant egress can only be in one state.
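The ingress behaviour described in this section can be sketched as a
small per-tunnel state machine. This is only an illustrative sketch,
not part of the specification: the class names, the
`egress_known_capable` parameter and the FULL_FUNCTIONALITY reply
token are assumptions made for the example, and the actual
negotiation mechanism is not modelled.

```python
from enum import Enum


class Ecn(Enum):
    """Two-bit ECN field codepoints as defined in RFC 3168."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11


class IngressState(Enum):
    COMPATIBILITY = "compatibility"
    NORMAL = "normal"


# Hypothetical token standing in for whatever a real negotiation returns.
FULL_FUNCTIONALITY = "full-functionality"


class TunnelIngress:
    """State held for one tunnel at the node acting as its ingress."""

    def __init__(self, egress_known_capable=False):
        # A tunnel starts in compatibility state unless configuration
        # already guarantees an egress with full ECN functionality.
        self.state = (IngressState.NORMAL if egress_known_capable
                      else IngressState.COMPATIBILITY)

    def on_negotiation_reply(self, reply):
        # Switch to normal state only on a positive answer. Any other
        # reply, including None for "did not understand the question",
        # leaves the ingress in its current state.
        if reply == FULL_FUNCTIONALITY:
            self.state = IngressState.NORMAL

    def outer_ecn(self, inner):
        # Normal state: copy the inner ECN field into the outer header.
        if self.state is IngressState.NORMAL:
            return inner
        # Compatibility state: every outer header is set to Not-ECT.
        return Ecn.NOT_ECT


# One state setting is stored per tunnel for which this node is the
# ingress (the tunnel names are illustrative).
ingress_by_tunnel = {
    "tunnel-a": TunnelIngress(),
    "tunnel-b": TunnelIngress(egress_known_capable=True),
}
```

An egress needs no equivalent table, because its decapsulation
behaviour is the same whichever state the ingress is in.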

	7. Changes from Earlier RFCs	7. Changes from Earlier RFCs


	The rule that a tunnel ingress MUST copy any ECN field into the outer	The rule that a normal state tunnel ingress MUST copy any ECN field
	header is a change to RFC3168 (unless it is a Load Regulator as well,	into the outer header is a change to the ingress behaviour of
	in which case there is no change).	RFC3168, but it is the same as the rules for IPsec tunnels in
		RFC4301.

	The rules for calculating the outgoing ECN field on decapsulation at	The rules for calculating the outgoing ECN field on decapsulation at
	a tunnel egress are in line with the full functionality mode of ECN	a tunnel egress are in line with the full functionality mode of ECN

	in RFC3168 and with RFC4301, except that neither identified the need	in RFC3168 and with RFC4301, except that neither identified that an
	to raise an alarm if the inner header was CE but the outer header was	outer header of ECT(1) combined with an inner header of CE was an
	ECT.	illegal combination.

	The rules for how a tunnel establishes whether the egress has full	The rules for how a tunnel establishes whether the egress has full
	functionality ECN capabilities are an update to RFC3168. For all the	functionality ECN capabilities are an update to RFC3168. For all the
	typical cases, RFC4301 is not updated by the ECN capability check in	typical cases, RFC4301 is not updated by the ECN capability check in
	this specification, because a typical RFC4301 tunnel ingress will	this specification, because a typical RFC4301 tunnel ingress will
	have already established that it is talking to an RFC4301 tunnel	have already established that it is talking to an RFC4301 tunnel
	egress (e.g. if it uses IKEv2). However, there may be some corner	egress (e.g. if it uses IKEv2). However, there may be some corner
	cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with	cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with

	an egress with limited functionality ECN handling. For such corner	an egress with limited functionality ECN handling. Strictly, for
	cases, the requirement to use compatibility mode in this	such corner cases, the requirement to use compatibility mode in this
	specification updates RFC4301.	specification updates RFC4301.

	The optional ECN Tunnel field in the IPsec security association	The optional ECN Tunnel field in the IPsec security association
	database (SAD) and the optional ECN Tunnel Security Association	database (SAD) and the optional ECN Tunnel Security Association
	Attribute defined in RFC3168 are no longer needed. The security	Attribute defined in RFC3168 are no longer needed. The security
	association (SA) has no policy on ECN usage, because all RFC4301	association (SA) has no policy on ECN usage, because all RFC4301
	tunnels now support ECN without any policy choice.	tunnels now support ECN without any policy choice.

	RFC3168 defines a (required) limited functionality mode and an	RFC3168 defines a (required) limited functionality mode and an
	(optional) full functionality mode for a tunnel, but RFC4301 doesn't	(optional) full functionality mode for a tunnel, but RFC4301 doesn't
	need modes. In this specification only the ingress might need two	need modes. In this specification only the ingress might need two

	modes, unlike the modes of RFC3168 that were properties of the pair	states: a normal state (required) and a compatibility state (required
	of tunnel endpoints after negotiation.	in some scenarios, optional in others). The egress needs only full-
		functionality state which handles ECN the same as either mode of
		RFC3168 or RFC4301.


	All these ECN processing rules update RFC2003 on IP in IP tunnelling.	Additional changes to the RFC Index (to be removed by the RFC Editor):

		In the RFC index, RFC3168 should be identified as an update to
		RFC2003 and RFC4301 should be identified as an update to RFC3168.

		This specification updates RFC3168. It also suggests a minor
		optional warning and a corner-case change to RFC4301, but these don't
		really count as an update.

	8. IANA Considerations	8. IANA Considerations

	This memo includes no request to IANA.	This memo includes no request to IANA.

	9. Security Considerations	9. Security Considerations

	Section 3.1 discusses the security constraints imposed on ECN tunnel	Section 3.1 discusses the security constraints imposed on ECN tunnel
	processing. The Design Principles of Section 4 trade off between	processing. The Design Principles of Section 4 trade off between
	security (covert channels) and congestion monitoring & control. In	security (covert channels) and congestion monitoring & control. In
	fact, ensuring congestion markings are not lost is itself another	fact, ensuring congestion markings are not lost is itself another
	aspect of security, because if we allowed congestion notification to	aspect of security, because if we allowed congestion notification to
	be lost, any attempt to enforce a response to congestion would be	be lost, any attempt to enforce a response to congestion would be
	much harder.	much harder.


	We keep the behaviour defined in both RFC3168 and RFC4301 where, if	If alternate congestion notification semantics are defined for a
	the inner and outer headers carry contradictory ECT values the inner	certain PHB (e.g. the pre-congestion notification architecture
	header is preserved for onward forwarding. However, in writing this	[I-D.ietf-pcn-architecture]), the scope of the alternate semantics
	document we noticed this behaviour would hide illegal suppression of	might typically be bounded by the limits of a Diffserv region or
	congestion notification from the detection mechanism designed for	regions, as envisaged in [RFC4774]. The inner headers in tunnels
	this attack. One reason two ECT codepoints were defined was to	crossing the boundary of such a Diffserv region but ending within the
	enable the source to detect if a CE marking had been applied then	region can potentially leak the external congestion notification
	subsequently removed. The source could detect this by weaving a	semantics into the region, or leak the internal semantics out of the
	pseudo-random sequence of ECT(0) and ECT(1) values into a stream of	region. [RFC2983] discusses the need for Diffserv traffic
	packets [RFC3540]. With the rules as they stand in RFC3168 and	conditioning to be applied at these tunnel endpoints as if they are
	RFC4301, within a tunnel a CE marking could be added and subsequently	at the edge of the Diffserv region. Similar concerns apply to any
	removed by a non-compliant node without detection, because the	processing or propagation of the ECN field at the edges of a Diffserv
	evidence of such misbehaviour is removed by the decapsulator.	region with alternate ECN semantics. Such edge processing must also
		be applied at the endpoints of tunnels with ends both inside and
		outside the domain. [I-D.ietf-pcn-architecture] gives specific
		advice on this for the PCN case, but other definitions of alternate
		semantics will need to discuss the specific security implications in
		their case.


	We could have specified that an outer header value of ECT should	With the rules as they stand in RFC3168 and RFC4301, a small part of
	overwrite a contradictory ECT value in the inner header to close this	the protection of the ECN nonce [RFC3540] is compromised. One reason
	loophole. But we chose not to for two reasons: i) we wanted to avoid	two ECT codepoints were defined was to enable the data source to
	any changes to IPsec tunnelling behaviour; ii) allowing ECT values in	detect if a CE marking had been applied then subsequently removed.
	the outer header to override the inner header would have increased	The source could detect this by weaving a pseudo-random sequence of
	the bandwidth of the covert channel through the egress gateway from 1	ECT(0) and ECT(1) values into a stream of packets, which is termed an
	to 1.5 bit per datagram, potentially threatening to upset the	ECN nonce. By the decapsulation rules in RFC3168 and RFC4301, if the
	consensus established in the security area that says that the	inner and outer headers carry contradictory ECT values only the inner
	bandwidth of this covert channel can now be safely managed.	header is preserved for onward forwarding. So if a CE marking added
		to the outer ECN field has been illegally (or accidentally)
		suppressed by a subsequent node in the tunnel, the decapsulator will
		revert the ECN field to its value before tampering, hiding all
		evidence of the crime from the onward feedback loop. To close this
		loophole, we could have specified that an outer header value of ECT
		should overwrite a contradictory ECT value in the inner header (for
		how, see the ideal decapsulation rules proposed in Appendix C). But
		currently we choose to keep the 'broken' behaviour defined in RFC3168
		& RFC4301 for all the following reasons:

		1. We wanted to avoid any changes to IPsec tunnelling behaviour;

		2. Allowing ECT values in the outer header to override the inner
		header would have increased the bandwidth of the covert channel
		through the egress gateway from 1 to 1.5 bit per datagram,
		potentially threatening to upset the consensus established in the
		security area that says that the bandwidth of this covert channel
		can now be safely managed;

		3. This loophole is only applicable in the corner case where the
		attacker is a network node downstream of a congested node in the
		same tunnel;

		4. In tunnelling scenarios, the ECN nonce is already vulnerable to
		suppression by nodes downstream of a congested node in the same
		tunnel, if they can copy the ECT value in the inner header to the
		outer header (any node in the tunnel can do this if the inner
		header is not encrypted, and an IPsec tunnel egress can do it
		whether or not the tunnel is encrypted);

		5. Although the 'broken' decapsulation behaviour removes evidence of
		congestion suppression from the onward feedback loop, the
		decapsulator itself can at least detect that congestion within
		the tunnel has been suppressed;

		6. The ECN nonce [RFC3540] currently has experimental status and
		there has been no evidence that anyone has implemented it beyond
		the author's prototype.

		If a legacy security policy configures a legacy tunnel ingress to
		negotiate to turn off ECN processing, a compliant tunnel egress will
		agree to a request to turn off ECN processing but it will actually
		still copy CE markings from the outer to the forwarded header.
		Although the tunnel ingress 'I' in Figure 1 will set all ECN fields
		in outer headers to Not-ECT, 'M' could still toggle CE on and off to
		communicate covertly with 'B', because we have specified that 'E'
		only has one mode regardless of what mode it says it has negotiated.
		We could have specified that 'E' should have a limited functionality
		mode and check for such behaviour. But we decided not to add the
		extra complexity of two modes on a compliant tunnel egress merely to
		cater for a legacy security concern that is now considered
		manageable.

	10. Conclusions	10. Conclusions


	This document updates the tunnelling treatment of RFC3168 ECN for all	This document updates the ingress tunnelling encapsulation of RFC3168
	IP in IP tunnels to bring it into line with the new behaviour in the	ECN for all IP in IP tunnels to bring it into line with the new
	IPsec architecture of RFC4301.	behaviour in the IPsec architecture of RFC4301.


	At the tunnel egress, header decapsulation for the default ECN	At a tunnel egress, header decapsulation for the default ECN marking
	marking behaviour is broadly unchanged except that one exceptional	behaviour is broadly unchanged except that one exceptional case has
	case has been catered for. At the ingress, for all forms of IP in IP	been catered for. At the ingress, for all forms of IP in IP tunnel,
	tunnel, encapsulation has been brought into line with the new IPsec	encapsulation has been brought into line with the new IPsec rules in
	rules in RFC4301 which copy rather than reset CE markings when	RFC4301 which copy rather than reset CE markings when creating outer
	creating outer headers. Previously, upstream congestion information	headers.
	was not revealed in the outer header, which limited the scope of some
	management monitoring techniques and prevented certain active queue	This change to encapsulation has been motivated by analysis from the
	management algorithms from taking account of upstream congestion	three perspectives of security, control and management. They are
	markings. The change ensures all IP in IP tunnels reflect the more	somewhat in tension as to whether a tunnel ingress should copy
	relaxed attitude to revealing congestion information in the new IPsec	congestion markings into the outer header it creates or reset them.
	architecture, which now deems that the threat from 2-bit covert	From the control perspective either copying or resetting works for
	channels can be managed without disabling ECN.	existing arrangements, but copying has more potential for simplifying
		control and resetting breaks at least one proposal already on the
		standards track. From the management and monitoring perspective
		copying is preferable. From the network security perspective (theft
		of service etc) copying is preferable. From the information security
		perspective resetting is preferable, but the IETF Security Area now
		considers copying acceptable given the bandwidth of a 2-bit covert
		channel can be managed. Therefore there are no points against
		copying and a number against resetting CE on ingress.

		The change ensures ECN processing in all IP in IP tunnels reflects
		this slightly more permissive attitude to revealing congestion
		information in the new IPsec architecture. Once all tunnelling of
		ECN works the same, ECN markings will have a defined meaning when
		measured at any point in a network. This new certainty will enable
		new uses of the ECN field that would otherwise be confounded by
		ambiguity.

	Also, this document defines more generic principles to guide the	Also, this document defines more generic principles to guide the
	design of alternate forms of tunnel processing of congestion	design of alternate forms of tunnel processing of congestion

	notification, if required for specific Diffserv PHBs (such as will be	notification, if required for specific Diffserv PHBs or for other
	required for the PCN working group) or for other lower layer	lower layer encapsulating protocols that might support congestion
	encapsulating protocols that might support congestion notification in	notification in the future.
	the future (e.g. MPLS).

	11. Acknowledgements	11. Acknowledgements


	Thanks to David Black, Bruce Davie, Toby Moncaster and Gabriele	Thanks to David Black for explaining a better way to think about
	Corliano for their careful review comments.	function placement and to Louise Burness for a better way to think
		about multilayer transports and networks, having read
		[Patterns_Arch]. Also thanks to Arnaud Jacquet for ideas behind the
		algorithms in Appendix B. Thanks to Bruce Davie, Toby Moncaster,
		Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for
		their thoughts and careful review comments.

	12. Comments Solicited	12. Comments Solicited

	Comments and questions are encouraged and very welcome. They can be	Comments and questions are encouraged and very welcome. They can be
	addressed to the IETF Transport Area working group mailing list	addressed to the IETF Transport Area working group mailing list
	<tsvwg@ietf.org>, and/or to the authors.	<tsvwg@ietf.org>, and/or to the authors.

	13. References	13. References

	13.1. Normative References	13.1. Normative References

	skipping to change at page 16, line 14	skipping to change at page 23, line 14

	[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition	[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
	of Explicit Congestion Notification (ECN) to IP",	of Explicit Congestion Notification (ECN) to IP",
	RFC 3168, September 2001.	RFC 3168, September 2001.

	[RFC4301] Kent, S. and K. Seo, "Security Architecture for the	[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
	Internet Protocol", RFC 4301, December 2005.	Internet Protocol", RFC 4301, December 2005.

	13.2. Informative References	13.2. Informative References


	[802.1au] "IEEE Standard for Local and Metropolitan Area Networks--	[I-D.eardley-pcn-marking-behaviour]
	Virtual Bridged Local Area Networks - Amendment 10:	Eardley, P., "Marking behaviour of PCN-nodes",
	Congestion Notification", 2006,	draft-eardley-pcn-marking-behaviour-01 (work in progress),
	<http://www.ieee802.org/1/pages/802.1au.html>.	June 2008.


	(Work in Progress; Access Controlled link within page)	[I-D.ietf-pcn-architecture]
		Eardley, P., "Pre-Congestion Notification Architecture",
		draft-ietf-pcn-architecture-03 (work in progress),
		February 2008.


	[BBnet] Sexton, M. and A. Reid, "Broadband Networking: {ATM},	[I-D.ietf-pwe3-congestion-frmwk]
	{SDH} and {SONET}", Artech House telecommunications	Bryant, S., Davie, B., Martini, L., and E. Rosen,
	library ISBN: 0-89006-578-0, 1997.	"Pseudowire Congestion Control Framework",
		draft-ietf-pwe3-congestion-frmwk-01 (work in progress),
		May 2008.


	[I-D.ietf-tsvwg-ecn-mpls]	[I-D.moncaster-pcn-3-state-encoding]
	Davie, B., "Explicit Congestion Marking in MPLS",	Moncaster, T., Briscoe, B., and M. Menth, "A three state
	draft-ietf-tsvwg-ecn-mpls-00 (work in progress),	extended PCN encoding scheme",
	March 2007.	draft-moncaster-pcn-3-state-encoding-00 (work in
		progress), June 2008.


	[I-D.rosen-pwe3-congestion]	[IEEE802.1au]
	Rosen, E., "Pseudowire Congestion Control Framework",	IEEE, "IEEE Standard for Local and Metropolitan Area
	draft-rosen-pwe3-congestion-04 (work in progress),	Networks--Virtual Bridged Local Area Networks - Amendment
	October 2006.	10: Congestion Notification", 2008,
		<http://www.ieee802.org/1/pages/802.1au.html>.


	[PCN-arch]	(Work in Progress; Access Controlled link within page)
	Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R.,
	Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion	[ITU-T.I.371]
	Notification Architecture",	ITU-T, "Traffic Control and Congestion Control in
	draft-eardley-pcn-architecture-00 (work in progress),	{B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004.
	June 2007.

	[PCNcharter]	[PCNcharter]
	IETF, "Congestion and Pre-Congestion Notification (pcn)",	IETF, "Congestion and Pre-Congestion Notification (pcn)",
	IETF w-g charter , Feb 2007,	IETF w-g charter , Feb 2007,
	<http://www.ietf.org/html.charters/pcn-charter.html>.	<http://www.ietf.org/html.charters/pcn-charter.html>.


		[Patterns_Arch]
		Day, J., "Patterns in Network Architecture: A Return to
		Fundamentals", Pub: Prentice Hall ISBN-13: 9780132252423,
		Jan 2008.

	[RFC1254] Mankin, A. and K. Ramakrishnan, "Gateway Congestion	[RFC1254] Mankin, A. and K. Ramakrishnan, "Gateway Congestion
	Control Survey", RFC 1254, August 1991.	Control Survey", RFC 1254, August 1991.

	[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic	[RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic
	Routing Encapsulation (GRE)", RFC 1701, October 1994.	Routing Encapsulation (GRE)", RFC 1701, October 1994.

	[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.	[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
	Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1	Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
	Functional Specification", RFC 2205, September 1997.	Functional Specification", RFC 2205, September 1997.

	[RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,	[RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little,
	W., and G. Zorn, "Point-to-Point Tunneling Protocol",	W., and G. Zorn, "Point-to-Point Tunneling Protocol",
	RFC 2637, July 1999.	RFC 2637, July 1999.

	[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,	[RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn,
	G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",	G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"",
	RFC 2661, August 1999.	RFC 2661, August 1999.


		[RFC2983] Black, D., "Differentiated Services and Tunnels",
		RFC 2983, October 2000.

	[RFC3426] Floyd, S., "General Architectural and Policy	[RFC3426] Floyd, S., "General Architectural and Policy
	Considerations", RFC 3426, November 2002.	Considerations", RFC 3426, November 2002.

	[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit	[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
	Congestion Notification (ECN) Signaling with Nonces",	Congestion Notification (ECN) Signaling with Nonces",
	RFC 3540, June 2003.	RFC 3540, June 2003.

	[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",	[RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
	RFC 4306, December 2005.	RFC 4306, December 2005.

	[RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol	[RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol
	(HIP) Architecture", RFC 4423, May 2006.	(HIP) Architecture", RFC 4423, May 2006.


		[RFC4774] Floyd, S., "Specifying Alternate Semantics for the
		Explicit Congestion Notification (ECN) Field", BCP 124,
		RFC 4774, November 2006.

		[RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
		Marking in MPLS", RFC 5129, January 2008.

	[Shayman] "Using ECN to Signal Congestion Within an MPLS Domain",	[Shayman] "Using ECN to Signal Congestion Within an MPLS Domain",
	2000, <http://www.ee.umd.edu/~shayman/papers.d/	2000, <http://www.ee.umd.edu/~shayman/papers.d/
	draft-shayman-mpls-ecn-00.txt>.	draft-shayman-mpls-ecn-00.txt>.

	(Expired)	(Expired)


	Appendix A. In-path Load Regulation	Appendix A. Why resetting CE on encapsulation harms PCN

		Regarding encapsulation, the section of the PCN architecture
		[I-D.ietf-pcn-architecture] on tunnelling says that header copying
		(RFC4301) allows PCN to work correctly. However, resetting CE
		markings confuses PCN marking.

		The specific issue here concerns PCN excess rate marking
		[I-D.eardley-pcn-marking-behaviour], i.e. the bulk marking of traffic
		that exceeds a configured threshold rate. One of the goals of excess
		rate marking is to enable the speedy removal of excess admission
		controlled traffic following re-routes caused by link failures or
		other disasters. This maintains a share of the capacity for
		competing admission controlled traffic and for traffic in lower
		priority classes. After failures, traffic re-routed onto remaining
		links can often stress multiple links along a path. Therefore,
		traffic can arrive at a link under stress with some proportion
		already marked for removal by a previous link. By design, marked
		traffic will be removed by the overall system in subsequent round
		trips. So when the excess rate marking algorithm decides how much
		traffic to mark for removal, it doesn't include traffic already
		marked for removal by another node upstream (the `Excess traffic
		meter function' of [I-D.eardley-pcn-marking-behaviour]).

		However, if an RFC3168 tunnel ingress intervenes, it resets the ECN
		field in all the outer headers, hiding all the evidence of problems
		upstream. Thus, although excess rate marking works fine with RFC4301
		IPsec tunnels, with RFC3168 tunnels it typically removes large
		volumes of traffic that it didn't need to remove at all.

		Appendix B. Contribution to Congestion across a Tunnel

		This specification mandates that a tunnel ingress determines the ECN
		field of each new outer tunnel header by copying the arriving header.
		If instead the outer ECN field were reset at a tunnel ingress (as it
		was for the full functionality mode of RFC3168), it would be possible
		for the tunnel egress to measure:

		o congestion marking before the tunnel ingress (fraction of inner
		header markings, p_i);
		o congestion marking across the tunnel (fraction of outer header
		markings, p_t);

		o congestion marking after the tunnel egress (fraction of departing
		header markings, p_o).

		Although the newly mandated copying behaviour at ingress gains the
		advantages described in the body of this specification, this one
		advantage of the resetting behaviour of RFC3168 seems to have been
		lost: on first impressions, it seems that the egress can no longer
		accurately measure congestion contributed along the tunnel (p_t).
		The egress could _estimate_ the contribution along the tunnel by
		measuring which packets carry only a mark in the outer header (not the
		inner). But this is not precisely the same as the congestion
		contributed along the tunnel; tunnel nodes may have tried to mark
		some packets that already had a marking in both the inner and outer
		header. Measuring only additional outer markings will miss these.
		Nonetheless, with the newly proposed scheme, a tunnel egress can
		derive a precise estimate of marking introduced across a tunnel (p_t)
		as follows.

		The combined fraction of markings at the tunnel egress will be p_o =
		1 - (1 - p_i)(1 - p_t). Explanation: this is (1 - the probability a
		departing packet is not marked), which is (1 - (prob not marked
		before tunnel)(prob not marked along tunnel)). Therefore,
		rearranging, the egress can infer the fraction of marks introduced
		across the tunnel as p_t = (p_o - p_i)/(1 - p_i). If arriving
		congestion is low (p_i << 1), then the approximation p_t ~ (p_o - p_i)
		should be good enough. This is the estimate we advised originally;
		i.e. measuring only the extra markings in the outer header that are
		not present in the inner header. If a better approximation is
		needed, p_t ~ (p_o - p_i)(1 + p_i) can be used, which removes the
		division but still assumes p_i << 1.
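The three estimators above can be compared numerically. The
following is a minimal sketch; the function names and the sample
marking fractions are assumptions chosen for illustration, not
values taken from any measurement.

```python
def combined_marking(p_i, p_t):
    """Fraction marked on departure: p_o = 1 - (1 - p_i)(1 - p_t)."""
    return 1 - (1 - p_i) * (1 - p_t)


def tunnel_marking_exact(p_o, p_i):
    """Precise inversion: p_t = (p_o - p_i) / (1 - p_i)."""
    if not 0 <= p_i < 1:
        raise ValueError("p_i must satisfy 0 <= p_i < 1")
    return (p_o - p_i) / (1 - p_i)


def tunnel_marking_first_order(p_o, p_i):
    """Approximation for p_i << 1: p_t ~ p_o - p_i."""
    return p_o - p_i


def tunnel_marking_second_order(p_o, p_i):
    """Division-free refinement for p_i << 1: p_t ~ (p_o - p_i)(1 + p_i)."""
    return (p_o - p_i) * (1 + p_i)


# Illustrative values: 1% marked before the tunnel, 2% marked within it.
p_i, p_t = 0.01, 0.02
p_o = combined_marking(p_i, p_t)

for estimate in (tunnel_marking_exact,
                 tunnel_marking_first_order,
                 tunnel_marking_second_order):
    print(estimate.__name__, estimate(p_o, p_i))
```

With these sample values the exact formula recovers p_t, and the
second-order form lies closer to it than the first-order form, as
expected while p_i remains small.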

		Using any of these formulae (including the precise one), it would be
		possible for a tunnel egress to calculate a moving average of the
		fraction of packets being marked by tunnel nodes, including those
		already marked in the inner header. Alternatively, it should even be
		possible for a tunnel egress to reverse engineer which packets would
		have been marked across the tunnel if CE was reset on ingress even if
		CE was actually copied on ingress. [[anchor3: Note from Bob: I've
		worked out an algorithm so the tunnel egress can reverse engineer
		marking as if CE was reset at the ingress even though CE was copied
		at the ingress. It typically consumes 2 cycles / pkt, occasionally 4
		and very occasionally 8. {ToDo: On testing an implementation just now
		it still has a wrinkle in it, but with a little more development I
		believe it would work well. I'll write it into the next revision if
		I get it working.}]]
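One way an egress might maintain such a moving average is sketched below. This is only an illustration under stated assumptions: it is not the reverse-engineering algorithm mentioned in the note above, the class name and the exponentially weighted moving average (with its weight parameter) are hypothetical choices, and the egress is assumed to see one boolean per header indicating whether CE is set.

```python
class TunnelMarkingEstimator:
    """Tracks moving averages of p_i and p_o, and derives p_t from them."""

    def __init__(self, weight: float = 0.01) -> None:
        self.weight = weight  # EWMA weight (assumed value, tune as needed)
        self.p_i = 0.0        # average fraction marked CE in the inner header
        self.p_o = 0.0        # average fraction marked CE on leaving the tunnel

    def observe(self, inner_ce: bool, outer_ce: bool) -> None:
        """Update both averages for one decapsulated packet."""
        # A packet leaves marked if either header carries CE.
        marked_out = inner_ce or outer_ce
        self.p_i += self.weight * (float(inner_ce) - self.p_i)
        self.p_o += self.weight * (float(marked_out) - self.p_o)

    def p_t(self) -> float:
        """Precise estimate p_t = (p_o - p_i) / (1 - p_i)."""
        if self.p_i >= 1.0:
            return 0.0  # nothing can be inferred if every packet arrived marked
        return max(0.0, (self.p_o - self.p_i) / (1.0 - self.p_i))
```

Calling observe() once per packet and reading p_t() periodically gives the fraction of marking attributable to tunnel nodes, including marks that landed on packets already marked in the inner header.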

		Appendix C. Ideal Decapsulation Rules

		Compliance with this appendix is NOT REQUIRED for compliance with the
		present specification.

		If the default ECN encapsulation behaviour does not offer suitable
		trade-offs, procedures exist for associating a new behaviour with a
		new Diffserv PHB. However, it is unrealistic to expect vendors of
		all IPsec and all IP in IP tunnel endpoints to cater for the
		exceptional behaviour of PHB XXX. If all tunnels did require XXX-
		specific behaviour, the resulting patchy and error-prone deployment
		would probably cause XXX to suffer byzantine feature interactions
		with poorly implemented tunnels. The default rules for tunnel
		endpoints to handle both the Diffserv field and the ECN field should
		'just work' when handling packets with an XXX Diffserv codepoint.

		Given this specification requests a standards action to update the
		RFC3168 encapsulation behaviour, this appendix explores a further
		change to decapsulation that we ought to specify at the same time.
		If instead this further change is added later, it will add another
		set of backward compatibility combinations to the already complicated
		change history of ECN tunnelling.

		Multi-level congestion notification is currently on the IETF's
		standards track agenda in the Congestion and Pre-Congestion
		Notification (PCN) working group. The PCN working group requires
		three congestion states (not marked and two levels of congestion
		marking) [I-D.ietf-pcn-architecture]. The aim is for the first level
		of marking to stop admitting new traffic and the second level to
		terminate sufficient existing flows to bring a network back to its
		operating point after a serious failure.

		Although the ECN field gives sufficient codepoints for these three
		states, the PCN working group cannot use them in case any tunnel
		decapsulations occur within a PCN region. If a node in a tunnel sets
		the ECN field to ECT(0) or ECT(1), this change will be discarded by a
		tunnel egress compliant with RFC4301 and RFC3168. This can be seen
		in Table 1, where the ECT values in the outer header are ignored
		unless the inner header is the same. Effectively the ECT(0) and
		ECT(1) codepoints have to be treated as just one codepoint when they
		could otherwise have been used for their intended purpose of
		congestion notification. Instead, the PCN working group has had to propose
		using extra Diffserv codepoint(s) to encode the extra states
		[I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting
		DSCP space while leaving ECN codepoints unused.

		Although this is currently most pressing for the PCN working group,
		the issue is more general. Under Security Considerations (Section 9)
		it has already been explained that a data sender cannot use the
		experimental ECN nonce [RFC3540] to detect suppression of congestion
		notification along a tunnel.

		More generally, the currently standardised tunnel decapsulation
		behaviour unnecessarily wastes a quarter of two bits (i.e. half a
		bit) in the IP (v4 & v6) header. As explained in Section 3.1, the
		original reason for not copying down outer ECT codepoints for onward
		forwarding was to limit the covert channel across a decapsulator to 1
		bit per packet. However, now that the IETF Security Area has deemed
		that a 2-bit covert channel through an encapsulator is a manageable
		risk, the same should be true for a decapsulator.

		Table 2 proposes a more ideal layered decapsulation behaviour. Note:
		this table is only to support discussion. It is not currently
		proposed for standards action. The only difference from Table 1
		(which is proposed for standards action) is the swapping of the cells
		highlighted as ECT(X).

		+---------------------+---------------------------------------------+
		| Incoming Inner      |            Incoming Outer Header            |
		| Header              +---------+-----------+-----------+-----------+
		|                     | Not-ECT | ECT(0)    | ECT(1)    | CE        |
		+---------------------+---------+-----------+-----------+-----------+
		| Not-ECT             | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) |
		| ECT(0)              | ECT(0)  | ECT(0)    | ECT(1)    | CE        |
		| ECT(1)              | ECT(1)  | ECT(0)    | ECT(1)    | CE        |
		| CE                  | CE      | CE        | CE (!!!)  | CE        |
		+---------------------+---------+-----------+-----------+-----------+
		|                     |               Outgoing Header               |
		+---------------------+---------------------------------------------+

		Table 2: Ideal IP in IP Decapsulation (currently NOT REQUIRED)

		Note that, if this ideal proposal were taken up, extra backwards
		compatibility issues would have to be resolved.
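The mapping in Table 2 can be written as a lookup, which may help when comparing it cell by cell with Table 1. This is a sketch for discussion only, like the table itself. The names ECN, IDEAL_DECAP and decapsulate_ideal are hypothetical, None stands for "drop", and the (!!!) annotations in the table are not modelled. The two-bit values are the ECN codepoints defined in RFC3168.

```python
from enum import Enum
from typing import Optional


class ECN(Enum):
    NOT_ECT = 0b00
    ECT1 = 0b01
    ECT0 = 0b10
    CE = 0b11


N, E0, E1, CE = ECN.NOT_ECT, ECN.ECT0, ECN.ECT1, ECN.CE

# Rows: incoming inner header. Columns: incoming outer header.
# Values: outgoing header, or None where Table 2 says drop.
IDEAL_DECAP = {
    N:  {N: N,  E0: None, E1: None, CE: None},
    E0: {N: E0, E0: E0,   E1: E1,   CE: CE},
    E1: {N: E1, E0: E0,   E1: E1,   CE: CE},
    CE: {N: CE, E0: CE,   E1: CE,   CE: CE},
}


def decapsulate_ideal(inner: ECN, outer: ECN) -> Optional[ECN]:
    """Return the outgoing ECN field per Table 2, or None to drop the packet."""
    return IDEAL_DECAP[inner][outer]
```

In this form the change relative to Table 1 is visible in the two ECT rows: whenever the outer header is ECT(0) or ECT(1), its value is propagated rather than ignored.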

		Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation

		We have said that at any point in a network, the Congestion Baseline
		(where congestion notification starts from zero) should be the
		previous upstream Load Regulator. We have also said that the ingress
		of an IP in IP tunnel must copy congestion indications to the
		encapsulating outer headers it creates. If the Load Regulator is in-
		path rather than at the source, and also a tunnel ingress, these two
		requirements seem to be contradictory. A tunnel ingress must not
		reset incoming congestion, but a Load Regulator must be the
		Congestion Baseline, implying it needs to reset incoming congestion.

		In fact, the two requirements are not contradictory, because a Load
		Regulator and a tunnel ingress are functions within a node that occur
		in sequence on a stream of packets, not at the same point. Figure 3
		is borrowed from [RFC2983] (which was making a similar point about
		the location of Diffserv traffic conditioning relative to the
		encapsulation function of a tunnel). An in-path Load Regulator can
		act on packets either at [1 - Before] encapsulation or at [2 - Outer]
		after encapsulation. Load Regulation does not ever need to be
		integrated with the [Encapsulate] function (but it can be for
		efficiency). Therefore we can still maintain that the [Encapsulate]
		function always copies CE into the outer header.

		>>-----[1 - Before]--------[Encapsulate]----[3 - Inner]------------>>
		\
		\
		+--------[2 - Outer]--------->>

		Figure 3: Placement of In-Path Load Regulator Relative to Tunnel
		Ingress

		Then separately, if there is a Load Regulator at location [2 -
		Outer], it might reset CE to ECT(0), say. Then the Congestion
		Baseline for the lower layer (outer) will be [2 - Outer], while the
		Congestion Baseline of the inner layer will be unchanged. But how
		encapsulation works has nothing to do with whether a Load Regulator
		is present or where it is.

		If on the other hand a Load Regulator resets CE at [1 - Before], the
		Congestion Baseline of both the inner and outer headers will be [1 -
		Before]. But again, encapsulation is independent of load regulation.
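The independence of the two functions can be illustrated by modelling them as separate stages applied in sequence. This is a minimal sketch with hypothetical names (Packet, regulate_before, encapsulate, regulate_outer). It assumes the encapsulator copies the whole ECN field into the outer header and that a Load Regulator resets CE to ECT(0), as in the example above.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    inner_ecn: str                   # e.g. "Not-ECT", "ECT(0)", "ECT(1)", "CE"
    outer_ecn: Optional[str] = None  # None until the packet is encapsulated


def _reset_ce(ecn: Optional[str]) -> Optional[str]:
    """Make this point the Congestion Baseline by clearing CE."""
    return "ECT(0)" if ecn == "CE" else ecn


def regulate_before(pkt: Packet) -> Packet:
    """Load Regulator at [1 - Before]: acts on the header about to be encapsulated."""
    return replace(pkt, inner_ecn=_reset_ce(pkt.inner_ecn))


def encapsulate(pkt: Packet) -> Packet:
    """[Encapsulate]: always copies, unaware of any Load Regulator."""
    return replace(pkt, outer_ecn=pkt.inner_ecn)


def regulate_outer(pkt: Packet) -> Packet:
    """Load Regulator at [2 - Outer]: acts only on the outer header."""
    return replace(pkt, outer_ecn=_reset_ce(pkt.outer_ecn))


arriving = Packet(inner_ecn="CE")
plain_tunnel = encapsulate(arriving)                     # inner CE, outer CE
outer_baseline = regulate_outer(encapsulate(arriving))   # inner CE, outer ECT(0)
both_baseline = encapsulate(regulate_before(arriving))   # inner ECT(0), outer ECT(0)
```

The encapsulate function is identical in all three cases. Only the presence and position of the regulator stage changes which headers have their Congestion Baseline moved.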

		D.1. Dependence of In-Path Load Regulation on Tunnelling

		Although encapsulation doesn't need to depend on in-path load
		regulation, the reverse is not true. The placement of an in-path
		Load Regulator must be carefully considered relative to
		encapsulation. Some examples are given in the following for
		guidance.

	In the traditional Internet architecture one tends to think of the	In the traditional Internet architecture one tends to think of the
	source host as the Load Regulator for a path. It is generally not	source host as the Load Regulator for a path. It is generally not
	desirable or practical for a node part way along the path to regulate	desirable or practical for a node part way along the path to regulate
	the load. However, various reasonable proposals for in-path load	the load. However, various reasonable proposals for in-path load
	regulation have been made from time to time (e.g. fair queuing,	regulation have been made from time to time (e.g. fair queuing,

	traffic engineering). Also the IETF has recently chartered a working	traffic engineering, flow admission control). The IETF has recently
	group to standardise admission control across a part of a path using	chartered a working group to standardise admission control across a
	pre-congestion notification (PCN) [PCNcharter], which involves in-	part of a path using pre-congestion notification (PCN) [PCNcharter].
	path load regulation. This is of particular relevance here because	This is of particular relevance here because it involves congestion
	it involves congestion notification with an in-path Load Regulator	notification with an in-path Load Regulator, it can involve
	and it can involve tunnelling.	tunnelling and it certainly involves encapsulation more generally.


	We will use the more complex scenario in Figure 3 to tease out all	We will use the more complex scenario in Figure 4 to tease out all
	the issues that arise when combining congestion notification and	the issues that arise when combining congestion notification and
	tunnelling with various possible in-path load regulation schemes. In	tunnelling with various possible in-path load regulation schemes. In
	this case 'I1' and 'E2' break up the path into three separate	this case 'I1' and 'E2' break up the path into three separate
	congestion control loops. The feedback for these loops is shown	congestion control loops. The feedback for these loops is shown
	going right to left across the top of the figure. The 'V's are arrow	going right to left across the top of the figure. The 'V's are arrow
	heads representing the direction of feedback, not letters. But there	heads representing the direction of feedback, not letters. But there
	are also two tunnels within the middle control loop: 'I1' to 'E1' and	are also two tunnels within the middle control loop: 'I1' to 'E1' and
	'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS	'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS
	core networks. M is a congestion monitoring point, perhaps between	core networks. M is a congestion monitoring point, perhaps between
	two border routers where the same tunnel continues unbroken across	two border routers where the same tunnel continues unbroken across
	the border.	the border.
	______ _______________________________________ _____	______ _______________________________________ _____
	/ \ / \ / \	/ \ / \ / \
	V \ V M \ V \	V \ V M \ V \
	A--->R--->I1===========>E1----->I2=========>==========>E2------->B	A--->R--->I1===========>E1----->I2=========>==========>E2------->B


	Figure 3: complex Tunnel Scenario	Figure 4: complex Tunnel Scenario

	The question is, should the congestion markings in the outer exposed	The question is, should the congestion markings in the outer exposed
	headers of a tunnel represent congestion only since the tunnel	headers of a tunnel represent congestion only since the tunnel
	ingress or over the whole upstream path from the source of the inner	ingress or over the whole upstream path from the source of the inner
	header (whatever that may mean)? Or put another way, should 'I1' and	header (whatever that may mean)? Or put another way, should 'I1' and
	'I2' copy or reset CE markings?	'I2' copy or reset CE markings?


	The answer is that the baseline of congestion marking should be the	Based on the design principles in Section 4, the answer is that the
	nearest upstream interface designed to regulate traffic load--the	Congestion Baseline should be the nearest upstream interface designed
	Load Regulator. In Figure 3 'A', 'I1' or 'E2' are all Load	to regulate traffic load--the Load Regulator. In Figure 4 'A', 'I1'
	Regulators. We have shown the feedback loops returning to each of	or 'E2' are all Load Regulators. We have shown the feedback loops
	these nodes so that they can regulate the load causing the congestion	returning to each of these nodes so that they can regulate the load
	notification. So the baseline for congestion markings exposed to M	causing the congestion notification. So the Congestion Baseline
	should be 'I1' (the Load Regulator), not 'I2'. That is, 'I2' SHOULD	exposed to M should be 'I1' (the Load Regulator), not 'I2'.
	copy any CE marking into the outer header it creates, while 'I1' is	Therefore I1 should reset any arriving CE markings. In this case,
	an exception because it is an in-path load regulator, so it should	'I1' knows the tunnel to 'E1' is unrelated to its load regulation
	reset the ECN field in the outer header it creates.	function. So the load regulation function within 'I1' should be
		placed at [1 - Before] tunnel encapsulation within 'I1' (using the
		terminology of Figure 3). Then the Congestion Baseline all across
		the networks from 'I1' to 'E2' in both inner and outer headers will
		be 'I1'.

	The following further examples illustrate how this answer might be	The following further examples illustrate how this answer might be
	applied:	applied:


	o Preemption marking is currently defined for PCN [PCN-arch] so that	o We argued in Appendix A that resetting CE on encapsulation could
	the rate of unmarked packets at the end of a path of multiple	harm PCN excess rate marking, which marks excess traffic for
	bottlenecks determines the maximum sustainable aggregate bit rate	removal in subsequent round trips. This marking relies on not
	over that path. To produce the correct marking by the end, each	marking packets if another node upstream has already marked them
	congested node must only consider packets to be eligible for	for removal. If there were a tunnel ingress between the two which
	marking if they have not already been marked by any previous	reset CE markings, it would confuse the downstream node into
	bottleneck along a path that may span multiple tunnels (including	marking far too much traffic for removal. So why do we say that
	MPLS encapsulations etc.). This scheme only results in the	'I1' should reset CE, while a tunnel ingress shouldn't? The
	correct marking rate if the markings accumulated so far along the	answer is that it is the Load Regulator function at 'I1' that is
	path are copied into the outer exposed header of each tunnel or	resetting CE, not the tunnel encapsulator. The Load Regulator
	encapsulation. Consider that 'I1' and 'E2' in the complex	needs to set itself as the Congestion Baseline, so the feedback it
	scenario of Figure 3 are edge gateways of a PCN region. Admission	gets will only be about congestion on links it can relieve itself
	control based on PCN measurements is a form of load regulation, so	by regulating the load into them. When it resets CE markings, it
	'I1' regulates the load on the PCN region. Therefore 'I1' should	knows that something else upstream will have dealt with the
	be the baseline of congestion marking for _both_ tunnels within	congestion notifications it removes, given it is part of an end-
	the scope of its feedback loop. Therefore 'I2' should follow the	to-end admission control signalling loop. It therefore knows that
	normal rules and copy congestion marking into the outer tunnel	previous hops will be covered by other Load Regulators.
	header, while 'I1' is an exception because it is also a load	Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should
	regulator, so it should reset CE markings in the outer header.	follow the new rule for any tunnel ingress and copy congestion
		marking into the outer tunnel header. The ingress at 'I1' will
		happen to copy headers that have already been reset just
		beforehand. But it doesn't need to know that.

	o [Shayman] suggested feedback of ECN accumulated across an MPLS	o [Shayman] suggested feedback of ECN accumulated across an MPLS
	domain could cause the ingress to trigger re-routing to mitigate	domain could cause the ingress to trigger re-routing to mitigate
	congestion. This case is more like the simple scenario of	congestion. This case is more like the simple scenario of
	Figure 2, with a feedback loop across the MPLS domain ('E' back to	Figure 2, with a feedback loop across the MPLS domain ('E' back to

	'I'). The baseline for congestion exposed in outer headers in	'I'). I is a Load Regulator because re-routing around congestion
	this case will be the tunnel ingress, which should therefore reset	is a load regulation function. But in this case 'I' should only
	the ECN field in the outer headers it creates. But the reason it	reset itself as the Congestion Baseline in outer headers, as it is
	should act as the baseline is because it is an in-path load	not handling congestion outside its domain, so it must preserve
	regulator (re-routing around congestion is a load regulation	the end-to-end congestion feedback loop for something else to
	function), not just because it is a tunnel ingress.	handle (probably the data source). Therefore the Load Regulator
		within 'I' should be placed at [2 - Outer] to reset CE markings
		just after the tunnel ingress has copied them from arriving
		headers. Again, the tunnel encapsulation function at 'I' simply
		copies incoming headers, unaware that the load regulator will
		subsequently reset its outer headers.

	o The PWE3 working group of the IETF is considering the problem of	o The PWE3 working group of the IETF is considering the problem of

	how and whether an aggregate private wire emulation should respond	how and whether an aggregate edge-to-edge pseudo-wire emulation
	to congestion [I-D.rosen-pwe3-congestion]. Although the study is	should respond to congestion [I-D.ietf-pwe3-congestion-frmwk].
	still at the requirements stage, some (controversial) solution	Although the study is still at the requirements stage, some
	proposals include in-path load regulation at the ingress to the	(controversial) solution proposals include in-path load regulation
	tunnel that could lead to tunnel arrangements with similar	at the ingress to the tunnel that could lead to tunnel
	complexity to that of Figure 3.	arrangements with similar complexity to that of Figure 4.

	These are not contrived scenarios--they could be a lot worse. For	These are not contrived scenarios--they could be a lot worse. For
	instance, a host may create a tunnel for IPsec which is placed inside	instance, a host may create a tunnel for IPsec which is placed inside
	a tunnel for Mobile IP over a remote part of its path. And around	a tunnel for Mobile IP over a remote part of its path. And around
	this all we may have MPLS labels being pushed and popped as packets	this all we may have MPLS labels being pushed and popped as packets
	pass across different core networks. Similarly, it is possible that	pass across different core networks. Similarly, it is possible that

	subnets could be built from link technology (e.g. ethernet switches)	subnets could be built from link technology (e.g. future Ethernet
	so that link headers being added and removed could involve congestion	switches) so that link headers being added and removed could involve
	notification in future link headers with all the same issues as with	congestion notification in future Ethernet link headers with all the
	IP in IP tunnels.	same issues as with IP in IP tunnels.


	The reason we introduced the concept of a Load Regulator was to allow	One reason we introduced the concept of a Load Regulator was to allow
	for in-path load regulation. In the traditional Internet	for in-path load regulation. In the traditional Internet
	architecture one tends to think of a host and a Load Regulator as	architecture one tends to think of a host and a Load Regulator as
	synonymous, but when considering tunnelling, even the definition of a	synonymous, but when considering tunnelling, even the definition of a
	host is too fuzzy, whereas a Load Regulator is a clearly defined	host is too fuzzy, whereas a Load Regulator is a clearly defined
	function. Similarly, the concept of innermost header is too fuzzy to	function. Similarly, the concept of innermost header is too fuzzy to
	be able to (wrongly) say that the source address of the innermost	be able to (wrongly) say that the source address of the innermost

	header should be the baseline. Which is the innermost header when	header should be the Congestion Baseline. Which is the innermost
	multiple encapsulations may be in use? Where do we stop? If we say	header when multiple encapsulations may be in use? Where do we stop?
	the original source in the above IPsec-Mobile IP case is the host,	If we say the original source in the above IPsec-Mobile IP case is
	how do we know it isn't tunnelling an encrypted packet stream on	the host, how do we know it isn't tunnelling an encrypted packet
	behalf of another host in a p2p network?	stream on behalf of another host in a p2p network?


	The reason there has been so much confusion over the question of	We have become used to thinking that only hosts regulate load. The
	whether a tunnel ingress should copy or reset CE markings is that we	end to end design principle advises that this is a good idea
	have become used to thinking that only hosts regulate load. The end	[RFC3426], but it also advises that it is solely a guiding principle
	to end design principle advises that this is a good idea [RFC3426],	intended to make the designer think very carefully before breaking
	but it also advises that it is only a guiding principle intended to	it. We do have proposals where load regulation functions sit within
	make the designer think very carefully before breaking it. We do	a network path for good, if sometimes controversial, reasons, e.g.
	have proposals where load regulation functions sit within a network	PCN edge admission control gateways [I-D.ietf-pcn-architecture] or
	path for good, if sometimes controversial, reasons, e.g. PCN edge	traffic engineering functions at domain borders to re-route around
	admission control gateways [PCN-arch] or traffic engineering	congestion [Shayman]. Whether or not we want in-path load
	functions at domain borders to re-route around congestion [Shayman].	regulation, we have to work round the fact that it will not go away.

	Author's Address	Author's Address

	Bob Briscoe	Bob Briscoe
	BT	BT
	B54/77, Adastral Park	B54/77, Adastral Park
	Martlesham Heath	Martlesham Heath
	Ipswich IP5 3RE	Ipswich IP5 3RE
	UK	UK

	Phone: +44 1473 645196	Phone: +44 1473 645196
	Email: bob.briscoe@bt.com	Email: bob.briscoe@bt.com
	URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/	URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/

	Full Copyright Statement	Full Copyright Statement


	Copyright (C) The IETF Trust (2007).	Copyright (C) The IETF Trust (2008).

	This document is subject to the rights, licenses and restrictions	This document is subject to the rights, licenses and restrictions
	contained in BCP 78, and except as set forth therein, the authors	contained in BCP 78, and except as set forth therein, the authors
	retain all their rights.	retain all their rights.

	This document and the information contained herein are provided on an	This document and the information contained herein are provided on an
	"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS	"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
	OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND	OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
	THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS	THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
	OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF	OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF

	skipping to change at page 21, line 45	skipping to change at page 34, line 45
	such proprietary rights by implementers or users of this	such proprietary rights by implementers or users of this
	specification can be obtained from the IETF on-line IPR repository at	specification can be obtained from the IETF on-line IPR repository at
	http://www.ietf.org/ipr.	http://www.ietf.org/ipr.

	The IETF invites any interested party to bring to its attention any	The IETF invites any interested party to bring to its attention any
	copyrights, patents or patent applications, or other proprietary	copyrights, patents or patent applications, or other proprietary
	rights that may cover technology that may be required to implement	rights that may cover technology that may be required to implement
	this standard. Please address the information to the IETF at	this standard. Please address the information to the IETF at
	ietf-ipr@ietf.org.	ietf-ipr@ietf.org.


	Acknowledgments	Acknowledgment


	Funding for the RFC Editor function is provided by the IETF	This document was produced using xml2rfc v1.33 (of
	Administrative Support Activity (IASA). This document was produced	http://xml.resource.org/) from a source in RFC-2629 XML format.
	using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
	RFC-2629 XML format.

End of changes. 90 change blocks.
	471 lines changed or deleted	1061 lines changed or added