Diff: draft-briscoe-tsvwg-ecn-tunnel-01.txt - draft-ietf-tsvwg-ecn-tunnel-00.txt

	draft-briscoe-tsvwg-ecn-tunnel-01.txt		draft-ietf-tsvwg-ecn-tunnel-00.txt

	Transport Area Working Group B. Briscoe		Transport Area Working Group B. Briscoe
	Internet-Draft BT		Internet-Draft BT

	Intended status: Standards Track July 14, 2008		Intended status: Standards Track Oct 16, 2008
	Expires: January 15, 2009		Expires: April 19, 2009

	Layered Encapsulation of Congestion Notification		Layered Encapsulation of Congestion Notification

	draft-briscoe-tsvwg-ecn-tunnel-01		draft-ietf-tsvwg-ecn-tunnel-00

	Status of this Memo		Status of this Memo

	By submitting this Internet-Draft, each author represents that any		By submitting this Internet-Draft, each author represents that any
	applicable patent or other IPR claims of which he or she is aware		applicable patent or other IPR claims of which he or she is aware
	have been or will be disclosed, and any of which he or she becomes		have been or will be disclosed, and any of which he or she becomes
	aware will be disclosed, in accordance with Section 6 of BCP 79.		aware will be disclosed, in accordance with Section 6 of BCP 79.

	Internet-Drafts are working documents of the Internet Engineering		Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF), its areas, and its working groups. Note that		Task Force (IETF), its areas, and its working groups. Note that

	skipping to change at page 1, line 34		skipping to change at page 1, line 34
	and may be updated, replaced, or obsoleted by other documents at any		and may be updated, replaced, or obsoleted by other documents at any
	time. It is inappropriate to use Internet-Drafts as reference		time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."		material or to cite them other than as "work in progress."

	The list of current Internet-Drafts can be accessed at		The list of current Internet-Drafts can be accessed at
	http://www.ietf.org/ietf/1id-abstracts.txt.		http://www.ietf.org/ietf/1id-abstracts.txt.

	The list of Internet-Draft Shadow Directories can be accessed at		The list of Internet-Draft Shadow Directories can be accessed at
	http://www.ietf.org/shadow.html.		http://www.ietf.org/shadow.html.


	This Internet-Draft will expire on January 15, 2009.		This Internet-Draft will expire on April 19, 2009.

	Abstract		Abstract

	This document redefines how the explicit congestion notification		This document redefines how the explicit congestion notification
	(ECN) field of the outer IP header of a tunnel should be constructed.		(ECN) field of the outer IP header of a tunnel should be constructed.
	It brings all IP in IP tunnels (v4 or v6) into line with the way		It brings all IP in IP tunnels (v4 or v6) into line with the way
	IPsec tunnels now construct the ECN field. It includes a thorough		IPsec tunnels now construct the ECN field. It includes a thorough
	analysis of the reasoning for this change and the implications. It		analysis of the reasoning for this change and the implications. It
	also gives guidelines on the encapsulation of IP congestion		also gives guidelines on the encapsulation of IP congestion
	notification by any outer header, whether encapsulated in an IP		notification by any outer header, whether encapsulated in an IP
	tunnel or in a lower layer header. Following these guidelines should		tunnel or in a lower layer header. Following these guidelines should
	help interworking, if the IETF or other standards bodies specify any		help interworking, if the IETF or other standards bodies specify any
	new encapsulation of congestion notification.		new encapsulation of congestion notification.

	Table of Contents		Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3		1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
	1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 4		1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 5
	1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 5		1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 6
	1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6		1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 7
	2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8		2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8
	3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8		3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8
	3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8		3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8
	3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10		3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10

	3.3. Management Constraints . . . . . . . . . . . . . . . . . . 11		3.3. Management Constraints . . . . . . . . . . . . . . . . . . 12
	4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12		4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12
	4.1. Design Guidelines for New Encapsulations of Congestion		4.1. Design Guidelines for New Encapsulations of Congestion

	Notification . . . . . . . . . . . . . . . . . . . . . . . 13		Notification . . . . . . . . . . . . . . . . . . . . . . . 14
	5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15		5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15
	6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16		6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16
	7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18		7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18
	8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19		8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
	9. Security Considerations . . . . . . . . . . . . . . . . . . . 19		9. Security Considerations . . . . . . . . . . . . . . . . . . . 19
	10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21		10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21
	11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22		11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22

	12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 22		12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 23
	13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22		13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23
	13.1. Normative References . . . . . . . . . . . . . . . . . . . 22		13.1. Normative References . . . . . . . . . . . . . . . . . . . 23
	13.2. Informative References . . . . . . . . . . . . . . . . . . 23		13.2. Informative References . . . . . . . . . . . . . . . . . . 23
	Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25		Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25

	Appendix B. Contribution to Congestion across a Tunnel . . . . . 25		Appendix B. Contribution to Congestion across a Tunnel . . . . . 26
	Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27		Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27
	Appendix D. Non-Dependence of Tunnelling on In-path Load		Appendix D. Non-Dependence of Tunnelling on In-path Load

	Regulation . . . . . . . . . . . . . . . . . . . . . 28		Regulation . . . . . . . . . . . . . . . . . . . . . 29
	D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 29		D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 30
	Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 32		Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 33
	Intellectual Property and Copyright Statements . . . . . . . . . . 34		Intellectual Property and Copyright Statements . . . . . . . . . . 34

	Changes from previous drafts (to be removed by the RFC Editor)		Changes from previous drafts (to be removed by the RFC Editor)


			From briscoe-01 to ietf-00 (current):

			* Re-wrote Appendix B giving much simpler technique to measure
			contribution to congestion across a tunnel.

			* Added discussion of backward compatibility of the ideal
			decapsulation scheme in Appendix C

			* Updated references. Minor corrections & clarifications
			throughout.

	From -00 to -01:		From -00 to -01:

	* Related everything conceptually to the uniform and pipe models		* Related everything conceptually to the uniform and pipe models
	of RFC2983 on Diffserv Tunnels, and completely removed the		of RFC2983 on Diffserv Tunnels, and completely removed the
	dependence of tunnelling behaviour on the presence of any in-		dependence of tunnelling behaviour on the presence of any in-
	path load regulation by using the [1 - Before] [2 - Outer]		path load regulation by using the [1 - Before] [2 - Outer]

	function placement concepts from RFC2983.		function placement concepts from RFC2983;


	* Added specifc cases where the existing standards limit new		* Added specific cases where the existing standards limit new
	proposals.		proposals, particularly Appendix A;

	* Added sub-structure to Introduction (Need for Rationalisation,		* Added sub-structure to Introduction (Need for Rationalisation,
	Roadmap), added new Introductory subsection on "Scope" and		Roadmap), added new Introductory subsection on "Scope" and

	improved clarity		improved clarity;

	* Added Design Guidelines for New Encapsulations of Congestion		* Added Design Guidelines for New Encapsulations of Congestion

	Notification		Notification (Section 4.1);

	* Considerably clarified the Backward Compatibility section		* Considerably clarified the Backward Compatibility section

			(Section 6);

	* Considerably extended the Security Considerations section		* Considerably extended the Security Considerations section

			(Section 9);


	* Summarised the primary rationale much better in the conclusions		* Summarised the primary rationale much better in the
			conclusions;


	* Added numerous extra acknowledgements		* Added numerous extra acknowledgements;

	* Added Appendix A. "Why resetting CE on encapsulation harms		* Added Appendix A. "Why resetting CE on encapsulation harms
	PCN", Appendix B. "Contribution to Congestion across a Tunnel"		PCN", Appendix B. "Contribution to Congestion across a Tunnel"

	and Appendix C. "Ideal Decapsulation Rules"		and Appendix C. "Ideal Decapsulation Rules";


	* Changed Appendix A "In-path Load Regulation" to "Non-Dependence		* Re-wrote Appendix D, explaining how tunnel encapsulation no
	of Tunnelling on In-path Load Regulation" and added sub-section		longer depends on in-path load-regulation (changed title from
	on "Dependence of In-Path Load Regulation on Tunnelling"		"In-path Load Regulation" to "Non-Dependence of Tunnelling on
			In-path Load Regulation"), but explained how an in-path load
			regulation function must be carefully placed with respect to
			tunnel encapsulation (in a new sub-section entitled "Dependence
			of In-Path Load Regulation on Tunnelling").

	1. Introduction		1. Introduction

	This document redefines how the explicit congestion notification		This document redefines how the explicit congestion notification
	(ECN) field [RFC3168] of the outer IP header of a tunnel should be		(ECN) field [RFC3168] of the outer IP header of a tunnel should be
	constructed. It brings all IP in IP tunnels (v4 or v6) into line		constructed. It brings all IP in IP tunnels (v4 or v6) into line
	with the way IPsec tunnels [RFC4301] now construct the ECN field,		with the way IPsec tunnels [RFC4301] now construct the ECN field,
	ensuring that the outer header reveals any congestion experienced so		ensuring that the outer header reveals any congestion experienced so
	far on the whole path, not just since the last tunnel ingress.		far on the whole path, not just since the last tunnel ingress.


	skipping to change at page 5, line 38		skipping to change at page 6, line 9
	makes it harder to design networks and new protocols that work		makes it harder to design networks and new protocols that work
	predictably.		predictably.

	Already complicated constraints have had to be added to a standards		Already complicated constraints have had to be added to a standards
	track congestion marking proposal. The section of the pre-congestion		track congestion marking proposal. The section of the pre-congestion
	notification (PCN) architecture [I-D.ietf-pcn-architecture] on		notification (PCN) architecture [I-D.ietf-pcn-architecture] on
	tunnelling says PCN works correctly in the presence of RFC4301 IPsec		tunnelling says PCN works correctly in the presence of RFC4301 IPsec
	encapsulation (and RFC5129 MPLS encapsulation). However it doesn't		encapsulation (and RFC5129 MPLS encapsulation). However it doesn't
	work with RFC3168 IP in IP encapsulation (Appendix A explains why).		work with RFC3168 IP in IP encapsulation (Appendix A explains why).


	Section 3 assesses further security, control and management functions		To ensure we do not cause any unintended side-effects, Section 3
	that cannot be achieved in each case (resetting vs copying CE		assesses whether copying or resetting CE would harm any security,
	markings). It finds that resetting CE makes life difficult in a		control or management functions. It finds that resetting CE makes
	number of directions, while copying CE harms nothing (other than		life difficult in a number of directions, while copying CE harms
	opening a low bit-rate covert channel vulnerability which the		nothing (other than opening a low bit-rate covert channel
	Security Area deems is manageable).		vulnerability which the IETF Security Area deems is manageable).

	1.2. Document Roadmap		1.2. Document Roadmap

	Most of the document gives a thorough analysis of the knock-on		Most of the document gives a thorough analysis of the knock-on
	effects of the apparently minor change to tunnel encapsulation. The		effects of the apparently minor change to tunnel encapsulation. The
	reader may jump to Section 5 if only interested in standards actions		reader may jump to Section 5 if only interested in standards actions
	impacting implementation. The whole document is organised as		impacting implementation. The whole document is organised as
	follows:		follows:

	o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to		o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to
	'switch in' different behaviours for marking the ECN field, just		'switch in' different behaviours for marking the ECN field, just
	as it switches in different per-hop behaviours (PHBs) for		as it switches in different per-hop behaviours (PHBs) for
	scheduling. Therefore we cannot only discuss the ECN protocol		scheduling. Therefore we cannot only discuss the ECN protocol

	that RFC3168 gives as a default. We need to also give guidance		that RFC3168 gives as a default. Instead, Section 3 lays out the
	for possible different marking schemes. Therefore in Section 3 we		design constraints when tunnelling congestion notification without
	lay out the design constraints when tunnelling congestion		assuming a particular congestion marking scheme.
	notification.

	o Then in Section 4 we resolve the tensions between these		o Then in Section 4 we resolve the tensions between these
	constraints to give general design principles and guidelines on		constraints to give general design principles and guidelines on
	how a tunnel should process congestion notification; principles		how a tunnel should process congestion notification; principles
	that could apply to any marking behaviour for any PHB, not just		that could apply to any marking behaviour for any PHB, not just
	the default in RFC3168. In particular, we examine the underlying		the default in RFC3168. In particular, we examine the underlying
	principles behind whether CE should be reset or copied into the		principles behind whether CE should be reset or copied into the
	outer header at the ingress to a tunnel--or indeed at the ingress		outer header at the ingress to a tunnel--or indeed at the ingress
	of any layered encapsulation of headers with congestion		of any layered encapsulation of headers with congestion
	notification fields. We end this section with a bulleted list of		notification fields. We end this section with a bulleted list of

	more design guidelines for new encapsulations of congestion		design guidelines for new encapsulations of congestion
	notification.		notification.

	o Section 5 then uses precise standards terminology to confirm the		o Section 5 then uses precise standards terminology to confirm the
	rules for the default ECN tunnelling behaviour based on the above		rules for the default ECN tunnelling behaviour based on the above
	design principles.		design principles.

	o Extending the new IPsec tunnel ingress behaviour to all IP in IP		o Extending the new IPsec tunnel ingress behaviour to all IP in IP
	tunnels requires consideration of backwards compatibility, which		tunnels requires consideration of backwards compatibility, which
	is covered in Section 6 and changes from earlier RFCs are brought		is covered in Section 6 and changes from earlier RFCs are brought
	together in Section 7.		together in Section 7.

	skipping to change at page 7, line 34		skipping to change at page 8, line 4
	As well as guiding alternate IP in IP tunnelling schemes, the design		As well as guiding alternate IP in IP tunnelling schemes, the design
	guidelines of Section 4 are intended to be followed when IP packets		guidelines of Section 4 are intended to be followed when IP packets
	are encapsulated by any connectionless datagram/packet/frame where		are encapsulated by any connectionless datagram/packet/frame where
	the outer header is designed to support a congestion notification		the outer header is designed to support a congestion notification
	capability. [RFC5129] already deals with handling ECN for IP in MPLS		capability. [RFC5129] already deals with handling ECN for IP in MPLS
	and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in		and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in
	L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples		L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples
	where ECN may be added in future.		where ECN may be added in future.

	Of course, the IETF does not have standards authority over every link		Of course, the IETF does not have standards authority over every link

	or tunnel protocol, so this document merely aims to define the		or tunnel protocol, so this document merely aims to guide the
	interface between IP ECN and lower layer congestion notification.		interface between IP ECN and lower layer congestion notification.
	Then the IETF or the relevant standards body can be free to define		Then the IETF or the relevant standards body can be free to define
	the specifics of each lower layer scheme, but a common interface		the specifics of each lower layer scheme, but a common interface
	should ensure interworking across all technologies.		should ensure interworking across all technologies.

	Note that just because there is forward congestion notification in a		Note that just because there is forward congestion notification in a
	lower layer protocol, if the lower layer has its own feedback and		lower layer protocol, if the lower layer has its own feedback and
	load regulation, there is no need to propagate it up the layers. For		load regulation, there is no need to propagate it up the layers. For
	instance, FECN (forward ECN) has been present in Frame Relay and EFCI		instance, FECN (forward ECN) has been present in Frame Relay and EFCI
	(explicit forward congestion indication) in ATM [ITU-T.I.371] for a		(explicit forward congestion indication) in ATM [ITU-T.I.371] for a

	long time, but they have been used for internal management rather		long time. But so far they have been used for internal management
	than being propagated to endpoint transports for them to control end-		rather than being propagated to endpoint transports for them to
	to-end congestion.		control end-to-end congestion.

	[RFC2983] is a comprehensive primer on differentiated services and		[RFC2983] is a comprehensive primer on differentiated services and
	tunnels. Given ECN raises similar issues to differentiated services		tunnels. Given ECN raises similar issues to differentiated services
	when interacting with tunnels, useful concepts introduced in RFC2983		when interacting with tunnels, useful concepts introduced in RFC2983
	are used throughout, with brief recaps of the explanations where		are used throughout, with brief recaps of the explanations where
	necessary.		necessary.

	2. Requirements Language		2. Requirements Language

	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",		The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",

	skipping to change at page 8, line 31		skipping to change at page 8, line 49
	Information security can be assured by using various end to end		Information security can be assured by using various end to end
	security solutions (including IPsec in transport mode [RFC4301]), but		security solutions (including IPsec in transport mode [RFC4301]), but
	a commonly used scenario involves the need to communicate between two		a commonly used scenario involves the need to communicate between two
	physically protected domains across the public Internet. In this		physically protected domains across the public Internet. In this
	case there are certain management advantages to using IPsec in tunnel		case there are certain management advantages to using IPsec in tunnel
	mode solely across the publicly accessible part of the path. The		mode solely across the publicly accessible part of the path. The
	path followed by a packet then crosses security 'domains'; the ones		path followed by a packet then crosses security 'domains'; the ones
	protected by physical or other means before and after the tunnel and		protected by physical or other means before and after the tunnel and
	the one protected by an IPsec tunnel across the otherwise unprotected		the one protected by an IPsec tunnel across the otherwise unprotected
	domain. We will use the scenario in Figure 1 where endpoints 'A' and		domain. We will use the scenario in Figure 1 where endpoints 'A' and

	'B' communicate through a tunnel with ingress 'I' and egress 'E'		'B' communicate through a tunnel. The tunnel ingress 'I' and egress
	within physically protected edge domains across an unprotected		'E' are within physically protected edge domains, while the tunnel
	internetwork where there may be 'men in the middle', M.		spans an unprotected internetwork where there may be 'men in the
			middle', M.

	physically unprotected physically		physically unprotected physically
	<-protected domain-><--domain--><-protected domain->		<-protected domain-><--domain--><-protected domain->
	+------------------+ +------------------+		+------------------+ +------------------+
	| | M | |		| | M | |
	| A-------->I=========>==========>E-------->B |		| A-------->I=========>==========>E-------->B |
	| | | |		| | | |
	+------------------+ +------------------+		+------------------+ +------------------+
	<----IPsec secured---->		<----IPsec secured---->
	tunnel		tunnel

	skipping to change at page 9, line 23		skipping to change at page 9, line 42
	of the inner header. And if 'E' copies these fields from the outer		of the inner header. And if 'E' copies these fields from the outer
	header to the inner, even if it validates authentication from 'I', it		header to the inner, even if it validates authentication from 'I', it
	will have allowed a covert channel from 'M' to 'B'.		will have allowed a covert channel from 'M' to 'B'.

	ECN at the IP layer is designed to carry information about congestion		ECN at the IP layer is designed to carry information about congestion
	from a congested resource towards downstream nodes. Typically a		from a congested resource towards downstream nodes. Typically a
	downstream transport might feed the information back somehow to the		downstream transport might feed the information back somehow to the
	point upstream of the congestion that can regulate the load on the		point upstream of the congestion that can regulate the load on the
	congested resource, but other actions are possible (see [RFC3168]		congested resource, but other actions are possible (see [RFC3168]
	S.6). In terms of the above unicast scenario, ECN is typically		S.6). In terms of the above unicast scenario, ECN is typically

	intended to create an information channel from 'M' to 'B', for 'B' to		intended to create an information channel from 'M' to 'B' (for 'B' to
	forward to 'A'. Therefore the goals of IPsec and ECN are mutually		feed back to 'A'). Therefore the goals of IPsec and ECN are mutually
	incompatible.		incompatible.

	With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,		With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says,
	"controls are provided to manage the bandwidth of this [covert]		"controls are provided to manage the bandwidth of this [covert]
	channel". Using the ECN processing rules of RFC4301, the channel		channel". Using the ECN processing rules of RFC4301, the channel
	bandwidth is two bits per datagram from 'A' to 'M' and one bit per		bandwidth is two bits per datagram from 'A' to 'M' and one bit per
	datagram from 'M' to 'A' (because 'E' limits the combinations of the		datagram from 'M' to 'A' (because 'E' limits the combinations of the
	2-bit ECN field that it will copy). In both cases the covert channel		2-bit ECN field that it will copy). In both cases the covert channel
	bandwidth is further reduced by noise from any real congestion		bandwidth is further reduced by noise from any real congestion
	marking. RFC4301 therefore implies that these covert channels are		marking. RFC4301 therefore implies that these covert channels are

	skipping to change at page 10, line 12		skipping to change at page 10, line 30
	by copying into the outer header on encapsulation and copying from		by copying into the outer header on encapsulation and copying from
	the outer header on decapsulation.		the outer header on decapsulation.

	The pipe model: where the outer header is independent of that in the		The pipe model: where the outer header is independent of that in the
	inner header so it hides the Diffserv field of the inner header		inner header so it hides the Diffserv field of the inner header
	from any interaction with nodes along the tunnel.		from any interaction with nodes along the tunnel.

	However, for ECN, the new IPsec security architecture in RFC4301 only		However, for ECN, the new IPsec security architecture in RFC4301 only
	standardised one tunnelling model equivalent to the uniform model.		standardised one tunnelling model equivalent to the uniform model.
	It deemed that simplicity was more important than allowing		It deemed that simplicity was more important than allowing

	administrators the option of a tiny increment in security especially		administrators the option of a tiny increment in security, especially
	given not copying congestion indications could seriously harm		given not copying congestion indications could seriously harm
	everyone's network service.		everyone's network service.
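
Editorial illustration (not part of either draft revision): the two ingress behaviours contrasted in this section can be sketched as below. This is a minimal sketch, assuming the ECN codepoint values defined in RFC3168, and assuming that "resetting" means the RFC3168 full-functionality behaviour of setting the outer field to ECT(0) when the inner field is CE. The function names are hypothetical.

```python
# Illustrative sketch only; names are hypothetical, not taken from the draft.
# ECN codepoints as defined in RFC3168.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def outer_ecn_copy(inner_ecn: int) -> int:
    """Copying ingress (RFC4301 style): the outer ECN field mirrors the inner one."""
    return inner_ecn


def outer_ecn_reset_ce(inner_ecn: int) -> int:
    """Resetting ingress (assumed RFC3168 full-functionality style):
    an inner CE mark is hidden from the tunnel by sending ECT(0) in the outer."""
    return ECT_0 if inner_ecn == CE else inner_ecn


if __name__ == "__main__":
    for name, value in [("Not-ECT", NOT_ECT), ("ECT(1)", ECT_1),
                        ("ECT(0)", ECT_0), ("CE", CE)]:
        print(name, format(outer_ecn_copy(value), "02b"),
              format(outer_ecn_reset_ce(value), "02b"))
```

The two functions differ only for an inner CE mark, which is exactly the case the drafts debate: copying lets nodes along the tunnel see congestion from the whole path, while resetting shows them only congestion since the tunnel ingress.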

	3.2. Control Constraints		3.2. Control Constraints

	Congestion control requires that any congestion notification marked		Congestion control requires that any congestion notification marked
	into packets by a resource will be able to traverse a feedback loop		into packets by a resource will be able to traverse a feedback loop

	back to a node capable of controlling the load on that resource. To		back to a function capable of controlling the load on that resource.
	be precise, rather than calling this node the data source, we will		To be precise, rather than calling this function the data source, we
	call it the Load Regulator. This will allow us to deal with		will call it the Load Regulator. This will allow us to deal with
	exceptional cases where load is not regulated by the data source, but		exceptional cases where load is not regulated by the data source, but

	usually the two terms will be synonymous. Note the term "a node		usually the two terms will be synonymous. Note the term "a function
	_capable of_ controlling the load" deliberately includes a source		_capable of_ controlling the load" deliberately includes a source
	application that doesn't actually control the load but ought to (e.g.		application that doesn't actually control the load but ought to (e.g.
	an application without congestion control that uses UDP).		an application without congestion control that uses UDP).

	A--->R--->I=========>M=========>E-------->B		A--->R--->I=========>M=========>E-------->B

	Figure 2: Simple Tunnel Scenario		Figure 2: Simple Tunnel Scenario

	We now consider a similar tunnelling scenario to the IPsec one just		We now consider a similar tunnelling scenario to the IPsec one just
	described, but without the different security domains so we can just		described, but without the different security domains so we can just

	skipping to change at page 11, line 14		skipping to change at page 11, line 37
	congestion occurred across a tunnel or upstream of it. If outer		congestion occurred across a tunnel or upstream of it. If outer
	header congestion marking was reset by the tunnel ingress ('I'), at		header congestion marking was reset by the tunnel ingress ('I'), at
	the end of a tunnel ('E') the outer headers would indicate congestion		the end of a tunnel ('E') the outer headers would indicate congestion
	experienced across the tunnel ('I' to 'E'), while the inner header		experienced across the tunnel ('I' to 'E'), while the inner header
	would indicate congestion upstream of 'I'. But similar information		would indicate congestion upstream of 'I'. But similar information
	can be gleaned even if the tunnel ingress copies the inner to the		can be gleaned even if the tunnel ingress copies the inner to the
	outer headers. At the end of the tunnel ('E'), any packet with an		outer headers. At the end of the tunnel ('E'), any packet with an
	_extra_ mark in the outer header relative to the inner header		_extra_ mark in the outer header relative to the inner header
	indicates congestion across the tunnel ('I' to 'E'), while the inner		indicates congestion across the tunnel ('I' to 'E'), while the inner
	header would still indicate congestion upstream of ('I'). Appendix B		header would still indicate congestion upstream of ('I'). Appendix B

	gives a more precise method for inferring the congestion level		gives a simple and precise method for a tunnel egress to infer the
	introduced across a tunnel.		congestion level introduced across a tunnel.

	All this shows that 'E' can preserve the control loop irrespective of		All this shows that 'E' can preserve the control loop irrespective of
	whether 'I' copies congestion notification into the outer header or		whether 'I' copies congestion notification into the outer header or
	resets it.		resets it.
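
Editorial illustration (not part of either draft revision): the per-packet reasoning above, about what the egress 'E' can infer from the inner and outer headers, can be sketched as follows. This is a rough sketch under stated assumptions: the function name and the `ingress_copies` flag are hypothetical, and it is not the measurement method of Appendix B, which the draft describes separately.

```python
# Illustrative sketch only; names are hypothetical, not taken from the draft.
# ECN codepoints as defined in RFC3168.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


def congestion_seen_at_egress(inner_ecn, outer_ecn, ingress_copies=True):
    """Return (upstream_of_ingress, across_tunnel) for one packet at 'E'.

    upstream_of_ingress: the inner header carries CE, marked before 'I'.
    across_tunnel: this packet shows a mark added between 'I' and 'E'.
    """
    upstream = inner_ecn == CE
    if ingress_copies:
        # Only an *extra* outer mark relative to the inner header reveals
        # congestion across the tunnel; a packet that entered already
        # marked CE cannot show any further marking.
        across = outer_ecn == CE and inner_ecn != CE
    else:
        # With a resetting ingress the outer header starts unmarked,
        # so any outer CE was added across the tunnel.
        across = outer_ecn == CE
    return upstream, across
```

With a copying ingress, a packet that arrives at 'I' already marked CE carries no per-packet information about the tunnel itself, which is why a simple comparison like this only approximates the congestion level introduced across the tunnel.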

	That is the situation for existing control arrangements but, because		That is the situation for existing control arrangements but, because
	copying reveals more information, it would open up possibilities for		copying reveals more information, it would open up possibilities for
	better control system designs. For instance, Appendix A describes		better control system designs. For instance, Appendix A describes
	how resetting CE marking at a tunnel ingress confuses a proposed		how resetting CE marking at a tunnel ingress confuses a proposed
	congestion marking scheme on the standards track. It ends up		congestion marking scheme on the standards track. It ends up
	removing excessive amounts of traffic unnecessarily. Whereas copying		removing excessive amounts of traffic unnecessarily. Whereas copying
	CE markings at ingress leads to the correct control behaviour.		CE markings at ingress leads to the correct control behaviour.

	3.3. Management Constraints		3.3. Management Constraints

	As well as control, there are also management constraints.		As well as control, there are also management constraints.
	Specifically, a management system may monitor congestion markings in		Specifically, a management system may monitor congestion markings in
	passing packets, perhaps at the border between networks as part of a		passing packets, perhaps at the border between networks as part of a
	service level agreement. For instance, monitors at the borders of		service level agreement. For instance, monitors at the borders of
	autonomous systems may need to measure how much congestion has		autonomous systems may need to measure how much congestion has

	accumulated since the original source to determine between them how		accumulated since the original source, perhaps to determine between
	much of the congestion is contributed by each domain.		them how much of the congestion is contributed by each domain.

	Therefore, when monitoring the middle of a path, it should be		Therefore, when monitoring the middle of a path, it should be
	possible to establish how far back in the path congestion markings		possible to establish how far back in the path congestion markings
	have accumulated from. In this document we term this the baseline of		have accumulated from. In this document we term this the baseline of
	congestion marking (or the Congestion Baseline), i.e. the source of		congestion marking (or the Congestion Baseline), i.e. the source of
	the layer that last reset (or created) the congestion notification		the layer that last reset (or created) the congestion notification
	field. Given some tunnels cross domain borders (e.g. consider M in		field. Given some tunnels cross domain borders (e.g. consider M in
	Figure 2 is monitoring a border), it would therefore be desirable for		Figure 2 is monitoring a border), it would therefore be desirable for
	'I' to copy congestion accumulated so far into the outer headers		'I' to copy congestion accumulated so far into the outer headers
	exposed across the tunnel.		exposed across the tunnel.

	Appendix D discusses various scenarios where the Load Regulator lies		Appendix D discusses various scenarios where the Load Regulator lies
	in-path, not at the source host as we would typically expect. It		in-path, not at the source host as we would typically expect. It
	concludes that a Congestion Baseline is determined by where the Load		concludes that a Congestion Baseline is determined by where the Load
	Regulator function is, which should be identified in the transport		Regulator function is, which should be identified in the transport
	layer, not by addresses in network layer headers. This applies		layer, not by addresses in network layer headers. This applies
	whether the Load Regulator is at the source host or within the path.		whether the Load Regulator is at the source host or within the path.
	The appendix also discusses where a Load Regulator function should be		The appendix also discusses where a Load Regulator function should be

	located relative to a local encapsulation function.		located relative to a local tunnel encapsulation function.

	4. Design Principles		4. Design Principles

	The constraints from the three perspectives of security, control and		The constraints from the three perspectives of security, control and
	management in Section 3 are somewhat in tension as to whether a		management in Section 3 are somewhat in tension as to whether a
	tunnel ingress should copy congestion markings into the outer header		tunnel ingress should copy congestion markings into the outer header
	it creates or reset them. From the control perspective either		it creates or reset them. From the control perspective either
	copying or resetting works for existing arrangements, but copying has		copying or resetting works for existing arrangements, but copying has
	more potential for simplifying control. From the management		more potential for simplifying control. From the management
	perspective copying is preferable. From the security perspective		perspective copying is preferable. From the security perspective

	skipping to change at page 15, line 45		skipping to change at page 16, line 20
	2-bit ECN field of the arriving IP header into the outer		2-bit ECN field of the arriving IP header into the outer
	encapsulating IP header, for all types of IP in IP tunnel. This		encapsulating IP header, for all types of IP in IP tunnel. This
	encapsulation behaviour MUST only be used if the tunnel ingress is in		encapsulation behaviour MUST only be used if the tunnel ingress is in
	`normal state'. A `compatibility state' with a different		`normal state'. A `compatibility state' with a different
	encapsulation behaviour is also specified in Section 6 for backward		encapsulation behaviour is also specified in Section 6 for backward
	compatibility with legacy tunnel egresses that do not understand ECN.		compatibility with legacy tunnel egresses that do not understand ECN.
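
As a rough illustration of the `normal state' encapsulation rule above, the following Python sketch copies the 2-bit ECN field of the arriving header into the outer header. The function name, the `outer_dscp` parameter and the way the outer Diffserv bits are chosen are assumptions made only for this example; the draft text shown here specifies only that the ECN field is copied. The sketch relies on the ECN field being the two least-significant bits of the IPv4 TOS / IPv6 Traffic Class octet, as defined in RFC3168.

```python
ECN_MASK = 0b11  # ECN field: two least-significant bits of the TOS / Traffic Class octet


def encapsulate_tos_normal_state(inner_tos: int, outer_dscp: int) -> int:
    """Build the outer TOS octet for a tunnel ingress in `normal state'.

    inner_tos  -- TOS / Traffic Class octet of the arriving (inner) header
    outer_dscp -- hypothetical parameter: the 6-bit DSCP chosen for the outer
                  header by whatever Diffserv policy the ingress applies
    """
    if not 0 <= inner_tos <= 0xFF:
        raise ValueError("inner_tos must fit in one octet")
    if not 0 <= outer_dscp <= 0x3F:
        raise ValueError("outer_dscp must fit in six bits")
    # Copy the ECN field unchanged; never reset it at the ingress.
    return (outer_dscp << 2) | (inner_tos & ECN_MASK)
```

The compatibility state of Section 6 is deliberately not modelled here.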

	To decapsulate the inner header at the tunnel egress, a compliant		To decapsulate the inner header at the tunnel egress, a compliant
	tunnel egress MUST set the outgoing ECN field to the codepoint at the		tunnel egress MUST set the outgoing ECN field to the codepoint at the
	intersection of the appropriate incoming inner header (row) and outer		intersection of the appropriate incoming inner header (row) and outer

	header (column) in Table 1.		header (column) in Figure 3.

	+--Incoming Outer Header---


			                      +---------------------------------------------+
			                      |            Incoming Outer Header            |
	+---------------------+---------+-----------+-----------+-----------+		+---------------------+---------+-----------+-----------+-----------+
	| Incoming Inner      | Not-ECT |   ECT(0)  |   ECT(1)  |     CE    |		| Incoming Inner      | Not-ECT |   ECT(0)  |   ECT(1)  |     CE    |
	|        Header       |         |           |           |           |		|        Header       |         |           |           |           |
	+---------------------+---------+-----------+-----------+-----------+		+---------------------+---------+-----------+-----------+-----------+
	|       Not-ECT       | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) |		|       Not-ECT       | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) |
	|        ECT(0)       |  ECT(0) |   ECT(0)  |   ECT(0)  |     CE    |		|        ECT(0)       |  ECT(0) |   ECT(0)  |   ECT(0)  |     CE    |
	|        ECT(1)       |  ECT(1) |   ECT(1)  |   ECT(1)  |     CE    |		|        ECT(1)       |  ECT(1) |   ECT(1)  |   ECT(1)  |     CE    |
	|          CE         |    CE   |     CE    |  CE (!!!) |     CE    |		|          CE         |    CE   |     CE    |  CE (!!!) |     CE    |
	+---------------------+---------+-----------+-----------+-----------+		+---------------------+---------+-----------+-----------+-----------+

			                      |               Outgoing Header               |
			                      +---------------------------------------------+


	+-----Outgoing Header------		Figure 3: IP in IP Decapsulation

	Table 1: IP in IP Decapsulation


	The exclamation marks '(!!!)' in Table 1 indicate that this		The exclamation marks '(!!!)' in Figure 3 indicate that this
	combination of inner and outer headers should not be possible if only		combination of inner and outer headers should not be possible if only
	legal transitions have taken place. So, the decapsulator should drop		legal transitions have taken place. So, the decapsulator should drop
	or mark the ECN field as the table specifies, but it MAY also raise		or mark the ECN field as the table specifies, but it MAY also raise
	an appropriate alarm. It MUST NOT raise an alarm so often that the		an appropriate alarm. It MUST NOT raise an alarm so often that the
	illegal combinations would amplify into a flood of alarm messages.		illegal combinations would amplify into a flood of alarm messages.
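
The decapsulation table above can be read as a simple two-level lookup. The Python sketch below transcribes it cell by cell; the names `FIGURE_3`, `ILLEGAL` and `decapsulate` are illustrative only and do not come from the draft. The second return value flags the combinations marked '(!!!)' so that a caller could raise an alarm; any alarm would still need rate limiting, which this sketch leaves out.

```python
NOT_ECT, ECT0, ECT1, CE = "Not-ECT", "ECT(0)", "ECT(1)", "CE"
DROP = "drop"

# Outer key: incoming inner header (row). Inner key: incoming outer header (column).
FIGURE_3 = {
    NOT_ECT: {NOT_ECT: NOT_ECT, ECT0: DROP, ECT1: DROP, CE: DROP},
    ECT0:    {NOT_ECT: ECT0,    ECT0: ECT0, ECT1: ECT0, CE: CE},
    ECT1:    {NOT_ECT: ECT1,    ECT0: ECT1, ECT1: ECT1, CE: CE},
    CE:      {NOT_ECT: CE,      ECT0: CE,   ECT1: CE,   CE: CE},
}

# (inner, outer) combinations marked '(!!!)' in the table.
ILLEGAL = {(NOT_ECT, ECT0), (NOT_ECT, ECT1), (NOT_ECT, CE), (CE, ECT1)}


def decapsulate(inner: str, outer: str) -> tuple[str, bool]:
    """Return (outgoing ECN codepoint or 'drop', whether the combination is illegal)."""
    return FIGURE_3[inner][outer], (inner, outer) in ILLEGAL
```

For example, `decapsulate(ECT0, CE)` yields `("CE", False)`, while `decapsulate(NOT_ECT, CE)` yields `("drop", True)`.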

	6. Backward Compatibility		6. Backward Compatibility

	Note: in RFC3168, a tunnel was in one of two modes: limited		Note: in RFC3168, a tunnel was in one of two modes: limited
	functionality or full functionality. Rather than working with modes		functionality or full functionality. Rather than working with modes

	skipping to change at page 22, line 24		skipping to change at page 22, line 42
	design of alternate forms of tunnel processing of congestion		design of alternate forms of tunnel processing of congestion
	notification, if required for specific Diffserv PHBs or for other		notification, if required for specific Diffserv PHBs or for other
	lower layer encapsulating protocols that might support congestion		lower layer encapsulating protocols that might support congestion
	notification in the future.		notification in the future.

	11. Acknowledgements		11. Acknowledgements

	Thanks to David Black for explaining a better way to think about		Thanks to David Black for explaining a better way to think about
	function placement and to Louise Burness for a better way to think		function placement and to Louise Burness for a better way to think
	about multilayer transports and networks, having read		about multilayer transports and networks, having read

	[Patterns_Arch]. Also thanks to Arnaud Jacquet for ideas behind the		[Patterns_Arch]. Also thanks to Arnaud Jacquet for the idea for
	algorithms in Appendix B. Thanks to Bruce Davie, Toby Moncaster,		Appendix B. Thanks to Bruce Davie, Toby Moncaster, Gorry Fairhurst,
	Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for		Sally Floyd, Alfred Hoenes and Gabriele Corliano for their thoughts
	their thoughts and careful review comments.		and careful review comments.

			Bob Briscoe is partly funded by Trilogy, a research project (ICT-
			216372) supported by the European Community under its Seventh
			Framework Programme. The views expressed here are those of the
			author only.

	12. Comments Solicited		12. Comments Solicited

	Comments and questions are encouraged and very welcome. They can be		Comments and questions are encouraged and very welcome. They can be
	addressed to the IETF Transport Area working group mailing list		addressed to the IETF Transport Area working group mailing list
	<tsvwg@ietf.org>, and/or to the authors.		<tsvwg@ietf.org>, and/or to the authors.

	13. References		13. References

	13.1. Normative References		13.1. Normative References

	skipping to change at page 23, line 14		skipping to change at page 23, line 35

	[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition		[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
	of Explicit Congestion Notification (ECN) to IP",		of Explicit Congestion Notification (ECN) to IP",
	RFC 3168, September 2001.		RFC 3168, September 2001.

	[RFC4301] Kent, S. and K. Seo, "Security Architecture for the		[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
	Internet Protocol", RFC 4301, December 2005.		Internet Protocol", RFC 4301, December 2005.

	13.2. Informative References		13.2. Informative References


	[I-D.eardley-pcn-marking-behaviour]
	Eardley, P., "Marking behaviour of PCN-nodes",
	draft-eardley-pcn-marking-behaviour-01 (work in progress),
	June 2008.

	[I-D.ietf-pcn-architecture]		[I-D.ietf-pcn-architecture]

	Eardley, P., "Pre-Congestion Notification Architecture",		Eardley, P., "Pre-Congestion Notification (PCN)
	draft-ietf-pcn-architecture-03 (work in progress),		Architecture", draft-ietf-pcn-architecture-07 (work in
	February 2008.		progress), September 2008.

			[I-D.ietf-pcn-marking-behaviour]
			Eardley, P., "Marking behaviour of PCN-nodes",
			draft-ietf-pcn-marking-behaviour-00 (work in progress),
			October 2008.

	[I-D.ietf-pwe3-congestion-frmwk]		[I-D.ietf-pwe3-congestion-frmwk]
	Bryant, S., Davie, B., Martini, L., and E. Rosen,		Bryant, S., Davie, B., Martini, L., and E. Rosen,
	"Pseudowire Congestion Control Framework",		"Pseudowire Congestion Control Framework",
	draft-ietf-pwe3-congestion-frmwk-01 (work in progress),		draft-ietf-pwe3-congestion-frmwk-01 (work in progress),
	May 2008.		May 2008.

	[I-D.moncaster-pcn-3-state-encoding]		[I-D.moncaster-pcn-3-state-encoding]
	Moncaster, T., Briscoe, B., and M. Menth, "A three state		Moncaster, T., Briscoe, B., and M. Menth, "A three state
	extended PCN encoding scheme",		extended PCN encoding scheme",

	skipping to change at page 25, line 16		skipping to change at page 25, line 39
	(Expired)		(Expired)

	Appendix A. Why resetting CE on encapsulation harms PCN		Appendix A. Why resetting CE on encapsulation harms PCN

	Regarding encapsulation, the section of the PCN architecture		Regarding encapsulation, the section of the PCN architecture
	[I-D.ietf-pcn-architecture] on tunnelling says that header copying		[I-D.ietf-pcn-architecture] on tunnelling says that header copying
	(RFC4301) allows PCN to work correctly. However, resetting CE		(RFC4301) allows PCN to work correctly. However, resetting CE
	markings confuses PCN marking.		markings confuses PCN marking.

	The specific issue here concerns PCN excess rate marking		The specific issue here concerns PCN excess rate marking

	[I-D.eardley-pcn-marking-behaviour], i.e. the bulk marking of traffic		[I-D.ietf-pcn-marking-behaviour], i.e. the bulk marking of traffic
	that exceeds a configured threshold rate. One of the goals of excess		that exceeds a configured threshold rate. One of the goals of excess
	rate marking is to enable the speedy removal of excess admission		rate marking is to enable the speedy removal of excess admission
	controlled traffic following re-routes caused by link failures or		controlled traffic following re-routes caused by link failures or
	other disasters. This maintains a share of the capacity for		other disasters. This maintains a share of the capacity for
	competing admission controlled traffic and for traffic in lower		competing admission controlled traffic and for traffic in lower
	priority classes. After failures, traffic re-routed onto remaining		priority classes. After failures, traffic re-routed onto remaining
	links can often stress multiple links along a path. Therefore,		links can often stress multiple links along a path. Therefore,
	traffic can arrive at a link under stress with some proportion		traffic can arrive at a link under stress with some proportion
	already marked for removal by a previous link. By design, marked		already marked for removal by a previous link. By design, marked
	traffic will be removed by the overall system in subsequent round		traffic will be removed by the overall system in subsequent round
	trips. So when the excess rate marking algorithm decides how much		trips. So when the excess rate marking algorithm decides how much
	traffic to mark for removal, it doesn't include traffic already		traffic to mark for removal, it doesn't include traffic already
	marked for removal by another node upstream (the `Excess traffic		marked for removal by another node upstream (the `Excess traffic

	meter function' of [I-D.eardley-pcn-marking-behaviour]).		meter function' of [I-D.ietf-pcn-marking-behaviour]).

	However, if an RFC3168 tunnel ingress intervenes, it resets the ECN		However, if an RFC3168 tunnel ingress intervenes, it resets the ECN
	field in all the outer headers, hiding all the evidence of problems		field in all the outer headers, hiding all the evidence of problems
	upstream. Thus, although excess rate marking works fine with RFC4301		upstream. Thus, although excess rate marking works fine with RFC4301
	IPsec tunnels, with RFC3168 tunnels it typically removes large		IPsec tunnels, with RFC3168 tunnels it typically removes large
	volumes of traffic that it didn't need to remove at all.		volumes of traffic that it didn't need to remove at all.

	Appendix B. Contribution to Congestion across a Tunnel		Appendix B. Contribution to Congestion across a Tunnel

	This specification mandates that a tunnel ingress determines the ECN		This specification mandates that a tunnel ingress determines the ECN
	field of each new outer tunnel header by copying the arriving header.		field of each new outer tunnel header by copying the arriving header.

	If instead the outer ECN field were reset at a tunnel ingress (as it		Concern has been expressed that this will make it difficult for the
	was for the full functionality mode of RFC3168), it would be possible		tunnel egress to monitor congestion introduced along a tunnel, which
	for the tunnel egress to measure:		is easy if the outer ECN field is reset at a tunnel ingress (RFC3168
			full functionality mode). However, in fact copying CE marks at
	o congestion marking before the tunnel ingress (fraction of inner		ingress will still make it easy for the egress to measure congestion
	header markings, p_i);		introduced across a tunnel, as illustrated below.
	o congestion marking across the tunnel (fraction of outer header
	markings, p_t);


	o congestion marking after the tunnel egress (fraction of departing		Consider 100 packets measured at the egress. It measures that 30 are
	header markings, p_o).		CE marked in the inner and outer headers and 12 have additional CE
			marks in the outer but not the inner. This means packets arriving at
			the ingress had already experienced 30% congestion. However, it does
			not mean there was 12% congestion across the tunnel. The correct
			calculation of congestion across the tunnel is p_t = 12/(100-30) =
			12/70 = 17%. This is easy for the egress to measure. It is the
			packets with additional CE marking in the outer header (12) as a
			proportion of packets not marked in the inner header (70).


	Although the newly mandated copying behaviour at ingress gains the		Figure 4 illustrates this in a combinatorial probability diagram.
	advantages described in the body of this specification, this one		The square represents 100 packets. The 30% division along the bottom
	advantage of the resetting behaviour of RFC3168 seems to have been		represents marking before the ingress, and the p_t division up the
	lost: on first impressions, it seems that the egress can no longer		side represents marking along the tunnel.
	accurately measure congestion contributed along the tunnel (p_t).
	The egress could _estimate_ the contribution along the tunnel by
	measuring which packets carry only a mark in the outer header (not the
	inner). But this is not precisely the same as the congestion
	contributed along the tunnel; tunnel nodes may have tried to mark
	some packets that already had a marking in both the inner and outer
	header. Measuring only additional outer markings will miss these.
	Nonetheless, with the newly proposed scheme, a tunnel egress can
	derive a precise estimate of marking introduced across a tunnel (p_t)
	as follows.


	The combined fraction of markings at the tunnel egress will be p_o =		+-----+---------+100%
	1 - (1 - p_i)(1 - p_t). Explanation: this is (1 - the probability a		|     |         |
	departing packet is not marked), which is (1 - (prob not marked		| 30  |         |
	before tunnel)(prob not marked along tunnel)). Therefore,		|     |         |       The large square
	rearranging, the egress can infer the fraction of marks introduced		|     +---------+p_t    represents 100 packets
	across the tunnel as p_t = (p_o - p_i)/(1 - p_i). If arriving		|     |    12   |
	congestion is low (p_i <<1), then the approximation p_t ~ (p_o - p_i)		+-----+---------+0
	should be good enough. This is the estimate we advised originally;		0    30%       100%
	i.e. measuring only the extra markings in the outer header that are		  inner header marking
	not present in the inner header. If a better approximation is needed
	p_t ~ (p_o - p_i)(1 + p_i), which removes the division, but still
	assumes p_i<<1.


	Using any of these formulae (including the precise one), it would be		Figure 4: Tunnel Marking of Packets Already Marked at Ingress
	possible for a tunnel egress to calculate a moving average of the
	fraction of packets being marked by tunnel nodes, including those
	already marked in the inner header. Alternatively, it should even be
	possible for a tunnel egress to reverse engineer which packets would
	have been marked across the tunnel if CE was reset on ingress even if
	CE was actually copied on ingress.[[anchor3: Note from Bob: I've
	worked out an algorithm so the tunnel egress can reverse engineer
	marking as if CE was reset at the ingress even though CE was copied
	at the ingress. It typically consumes 2 cycles / pkt, occasionally 4
	and very occasionally 8. {ToDo: On testing an implementation just now
	it still has a wrinkle in it, but with a little more development I
	believe it would work well. I'll write it into the next revision if
	I get it working.}]]
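
The calculation of marking introduced across a tunnel can be sketched in a few lines of Python. Both function names are hypothetical. The first works from packet counts seen at the egress, the second from the marking fractions p_i (inner header) and p_o (outer header at egress); when CE is copied at the ingress the two are equivalent, because packets marked in the inner header are also marked in the outer.

```python
def tunnel_congestion_from_counts(total: int, inner_marked: int,
                                  outer_only_marked: int) -> float:
    """p_t from counts at the egress.

    total             -- packets measured
    inner_marked      -- packets CE-marked in the inner (and hence outer) header
    outer_only_marked -- packets CE-marked in the outer header but not the inner
    """
    unmarked_on_arrival = total - inner_marked
    if unmarked_on_arrival <= 0:
        raise ValueError("no packets arrived unmarked; p_t cannot be inferred")
    return outer_only_marked / unmarked_on_arrival


def tunnel_congestion_from_fractions(p_i: float, p_o: float) -> float:
    """p_t = (p_o - p_i) / (1 - p_i), rearranged from p_o = 1 - (1 - p_i)(1 - p_t)."""
    if not 0.0 <= p_i < 1.0:
        raise ValueError("p_i must be in [0, 1)")
    return (p_o - p_i) / (1.0 - p_i)
```

With the figures used in this appendix (100 packets, 30 marked in both headers, 12 marked only in the outer), both `tunnel_congestion_from_counts(100, 30, 12)` and `tunnel_congestion_from_fractions(0.30, 0.42)` give 12/70, roughly 17%, rather than the naive 12%.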

	Appendix C. Ideal Decapsulation Rules		Appendix C. Ideal Decapsulation Rules


	Compliance with this appendix is NOT REQUIRED for compliance with the		This appendix is not normative. Compliance with this appendix is NOT
	present specification.		REQUIRED for compliance with the present specification.

	If the default ECN encapsulation behaviour does not offer suitable		If the default ECN encapsulation behaviour does not offer suitable
	trade offs, procedures exist for associating a new behaviour with a		trade offs, procedures exist for associating a new behaviour with a
	new Diffserv PHB. However, it is unrealistic to expect vendors of		new Diffserv PHB. However, it is unrealistic to expect vendors of
	all IPSec and all IP in IP tunnel endpoints to cater for the		all IPSec and all IP in IP tunnel endpoints to cater for the
	exceptional behaviour of PHB XXX. If all tunnels did require XXX-		exceptional behaviour of PHB XXX. If all tunnels did require XXX-
	specific behaviour, the resulting patchy and error-prone deployment		specific behaviour, the resulting patchy and error-prone deployment
	would probably cause XXX to suffer byzantine feature interactions		would probably cause XXX to suffer byzantine feature interactions
	with poorly implemented tunnels. The default rules for tunnel		with poorly implemented tunnels. The default rules for tunnel
	endpoints to handle both the Diffserv field and the ECN field should		endpoints to handle both the Diffserv field and the ECN field should

	skipping to change at page 27, line 42		skipping to change at page 28, line 7
	marking) [I-D.ietf-pcn-architecture]. The aim is for the first level		marking) [I-D.ietf-pcn-architecture]. The aim is for the first level
	of marking to stop admitting new traffic and the second level to		of marking to stop admitting new traffic and the second level to
	terminate sufficient existing flows to bring a network back to its		terminate sufficient existing flows to bring a network back to its
	operating point after a serious failure.		operating point after a serious failure.

	Although the ECN field gives sufficient codepoints for these three		Although the ECN field gives sufficient codepoints for these three
	states, the PCN working group cannot use them in case any tunnel		states, the PCN working group cannot use them in case any tunnel
	decapsulations occur within a PCN region. If a node in a tunnel sets		decapsulations occur within a PCN region. If a node in a tunnel sets
	the ECN field to ECT(0) or ECT(1), this change will be discarded by a		the ECN field to ECT(0) or ECT(1), this change will be discarded by a
	tunnel egress compliant with RFC4301 and RFC3168. This can be seen		tunnel egress compliant with RFC4301 and RFC3168. This can be seen

	in Table 1, where the ECT values in the outer header are ignored		in Figure 3, where the ECT values in the outer header are ignored
	unless the inner header is the same. Effectively the ECT(0) and		unless the inner header is the same. Effectively the ECT(0) and
	ECT(1) codepoints have to be treated as just one codepoint when they		ECT(1) codepoints have to be treated as just one codepoint when they
	could otherwise have been used for their intended purpose of		could otherwise have been used for their intended purpose of
	congestion notification. Instead, the PCN w-g has had to propose		congestion notification. Instead, the PCN w-g has had to propose
	using extra Diffserv codepoint(s) to encode the extra states		using extra Diffserv codepoint(s) to encode the extra states
	[I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting		[I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting
	DSCP space while leaving ECN codepoints unused.		DSCP space while leaving ECN codepoints unused.

	Although this is currently most pressing for the PCN working group,		Although this is currently most pressing for the PCN working group,
	the issue is more general. Under Security Considerations (Section 9)		the issue is more general. Under Security Considerations (Section 9)

	skipping to change at page 28, line 17		skipping to change at page 28, line 31

	More generally, the currently standardised tunnel decapsulation		More generally, the currently standardised tunnel decapsulation
	behaviour unnecessarily wastes a quarter of two bits (i.e. half a		behaviour unnecessarily wastes a quarter of two bits (i.e. half a
	bit) in the IP (v4 & v6) header. As explained in Section 3.1, the		bit) in the IP (v4 & v6) header. As explained in Section 3.1, the
	original reason for not copying down outer ECT codepoints for onward		original reason for not copying down outer ECT codepoints for onward
	forwarding was to limit the covert channel across a decapsulator to 1		forwarding was to limit the covert channel across a decapsulator to 1
	bit per packet. However, now that the IETF Security Area has deemed		bit per packet. However, now that the IETF Security Area has deemed
	that a 2-bit covert channel through an encapsulator is a manageable		that a 2-bit covert channel through an encapsulator is a manageable
	risk, the same should be true for a decapsulator.		risk, the same should be true for a decapsulator.


	Table 2 proposes a more ideal layered decapsulation behaviour. Note:		Figure 5 proposes a more ideal layered decapsulation behaviour.
	this table is only to support discussion. It is not currently		Note: this table is only to support discussion. It is not currently
	proposed for standards action. The only difference from Table 1		proposed for standards action. The only difference from Figure 3
	(that is proposed for standards action), is the swapping of the cells		(that is proposed for standards action), is the swapping of the cells
	highlighted as ECT(X).		highlighted as ECT(X).


	+--Incoming Outer Header---		                      +---------------------------------------------+
			                      |            Incoming Outer Header            |
	+---------------------+---------+-----------+-----------+-----------+		+---------------------+---------+-----------+-----------+-----------+
	| Incoming Inner      | Not-ECT |   ECT(0)  |   ECT(1)  |     CE    |		| Incoming Inner      | Not-ECT |   ECT(0)  |   ECT(1)  |     CE    |
	|        Header       |         |           |           |           |		|        Header       |         |           |           |           |
	+---------------------+---------+-----------+-----------+-----------+		+---------------------+---------+-----------+-----------+-----------+
	|       Not-ECT       | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) |		|       Not-ECT       | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) |
	|        ECT(0)       |  ECT(0) |   ECT(0)  |   ECT(1)  |     CE    |		|        ECT(0)       |  ECT(0) |   ECT(0)  |   ECT(1)  |     CE    |
	|        ECT(1)       |  ECT(1) |   ECT(0)  |   ECT(1)  |     CE    |		|        ECT(1)       |  ECT(1) |   ECT(0)  |   ECT(1)  |     CE    |
	|          CE         |    CE   |     CE    |  CE (!!!) |     CE    |		|          CE         |    CE   |     CE    |  CE (!!!) |     CE    |
	+---------------------+---------+-----------+-----------+-----------+		+---------------------+---------+-----------+-----------+-----------+

			                      |               Outgoing Header               |
			                      +---------------------------------------------+


	+-----Outgoing Header------		Figure 5: Ideal IP in IP Decapsulation (currently informative, not
			normative)
	Table 2: Ideal IP in IP Decapsulation (currently NOT REQUIRED)


	Note that, if this ideal proposal were taken up, extra backwards		Note that, if this ideal proposal were taken up, a tunnel egress
	compatibility issues would have to be resolved.		complying with it would be backwards compatible with all previous
			specifications for encapsulation of ECN at the ingress (RFC4301, both
			modes of RFC3168, both modes of RFC2481 and RFC2003). In comparison
			with an RFC3168 or RFC4301 tunnel egress, it would require no
			additional configuration at the ingress nor any additional
			negotiation with the ingress. The only new issue would be the burden
			of an extra standard to be compliant with, adding to the already
			complex history of ECN tunnelling RFCs.
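
To make the difference between the normative and the ideal decapsulation tables concrete, the Python sketch below encodes both and lists the cells where they disagree. The helper names (`table`, `differing_cells`) and the constant names are assumptions for illustration; the cell values are transcribed from the two tables in this document.

```python
CODEPOINTS = ("Not-ECT", "ECT(0)", "ECT(1)", "CE")


def table(rows):
    """Build {inner: {outer: outgoing}} from rows listed in CODEPOINTS order."""
    return {inner: dict(zip(CODEPOINTS, row)) for inner, row in zip(CODEPOINTS, rows)}


# Decapsulation table proposed for standards action.
NORMATIVE = table([
    ("Not-ECT", "drop",   "drop",   "drop"),
    ("ECT(0)",  "ECT(0)", "ECT(0)", "CE"),
    ("ECT(1)",  "ECT(1)", "ECT(1)", "CE"),
    ("CE",      "CE",     "CE",     "CE"),
])

# Ideal decapsulation table (informative only).
IDEAL = table([
    ("Not-ECT", "drop",   "drop",   "drop"),
    ("ECT(0)",  "ECT(0)", "ECT(1)", "CE"),
    ("ECT(1)",  "ECT(0)", "ECT(1)", "CE"),
    ("CE",      "CE",     "CE",     "CE"),
])


def differing_cells(a, b):
    """Map (inner, outer) -> (outgoing under a, outgoing under b) where the tables differ."""
    return {
        (inner, outer): (a[inner][outer], b[inner][outer])
        for inner in CODEPOINTS
        for outer in CODEPOINTS
        if a[inner][outer] != b[inner][outer]
    }
```

Calling `differing_cells(NORMATIVE, IDEAL)` returns exactly two entries: the cases where the inner and outer headers carry different ECT codepoints. In those cases the ideal behaviour forwards the outer value instead of the inner one.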

	Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation		Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation

	We have said that at any point in a network, the Congestion Baseline		We have said that at any point in a network, the Congestion Baseline
	(where congestion notification starts from zero) should be the		(where congestion notification starts from zero) should be the
	previous upstream Load Regulator. We have also said that the ingress		previous upstream Load Regulator. We have also said that the ingress
	of an IP in IP tunnel must copy congestion indications to the		of an IP in IP tunnel must copy congestion indications to the
	encapsulating outer headers it creates. If the Load Regulator is in-		encapsulating outer headers it creates. If the Load Regulator is in-
	path rather than at the source, and also a tunnel ingress, these two		path rather than at the source, and also a tunnel ingress, these two
	requirements seem to be contradictory. A tunnel ingress must not		requirements seem to be contradictory. A tunnel ingress must not
	reset incoming congestion, but a Load Regulator must be the		reset incoming congestion, but a Load Regulator must be the
	Congestion Baseline, implying it needs to reset incoming congestion.		Congestion Baseline, implying it needs to reset incoming congestion.

	In fact, the two requirements are not contradictory, because a Load		In fact, the two requirements are not contradictory, because a Load

	Regulator and a tunnel ingress are functions within a node that occur		Regulator and a tunnel ingress are functions within a node that
	in sequence on a stream of packets, not at the same point. Figure 3		typically occur in sequence on a stream of packets, not at the same
	is borrowed from [RFC2983] (which was making a similar point about		point. Figure 6 is borrowed from [RFC2983] (which was making a
	the location of Diffserv traffic conditioning relative to the		similar point about the location of Diffserv traffic conditioning
	encapsulation function of a tunnel). An in-path Load Regulator can		relative to the encapsulation function of a tunnel). An in-path Load
	act on packets either at [1 - Before] encapsulation or at [2 - Outer]		Regulator can act on packets either at [1 - Before] encapsulation or
	after encapsulation. Load Regulation does not ever need to be		at [2 - Outer] after encapsulation. Load Regulation does not ever
	integrated with the [Encapsulate] function (but it can be for		need to be integrated with the [Encapsulate] function (but it can be
	efficiency). Therefore we can still maintain that the [Encapsulate]		for efficiency). Therefore we can still mandate that the
	function always copies CE into the outer header.		[Encapsulate] function always copies CE into the outer header.


	>>-----[1 - Before]--------[Encapsulate]----[3 - Inner]------------>>		>>-----[1 - Before]--------[Encapsulate]----[3 - Inner]---------->>
	\		\
	\		\

	+--------[2 - Outer]--------->>		+--------[2 - Outer]------->>


	Figure 3: Placement of In-Path Load Regulator Relative to Tunnel		Figure 6: Placement of In-Path Load Regulator Relative to Tunnel
	Ingress		Ingress

	Then separately, if there is a Load Regulator at location [2 -		Then separately, if there is a Load Regulator at location [2 -
	Outer], it might reset CE to ECT(0), say. Then the Congestion		Outer], it might reset CE to ECT(0), say. Then the Congestion
	Baseline for the lower layer (outer) will be [2 - Outer], while the		Baseline for the lower layer (outer) will be [2 - Outer], while the
	Congestion Baseline of the inner layer will be unchanged. But how		Congestion Baseline of the inner layer will be unchanged. But how
	encapsulation works has nothing to do with whether a Load Regulator		encapsulation works has nothing to do with whether a Load Regulator
	is present or where it is.		is present or where it is.
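
The separation argued above can be sketched in a few lines of Python. This is an illustrative model only, not text from either draft: the Packet class and the load_regulate and encapsulate functions are hypothetical names, the codepoint values are those of RFC 3168, and the sketch assumes the ingress copies the whole two-bit ECN field. It shows that the same [Encapsulate] function serves whether a Load Regulator sits at [1 - Before] or at [2 - Outer].

```python
from dataclasses import dataclass, replace
from typing import Optional

# Two-bit ECN codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


@dataclass(frozen=True)
class Packet:
    ecn: int                          # ECN field of this header
    inner: Optional["Packet"] = None  # encapsulated packet, if any


def load_regulate(pkt: Packet) -> Packet:
    """Hypothetical Load Regulator: reset CE to ECT(0) in the header it sees."""
    return replace(pkt, ecn=ECT0) if pkt.ecn == CE else pkt


def encapsulate(pkt: Packet) -> Packet:
    """[Encapsulate]: always copy the arriving ECN field into the outer header."""
    return Packet(ecn=pkt.ecn, inner=pkt)


arriving = Packet(ecn=CE)

# Regulator at [2 - Outer]: only the outer header is reset.
at_outer = load_regulate(encapsulate(arriving))

# Regulator at [1 - Before]: the reset happens first, then is copied outwards.
at_before = encapsulate(load_regulate(arriving))
```

In both orderings encapsulate() is unchanged; only the position of the regulator decides which headers carry the new Congestion Baseline.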

	If on the other hand a Load Regulator resets CE at [1 - Before], the		If on the other hand a Load Regulator resets CE at [1 - Before], the

	skipping to change at page 30, line 12		skipping to change at page 30, line 47
	desirable or practical for a node part way along the path to regulate		desirable or practical for a node part way along the path to regulate
	the load. However, various reasonable proposals for in-path load		the load. However, various reasonable proposals for in-path load
	regulation have been made from time to time (e.g. fair queuing,		regulation have been made from time to time (e.g. fair queuing,
	traffic engineering, flow admission control). The IETF has recently		traffic engineering, flow admission control). The IETF has recently
	chartered a working group to standardise admission control across a		chartered a working group to standardise admission control across a
	part of a path using pre-congestion notification (PCN) [PCNcharter].		part of a path using pre-congestion notification (PCN) [PCNcharter].
	This is of particular relevance here because it involves congestion		This is of particular relevance here because it involves congestion
	notification with an in-path Load Regulator, it can involve		notification with an in-path Load Regulator, it can involve
	tunnelling and it certainly involves encapsulation more generally.		tunnelling and it certainly involves encapsulation more generally.


	We will use the more complex scenario in Figure 4 to tease out all		We will use the more complex scenario in Figure 7 to tease out all
	the issues that arise when combining congestion notification and		the issues that arise when combining congestion notification and
	tunnelling with various possible in-path load regulation schemes. In		tunnelling with various possible in-path load regulation schemes. In
	this case 'I1' and 'E2' break up the path into three separate		this case 'I1' and 'E2' break up the path into three separate
	congestion control loops. The feedback for these loops is shown		congestion control loops. The feedback for these loops is shown
	going right to left across the top of the figure. The 'V's are arrow		going right to left across the top of the figure. The 'V's are arrow
	heads representing the direction of feedback, not letters. But there		heads representing the direction of feedback, not letters. But there
	are also two tunnels within the middle control loop: 'I1' to 'E1' and		are also two tunnels within the middle control loop: 'I1' to 'E1' and
	'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS		'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS
	core networks. M is a congestion monitoring point, perhaps between		core networks. M is a congestion monitoring point, perhaps between
	two border routers where the same tunnel continues unbroken across		two border routers where the same tunnel continues unbroken across
	the border.		the border.
	  ______     _______________________________________     _____		  ______     _______________________________________     _____
	 /      \   /                                       \   /     \		 /      \   /                                       \   /     \
	V        \ V                               M         \ V       \		V        \ V                               M         \ V       \
	A--->R--->I1===========>E1----->I2=========>==========>E2------->B		A--->R--->I1===========>E1----->I2=========>==========>E2------->B


	Figure 4: complex Tunnel Scenario		Figure 7: complex Tunnel Scenario

	The question is, should the congestion markings in the outer exposed		The question is, should the congestion markings in the outer exposed
	headers of a tunnel represent congestion only since the tunnel		headers of a tunnel represent congestion only since the tunnel
	ingress or over the whole upstream path from the source of the inner		ingress or over the whole upstream path from the source of the inner
	header (whatever that may mean)? Or put another way, should 'I1' and		header (whatever that may mean)? Or put another way, should 'I1' and
	'I2' copy or reset CE markings?		'I2' copy or reset CE markings?

	Based on the design principles in Section 4, the answer is that the		Based on the design principles in Section 4, the answer is that the
	Congestion Baseline should be the nearest upstream interface designed		Congestion Baseline should be the nearest upstream interface designed

	to regulate traffic load--the Load Regulator. In Figure 4 'A', 'I1'		to regulate traffic load--the Load Regulator. In Figure 7 'A', 'I1'
	or 'E2' are all Load Regulators. We have shown the feedback loops		or 'E2' are all Load Regulators. We have shown the feedback loops
	returning to each of these nodes so that they can regulate the load		returning to each of these nodes so that they can regulate the load
	causing the congestion notification. So the Congestion Baseline		causing the congestion notification. So the Congestion Baseline
	exposed to M should be 'I1' (the Load Regulator), not 'I2'.		exposed to M should be 'I1' (the Load Regulator), not 'I2'.
	Therefore I1 should reset any arriving CE markings. In this case,		Therefore I1 should reset any arriving CE markings. In this case,
	'I1' knows the tunnel to 'E1' is unrelated to its load regulation		'I1' knows the tunnel to 'E1' is unrelated to its load regulation
	function. So the load regulation function within 'I1' should be		function. So the load regulation function within 'I1' should be
	placed at [1 - Before] tunnel encapsulation within 'I1' (using the		placed at [1 - Before] tunnel encapsulation within 'I1' (using the

	terminology of Figure 3). Then the Congestion Baseline all across		terminology of Figure 6). Then the Congestion Baseline all across
	the networks from 'I1' to 'E2' in both inner and outer headers will		the networks from 'I1' to 'E2' in both inner and outer headers will
	be 'I1'.		be 'I1'.
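
As a rough illustration of this answer, the following Python sketch walks one packet along the path of the complex tunnel scenario as far as 'M'. It is a simplified model under stated assumptions, not part of either draft: headers are a list of ECN values with the outermost last, the function names are hypothetical, and the egress rule only covers the ECT(0) and CE cases used here.

```python
ECT0, CE = 0b10, 0b11  # ECN codepoints from RFC 3168


def mark(stack):
    """A congested queue marks CE in the outermost header."""
    return stack[:-1] + [CE]


def reset(stack):
    """Load Regulator function (assumed placed at [1 - Before] inside I1)."""
    return stack[:-1] + [ECT0] if stack[-1] == CE else stack


def encap(stack):
    """Tunnel ingress: copy the ECN field into a new outer header."""
    return stack + [stack[-1]]


def decap(stack):
    """Tunnel egress (simplified): propagate an outer CE to the inner header."""
    *rest, inner, outer = stack
    return rest + [CE if outer == CE else inner]


def outer_at_m(congested_before_i1, congested_after_i1):
    """Return the outer ECN field that monitoring point M would observe."""
    stack = [ECT0]                      # packet leaves A
    if congested_before_i1:
        stack = mark(stack)             # e.g. congestion at R
    stack = encap(reset(stack))         # I1: regulate, then encapsulate
    stack = decap(stack)                # E1
    stack = encap(stack)                # I2: copy, no reset
    if congested_after_i1:
        stack = mark(stack)             # congestion between I2 and M
    return stack[-1]
```

Under these assumptions, outer_at_m(True, False) yields ECT(0) and outer_at_m(False, True) yields CE: congestion upstream of 'I1' is hidden from 'M', while congestion downstream of 'I1' is exposed, which is what a Congestion Baseline of 'I1' means.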

	The following further examples illustrate how this answer might be		The following further examples illustrate how this answer might be
	applied:		applied:

	o We argued in Appendix A that resetting CE on encapsulation could		o We argued in Appendix A that resetting CE on encapsulation could
	harm PCN excess rate marking, which marks excess traffic for		harm PCN excess rate marking, which marks excess traffic for
	removal in subsequent round trips. This marking relies on not		removal in subsequent round trips. This marking relies on not
	marking packets if another node upstream has already marked them		marking packets if another node upstream has already marked them
	for removal. If there were a tunnel ingress between the two which		for removal. If there were a tunnel ingress between the two which
	reset CE markings, it would confuse the downstream node into		reset CE markings, it would confuse the downstream node into
	marking far too much traffic for removal. So why do we say that		marking far too much traffic for removal. So why do we say that
	'I1' should reset CE, while a tunnel ingress shouldn't? The		'I1' should reset CE, while a tunnel ingress shouldn't? The
	answer is that it is the Load Regulator function at 'I1' that is		answer is that it is the Load Regulator function at 'I1' that is
	resetting CE, not the tunnel encapsulator. The Load Regulator		resetting CE, not the tunnel encapsulator. The Load Regulator
	needs to set itself as the Congestion Baseline, so the feedback it		needs to set itself as the Congestion Baseline, so the feedback it
	gets will only be about congestion on links it can relieve itself		gets will only be about congestion on links it can relieve itself

	by regulating the load into them. When it resets CE markings, it		(by regulating the load into them). When it resets CE markings,
	knows that something else upstream will have dealt with the		it knows that something else upstream will have dealt with the
	congestion notifications it removes, given it is part of an end-		congestion notifications it removes, given it is part of an end-
	to-end admission control signalling loop. It therefore knows that		to-end admission control signalling loop. It therefore knows that
	previous hops will be covered by other Load Regulators.		previous hops will be covered by other Load Regulators.
	Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should		Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should
	follow the new rule for any tunnel ingress and copy congestion		follow the new rule for any tunnel ingress and copy congestion
	marking into the outer tunnel header. The ingress at 'I1' will		marking into the outer tunnel header. The ingress at 'I1' will
	happen to copy headers that have already been reset just		happen to copy headers that have already been reset just
	beforehand. But it doesn't need to know that.		beforehand. But it doesn't need to know that.

	o [Shayman] suggested feedback of ECN accumulated across an MPLS		o [Shayman] suggested feedback of ECN accumulated across an MPLS

	skipping to change at page 32, line 4		skipping to change at page 32, line 41
	headers. Again, the tunnel encapsulation function at 'I' simply		headers. Again, the tunnel encapsulation function at 'I' simply
	copies incoming headers, unaware that the load regulator will		copies incoming headers, unaware that the load regulator will
	subsequently reset its outer headers.		subsequently reset its outer headers.

	o The PWE3 working group of the IETF is considering the problem of		o The PWE3 working group of the IETF is considering the problem of
	how and whether an aggregate edge-to-edge pseudo-wire emulation		how and whether an aggregate edge-to-edge pseudo-wire emulation
	should respond to congestion [I-D.ietf-pwe3-congestion-frmwk].		should respond to congestion [I-D.ietf-pwe3-congestion-frmwk].
	Although the study is still at the requirements stage, some		Although the study is still at the requirements stage, some
	(controversial) solution proposals include in-path load regulation		(controversial) solution proposals include in-path load regulation
	at the ingress to the tunnel that could lead to tunnel		at the ingress to the tunnel that could lead to tunnel

	arrangements with similar complexity to that of Figure 4.		arrangements with similar complexity to that of Figure 7.

	These are not contrived scenarios--they could be a lot worse. For		These are not contrived scenarios--they could be a lot worse. For
	instance, a host may create a tunnel for IPsec which is placed inside		instance, a host may create a tunnel for IPsec which is placed inside
	a tunnel for Mobile IP over a remote part of its path. And around		a tunnel for Mobile IP over a remote part of its path. And around
	this all we may have MPLS labels being pushed and popped as packets		this all we may have MPLS labels being pushed and popped as packets
	pass across different core networks. Similarly, it is possible that		pass across different core networks. Similarly, it is possible that
	subnets could be built from link technology (e.g. future Ethernet		subnets could be built from link technology (e.g. future Ethernet
	switches) so that link headers being added and removed could involve		switches) so that link headers being added and removed could involve
	congestion notification in future Ethernet link headers with all the		congestion notification in future Ethernet link headers with all the
	same issues as with IP in IP tunnels.		same issues as with IP in IP tunnels.

End of changes. 65 change blocks.
	162 lines changed or deleted		172 lines changed or added