Diff: draft-briscoe-re-pcn-border-cheat-01.txt - draft-briscoe-re-pcn-border-cheat-02.txt

	draft-briscoe-re-pcn-border-cheat-01.txt	draft-briscoe-re-pcn-border-cheat-02.txt

	PCN Working Group B. Briscoe	PCN Working Group B. Briscoe
	Internet-Draft BT & UCL	Internet-Draft BT & UCL

	Intended status: Informational February 25, 2008	Intended status: Standards Track September 13, 2008
	Expires: August 28, 2008	Expires: March 17, 2009


	Emulating Border Flow Policing using Re-ECN on Bulk Data	Emulating Border Flow Policing using Re-PCN on Bulk Data
	draft-briscoe-re-pcn-border-cheat-01	draft-briscoe-re-pcn-border-cheat-02

	Status of this Memo	Status of this Memo

	By submitting this Internet-Draft, each author represents that any	By submitting this Internet-Draft, each author represents that any
	applicable patent or other IPR claims of which he or she is aware	applicable patent or other IPR claims of which he or she is aware
	have been or will be disclosed, and any of which he or she becomes	have been or will be disclosed, and any of which he or she becomes
	aware will be disclosed, in accordance with Section 6 of BCP 79.	aware will be disclosed, in accordance with Section 6 of BCP 79.

	Internet-Drafts are working documents of the Internet Engineering	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF), its areas, and its working groups. Note that	Task Force (IETF), its areas, and its working groups. Note that

	skipping to change at page 1, line 34	skipping to change at page 1, line 34
	and may be updated, replaced, or obsoleted by other documents at any	and may be updated, replaced, or obsoleted by other documents at any
	time. It is inappropriate to use Internet-Drafts as reference	time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."	material or to cite them other than as "work in progress."

	The list of current Internet-Drafts can be accessed at	The list of current Internet-Drafts can be accessed at
	http://www.ietf.org/ietf/1id-abstracts.txt.	http://www.ietf.org/ietf/1id-abstracts.txt.

	The list of Internet-Draft Shadow Directories can be accessed at	The list of Internet-Draft Shadow Directories can be accessed at
	http://www.ietf.org/shadow.html.	http://www.ietf.org/shadow.html.


	This Internet-Draft will expire on August 28, 2008.	This Internet-Draft will expire on March 17, 2009.

	Copyright Notice

	Copyright (C) The IETF Trust (2008).

	Abstract	Abstract

	Scaling per flow admission control to the Internet is a hard problem.	Scaling per flow admission control to the Internet is a hard problem.

	A recently proposed approach combines Diffserv and pre-congestion	The approach of combining Diffserv and pre-congestion notification
	notification (PCN) to provide a service slightly better than Intserv	(PCN) provides a service slightly better than Intserv controlled load
	controlled load. It scales to networks of any size, but only if	that scales to networks of any size without needing Diffserv's usual
	domains trust each other to comply with admission control and rate	overprovisioning, but only if domains trust each other to comply with
	policing. This memo claims to solve this trust problem without	admission control and rate policing. This memo claims to solve this
	losing scalability. It describes bulk border policing that provides	trust problem without losing scalability. It provides a sufficient
	a sufficient emulation of per-flow policing with the help of another	emulation of per-flow policing at borders but with only passive bulk
	recently proposed extension to ECN, involving re-echoing ECN feedback	metering rather than per-flow processing. Measurements are
	(re-ECN). With only passive bulk measurements at borders, sanctions	sufficient to apply penalties against cheating neighbour networks.
	can be applied against cheating networks.

	Table of Contents	Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 7	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 8
	2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 9	2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 11
	3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 10	3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 11
	3.1. The Traditional Per-flow Policing Problem . . . . . . . . 10	3.1. The Traditional Per-flow Policing Problem . . . . . . . . 11
	3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 12	3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 14
	4. Re-ECN Protocol for an RSVP (or similar) Transport . . . . . . 14	4. Re-ECN Protocol in IP with Two Congestion Marking Levels . . . 17
	4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 14	4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 17
	4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or	4.2. Re-PCN Abstracted Network Layer Wire Protocol (IPv4 or
	v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 16	v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
	4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 16	4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 18
	4.2.2. Re-ECN Combined with Pre-Congestion Notification	4.2.2. Re-ECN Combined with Pre-Congestion Notification

	(re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 18	(re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 20
	4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 20	4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 22
	4.3.1. Protocol Operation for an Established Flow . . . . . . 20	4.3.1. Protocol Operation for an Established Flow . . . . . . 23
	4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 21	4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 24
	4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 22	4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 26
	4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 23	4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 26
	4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 25	4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 28
	5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 25	5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 28
	5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 25	5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 28
	5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 26	5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 30
	5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 28	5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 31
	5.4. Emulation of Per-Flow Rate Policing: Rationale and	5.4. Emulation of Per-Flow Rate Policing: Rationale and

	Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 31	Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 34
	5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 32	5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 36
	5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 34	5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 38
	5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 34	5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 38
	5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 38	5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 41
	5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 39	5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 42
	6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 40	6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
	7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 42	7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 46
	8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 43	8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 47
	9. Security Considerations . . . . . . . . . . . . . . . . . . . 45	9. Security Considerations . . . . . . . . . . . . . . . . . . . 49
	10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46	10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 50
	11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 46	11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 50
	12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47	12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 51
	13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 47	13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 52
	14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 48	14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 52
	14.1. Normative References . . . . . . . . . . . . . . . . . . . 48	14.1. Normative References . . . . . . . . . . . . . . . . . . . 52
	14.2. Informative References . . . . . . . . . . . . . . . . . . 48	14.2. Informative References . . . . . . . . . . . . . . . . . . 53
	Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 50	Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 55
	A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 50	A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 55
	A.2. Downstream Congestion Metering Algorithms . . . . . . . . 51	A.2. Downstream Congestion Metering Algorithms . . . . . . . . 56
	A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 51	A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 56
	A.2.2. Inflation Factor for Persistently Negative Flows . . . 52	A.2.2. Inflation Factor for Persistently Negative Flows . . . 56
	A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 52	A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 57
	Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 53	Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 57
	Intellectual Property and Copyright Statements . . . . . . . . . . 54	Intellectual Property and Copyright Statements . . . . . . . . . . 59

	Status (to be removed by the RFC Editor)	Status (to be removed by the RFC Editor)


		The IETF PCN working group is initially chartered to consider PCN
		domains only under a single trust authority. However, after its
		initial work is complete the charter says the working group may re-
		charter to consider concatenated Diffserv domains, amongst other new
		work items. The charter ends by stating "The details of these work
		items are outside the scope of the initial phase; but the WG may
		consider their requirements to design components that are
		sufficiently general to support such extensions in the future."

		This memo is therefore contributed to describe how PCN could be
		extended to inter-domain. We wanted to document the solution to
		reduce the chances that something else eats up the codepoint space
		needed before PCN re-charters to consider inter-domain. Losing the
		chance to standardise this simple, scalable solution to the problem
		of inter-domain flow admission control would be unfortunate
		(understatement), given it took years to find, and even then it was
		very difficult to find codepoint space for it.

		The scheme described here (Section 4) requires the PCN ingress
		gateway to re-echo any PCN feedback it receives back into the forward
		stream of IP packets (hence we call this scheme re-PCN). Re-PCN
		works in a very similar way to the re-ECN proposal on which it is
		based [I-D.briscoe-tsvwg-re-ecn-tcp], the only difference being that
		PCN might encode three states of congestion, whereas ECN encodes two.
		This document is written to stand alone from re-ECN, so that readers
		do not have to read [I-D.briscoe-tsvwg-re-ecn-tcp].

		The authors seek comments from the Internet community on whether
		combining PCN and re-ECN to create re-PCN in this way is a sufficient
		solution to the problem of scaling microflow admission control to the
		Internet as a whole. Here we emphasise that scaling is not just an
		issue of numbers of flows, but also the number of security entities--
		networks and users--who may all have conflicting interests.

	This memo is posted as an Internet-Draft with the intent to	This memo is posted as an Internet-Draft with the intent to
	eventually be broken down into two documents: one for the standards	eventually be broken down into two documents: one for the standards
	track and one for informational status. But until it becomes an item	track and one for informational status. But until it becomes an item
	of IETF working group business the whole proposal has been kept	of IETF working group business the whole proposal has been kept
	together to aid understanding. Only the text of Section 4 of this	together to aid understanding. Only the text of Section 4 of this

	document requires standardisation. The rest of the sections describe	document is intended to be normative (requiring standardisation).
	how a system might be built from these protocols by the operators of	The rest of the sections are merely informative, describing how a
	an internetwork. Note in particular that the policing and monitoring	system might be built from these protocols by the operators of an
		internetwork. Note in particular that the policing and monitoring
	functions proposed for the trust boundaries between operators would	functions proposed for the trust boundaries between operators would

	not need standardisation by the IETF. They simply represent one way	not need standardisation by the IETF. They simply represent one
	that the proposed protocols could be used to extend the PCN	possible way that the proposed protocols could be used to extend the
	architecture [I-D.ietf-pcn-architecture] to span multiple domains	PCN architecture [I-D.ietf-pcn-architecture] to span multiple domains
	without mutual trust between the operators.	without mutual trust between the operators.


	To realise the system described, this document also depends on	Dependencies (to be removed by the RFC Editor)
	standardisation of three other documents currently being discussed
	(but not on the standards track) in the IETF Transport Area: pre-
	congestion notification (PCN) marking on interior nodes [PCN];
	feedback of aggregate PCN measurements by suitably extending the
	admission control signalling protocol (e.g. RSVP) [RSVP-ECN]; and
	re-insertion of the feedback into the forward stream of IP packets by
	the PCN ingress gateway in a similar way to that proposed for a TCP
	source [Re-TCP].


	The authors seek comments from the Internet community on whether	To realise the system described, this document also depends on other
	combining PCN and re-ECN in this way is a sufficient solution to the	documents chartered in the IETF Transport Area progressing along the
	problem of scaling microflow admission control to the Internet as a	standards track:
	whole, even though such scaling must take account of the increasing
	numbers of networks and users who may all have conflicting interests.	o Pre-congestion notification (PCN) marking on interior nodes
		[I-D.eardley-pcn-marking-behaviour], chartered for standardisation
		in the PCN w-g;

		o The baseline encoding of pre-congestion notification in the IP
		header [I-D.moncaster-pcn-baseline-encoding], also chartered for
		standardisation in the PCN w-g;

		o Feedback of aggregate PCN measurements by suitably extending the
		admission control signalling protocol (e.g. RSVP extension
		[RSVP-ECN] or NSIS extension [I-D.arumaithurai-nsis-pcn]).

		The baseline encoding makes no new demands on codepoint space in the
		IP header but provides just two PCN encoding states (not marked and
		marked). The PCN architecture recognises that operators might want
		PCN marking to trigger two functions (admission control and flow
		termination) at different levels of pre-congestion, which seems to
		require three encoding states. A scheme has been proposed
		[I-D.charny-pcn-single-marking] that can do both functions with just
		two encoding states, but simulations have shown it performs poorly
		under certain conditions that might be typical. As it seems likely
		that PCN might need three encoding states to be fully operational, we
		want to be sure that three encoding states can be extended to work
		inter-domain. Therefore, we have defined a three-state extension
		encoding scheme in this document, then we have added the re-PCN
		scheme to it. The three-state encoding we have chosen depends on
		standardisation of yet another document in the IETF Transport Area:

		o Propagation beyond the tunnel decapsulator of any changes in the
		ECN field to ECT(0) or ECT(1) made within a tunnel (the ideal
		decapsulation rules of [I-D.briscoe-tsvwg-ecn-tunnel]);

	Changes from previous drafts (to be removed by the RFC Editor)	Changes from previous drafts (to be removed by the RFC Editor)

	Full diffs of incremental changes between drafts are available at	Full diffs of incremental changes between drafts are available at
	URL: <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#repcn>	URL: <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#repcn>

		Changes from <draft-briscoe-re-pcn-border-cheat-01> to
		<draft-briscoe-re-pcn-border-cheat-02> (current version):

		Considerably updated the 'Status' note to explain the
		relationship of this draft to other documents in the IETF
		process (or not) and to chartered PCN w-g activity.

		Split out the dependencies into a separate note and added
		dependencies on new PCN documents in progress.

		Made scalability motivation in the introduction clearer,
		explaining why Diffserv over-provisioning doesn't scale unless
		PCN is used.

		Clarified that the standards action in Section 4 is to define
		the meanings of the combination of fields in the IP header: the
		RE flag and 2-level congestion marking in the ECN field. And
		that it is not characterised by a particular feedback style in
		the transport.

		Switched round the two ECT codepoints to be compatible with the
		new PCN baseline encoding and used less confusing naming for
		re-PCN codepoints (Section 4).

		Generalised rules for encoding probes when bootstrapping or re-
		starting aggregates & flows (Section 4.3.2).

		Downgraded drop sanction behaviour from MUST to conditional
		SHOULD (Section 5.5).

		Added incremental deployment safety justification for choice of
		which way round the RE flag works (Section 7).

		Added possible vulnerability to brief attacks and possible
		solution to security considerations (Section 9).

		Updated references and terminology, particularly taking account
		of recent new PCN w-g documents.

		Replaced suggested Ingress Gateway Algorithm for Blanking the
		RE flag (Appendix A.1)


		Clarifications throughout.
	Changes from <draft-briscoe-re-pcn-border-cheat-00> to	Changes from <draft-briscoe-re-pcn-border-cheat-00> to

	<draft-briscoe-re-pcn-border-cheat-01> (current version):	<draft-briscoe-re-pcn-border-cheat-01>:

	Updated references.	Updated references.

	Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-01>	Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-01>
	to <draft-briscoe-re-pcn-border-cheat-00>:	to <draft-briscoe-re-pcn-border-cheat-00>:


	Changed filename to associate it with the new IETF PCN w-g, rather	Changed filename to associate it with the new IETF PCN w-g,
	than the TSVWG w-g.	rather than the TSVWG w-g.


	Introduction: Clarified that bulk policing only replaces per-flow	Introduction: Clarified that bulk policing only replaces per-
	policing at interior inter-domain borders, while per-flow policing	flow policing at interior inter-domain borders, while per-flow
	is still needed at the access interface to the internetwork. Also	policing is still needed at the access interface to the
	clarified that the aim is to neutralise any gains from cheating	internetwork. Also clarified that the aim is to neutralise any
	using local bilateral contracts between neighbouring networks,	gains from cheating using local bilateral contracts between
	rather than merely identifying remote cheaters.	neighbouring networks, rather than merely identifying remote
		cheaters.


	Section 3.1: Described the traditional per-flow policing problem	Section 3.1: Described the traditional per-flow policing
	with inter-domain reservations more precisely, particularly with	problem with inter-domain reservations more precisely,
	respect to direction of reservations and of traffic flows.	particularly with respect to direction of reservations and of
		traffic flows.


	Clarified status of Section 5 onwards, in particular that policers	Clarified status of Section 5 onwards, in particular that
	and monitors would not need standardisation, but that the protocol	policers and monitors would not need standardisation, but that
	in Section 4 would require standardisation.	the protocol in Section 4 would require standardisation.


	Section 5.6.2 on competitive routing: Added discussion of direct	Section 5.6.2 on competitive routing: Added discussion of
	incentives for a receiver to switch to a different provider even	direct incentives for a receiver to switch to a different
	if the provider has a termination monopoly.	provider even if the provider has a termination monopoly.


	Clarified that "Designing in security from the start" merely means	Clarified that "Designing in security from the start" merely
	allowing codepoint space in the PCN protocol encoding. There is	means allowing codepoint space in the PCN protocol encoding.
	no need to actually implement inter-domain security mechanisms for	There is no need to actually implement inter-domain security
	solutions confined to a single domain.	mechanisms for solutions confined to a single domain.

	Updated some references and added a ref to the Security	Updated some references and added a ref to the Security
	Considerations, as well as other minor corrections and	Considerations, as well as other minor corrections and
	improvements.	improvements.

	Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to	Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to
	<draft-briscoe-tsvwg-re-ecn-border-cheat-01>:	<draft-briscoe-tsvwg-re-ecn-border-cheat-01>:


	Added subsection on Border Accounting Mechanisms (Section 5.6.1)	Added subsection on Border Accounting Mechanisms
		(Section 5.6.1)
	Section 4.2 on the re-ECN wire protocol clarified and re-organised	Section 4.2 on the re-ECN wire protocol clarified and re-
	to separately discuss re-ECN for default ECN marking and for pre-	organised to separately discuss re-ECN for default ECN marking
	congestion marking (PCN).	and for pre-congestion marking (PCN).

	Router Forwarding Behaviour subsection added to re-organised	Router Forwarding Behaviour subsection added to re-organised

	section on Protocol Operation (Section 4.3). Extensions section	section on Protocol Operation (Section 4.3). Extensions
	moved within Protocol Operations.	section moved within Protocol Operations.


	Emulating Border Policing (Section 5) reorganised, starting with a	Emulating Border Policing (Section 5) reorganised, starting
	new Terminology subsection heading, and a simplified overview	with a new Terminology subsection heading, and a simplified
	section. Added a large new subsection on Border Accounting	overview section. Added a large new subsection on Border
	Mechanisms within a new section bringing together other	Accounting Mechanisms within a new section bringing together
	subsections on Border Mechanisms generally (Section 5.6). Some	other subsections on Border Mechanisms generally (Section 5.6).
	text moved from old subsections into these new ones.	Some text moved from old subsections into these new ones.

	Added section on Incremental Deployment (Section 7), drawing	Added section on Incremental Deployment (Section 7), drawing
	together relevant points about deployment made throughout.	together relevant points about deployment made throughout.

	Sections on Design Rationale (Section 8) and Security	Sections on Design Rationale (Section 8) and Security
	Considerations (Section 9) expanded with some new material,	Considerations (Section 9) expanded with some new material,
	including new attacks and their defences.	including new attacks and their defences.


	Suggested Border Metering Algorithms improved (Appendix A.2) for	Suggested Border Metering Algorithms improved (Appendix A.2)
	resilience to newly identified attacks.	for resilience to newly identified attacks.

	1. Introduction	1. Introduction

	The Internet community largely lost interest in the Intserv	The Internet community largely lost interest in the Intserv
	architecture after it was clarified that it would be unlikely to	architecture after it was clarified that it would be unlikely to
	scale to the whole Internet [RFC2208]. Although Intserv mechanisms	scale to the whole Internet [RFC2208]. Although Intserv mechanisms
	proved impractical, the bandwidth reservation service it aimed to	proved impractical, the bandwidth reservation service it aimed to
	offer is still very much required.	offer is still very much required.

	A recently proposed approach [I-D.ietf-pcn-architecture] combines	A recently proposed approach [I-D.ietf-pcn-architecture] combines
	Diffserv and pre-congestion notification (PCN) to provide a service	Diffserv and pre-congestion notification (PCN) to provide a service

	slightly better than Intserv controlled load [RFC2211]. It scales to	slightly better than Intserv controlled load [RFC2211]. PCN does not
	any size network, but only if domains trust their neighbours to have	require the considerable over-provisioning that is normally required
	checked that upstream customers aren't taking more bandwidth than	for admission control over Diffserv [RFC2998] to be robust against
	they reserved, either accidentally or deliberately. This memo	re-routes or variation in the traffic matrix. It has been proved
	describes border policing measures so that one network can protect	that Diffserv's over-provisioning requirement grows linearly with the
	its interests, even if networks around it are deliberately trying to	network diameter in hops [QoS_scale].
	cheat. The approach provides a sufficient emulation of flow rate
	policing at trust boundaries but without per-flow processing. The
	emulation is not perfect, but it is sufficient to ensure that the
	punishment is at least proportionate to the severity of the cheat.
	Per-flow rate policing for each reservation is still expected to be
	used at the access edge of the internetwork, but at the borders
	between networks bulk policing can be used to emulate per-flow
	policing.


	The aim is to be able to scale controlled load service to any number	A number of PCN domains can be concatenated into a larger PCN region
	of endpoints, even though such scaling must take account of the	without any per-flow processing between them, but only if each domain
	increasing numbers of networks and users who may all have conflicting	trusts the ingress network to have checked that upstream customers
	interests. To achieve such scaling, this memo combines two recent	aren't taking more bandwidth than they reserved, either accidentally
	proposals, both of which it briefly recaps:	or deliberately. Unfortunately, networks can gain considerably by
		breaking this trust. One way for a network to protect itself against
		others is to handle flow signalling at its own border and police
		traffic against reservations itself. However, this reintroduces the
		per-flow unscalability at borders that Intserv over Diffserv suffers
		from.


	o A deployment model for admission control over Diffserv using pre-	This memo describes a protocol called re-PCN that enables bulk border
	congestion notification [I-D.ietf-pcn-architecture] describes how	measurements so that one network can protect its interests, even if
	bulk pre-congestion notification on routers within an edge-to-edge	networks around it are deliberately trying to cheat. The approach
	Diffserv region can emulate the precision of per-flow admission	provides a sufficient emulation of flow rate policing at trust
	control to provide controlled load service without unscalable per-	boundaries but without per-flow processing. Per-flow rate policing
	flow processing;	for each reservation is still expected to be used at the access edge
		of the internetwork, but at the borders between networks bulk
		policing can be used to emulate per-flow policing. The emulation is
		not perfect, but it is sufficient to ensure that the punishment is at
		least proportionate to the severity of the cheat. Re-PCN neither
		requires the unscalable over-provisioning of Diffserv nor the per-
		flow processing at borders of Intserv over Diffserv.


	o Re-ECN: Adding Accountability to TCP/IP [Re-TCP]. The trick that	It should therefore scale controlled load service to the whole
	addresses cheating at borders is to recognise that border policing	internetwork without the cost of Diffserv's linearly increasing over-
	is mainly necessary because cheating upstream networks will admit	provisioning, or the cost of per-flow policing at each border. To
	traffic when they shouldn't only as long as they don't directly	achieve such scaling, this memo combines two recent proposals, both
	experience the downstream congestion their misbehaviour can cause.	of which it briefly recaps:
	The re-ECN protocol requires upstream nodes to declare expected
	downstream congestion in all forwarded packets and it makes it in	o The pre-congestion notification (PCN)
	their interests to declare it honestly. Operators can then	architecture [I-D.ietf-pcn-architecture] describes how bulk pre-
	monitor downstream congestion in bulk at borders to emulate	congestion notification on routers within an edge-to-edge Diffserv
	policing.	region can emulate the precision of per-flow admission control to
		provide controlled load service without unscalable per-flow
		processing;

		o Re-ECN: Adding Accountability to TCP/
		IP [I-D.briscoe-tsvwg-re-ecn-tcp].

		We coin the term re-PCN for the combination of PCN and re-ECN.

		The trick that addresses cheating at borders is to recognise that
		border policing is mainly necessary because cheating upstream
		networks will admit traffic when they shouldn't only as long as they
		don't directly experience the downstream congestion their
		misbehaviour can cause. The re-ECN protocol ensures a network can be
		made to experience the congestion it causes in other networks. Re-
		ECN requires the sending node to declare expected downstream
		congestion in all packets and it makes it in its interest to declare
		this honestly. At the border between upstream network 'A' and
		downstream network 'B' (say), both networks can monitor packets
		crossing the border to measure how much congestion 'A' is causing in
		'B' and beyond. 'B' can then include a limit or penalty based on
		this metric in its contract with 'A'. This is how 'A' experiences
		the effect of congestion it causes in other networks. 'A' no longer
		gains by admitting traffic when it shouldn't, which is why we can say
		re-PCN emulates flow policing, even though it doesn't measure flows.
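
The border measurement described above can be illustrated with a minimal sketch. It is not the algorithm of Appendix A.2.1, and it rests on assumptions not stated in this part of the text: that each packet crossing the border can be classified as "positive" (the sender's declaration of expected congestion), "negative" (congestion marking already applied upstream) or "neutral", and that downstream congestion is estimated in bulk as positive bytes minus negative bytes. The names Packet, worth, downstream_congestion_bytes and border_penalty are hypothetical, as is the flat price per byte.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Packet:
    """One packet seen at the border between networks 'A' and 'B'.

    worth is an assumed abstraction of the packet's marking:
      +1  positive: the sender declared expected congestion
       0  neutral
      -1  negative: congestion marked by a node upstream of the border
    """
    size_bytes: int
    worth: int


def downstream_congestion_bytes(packets: Iterable[Packet]) -> int:
    """Bulk meter: positive bytes minus negative bytes, with no per-flow state."""
    return sum(p.worth * p.size_bytes for p in packets)


def downstream_congestion_fraction(packets: Iterable[Packet]) -> float:
    """Downstream congestion as a fraction of all bytes crossing the border."""
    packets = list(packets)
    total = sum(p.size_bytes for p in packets)
    if total == 0:
        return 0.0
    return downstream_congestion_bytes(packets) / total


def border_penalty(packets: Iterable[Packet], price_per_byte: float) -> float:
    """Hypothetical contractual penalty 'B' applies to 'A'.

    A negative balance is floored at zero here; handling of persistently
    negative traffic is a separate matter and is not modelled.
    """
    return max(0, downstream_congestion_bytes(packets)) * price_per_byte


if __name__ == "__main__":
    # 100 packets of 1000 bytes: 3 positive, 1 negative, 96 neutral.
    sample = (
        [Packet(1000, +1)] * 3
        + [Packet(1000, -1)] * 1
        + [Packet(1000, 0)] * 96
    )
    print(downstream_congestion_bytes(sample))
    print(downstream_congestion_fraction(sample))
    print(border_penalty(sample, price_per_byte=0.001))
```

The meter keeps only two running totals per border interface, which is the sense in which the measurement is bulk rather than per-flow.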

	The aim is not to enable a network to _identify_ some remote cheating	The aim is not to enable a network to _identify_ some remote cheating
	party, which would rarely be useful given the victim network would be	party, which would rarely be useful given the victim network would be
	unlikely to be able to seek redress from a cheater in some remote	unlikely to be able to seek redress from a cheater in some remote
	part of the world with whom no direct contractual relationship	part of the world with whom no direct contractual relationship
	exists. Rather the aim is to ensure that any gain from cheating will	exists. Rather the aim is to ensure that any gain from cheating will
	be cancelled out by penalties applied to the cheating party by its	be cancelled out by penalties applied to the cheating party by its
	local network. Further, the solution ensures each of the chain of	local network. Further, the solution ensures each of the chain of
	networks between the cheater and the victim will lose out if it	networks between the cheater and the victim will lose out if it
	doesn't apply penalties to its neighbour. Thus the solution builds	doesn't apply penalties to its neighbour. Thus the solution builds
	on the local bilateral contractual relationships that already exist	on the local bilateral contractual relationships that already exist
	between neighbouring networks.	between neighbouring networks.

	Rather than the end-to-end arrangement used when re-ECN was specified	Rather than the end-to-end arrangement used when re-ECN was specified

	for the TCP transport [Re-TCP], this memo specifies re-ECN in an	for the TCP transport [I-D.briscoe-tsvwg-re-ecn-tcp], this memo
	edge-to-edge arrangement, making it applicable to the above	specifies re-ECN in an edge-to-edge arrangement, making it applicable
	deployment model for admission control over Diffserv. Also, rather	to deployment models where admission control over Diffserv is based
	than using a TCP transport for regular congestion feedback, this memo	on pre-congestion notification. Also, rather than using a TCP
	specifies re-ECN using RSVP as the transport for feedback [RSVP-ECN].	transport for regular congestion feedback, this memo specifies re-ECN
	A similar deployment model, but with a different transport for	using RSVP as the transport for feedback [RSVP-ECN]. RSVP is used to
	signalling congestion feedback could be used (e.g. Arumaithurai	be concrete, but a similar deployment model with a different
	[I-D.arumaithurai-nsis-pcn] and RMD [I-D.ietf-nsis-rmd] use NSIS).	transport for signalling congestion feedback could be used (e.g.
		Arumaithurai [I-D.arumaithurai-nsis-pcn] and RMD [I-D.ietf-nsis-rmd]
		both use NSIS).


	This memo aims to do two things: i) define how to apply the re-ECN	This memo aims to do two things: i) define how to apply the re-PCN
	protocol to the admission control over Diffserv scenario; and ii)	protocol to the admission control over Diffserv scenario; and ii)

	explain why re-ECN sufficiently emulates border policing in that	explain why re-PCN sufficiently emulates border policing in that
	scenario. Most of the memo is taken up with the second aim;	scenario. Most of the memo is taken up with the second aim;

	explaining why it works. Applying re-ECN to the scenario actually	explaining why it works. Applying re-PCN to the scenario actually
	involves quite a trivial modification to the ingress gateway. That	involves quite a trivial modification to the ingress gateway. That
	modification can be added to gateways later, so our immediate goal is	modification can be added to gateways later, so our immediate goal is
	to convince everyone to have the foresight to define the PCN wire	to convince everyone to have the foresight to define the PCN wire
	protocol encoding to accommodate the extended codepoints defined in	protocol encoding to accommodate the extended codepoints defined in
	this document, whether first deployments require border policing or	this document, whether first deployments require border policing or
	not. Otherwise, when we want to add policing, we will have built	not. Otherwise, when we want to add policing, we will have built
	ourselves a legacy problem. In other words, we aim to convince	ourselves a legacy problem. In other words, we aim to convince
	people to "Design in security from the start."	people to "Design in security from the start."

	The body of this memo is structured as follows:	The body of this memo is structured as follows:

	Section 3 describes the border policing problem. We recap the	Section 3 describes the border policing problem. We recap the
	traditional, unscalable view of how to solve the problem, and we	traditional, unscalable view of how to solve the problem, and we
	recap the admission control solution which has the scalability we	recap the admission control solution which has the scalability we
	do not want to lose when we add border policing;	do not want to lose when we add border policing;


	Section 4 specifies the re-ECN protocol solution in detail;	Section 4 specifies the re-PCN protocol solution in detail;

	Section 5 explains how to use the protocol to emulate border	Section 5 explains how to use the protocol to emulate border
	policing, and why it works;	policing, and why it works;

	Section 6 analyses the security of the proposed solution;	Section 6 analyses the security of the proposed solution;

	Section 8 explains the sometimes subtle rationale behind our	Section 8 explains the sometimes subtle rationale behind our
	design decisions;	design decisions;

	Section 9 comments on the overall robustness of the security	Section 9 comments on the overall robustness of the security

	skipping to change at page 10, line 49	skipping to change at page 12, line 41
	were permitted, the ability of admission control to give assurances	were permitted, the ability of admission control to give assurances
	to other flows will break.	to other flows will break.

	Just as sources need not be trusted to keep within the requested flow	Just as sources need not be trusted to keep within the requested flow
	spec, whole networks might also try to cheat. We will now set up a	spec, whole networks might also try to cheat. We will now set up a
	concrete scenario to illustrate such cheats. Imagine reservations	concrete scenario to illustrate such cheats. Imagine reservations
	for unidirectional flows, through at least two networks, an edge	for unidirectional flows, through at least two networks, an edge
	network and its downstream transit provider. Imagine the edge	network and its downstream transit provider. Imagine the edge
	network charges its retail customers per reservation but also has to	network charges its retail customers per reservation but also has to
	pay its transit provider a charge per reservation. Typically, both	pay its transit provider a charge per reservation. Typically, both

	its selling and buying charges might depend on the duration and rate	the charges for buying from the transit and selling to the retail
	of each reservation. The levels of the actual selling and buying	customer might depend on the duration and rate of each reservation.
	prices are irrelevant to our discussion (most likely the network will	The levels of the actual selling and buying prices are irrelevant to
	sell at a higher price than it buys, of course).	our discussion (most likely the network will sell at a higher price
		than it buys, of course).

	A cheating ingress network could systematically reduce the size of	A cheating ingress network could systematically reduce the size of
	its retail customers' reservation signalling requests (e.g. the	its retail customers' reservation signalling requests (e.g. the
	SENDER_TSPEC object in RSVP's PATH message) before forwarding them to	SENDER_TSPEC object in RSVP's PATH message) before forwarding them to
	its transit provider and systematically reinstate the responses on	its transit provider and systematically reinstate the responses on
	the way back (e.g. the FLOWSPEC object in RSVP's RESV message). It	the way back (e.g. the FLOWSPEC object in RSVP's RESV message). It
	would then receive an honest income from its upstream retail customer	would then receive an honest income from its upstream retail customer
	but only pay for fraudulently smaller reservations downstream. A	but only pay for fraudulently smaller reservations downstream. A
	similar but opposite trick (increasing the TSPEC and decreasing the	similar but opposite trick (increasing the TSPEC and decreasing the
	FLOWSPEC) could be perpetrated by the receiver's access network if	FLOWSPEC) could be perpetrated by the receiver's access network if
	the reservation was paid for by the receiver.	the reservation was paid for by the receiver.

	Equivalently, a cheating ingress network may feed the traffic from a	Equivalently, a cheating ingress network may feed the traffic from a
	number of flows into an aggregate reservation over the transit that	number of flows into an aggregate reservation over the transit that
	is smaller than the total of all the flows. Because of these fraud	is smaller than the total of all the flows. Because of these fraud
	possibilities, in traditional QoS reservation architectures the	possibilities, in traditional QoS reservation architectures the

	downstream network polices at each border. The policer checks that	downstream network polices traffic at each border. The policer
	the actual sent data rate of each flow is within the signalled	checks that the actual sent data rate of each flow is within the
	reservation.	signalled reservation.

	Reservation signalling could be authenticated end to end, but this	Reservation signalling could be authenticated end to end, but this
	wouldn't prevent the aggregation cheat just described. For this	wouldn't prevent the aggregation cheat just described. For this
	reason, and to avoid the need for a global PKI, signalling integrity	reason, and to avoid the need for a global PKI, signalling integrity
	is typically only protected on a hop-by-hop basis [RFC2747].	is typically only protected on a hop-by-hop basis [RFC2747].

	A variant of the above cheat is where a router in an honest	A variant of the above cheat is where a router in an honest
	downstream network denies admission to a new reservation, but a	downstream network denies admission to a new reservation, but a
	cheating upstream network still admits the flow. For instance, the	cheating upstream network still admits the flow. For instance, the
	networks may be using Diffserv internally, but Intserv admission	networks may be using Diffserv internally, but Intserv admission

	skipping to change at page 12, line 47	skipping to change at page 14, line 45
	<-------- edge-to-edge signalling ------->	<-------- edge-to-edge signalling ------->
	(for admission control)	(for admission control)

	<-------------------end-to-end QoS signalling protocol------------->	<-------------------end-to-end QoS signalling protocol------------->

	Figure 1: Generic Scenario (see text for explanation of terms)	Figure 1: Generic Scenario (see text for explanation of terms)

	An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1)	An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1)
	connect the interior Diffserv region to the edge access networks	connect the interior Diffserv region to the edge access networks
	where routers (not shown) use per-flow reservation processing.	where routers (not shown) use per-flow reservation processing.

	Within the Diffserv region are three interior domains, A, B and C, as	Within the Diffserv region are three interior domains, 'A', 'B' and
	well as the inward facing interfaces of the ingress and egress	'C', as well as the inward facing interfaces of the ingress and
	gateways. An ingress and egress border router (BR) is shown	egress gateways. An ingress and egress border router (BR) is shown
	interconnecting each interior domain with the next. There may be	interconnecting each interior domain with the next. There will
	other interior routers (not shown) within each interior domain.	typically be other interior routers (not shown) within each interior
		domain.

	In two paragraphs we now briefly recap how pre-congestion	In two paragraphs we now briefly recap how pre-congestion
	notification is intended to be used to control flow admission to a	notification is intended to be used to control flow admission to a
	large Diffserv region. The first paragraph describes data plane	large Diffserv region. The first paragraph describes data plane
	functions and the second describes signalling in the control plane.	functions and the second describes signalling in the control plane.
	We omit many details from [I-D.ietf-pcn-architecture] including	We omit many details from [I-D.ietf-pcn-architecture] including
	behaviour during routing changes. For brevity here we assume other	behaviour during routing changes. For brevity here we assume other
	flows are already in progress across a path through the Diffserv	flows are already in progress across a path through the Diffserv
	region before a new one arrives, but how bootstrap works is described	region before a new one arrives, but how bootstrap works is described
	in Section 4.3.2.	in Section 4.3.2.

	Figure 1 shows a single simplex reserved flow from the sending (Sx)	Figure 1 shows a single simplex reserved flow from the sending (Sx)
	end host to the receiving (Rx) end host. The ingress gateway polices	end host to the receiving (Rx) end host. The ingress gateway polices

	incoming traffic within its admitted reservation and remarks it to	incoming traffic and colours conforming traffic within an admitted
	turn on an ECN-capable codepoint [RFC3168] and the controlled load	reservation to a combination of Diffserv codepoint and ECN field that
	(CL) Diffserv codepoint. Together, these codepoints define which	defines the traffic as 'PCN-enabled'. This redefines the meaning of
	traffic is entitled to the enhanced scheduling of the CL behaviour	the ECN field as a PCN field, which is largely the same as ECN
	aggregate on routers within the Diffserv region. The CL PHB of	[RFC3168], but with slightly different semantics defined in
	interior routers consists of a scheduling behaviour and a new ECN	[I-D.moncaster-pcn-baseline-encoding] (or various extensions that are
	marking behaviour that we call `pre-congestion notification' [PCN].	currently experimental). The Diffserv region is called a PCN-region
	The CL PHB simply re-uses the definition of expedited forwarding	because all the queues within it are PCN-enabled. This means the
	(EF) [RFC3246] for its scheduling behaviour. But it incorporates a	per-hop behaviour they apply to PCN-enabled traffic consists of both
	new ECN marking behaviour, which sets the ECN field of an increasing	a scheduling behaviour and a new ECN marking behaviour that we call
	number of CL packets to the admission marked (AM) codepoint as they	`pre-congestion notification' [I-D.eardley-pcn-marking-behaviour]. A
	approach a threshold rate that is lower than the line rate. The use	PCN-enabled queue typically re-uses the definition of expedited
	of virtual queues ensures real queues have hardly built up any	forwarding (EF) [RFC3246] for its scheduling behaviour. The new
	congestion delay. The level of marking detected at the egress of the	congestion marking behaviour sets the PCN field of an increasing
	Diffserv region is then used by the signalling system in order to	proportion of PCN packets to the PCN-marked (PM) codepoint
	determine admission control as follows.	[I-D.moncaster-pcn-baseline-encoding] as their load approaches a
		threshold rate that is lower than the line rate
		[I-D.eardley-pcn-marking-behaviour]. This can be achieved with an
		algorithm called a virtual queue, similar to a token bucket. The aim
		is for a queue to start marking PCN traffic to trigger admission
		control before the real queue builds up any congestion delay. The
		level of a queue's pre-congestion marking is detected at the egress
		of the Diffserv region and used by the signalling system to control
		admission of further traffic that would otherwise overload that
		queue, as follows.


	The end-to-end QoS signalling (e.g. RSVP) for a new reservation	The end-to-end QoS signalling for a new reservation (to be concrete
	takes one giant hop from ingress to egress gateway, because interior	we will use RSVP) takes one giant hop from ingress to egress gateway,
	routers within the Diffserv region are configured to ignore RSVP.	because interior routers within the Diffserv region are configured to
	The egress gateway holds flow state because it takes part in the end-	ignore RSVP. The egress gateway holds flow state because it takes
	to-end reservation. So it can classify all packets by flow and it	part in the end-to-end reservation. So it can classify all packets
	can identify all flows that have the same previous RSVP hop (a CL-	by flow and it can identify all flows that have the same previous
	region-aggregate). For each CL-region-aggregate of flows in	RSVP hop (an ingress-egress-aggregate). For each ingress-egress-
	progress, the egress gateway maintains a per-packet moving average of	aggregate of flows in progress, the egress gateway maintains a per-
	the fraction of pre-congestion-marked traffic. Once an RSVP PATH	packet moving average of the fraction of pre-congestion-marked
	message for a new reservation has hopped across the Diffserv region	traffic. Once an RSVP PATH message for a new reservation has hopped
	and reached the destination, an RSVP RESV message is returned. As	across the Diffserv region and reached the destination, an RSVP RESV
	the RESV message passes, the egress gateway piggy-backs the relevant	message is returned. As the RESV message passes, the egress gateway
	pre-congestion level onto it [RSVP-ECN]. Again, interior routers	piggy-backs the relevant pre-congestion level onto it [RSVP-ECN].
	ignore the RSVP message, but the ingress gateway strips off the pre-	Again, interior routers ignore the RSVP message, but the ingress
	congestion level. If the pre-congestion level is above a threshold,	gateway strips off the pre-congestion level. If the pre-congestion
	the ingress gateway denies admission to the new reservation,	level is above a threshold, the ingress gateway denies admission to
	otherwise it returns the original RESV signal back towards the data	the new reservation, otherwise it returns the original RESV signal
	sender.	back towards the data sender.
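
The egress measurement and ingress decision described above can be sketched as follows. This is only an illustration under stated assumptions: the class name, the exponentially weighted form of the per-packet moving average, the smoothing weight and the threshold value are all hypothetical and are not taken from this memo or from [RSVP-ECN].

```python
class EgressAggregateMeter:
    """Per-packet moving average of the pre-congestion-marked fraction
    for one ingress-egress-aggregate (hypothetical helper)."""

    def __init__(self, weight=0.01):
        # Assumed smoothing weight; the memo does not specify one.
        self.weight = weight
        self.level = 0.0

    def on_packet(self, pcn_marked):
        # Each packet contributes a sample of 1 (marked) or 0 (unmarked).
        sample = 1.0 if pcn_marked else 0.0
        self.level += self.weight * (sample - self.level)
        return self.level


def admit(pre_congestion_level, threshold=0.05):
    """Ingress gateway decision: deny if the level carried on the RESV
    message is above the threshold (threshold value is an assumption)."""
    return pre_congestion_level <= threshold
```

In this sketch the egress gateway would call on_packet() for every packet of the aggregate and piggy-back the current level onto a passing RESV message; the ingress gateway would then apply admit() to that level.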

	Once a reservation is admitted, its traffic will always receive low	Once a reservation is admitted, its traffic will always receive low
	delay service for the duration of the reservation. This is because	delay service for the duration of the reservation. This is because
	ingress gateways ensure that traffic not under a reservation cannot	ingress gateways ensure that traffic not under a reservation cannot

	pass into the Diffserv region with the CL DSCP set. So non-reserved	pass into the PCN-region with a Diffserv codepoint that gives it
	traffic will always be treated with a lower priority PHB at each	priority over the capacity used for PCN traffic.
	interior router. And even if some disaster re-routes traffic after
	it has been admitted, if the traffic through any resource tips over a	Even if some disaster re-routes traffic after it has been admitted,
	fail-safe threshold, pre-congestion notification will trigger flow	if the PCN traffic through any PCN resource tips over a higher, fail-
	pre-emption to very quickly bring every router within the whole	safe threshold, pre-congestion notification can trigger flow
	Diffserv region back below its operating point.	termination to very quickly bring every router within the whole PCN-
		region back below its operating point. The same marking process and
		ECN codepoint can be used for both admission control and flow
		termination, by simply triggering them at different fractions of
		marking [I-D.charny-pcn-single-marking]. However, simulations have
		confirmed that this approach is not robust in all circumstances that
		might typically be encountered, so approaches with two thresholds and
		two congestion encodings are expected to be required in production
		networks.

	The whole admission control system just described deliberately	The whole admission control system just described deliberately
	confines per-flow processing to the access edges of the network,	confines per-flow processing to the access edges of the network,
	where it will not limit the system's scalability. But ideally we	where it will not limit the system's scalability. But ideally we
	want to extend this approach to multiple networks, to take even more	want to extend this approach to multiple networks, to take even more
	advantage of its scaling potential. We would still need per-flow	advantage of its scaling potential. We would still need per-flow
	processing at the access edges of each network, but not at the high	processing at the access edges of each network, but not at the high
	speed interfaces where they interconnect. Even though such an	speed interfaces where they interconnect. Even though such an
	admission control system would work technically, it would gain us no	admission control system would work technically, it would gain us no
	scaling advantage if each network also wanted to police the rate of	scaling advantage if each network also wanted to police the rate of
	each admitted flow for itself--border routers would still have to do	each admitted flow for itself--border routers would still have to do
	complex packet operations per-flow anyway, given they don't trust	complex packet operations per-flow anyway, given they don't trust
	upstream networks to do their policing for them.	upstream networks to do their policing for them.

	This memo describes how to emulate per-flow rate policing using bulk	This memo describes how to emulate per-flow rate policing using bulk

	mechanisms at border routers, so the full scalability potential of	mechanisms at border routers. Otherwise the full scalability
	pre-congestion notification is not limited by the need for per-flow	potential of pre-congestion notification would be limited by the need
	policing mechanisms at borders, which would make borders the most	for per-flow policing mechanisms at borders, which would make borders
	cost-critical pinch-points. Then we can achieve the long sought-for	the most cost-critical pinch-points. Instead we can achieve the long
	vision of secure Internet-wide bandwidth reservations without needing	sought-for vision of secure Internet-wide bandwidth reservations
	per-flow processing at all in core and border routers--where	without over-generous provisioning or per-flow processing. We still
	scalability is most critical.	use per-flow processing at the edge routers closest to the end-user,
		but we need no per-flow processing at all in core _or border
		routers_--where scalability is most critical.


	4. Re-ECN Protocol for an RSVP (or similar) Transport	4. Re-ECN Protocol in IP with Two Congestion Marking Levels

	4.1. Protocol Overview	4.1. Protocol Overview


	First we need to recap the way routers accumulate congestion marking	First we need to recap the way routers accumulate PCN congestion
	along a path. Each ECN-capable router marks some packets with CE,	marking along a path (it accumulates the same way as ECN). Each PCN-
	the marking probability increasing with the length of the queue at	capable queue into a link might mark some packets with a PCN-marked
	its egress link. The only difference with pre-congestion	(PM) codepoint, the marking probability increasing with the length of
	marking [PCN] is that marking is based on the length of a virtual	the queue [I-D.eardley-pcn-marking-behaviour]. With a series of PCN-
	queue, so that the real queue occupancy can remain very low. We will	capable routers on a path, a stream of packets accumulates the
	use the terms congestion and pre-congestion interchangeably in the	fraction of PCN markings that each queue adds. The combined effect
	following unless it is important to distinguish between them.	of the packet marking of all the queues along the path signals
		congestion of the whole path to the receiver. So, for example, if
		one queue early in a path is marking 1% of packets and another later
		in a path is marking 2%, flows that pass through both queues will
		experience approximately 3% marking over a sequence of packets.


	With multiple ECN-capable routers on a path, the ECN field	(Note: Whenever the word 'congestion' is used in this document it
	accumulates the fraction of CE marking that each router adds. The	should be taken to mean congestion of the virtual resource assigned
	combined effect of the packet marking of all the routers along the	for use by PCN-traffic. This avoids cumbersome repetition of the
	path signals congestion of the whole path to the receiver. So, for	strictly correct term 'pre-congestion'.)
	example, if one router early in a path is marking 1% of packets and
	another later in a path is marking 2%, flows that pass through both
	routers will experience approximately 3% marking.
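
The 1% and 2% example above can be checked with a short calculation. The sketch below assumes that each queue marks packets independently and that a packet marked once stays marked; the function name is hypothetical. Under those assumptions the combined fraction is slightly below the simple sum, which is why the text says "approximately".

```python
def path_marking_fraction(queue_fractions):
    """Fraction of packets marked after passing every queue on a path,
    assuming independent marking at each queue (an assumption)."""
    unmarked = 1.0
    for fraction in queue_fractions:
        # A packet stays unmarked only if every queue leaves it unmarked.
        unmarked *= (1.0 - fraction)
    return 1.0 - unmarked


combined = path_marking_fraction([0.01, 0.02])
```

For marking fractions this small, the product term (0.01 * 0.02) is negligible, so adding the per-queue fractions is a close approximation.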


	The packets crossing an inter-domain trust boundary within the	The packets crossing an inter-domain trust boundary within the PCN-
	Diffserv region will all have come from different ingress gateways	region will all have come from different ingress gateways and will
	and will all be destined for different egress gateways. We will show	all be destined for different egress gateways. We will show that the
	that the key to policing against theft of service is for a border	key to policing against theft of service is for a border router to be
	router to be able to directly measure the congestion that is about to	able to directly measure the congestion that is about to be caused by
	be caused by the traffic it forwards. That is, it can measure	the packets it forwards into any of the downstream paths between
	locally the congestion on each of the downstream paths between itself	itself and the egress gateways that each packet is destined for. The
	and the egress gateways that its traffic is destined for.	purpose of the re-PCN protocol is to make packets automatically carry
		this information, which then merely needs to be counted locally at
		the border.


	With the original ECN protocol, if CE markings crossing the border	With the original PCN protocol, if a border router (e.g. that between
	had been counted over a period, they would have represented the	domains 'A' & 'B' in Figure 2) counts PCN markings crossing the border
	accumulated upstream congestion that had already been experienced by	over a period, they represent the accumulated congestion that has
	those packets. The general idea of re-ECN is for the ingress gateway	already been experienced by those packets (congestion upstream of the
	to continuously encode path congestion into the IP header where, in	border, u). The idea of re-PCN is to make the ingress gateway
	this case, `path' means from ingress to egress gateway. Then at any	continuously encode the path congestion it knows into a new field in
	point on that path (e.g. between domains A & B in Figure 2 below), IP	the IP header (in this case, `path' means the path from the ingress
	headers can be monitored to subtract upstream congestion from	to the egress gateway). This new field is _not_ altered by queues
	expected path congestion in order to give the expected downstream	along the path. Then at any point on that path (e.g. between domains
	congestion still to be experienced until the egress gateway.	'A' & 'B'), IP headers can be monitored to measure both expected path
		congestion, p, and upstream congestion, u. Then congestion expected
		downstream of the border, v, can be derived simply by subtracting
		upstream congestion from expected path congestion. That is v ~= p -
		u.
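
The subtraction v ~= p - u can be illustrated with a bulk counter at a border. This is a sketch only: the wire encoding is defined in later sections, so here each packet is abstracted to two booleans, one saying whether it carries the ingress gateway's declaration of path congestion and one saying whether it is PCN-marked. The class and field names are hypothetical. Note that the meter keeps no per-flow or per-aggregate state.

```python
from dataclasses import dataclass


@dataclass
class BorderMeter:
    """Bulk byte counters at a border (hypothetical abstraction)."""
    total_bytes: int = 0
    path_bytes: int = 0      # bytes declaring expected path congestion (p)
    upstream_bytes: int = 0  # bytes already PCN-marked upstream (u)

    def count(self, size, declares_path_congestion, pcn_marked):
        self.total_bytes += size
        if declares_path_congestion:
            self.path_bytes += size
        if pcn_marked:
            self.upstream_bytes += size

    def downstream_estimate(self):
        """Return v ~= p - u as a fraction of all bytes counted."""
        if self.total_bytes == 0:
            return 0.0
        p = self.path_bytes / self.total_bytes
        u = self.upstream_bytes / self.total_bytes
        return p - u
```

For example, if 3% of the bytes crossing the border declare path congestion and 1% are already PCN-marked, the estimate of congestion still expected downstream is about 2%.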

	Importantly, it turns out that there is no need to monitor downstream	Importantly, it turns out that there is no need to monitor downstream

	congestion on a per-flow basis. We will show that accounting for it	congestion on a per-flow, per-path or per-aggregate basis. We will
	in bulk across all flows will be sufficient.	show that accounting for it in bulk by counting the volume of all
		marked packets will be sufficient.

	_____________________________________	_____________________________________
	_\|__ ______ ______ ______ _\|__	_\|__ ______ ______ ______ _\|__
	\| \| \| A \| \| B \| \| C \| \| \|	\| \| \| A \| \| B \| \| C \| \| \|
	+----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+	+----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+
	\| \| \|B\| \|B\| \|B\| \|B\| \|B\| \|B\| \| \|	\| \| \|B\| \|B\| \|B\| \|B\| \|B\| \|B\| \| \|
	\|Ingr\|==\|R\| \|R\|==\|R\| \|R\|==\|R\| \|R\|==\|Egr \|	\|Ingr\|==\|R\| \|R\|==\|R\| \|R\|==\|R\| \|R\|==\|Egr \|
	\|G/W \| \| \| \| \|: \| \| \| \| \| \| \| \| \|G/W \|	\|G/W \| \| \| \| \|: \| \| \| \| \| \| \| \| \|G/W \|
	+----+ +-+ +-+: +-+ +-+ +-+ +-+ +----+	+----+ +-+ +-+: +-+ +-+ +-+ +-+ +----+
	\| \| \| \|: \| \| \| \| \| \|	\| \| \| \|: \| \| \| \| \| \|

	skipping to change at page 16, line 26	skipping to change at page 18, line 34
	:	:
	\| : \|	\| : \|
	\|<-upstream-->:<-expected downstream->\|	\|<-upstream-->:<-expected downstream->\|
	\| congestion : congestion \|	\| congestion : congestion \|
	\| u v ~= p - u \|	\| u v ~= p - u \|
	\| \|	\| \|
	\|<--- expected path congestion, p --->\|	\|<--- expected path congestion, p --->\|

	Figure 2: Re-ECN concept	Figure 2: Re-ECN concept


	4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)	4.2. Re-PCN Abstracted Network Layer Wire Protocol (IPv4 or v6)

	In this section we define the names of the various codepoints of the	In this section we define the names of the various codepoints of the

	re-ECN protocol when used with pre-congestion notification, deferring	extended ECN field when used with pre-congestion notification,
	description of their semantics to the following sections. But first	deferring description of their semantics to the following sections.
	we recap the re-ECN wire protocol proposed in [Re-TCP].	But first we recap the re-ECN wire protocol proposed in
		[I-D.briscoe-tsvwg-re-ecn-tcp].

	4.2.1. Re-ECN Recap	4.2.1. Re-ECN Recap

	Re-ECN uses the two bit ECN field broadly as in RFC3168 [RFC3168].	Re-ECN uses the two bit ECN field broadly as in RFC3168 [RFC3168].
	It also uses a new re-ECN extension (RE) flag. The actual position	It also uses a new re-ECN extension (RE) flag. The actual position
	of the RE flag is different between IPv4 & v6 headers so we will use	of the RE flag is different between IPv4 & v6 headers so we will use
	an abstraction of the IPv4 and v6 wire protocols by just calling it	an abstraction of the IPv4 and v6 wire protocols by just calling it

	the RE flag. [Re-TCP] proposes using bit 48 (currently unused) in	the RE flag. [I-D.briscoe-tsvwg-re-ecn-tcp] proposes using bit 48
	the IPv4 header for the RE flag, while for IPv6 it proposes an ECN	(currently unused) in the IPv4 header for the RE flag, while for IPv6
	extension header.	it proposes a congestion extension header.

	Unlike the ECN field, the RE flag is intended to be set by the sender	Unlike the ECN field, the RE flag is intended to be set by the sender
	and remain unchanged along the path, although it can be read by	and remain unchanged along the path, although it can be read by
	network elements that understand the re-ECN protocol. In the	network elements that understand the re-ECN protocol. In the

	scenario used in this memo, the ingress gateway acts as a proxy for	scenario used in this memo, the ingress gateway is the 'sender' as
	the sender, setting the RE flag as permitted in the specification of	far as the scope of the PCN region is concerned, so it sets the RE
	re-ECN.	flag (as permitted for sender proxies in the specification of re-
		ECN).

	Note that general-purpose routers do not have to read the RE flag,	Note that general-purpose routers do not have to read the RE flag,
	only special policing elements at borders do. And no general-purpose	only special policing elements at borders do. And no general-purpose
	routers have to change the RE flag, although the ingress and egress	routers have to change the RE flag, although the ingress and egress
	gateways do because in the edge-to-edge deployment model we are	gateways do because in the edge-to-edge deployment model we are

	using, they act as proxies for the endpoints. Therefore the RE flag	using, they act as the endpoints of the PCN region. Therefore the RE
	does not even have to be visible to interior routers. So the RE flag	flag does not even have to be visible to interior routers. So the RE
	has no implications on protocols like MPLS. Congested label	flag has no implications on protocols like MPLS. Congested label
	switching routers (LSRs) would have to be able to notify their	switching routers (LSRs) would have to be able to notify their
	congestion with an ECN/PCN codepoint in the MPLS shim [RFC5129], but	congestion with an ECN/PCN codepoint in the MPLS shim [RFC5129], but
	like any interior IP router, they can be oblivious to the RE flag,	like any interior IP router, they can be oblivious to the RE flag,
	which need only be read by border policing functions.	which need only be read by border policing functions.


	Although the RE flag is a separate, single bit field, it can be read	Although the RE flag is a separate single bit field, it can be read
	as an extension to the two-bit ECN field; the three concatenated bits	as an extension to the two-bit ECN field; the three concatenated bits
	in what we will call the extended ECN field (EECN) make eight	in what we will call the extended ECN field (EECN) make eight
	codepoints available. When the RE flag setting is "don't care", we	codepoints available. When the RE flag setting is "don't care", we

	use the RFC3168 names of the ECN codepoints, but [Re-TCP] proposes	use the RFC3168 names of the ECN codepoints, but
	the following six codepoint names for when there is a need to be more	[I-D.briscoe-tsvwg-re-ecn-tcp] proposes the following six codepoint
	specific.	names for when there is a need to be more specific.

	+--------+-------------+-------+-------------+----------------------+	+--------+-------------+-------+-------------+----------------------+
	\| ECN \| RFC3168 \| RE \| Extended \| Re-ECN meaning \|	\| ECN \| RFC3168 \| RE \| Extended \| Re-ECN meaning \|
	\| field \| codepoint \| flag \| ECN \| \|	\| field \| codepoint \| flag \| ECN \| \|
	\| \| \| \| codepoint \| \|	\| \| \| \| codepoint \| \|
	+--------+-------------+-------+-------------+----------------------+	+--------+-------------+-------+-------------+----------------------+
	\| 00 \| Not-ECT \| 0 \| Not-RECT \| Not re-ECN-capable \|	\| 00 \| Not-ECT \| 0 \| Not-RECT \| Not re-ECN-capable \|
	\| \| \| \| \| transport \|	\| \| \| \| \| transport \|
	\| 00 \| Not-ECT \| 1 \| FNE \| Feedback not \|	\| 00 \| Not-ECT \| 1 \| FNE \| Feedback not \|
	\| \| \| \| \| established \|	\| \| \| \| \| established \|

	\| 01 \| ECT(1) \| 0 \| Re-Echo \| Re-echoed congestion \|
	\| \| \| \| \| and RECT \|
	\| 01 \| ECT(1) \| 1 \| RECT \| Re-ECN capable \|
	\| \| \| \| \| transport \|
	\| 10 \| ECT(0) \| 0 \| --- \| Legacy ECN use \|	\| 10 \| ECT(0) \| 0 \| --- \| Legacy ECN use \|
	\| \| \| \| \| only \|	\| \| \| \| \| only \|
	\| 10 \| ECT(0) \| 1 \| --CU-- \| Currently unused \|	\| 10 \| ECT(0) \| 1 \| --CU-- \| Currently unused \|
	\| \| \| \| \| \|	\| \| \| \| \| \|

		\| 01 \| ECT(1) \| 0 \| Re-Echo \| Re-echoed congestion \|
		\| \| \| \| \| and RECT \|
		\| 01 \| ECT(1) \| 1 \| RECT \| Re-ECN capable \|
		\| \| \| \| \| transport \|
	\| 11 \| CE \| 0 \| CE(0) \| Congestion \|	\| 11 \| CE \| 0 \| CE(0) \| Congestion \|
	\| \| \| \| \| experienced with \|	\| \| \| \| \| experienced with \|
	\| \| \| \| \| Re-Echo \|	\| \| \| \| \| Re-Echo \|
	\| 11 \| CE \| 1 \| CE(-1) \| Congestion \|	\| 11 \| CE \| 1 \| CE(-1) \| Congestion \|
	\| \| \| \| \| experienced \|	\| \| \| \| \| experienced \|
	+--------+-------------+-------+-------------+----------------------+	+--------+-------------+-------+-------------+----------------------+

	Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re-	Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re-
	ECN	ECN

	4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN)	4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN)


	As permitted by the ECN specification [RFC3168], a proposal is	As permitted by the ECN specification [RFC3168] and by the guidelines
	currently being advanced in the IETF to define different semantics	for specifying alternative semantics for the ECN field [RFC4774], a
	for how routers might mark the ECN field of certain packets. The	proposal is currently being advanced in the IETF to define different
	idea is to be able to notify congestion when the router's load	semantics for how queues might mark the ECN field of certain packets.
		The idea is to be able to notify congestion when the queue's load
	approaches a logical limit, rather than the physical limit of the	approaches a logical limit, rather than the physical limit of the

	line. This new marking is called pre-congestion notification [PCN]	line. This new marking is called pre-congestion
	and we will use the term PCN-enabled router for a router that can	notification [I-D.eardley-pcn-marking-behaviour] and we will use the
	apply pre-congestion notification marking to the ECN fields of	term PCN-enabled queue for a queue that can apply pre-congestion
	packets.	notification marking to the ECN fields of packets.

	[RFC3168] recommends that a packet's Diffserv codepoint should	[RFC3168] recommends that a packet's Diffserv codepoint should

	determine which type of ECN marking it receives. A Diffserv per-hop	determine which type of ECN marking it receives. A PCN-capable
	behaviour (PHB) can specify that routers should apply pre-congestion	packet must meet two conditions; it must carry a DSCP that has been
	notification marking to PCN-capable packets. We will call this a	associated with PCN marking and it must carry an ECN field that turns
	PCN-enhanced PHB. A PCN-capable packet must meet two conditions, it	on PCN marking.
	must carry a DSCP that maps to a PCN-enhanced PHB and it must carry
	an ECN field that turns on PCN marking.


	As an example, the controlled load (CL) PHB might specify expedited	As an example, a packet carrying the VOICE-ADMIT
	forwarding as its scheduling behaviour and PCN marking as its	[I-D.ietf-tsvwg-admitted-realtime-dscp] DSCP would be associated with
	congestion marking behaviour. Then we would say the CL PHB is a PCN-	expedited forwarding [RFC3246] as its scheduling behaviour and pre-
	enhanced PHB, and that packets with a DSCP that maps to the CL PHB	congestion notification as its congestion marking behaviour. PCN
	and with ECN turned on are PCN-capable packets.	would only be turned on within a PCN-region by an ECN codepoint other
		than Not-ECT (00). Then we would describe packets with the VOICE-
		ADMIT DSCP and with ECN turned on as PCN-capable packets.


	[PCN] actually proposes that two logical limits should be used for	[I-D.eardley-pcn-marking-behaviour] actually proposes that two
	pre-congestion notification, with the higher limit as a back-stop for	logical limits can be used for pre-congestion notification, with the
	dealing with anomalous events. It envisages PCN will be used to	higher limit as a back-stop for dealing with anomalous events. It
	admission control inelastic real-time traffic, so marking at the	envisages PCN will be used to admission control inelastic real-time
	lower limit will trigger admission control, while at the higher limit	traffic, so marking at the lower limit will trigger admission
	it will trigger flow pre-emption.	control, while at the higher limit it will trigger flow termination.


	Because it needs two types of congestion marking, PCN seems to need	Because it needs two types of congestion marking, PCN needs four
	five states: Not-ECT, ECT (ECN-capable transport), the ECN Nonce,	states: Not PCN-capable (Not-PCN), PCN-capable but not PCN-marked
	Admission Marking (AM) and Flow Pre-emption Marking (PM). [PCN]	(NM), Admission Marked (AM) and Flow Termination Marked (TM). A
	proposes various alternative encodings of the ECN field, attempting	proposed encoding of the four required PCN states is shown on the
	various compromises to fit these five states into the four available	left of Table 2. Note that these codepoints of the ECN field only
	ECN codepoints.	take on the semantics of pre-congestion notification if they are
		combined with a Diffserv codepoint that the operator has configured
		to be associated with PCN marking.


	One of the five states to make room for is the ECN Nonce [RFC3540],	This encoding only correctly traverses an IP in IP tunnel if the
	but the capability we describe in this memo supersedes any need for	ideal decapsulation rules in [I-D.briscoe-tsvwg-ecn-tunnel] are
	the Nonce. The ECN Nonce is an elegant scheme, but it only allows a	followed when combining the ECN fields of the outer and inner
	sending node (or its proxy) to detect suppression of congestion	headers. If instead the decapsulation rules in [RFC3168] or
	marking in the feedback loop. Thus the Nonce requires the sender or	[RFC4301] are followed, any admission marking applied to an outer
	its proxy to be trusted to respond correctly to congestion. But this	header will be incorrectly removed on decapsulation at the tunnel
	is precisely the main cheat we want to protect against (as well as	egress.
	many others).


	One of the compromise protocol encodings that [PCN] explores	The RFC3168 ECN field includes space for the experimental ECN
	("Alternative 5") leaves out support for the ECN Nonce. Therefore we	Nonce [RFC3540], which seems to require a fifth state if it is also
	use that one. This encoding of PCN markings is shown on the left of	needed with re-PCN. But re-PCN supersedes any need for the Nonce
	Table 2. Note that these codepoints of the ECN field only take on	within the PCN-region. The ECN Nonce is an elegant scheme, but it
	the semantics of pre-congestion notification if they are combined	only allows a sending node (or its proxy) to detect suppression of
	with a Diffserv codepoint that the operator has configured to cause	congestion marking in the feedback loop. Thus the Nonce requires the
	PCN marking, by mapping it to a PCN-enhanced PHB.	sender (or in our case the PCN ingress) to be trusted to respond
		correctly to congestion. But this is precisely the main cheat we
		want to protect against (as well as many others). Also, the ECN
		nonce only works once the receiver has placed packets in the same
		order as they left the ingress, which cannot be done by an edge node
		without adding unnecessary edge-edge packet ordering. Nonetheless,
		if the ECN nonce were in use outside the PCN region (end-to-end), the
		ingress would have to tunnel the arriving IP header across the PCN
		region ([I-D.ietf-pcn-architecture]).


	For the rest of this memo, we will not distinguish between Admission	For the rest of this memo, to mean either Admission Marking or
	Marking and Pre-emption Marking unless we need to be specific. We	Termination Marking we will call both "congestion marking" or "PCN
	will call both "congestion marking". With the above encoding,	marking" unless we need to be specific. With the above encoding,
	congestion marking can be read to mean any packet with the left-most	congestion marking can be read to mean any packet with the right-most
	bit of the ECN field set.	bit of the ECN field set.

	The re-ECN protocol can be used to control misbehaving sources	The re-ECN protocol can be used to control misbehaving sources
	whether congestion is with respect to a logical threshold (PCN) or	whether congestion is with respect to a logical threshold (PCN) or
	the physical line rate (ECN). In either case the RE flag can be used	the physical line rate (ECN). In either case the RE flag can be used
	to create an extended ECN field. For PCN-capable packets, the 8	to create an extended ECN field. For PCN-capable packets, the 8

	possible encodings of this 3-bit extended ECN (EECN) field are	possible encodings of this 3-bit extended PCN (EPCN) field are
	defined on the right of Table 2 below. The purposes of these	defined on the right of Table 2 below. The purposes of these
	different codepoints will be introduced in subsequent sections.	different codepoints will be introduced in subsequent sections.


	+-------+-----------------+------+--------------+-------------------+	+--------+-----------+-------+-----------------+--------------------+
	\| ECN \| PCN codepoint \| RE \| Extended ECN \| Re-ECN meaning \|	\| ECN \| PCN \| RE \| Extended PCN \| Re-PCN meaning \|
	\| field \| (Alternative 5) \| flag \| codepoint \| \|	\| field \| codepoint \| flag \| codepoint \| \|
	+-------+-----------------+------+--------------+-------------------+	+--------+-----------+-------+-----------------+--------------------+
	\| 00 \| Not-ECT \| 0 \| Not-RECT \| Not \|	\| 00 \| Not-PCN \| 0 \| Not-PCN \| Not PCN-capable \|
	\| \| \| \| \| re-ECN-capable \|
	\| \| \| \| \| transport \|	\| \| \| \| \| transport \|

	\| 00 \| Not-ECT \| 1 \| FNE \| Feedback not \|	\| 00 \| Not-PCN \| 1 \| FNE \| Feedback not \|
	\| \| \| \| \| established \|	\| \| \| \| \| established \|

	\| 01 \| ECT(1) \| 0 \| Re-Echo \| Re-echoed \|	\| 10 \| NM \| 0 \| Re-PCT-Echo \| Re-echoed \|
	\| \| \| \| \| congestion and \|	\| \| \| \| \| congestion and \|

	\| \| \| \| \| RECT \|	\| \| \| \| \| Re-PCT \|
	\| 01 \| ECT(1) \| 1 \| RECT \| Re-ECN capable \|	\| 10 \| NM \| 1 \| Re-PCT \| Re-PCN capable \|
	\| \| \| \| \| transport \|	\| \| \| \| \| transport \|

	\| 10 \| AM \| 0 \| AM(0) \| Admission Marking \|	\| 01 \| AM \| 0 \| AM(0) \| Admission Marking \|
	\| \| \| \| \| with Re-Echo \|	\| \| \| \| \| with Re-Echo \|

	\| 10 \| AM \| 1 \| AM(-1) \| Admission Marking \|	\| 01 \| AM \| 1 \| AM(-1) \| Admission Marking \|
	\| \| \| \| \| \|	\| \| \| \| \| \|

	\| 11 \| PM \| 0 \| PM(0) \| Pre-emption \|	\| 11 \| TM \| 0 \| TM(0) \| Termination \|
	\| \| \| \| \| Marking with \|	\| \| \| \| \| Marking with \|
	\| \| \| \| \| Re-Echo \|	\| \| \| \| \| Re-Echo \|

	\| 11 \| PM \| 1 \| PM(-1) \| Pre-emption \|	\| 11 \| TM \| 1 \| TM(-1) \| Termination \|
	\| \| \| \| \| Marking \|	\| \| \| \| \| Marking \|

	+-------+-----------------+------+--------------+-------------------+	+--------+-----------+-------+-----------------+--------------------+

	Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre-	Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre-
	congestion Notification (PCN)	congestion Notification (PCN)


		Note that Table 2 shows re-PCN uses ECT(0) but Table 1 shows re-ECN
		uses ECT(1) for the unmarked state. The difference is intended--
		although it makes it harder to remember the two schemes, it makes
		them both safer during incremental deployment.
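As an illustration only, the right-hand (re-PCN) side of Table 2 can be sketched as a lookup from the two-bit ECN field plus the RE flag to the extended PCN codepoint name. The names `EPCN_NAMES`, `decode_epcn` and `is_congestion_marked` are hypothetical and are not defined by the draft. The sketch assumes the packet already carries a DSCP that the operator has associated with PCN marking.

```python
# Hypothetical sketch of the re-PCN side of Table 2.
# Keys are (ECN field, RE flag); values are extended PCN codepoint names.
EPCN_NAMES = {
    (0b00, 0): "Not-PCN",      # not PCN-capable transport
    (0b00, 1): "FNE",          # feedback not established
    (0b10, 0): "Re-PCT-Echo",  # re-echoed congestion and Re-PCT
    (0b10, 1): "Re-PCT",       # re-PCN capable transport
    (0b01, 0): "AM(0)",        # admission marking with Re-Echo
    (0b01, 1): "AM(-1)",       # admission marking
    (0b11, 0): "TM(0)",        # termination marking with Re-Echo
    (0b11, 1): "TM(-1)",       # termination marking
}


def decode_epcn(ecn: int, re_flag: int) -> str:
    """Return the extended PCN codepoint name for an ECN field and RE flag."""
    return EPCN_NAMES[(ecn, re_flag)]


def is_congestion_marked(ecn: int) -> bool:
    """With this encoding, AM and TM both have the right-most ECN bit set."""
    return bool(ecn & 0b01)
```

For example, `decode_epcn(0b10, 1)` yields the unmarked Re-PCT state, which uses the ECT(0) bit pattern as noted above.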

	4.3. Protocol Operation	4.3. Protocol Operation

	4.3.1. Protocol Operation for an Established Flow	4.3.1. Protocol Operation for an Established Flow


	The re-ECN protocol involves a simple tweak to the action of the	The re-PCN protocol involves a simple addition to the action of the
	gateway at the ingress edge of the CL region. In the deployment	gateway at the ingress edge of the PCN region (the PCN-ingress-node).
	model just described [I-D.ietf-pcn-architecture], for each active	But first we will recap how PCN works without the addition. For each
	traffic aggregate across the CL region (CL-region-aggregate) the	active traffic aggregate across a PCN region (ingress-egress-
	ingress gateway will hold a fairly recent Congestion-Level-Estimate	aggregate) the egress gateway measures the level of PCN marking and
	that the egress gateway will have fed back to it, piggybacked on the	feeds it back to the ingress piggy-backed as 'PCN-feedback-
	signalling that sets up each flow. For instance, one aggregate might	information' on any control signal passing between the nodes (e.g.
	have been experiencing 3% pre-congestion (that is, congestion marked	every flow set-up, refresh or tear-down). Therefore the ingress
	octets whether Admission Marked or Pre-emption Marked). In this	gateway will always hold a fairly recent (typically at most 30sec)
	case, the ingress gateway MUST clear the RE flag to "0" for the same	estimate of the ingress-egress-aggregate congestion level. For
	percentage of octets of CL-packets (3%) and set it to "1" in the rest	instance, one aggregate might have been experiencing 3% pre-
		congestion (that is, congestion marked octets whether Admission
		Marked or Termination Marked).

		To comply with the re-PCN protocol, for all PCN packets in each
		ingress-egress-aggregate the ingress gateway MUST clear the RE flag
		to "0" for the same percentage of octets as its current estimate of
		congestion on the aggregate (e.g. 3%) and set it to "1" in the rest
	(97%). Appendix A.1 gives a simple pseudo-code algorithm that the	(97%). Appendix A.1 gives a simple pseudo-code algorithm that the
	ingress gateway may use to do this.	ingress gateway may use to do this.
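The following is a minimal sketch of one way an ingress gateway could blank the RE flag on a given fraction of octets. It is not the algorithm of Appendix A.1, which is not reproduced here. The class name `ReBlanker`, the method `re_flag_for` and the octet "debt" counter are assumptions made for illustration.

```python
class ReBlanker:
    """Hypothetical per-aggregate state for choosing the RE flag at the ingress."""

    def __init__(self) -> None:
        # Octets that still ought to be sent with RE blanked (an assumed design).
        self.debt = 0.0

    def re_flag_for(self, size: int, congestion_estimate: float) -> int:
        """Return 0 (blanked) or 1 (set) for a packet of `size` octets.

        `congestion_estimate` is the latest fed-back congestion level for
        the aggregate, as a fraction (e.g. 0.03 for 3%).
        """
        self.debt += congestion_estimate * size
        if self.debt >= size:
            self.debt -= size
            return 0  # blank the RE flag on this packet
        return 1      # leave the RE flag set
```

Over a long run of packets the fraction of octets sent with RE cleared to "0" approaches the congestion estimate, so a 3% estimate leaves the flag set on roughly 97% of octets.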

	The RE flag is set and cleared this way round for incremental	The RE flag is set and cleared this way round for incremental

	deployment reasons (see [Re-TCP]). To avoid confusion we will use	deployment reasons (see Section 7). To avoid confusion we will use
	the term `blanking' (rather than marking) when the RE flag is cleared	the term `blanking' (rather than marking) when the RE flag is cleared
	to "0", so we will talk of the `RE blanking fraction' as the fraction	to "0", so we will talk of the `RE blanking fraction' as the fraction
	of octets with the RE flag cleared to "0".	of octets with the RE flag cleared to "0".

	^	^
	\|	\|
	\| RE blanking fraction	\| RE blanking fraction
	3% \| +----------------------------+====+	3% \| +----------------------------+====+
	\| \| \| \|	\| \| \| \|
	2% \| \| \| \|	2% \| \| \| \|

	skipping to change at page 20, line 51	skipping to change at page 24, line 6
	\| ^ ^ \|	\| ^ ^ \|
	ingress \| \| egress	ingress \| \| egress
	1.00% 2.00% marking fraction	1.00% 2.00% marking fraction

	Figure 3: Example Extended ECN codepoint Marking fractions	Figure 3: Example Extended ECN codepoint Marking fractions
	(Imprecise)	(Imprecise)

	Figure 3 illustrates our example. The horizontal axis represents the	Figure 3 illustrates our example. The horizontal axis represents the
	index of each congestible resource (typically queues) along a path	index of each congestible resource (typically queues) along a path
	through the Internet. The two superimposed plots show the fraction	through the Internet. The two superimposed plots show the fraction

	of each ECN codepoint observed along this path, assuming there are	of each extended PCN codepoint observed along this path, assuming
	two congested routers somewhere within domains A and C. And Table 3	there are two congested routers somewhere within domains A and C. And
	below shows the downstream pre-congestion measured at various border	Table 3 below shows the downstream pre-congestion measured at various
	observation points along the path. Figure 4 (later) shows the same	border observation points along the path. Figure 4 (later) shows the
	results of these subtractions, but in graphical form like the above	same results of these subtractions, but in graphical form like the
	figure. The tabulated figures are actually reasonable approximations	above figure. The tabulated figures are actually reasonable
	derived from more precise formulae given in Appendix A of [Re-TCP].	approximations derived from more precise formulae given in Appendix A
	The RE flag is not changed by interior routers, so it can be seen	of [I-D.briscoe-tsvwg-re-ecn-tcp]. The RE flag is not changed by
	that it acts as a reference against which the congestion marking	interior routers, so it can be seen that it acts as a reference
	fraction can be compared along the path.	against which the congestion marking fraction can be compared along
		the path.

	+--------------------------+---------------------------------------+	+--------------------------+---------------------------------------+
	\| Border observation point \| Approximate Downstream pre-congestion \|	\| Border observation point \| Approximate Downstream pre-congestion \|
	+--------------------------+---------------------------------------+	+--------------------------+---------------------------------------+
	\| ingress -- A \| 3% - 0% = 3% \|	\| ingress -- A \| 3% - 0% = 3% \|
	\| A -- B \| 3% - 1% = 2% \|	\| A -- B \| 3% - 1% = 2% \|
	\| B -- C \| 3% - 1% = 2% \|	\| B -- C \| 3% - 1% = 2% \|
	\| C -- egress \| 3% - 3% = 0% \|	\| C -- egress \| 3% - 3% = 0% \|
	+--------------------------+---------------------------------------+	+--------------------------+---------------------------------------+


	skipping to change at page 21, line 36	skipping to change at page 24, line 40
	aggregate using the most recent feedback from the relevant egress,	aggregate using the most recent feedback from the relevant egress,
	arriving with each new reservation, or each refresh. These updates	arriving with each new reservation, or each refresh. These updates
	arrive relatively infrequently compared to the speed with which	arrive relatively infrequently compared to the speed with which
	congestion changes. Although this feedback will always be out of	congestion changes. Although this feedback will always be out of
	date, on average positive errors should cancel out negative over a	date, on average positive errors should cancel out negative over a
	sufficiently long duration.	sufficiently long duration.

	In summary, the network adds pre-congestion marking in the forward	In summary, the network adds pre-congestion marking in the forward
	data path, the egress feeds its level back to the ingress in RSVP (or	data path, the egress feeds its level back to the ingress in RSVP (or
	similar signalling), then the ingress gateway re-echoes it into the	similar signalling), then the ingress gateway re-echoes it into the

	forward data path by blanking the RE flag. Hence the name re-ECN.	forward data path by blanking the RE flag. Then at any border within
	Then at any border within the Diffserv region, the pre-congestion	the PCN-region, the pre-congestion marking that every passing packet
	marking that every passing packet will be expected to experience	will be expected to experience downstream can be measured to be the
	downstream can be measured to be the RE blanking fraction minus the	RE blanking fraction minus the congestion marking fraction.
	congestion marking fraction.
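The border measurement summarised above can be sketched as a simple octet counter. The class `BorderMeter` and its methods are hypothetical names. The sketch follows the formula in the text literally (RE blanking fraction minus congestion marking fraction), ignores packets that are not PCN-capable, and does not model any special treatment of FNE packets.

```python
class BorderMeter:
    """Hypothetical bulk meter at a border between two domains."""

    def __init__(self) -> None:
        self.total = 0    # octets of PCN packets seen
        self.blanked = 0  # octets with the RE flag cleared to 0
        self.marked = 0   # octets with admission or termination marking

    def observe(self, size: int, ecn: int, re_flag: int) -> None:
        """Account for one packet of `size` octets."""
        if ecn == 0b00 and re_flag == 0:
            return  # Not-PCN: outside the scope of this measurement
        self.total += size
        if re_flag == 0:
            self.blanked += size
        if ecn & 0b01:  # right-most ECN bit set means congestion marked
            self.marked += size

    def downstream_congestion(self) -> float:
        """RE blanking fraction minus congestion marking fraction."""
        if self.total == 0:
            return 0.0
        return (self.blanked - self.marked) / self.total
```

With 3% of octets blanked and 1% congestion marked, as at the A -- B border in Table 3, `downstream_congestion()` returns approximately 0.02.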

	4.3.2. Aggregate Bootstrap	4.3.2. Aggregate Bootstrap

	When a new reservation PATH message arrives at the egress, if there	When a new reservation PATH message arrives at the egress, if there
	are currently no flows in progress from the same ingress, there will	are currently no flows in progress from the same ingress, there will
	be no state maintaining the current level of pre-congestion marking	be no state maintaining the current level of pre-congestion marking

	for the aggregate. While the reservation signalling continues onward	for the aggregate. In the case of RSVP reservation signalling, while
	towards the receiving host, the egress gateway returns an RSVP	the signal continues onward towards the receiving host, the egress
	message to the ingress with a flag [RSVP-ECN] asking the ingress to	gateway can return an RSVP message to the ingress with a
	send a specified number of data probes between them. This bootstrap	flag [RSVP-ECN] asking the ingress to send a specified number of data
	behaviour is all described in the deployment	probes between them. The more general possibilities for bootstrap
	model [I-D.ietf-pcn-architecture].	behaviour are described in the PCN
		architecture [I-D.ietf-pcn-architecture], including using the
		reservation signal itself as a probe.


	However, with our new re-ECN scheme, the ingress does not know what	However, with our new re-PCN scheme, the ingress does not know what
	proportion of the data probes should have the RE flag blanked,	proportion of the data probes should have the RE flag blanked,
	because it has no estimate yet of pre-congestion for the path across	because it has no estimate yet of pre-congestion for the path across

	the Diffserv region.	the PCN-region.

	To be conservative, following the guidance for specifying other re-	To be conservative, following the guidance for specifying other re-

	ECN transports in [Re-TCP], the ingress SHOULD set the FNE codepoint	ECN transports in [I-D.briscoe-tsvwg-re-ecn-tcp], the ingress SHOULD
	of the extended ECN header in all probe packets (Table 2). As per	set the FNE codepoint of the extended PCN header in all probe packets
	the deployment model, the egress gateway measures the fraction of	(Table 2). As per the PCN deployment model, the egress gateway
	congestion-marked probe octets and feeds back the resulting pre-	measures the fraction of congestion-marked probe octets and feeds
	congestion level to the ingress, piggy-backed on the returning	back the resulting pre-congestion level to the ingress, piggy-backed
	reservation response (RESV) for the new flow. Probe packets are	on the returning reservation response (RESV) for the new flow. Probe
	identifiable by the egress because they have the ingress as the	packets are identifiable by the egress because they carry the FNE
	source and the egress as the destination in the IP header.	codepoint.

	It may seem inadvisable to expect the FNE codepoint to be set on	It may seem inadvisable to expect the FNE codepoint to be set on
	probes, given legacy firewalls etc. might discard such packets	probes, given legacy firewalls etc. might discard such packets
	(because this flag had no previous legitimate use). However, in the	(because this flag had no previous legitimate use). However, in the

	deployment scenarios envisaged, each domain in the Diffserv region	deployment scenarios envisaged, each domain in the PCN-region has to
	has to be explicitly configured to support the controlled load	be explicitly configured to support the admission controlled service.
	service. So, before deploying the service, the operator MUST	So, before deploying the service, the operator MUST reconfigure such
	reconfigure such a misbehaving middlebox to allow through packets	a badly implemented middlebox to allow through packets with the RE
	with the RE flag set.	flag set.

	Note that we have said SHOULD rather than MUST for the FNE setting	Note that we have said SHOULD rather than MUST for the FNE setting
	behaviour of the ingress for probe packets. This entertains the	behaviour of the ingress for probe packets. This entertains the
	possibility of an ingress implementation having the benefit of other	possibility of an ingress implementation having the benefit of other
	knowledge of the path, which it re-uses for a newly starting	knowledge of the path, which it re-uses for a newly starting
	aggregate. For instance, it may hold cached information from a	aggregate. For instance, it may hold cached information from a
	recent use of the aggregate that is still sufficiently current to be	recent use of the aggregate that is still sufficiently current to be

	useful.	useful. If not all probe packets are set to FNE, the ingress will
		have to ensure probe packets are identifiable by some other means,
		perhaps by using the egress as the destination address.

	It might seem pedantic worrying about these few probe packets, but	It might seem pedantic worrying about these few probe packets, but
	this behaviour ensures the system is safe, even if the proportion of	this behaviour ensures the system is safe, even if the proportion of
	probe packets becomes large.	probe packets becomes large.

	4.3.3. Flow Bootstrap	4.3.3. Flow Bootstrap

	It might be expected that a new flow within an active aggregate would	It might be expected that a new flow within an active aggregate would
	need no special bootstrap behaviour. If there was an aggregate	need no special bootstrap behaviour. If there was an aggregate
	already in progress between the gateways the new flow was about to	already in progress between the gateways the new flow was about to

	skipping to change at page 23, line 21	skipping to change at page 26, line 32
	that sanctions may be too strict at the interface before the egress	that sanctions may be too strict at the interface before the egress
	gateway. It will often be possible to apply sanctions at the	gateway. It will often be possible to apply sanctions at the
	granularity of aggregates rather than flows, but in an internetworked	granularity of aggregates rather than flows, but in an internetworked
	environment it cannot be guaranteed that aggregates will be	environment it cannot be guaranteed that aggregates will be
	identifiable in remote networks. So setting FNE at the start of each	identifiable in remote networks. So setting FNE at the start of each
	flow is a safe strategy. For instance, a remote network may have	flow is a safe strategy. For instance, a remote network may have
	equal cost multi-path (ECMP) routing enabled, causing different flows	equal cost multi-path (ECMP) routing enabled, causing different flows
	between the same gateways to traverse different paths.	between the same gateways to traverse different paths.

	After an idle period of more than 1 second, the ingress gateway	After an idle period of more than 1 second, the ingress gateway

	SHOULD set the EECN field of the next packet it sends to FNE. This	SHOULD set the EPCN field of the next packet it sends to FNE. This
	allows the design of network policers to be deterministic (see	allows the design of network policers to be deterministic (see

	[Re-TCP]).	[I-D.briscoe-tsvwg-re-ecn-tcp]).

	However, if the ingress gateway can guarantee that the network(s)	However, if the ingress gateway can guarantee that the network(s)
	that will carry the flow to its egress gateway all use a common	that will carry the flow to its egress gateway all use a common
	identifier for the aggregate (e.g. a single MPLS network without ECMP	identifier for the aggregate (e.g. a single MPLS network without ECMP
	routing), it MAY omit FNE when it adds a new flow to an active	routing), it MAY omit FNE when it adds a new flow to an active
	aggregate. And an FNE packet need only be sent if a whole aggregate	aggregate. And an FNE packet need only be sent if a whole aggregate
	has been idle for more than 1 second.	has been idle for more than 1 second.
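A sketch of the idle rule, under the assumption that the ingress tracks the last send time per aggregate, might look as follows. `IdleFneTracker` and `needs_fne` are hypothetical names, and treating a never-seen aggregate as needing FNE is an assumption of this sketch, in the spirit of the bootstrap behaviour of Section 4.3.2.

```python
class IdleFneTracker:
    """Hypothetical ingress state deciding when the next packet should be FNE."""

    def __init__(self, idle_threshold: float = 1.0) -> None:
        self.idle_threshold = idle_threshold  # seconds; 1 second in the text
        self.last_sent = {}                   # aggregate id -> last send time

    def needs_fne(self, aggregate_id: str, now: float) -> bool:
        """Record a send at time `now` and report whether it should carry FNE."""
        last = self.last_sent.get(aggregate_id)
        self.last_sent[aggregate_id] = now
        # Assumption: an aggregate with no recorded history is treated as idle.
        return last is None or (now - last) > self.idle_threshold
```

A per-flow variant would key the same state on a flow identifier instead of an aggregate identifier.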

	4.3.4. Router Forwarding Behaviour	4.3.4. Router Forwarding Behaviour


	Adding re-ECN works well without modifying the forwarding behaviour	Adding re-PCN works well with the regular PCN forwarding behaviour of
	of any routers. However, below, two changes are proposed when	interior queues. However, below, two optional changes are proposed
	forwarding packets with a per-hop-behaviour that requires pre-	when forwarding packets with a per-hop-behaviour that requires pre-
	congestion notification:	congestion notification:


	Preferential drop: When a router cannot avoid dropping ECN-capable	Preferential drop: When a router cannot avoid dropping PCN-capable
	packets, preferential dropping of packets with different extended	packets, preferential dropping of packets with different extended

	ECN codepoints SHOULD be implemented between packets within a PHB	PCN codepoints SHOULD be implemented between packets within a PHB
	that uses PCN marking. The drop preference order to use is	that uses PCN marking. The drop preference order to use is
	defined in Table 4. Note that to reduce configuration complexity,	defined in Table 4. Note that to reduce configuration complexity,

	Re-Echo and FNE MAY be given the same drop preference, but if	Re-PCT-Echo and FNE MAY be given the same drop preference, but if
	feasible, FNE should be dropped in preference to Re-Echo.	feasible, FNE SHOULD be dropped in preference to Re-PCT-Echo.

	+---------+-------+----------------+---------+----------------------+
	| ECN     | RE    | Extended ECN   | Drop    | Re-ECN meaning       |
	| field   | flag  | codepoint      | Pref    |                      |
	+---------+-------+----------------+---------+----------------------+
	| 01      | 0     | Re-Echo        | 5/4     | Re-echoed congestion |
	|         |       |                |         | and RECT             |
	| 00      | 1     | FNE            | 4       | Feedback not         |
	|         |       |                |         | established          |
	| 01      | 1     | RECT           | 3       | Re-ECN capable       |
	|         |       |                |         | transport            |
	| 10      | 0     | AM(0)          | 3       | Admission Marking    |
	|         |       |                |         | with Re-Echo         |
	| 10      | 1     | AM(-1)         | 3       | Admission Marking    |
	|         |       |                |         |                      |
	| 11      | 0     | PM(0)          | 2       | Pre-emption Marking  |
	|         |       |                |         | with Re-Echo         |
	| 11      | 1     | PM(-1)         | 2       | Pre-emption Marking  |
	|         |       |                |         |                      |
	| 00      | 0     | Not-RECT       | 1       | Not re-ECN-capable   |
	|         |       |                |         | transport            |
	+---------+-------+----------------+---------+----------------------+

	Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st)


	Given this proposal is being advanced at the same time as PCN	If this proposal were advanced at the same time as PCN itself, we
	itself, we strongly RECOMMEND that preferential drop based on	would recommend that preferential drop based on extended PCN
	extended ECN codepoint is added to router forwarding at the same	codepoint SHOULD be added to router forwarding at the same time as
	time as PCN marking. Preferential dropping can be difficult to	PCN marking. Preferential dropping can be difficult to implement,
	implement, but we strongly RECOMMEND this security-related re-ECN	but we RECOMMEND this security-related re-PCN improvement where
	improvement where feasible as it is an effective defence against	feasible as it is an effective defence against flooding attacks.
	flooding attacks.

	Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE	Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE
	flag as well as the ECN field to decide whether to drop or mark	flag as well as the ECN field to decide whether to drop or mark
	PCN DSCPs. They MUST choose drop if the codepoint of this	PCN DSCPs. They MUST choose drop if the codepoint of this

	extended ECN field is Not-RECT. Otherwise they SHOULD mark	extended ECN field is Not-PCN. Otherwise they SHOULD mark
	(unless, of course, buffer space is exhausted).	(unless, of course, buffer space is exhausted).

	A PCN-capable router MUST NOT ever congestion mark a packet	A PCN-capable router MUST NOT ever congestion mark a packet

	carrying the Not-RECT codepoint because the transport will only	carrying the Not-PCN codepoint because the transport will only
	understand drop, not congestion marking. But a PCN-capable router	understand drop, not congestion marking. But a PCN-capable router
	can mark rather than drop an FNE packet, even though its ECN field	can mark rather than drop an FNE packet, even though its ECN field
	when looked at in isolation is '00' which appears to be a legacy	when looked at in isolation is '00' which appears to be a legacy
	Not-ECT packet. Therefore, if a packet's RE flag is '1', even if	Not-ECT packet. Therefore, if a packet's RE flag is '1', even if
	its ECN field is '00', a PCN-enabled router SHOULD use congestion	its ECN field is '00', a PCN-enabled router SHOULD use congestion
	marking. This allows the `feedback not established' (FNE)	marking. This allows the `feedback not established' (FNE)
	codepoint to be used for probe packets, in order to pick up PCN	codepoint to be used for probe packets, in order to pick up PCN
	marking when bootstrapping an aggregate.	marking when bootstrapping an aggregate.


	ECN marking rather than dropping of FNE packets MUST only be	PCN marking rather than dropping of FNE packets MUST only be
	deployed in controlled environments, such as that in	deployed in controlled environments, such as that in
	[I-D.ietf-pcn-architecture], where the presence of an egress node	[I-D.ietf-pcn-architecture], where the presence of an egress node

	that understands ECN marking is assured. Congestion events might	that understands PCN marking is assured. Congestion events might
	otherwise be ignored if the receiver only understands drop, rather	otherwise be ignored if the receiver only understands drop, rather

	than ECN marking. This is because there is no guarantee that ECN	than PCN marking. This is because there is no guarantee that PCN
	capability has been negotiated if feedback is not established	capability has been negotiated if feedback is not established

	(FNE). Also, [Re-TCP] places the strong condition that a router	(FNE). Also, [I-D.briscoe-tsvwg-re-ecn-tcp] places the strong
	MUST apply drop rather than marking to FNE packets unless it can	condition that a router MUST apply drop rather than marking to FNE
	guarantee that FNE packets are rate limited either locally or	packets unless it can guarantee that FNE packets are rate limited
	upstream.	either locally or upstream.
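The marking-versus-drop rule above can be sketched as a small decision function. This is an illustrative sketch only, using the -02 codepoint encoding. The function name `congestion_action` and the parameters `buffer_full` and `fne_marking_allowed` are assumptions introduced here, not names from the draft. `fne_marking_allowed` stands for the operator's knowledge that the router sits in a controlled PCN environment where FNE packets are rate limited.

```python
def congestion_action(ecn_field: int, re_flag: int,
                      buffer_full: bool = False,
                      fne_marking_allowed: bool = True) -> str:
    """Return "drop" or "mark" for a packet the queue needs to signal on.

    ecn_field is the 2-bit ECN field, re_flag the 1-bit RE flag.
    Hypothetical helper: the draft defines the rule, not this function.
    """
    # Marking is impossible once buffer space is exhausted.
    if buffer_full:
        return "drop"

    # Not-PCN ('00' with RE = 0): the transport only understands drop.
    if ecn_field == 0b00 and re_flag == 0:
        return "drop"

    # FNE ('00' with RE = 1) looks like legacy Not-ECT in isolation, but
    # may be marked where an egress that understands marking is assured
    # and FNE packets are rate limited (assumed to be known by config).
    is_fne = ecn_field == 0b00 and re_flag == 1
    if is_fne and not fne_marking_allowed:
        return "drop"

    return "mark"
```

Inspecting the RE flag together with the ECN field is what distinguishes an FNE probe from a legacy packet; a router looking at the ECN field alone would drop both.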

		+---------+-------+-----------------+---------+---------------------+
		| PCN     | RE    | Extended PCN    | Drop    | Re-PCN meaning      |
		| field   | flag  | codepoint       | Pref    |                     |
		+---------+-------+-----------------+---------+---------------------+
		| 10      | 0     | Re-PCT-Echo     | 5/4     | Re-echoed           |
		|         |       |                 |         | congestion and      |
		|         |       |                 |         | Re-PCT              |
		| 00      | 1     | FNE             | 4       | Feedback not        |
		|         |       |                 |         | established         |
		| 10      | 1     | Re-PCT          | 3       | Re-PCN capable      |
		|         |       |                 |         | transport           |
		| 01      | 0     | AM(0)           | 3       | Admission Marking   |
		|         |       |                 |         | with Re-Echo        |
		| 01      | 1     | AM(-1)          | 3       | Admission Marking   |
		|         |       |                 |         |                     |
		| 11      | 0     | TM(0)           | 2       | Termination Marking |
		|         |       |                 |         | with Re-Echo        |
		| 11      | 1     | TM(-1)          | 2       | Termination Marking |
		|         |       |                 |         |                     |
		| 00      | 0     | Not-PCN         | 1       | Not PCN-capable     |
		|         |       |                 |         | transport           |
		+---------+-------+-----------------+---------+---------------------+

		Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st)
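As a hedged illustration of how the drop preference order in Table 4 (-02 encoding) might be applied, the sketch below picks which queued packet to discard when a drop cannot be avoided. The names `DROP_PREF`, `drop_preference`, `pick_victim` and the flag `merge_echo_and_fne` are hypothetical. Representing a queue as a list of (field, RE flag) tuples is also an assumption made only for brevity.

```python
# (2-bit field, RE flag) -> (codepoint name, drop preference); 1 = drop first.
DROP_PREF = {
    (0b10, 0): ("Re-PCT-Echo", 5),
    (0b00, 1): ("FNE", 4),
    (0b10, 1): ("Re-PCT", 3),
    (0b01, 0): ("AM(0)", 3),
    (0b01, 1): ("AM(-1)", 3),
    (0b11, 0): ("TM(0)", 2),
    (0b11, 1): ("TM(-1)", 2),
    (0b00, 0): ("Not-PCN", 1),
}


def drop_preference(field, re_flag, merge_echo_and_fne=False):
    """Drop preference of one packet (lower number = dropped sooner)."""
    name, pref = DROP_PREF[(field, re_flag)]
    # The "5/4" entry: Re-PCT-Echo MAY share FNE's preference (4) to
    # reduce configuration complexity; otherwise it keeps 5.
    if merge_echo_and_fne and name == "Re-PCT-Echo":
        return 4
    return pref


def pick_victim(queue, merge_echo_and_fne=False):
    """Index of the packet to drop; ties go to the earliest packet."""
    if not queue:
        raise ValueError("queue is empty")
    return min(
        range(len(queue)),
        key=lambda i: drop_preference(
            *queue[i], merge_echo_and_fne=merge_echo_and_fne
        ),
    )
```

For a queue holding Re-PCT-Echo, TM(-1), Not-PCN and AM(0) packets in that order, `pick_victim` selects the Not-PCN packet, so traffic outside the re-PCN incentive framework is shed before traffic that carries positive worth.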

	4.3.5. Extensions	4.3.5. Extensions


	If a different signalling system, such as NSIS, were used, but it	If a different signalling system, such as NSIS, were used but it
	provided admission control in a similar way, using pre-congestion	provided admission control in a similar way using pre-congestion
	notification (e.g. Arumaithurai [I-D.arumaithurai-nsis-pcn] or	notification (e.g. Arumaithurai [I-D.arumaithurai-nsis-pcn] or

	RMD [I-D.ietf-nsis-rmd]) we believe re-ECN could be used to protect	RMD [I-D.ietf-nsis-rmd]), we believe re-PCN could be used to protect
	against misbehaving networks in the same way as proposed above.	against misbehaving networks in the same way as proposed above.

	5. Emulating Border Policing with Re-ECN	5. Emulating Border Policing with Re-ECN


	Note that the re-ECN protocol described in Section 4 above would	The following sections are informative, not normative. The re-PCN
	require standardisation, whereas operators acting in their own	protocol described in Section 4 above would require standardisation,
	interests would be expected to deploy policing and monitoring	whereas operators acting in their own interests would be expected to
	functions similar to those proposed in the sections below without any	deploy policing and monitoring functions similar to those proposed in
	further need for standardisation by the IETF. Flexibility is	the sections below without any further need for standardisation by
	expected in exactly how policing and monitoring is done.	the IETF. Flexibility is expected in exactly how policing and
		monitoring is done.

	5.1. Informal Terminology	5.1. Informal Terminology

	In the rest of this memo, where the context makes it clear, we will	In the rest of this memo, where the context makes it clear, we will
	sometimes loosely use the term `congestion' rather than using the	sometimes loosely use the term `congestion' rather than using the
	stricter `downstream pre-congestion'. Also we will loosely talk of	stricter `downstream pre-congestion'. Also we will loosely talk of
	positive or negative flows, meaning flows where the moving average of	positive or negative flows, meaning flows where the moving average of
	the downstream pre-congestion metric is persistently positive or	the downstream pre-congestion metric is persistently positive or
	negative. The notion of a negative metric arises because it is	negative. The notion of a negative metric arises because it is
	derived by subtracting one metric from another. Of course actual	derived by subtracting one metric from another. Of course actual
	downstream congestion cannot be negative, only the metric can	downstream congestion cannot be negative, only the metric can
	(whether due to time lags or deliberate malice).	(whether due to time lags or deliberate malice).

	Just as we will loosely talk of positive and negative flows, we will	Just as we will loosely talk of positive and negative flows, we will
	also talk of positive or negative packets, meaning packets that	also talk of positive or negative packets, meaning packets that
	contribute positively or negatively to downstream pre-congestion.	contribute positively or negatively to downstream pre-congestion.

	Therefore packets can be considered to have a `worth' of +1, 0 or -1,	Therefore packets can be considered to have a `worth' of +1, 0 or -1,
	which, when multiplied by their size, indicates their contribution to	which, when multiplied by their size, indicates their contribution to

	downstream congestion. Packets will usually be sent with a worth of	downstream congestion. Packets will usually be initialised by the
	0. Blanking the RE flag increments the worth of a packet to +1.	PCN ingress with a worth of 0. Blanking the RE flag increments the
	Congestion marking a packet decrements its worth (whether admission	worth of a packet to +1. Congestion marking a packet decrements its
	marking or pre-emption marking). Congestion marking a previously	worth (whether admission marking or termination marking). Congestion
	blanked packet cancel out the positive and negative worth of each	marking a previously blanked packet cancels out the positive worth
	marking (a worth of 0). The FNE codepoint is an exception. It has	with the negative worth of the congestion marking (resulting in a
	the same positive worth as a packet with the Re-Echo codepoint. The	packet worth 0). The FNE codepoint is an exception. It has the same
	table below specifies unambiguously the worth of each extended ECN	positive worth as a packet with the Re-PCT-Echo codepoint. The table
		below specifies unambiguously the worth of each extended PCN
	codepoint. Note the order is different from the previous table to	codepoint. Note the order is different from the previous table to

	emphasise how congestion marking processes decrement the worth.	emphasise how congestion marking processes decrement the worth (with
		the exception of FNE).


	+---------+-------+-----------------+-------+-----------------------+	+---------+-------+------------------+-------+----------------------+
	| ECN     | RE    | Extended ECN    | Worth | Re-ECN meaning        |	| ECN     | RE    | Extended PCN     | Worth | Re-PCN meaning       |
	| field   | flag  | codepoint       |       |                       |	| field   | flag  | codepoint        |       |                      |
	+---------+-------+-----------------+-------+-----------------------+	+---------+-------+------------------+-------+----------------------+
	| 00      | 0     | Not-RECT        | n/a   | Not re-ECN-capable    |	| 00      | 0     | Not-PCN          | n/a   | Not PCN-capable      |
	|         |       |                 |       | transport             |	|         |       |                  |       | transport            |
	| 01      | 0     | Re-Echo         | +1    | Re-echoed congestion  |	| 10      | 0     | Re-PCT-Echo      | +1    | Re-echoed congestion |
	|         |       |                 |       | and RECT              |	|         |       |                  |       | and Re-PCT           |
	| 10      | 0     | AM(0)           | 0     | Admission Marking     |	| 01      | 0     | AM(0)            | 0     | Admission Marking    |
	|         |       |                 |       | with Re-Echo          |	|         |       |                  |       | with Re-Echo         |
	| 11      | 0     | PM(0)           | 0     | Pre-emption Marking   |	| 11      | 0     | TM(0)            | 0     | Termination Marking  |
	|         |       |                 |       | with Re-Echo          |	|         |       |                  |       | with Re-Echo         |
	| 00      | 1     | FNE             | +1    | Feedback not          |	| 00      | 1     | FNE              | +1    | Feedback not         |
	|         |       |                 |       | established           |	|         |       |                  |       | established          |
	| 01      | 1     | RECT            | 0     | Re-ECN capable        |	| 10      | 1     | Re-PCT           | 0     | Re-PCN capable       |
	|         |       |                 |       | transport             |	|         |       |                  |       | transport            |
	| 10      | 1     | AM(-1)          | -1    | Admission Marking     |	| 01      | 1     | AM(-1)           | -1    | Admission Marking    |
	|         |       |                 |       |                       |	|         |       |                  |       |                      |
	| 11      | 1     | PM(-1)          | -1    | Pre-emption Marking   |	| 11      | 1     | TM(-1)           | -1    | Termination Marking  |
	+---------+-------+-----------------+-------+-----------------------+	+---------+-------+------------------+-------+----------------------+

	Table 5: 'Worth' of Extended ECN Codepoints	Table 5: 'Worth' of Extended ECN Codepoints
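The worth assignments in Table 5 can be written as a lookup, shown here as a minimal sketch using the -02 encoding. The names `WORTH` and `contribution` are hypothetical. Treating a Not-PCN packet (worth "n/a") as contributing zero octets is an assumption made here so that the function always returns a number.

```python
# (2-bit field, RE flag) -> worth; None stands for "n/a" (Not-PCN).
WORTH = {
    (0b00, 0): None,  # Not-PCN
    (0b10, 0): +1,    # Re-PCT-Echo
    (0b01, 0): 0,     # AM(0)
    (0b11, 0): 0,     # TM(0)
    (0b00, 1): +1,    # FNE (the exception: RE set, yet positive)
    (0b10, 1): 0,     # Re-PCT
    (0b01, 1): -1,    # AM(-1)
    (0b11, 1): -1,    # TM(-1)
}


def contribution(field, re_flag, size_octets):
    """Contribution of one packet to downstream pre-congestion, in octets."""
    worth = WORTH[(field, re_flag)]
    if worth is None:
        # Assumption: packets outside the scheme are simply not counted.
        return 0
    return worth * size_octets
```

For example, a 1500 octet Re-PCT-Echo packet contributes +1500, and the same packet after admission marking becomes AM(0) and contributes 0, which is the cancellation described in the text.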

	5.2. Policing Overview	5.2. Policing Overview

	It will be recalled that downstream congestion can be found by	It will be recalled that downstream congestion can be found by
	subtracting upstream congestion from path congestion. Figure 4	subtracting upstream congestion from path congestion. Figure 4
	displays the difference between the two plots in Figure 3 to show	displays the difference between the two plots in Figure 3 to show
	downstream pre-congestion across the same path through the Internet.	downstream pre-congestion across the same path through the Internet.

	To emulate border policing, the general idea is for each domain to	To emulate border policing, the general idea is for each domain to

	skipping to change at page 27, line 30	skipping to change at page 30, line 46
	1.00% 2.00%: pre-congestion	1.00% 2.00%: pre-congestion
	|	|
	sanctions	sanctions

	Figure 4: Policing Framework, showing creation of opposing pressures	Figure 4: Policing Framework, showing creation of opposing pressures
	to under-declare and over-declare downstream pre-congestion, using	to under-declare and over-declare downstream pre-congestion, using
	penalties and sanctions	penalties and sanctions

	These penalties seem to encourage everyone to understate downstream	These penalties seem to encourage everyone to understate downstream
	congestion in order to reduce the penalties they incur. But a	congestion in order to reduce the penalties they incur. But a

	balancing pressure is introduced by the last domain, which applies	balancing pressure is introduced by the last domain (strictly by any
	sanctions to flows if downstream congestion goes negative before the	domain), which applies sanctions to flows if downstream congestion
	egress gateway. The upward arrow at Domain C's border with the	goes negative before the egress gateway. The upward arrow at Domain
	egress gateway represents the incentive the sanctions would create to	C's border with the egress gateway represents the incentive the
	prevent negative traffic. The same upward pressure can be applied at	sanctions would create to prevent negative traffic. The same upward
	any domain border (arrows not shown).	pressure can be applied at any domain border (arrows not shown).

	Any flow that persistently goes negative by the time it leaves a	Any flow that persistently goes negative by the time it leaves a
	domain must not have been marked correctly in the first place. A	domain must not have been marked correctly in the first place. A
	domain that discovers such a flow can adopt a range of strategies to	domain that discovers such a flow can adopt a range of strategies to
	protect itself. Which strategy it uses will depend on policy,	protect itself. Which strategy it uses will depend on policy,
	because it cannot immediately assume malice--there may be an innocent	because it cannot immediately assume malice--there may be an innocent
	configuration error somewhere in the system.	configuration error somewhere in the system.

	This memo does not propose to standardise any particular mechanism to	This memo does not propose to standardise any particular mechanism to
	detect persistently negative flows, but Section 5.5 does give	detect persistently negative flows, but Section 5.5 does give
	examples. Note that we have used the term flow, but there will be no	examples. Note that we have used the term flow, but there will be no
	need to burrow into the transport layer for port numbers; identifiers	need to burrow into the transport layer for port numbers; identifiers
	visible in the network layer will be sufficient (IP address pair,	visible in the network layer will be sufficient (IP address pair,

	DSCP, protocol ID). The appendix also gives a mechanism to bound the	DSCP, protocol ID). The appendix also gives a mechanism to limit the
	required flow state, preventing state exhaustion attacks.	required flow state, preventing state exhaustion attacks.

	Of course, some domains may trust other domains to comply with	Of course, some domains may trust other domains to comply with
	admission control without applying sanctions or penalties. In these	admission control without applying sanctions or penalties. In these
	cases, the protocol should still be used but no penalties need be	cases, the protocol should still be used but no penalties need be

	applied. The re-ECN protocol ensures downstream pre-congestion	applied. The re-PCN protocol ensures downstream pre-congestion
	marking is passed on correctly whether or not penalties are applied	marking is passed on correctly whether or not penalties are applied
	to it, so the system works just as well with a mixture of some	to it, so the system works just as well with a mixture of some
	domains trusting each other and others not.	domains trusting each other and others not.

	Providers should be free to agree the contractual terms they wish	Providers should be free to agree the contractual terms they wish
	between themselves, so this memo does not propose to standardise how	between themselves, so this memo does not propose to standardise how
	these penalties would be applied. It is sufficient to standardise	these penalties would be applied. It is sufficient to standardise

	the re-ECN protocol so the downstream pre-congestion metric is	the re-PCN protocol so the downstream pre-congestion metric is
	available if providers choose to use it. However, the next section	available if providers choose to use it. However, the next section
	(Section 5.3) gives some examples of how these penalties might be	(Section 5.3) gives some examples of how these penalties might be
	implemented.	implemented.

	5.3. Pre-requisite Contractual Arrangements	5.3. Pre-requisite Contractual Arrangements


	The re-ECN protocol has been chosen to solve the policing problem	The re-PCN protocol has been chosen to solve the policing problem
	because it embeds a downstream pre-congestion metric in passing CL	because it embeds a downstream pre-congestion metric in passing PCN
	traffic that is difficult to lie about and can be measured in bulk.	traffic that is difficult to lie about and can be measured in bulk.
	The ability to emulate border policing depends on network operators	The ability to emulate border policing depends on network operators
	choosing to use this metric as one of the elements in their contracts	choosing to use this metric as one of the elements in their contracts
	with each other.	with each other.

	Already many inter-domain agreements involve a capacity and a usage	Already many inter-domain agreements involve a capacity and a usage
	element. The usage element may be based on volume or various	element. The usage element may be based on volume or various
	measures of peak demand. We expect that those network operators who	measures of peak demand. We expect that those network operators who
	choose to use pre-congestion notification for admission control would	choose to use pre-congestion notification for admission control would
	also be willing to consider using this downstream pre-congestion	also be willing to consider using this downstream pre-congestion
	metric as a usage element in their interconnection contracts for	metric as a usage element in their interconnection contracts for

	admission controlled (CL) traffic.	admission controlled (PCN) traffic.

	Congestion (or pre-congestion) has the dimension of [octet], being	Congestion (or pre-congestion) has the dimension of [octet], being
	the product of volume transferred [octet] and the congestion fraction	the product of volume transferred [octet] and the congestion fraction
	[dimensionless], which is the fraction of the offered load that the	[dimensionless], which is the fraction of the offered load that the
	network isn't able to serve (or would rather not serve in the case of	network isn't able to serve (or would rather not serve in the case of
	pre-congestion). Measuring downstream congestion gives a measure of	pre-congestion). Measuring downstream congestion gives a measure of
	the volume transferred but modulated by congestion expected	the volume transferred but modulated by congestion expected
	downstream. So volume transferred during off-peak periods counts as	downstream. So volume transferred during off-peak periods counts as

	nearly nothing, while volume transferred at peak times counts very	nearly nothing, while volume transferred at peak times or over
	highly. The re-ECN protocol allows one network to measure how much	temporarily congested links counts very highly. The re-PCN protocol
	pre-congestion has been `dumped' into it by another network. And	allows one network to measure how much pre-congestion has been
	then in turn how much of that pre-congestion it dumped into the next	`dumped' into it by another network. And then in turn how much of
	downstream network.	that pre-congestion it dumped into the next downstream network.

	Section 5.6 describes mechanisms for calculating border penalties	Section 5.6 describes mechanisms for calculating border penalties
	referring to Appendix A.2 for suggested metering algorithms for	referring to Appendix A.2 for suggested metering algorithms for
	downstream congestion at a border router. Conceptually, it could	downstream congestion at a border router. Conceptually, it could
	hardly be simpler. It broadly involves accumulating the volume of	hardly be simpler. It broadly involves accumulating the volume of
	packets with the RE flag blanked and the volume of those with	packets with the RE flag blanked and the volume of those with
	congestion marking then subtracting the two.	congestion marking then subtracting the two.
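The bulk metering idea just described (two accumulators and a subtraction) might look like the sketch below. It is not the algorithm of Appendix A.2, which is not reproduced in this section. The class name `BorderMeter`, its attribute names, the choice to skip Not-PCN packets, and the choice to count FNE as positive (following its +1 worth in Table 5) are assumptions for illustration, using the -02 encoding.

```python
class BorderMeter:
    """Accumulates downstream pre-congestion (in octets) at a border."""

    def __init__(self):
        self.positive_octets = 0  # volume with RE blanked (or FNE)
        self.negative_octets = 0  # volume carrying a congestion mark

    def update(self, field, re_flag, size_octets):
        """Account for one packet crossing the border."""
        if field == 0b00 and re_flag == 0:
            return  # Not-PCN: outside the scheme, not metered here

        is_fne = field == 0b00 and re_flag == 1
        if re_flag == 0 or is_fne:
            self.positive_octets += size_octets

        # '01' is admission marking, '11' is termination marking.
        if field in (0b01, 0b11):
            self.negative_octets += size_octets

    def downstream_pre_congestion(self):
        """Blanked volume minus marked volume."""
        return self.positive_octets - self.negative_octets
```

No per-flow state is kept: the meter needs only two counters per border interface, which is what allows the metric to be measured in bulk.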

	Once this downstream pre-congestion metric is available, operators	Once this downstream pre-congestion metric is available, operators
	are free to choose how they incorporate it into their interconnection	are free to choose how they incorporate it into their interconnection

	skipping to change at page 30, line 9	skipping to change at page 33, line 25
	other words, penalties are always paid in the same direction as the	other words, penalties are always paid in the same direction as the
	data, and never against the data flow, even if downstream congestion	data, and never against the data flow, even if downstream congestion
	seems to be negative. This is consistent with the definition of	seems to be negative. This is consistent with the definition of
	physical congestion; when a resource is underutilised, it is not	physical congestion; when a resource is underutilised, it is not
	negatively congested. Its congestion is just zero. So, although	negatively congested. Its congestion is just zero. So, although
	short periods of negative marking can be tolerated to correct	short periods of negative marking can be tolerated to correct
	temporary over-declarations due to lags in the feedback system,	temporary over-declarations due to lags in the feedback system,
	persistent downstream negative congestion can have no physical	persistent downstream negative congestion can have no physical
	meaning and therefore must signify a problem. The incentive for	meaning and therefore must signify a problem. The incentive for
	domains not to tolerate persistently negative traffic depends on this	domains not to tolerate persistently negative traffic depends on this

	principle that penalties must never be paid against the data flow.	principle that negative penalties must never be paid for negative
		congestion.


	Also note that at the last egress of the Diffserv region, domain C	Also note that at the last egress of the PCN-region, domain C should
	should not agree to pay any penalties to the egress gateway for pre-	not agree to pay any penalties to the egress gateway for pre-
	congestion passed to the egress gateway. Downstream pre-congestion	congestion passed to the egress gateway. Downstream pre-congestion
	to the egress gateway should have reached zero here. If domain C	to the egress gateway should have reached zero here. If domain C
	were to agree to pay for any remaining downstream pre-congestion, it	were to agree to pay for any remaining downstream pre-congestion, it
	would give the egress gateway an incentive to over-declare pre-	would give the egress gateway an incentive to over-declare pre-
	congestion feedback and take the resulting profit from domain C.	congestion feedback and take the resulting profit from domain C.

	To focus the discussion, from now on, unless otherwise stated, we	To focus the discussion, from now on, unless otherwise stated, we
	will assume a downstream network charges its upstream neighbour in	will assume a downstream network charges its upstream neighbour in
	proportion to the pre-congestion it sends (V_b in the notation of	proportion to the pre-congestion it sends (V_b in the notation of
	Appendix A.2). Effectively tiered thresholds would be just more	Appendix A.2). Effectively tiered thresholds would be just more
	coarse-grained approximations of the fine-grained case we choose to	coarse-grained approximations of the fine-grained case we choose to
	examine. If these neighbours had previously agreed that the (fixed)	examine. If these neighbours had previously agreed that the (fixed)
	price per octet of pre-congestion would be L, then the bill at the	price per octet of pre-congestion would be L, then the bill at the
	end of the month would simply be the product L*V_b, plus any fixed	end of the month would simply be the product L*V_b, plus any fixed
	charges they may also have agreed.	charges they may also have agreed.
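The charging model assumed from here on reduces to one line of arithmetic, sketched below. The function and parameter names are hypothetical: `price_per_octet` stands for L and `v_b` for V_b. Clamping a negative V_b to zero is an assumption that reflects the principle stated earlier that penalties are never paid against the direction of the data flow; the draft does not prescribe how a bill is computed.

```python
def monthly_bill(price_per_octet, v_b, fixed_charges=0):
    """Usage charge L * V_b plus any fixed charges agreed separately.

    v_b is the downstream pre-congestion volume (octets) metered at the
    border over the billing period.
    """
    # Assumption: apparently negative congestion earns no refund.
    billable_octets = max(v_b, 0)
    return price_per_octet * billable_octets + fixed_charges
```

With L = 2 price units per octet, V_b = 1000 octets and fixed charges of 50, the bill is 2050; if V_b were metered as -300, only the fixed charge of 50 would be due.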

	We are well aware that the IETF tries to avoid standardising	We are well aware that the IETF tries to avoid standardising
	technology that depends on a particular business model. Indeed, this	technology that depends on a particular business model. Indeed, this
	principle is at the heart of all our own work. Our aim here is to	principle is at the heart of all our own work. Our aim here is to
	make a new metric available that we believe is superior to all	make a new metric available that we believe is superior to all

	existing metrics. Then, our aim is to show that border policing can	existing metrics. Then, our aim is to show that bulk border policing
	at least work with the one model we have just outlined. We assume	can at least work with the one model we have just outlined. Of
	that operators might then experiment with the metric in other models.	course, operators are free to complement this pre-congestion-based
	Of course, operators are free to complement this pre-congestion-based
	usage element of their charges with traditional capacity charging,	usage element of their charges with traditional capacity charging,

	and we expect they will.	and we expect they will. But if operators don't want to use this
		business model at all, they don't have to do bulk border policing.
		We also assume that operators might experiment with the metric in
		other models.

	Also note well that everything we discuss in this memo only concerns	Also note well that everything we discuss in this memo only concerns

	interconnection within the Diffserv region. ISPs are free to sell or	interconnection within the PCN-region. ISPs are free to sell or give
	give away reservations however they want on the retail market. But	away reservations however they want on the retail market. But of
	of course, interconnection charges will have a bearing on that.	course, interconnection charges will have a bearing on that. Indeed,
	Indeed, in the present scenario, the ingress gateway effectively	in the present scenario, the ingress gateway effectively sells
	sells reservations on one side and buys congestion penalties on the	reservations on one side and buys congestion penalties on the other.
	other. As congestion rises, one can imagine the gateway discovering	As congestion rises, one can imagine the gateway discovering that
	that congestion penalties have risen higher than the (probably fixed)	congestion penalties have risen higher than the (probably fixed)
	revenue it will earn from selling the next flow reservation. This	revenue it will earn from selling the next flow reservation. This
	encourages the gateway to cut its losses by blocking new calls, which	encourages the gateway to cut its losses by blocking new calls, which
	is why we believe downstream congestion penalties can emulate per-	is why we believe downstream congestion penalties can emulate per-
	flow rate policing at borders, as the next section explains.	flow rate policing at borders, as the next section explains.

	5.4. Emulation of Per-Flow Rate Policing: Rationale and Limits	5.4. Emulation of Per-Flow Rate Policing: Rationale and Limits

	The important feature of charging in proportion to congestion volume	The important feature of charging in proportion to congestion volume
	is that the penalty aggregates and disaggregates correctly along with	is that the penalty aggregates and disaggregates correctly along with
	packet flows. This is because the penalty rises linearly with bit	packet flows. This is because the penalty rises linearly with bit

	skipping to change at page 31, line 36	skipping to change at page 35, line 7
	utilisation of a particular resource. So if someone tries to push	utilisation of a particular resource. So if someone tries to push
	another flow into a path that is already signalling enough pre-	another flow into a path that is already signalling enough pre-
	congestion to warrant admission control, the penalty will be a lot	congestion to warrant admission control, the penalty will be a lot
	greater than it would have been to add the same flow to a less	greater than it would have been to add the same flow to a less
	congested path. This makes the incentive system fairly insensitive	congested path. This makes the incentive system fairly insensitive
	to the actual level of pre-congestion for triggering admission	to the actual level of pre-congestion for triggering admission
	control that each ingress chooses. The deterrent against exceeding	control that each ingress chooses. The deterrent against exceeding
	whatever threshold is chosen rises very quickly with a small amount	whatever threshold is chosen rises very quickly with a small amount
	of cheating.	of cheating.


	These are the properties that allow re-ECN to emulate per-flow border	These are the properties that allow re-PCN to emulate per-flow border
	policing of both rate and admission control. It is not a perfect	policing of both rate and admission control. It is not a perfect
	emulation of per-flow border policing, but we claim it is sufficient	emulation of per-flow border policing, but we claim it is sufficient
	to at least ensure the cost to others of a cheat is borne by the	to at least ensure the cost to others of a cheat is borne by the
	cheater, because the penalties are at least proportionate to the	cheater, because the penalties are at least proportionate to the
	level of the cheat. If an edge network operator is selling	level of the cheat. If an edge network operator is selling
	reservations at a large profit over the congestion cost, these pre-	reservations at a large profit over the congestion cost, these pre-
	congestion penalties will not be sufficient to ensure networks in the	congestion penalties will not be sufficient to ensure networks in the
	middle get a share of those profits, but at least they can cover	middle get a share of those profits, but at least they can cover
	their costs.	their costs.


	skipping to change at page 32, line 20	skipping to change at page 35, line 40
	the price L (per octet) of pre-congestion would be about 1000 times	the price L (per octet) of pre-congestion would be about 1000 times
	the previously used (per octet) price for volume. We should add that	the previously used (per octet) price for volume. We should add that
	a switch to pre-congestion is unlikely to exactly maintain the same	a switch to pre-congestion is unlikely to exactly maintain the same
	overall level of usage charges, but this argument will be	overall level of usage charges, but this argument will be
	approximately true, because usage charge will rise to at least the	approximately true, because usage charge will rise to at least the
	level the market finds necessary to push back against usage.	level the market finds necessary to push back against usage.

	From the above example it can be seen why a 1000x higher price will	From the above example it can be seen why a 1000x higher price will
	make operators become acutely sensitive to the congestion they cause	make operators become acutely sensitive to the congestion they cause
	in other networks, which is of course the desired effect; to	in other networks, which is of course the desired effect; to

	encourage networks to _control_ the congestion they allow their users	encourage networks to _avoid_ the congestion they allow their users
	to cause to others.	to cause to others.

	If any network sends even one flow at higher rate, they will	If any network sends even one flow at higher rate, they will
	immediately have to pay proportionately more usage charges. Because	immediately have to pay proportionately more usage charges. Because

	there is no knowledge of reservations within the Diffserv region, no	there is no knowledge of reservations within the PCN-region, no
	interior router can police whether the rate of each flow is greater	interior router can police whether the rate of each flow is greater
	than each reservation. So the system doesn't truly emulate rate-	than each reservation. So the system doesn't truly emulate rate-
	policing of each flow. But there is no incentive to pack a higher	policing of each flow. But there is no incentive to pack a higher
	rate into a reservation, because the charges are directly	rate into a reservation, because the charges are directly
	proportional to rate, irrespective of the reservations.	proportional to rate, irrespective of the reservations.

	However, if virtual queues start to fill on any path, even though	However, if virtual queues start to fill on any path, even though
	real queues will still be able to provide low latency service, pre-	real queues will still be able to provide low latency service, pre-
	congestion marking will rise fairly quickly. It may eventually reach	congestion marking will rise fairly quickly. It may eventually reach
	the threshold where the ingress gateway would deny admission to new	the threshold where the ingress gateway would deny admission to new

	skipping to change at page 32, line 49	skipping to change at page 36, line 22
	control should have been invoked. The ingress gateway will have to	control should have been invoked. The ingress gateway will have to
	pay the penalty for such an extremely high pre-congestion level, so	pay the penalty for such an extremely high pre-congestion level, so
	the pressure to invoke admission control should become unbearable.	the pressure to invoke admission control should become unbearable.

	The above mechanisms protect against rational operators. In	The above mechanisms protect against rational operators. In
	Section 5.6.3 we discuss how networks can protect themselves from	Section 5.6.3 we discuss how networks can protect themselves from
	accidental or deliberate misconfiguration in neighbouring networks.	accidental or deliberate misconfiguration in neighbouring networks.

	5.5. Sanctioning Dishonest Marking	5.5. Sanctioning Dishonest Marking


	As CL traffic leaves the last network before the egress gateway	As PCN traffic leaves the last network before the egress gateway
	(domain C) the RE blanking fraction should match the congestion	(domain 'C' in Figure 4) the RE blanking fraction should match the
	marking fraction, when averaged over a sufficiently long duration	congestion marking fraction, when averaged over a sufficiently long
	(perhaps ~10s to allow a few rounds of feedback through regular	duration (perhaps ~10s to allow a few rounds of feedback through
	signalling of new and refreshed reservations).	regular signalling of new and refreshed reservations).


	To protect itself, domain C should install a monitor at its egress.	To protect itself, domain 'C' should install a monitor at its egress.
	It aims to detect flows of CL packets that are persistently negative.	It aims to detect flows of PCN packets that are persistently
	If flows are positive, domain C need take no action--this simply	negative. If flows are positive, domain 'C' need take no action--
	means an upstream network must be paying more penalties than it needs	this simply means an upstream network must be paying more penalties
	to. Appendix A.3 gives a suggested algorithm for the monitor,	than it needs to. Appendix A.3 gives a suggested algorithm for the
	meeting the criteria below.	monitor, meeting the criteria below.

	o It SHOULD introduce minimal false positives for honest flows;	o It SHOULD introduce minimal false positives for honest flows;

	o It SHOULD quickly detect and sanction dishonest flows (minimal	o It SHOULD quickly detect and sanction dishonest flows (minimal
	false negatives);	false negatives);

	o It MUST be invulnerable to state exhaustion attacks from malicious	o It MUST be invulnerable to state exhaustion attacks from malicious
	sources. For instance, if the dropper uses flow-state, it should	sources. For instance, if the dropper uses flow-state, it should
	not be possible for a source to send numerous packets, each with a	not be possible for a source to send numerous packets, each with a
	different flow ID, to force the dropper to exhaust its memory	different flow ID, to force the dropper to exhaust its memory
	capacity;	capacity;


	o It MUST introduce sufficient loss in goodput so that malicious	o If drop is used as a sanction, it SHOULD introduce sufficient loss
	sources cannot play off losses in the egress dropper against	in goodput so that malicious sources cannot play off losses in the
	higher allowed throughput. Salvatori [CLoop_pol] describes this	egress dropper against higher allowed throughput.
	attack, which involves the source understating path congestion	Salvatori [CLoop_pol] describes this attack, which involves the
	then inserting forward error correction (FEC) packets to	source understating path congestion then inserting forward error
	compensate expected losses.	correction (FEC) packets to compensate expected losses.

	Note that the monitor operates on flows but with careful design we	Note that the monitor operates on flows but with careful design we
	can avoid per-flow state. This is why we have been careful to ensure	can avoid per-flow state. This is why we have been careful to ensure
	that all flows MUST start with a packet marked with the FNE	that all flows MUST start with a packet marked with the FNE
	codepoint. If a flow does not start with the FNE codepoint, a	codepoint. If a flow does not start with the FNE codepoint, a
	monitor is likely to treat it unfavourably. This risk makes it worth	monitor is likely to treat it unfavourably. This risk makes it worth
	setting the FNE codepoint at the start of a flow, even though there	setting the FNE codepoint at the start of a flow, even though there
	is a cost to setting FNE (positive `worth').	is a cost to setting FNE (positive `worth').

	Starting flows with an FNE packet also means that a monitor will be	Starting flows with an FNE packet also means that a monitor will be

	skipping to change at page 34, line 9	skipping to change at page 37, line 31
	across flows, a monitor MUST ignore packets with the FNE codepoint	across flows, a monitor MUST ignore packets with the FNE codepoint
	set. An ingress gateway sets the FNE codepoint when it does not have	set. An ingress gateway sets the FNE codepoint when it does not have
	the benefit of feedback from the egress. So counting packets with	the benefit of feedback from the egress. So counting packets with
	FNE cleared would be likely to make the average unnecessarily	FNE cleared would be likely to make the average unnecessarily
	positive, providing headroom (or should we say footroom?) for	positive, providing headroom (or should we say footroom?) for
	dishonest (negative) traffic.	dishonest (negative) traffic.
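
The aggregate averaging rule above can be illustrated with a rough Python sketch. It is not the Appendix A.3 algorithm, and it does no time-windowing. It only shows a size-weighted mean worth that skips FNE packets, so that FNE credit cannot offset negative traffic. The class name, the codepoint strings (taken from the -02 naming) and the tolerance parameter are assumptions made for illustration.

```python
# Hypothetical worth table using the -02 codepoint names; any codepoint
# not listed here is treated as neutral (worth 0).
WORTH = {"Re-PCT-Echo": +1, "AM(-1)": -1, "TM(-1)": -1}


class AggregateMonitor:
    """Size-weighted mean worth of an aggregate, ignoring FNE packets."""

    def __init__(self) -> None:
        self.worth_octets = 0   # signed sum of worth * packet size
        self.total_octets = 0   # octets counted (FNE packets excluded)

    def observe(self, codepoint: str, size_octets: int) -> None:
        # FNE packets are skipped entirely when averaging across flows.
        if codepoint == "FNE":
            return
        self.worth_octets += WORTH.get(codepoint, 0) * size_octets
        self.total_octets += size_octets

    def mean_worth(self) -> float:
        if self.total_octets == 0:
            return 0.0
        return self.worth_octets / self.total_octets

    def is_negative(self, tolerance: float = 0.0) -> bool:
        # 'tolerance' is an assumed knob to limit false positives.
        return self.mean_worth() < -tolerance
```

A real monitor would apply this over a sliding interval (the text suggests something of the order of 10s) and raise a management alarm rather than act on a single reading.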

	If the monitor detects a persistently negative flow, it could drop	If the monitor detects a persistently negative flow, it could drop
	sufficient negative and neutral packets to force the flow to not be	sufficient negative and neutral packets to force the flow to not be
	negative. This is the approach taken for the `egress dropper' in	negative. This is the approach taken for the `egress dropper' in

	[Re-TCP], but for the scenario in this memo, where everyone would	[I-D.briscoe-tsvwg-re-ecn-tcp], but for the scenario in this memo,
	expect everyone else to keep to the protocol, a management alarm	where everyone would expect everyone else to keep to the protocol, a
	SHOULD be raised on detecting persistently negative traffic and any	management alarm SHOULD be raised on detecting persistently negative
	automatic sanctions taken SHOULD be logged. Even if the chosen	traffic and any automatic sanctions taken SHOULD be logged. Even if
	policy is to take no automatic action, the cause can then be	the chosen policy is to take no automatic action, the cause can then
	investigated manually.	be investigated manually.

	Then all ingresses cannot understate downstream pre-congestion	Then all ingresses cannot understate downstream pre-congestion
	without their action being logged. So network operators can deal	without their action being logged. So network operators can deal
	with offending networks at the human level, out of band. As a last	with offending networks at the human level, out of band. As a last
	resort, perhaps where the ingress gateway address seems to have been	resort, perhaps where the ingress gateway address seems to have been
	spoofed in the signalling, packets can be dropped. Drops could be	spoofed in the signalling, packets can be dropped. Drops could be
	focused on just sufficient packets in misbehaving flows to remove the	focused on just sufficient packets in misbehaving flows to remove the
	negative bias while doing minimal harm.	negative bias while doing minimal harm.

	A future version of this memo may define a control message that could	A future version of this memo may define a control message that could

	skipping to change at page 34, line 43	skipping to change at page 38, line 17
	traffic caused sufficient congestion to lead to drop but they	traffic caused sufficient congestion to lead to drop but they
	understated path congestion to avoid penalties for causing high	understated path congestion to avoid penalties for causing high
	congestion, the preferential drop recommendations in Section 4.3.4	congestion, the preferential drop recommendations in Section 4.3.4
	would at least ensure that these flows would always be dropped before	would at least ensure that these flows would always be dropped before
	honest flows.	honest flows.

	5.6. Border Mechanisms	5.6. Border Mechanisms

	5.6.1. Border Accounting Mechanisms	5.6.1. Border Accounting Mechanisms


	One of the main design goals of re-ECN was for border security	One of the main design goals of re-PCN was for border security
	mechanisms to be as simple as possible, otherwise they would become	mechanisms to be as simple as possible, otherwise they would become
	the pinch-points that limit scalability of the whole internetwork.	the pinch-points that limit scalability of the whole internetwork.
	As the title of this memo suggests, we want to avoid per-flow	As the title of this memo suggests, we want to avoid per-flow
	processing at borders. We also want to keep to passive mechanisms	processing at borders. We also want to keep to passive mechanisms
	that can monitor traffic in parallel to forwarding, rather than	that can monitor traffic in parallel to forwarding, rather than
	having to filter traffic inline--in series with forwarding. As data	having to filter traffic inline--in series with forwarding. As data
	rates continue to rise, we suspect that all-optical interconnection	rates continue to rise, we suspect that all-optical interconnection
	between networks will soon be a requirement. So we want to avoid any	between networks will soon be a requirement. So we want to avoid any
	new need for buffering (even though border filtering is current	new need for buffering (even though border filtering is current
	practice for other reasons, we don't want to make it even less likely	practice for other reasons, we don't want to make it even less likely
	that we will ever get rid of it).	that we will ever get rid of it).

	So far, we have been able to keep the border mechanisms simple,	So far, we have been able to keep the border mechanisms simple,
	despite having had to harden them against some subtle attacks on the	despite having had to harden them against some subtle attacks on the

	re-ECN design. The mechanisms are still passive and avoid per-flow	re-PCN design. The mechanisms are still passive and avoid per-flow
	processing, although we do use filtering as a fail-safe to	processing, although we do use filtering as a fail-safe to
	temporarily shield against extreme events in other networks, such as	temporarily shield against extreme events in other networks, such as
	accidental misconfigurations (Section 5.6.3).	accidental misconfigurations (Section 5.6.3).

	The basic accounting mechanism at each border interface simply	The basic accounting mechanism at each border interface simply
	involves accumulating the volume of packets with positive worth (Re-	involves accumulating the volume of packets with positive worth (Re-

	Echo and FNE), and subtracting the volume of those with negative	PCT-Echo and FNE), and subtracting the volume of those with negative
	worth: AM(-1) and PM(-1). Even though this mechanism takes no regard	worth: AM(-1) and TM(-1). Even though this mechanism takes no regard
	of flows, over an accounting period (say a month) this subtraction	of flows, over an accounting period (say a month) this subtraction
	will account for the downstream congestion caused by all the flows	will account for the downstream congestion caused by all the flows
	traversing the interface, wherever they come from, and wherever they	traversing the interface, wherever they come from, and wherever they
	go to. The two networks can agree to use this metric however they	go to. The two networks can agree to use this metric however they
	wish to determine some congestion-related penalty against the	wish to determine some congestion-related penalty against the
	upstream network (see Section 5.3 for examples). Although the	upstream network (see Section 5.3 for examples). Although the
	algorithm could hardly be simpler, it is spelled out using pseudo-	algorithm could hardly be simpler, it is spelled out using pseudo-
	code in Appendix A.2.1.	code in Appendix A.2.1.
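
As an informal illustration of the accounting rule just described (not a reproduction of the pseudo-code in Appendix A.2.1), the following Python sketch keeps one signed octet counter per border interface. The class name and the string form of the codepoints are assumptions; the codepoint names follow the -02 column (the -01 column uses Re-Echo and PM(-1) instead).

```python
from dataclasses import dataclass

# Assumed string labels for the codepoints named in the text.
POSITIVE_WORTH = {"Re-PCT-Echo", "FNE"}
NEGATIVE_WORTH = {"AM(-1)", "TM(-1)"}


@dataclass
class BorderMeter:
    """Bulk meter for one border interface; it keeps no per-flow state."""

    balance_octets: int = 0

    def observe(self, codepoint: str, size_octets: int) -> None:
        # Add the volume of positive-worth packets, subtract the volume
        # of negative-worth packets, and ignore neutral packets.
        if codepoint in POSITIVE_WORTH:
            self.balance_octets += size_octets
        elif codepoint in NEGATIVE_WORTH:
            self.balance_octets -= size_octets

    def close_period(self) -> int:
        """Return the balance for the accounting period and reset it."""
        balance, self.balance_octets = self.balance_octets, 0
        return balance
```

The value returned at the end of an accounting period is the metric the two networks could feed into whatever penalty arrangement they have agreed.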

	Various attempts to subvert the re-ECN design have been made. In all	Various attempts to subvert the re-ECN design have been made. In all

	skipping to change at page 36, line 22	skipping to change at page 39, line 42

	o A network can simply create its own dummy traffic to congest	o A network can simply create its own dummy traffic to congest
	another network, perhaps causing it to lose business at no cost to	another network, perhaps causing it to lose business at no cost to
	the attacking network. This is a form of denial of service	the attacking network. This is a form of denial of service
	perpetrated by one network on another. The preferential drop	perpetrated by one network on another. The preferential drop
	measures in Section 4.3.4 provide crude protection against such	measures in Section 4.3.4 provide crude protection against such
	attacks, but we are not overly worried about more accurate	attacks, but we are not overly worried about more accurate
	prevention measures, because it is already possible for networks	prevention measures, because it is already possible for networks
	to DoS other networks on the general Internet, but they generally	to DoS other networks on the general Internet, but they generally
	don't because of the grave consequences of being found out. We	don't because of the grave consequences of being found out. We

	are only concerned if re-ECN increases the motivation for such an	are only concerned if re-PCN increases the motivation for such an
	attack, as in the next example.	attack, as in the next example.

	o A network can just generate negative traffic and send it over its	o A network can just generate negative traffic and send it over its
	border with a neighbour to reduce the overall penalties that it	border with a neighbour to reduce the overall penalties that it
	should pay to that neighbour. It could even initialise the TTL so	should pay to that neighbour. It could even initialise the TTL so
	it expired shortly after entering the neighbouring network,	it expired shortly after entering the neighbouring network,
	reducing the chance of detection further downstream. This attack	reducing the chance of detection further downstream. This attack
	need not be motivated by a desire to deny service and indeed need	need not be motivated by a desire to deny service and indeed need
	not cause denial of service. A network's main motivator would	not cause denial of service. A network's main motivator would
	most likely be to reduce the penalties it pays to a neighbour.	most likely be to reduce the penalties it pays to a neighbour.
	But, the prospect of financial gain might tempt the network into	But, the prospect of financial gain might tempt the network into
	mounting a DoS attack on the other network as well, given the gain	mounting a DoS attack on the other network as well, given the gain
	would offset some of the risk of being detected.	would offset some of the risk of being detected.

	Note that we have not included DoS by Internet hosts in the above	Note that we have not included DoS by Internet hosts in the above
	list of attacks, because we have restricted ourselves to a scenario	list of attacks, because we have restricted ourselves to a scenario

	with edge-to-edge admission control across a Diffserv region. In	with edge-to-edge admission control across a PCN-region. In this
	this case, the edge ingress gateways insulate the Diffserv region	case, the edge ingress gateways insulate the PCN-region from DoS by
	from DoS by Internet hosts. Re-ECN resists more general DoS attacks,	Internet hosts. Re-ECN resists more general DoS attacks, but this is
	but this is discussed in [Re-TCP].	discussed in [I-D.briscoe-tsvwg-re-ecn-tcp].

	The first step towards a solution to all these problems with negative	The first step towards a solution to all these problems with negative
	flows is to be able to estimate the contribution they make to	flows is to be able to estimate the contribution they make to
	downstream congestion at a border and to correct the measure	downstream congestion at a border and to correct the measure
	accordingly. Although ideally we want to remove negative flows	accordingly. Although ideally we want to remove negative flows
	themselves, perhaps surprisingly, the most effective first step is to	themselves, perhaps surprisingly, the most effective first step is to
	cancel out the polluting effect negative flows have on the measure of	cancel out the polluting effect negative flows have on the measure of
	downstream congestion at a border. It is more important to get an	downstream congestion at a border. It is more important to get an
	unbiased estimate of their effect, than to try to remove them all. A	unbiased estimate of their effect, than to try to remove them all. A
	suggested algorithm to give an unbiased estimate of the contribution	suggested algorithm to give an unbiased estimate of the contribution
	from negative flows to the downstream congestion measure is given in	from negative flows to the downstream congestion measure is given in
	Appendix A.2.2.	Appendix A.2.2.

	Although making an accurate assessment of the contribution from	Although making an accurate assessment of the contribution from
	negative flows may not be easy, just the single step of neutralising	negative flows may not be easy, just the single step of neutralising
	their polluting effect on congestion metrics removes all the gains	their polluting effect on congestion metrics removes all the gains
	networks could otherwise make from mounting dummy traffic attacks on	networks could otherwise make from mounting dummy traffic attacks on
	each other. This puts all networks on the same side (only with	each other. This puts all networks on the same side (only with
	respect to negative flows of course), rather than being pitched	respect to negative flows of course), rather than being pitched

	against each other. The network where this flow goes negative as	against each other. The network where a flow goes negative as well
	well as all the networks downstream lose out from not being	as all the networks downstream lose out from not being reimbursed for
	reimbursed for any congestion this flow causes. So they all have an	any congestion this flow causes. So they all have an interest in
	interest in getting rid of these negative flows. Networks forwarding	getting rid of these negative flows. Networks forwarding a flow
	a flow before it goes negative aren't strictly on the same side, but	before it goes negative aren't strictly on the same side, but they
	they are disinterested bystanders--they don't care that the flow goes	are disinterested bystanders--they don't care that the flow goes
	negative downstream, but at least they can't actively gain from	negative downstream, but at least they can't actively gain from
	making it go negative. The problem becomes localised so that once a	making it go negative. The problem becomes localised so that once a
	flow goes negative, all the networks from where it happens and beyond	flow goes negative, all the networks from where it happens and beyond
	downstream each have a small problem, each can detect it has a	downstream each have a small problem, each can detect it has a
	problem and each can get rid of the problem if it chooses to. But	problem and each can get rid of the problem if it chooses to. But
	negative flows can no longer be used for any new attacks.	negative flows can no longer be used for any new attacks.

	Once an unbiased estimate of the effect of negative flows can be	Once an unbiased estimate of the effect of negative flows can be
	made, the problem reduces to detecting and preferably removing flows	made, the problem reduces to detecting and preferably removing flows
	that have gone negative as soon as possible. But importantly,	that have gone negative as soon as possible. But importantly,

	skipping to change at page 37, line 48	skipping to change at page 41, line 21

	For instance, if possible, flows should be removed as soon as they go	For instance, if possible, flows should be removed as soon as they go
	negative, but we do NOT RECOMMEND any attempts to discard such flows	negative, but we do NOT RECOMMEND any attempts to discard such flows
	further upstream while they are still positive. Such over-zealous	further upstream while they are still positive. Such over-zealous
	push-back is unnecessary and potentially dangerous. These flows have	push-back is unnecessary and potentially dangerous. These flows have
	paid their `fare' up to the point they go negative, so there is no	paid their `fare' up to the point they go negative, so there is no
	harm in delivering them that far. If someone downstream asks for a	harm in delivering them that far. If someone downstream asks for a
	flow to be dropped as near to the source as possible, because they	flow to be dropped as near to the source as possible, because they
	say it is going to become negative later, an upstream node cannot	say it is going to become negative later, an upstream node cannot
	test the truth of this assertion. Rather than have to authenticate	test the truth of this assertion. Rather than have to authenticate

	such messages, re-ECN has been designed so that flows can be dropped	such messages, re-PCN has been designed so that flows can be dropped
	solely based on locally measurable evidence. A message hinting that	solely based on locally measurable evidence. A message hinting that
	a flow should be watched closely to test for negativity is fine. But	a flow should be watched closely to test for negativity is fine. But
	not a message that claims that a positive flow will go negative	not a message that claims that a positive flow will go negative

	later, so it should be dropped. .	later, so it should be dropped.

	5.6.2. Competitive Routing	5.6.2. Competitive Routing

	With the above penalty system, each domain seems to have a perverse	With the above penalty system, each domain seems to have a perverse

	incentive to fake pre-congestion. For instance domain B profits from	incentive to fake pre-congestion. For instance domain 'B' profits
	the difference between penalties it receives at its ingress (its	from the difference between penalties it receives at its ingress (its
	revenue) and those it pays at its egress (its cost). So if B	revenue) and those it pays at its egress (its cost). So if 'B'
	overstates internal pre-congestion it seems to increase its profit.	overstates internal pre-congestion it seems to increase its profit.

	However, we can assume that domain A could bypass B, routing through	However, we can assume that domain 'A' could bypass 'B', routing
	other domains to reach the egress. So the competitive discipline of	through other domains to reach the egress. So the competitive
	least-cost routing can ensure that any domain tempted to fake pre-	discipline of least-cost routing can ensure that any domain tempted
	congestion for profit risks losing _all_ its incoming traffic. The	to fake pre-congestion for profit risks losing _all_ its incoming
	least congested route would eventually be able to win this	traffic. The least congested route would eventually be able to win
	competitive game, only as long as it didn't declare more fake pre-	this competitive game, only as long as it didn't declare more fake
	congestion than the next most competitive route.	pre-congestion than the next most competitive route.

	The competitive effect of interdomain routing might be weaker nearer	The competitive effect of interdomain routing might be weaker nearer

	to the egress. For instance, C may be the only route B can take to	to the egress. For instance, 'C' may be the only route 'B' can take
	reach the ultimate receiver. And if C over-penalises B, the egress	to reach the ultimate receiver. And if 'C' over-penalises 'B', the
	gateway and the ultimate receiver seem to have no incentive to move	egress gateway and the ultimate receiver seem to have no incentive to
	their terminating attachment to another network, because only B and	move their terminating attachment to another network, because only
	those upstream of B suffer the higher penalties. However, we must	'B' and those upstream of 'B' suffer the higher penalties. However,
	remember that we are only looking at the money flows at the	we must remember that we are only looking at the money flows at the
	unidirectional network layer. There are likely to be all sorts of	unidirectional network layer. There are likely to be all sorts of
	higher level business models constructed over the top of these low	higher level business models constructed over the top of these low
	level 'sender-pays' penalties. For instance, we might expect a	level 'sender-pays' penalties. For instance, we might expect a
	session layer charging model where the session originator pays for a	session layer charging model where the session originator pays for a
	pair of duplex flows, one as receiver and one as sender.	pair of duplex flows, one as receiver and one as sender.
	Traditionally this has been a common model for telephony and we might	Traditionally this has been a common model for telephony and we might
	expect it to be used, at least sometimes, for other media such as	expect it to be used, at least sometimes, for other media such as
	video. Wherever such a model is used, the data receiver will be	video. Wherever such a model is used, the data receiver will be

	directly affected if its sessions terminate through a network like C	directly affected if its sessions terminate through a network like
	that fakes congestion to over-penalise B. So end-customers will	'C' that fakes congestion to over-penalise 'B'. So end-customers
	experience a direct competitive pressure to switch to cheaper	will experience a direct competitive pressure to switch to cheaper
	networks, away from networks like C that try to over-penalise B.	networks, away from networks like 'C' that try to over-penalise 'B'.

	This memo does not need to standardise any particular mechanism for	This memo does not need to standardise any particular mechanism for

	routing based on re-ECN. Goldenberg et al [Smart_rtg] refers to	routing based on re-PCN. Goldenberg et al [Smart_rtg] refers to
	various commercial products and presents its own algorithms for	various commercial products and presents its own algorithms for
	moving traffic between multi-homed routes based on usage charges.	moving traffic between multi-homed routes based on usage charges.
	None of these systems require any changes to standards protocols	None of these systems require any changes to standards protocols
	because the choice between the available border gateway protocol	because the choice between the available border gateway protocol
	(BGP) routes is based on a combination of local knowledge of the	(BGP) routes is based on a combination of local knowledge of the
	charging regime and local measurement of traffic levels. If, as we	charging regime and local measurement of traffic levels. If, as we

	propose, charges or penalties were based on the level of re-ECN	propose, charges or penalties were based on the level of re-PCN
	measured in passing traffic, a similar optimisation could be achieved	measured locally in passing traffic, a similar optimisation could be
	without requiring any changes to standard routing protocols.	achieved without requiring any changes to standard routing protocols.

	We must be clear that applying pre-congestion-based routing to this	We must be clear that applying pre-congestion-based routing to this
	admission control system remains an open research issue. Traffic	admission control system remains an open research issue. Traffic
	engineering based on congestion requires careful damping to avoid	engineering based on congestion requires careful damping to avoid
	oscillations, and should not be attempted without adult supervision	oscillations, and should not be attempted without adult supervision
	:) Mortier & Pratt [ECN-BGP] have analysed traffic engineering based	:) Mortier & Pratt [ECN-BGP] have analysed traffic engineering based

	on congestion. But without the benefit of re-ECN, they had to add a	on congestion. But without the benefit of re-ECN or re-PCN, they had
	path attribute to BGP to advertise a route's downstream congestion	to add a path attribute to BGP to advertise a route's downstream
	(actually they proposed that BGP should advertise the charge for	congestion (actually they proposed that BGP should advertise the
	congestion, which we believe wrongly embeds an assumption into BGP	charge for congestion, which we believe wrongly embeds an assumption
	that the only thing to do with congestion is charge for it).	into BGP that the only thing to do with congestion is charge for it).

	5.6.3. Fail-safes	5.6.3. Fail-safes

	The mechanisms described so far create incentives for rational	The mechanisms described so far create incentives for rational
	operators to behave. That is, one operator aims to make another	operators to behave. That is, one operator aims to make another
	behave responsibly by applying penalties and expects a rational	behave responsibly by applying penalties and expects a rational
	response (i.e. one that trades off costs against benefits). It is	response (i.e. one that trades off costs against benefits). It is
	usually reasonable to assume that other network operators will behave	usually reasonable to assume that other network operators will behave
	rationally (policy routing can avoid those that might not). But this	rationally (policy routing can avoid those that might not). But this
	approach does not protect against the misconfigurations and accidents	approach does not protect against the misconfigurations and accidents

	skipping to change at page 40, line 16	skipping to change at page 43, line 40

	6. Analysis	6. Analysis

	The domains in Figure 1 are not expected to be completely malicious	The domains in Figure 1 are not expected to be completely malicious
	towards each other. After all, we can assume that they are all co-	towards each other. After all, we can assume that they are all co-
	operating to provide an internetworking service to the benefit of	operating to provide an internetworking service to the benefit of
	each of them and their customers. Otherwise their routing policies	each of them and their customers. Otherwise their routing policies
	would not interconnect them in the first place. However, we assume	would not interconnect them in the first place. However, we assume
	that they are also competitors of each other. So a network may try	that they are also competitors of each other. So a network may try
	to contravene our proposed protocol if it would gain or make a	to contravene our proposed protocol if it would gain or make a

	competitor lose, or both, but only if it can do so without being	competitor lose, or both. But only if it can do so without being
	caught. Therefore we do not have to consider every possible random	caught. Therefore we do not have to consider every possible random
	attack one network could launch on the traffic of another, given that	attack one network could launch on the traffic of another, given that
	one network can always drop or corrupt packets that it	one network can always drop or corrupt packets that it
	forwards on behalf of another.	forwards on behalf of another.

	Therefore, we only consider new opportunities for _gainful_ attack	Therefore, we only consider new opportunities for _gainful_ attack
	that our proposal introduces. But to a certain extent we can also	that our proposal introduces. But to a certain extent we can also
	rely on the in-depth defences we have described (Section 5.6.3)	rely on the in-depth defences we have described (Section 5.6.3)
	intended to mitigate the potential impact if one network accidentally	intended to mitigate the potential impact if one network accidentally
	misconfigures the workings of this protocol.	misconfigures the workings of this protocol.

	skipping to change at page 40, line 39	skipping to change at page 44, line 16
	arrangement possible in Figure 1, without any surrounding network.	arrangement possible in Figure 1, without any surrounding network.
	This allows us to consider more specific cases where these gateways	This allows us to consider more specific cases where these gateways
	and a neighbouring network are operated by the same player. As well	and a neighbouring network are operated by the same player. As well
	as cases where the same player operates neighbouring networks, we	as cases where the same player operates neighbouring networks, we
	will also consider cases where the two gateways collude as one player	will also consider cases where the two gateways collude as one player
	and where the sender and receiver collude as one. Collusion of other	and where the sender and receiver collude as one. Collusion of other
	sets of domains is less likely, but we will consider such cases. In	sets of domains is less likely, but we will consider such cases. In
	the general case, we will assume none of the nine trust domains	the general case, we will assume none of the nine trust domains
	across the figure fully trust any of the others.	across the figure fully trust any of the others.


	As we only propose to change routers within the Diffserv region, we	As we only propose to change routers within the PCN-region, we assume
	assume the operators of networks outside the region will be doing	the operators of networks outside the region will be doing per-flow
	per-flow policing. That is, we assume the networks outside the	policing. That is, we assume the networks outside the PCN-region and
	Diffserv region and the gateways around its edges can protect	the gateways around its edges can protect themselves. So given we
	themselves. So given we are proposing to remove flow policing from	are proposing to remove flow policing from some networks, our primary
	some networks, our primary concern must be to protect networks that	concern must be to protect networks that don't do per-flow policing
	don't do per-flow policing (the potential `victims') from those that	(the potential `victims') from those that do (the `enemy'). The
	do (the `enemy'). The ingress and egress gateways are the only way	ingress and egress gateways are the only way the outer enemy can get
	the outer enemy can get at the middle victim, so we can consider the	at the middle victim, so we can consider the gateways as the
	gateways as the representatives of the enemy as far as domains A, B	representatives of the enemy as far as domains 'A', 'B' and 'C' are
	and C are concerned. We will call this trust scenario `edges against	concerned. We will call this trust scenario `edges against middles'.
	middles'.

	Earlier in this memo, we outlined the classic border rate policing	Earlier in this memo, we outlined the classic border rate policing
	problem (Section 3). It will now be useful to reiterate the	problem (Section 3). It will now be useful to reiterate the
	motivations that are the root cause of the problem. The more	motivations that are the root cause of the problem. The more
	reservations a gateway can allow, the more revenue it receives. The	reservations a gateway can allow, the more revenue it receives. The
	middle networks want the edges to comply with the admission control	middle networks want the edges to comply with the admission control
	protocol when they become so congested that their service to others	protocol when they become so congested that their service to others
	might suffer. The middle networks also want to ensure the edges	might suffer. The middle networks also want to ensure the edges
	cannot steal more service from them than they are entitled to.	cannot steal more service from them than they are entitled to.


	In the context of this `edges against middles' scenario, the re-ECN	In the context of this `edges against middles' scenario, the re-PCN
	protocol has two main effects:	protocol has two main effects:


	o The more pre-congestion there is on a path across the Diffserv	o The more pre-congestion there is on a path across the PCN-region,
	region, the higher the ingress gateway must declare downstream	the higher the ingress gateway must declare downstream pre-
	pre-congestion.	congestion.

	o If the ingress gateway does not declare downstream pre-congestion	o If the ingress gateway does not declare downstream pre-congestion
	high enough on average, it will `hit the ground before the	high enough on average, it will `hit the ground before the
	runway', going negative and triggering sanctions, either directly	runway', going negative and triggering sanctions, either directly
	against the traffic or against the ingress gateway at a management	against the traffic or against the ingress gateway at a management
	level.	level.
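The two effects above can be illustrated with a minimal sketch of how a passive border monitor might meter one aggregate. It assumes, for illustration only, that each packet carries a worth of +1 (positive), 0 (neutral) or -1 (negative), and that downstream pre-congestion is estimated as the positive byte fraction minus the negative byte fraction. The class and method names are hypothetical and are not part of the protocol definition.

```python
from dataclasses import dataclass


@dataclass
class AggregateMeter:
    """Running byte counts for one traffic aggregate crossing a border.

    Hypothetical helper: the worth values (+1, 0, -1) are an assumed
    abstraction of positive, neutral and negative packets.
    """
    total_bytes: int = 0
    positive_bytes: int = 0  # bytes in packets declaring downstream pre-congestion
    negative_bytes: int = 0  # bytes in packets that cancel a declaration

    def observe(self, size: int, worth: int) -> None:
        """Record one packet of `size` bytes with worth +1, 0 or -1."""
        if worth not in (-1, 0, 1):
            raise ValueError("worth must be -1, 0 or +1")
        self.total_bytes += size
        if worth > 0:
            self.positive_bytes += size
        elif worth < 0:
            self.negative_bytes += size

    def downstream_fraction(self) -> float:
        """Positive minus negative bytes, as a fraction of all bytes seen."""
        if self.total_bytes == 0:
            return 0.0
        return (self.positive_bytes - self.negative_bytes) / self.total_bytes

    def is_negative(self) -> bool:
        """True if the aggregate has understated downstream pre-congestion."""
        return self.positive_bytes < self.negative_bytes
```

The meter only accumulates counters per aggregate, which is in keeping with border mechanisms that avoid per-flow processing. What a network does once `is_negative()` holds (sanctions on traffic or a management-level response) is outside this sketch.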

	An executive summary of our security analysis can be stated in three	An executive summary of our security analysis can be stated in three
	parts, distinguished by the type of collusion considered.	parts, distinguished by the type of collusion considered.

	Neighbour-only Middle-Middle Collusion: Here there is no collusion	Neighbour-only Middle-Middle Collusion: Here there is no collusion
	or collusion is limited to neighbours in the feedback loop. In	or collusion is limited to neighbours in the feedback loop. In
	other words, two neighbouring networks can be assumed to act as	other words, two neighbouring networks can be assumed to act as

	one. Or the egress gateway might collude with domain C. Or the	one. Or the egress gateway might collude with domain 'C'. Or the
	ingress gateway might collude with domain A. Or ingress and egress	ingress gateway might collude with domain 'A'. Or ingress and
	gateways might collude with each other.	egress gateways might collude with each other.

	In these cases where only neighbours in the feedback loop collude,	In these cases where only neighbours in the feedback loop collude,
	we conclude that all parties have a positive incentive to declare	we conclude that all parties have a positive incentive to declare
	downstream pre-congestion truthfully, and the ingress gateway has	downstream pre-congestion truthfully, and the ingress gateway has
	a positive incentive to invoke admission control when congestion	a positive incentive to invoke admission control when congestion
	rises above the admission threshold in any network in the region	rises above the admission threshold in any network in the region
	(including its own). No party has an incentive to send more	(including its own). No party has an incentive to send more
	traffic than declared in reservation signalling (even though only	traffic than declared in reservation signalling (even though only
	the gateways read this signalling). In short, no party can gain	the gateways read this signalling). In short, no party can gain
	at the expense of another.	at the expense of another.

	Non-neighbour Middle-Middle Collusion: In the case of other forms of	Non-neighbour Middle-Middle Collusion: In the case of other forms of

	collusion between middle networks (e.g. between domain A and C) it	collusion between middle networks (e.g. between domain 'A' and
	would be possible for say A & C to create a tunnel between	'C') it would be possible for say 'A' & 'C' to create a tunnel
	themselves so that A would gain at the expense of B. But C would	between themselves so that 'A' would gain at the expense of 'B'.
	then lose the gain that A had made. Therefore the value to A & C	But 'C' would then lose the gain that 'A' had made. Therefore the
	of colluding to mount this attack seems questionable. It is made	value to 'A' & 'C' of colluding to mount this attack seems
	more questionable, because the attack can be statistically	questionable. It is made more questionable, because the attack
	detected by B using the second `defence in depth' mechanism	can be statistically detected by 'B' using the second `defence in
	mentioned already. Note that C can defend itself from being	depth' mechanism mentioned already. Note that 'C' can defend
	attacked through a tunnel by treating the tunnel end point as a	itself from being attacked through a tunnel by treating the tunnel
	direct link to a neighbouring network (e.g. as if A were a	end point as a direct link to a neighbouring network (e.g. as if
	neighbour of C, via the tunnel), which falls back to the safety of	'A' were a neighbour of 'C', via the tunnel), which falls back to
	the neighbour-only scenario.	the safety of the neighbour-only scenario.

	Middle-Edge Collusion: Collusion between networks or gateways within	Middle-Edge Collusion: Collusion between networks or gateways within

	the Diffserv region and networks or users outside the region has	the PCN-region and networks or users outside the region has not
	not yet been fully analysed. The presence of full per-flow	yet been fully analysed. The presence of full per-flow policing
	policing at the ingress gateway seems to make this a less likely	at the ingress gateway seems to make this a less likely source of
	source of a successful attack.	a successful attack.

	{ToDo: Due to lack of time, the full write up of the security	{ToDo: Due to lack of time, the full write up of the security
	analysis is deferred to the next version of this memo.}	analysis is deferred to the next version of this memo.}

	Finally, it is well known that the best person to analyse the	Finally, it is well known that the best person to analyse the
	security of a system is not the designer. Therefore, our confident	security of a system is not the designer. Therefore, our confident
	claims must be hedged with doubt until others with perhaps a greater	claims must be hedged with doubt until others with perhaps a greater
	incentive to break it have mounted a full analysis.	incentive to break it have mounted a full analysis.

	7. Incremental Deployment	7. Incremental Deployment

	We believe ECN has so far not been widely deployed because it	We believe ECN has so far not been widely deployed because it

	requires widespread end system and network deployment just to achieve	requires end system and widespread network deployment just to achieve
	a marginal improvement in performance. The ability to offer a new	a marginal improvement in performance. The ability to offer a new
	service (admission control) would be a much stronger driver for ECN	service (admission control) would be a much stronger driver for ECN
	deployment.	deployment.

	As stated in the introduction, the aim of this memo is to "Design in	As stated in the introduction, the aim of this memo is to "Design in
	security from the start" when admission control is based on pre-	security from the start" when admission control is based on pre-
	congestion notification. The proposal has been designed so that	congestion notification. The proposal has been designed so that
	security can be added some time after first deployment, but only if	security can be added some time after first deployment, but only if
	the PCN wire protocol encoding is defined with the foresight to	the PCN wire protocol encoding is defined with the foresight to
	accommodate the extended set of codepoints defined in this document.	accommodate the extended set of codepoints defined in this document.
	Given admission control based on pre-congestion notification requires	Given admission control based on pre-congestion notification requires
	few changes to standards, it should be deployable fairly soon.	few changes to standards, it should be deployable fairly soon.

	However, re-ECN requires a change to IP, which may take a little	However, re-PCN requires a change to IP, which may take a little
	longer.	longer :)

	We expect that initial deployments of PCN-based admission control	We expect that initial deployments of PCN-based admission control
	will be confined to single networks, or to clubs of networks that	will be confined to single networks, or to clubs of networks that
	trust each other. The proposal in this memo will only become	trust each other. The proposal in this memo will only become
	relevant once networks with conflicting interests wish to	relevant once networks with conflicting interests wish to
	interconnect their admission controlled services, but without the	interconnect their admission controlled services, but without the
	scalability constraints of per-flow border policing. It will not be	scalability constraints of per-flow border policing. It will not be

	possible to use re-ECN, even in a controlled environment between	possible to use re-PCN, even in a controlled environment between
	consenting operators, unless it is standardised into IP. Given the	consenting operators, unless it is standardised into IP. Given the
	IPv4 header has limited space for further changes, current IESG	IPv4 header has limited space for further changes, current IESG
	policy [RFC4727] is not to allow experimental use of codepoints in	policy [RFC4727] is not to allow experimental use of codepoints in
	the IPv4 header, as whenever an experiment isn't taken up, the space	the IPv4 header, as whenever an experiment isn't taken up, the space

	it used tends to be impossible to reclaim.	it used tends to be impossible to reclaim. Therefore, for IPv4 at
		least, we will need to find a way to run an experiment so that the
		header fields it uses can be reclaimed if the experiment is not a
		success.


	If PCN-based admission control is deployed before re-ECN is	If PCN-based admission control is deployed before re-PCN is
	standardised into IP, wherever a networks (or club of networks)	standardised into IP, wherever a network (or club of networks)
	connects to another network (or club of networks) with conflicting	connects to another network (or club of networks) with conflicting
	interests, they will place a gateway between the two regions that	interests, they will place a gateway between the two regions that

	does per-flow rate policing and admission control. If re-ECN is	does per-flow rate policing and admission control. If re-PCN is
	eventually standardised into IP, it will be possible for these	eventually standardised into IP, it will be possible for these

	separate regions to upgrade all their gateways to use re-ECN before	separate regions to upgrade all their ingress gateways to support re-
	removing the per-flow policing gateways between them. Given the	PCN before removing the per-flow policing gateways between them.
	edge-to-edge deployment model of PCN-based admission control, it is	Given the edge-to-edge deployment model of PCN-based admission
	reasonable to imagine this incremental deployment model without	control, it is reasonable to expect incremental deployment of re-PCN
	needing to cater for partial deployment of re-ECN in just some of the	will be feasible on a domain-by-domain basis, without needing to
	gateways around one Diffserv region.	cater for partial deployment of re-PCN in just some of the gateways
		around one PCN-domain.


	Only the edge gateways around a Diffserv region have to be upgraded	Nonetheless, if the upgrade of one ingress gateway is accidentally
	to add re-ECN support, not interior routers. It is also necessary to	overlooked, the RE flag has been defined the safe way round for the
	add the mechanisms that use re-ECN to secure a network against	default legacy behaviour (leaving RE cleared as "0"). A legacy
	misbehaving gateways and networks. Specifically, these are the	ingress will appear to be declaring a high level of pre-congestion
	border mechanisms (Section 5.6) and the mechanisms to sanction	into the aggregate. The fail-safe border mechanism in Section 5.6.3
	dishonest marking (Section 5.5).	might trigger management alarms (which would help in tracking down
		the need to upgrade the ingress), but all packets would continue to
		be delivered safely, as overstatement of downstream congestion
		requires no sanction.

		Only the ingress edge gateways around a PCN-region have to be
		upgraded to add re-PCN support, not interior routers. It is also
		necessary to add the mechanisms that monitor re-PCN to secure a
		network against misbehaving gateways and networks. Specifically,
		these are the border mechanisms (Section 5.6) and the mechanisms to
		sanction dishonest marking (Section 5.5).

	We also RECOMMEND adding improvements to forwarding on interior	We also RECOMMEND adding improvements to forwarding on interior
	routers (Section 4.3.4). But the system works whether all, some or	routers (Section 4.3.4). But the system works whether all, some or
	none are upgraded, so interior routers may be upgraded in a piecemeal	none are upgraded, so interior routers may be upgraded in a piecemeal
	fashion at any time.	fashion at any time.

	8. Design Choices and Rationale	8. Design Choices and Rationale

	The primary insight of this work is that downstream congestion is the	The primary insight of this work is that downstream congestion is the
	metric that would be most useful to control an internetwork, and	metric that would be most useful to control an internetwork, and
	particularly to police how one network responds to the congestion it	particularly to police how one network responds to the congestion it
	causes in a remote network. This is the problem that has previously	causes in a remote network. This is the problem that has previously
	made it so hard to provide scalable admission control.	made it so hard to provide scalable admission control.

	The case for using re-feedback (a generalisation of re-ECN) to police	The case for using re-feedback (a generalisation of re-ECN) to police
	congestion response and provide QoS is made in [Re-fb]. Essentially,	congestion response and provide QoS is made in [Re-fb]. Essentially,
	the insight is that congestion is a factor that crosses layers from	the insight is that congestion is a factor that crosses layers from

	the physical upwards. Therefore re-feedback polices congestion where	the physical upwards. Therefore re-feedback polices congestion as it
	it emerges from a physical interface between networks. This is	crosses the physical interface between networks. This is achieved by
	achieved by bringing the congestion information to the interface,	bringing information about congestion of resources later on the path
	rather than examining packet addressing where there is congestion.	to the interface, rather than trying to deal with congestion where it
		happens by examining the notoriously unreliable source address in
	Then congestion crossing the physical interface at a border can be	packets. Then congestion crossing the physical interface at a border
	policed at the interface, rather than policing the congestion on	can be policed at the interface, rather than policing the congestion
	packets that claim to come from an address (which may be spoofed).	on packets that claim to come from an address (which may be spoofed).
	Also, re-feedback works in the network layer independently of other	Also, re-feedback works in the network layer independently of other
	layers--despite its name re-feedback does not actually require	layers--despite its name re-feedback does not actually require

	feedback. It requires a source to act conservatively before it gets	feedback. It makes a source act conservatively before it gets
	feedback.	feedback.

	On the subject of lack of feedback, the feedback not established	On the subject of lack of feedback, the feedback not established
	(FNE) codepoint is motivated by arguments for a state set-up bit in	(FNE) codepoint is motivated by arguments for a state set-up bit in
	IP to prevent state exhaustion attacks. This idea was first put	IP to prevent state exhaustion attacks. This idea was first put

	forward informally by David Clark and documented by Handley and	forward informally by David Clark and developed by Handley and
	Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams	Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams
	should signal explicitly when they require state to be created in the	should signal explicitly when they require state to be created in the
	network layer or the layer above (e.g. at flow start). Then a node	network layer or the layer above (e.g. at flow start). Then a node
	can refuse to create any state unless a datagram declares this	can refuse to create any state unless a datagram declares this
	intent. We believe the proposed FNE codepoint serves the same	intent. We believe the proposed FNE codepoint serves the same

	purpose as the proposed state-set-up bit, but it has been overloaded	purpose as the proposed state set-up bit, but it has been overloaded
	with a more specific purpose, using it on more packets than just the	with a more specific purpose, using it on more packets than just the
	first in a flow, but never less (i.e. it is idempotent). In effect	first in a flow, but never less (i.e. it is idempotent). In effect
	the FNE codepoint serves the purpose of a `soft-state set-up	the FNE codepoint serves the purpose of a `soft-state set-up
	codepoint'.	codepoint'.
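The state set-up idea described above can be sketched as follows. This is only an illustration under stated assumptions: the `FlowStateTable` class, the `flow_id` key and the boolean `fne` argument are hypothetical names, and the sketch says nothing about how a node treats a packet for which it holds no state.

```python
class FlowStateTable:
    """Creates per-flow state only when a packet declares that intent."""

    def __init__(self) -> None:
        self.flows: set[str] = set()

    def handle(self, flow_id: str, fne: bool) -> bool:
        """Process one packet; return True if state exists for its flow.

        State is created only when the packet carries the (assumed)
        FNE indication. Repeating FNE on later packets of the same
        flow changes nothing, so the operation is idempotent.
        """
        if flow_id in self.flows:
            return True
        if fne:
            self.flows.add(flow_id)
            return True
        # No state and no declared intent: refuse to create state.
        return False
```

Because state is only ever created on an explicit declaration, packets that do not carry the indication cannot be used to exhaust the table.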

	The re-feedback paper [Re-fb] also makes the case for converting the	The re-feedback paper [Re-fb] also makes the case for converting the
	economic interpretation of congestion into hard engineering	economic interpretation of congestion into hard engineering
	mechanism, which is the basis of the approach used in this memo. The	mechanism, which is the basis of the approach used in this memo. The

	admission control gateways around the Diffserv region use hard	admission control gateways around the PCN-region use hard
	engineering, not incentives, to prevent end users from sending more	engineering, not incentives, to prevent end users from sending more
	traffic than they have reserved. Incentive-based mechanisms are only	traffic than they have reserved. Incentive-based mechanisms are only
	used between networks, because they are expected to respond to	used between networks, because they are expected to respond to
	incentives more rationally than end-users can be expected to.	incentives more rationally than end-users can be expected to.
	However, even then, a network can use fail-safes to protect itself	However, even then, a network can use fail-safes to protect itself
	from excessively unusual behaviour by neighbouring networks, whether	from excessively unusual behaviour by neighbouring networks, whether
	due to an accidental misconfiguration or malicious intent.	due to an accidental misconfiguration or malicious intent.

	The guiding principle behind the incentive-based approach used	The guiding principle behind the incentive-based approach used
	between networks is that any gain from subverting the protocol should	between networks is that any gain from subverting the protocol should

	skipping to change at page 45, line 5	skipping to change at page 48, line 44
	will most likely open up a new vulnerability, where the amplifying	will most likely open up a new vulnerability, where the amplifying
	effect of the punishment mechanism can be turned on others.	effect of the punishment mechanism can be turned on others.

	The re-feedback paper also makes the case against the use of	The re-feedback paper also makes the case against the use of
	congestion charging to police congestion if it is based on classic	congestion charging to police congestion if it is based on classic
	feedback (where only upstream congestion is visible to network	feedback (where only upstream congestion is visible to network
	elements). It argues this would open up receiving networks to	elements). It argues this would open up receiving networks to
	`denial of funds' attacks and would require end users to accept	`denial of funds' attacks and would require end users to accept
	dynamic pricing (which few would).	dynamic pricing (which few would).


	Re-ECN has been deliberately designed to simplify policing at the	Re-PCN has been deliberately designed to simplify policing at the
	borders between networks. These trust boundaries are the critical	borders between networks. These trust boundaries are the critical
	pinch-points that will limit the scalability of the whole	pinch-points that will limit the scalability of the whole
	internetwork unless the overall design minimises the complexity of	internetwork unless the overall design minimises the complexity of
	security functions at these borders. The border mechanisms described	security functions at these borders. The border mechanisms described
	in this memo run passively in parallel to data forwarding and they do	in this memo run passively in parallel to data forwarding and they do
	not require per-flow processing.	not require per-flow processing.

	9. Security Considerations	9. Security Considerations

	This whole memo concerns the security of a scalable admission control	This whole memo concerns the security of a scalable admission control

	skipping to change at page 45, line 39	skipping to change at page 49, line 31
	markings introduced by an upstream network, but it would only lose	markings introduced by an upstream network, but it would only lose
	out on the penalties it could apply to a downstream network.	out on the penalties it could apply to a downstream network.

	When one network forwards a neighbouring network's traffic it will	When one network forwards a neighbouring network's traffic it will
	always be possible to cause damage by dropping or corrupting it.	always be possible to cause damage by dropping or corrupting it.
	Therefore we do not believe networks would set their routing policies	Therefore we do not believe networks would set their routing policies
	to interconnect in the first place if they didn't trust the other	to interconnect in the first place if they didn't trust the other
	networks not to arbitrarily damage their traffic.	networks not to arbitrarily damage their traffic.

	Having said this, we do want to highlight some of the weaker parts of	Having said this, we do want to highlight some of the weaker parts of

	our argument. We have argued that networks will be dissuaded from	our argument.
	faking congestion marking by the possibility that upstream networks
	will route round them. As we have said, these arguments are based on	o We have argued that networks will be dissuaded from faking
		congestion marking by the possibility that upstream networks will
		route round them. As we have said, these arguments are based on
	fairly delicate assumptions and will remain fairly tenuous until	fairly delicate assumptions and will remain fairly tenuous until
	proved in practice, particularly close to the egress where less	proved in practice, particularly close to the egress where less
	competitive routing is likely.	competitive routing is likely.


	We should also point out that the approach in this memo was only	o Given the congestion feedback system is piggy-backed on flow
		signalling, which can be fairly infrequent, sanctions may not be
		appropriate until a flow has been persistently negative for
		perhaps 20s. This may allow brief attacks to go unpunished.
		However, vulnerability to brief attacks may be reduced if the
		egress triggers asynchronous feedback when the congestion level on
		an aggregate has risen sufficiently since the last feedback,
		rather than waiting for the next opportunity to piggy-back on a
		signal.

		o We should also point out that the approach in this memo was only
	designed to be robust for admission control. We do not claim the	designed to be robust for admission control. We do not claim the

	incentives will always be strong enough to force correct flow pre-	incentives will always be strong enough to force correct flow
	emption behaviour. This is because a user will tend to perceive much	termination behaviour. This is because a user will tend to
	greater loss in value if a flow is pre-empted than if admission is	perceive much greater loss in value if a flow is terminated than
	denied at the start. However, in general the incentives for correct	if admission is denied at the start. However, in general the
	flow pre-emption are similar to those for admission control.	incentives for correct flow termination are similar to those for
		admission control.
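The caveat above about sanctioning only after a flow has been persistently negative for perhaps 20s can be sketched as a simple hold timer. This is a minimal illustration: the `PersistenceCheck` name, the `balance` argument and the way time is supplied are assumptions, and the 20 second default merely echoes the tentative figure in the text.

```python
from typing import Optional


class PersistenceCheck:
    """Flags a flow or aggregate only after it stays negative long enough."""

    def __init__(self, hold_seconds: float = 20.0) -> None:
        self.hold_seconds = hold_seconds
        self.negative_since: Optional[float] = None

    def update(self, now: float, balance: float) -> bool:
        """Feed the latest balance; return True once sanctions may apply.

        `now` is a timestamp in seconds and `balance` is the current
        (assumed) positive-minus-negative measure for the flow.
        """
        if balance >= 0:
            # Any non-negative sample resets the timer.
            self.negative_since = None
            return False
        if self.negative_since is None:
            self.negative_since = now
        return now - self.negative_since >= self.hold_seconds
```

The reset on any non-negative sample is what lets brief attacks go unpunished, which is the weakness noted in the text; shortening the interval between samples (for example with asynchronous feedback from the egress) narrows that window.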

	Finally, it may seem that the 8 codepoints that have been made	Finally, it may seem that the 8 codepoints that have been made
	available by extending the ECN field with the RE flag have been used	available by extending the ECN field with the RE flag have been used
	rather wastefully. In effect the RE flag has been used as an	rather wastefully. In effect the RE flag has been used as an
	orthogonal single bit in nearly all cases. The only exception is	orthogonal single bit in nearly all cases. The only exception is
	when the ECN field is cleared to "00". The mapping of the codepoints	when the ECN field is cleared to "00". The mapping of the codepoints
	in an earlier version of this proposal used the codepoint space more	in an earlier version of this proposal used the codepoint space more
	efficiently, but the scheme became vulnerable to a network operator	efficiently, but the scheme became vulnerable to a network operator
	focusing its congestion marking to mark more positive than neutral	focusing its congestion marking to mark more positive than neutral
	packets in order to reduce its penalties (see Appendix B of	packets in order to reduce its penalties (see Appendix B of

	[Re-TCP]).	[I-D.briscoe-tsvwg-re-ecn-tcp]).

	With the scheme as now proposed, once the RE flag is set or cleared	With the scheme as now proposed, once the RE flag is set or cleared
	by the sender or its proxy, it should not be written by the network,	by the sender or its proxy, it should not be written by the network,
	only read. So the gateways can detect if any network maliciously	only read. So the gateways can detect if any network maliciously
	alters the RE flag. IPSec AH integrity checking does not cover the	alters the RE flag. IPSec AH integrity checking does not cover the
	IPv4 option flags (they were considered mutable--even the one we	IPv4 option flags (they were considered mutable--even the one we
	propose using for the RE flag that was `currently unused' when IPSec	propose using for the RE flag that was `currently unused' when IPSec
	was defined). But it would be sufficient for a pair of gateways to	was defined). But it would be sufficient for a pair of gateways to
	make random checks on whether the RE flag was the same when it	make random checks on whether the RE flag was the same when it
	reached the egress gateway as when it left the ingress. Indeed, if	reached the egress gateway as when it left the ingress. Indeed, if
	IPSec AH had covered the RE flag, any network intending to alter	IPSec AH had covered the RE flag, any network intending to alter
	sufficient RE flags to make a gain would have focused its alterations	sufficient RE flags to make a gain would have focused its alterations
	on packets without authenticating headers (AHs).	on packets without authenticating headers (AHs).


	No cryptographic algorithms have been harmed in the making of this	Therefore, no cryptographic algorithms have been exploited in the
	proposal.	making of this proposal.

	10. IANA Considerations	10. IANA Considerations

	This memo includes no request to IANA.	This memo includes no request to IANA.

	11. Conclusions	11. Conclusions


	This memo builds on a promising technique to solve the classic	This memo solves the classic problem of making flow admission control
	problem of making flow admission control scale to any size network.	scale to any size network. It builds on a technique, called PCN,
	It involves the use of Diffserv in a deployment model that uses pre-	which involves the use of Diffserv in a domain and uses pre-
	congestion notification feedback to control admission into a network	congestion notification feedback to control admission into each
	path [I-D.ietf-pcn-architecture]. However as it stands, that	network path across the domain [I-D.ietf-pcn-architecture].
	deployment model depends on all network domains trusting each other
	to comply with the protocols, invoking admission control and flow
	pre-emption when requested.


	We propose that the congestion feedback used in that deployment model	Without PCN, Diffserv requires over-provisioning that must grow
	should be re-echoed into the forward data path, by making a trivial	linearly with network diameter to cater for variation in the traffic
	modification to the ingress gateway. We then explain how the	matrix. However, even with PCN, multiple network domains can only
		join together into one larger PCN region if all domains trust each
		other to comply with the protocols, invoking admission control and
		flow termination when requested. Domains could join together and
		still police flows at their borders by requiring reservation
		signalling to touch each border and only use PCN internally to each
		domain. But the per-flow processing at borders would still limit
		scalability.

		Instead, this memo proposes a technique called re-PCN which enables a
		PCN region to extend across multiple domains, without unscalable per-
		flow processing at borders, and still without the need for linear
		growth in capacity over-provisioning as the hop-diameter of the
		Diffserv region grows.

		We propose that the congestion feedback used for PCN-based admission
		control should be re-echoed into the forward data path, by making a
		trivial modification to the ingress gateway. We then explain how the
	resulting downstream pre-congestion metric in packets can be	resulting downstream pre-congestion metric in packets can be
	monitored in bulk at borders to sufficiently emulate flow rate	monitored in bulk at borders to sufficiently emulate flow rate
	policing.	policing.

	We claim the result of combining these two approaches is an admission	We claim the result of combining these two approaches is an admission
	control system that scales to any size network _and_ any number of	control system that scales to any size network _and_ any number of
	interconnected networks, even if they all act in their own interests.	interconnected networks, even if they all act in their own interests.

	This proposal aims to convince its readers to "Design in Security	This proposal aims to convince its readers to "Design in Security
	from the start," by ensuring the PCN wire protocol encoding can	from the start," by ensuring the PCN wire protocol encoding can
	accommodate the extended set of codepoints defined in this document,	accommodate the extended set of codepoints defined in this document,

	even if border policing is not needed at first. This way, we will	even if per-flow policing is used at first rather than the bulk
	not build ourselves tomorrow's legacy problem.	border policing described here. This way, we will not build
		ourselves tomorrow's legacy problem.

	Re-echoing congestion feedback is based on a principled technique	Re-echoing congestion feedback is based on a principled technique

	called Re-ECN [Re-TCP], designed to add accountability for causing	called Re-ECN [I-D.briscoe-tsvwg-re-ecn-tcp], designed to add
	congestion to the general-purpose IP datagram service. Re-ECN	accountability for causing congestion to the general-purpose IP
	proposes to consume the last completely unused bit in the basic IPv4	datagram service. Re-ECN proposes to consume the last completely
	header.	unused bit in the basic IPv4 header, or an extension header in
		IPv6.

	12. Acknowledgements	12. Acknowledgements


	All the following have given helpful comments and some may become co-	All the following have given helpful comments either on re-PCN or on
	authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve	relevant parts of re-ECN that re-PCN uses: Arnaud Jacquet, Alessandro
	Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard,	Salvatori, Steve Rudkin, David Songhurst, John Davey, Ian Self,
	Carla Di Cairano-Gilfedder (BT), Mark Handley (who identified the	Anthony Sheppard, Carla Di Cairano-Gilfedder (BT), Mark Handley (who
	excess canceled packets attack), Stephen Hailes, Adam Greenhalgh	identified the excess canceled packets attack), Stephen Hailes, Adam
	(UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz,	Greenhalgh (UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef
	Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill Lehr,	Babiarz, Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill
	Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy	Lehr, Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy
	traffic attacks), Sally Floyd (ICIR) and comments from participants	traffic attacks), Sally Floyd (ICIR) and comments from participants
	in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant	in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant
	Internet working groups.	Internet working groups.

	13. Comments Solicited	13. Comments Solicited

	Comments and questions are encouraged and very welcome. They can be	Comments and questions are encouraged and very welcome. They can be
	addressed to the IETF Congestion and Pre-Congestion Notification	addressed to the IETF Congestion and Pre-Congestion Notification
	working group's mailing list <pcn@ietf.org>, and/or to the author(s).	working group's mailing list <pcn@ietf.org>, and/or to the author(s).

	14. References	14. References

	14.1. Normative References	14.1. Normative References


	[PCN] Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,	[I-D.briscoe-tsvwg-ecn-tunnel]
	Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley,	Briscoe, B., "Layered Encapsulation of Congestion
	S., Westberg, L., Bader, A., and G. Karagiannis, "Pre-	Notification", draft-briscoe-tsvwg-ecn-tunnel-01 (work in
	Congestion Notification Marking",	progress), July 2008.
	draft-briscoe-tsvwg-cl-phb-03 (work in progress),
	October 2006.	[I-D.briscoe-tsvwg-re-ecn-tcp]
		Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
		"Re-ECN: Adding Accountability for Causing Congestion to
		TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-06 (work in
		progress), August 2008.

		[I-D.eardley-pcn-marking-behaviour]
		Eardley, P., "Marking behaviour of PCN-nodes",
		draft-eardley-pcn-marking-behaviour-01 (work in progress),
		June 2008.

		[I-D.moncaster-pcn-baseline-encoding]
		Moncaster, T., Briscoe, B., and M. Menth, "Baseline
		Encoding and Transport of Pre-Congestion Information",
		draft-moncaster-pcn-baseline-encoding-02 (work in
		progress), July 2008.

	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
	Requirement Levels", BCP 14, RFC 2119, March 1997.	Requirement Levels", BCP 14, RFC 2119, March 1997.

	[RFC2211] Wroclawski, J., "Specification of the Controlled-Load	[RFC2211] Wroclawski, J., "Specification of the Controlled-Load
	Network Element Service", RFC 2211, September 1997.	Network Element Service", RFC 2211, September 1997.

	[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition	[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
	of Explicit Congestion Notification (ECN) to IP",	of Explicit Congestion Notification (ECN) to IP",
	RFC 3168, September 2001.	RFC 3168, September 2001.

	[RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,	[RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
	J., Courtney, W., Davari, S., Firoiu, V., and D.	J., Courtney, W., Davari, S., Firoiu, V., and D.
	Stiliadis, "An Expedited Forwarding PHB (Per-Hop	Stiliadis, "An Expedited Forwarding PHB (Per-Hop
	Behavior)", RFC 3246, March 2002.	Behavior)", RFC 3246, March 2002.


	[RSVP-ECN]	[RFC4774] Floyd, S., "Specifying Alternate Semantics for the
	Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P.,	Explicit Congestion Notification (ECN) Field", BCP 124,
	Babiarz, J., and K. Chan, "RSVP Extensions for Admission	RFC 4774, November 2006.
	Control over Diffserv using Pre-congestion Notification",
	draft-lefaucheur-rsvp-ecn-01 (work in progress),
	June 2006.

	[Re-TCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
	"Re-ECN: Adding Accountability for Causing Congestion to
	TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-05 (work in
	progress), January 2008.

	14.2. Informative References	14.2. Informative References

	[CLoop_pol]	[CLoop_pol]
	Salvatori, A., "Closed Loop Traffic Policing", Politecnico	Salvatori, A., "Closed Loop Traffic Policing", Politecnico
	Torino and Institut Eurecom Masters Thesis,	Torino and Institut Eurecom Masters Thesis,
	September 2005.	September 2005.

	[ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain	[ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain
	Routeing", Proc Internet Charging and QoS Technology	Routeing", Proc Internet Charging and QoS Technology
	Workshop (ICQT'03) pp308--317, September 2003, <http://	Workshop (ICQT'03) pp308--317, September 2003, <http://
	research.microsoft.com/users/mort/publications.aspx>.	research.microsoft.com/users/mort/publications.aspx>.

	[I-D.arumaithurai-nsis-pcn]	[I-D.arumaithurai-nsis-pcn]
	Arumaithurai, M., "NSIS PCN-QoSM: A Quality of Service	Arumaithurai, M., "NSIS PCN-QoSM: A Quality of Service
	Model for Pre-Congestion Notification (PCN)",	Model for Pre-Congestion Notification (PCN)",
	draft-arumaithurai-nsis-pcn-00 (work in progress),	draft-arumaithurai-nsis-pcn-00 (work in progress),
	September 2007.	September 2007.


		[I-D.charny-pcn-single-marking]
		Charny, A., Zhang, X., Faucheur, F., and V. Liatsos, "Pre-
		Congestion Notification Using Single Marking for Admission
		and Termination", draft-charny-pcn-single-marking-03
		(work in progress), November 2007.

	[I-D.ietf-nsis-rmd]	[I-D.ietf-nsis-rmd]
	Bader, A., "RMD-QOSM - The Resource Management in Diffserv	Bader, A., "RMD-QOSM - The Resource Management in Diffserv
	QOS Model", draft-ietf-nsis-rmd-12 (work in progress),	QOS Model", draft-ietf-nsis-rmd-12 (work in progress),
	November 2007.	November 2007.

	[I-D.ietf-pcn-architecture]	[I-D.ietf-pcn-architecture]

	Eardley, P., "Pre-Congestion Notification Architecture",	Eardley, P., "Pre-Congestion Notification (PCN)
	draft-ietf-pcn-architecture-03 (work in progress),	Architecture", draft-ietf-pcn-architecture-06 (work in
	February 2008.	progress), September 2008.

		[I-D.ietf-tsvwg-admitted-realtime-dscp]
		Baker, F., Polk, J., and M. Dolly, "DSCPs for Capacity-
		Admitted Traffic",
		draft-ietf-tsvwg-admitted-realtime-dscp-04 (work in
		progress), February 2008.

	[IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP	[IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP
	Quality of Service Interconnect", BT Technology Journal	Quality of Service Interconnect", BT Technology Journal
	(BTTJ) 23(2)171--195, April 2005,	(BTTJ) 23(2)171--195, April 2005,
	<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>.	<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>.


		[QoS_scale]
		Reid, A., "Economics and Scalability of QoS Solutions", BT
		Technology Journal (BTTJ) 23(2)97--117, April 2005.

	[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.	[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
	Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1	Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
	Functional Specification", RFC 2205, September 1997.	Functional Specification", RFC 2205, September 1997.

	[RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC	[RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC
	Data Flows", RFC 2207, September 1997.	Data Flows", RFC 2207, September 1997.

	[RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell,	[RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell,
	M., Romanow, A., Weinrib, A., and L. Zhang, "Resource	M., Romanow, A., Weinrib, A., and L. Zhang, "Resource
	ReSerVation Protocol (RSVP) Version 1 Applicability	ReSerVation Protocol (RSVP) Version 1 Applicability

	skipping to change at page 49, line 51	skipping to change at page 54, line 43

	[RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,	[RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
	Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.	Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
	Felstaine, "A Framework for Integrated Services Operation	Felstaine, "A Framework for Integrated Services Operation
	over Diffserv Networks", RFC 2998, November 2000.	over Diffserv Networks", RFC 2998, November 2000.

	[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit	[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
	Congestion Notification (ECN) Signaling with Nonces",	Congestion Notification (ECN) Signaling with Nonces",
	RFC 3540, June 2003.	RFC 3540, June 2003.


		[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
		Internet Protocol", RFC 4301, December 2005.

	[RFC4727] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4,	[RFC4727] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4,
	ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006.	ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006.

	[RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion	[RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
	Marking in MPLS", RFC 5129, January 2008.	Marking in MPLS", RFC 5129, January 2008.


		[RSVP-ECN]
		Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P.,
		Babiarz, J., and K. Chan, "RSVP Extensions for Admission
		Control over Diffserv using Pre-congestion Notification",
		draft-lefaucheur-rsvp-ecn-01 (work in progress),
		June 2006.

	[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,	[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
	Salvatori, A., Soppera, A., and M. Koyabe, "Policing	Salvatori, A., Soppera, A., and M. Koyabe, "Policing
	Congestion Response in an Internetwork Using Re-Feedback",	Congestion Response in an Internetwork Using Re-Feedback",
	ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://	ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
	www.acm.org/sigs/sigcomm/sigcomm2005/	www.acm.org/sigs/sigcomm/sigcomm2005/
	techprog.html#session8>.	techprog.html#session8>.

	[Smart_rtg]	[Smart_rtg]
	Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang,	Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang,
	"Optimizing Cost and Performance for Multihoming", ACM	"Optimizing Cost and Performance for Multihoming", ACM

	skipping to change at page 50, line 31	skipping to change at page 55, line 35
	[Steps_DoS]	[Steps_DoS]
	Handley, M. and A. Greenhalgh, "Steps towards a DoS-	Handley, M. and A. Greenhalgh, "Steps towards a DoS-
	resistant Internet Architecture", Proc. ACM SIGCOMM	resistant Internet Architecture", Proc. ACM SIGCOMM
	workshop on Future directions in network architecture	workshop on Future directions in network architecture
	(FDNA'04) pp 49--56, August 2004.	(FDNA'04) pp 49--56, August 2004.

	Appendix A. Implementation	Appendix A. Implementation

	A.1. Ingress Gateway Algorithm for Blanking the RE flag	A.1. Ingress Gateway Algorithm for Blanking the RE flag


	The ingress gateway receives regular feedback reporting the fraction	The ingress gateway receives regular feedback 'PCN-feedback-
	of congestion marked octets for each aggregate arriving at the	information' reporting the fraction of congestion marked octets for
	egress. So for each aggregate it should blank the RE flag on the	each aggregate arriving at the egress. So for each aggregate it
	same fraction of octets. It is more efficient to calculate the	should blank the RE flag on this fraction of octets. A suitable
	reciprocal of this fraction when the signalling arrives, Z_0 = (1 /	pseudo-code algorithm for the ingress gateway is as follows:
	Congestion-Level-Estimate). Z_0 will be the number of octets of
	packets the ingress should send with the RE flag set between those it
	sends with the RE flag blanked. Z_0 will also take account of the
	sustainable rate reported during the flow pre-emption process, if
	necessary.

	A suitable pseudo-code algorithm for the ingress gateway is as
	follows:

	====================================================================	====================================================================

	B_i = 0 /* interblank volume */	for each PCN-capable-packet {
	for each PCN-capable packet {	if RAND(0,1) <= PCN-feedback-information
	b = readLength(packet) /* set b to packet size */	writeRE(0);
	B_i += b /* accumulate interblank volume */	else
	if B_i < b * Z_0 { /* test whether interblank volume... */	writeRE(1);
	writeRE(1)
	} else { /* ...exceeds blank RE spacing * pkt size*/
	writeRE(0) /* ...and if so, clear RE */
	B_i = 0 /* ...and re-set interblank volume */
	}
	}	}
	====================================================================	====================================================================
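
As an illustration only, the random blanking rule shown in the -02 column above can be sketched in Python. The names `blank_re_flags` and `blanked_octet_fraction`, the list of packet sizes and the injectable random generator are assumptions made for this sketch and do not appear in the draft. The strict `<` comparison (the pseudo-code uses `<=`) is also a choice made here, so that a feedback fraction of 0 never blanks a packet.

```python
import random


def blank_re_flags(packet_sizes, feedback_fraction, rng=None):
    """Return the RE flag (0 = blanked, 1 = set) chosen for each packet.

    feedback_fraction stands in for 'PCN-feedback-information': the
    fraction of congestion-marked octets reported by the egress.
    """
    if not 0.0 <= feedback_fraction <= 1.0:
        raise ValueError("feedback_fraction must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    flags = []
    for _size in packet_sizes:
        # rng.random() is uniform on [0, 1), so this branch is taken
        # with probability feedback_fraction.
        if rng.random() < feedback_fraction:
            flags.append(0)  # writeRE(0): blank the RE flag
        else:
            flags.append(1)  # writeRE(1): set the RE flag
    return flags


def blanked_octet_fraction(packet_sizes, flags):
    """Fraction of octets that were sent with the RE flag blanked."""
    total = sum(packet_sizes)
    blanked = sum(size for size, flag in zip(packet_sizes, flags) if flag == 0)
    return blanked / total if total else 0.0
```

Because the draw is made per packet rather than per octet, the blanked share of octets matches the feedback fraction only on average over many packets. The earlier interblank-volume algorithm in the -01 column spaces blanked packets deterministically instead.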

	A.2. Downstream Congestion Metering Algorithms	A.2. Downstream Congestion Metering Algorithms

	A.2.1. Bulk Downstream Congestion Metering Algorithm	A.2.1. Bulk Downstream Congestion Metering Algorithm

	To meter the bulk amount of downstream pre-congestion in traffic	To meter the bulk amount of downstream pre-congestion in traffic
	crossing an inter-domain border, an algorithm is needed that	crossing an inter-domain border, an algorithm is needed that
	accumulates the size of positive packets and subtracts the size of	accumulates the size of positive packets and subtracts the size of

	skipping to change at page 51, line 40	skipping to change at page 56, line 26
	B: total data volume (in case it is needed)	B: total data volume (in case it is needed)

	A suitable pseudo-code algorithm for a border router is as follows:	A suitable pseudo-code algorithm for a border router is as follows:

	====================================================================	====================================================================
	V_b = 0	V_b = 0
	B = 0	B = 0
	for each PCN-capable packet {	for each PCN-capable packet {
	b = readLength(packet) /* set b to packet size */	b = readLength(packet) /* set b to packet size */
	B += b /* accumulate total volume */	B += b /* accumulate total volume */

	if readEECN(packet) == (Re-Echo || FNE) {	if readEPCN(packet) == (Re-PCT-Echo || FNE) {
	V_b += b /* increment... */	V_b += b /* increment... */

	} elseif readEECN(packet) == ( AM(-1) || PM(-1) ) {	} elseif readEPCN(packet) == ( AM(-1) || TM(-1) ) {
	V_b -= b /* ...or decrement V_b... */	V_b -= b /* ...or decrement V_b... */

	} /* ...depending on EECN field */	} /* ...depending on EPCN field */
	}	}
	====================================================================	====================================================================

	At the end of an accounting period this counter V_b represents the	At the end of an accounting period this counter V_b represents the
	pre-congestion volume that penalties could be applied to, as	pre-congestion volume that penalties could be applied to, as
	described in Section 5.3.	described in Section 5.3.
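
The border meter above can be sketched in Python as follows. This is a minimal sketch, not the draft's implementation: packets are assumed to arrive as `(size_in_octets, codepoint)` pairs, and the codepoint labels are plain strings copied from the -02 pseudo-code. The function name `meter_bulk` and the `"neutral"` label in the usage example are hypothetical.

```python
# Codepoint labels as written in the -02 pseudo-code; representing
# them as strings is an assumption made for this sketch.
POSITIVE = {"Re-PCT-Echo", "FNE"}
NEGATIVE = {"AM(-1)", "TM(-1)"}


def meter_bulk(packets):
    """Return (V_b, B) for an iterable of (size, codepoint) pairs.

    V_b is the net downstream pre-congestion volume: octets in
    positive packets minus octets in negative packets.
    B is the total data volume.
    """
    v_b = 0
    b_total = 0
    for size, codepoint in packets:
        b_total += size          # accumulate total volume
        if codepoint in POSITIVE:
            v_b += size          # increment for positive packets
        elif codepoint in NEGATIVE:
            v_b -= size          # decrement for negative packets
        # any other codepoint counts towards B only
    return v_b, b_total


# Usage: two positive packets, one negative, one neutral.
v_b, b_total = meter_bulk([
    (1500, "Re-PCT-Echo"),
    (1500, "AM(-1)"),
    (500, "FNE"),
    (1000, "neutral"),
])
```

In this usage example `v_b` is 500 and `b_total` is 4500. The meter holds no per-flow state, only two counters per border interface.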

	For instance, accumulated volume of pre-congestion through a border	For instance, accumulated volume of pre-congestion through a border

	interface over a month might be V_b = 5PB (petabyte = 10^15 byte).	interface over a month might be V_b = 5TB (terabyte = 10^12 byte).
	This might have resulted from an average downstream pre-congestion	This might have resulted from an average downstream pre-congestion

	level of 1% on an accumulated total data volume of B = 500PB.	level of 0.001% on an accumulated total data volume of B = 500PB
		(petabyte = 10^15 byte).
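
The figures in both columns can be checked with exact arithmetic. The helper name `precongestion_volume` is an assumption for this sketch; the percentage is passed as a string so that `Fraction` parses it exactly rather than through binary floating point.

```python
from fractions import Fraction

PB = 10**15  # petabyte, in bytes
TB = 10**12  # terabyte, in bytes


def precongestion_volume(total_octets, congestion_percent):
    """Volume of pre-congestion implied by an average marking level."""
    return total_octets * Fraction(congestion_percent) / 100


# -02 column: 0.001% of 500PB
v_b_new = precongestion_volume(500 * PB, "0.001")
# -01 column: 1% of 500PB
v_b_old = precongestion_volume(500 * PB, "1")
```

Here `v_b_new` equals 5TB and `v_b_old` equals 5PB, so each column's example is internally consistent with B = 500PB.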

	A.2.2. Inflation Factor for Persistently Negative Flows	A.2.2. Inflation Factor for Persistently Negative Flows

	The following process is suggested to complement the simple algorithm	The following process is suggested to complement the simple algorithm
	above in order to protect against the various attacks from	above in order to protect against the various attacks from
	persistently negative flows described in Section 5.6.1. As explained	persistently negative flows described in Section 5.6.1. As explained
	in that section, the most important and first step is to estimate the	in that section, the most important and first step is to estimate the
	contribution of persistently negative flows to the bulk volume of	contribution of persistently negative flows to the bulk volume of
	downstream pre-congestion and to inflate this bulk volume as if these	downstream pre-congestion and to inflate this bulk volume as if these
	flows weren't there. The process below has been designed to give an	flows weren't there. The process below has been designed to give an
	unbiased estimate, but it may be possible to define other processes	unbiased estimate, but it may be possible to define other processes
	that achieve similar ends.	that achieve similar ends.


	While the above simple metering algorithm is counting the bulk of	While the above simple metering algorithm (Appendix A.2.1) is counting
	traffic over an accounting period, the meter should also select a	the bulk of traffic over an accounting period, the meter should also
	subset of the whole flow ID space that is small enough to be able to	select a subset of the whole flow ID space that is small enough to be
	realistically measure but large enough to give a realistic sample.	able to realistically measure but large enough to give a realistic
	Many different samples of different subsets of the ID space should be	sample. Many different samples of different subsets of the ID space
	taken at different times during the accounting period, preferably	should be taken at different times during the accounting period,
	covering the whole ID space. During each sample, the meter should	preferably covering the whole ID space. During each sample, the
	count the volume of positive packets and subtract the volume of	meter should count the volume of positive packets and subtract the
	negative, maintaining a separate account for each flow in the sample.	volume of negative, maintaining a separate account for each flow in
	It should run a lot longer than the large majority of flows, to avoid	the sample. It should run a lot longer than the large majority of
	a bias from missing the starts and ends of flows, which tend to be	flows, to avoid a bias from missing the starts and ends of flows,
	positive and negative respectively.	which tend to be positive and negative respectively.

	Once the accounting period finishes, the meter should calculate the	Once the accounting period finishes, the meter should calculate the
	total of the accounts V_{bI} for the subset of flows I in the sample,	total of the accounts V_{bI} for the subset of flows I in the sample,
	and the total of the accounts V_{fI} excluding flows with a negative	and the total of the accounts V_{fI} excluding flows with a negative
	account from the subset I. Then the weighted mean of all these	account from the subset I. Then the weighted mean of all these
	samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I}	samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I}
	V_{bI}.	V_{bI}.

	If V_b is the result of the bulk accounting algorithm over the	If V_b is the result of the bulk accounting algorithm over the
	accounting period (Appendix A.2.1) it can be inflated by this factor	accounting period (Appendix A.2.1) it can be inflated by this factor
	a_S to get a good unbiased estimate of the volume of downstream	a_S to get a good unbiased estimate of the volume of downstream
	congestion over the accounting period a_S.V_b, without being polluted	congestion over the accounting period a_S.V_b, without being polluted
	by the effect of persistently negative flows.	by the effect of persistently negative flows.
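
The calculation of a_S can be sketched as follows, under stated assumptions: each sample I is represented as a dictionary mapping a flow ID to that flow's net account (positive volume minus negative volume), and flows whose account is exactly zero are kept. The function name `inflation_factor` and this data layout are hypothetical, not taken from the draft.

```python
def inflation_factor(samples):
    """Return a_S = sum_I V_fI / sum_I V_bI over all samples I.

    V_bI is the total of all per-flow accounts in sample I.
    V_fI is the same total excluding flows with a negative account.
    """
    total_all = 0       # sum over I of V_bI
    total_excl = 0      # sum over I of V_fI
    for accounts in samples:
        for account in accounts.values():
            total_all += account
            if account >= 0:
                total_excl += account
    if total_all <= 0:
        raise ValueError("sampled accounts must sum to a positive volume")
    return total_excl / total_all


# Usage: two samples; flow "b" is persistently negative.
a_s = inflation_factor([{"a": 100, "b": -20}, {"c": 60, "d": 0}])
inflated_v_b = a_s * 700  # inflate a bulk reading of V_b = 700
```

In this example the accounts total 140 with the negative flow and 160 without it, so a_S = 160/140 and a bulk reading of 700 inflates to 800. Dropping negative accounts can only raise the numerator, so a_S is at least 1 whenever the sampled total is positive.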

	A.3. Algorithm for Sanctioning Negative Traffic	A.3. Algorithm for Sanctioning Negative Traffic


	{ToDo: Write up algorithms similar to Appendix D of [Re-TCP] for the	{ToDo: Write up algorithms similar to Appendix E of
	negative flow monitor with flow management algorithm and the variant	[I-D.briscoe-tsvwg-re-ecn-tcp] for the negative flow monitor with
	with bounded flow state.}	flow management algorithm and the variant with bounded flow state.}

	Author's Address	Author's Address

	Bob Briscoe	Bob Briscoe
	BT & UCL	BT & UCL
	B54/77, Adastral Park	B54/77, Adastral Park
	Martlesham Heath	Martlesham Heath
	Ipswich IP5 3RE	Ipswich IP5 3RE
	UK	UK


	skipping to change at page 54, line 45	skipping to change at page 59, line 45
	such proprietary rights by implementers or users of this	such proprietary rights by implementers or users of this
	specification can be obtained from the IETF on-line IPR repository at	specification can be obtained from the IETF on-line IPR repository at
	http://www.ietf.org/ipr.	http://www.ietf.org/ipr.

	The IETF invites any interested party to bring to its attention any	The IETF invites any interested party to bring to its attention any
	copyrights, patents or patent applications, or other proprietary	copyrights, patents or patent applications, or other proprietary
	rights that may cover technology that may be required to implement	rights that may cover technology that may be required to implement
	this standard. Please address the information to the IETF at	this standard. Please address the information to the IETF at
	ietf-ipr@ietf.org.	ietf-ipr@ietf.org.


	Acknowledgments	Acknowledgment


	Funding for the RFC Editor function is provided by the IETF	This document was produced using xml2rfc v1.33 (of
	Administrative Support Activity (IASA). This document was produced	http://xml.resource.org/) from a source in RFC-2629 XML format.
	using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
	RFC-2629 XML format.

End of changes. 194 change blocks.
	756 lines changed or deleted	1004 lines changed or added