draft-briscoe-tsvwg-re-ecn-border-cheat-01.txt   draft-briscoe-re-pcn-border-cheat-00.txt 
Transport Area Working Group B. Briscoe PCN Working Group B. Briscoe
Internet-Draft BT & UCL Internet-Draft BT & UCL
Expires: December 28, 2006 June 26, 2006 Intended status: Informational June 30, 2007
Expires: January 1, 2008
Emulating Border Flow Policing using Re-ECN on Bulk Data Emulating Border Flow Policing using Re-ECN on Bulk Data
draft-briscoe-tsvwg-re-ecn-border-cheat-01 draft-briscoe-re-pcn-border-cheat-00
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 33 skipping to change at page 1, line 34
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 28, 2006. This Internet-Draft will expire on January 1, 2008.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2006). Copyright (C) The IETF Trust (2007).
Abstract Abstract
Scaling per flow admission control to the Internet is a hard problem. Scaling per flow admission control to the Internet is a hard problem.
A recently proposed approach combines Diffserv and pre-congestion A recently proposed approach combines Diffserv and pre-congestion
notification (PCN) to provide a service slightly better than Intserv notification (PCN) to provide a service slightly better than Intserv
controlled load. It scales to networks of any size, but only if controlled load. It scales to networks of any size, but only if
domains trust each other to comply with admission control and rate domains trust each other to comply with admission control and rate
policing. This memo claims to solve this trust problem without policing. This memo claims to solve this trust problem without
losing scalability. It describes bulk border policing that provides losing scalability. It describes bulk border policing that provides
a sufficient emulation of per-flow policing with the help of another a sufficient emulation of per-flow policing with the help of another
recently proposed extension to ECN, involving re-echoing ECN feedback recently proposed extension to ECN, involving re-echoing ECN feedback
(re-ECN). With only passive bulk measurements at borders, sanctions (re-ECN). With only passive bulk measurements at borders, sanctions
can be applied against cheating networks. can be applied against cheating networks.
Status (to be removed by the RFC Editor) Status (to be removed by the RFC Editor)
This memo is posted as an Internet-Draft with the intent to This memo is posted as an Internet-Draft with the intent to
eventually progress to informational status. It is envisaged that eventually be broken down into two documents; one for the standards
the necessary standards actions to realise the system described would track and one for informational status. But until it becomes an item
sit in three other documents currently being discussed (but not on of IETF working group business the whole proposal has been kept
the standards track) in the IETF Transport Area [Re-TCP], [RSVP-ECN] together to aid understanding. Only the text of Section 4 of this
& [PCN]. The authors seek comments from the Internet community on document requires standardisation. The rest of the sections describe
whether combining PCN and re-ECN is a sufficient solution to the how a system might be built from these protocols by the operators of
admission control problem. an internetwork. Note in particular that the policing and monitoring
functions proposed for the trust boundaries between operators would
not need standardisation by the IETF. They simply represent one way
that the proposed protocols could be used to extend the PCN
architecture [PCN-arch] to span multiple domains without mutual trust
between the operators.
To realise the system described, this document also depends on
standardisation of three other documents currently being discussed
(but not on the standards track) in the IETF Transport Area: pre-
congestion notification (PCN) marking on interior nodes [PCN];
feedback of aggregate PCN measurements by suitably extending the
admission control signalling protocol (e.g. RSVP) [RSVP-ECN]; and
re-insertion of the feedback into the forward stream of IP packets by
the PCN ingress gateway in a similar way to that proposed for a TCP
source [Re-TCP].
The authors seek comments from the Internet community on whether
combining PCN and re-ECN in this way is a sufficient solution to the
problem of scaling microflow admission control to the Internet as a
whole, even though such scaling must take account of the increasing
numbers of networks and users who may all have conflicting interests.
Changes from previous drafts (to be removed by the RFC Editor) Changes from previous drafts (to be removed by the RFC Editor)
From -00 to -01: Changes in this version <draft-briscoe-re-pcn-border-cheat-00>
relative to the last <draft-briscoe-tsvwg-re-ecn-border-cheat-01>:
Changed filename to associate it with the new IETF PCN w-g, rather
than the TSVWG w-g.
Introduction: Clarified that bulk policing only replaces per-flow
policing at interior inter-domain borders, while per-flow policing
is still needed at the access interface to the internetwork. Also
clarified that the aim is to neutralise any gains from cheating
using local bilateral contracts between neighbouring networks,
rather than merely identifying remote cheaters.
Section 3.1: Described the traditional per-flow policing problem
with inter-domain reservations more precisely, particularly with
respect to direction of reservations and of traffic flows.
Clarified status of Section 5 onwards, in particular that policers
and monitors would not need standardisation, but that the protocol
in Section 4 would require standardisation.
Section 5.6.2 on competitive routing: Added discussion of direct
incentives for a receiver to switch to a different provider even
if the provider has a termination monopoly.
Clarified that "Designing in security from the start" merely means
allowing codepoint space in the PCN protocol encoding. There is
no need to actually implement inter-domain security mechanisms for
solutions confined to a single domain.
Updated some references and added a ref to the Security
Considerations, as well as other minor corrections and
improvements.
Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to
<draft-briscoe-tsvwg-re-ecn-border-cheat-01>:
Added subsection on Border Accounting Mechanisms (Section 5.6.1) Added subsection on Border Accounting Mechanisms (Section 5.6.1)
Section 4.2 on the re-ECN wire protocol clarified and re-organised Section 4.2 on the re-ECN wire protocol clarified and re-organised
to separately discuss re-ECN for default ECN marking and for pre- to separately discuss re-ECN for default ECN marking and for pre-
congestion marking (PCN). congestion marking (PCN).
Router Forwarding Behaviour subsection added to re-organised Router Forwarding Behaviour subsection added to re-organised
section on Protocol Operation (Section 4.3). Extensions section section on Protocol Operation (Section 4.3). Extensions section
moved within Protocol Operations. moved within Protocol Operations.
skipping to change at page 3, line 7 skipping to change at page 5, line 7
Sections on Design Rationale (Section 8) and Security Sections on Design Rationale (Section 8) and Security
Considerations (Section 9) expanded with some new material, Considerations (Section 9) expanded with some new material,
including new attacks and their defences. including new attacks and their defences.
Suggested Border Metering Algorithms improved (Appendix A.2) for Suggested Border Metering Algorithms improved (Appendix A.2) for
resilience to newly identified attacks. resilience to newly identified attacks.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 7
2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 7 2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 9
3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 7 3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.1. The Traditional Per-flow Policing Problem . . . . . . . . 7 3.1. The Traditional Per-flow Policing Problem . . . . . . . . 9
3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 9 3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 11
4. Re-ECN Protocol for an RSVP (or similar) Transport . . . . . . 11 4. Re-ECN Protocol for an RSVP (or similar) Transport . . . . . . 14
4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 11 4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 14
4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or 4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 13 4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 16
4.2.2. Re-ECN Combined with Pre-Congestion Notification 4.2.2. Re-ECN Combined with Pre-Congestion Notification
(re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 14 (re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 17
4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 17 4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 19
4.3.1. Protocol Operation for an Established Flow . . . . . . 17 4.3.1. Protocol Operation for an Established Flow . . . . . . 19
4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 18 4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 21
4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 19 4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 22
4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 20 4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 23
4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 22 4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 24
5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 22 5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 24
5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 22 5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 25
5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 23 5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 26
5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 25 5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 28
5.4. Emulation of Per-Flow Rate Policing: Rationale and 5.4. Emulation of Per-Flow Rate Policing: Rationale and
Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 28 Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 31
5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 29 5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 32
5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 31 5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 34
5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 31 5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 34
5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 35 5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 38
5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 35 5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 39
6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 39 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 42
8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 40 8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 43
9. Security Considerations . . . . . . . . . . . . . . . . . . . 41 9. Security Considerations . . . . . . . . . . . . . . . . . . . 45
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 43 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46
11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 43 11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 46
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 44 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47
13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 44 13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 47
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 44 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 48
14.1. Normative References . . . . . . . . . . . . . . . . . . . 44 14.1. Normative References . . . . . . . . . . . . . . . . . . . 48
14.2. Informative References . . . . . . . . . . . . . . . . . . 45 14.2. Informative References . . . . . . . . . . . . . . . . . . 48
Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 46 Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 50
A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 47 A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 50
A.2. Downstream Congestion Metering Algorithms . . . . . . . . 47 A.2. Downstream Congestion Metering Algorithms . . . . . . . . 51
A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 47 A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 51
A.2.2. Inflation Factor for Persistently Negative Flows . . . 48 A.2.2. Inflation Factor for Persistently Negative Flows . . . 52
A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 49 A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 52
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 50 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 53
Intellectual Property and Copyright Statements . . . . . . . . . . 51 Intellectual Property and Copyright Statements . . . . . . . . . . 54
1. Introduction 1. Introduction
The Internet community largely lost interest in the Intserv The Internet community largely lost interest in the Intserv
architecture after it was clarified that it would be unlikely to architecture after it was clarified that it would be unlikely to
scale to the whole Internet [RFC2208]. Although Intserv mechanisms scale to the whole Internet [RFC2208]. Although Intserv mechanisms
proved impractical, the bandwidth reservation service it aimed to proved impractical, the bandwidth reservation service it aimed to
offer is still very much required. offer is still very much required.
A recently proposed approach [CL-deploy] combines Diffserv and pre- A recently proposed approach [PCN-arch] combines Diffserv and pre-
congestion notification (PCN) to provide a service slightly better congestion notification (PCN) to provide a service slightly better
than Intserv controlled load [RFC2211]. It scales to any size than Intserv controlled load [RFC2211]. It scales to any size
network, but only if domains trust their neighbours to have checked network, but only if domains trust their neighbours to have checked
that upstream customers aren't taking more bandwidth than they that upstream customers aren't taking more bandwidth than they
reserved, either accidentally or deliberately. This memo describes reserved, either accidentally or deliberately. This memo describes
border policing measures so that one network can protect its border policing measures so that one network can protect its
interests, even if networks around it are deliberately trying to interests, even if networks around it are deliberately trying to
cheat. The approach provides a sufficient emulation of flow rate cheat. The approach provides a sufficient emulation of flow rate
policing at trust boundaries but without per-flow processing. The policing at trust boundaries but without per-flow processing. The
emulation is not perfect, but it is sufficient to ensure that the emulation is not perfect, but it is sufficient to ensure that the
punishment is at least proportionate to the severity of the cheat. punishment is at least proportionate to the severity of the cheat.
Per-flow rate policing for each reservation is still expected to be
used at the access edge of the internetwork, but at the borders
between networks bulk policing can be used to emulate per-flow
policing.
The aim is to be able to scale controlled load service to any number The aim is to be able to scale controlled load service to any number
of endpoints, even though such scaling must take account of the of endpoints, even though such scaling must take account of the
increasing numbers of networks and users who may all have conflicting increasing numbers of networks and users who may all have conflicting
interests. To achieve such scaling, this memo combines two recent interests. To achieve such scaling, this memo combines two recent
proposals, both of which it briefly recaps: proposals, both of which it briefly recaps:
o A deployment model for admission control over Diffserv using pre- o A deployment model for admission control over Diffserv using pre-
congestion notification [CL-deploy] describes how bulk pre- congestion notification [PCN-arch] describes how bulk pre-
congestion notification on routers within an edge-to-edge Diffserv congestion notification on routers within an edge-to-edge Diffserv
region can emulate the precision of per-flow admission control to region can emulate the precision of per-flow admission control to
provide controlled load service without unscalable per-flow provide controlled load service without unscalable per-flow
processing; processing;
o Re-ECN: Adding Accountability to TCP/IP [Re-TCP]. The trick that o Re-ECN: Adding Accountability to TCP/IP [Re-TCP]. The trick that
addresses cheating at borders is to recognise that border policing addresses cheating at borders is to recognise that border policing
is mainly necessary because cheating upstream networks will admit is mainly necessary because cheating upstream networks will admit
traffic when they shouldn't only as long as they don't directly traffic when they shouldn't only as long as they don't directly
experience the downstream congestion their misbehaviour can cause. experience the downstream congestion their misbehaviour can cause.
The re-ECN protocol requires upstream nodes to declare expected The re-ECN protocol requires upstream nodes to declare expected
downstream congestion in all forwarded packets and it makes it in downstream congestion in all forwarded packets and it makes it in
their interests to declare it honestly. Operators can then their interests to declare it honestly. Operators can then
monitor downstream congestion in bulk at borders to emulate monitor downstream congestion in bulk at borders to emulate
policing. policing.
The aim is not to enable a network to _identify_ some remote cheating
party, which would rarely be useful given the victim network would be
unlikely to be able to seek redress from a cheater in some remote
part of the world with whom no direct contractual relationship
exists. Rather the aim is to ensure that any gain from cheating will
be cancelled out by penalties applied to the cheating party by its
local network. Further, the solution ensures each of the chain of
networks between the cheater and the victim will lose out if it
doesn't apply penalties to its neighbour. Thus the solution builds
on the local bilateral contractual relationships that already exist
between neighbouring networks.
Rather than the end-to-end arrangement used when re-ECN was specified Rather than the end-to-end arrangement used when re-ECN was specified
for the TCP transport [Re-TCP], this memo specifies re-ECN in an for the TCP transport [Re-TCP], this memo specifies re-ECN in an
edge-to-edge arrangement, making it applicable to the above edge-to-edge arrangement, making it applicable to the above
deployment model for admission control over Diffserv. Also, rather deployment model for admission control over Diffserv. Also, rather
than using a TCP transport for regular congestion feedback, this memo than using a TCP transport for regular congestion feedback, this memo
specifies re-ECN using RSVP as the transport for feedback [RSVP-ECN]. specifies re-ECN using RSVP as the transport for feedback [RSVP-ECN].
A similar deployment model, but with a different transport for A similar deployment model, but with a different transport for
signalling congestion feedback could be used (e.g. RMD [NSIS-RMD] signalling congestion feedback could be used (e.g. RMD [NSIS-RMD]
uses NSIS). uses NSIS).
This memo aims to do two things: i) define how to apply the re-ECN This memo aims to do two things: i) define how to apply the re-ECN
protocol to the admission control over Diffserv scenario; and ii) protocol to the admission control over Diffserv scenario; and ii)
explain why re-ECN sufficiently emulates border policing in that explain why re-ECN sufficiently emulates border policing in that
scenario. Most of the memo is taken up with the second aim; scenario. Most of the memo is taken up with the second aim;
explaining why it works. Applying re-ECN to the scenario actually explaining why it works. Applying re-ECN to the scenario actually
involves quite a trivial modification to the ingress gateway. Our involves quite a trivial modification to the ingress gateway. That
immediate goal is to convince everyone to build that modification in modification can be added to gateways later, so our immediate goal is
to ingress gateways from the start, whether first deployments require to convince everyone to have the foresight to define the PCN wire
policing or not. Otherwise, when we want to add policing, we will protocol encoding to accommodate the extended codepoints defined in
have built ourselves a legacy problem. In other words, we aim to this document, whether first deployments require border policing or
convince people to "Build in security from the start." not. Otherwise, when we want to add policing, we will have built
ourselves a legacy problem. In other words, we aim to convince
people to "Design in security from the start."
The body of this memo is structured as follows: The body of this memo is structured as follows:
Section 3 describes the border policing problem. We recap the Section 3 describes the border policing problem. We recap the
traditional, unscalable view of how to solve the problem, and we traditional, unscalable view of how to solve the problem, and we
recap the admission control solution which has the scalability we recap the admission control solution which has the scalability we
do not want to lose when we add border policing; do not want to lose when we add border policing;
Section 4 specifies the re-ECN protocol solution in detail; Section 4 specifies the re-ECN protocol solution in detail;
skipping to change at page 6, line 48 skipping to change at page 9, line 17
design decisions; design decisions;
Section 9 comments on the overall robustness of the security Section 9 comments on the overall robustness of the security
assumptions and lists specific security issues. assumptions and lists specific security issues.
It must be emphasised that we are not evangelical about removing per- It must be emphasised that we are not evangelical about removing per-
flow processing from borders. Network operators may choose to do flow processing from borders. Network operators may choose to do
per-flow processing at their borders for their own reasons, such as per-flow processing at their borders for their own reasons, such as
to support business models that require per-flow accounting. Our aim to support business models that require per-flow accounting. Our aim
is to show that per-flow processing at borders is no longer is to show that per-flow processing at borders is no longer
/necessary/ in order to provide end-to-end QoS using flow admission _necessary_ in order to provide end-to-end QoS using flow admission
control. Indeed, we are absolutely opposed to standardisation of control. Indeed, we are absolutely opposed to standardisation of
technology that embeds particular business models into the Internet. technology that embeds particular business models into the Internet.
Our aim is merely to provide a new useful metric (downstream Our aim is merely to provide a new useful metric (downstream
congestion) at trust boundaries. Given the well-known significance congestion) at trust boundaries. Given the well-known significance
of congestion in economics, operators can then use this new metric in of congestion in economics, operators can then use this new metric in
their interconnection contracts if they choose. This will enable their interconnection contracts if they choose. This will enable
competitive evolution of new business models (for examples competitive evolution of new business models (for examples
see [IXQoS]), alongside more traditional models that depend on more see [IXQoS]), even for sets of flows running alongside another set
costly per-flow processing at borders. across the same border but using the more traditional model that
depends on more costly per-flow processing at each border.
2. Requirements Notation 2. Requirements Notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
3. The Problem 3. The Problem
3.1. The Traditional Per-flow Policing Problem 3.1. The Traditional Per-flow Policing Problem
If we claim to be able to emulate per-flow policing with bulk If we claim to be able to emulate per-flow policing with bulk
policing at trust boundaries, we need to know exactly what we are policing at trust boundaries, we need to know exactly what we are
emulating. So, even though we expect it to become a historic emulating. So, we will start from the traditional scenario with per-
practice, we will start from the traditional scenario with per-flow flow policing at trust boundaries to explain why it has always been
policing at trust boundaries to explain why it has always been
considered necessary. considered necessary.
To be able to take advantage of a reservation-based service such as To be able to take advantage of a reservation-based service such as
controlled load, a source must reserve resources using a signalling controlled load, a source-destination pair must reserve resources
protocol such as RSVP [RFC2205]. An RSVP signalling request refers using a signalling protocol such as RSVP [RFC2205]. An RSVP
to a flow of packets by its flow ID tuple (filter spec [RFC2205]) (or signalling request refers to a flow of packets by its flow ID tuple
its security parameter index (SPI) [RFC2207] if port numbers are (filter spec [RFC2205]) (or its security parameter index
hidden by IPSec encryption). Other signalling protocols use similar (SPI) [RFC2207] if port numbers are hidden by IPSec encryption).
flow identifiers. But, it is insufficient to merely authorise and Other signalling protocols use similar flow identifiers. But, it is
admit a flow based on its identifiers, for instance merely opening a insufficient to merely authorise and admit a flow based on its
pin-hole for packets with identifiers that match an admitted flow ID. identifiers, for instance merely opening a pin-hole for packets with
Once a flow is admitted, it cannot necessarily be trusted to send identifiers that match an admitted flow ID. This is because, once a flow is
packets within the rate profile it requested. admitted, it cannot necessarily be trusted to send packets within the
rate profile it requested.
The packet rate must also be policed to keep the flow within the The packet rate must also be policed to keep the flow within the
requested flow spec [RFC2205]. For instance, without data rate requested flow spec [RFC2205]. For instance, without data rate
policing, a source could reserve resources for an 8kbps audio flow policing, a source-destination pair could reserve resources for an
but transmit a 6Mbps video (theft of service). More subtly, the 8kbps audio flow but the source could transmit a 6Mbps video (theft
sender could generate bursts that were outside the profile it had of service). More subtly, the sender could generate bursts that were
requested. outside the profile requested.
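The per-flow rate policing described above can be illustrated with a token bucket, the usual way of checking traffic against a reserved rate and burst size. The following is a minimal sketch, not part of either draft version: the class name, the helper function and all parameter names and values are hypothetical, and only two traffic parameters (token rate and bucket depth) are modelled.

```python
from dataclasses import dataclass


@dataclass
class TokenBucketPolicer:
    """Illustrative per-flow policer for one reservation (hypothetical names)."""
    rate_bps: float          # reserved token rate, in bits per second
    depth_bits: float        # bucket depth in bits (tolerated burst size)
    tokens: float = 0.0      # tokens currently in the bucket, in bits
    last_time: float = 0.0   # time of the previous packet, in seconds

    def __post_init__(self):
        # Assumption: the flow starts with a full bucket.
        self.tokens = self.depth_bits

    def conforms(self, now: float, packet_bits: int) -> bool:
        """Return True if a packet arriving at time `now` is within profile.

        Assumes `now` never decreases between calls.
        """
        elapsed = now - self.last_time
        # Refill at the reserved rate, but never beyond the bucket depth.
        self.tokens = min(self.depth_bits, self.tokens + elapsed * self.rate_bps)
        self.last_time = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False  # out of profile: a policer would drop or re-mark it


def conforming_fraction(policer, send_rate_bps, packet_bits, duration_s):
    """Send equal-sized packets at a constant rate; return the fraction in profile."""
    interval = packet_bits / send_rate_bps
    n = int(duration_s / interval)
    ok = sum(policer.conforms(i * interval, packet_bits) for i in range(n))
    return ok / n
```

With an 8kbps reservation, a source that really sends 8kbps audio stays entirely within profile, whereas a source sending 6Mbps video under the same reservation has only a small fraction of its packets accepted. The cost that motivates this memo is that one such state record is needed for every flow crossing the border.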
In traditional architectures, per-flow packet rate-policing is In traditional architectures, per-flow packet rate-policing is
expensive and unscalable but, without it, a network is vulnerable to expensive and unscalable but, without it, a network is vulnerable to
such theft of service (whether malicious or accidental). Perhaps such theft of service (whether malicious or accidental). Perhaps
more importantly, if flows are allowed to send more data than they more importantly, if flows are allowed to send more data than they
were permitted, the ability of admission control to give assurances were permitted, the ability of admission control to give assurances
to other flows will break. to other flows will break.
Just as sources need not be trusted to keep within their requested Just as sources need not be trusted to keep within the requested flow
flow spec, whole networks might also try to cheat. We will now set spec, whole networks might also try to cheat. We will now set up a
up a concrete scenario to illustrate such cheats. Imagine concrete scenario to illustrate such cheats. Imagine reservations
reservations for unidirectional flows from senders, through at least for unidirectional flows, through at least two networks, an edge
two networks, an edge network and its downstream transit provider. network and its downstream transit provider. Imagine the edge
Imagine the edge network charges its retail customers per reservation network charges its retail customers per reservation but also has to
but also has to pay its transit provider a charge per reservation. pay its transit provider a charge per reservation. Typically, both
Typically, both its selling and buying charges might depend on the its selling and buying charges might depend on the duration and rate
duration and rate of each reservation. The level of the actual of each reservation. The level of the actual selling and buying
selling and buying prices is irrelevant to our discussion (most prices is irrelevant to our discussion (most likely the network will
likely the network will sell at a higher price than it buys, of sell at a higher price than it buys, of course).
course).
A cheating ingress network could systematically reduce the size of A cheating ingress network could systematically reduce the size of
its retail customers' reservation signalling requests before its retail customers' reservation signalling requests (e.g. the
forwarding them to its transit provider (and systematically reinstate SENDER_TSPEC object in RSVP's PATH message) before forwarding them to
the responses on the way back). It would then receive an honest its transit provider and systematically reinstate the responses on
income from its upstream retail customer but only pay for the way back (e.g. the FLOWSPEC object in RSVP's RESV message). It
fraudulently smaller reservations downstream. Equivalently, a would then receive an honest income from its upstream retail customer
cheating ingress network may feed the traffic from a number of flows but only pay for fraudulently smaller reservations downstream. A
into an aggregate reservation over the transit that is smaller than similar but opposite trick (increasing the TSPEC and decreasing the
the total of all the flows. Because of these fraud possibilities, in FLOWSPEC) could be perpetrated by the receiver's access network if
traditional QoS reservation architectures the downstream network the reservation was paid for by the receiver.
polices at each border. The policer checks that the actual sent data
rate of each flow is within the signalled reservation. Equivalently, a cheating ingress network may feed the traffic from a
number of flows into an aggregate reservation over the transit that
is smaller than the total of all the flows. Because of these fraud
possibilities, in traditional QoS reservation architectures the
downstream network polices at each border. The policer checks that
the actual sent data rate of each flow is within the signalled
reservation.
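As an illustration of the per-flow check that a traditional border policer has to perform, the following is a minimal sketch of a token-bucket test on one flow. It is not taken from either draft revision; the class name FlowPolicer and the parameters rate_bps and bucket_bits are hypothetical, and a real policer would take them from the admitted flow spec.

```python
from dataclasses import dataclass


@dataclass
class FlowPolicer:
    """Token-bucket check that one flow stays within its signalled reservation.

    All names here are illustrative assumptions, not part of the draft.
    """
    rate_bps: float       # reserved rate in bits per second
    bucket_bits: float    # permitted burst size in bits
    tokens: float = 0.0   # current bucket fill in bits
    last_time: float = 0.0

    def __post_init__(self) -> None:
        self.tokens = self.bucket_bits  # start with a full bucket

    def conforms(self, now: float, packet_bytes: int) -> bool:
        """Return True if a packet arriving at time `now` is within profile."""
        elapsed = max(0.0, now - self.last_time)
        self.last_time = now
        # Refill at the reserved rate, never beyond the burst allowance.
        self.tokens = min(self.bucket_bits, self.tokens + elapsed * self.rate_bps)
        needed = packet_bytes * 8
        if needed <= self.tokens:
            self.tokens -= needed
            return True
        return False  # out of profile: the policer would drop or re-mark
```

The point of the sketch is the cost: one such state record and one such update are needed per flow per packet, which is what makes this approach unattractive at high-speed borders.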
Reservation signalling could be authenticated end to end, but this Reservation signalling could be authenticated end to end, but this
wouldn't prevent the aggregation cheat just described. For this wouldn't prevent the aggregation cheat just described. For this
reason, and to avoid the need for a global PKI, signalling integrity reason, and to avoid the need for a global PKI, signalling integrity
is typically only protected on a hop-by-hop basis [RFC2747]. is typically only protected on a hop-by-hop basis [RFC2747].
A variant of the above cheat is where a router in an honest A variant of the above cheat is where a router in an honest
downstream network denies admission to a new reservation, but a downstream network denies admission to a new reservation, but a
cheating upstream network still admits the flow. For instance, the cheating upstream network still admits the flow. For instance, the
networks may be using Diffserv internally, but Intserv admission networks may be using Diffserv internally, but Intserv admission
skipping to change at page 9, line 9 skipping to change at page 11, line 32
revenue from the reservation, but it doesn't have to pay any revenue from the reservation, but it doesn't have to pay any
downstream wholesale charges and the congestion is in someone else's downstream wholesale charges and the congestion is in someone else's
network. The cheating network may calculate that most of the flows network. The cheating network may calculate that most of the flows
affected by congestion in the downstream network aren't likely to be affected by congestion in the downstream network aren't likely to be
its own. It may also calculate that the downstream router has been its own. It may also calculate that the downstream router has been
configured to deny admission to new flows in order to protect configured to deny admission to new flows in order to protect
bandwidth assigned to other network services (e.g. enterprise VPNs). bandwidth assigned to other network services (e.g. enterprise VPNs).
So the cheating network can steal capacity from the downstream So the cheating network can steal capacity from the downstream
operator's VPNs that are probably not actually congested. operator's VPNs that are probably not actually congested.
All the above cheats are framed in the context of RSVP's receiver
confirmed reservation model, but similar cheats are possible with
sender-initiated and other models.
To summarise, in traditional reservation signalling architectures, if To summarise, in traditional reservation signalling architectures, if
a network cannot trust a neighbouring upstream network to rate-police a network cannot trust a neighbouring upstream network to rate-police
each reservation, it has to check for itself that the data rate fits each reservation, it has to check for itself that the data rate fits
within each of the reservations it has admitted. within each of the reservations it has admitted.
3.2. Generic Scenario 3.2. Generic Scenario
We will now describe a generic internetworking scenario that we will We will now describe a generic internetworking scenario that we will
use to describe and to test our bulk policing proposal. It consists use to describe and to test our bulk policing proposal. It consists
of a number of networks and endpoints that do not fully trust each of a number of networks and endpoints that do not fully trust each
skipping to change at page 10, line 16 skipping to change at page 12, line 45
Within the Diffserv region are three interior domains, A, B and C, as Within the Diffserv region are three interior domains, A, B and C, as
well as the inward facing interfaces of the ingress and egress well as the inward facing interfaces of the ingress and egress
gateways. An ingress and egress border router (BR) is shown gateways. An ingress and egress border router (BR) is shown
interconnecting each interior domain with the next. There may be interconnecting each interior domain with the next. There may be
other interior routers (not shown) within each interior domain. other interior routers (not shown) within each interior domain.
In two paragraphs we now briefly recap how pre-congestion In two paragraphs we now briefly recap how pre-congestion
notification is intended to be used to control flow admission to a notification is intended to be used to control flow admission to a
large Diffserv region. The first paragraph describes data plane large Diffserv region. The first paragraph describes data plane
functions and the second describes signalling in the control plane. functions and the second describes signalling in the control plane.
We omit many details from [CL-deploy] including behaviour during We omit many details from [PCN-arch] including behaviour during
routing changes. For brevity here we assume other flows are already routing changes. For brevity here we assume other flows are already
in progress across a path through the Diffserv region before a new in progress across a path through the Diffserv region before a new
one arrives, but how bootstrap works is described in Section 4.3.2. one arrives, but how bootstrap works is described in Section 4.3.2.
Figure 1 shows a single simplex reserved flow from the sending (Sx) Figure 1 shows a single simplex reserved flow from the sending (Sx)
end host to the receiving (Rx) end host. The ingress gateway polices end host to the receiving (Rx) end host. The ingress gateway polices
incoming traffic within its admitted reservation and remarks it to incoming traffic within its admitted reservation and remarks it to
turn on an ECN-capable codepoint [RFC3168] and the controlled load turn on an ECN-capable codepoint [RFC3168] and the controlled load
(CL) Diffserv codepoint. Together, these codepoints define which (CL) Diffserv codepoint. Together, these codepoints define which
traffic is entitled to the enhanced scheduling of the CL behaviour traffic is entitled to the enhanced scheduling of the CL behaviour
skipping to change at page 11, line 18 skipping to change at page 13, line 46
otherwise it returns the original RESV signal back towards the data otherwise it returns the original RESV signal back towards the data
sender. sender.
Once a reservation is admitted, its traffic will always receive low Once a reservation is admitted, its traffic will always receive low
delay service for the duration of the reservation. This is because delay service for the duration of the reservation. This is because
ingress gateways ensure that traffic not under a reservation cannot ingress gateways ensure that traffic not under a reservation cannot
pass into the Diffserv region with the CL DSCP set. So non-reserved pass into the Diffserv region with the CL DSCP set. So non-reserved
traffic will always be treated with a lower priority PHB at each traffic will always be treated with a lower priority PHB at each
interior router. And even if some disaster re-routes traffic after interior router. And even if some disaster re-routes traffic after
it has been admitted, if the traffic through any resource tips over a it has been admitted, if the traffic through any resource tips over a
fail-safe threshold, pre-congestion notification will trigger flow- fail-safe threshold, pre-congestion notification will trigger flow
pre-emption to very quickly bring every router within the whole pre-emption to very quickly bring every router within the whole
Diffserv region back below its operating point. Diffserv region back below its operating point.
The whole admission control system just described deliberately The whole admission control system just described deliberately
confines per-flow processing to the access edges of the network, confines per-flow processing to the access edges of the network,
where it will not limit the system's scalability. But ideally we where it will not limit the system's scalability. But ideally we
want to extend this approach to multiple networks, to take even more want to extend this approach to multiple networks, to take even more
advantage of its scaling potential. We would still need per-flow advantage of its scaling potential. We would still need per-flow
processing at the access edges of each network, but not at the high processing at the access edges of each network, but not at the high
speed interfaces where they interconnect. Even though such an speed interfaces where they interconnect. Even though such an
admission control system would work technically, it would gain us no admission control system would work technically, it would gain us no
scaling advantage if each network also wanted to police the rate of scaling advantage if each network also wanted to police the rate of
each admitted flow for itself---border routers would still have to do each admitted flow for itself--border routers would still have to do
complex packet operations per-flow anyway, given they don't trust complex packet operations per-flow anyway, given they don't trust
upstream networks to do their policing for them. upstream networks to do their policing for them.
This memo describes how to emulate per-flow rate policing using bulk This memo describes how to emulate per-flow rate policing using bulk
mechanisms at border routers, so the full scalability potential of mechanisms at border routers, so the full scalability potential of
pre-congestion notification is not limited by the need for per-flow pre-congestion notification is not limited by the need for per-flow
policing mechanisms at borders, which would make borders the most policing mechanisms at borders, which would make borders the most
cost-critical pinch-points. Then we can achieve the long sought-for cost-critical pinch-points. Then we can achieve the long sought-for
vision of secure Internet-wide bandwidth reservations without needing vision of secure Internet-wide bandwidth reservations without needing
per-flow processing at all in core and border routers---where per-flow processing at all in core and border routers--where
scalability is most critical. scalability is most critical.
4. Re-ECN Protocol for an RSVP (or similar) Transport 4. Re-ECN Protocol for an RSVP (or similar) Transport
4.1. Protocol Overview 4.1. Protocol Overview
First we need to recap the way routers accumulate congestion marking First we need to recap the way routers accumulate congestion marking
along a path. Each ECN-capable router marks some packets with CE, along a path. Each ECN-capable router marks some packets with CE,
the marking probability increasing with the length of the queue at the marking probability increasing with the length of the queue at
its egress link. The only difference with pre-congestion its egress link. The only difference with pre-congestion
skipping to change at page 14, line 21 skipping to change at page 17, line 5
which need only be read by border policing functions. which need only be read by border policing functions.
Although the RE flag is a separate, single bit field, it can be read Although the RE flag is a separate, single bit field, it can be read
as an extension to the two-bit ECN field; the three concatenated bits as an extension to the two-bit ECN field; the three concatenated bits
in what we will call the extended ECN field (EECN) make eight in what we will call the extended ECN field (EECN) make eight
codepoints available. When the RE flag setting is "don't care", we codepoints available. When the RE flag setting is "don't care", we
use the RFC3168 names of the ECN codepoints, but [Re-TCP] proposes use the RFC3168 names of the ECN codepoints, but [Re-TCP] proposes
the following six codepoint names for when there is a need to be more the following six codepoint names for when there is a need to be more
specific. specific.
+-------+------------+------+---------------+-----------------------+ +--------+-------------+-------+-------------+----------------------+
| ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | ECN | RFC3168 | RE | Extended | Re-ECN meaning |
| field | codepoint | flag | codepoint | | | field | codepoint | flag | ECN | |
+-------+------------+------+---------------+-----------------------+ | | | | codepoint | |
+--------+-------------+-------+-------------+----------------------+
| 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable |
| | | | | transport | | | | | | transport |
| 00 | Not-ECT | 1 | FNE | Feedback not | | 00 | Not-ECT | 1 | FNE | Feedback not |
| | | | | established | | | | | | established |
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion |
| | | | | and RECT | | | | | | and RECT |
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | 01 | ECT(1) | 1 | RECT | Re-ECN capable |
| | | | | transport | | | | | | transport |
| 10 | ECT(0) | 0 | --- | Legacy ECN use | | 10 | ECT(0) | 0 | --- | Legacy ECN use |
| | | | | only | | | | | | only |
| 10 | ECT(0) | 1 | --CU-- | Currently unused | | 10 | ECT(0) | 1 | --CU-- | Currently unused |
| | | | | | | | | | | |
| 11 | CE | 0 | CE(0) | Congestion | | 11 | CE | 0 | CE(0) | Congestion |
| | | | | experienced with | | | | | | experienced with |
| | | | | Re-Echo | | | | | | Re-Echo |
| 11 | CE | 1 | CE(-1) | Congestion | | 11 | CE | 1 | CE(-1) | Congestion |
| | | | | experienced | | | | | | experienced |
+-------+------------+------+---------------+-----------------------+ +--------+-------------+-------+-------------+----------------------+
Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re- Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re-
ECN ECN
4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN) 4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN)
As permitted by the ECN specification [RFC3168], a proposal is As permitted by the ECN specification [RFC3168], a proposal is
currently being advanced in the IETF to define different semantics currently being advanced in the IETF to define different semantics
for how routers might mark the ECN field of certain packets. The for how routers might mark the ECN field of certain packets. The
idea is to be able to notify congestion when the router's load idea is to be able to notify congestion when the router's load
skipping to change at page 16, line 4 skipping to change at page 18, line 36
sending node (or its proxy) to detect suppression of congestion sending node (or its proxy) to detect suppression of congestion
marking in the feedback loop. Thus the Nonce requires the sender or marking in the feedback loop. Thus the Nonce requires the sender or
its proxy to be trusted to respond correctly to congestion. But this its proxy to be trusted to respond correctly to congestion. But this
is precisely the main cheat we want to protect against (as well as is precisely the main cheat we want to protect against (as well as
many others). many others).
One of the compromise protocol encodings that [PCN] explores One of the compromise protocol encodings that [PCN] explores
("Alternative 5") leaves out support for the ECN Nonce. Therefore we ("Alternative 5") leaves out support for the ECN Nonce. Therefore we
use that one. This encoding of PCN markings is shown on the left of use that one. This encoding of PCN markings is shown on the left of
Table 2. Note that these codepoints of the ECN field only take on Table 2. Note that these codepoints of the ECN field only take on
the semantics of pre-congestion noticiation if they are combined with the semantics of pre-congestion notification if they are combined
a Diffserv codepoint that the operator has configured to cause PCN with a Diffserv codepoint that the operator has configured to cause
marking, by mapping it to a PCN-enhanced PHB. PCN marking, by mapping it to a PCN-enhanced PHB.
For the rest of this memo, we will not distinguish between Admission For the rest of this memo, we will not distinguish between Admission
Marking and Pre-emption Marking unless we need to be specific. We Marking and Pre-emption Marking unless we need to be specific. We
will call both "congestion marking". With the above encoding, will call both "congestion marking". With the above encoding,
congestion marking can be read to mean any packet with the left-most congestion marking can be read to mean any packet with the left-most
bit of the ECN field set. bit of the ECN field set.
The re-ECN protocol can be used to control misbehaving sources The re-ECN protocol can be used to control misbehaving sources
whether congestion is with respect to a logical threshold (PCN) or whether congestion is with respect to a logical threshold (PCN) or
the physical line rate (ECN). In either case the RE flag can be used the physical line rate (ECN). In either case the RE flag can be used
to create an extended ECN field. For PCN-capable packets, the 8 to create an extended ECN field. For PCN-capable packets, the 8
possible encodings of this 3-bit extended ECN (EECN) field are possible encodings of this 3-bit extended ECN (EECN) field are
defined on the right of Table 2 below. The purposes of these defined on the right of Table 2 below. The purposes of these
different codepoints will be introduced in subsequent sections. different codepoints will be introduced in subsequent sections.
+-------+-----------------+------+-------------+--------------------+ +-------+-----------------+------+--------------+-------------------+
| ECN | PCN codepoint | RE | Extended | Re-ECN meaning | | ECN | PCN codepoint | RE | Extended ECN | Re-ECN meaning |
| field | (Alternative 5) | flag | ECN | | | field | (Alternative 5) | flag | codepoint | |
| | | | codepoint | | +-------+-----------------+------+--------------+-------------------+
+-------+-----------------+------+-------------+--------------------+ | 00 | Not-ECT | 0 | Not-RECT | Not |
| 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | | | | | re-ECN-capable |
| | | | | transport | | | | | | transport |
| 00 | Not-ECT | 1 | FNE | Feedback not | | 00 | Not-ECT | 1 | FNE | Feedback not |
| | | | | established | | | | | | established |
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed |
| | | | | congestion and | | | | | | congestion and |
| | | | | RECT | | | | | | RECT |
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | 01 | ECT(1) | 1 | RECT | Re-ECN capable |
| | | | | transport | | | | | | transport |
| 10 | AM | 0 | AM(0) | Admission Marking | | 10 | AM | 0 | AM(0) | Admission Marking |
| | | | | with Re-Echo | | | | | | with Re-Echo |
| 10 | AM | 1 | AM(-1) | Admission Marking | | 10 | AM | 1 | AM(-1) | Admission Marking |
| | | | | | | | | | | |
| 11 | PM | 0 | PM(0) | Pre-emption | | 11 | PM | 0 | PM(0) | Pre-emption |
| | | | | Marking with | | | | | | Marking with |
| | | | | Re-Echo | | | | | | Re-Echo |
| 11 | PM | 1 | PM(-1) | Pre-emption | | 11 | PM | 1 | PM(-1) | Pre-emption |
| | | | | Marking | | | | | | Marking |
+-------+-----------------+------+-------------+--------------------+ +-------+-----------------+------+--------------+-------------------+
Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre- Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre-
congestion Notification (PCN) congestion Notification (PCN)
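The mapping in Table 2 can be written down as a small lookup, which may help when reading the later sections. This is only a sketch: the function names are hypothetical, and it assumes the caller has already extracted the two-bit ECN field and the one-bit RE flag from the packet header.

```python
# Extended ECN (EECN) codepoints from Table 2, keyed by (ECN field, RE flag).
PCN_EECN = {
    (0b00, 0): "Not-RECT",
    (0b00, 1): "FNE",
    (0b01, 0): "Re-Echo",
    (0b01, 1): "RECT",
    (0b10, 0): "AM(0)",
    (0b10, 1): "AM(-1)",
    (0b11, 0): "PM(0)",
    (0b11, 1): "PM(-1)",
}


def eecn_codepoint(ecn: int, re_flag: int) -> str:
    """Name the extended ECN codepoint for a PCN-capable packet."""
    return PCN_EECN[(ecn & 0b11, re_flag & 1)]


def is_congestion_marked(ecn: int) -> bool:
    """True for Admission Marking or Pre-emption Marking.

    With the Alternative 5 encoding both have the left-most ECN bit set.
    """
    return bool(ecn & 0b10)
```

For example, eecn_codepoint(0b10, 1) names AM(-1), and is_congestion_marked is true for both the '10' and '11' values of the ECN field regardless of the RE flag.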
4.3. Protocol Operation 4.3. Protocol Operation
4.3.1. Protocol Operation for an Established Flow 4.3.1. Protocol Operation for an Established Flow
The re-ECN protocol involves a simple tweak to the action of the The re-ECN protocol involves a simple tweak to the action of the
gateway at the ingress edge of the CL region. In the deployment gateway at the ingress edge of the CL region. In the deployment
model just described [CL-deploy], for each active traffic aggregate model just described [PCN-arch], for each active traffic aggregate
across the CL region (CL-region-aggregate) the ingress gateway will across the CL region (CL-region-aggregate) the ingress gateway will
hold a fairly recent Congestion-Level-Estimate that the egress hold a fairly recent Congestion-Level-Estimate that the egress
gateway will have fed back to it, piggybacked on the signalling that gateway will have fed back to it, piggybacked on the signalling that
sets up each flow. For instance, one aggregate might have been sets up each flow. For instance, one aggregate might have been
experiencing 3% pre-congestion (that is, congestion marked octets experiencing 3% pre-congestion (that is, congestion marked octets
whether Admission Marked or Pre-emption Marked). In this case, the whether Admission Marked or Pre-emption Marked). In this case, the
ingress gateway MUST clear the RE flag to "0" for the same percentage ingress gateway MUST clear the RE flag to "0" for the same percentage
of octets of CL-packets (3%) and set it to "1" in the rest (97%). of octets of CL-packets (3%) and set it to "1" in the rest (97%).
Appendix A.1 gives a simple pseudo-code algorithm that the ingress Appendix A.1 gives a simple pseudo-code algorithm that the ingress
gateway may use to do this. gateway may use to do this.
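To make the octet-proportional rule concrete, here is one possible way to choose the RE flag per packet. It is a sketch under our own assumptions and is not necessarily the algorithm of Appendix A.1; the class name ReFlagSetter and the counter owed_octets are hypothetical.

```python
class ReFlagSetter:
    """Chooses the RE flag so that the fraction of octets sent with RE
    cleared tracks the Congestion-Level-Estimate of the aggregate.

    Illustrative only; one instance per CL-region-aggregate is assumed.
    """

    def __init__(self) -> None:
        self.owed_octets = 0.0  # octets that still need RE cleared

    def re_flag(self, packet_octets: int, congestion_level_estimate: float) -> int:
        """Return 0 (RE cleared) or 1 (RE set) for the next CL packet."""
        # Every packet adds its share of octets that must carry RE = 0.
        self.owed_octets += packet_octets * congestion_level_estimate
        if self.owed_octets > 0:
            # Pay off the debt with this whole packet; the balance may go
            # negative, which delays the next cleared packet accordingly.
            self.owed_octets -= packet_octets
            return 0
        return 1
```

With a Congestion-Level-Estimate of 0.03 and equal-sized packets, roughly 3 packets in every 100 are sent with the RE flag cleared. Counting octets rather than packets keeps the proportion right when packet sizes vary.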
skipping to change at page 18, line 49 skipping to change at page 21, line 30
4.3.2. Aggregate Bootstrap 4.3.2. Aggregate Bootstrap
When a new reservation PATH message arrives at the egress, if there When a new reservation PATH message arrives at the egress, if there
are currently no flows in progress from the same ingress, there will are currently no flows in progress from the same ingress, there will
be no state maintaining the current level of pre-congestion marking be no state maintaining the current level of pre-congestion marking
for the aggregate. While the reservation signalling continues onward for the aggregate. While the reservation signalling continues onward
towards the receiving host, the egress gateway returns an RSVP towards the receiving host, the egress gateway returns an RSVP
message to the ingress with a flag [RSVP-ECN] asking the ingress to message to the ingress with a flag [RSVP-ECN] asking the ingress to
send a specified number of data probes between them. This bootstrap send a specified number of data probes between them. This bootstrap
behaviour is all described in the deployment model [CL-deploy]. behaviour is all described in the deployment model [PCN-arch].
However, with our new re-ECN scheme, the ingress does not know what However, with our new re-ECN scheme, the ingress does not know what
proportion of the data probes should have the RE flag blanked, proportion of the data probes should have the RE flag blanked,
because it has no estimate yet of pre-congestion for the path across because it has no estimate yet of pre-congestion for the path across
the Diffserv region. the Diffserv region.
To be conservative, following the guidance for specifying other re- To be conservative, following the guidance for specifying other re-
ECN transports in [Re-TCP], the ingress SHOULD set the FNE codepoint ECN transports in [Re-TCP], the ingress SHOULD set the FNE codepoint
of the extended ECN header in all probe packets (Table 2). As per of the extended ECN header in all probe packets (Table 2). As per
the deployment model, the egress gateway measures the fraction of the deployment model, the egress gateway measures the fraction of
skipping to change at page 20, line 19 skipping to change at page 22, line 48
gateway. It will often be possible to apply sanctions at the gateway. It will often be possible to apply sanctions at the
granularity of aggregates rather than flows, but in an internetworked granularity of aggregates rather than flows, but in an internetworked
environment it cannot be guaranteed that aggregates will be environment it cannot be guaranteed that aggregates will be
identifiable in remote networks. So setting FNE at the start of each identifiable in remote networks. So setting FNE at the start of each
flow is a safe strategy. For instance, a remote network may have flow is a safe strategy. For instance, a remote network may have
equal cost multi-path (ECMP) routing enabled, causing different flows equal cost multi-path (ECMP) routing enabled, causing different flows
between the same gateways to traverse different paths. between the same gateways to traverse different paths.
After an idle period of more than 1 second, the ingress gateway After an idle period of more than 1 second, the ingress gateway
SHOULD set the EECN field of the next packet it sends to FNE. This SHOULD set the EECN field of the next packet it sends to FNE. This
allows the design of network policers to be deterministic (see [Re- allows the design of network policers to be deterministic (see
TCP]). [Re-TCP]).
However, if the ingress gateway can guarantee that the network(s) However, if the ingress gateway can guarantee that the network(s)
that will carry the flow to its egress gateway all use a common that will carry the flow to its egress gateway all use a common
identifier for the aggregate (e.g. a single MPLS network without ECMP identifier for the aggregate (e.g. a single MPLS network without ECMP
routing), it MAY omit FNE when it adds a new flow to an active routing), it MAY omit FNE when it adds a new flow to an active
aggregate. And an FNE packet need only be sent if a whole aggregate aggregate. And an FNE packet need only be sent if a whole aggregate
has been idle for more than 1 second. has been idle for more than 1 second.
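The rules above for when the ingress gateway sends FNE can be summarised in a few lines. This is a sketch only: the function name, its parameters and the use of None to mean "nothing sent yet" are our assumptions for illustration.

```python
from typing import Optional

IDLE_LIMIT_S = 1.0  # idle period after which FNE is sent again


def should_send_fne(now: float,
                    flow_last_sent: Optional[float],
                    aggregate_last_sent: Optional[float],
                    aggregate_identifiable: bool) -> bool:
    """Decide whether the next packet should carry the FNE codepoint.

    flow_last_sent / aggregate_last_sent: time the last packet was sent on
    the flow / on the whole aggregate, or None if none has been sent yet.
    aggregate_identifiable: True only if every network on the path is known
    to use a common identifier for the aggregate.
    """
    # Track idleness per aggregate only when that is known to be safe;
    # otherwise fall back to the per-flow rule.
    last = aggregate_last_sent if aggregate_identifiable else flow_last_sent
    return last is None or (now - last) > IDLE_LIMIT_S
```

In the default case a new flow (flow_last_sent is None) always starts with FNE. In the single-network case a new flow joining an aggregate that sent a packet within the last second does not.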
4.3.4. Router Forwarding Behaviour 4.3.4. Router Forwarding Behaviour
skipping to change at page 21, line 5 skipping to change at page 23, line 25
congestion notification: congestion notification:
Preferential drop: When a router cannot avoid dropping ECN-capable Preferential drop: When a router cannot avoid dropping ECN-capable
packets, preferential dropping of packets with different extended packets, preferential dropping of packets with different extended
ECN codepoints SHOULD be implemented between packets within a PHB ECN codepoints SHOULD be implemented between packets within a PHB
that uses PCN marking. The drop preference order to use is that uses PCN marking. The drop preference order to use is
defined in Table 4. Note that to reduce configuration complexity, defined in Table 4. Note that to reduce configuration complexity,
Re-Echo and FNE MAY be given the same drop preference, but if Re-Echo and FNE MAY be given the same drop preference, but if
feasible, FNE should be dropped in preference to Re-Echo. feasible, FNE should be dropped in preference to Re-Echo.
+--------+------+----------------+---------+------------------------+ +---------+-------+----------------+---------+----------------------+
| ECN | RE | Extended ECN | Drop | Re-ECN meaning | | ECN | RE | Extended ECN | Drop | Re-ECN meaning |
| field | flag | codepoint | Pref | | | field | flag | codepoint | Pref | |
+--------+------+----------------+---------+------------------------+ +---------+-------+----------------+---------+----------------------+
| 01 | 0 | Re-Echo | 5/4 | Re-echoed congestion | | 01 | 0 | Re-Echo | 5/4 | Re-echoed congestion |
| | | | | and RECT | | | | | | and RECT |
| 00 | 1 | FNE | 4 | Feedback not | | 00 | 1 | FNE | 4 | Feedback not |
| | | | | established | | | | | | established |
| 01 | 1 | RECT | 3 | Re-ECN capable | | 01 | 1 | RECT | 3 | Re-ECN capable |
| | | | | transport | | | | | | transport |
| 10 | 0 | AM(0) | 3 | Admission Marking with | | 10 | 0 | AM(0) | 3 | Admission Marking |
| | | | | Re-Echo | | | | | | with Re-Echo |
| 10 | 1 | AM(-1) | 3 | Admission Marking | | 10 | 1 | AM(-1) | 3 | Admission Marking |
| | | | | | | | | | | |
| 11 | 0 | PM(0) | 2 | Pre-emption Marking | | 11 | 0 | PM(0) | 2 | Pre-emption Marking |
| | | | | with Re-Echo | | | | | | with Re-Echo |
| 11 | 1 | PM(-1) | 2 | Pre-emption Marking | | 11 | 1 | PM(-1) | 2 | Pre-emption Marking |
| | | | | | | | | | | |
| 00 | 0 | Not-RECT | 1 | Not re-ECN-capable | | 00 | 0 | Not-RECT | 1 | Not re-ECN-capable |
| | | | | transport | | | | | | transport |
+--------+------+----------------+---------+------------------------+ +---------+-------+----------------+---------+----------------------+
Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st) Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st)
Given this proposal is being advanced at the same time as PCN Given this proposal is being advanced at the same time as PCN
itself, we strongly RECOMMEND that preferential drop based on itself, we strongly RECOMMEND that preferential drop based on
extended ECN codepoint is added to router forwarding at the same extended ECN codepoint is added to router forwarding at the same
time as PCN marking. Preferential dropping can be difficult to time as PCN marking. Preferential dropping can be difficult to
implement, but we strongly RECOMMEND this security-related re-ECN implement, but we strongly RECOMMEND this security-related re-ECN
improvement where feasible as it is an effective defence against improvement where feasible as it is an effective defence against
flooding attacks. flooding attacks.
Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE
flag as well as the ECN field to decide whether to drop or mark flag as well as the ECN field to decide whether to drop or mark
skipping to change at page 22, line 6 skipping to change at page 24, line 30
understand drop, not congestion marking. But a PCN-capable router
can mark rather than drop an FNE packet, even though its ECN field
when looked at in isolation is '00' which appears to be a legacy
Not-ECT packet. Therefore, if a packet's RE flag is '1', even if
its ECN field is '00', a PCN-enabled router SHOULD use congestion
marking. This allows the `feedback not established' (FNE)
codepoint to be used for probe packets, in order to pick up PCN
marking when bootstrapping an aggregate.

ECN marking rather than dropping of FNE packets MUST only be
deployed in controlled environments, such as that in [PCN-arch],
where the presence of an egress node that understands ECN marking
is assured. Congestion events might otherwise be ignored if the
receiver only understands drop, rather than ECN marking. This is
because there is no guarantee that ECN capability has been
negotiated if feedback is not established (FNE). Also, [Re-TCP]
places the strong condition that a router MUST apply drop rather
than marking to FNE packets unless it can guarantee that FNE
packets are rate limited either locally or upstream.
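
The marking-versus-drop rule above can be illustrated with a short
sketch. It is not part of the proposal: the function name, the two
boolean parameters and the decision to treat both conditions (a
controlled environment and rate-limited FNE packets) as jointly
required are assumptions made here for illustration, as is the choice
to drop Not-RECT packets because their transport may only understand
drop.

```python
from enum import Enum


class Action(Enum):
    MARK = "mark"
    DROP = "drop"


def congestion_action(ecn: str, re_flag: int, *,
                      controlled_env: bool,
                      fne_rate_limited: bool) -> Action:
    """Choose how a PCN-capable router signals congestion on one packet.

    ecn is the 2-bit ECN field as a string ('00'..'11'); re_flag is 0 or 1.
    controlled_env and fne_rate_limited are hypothetical configuration
    inputs, not fields defined by the draft.
    """
    if ecn not in ("00", "01", "10", "11") or re_flag not in (0, 1):
        raise ValueError("unknown extended ECN codepoint")
    if ecn == "00":
        if re_flag == 0:
            # Not-RECT: assume the transport only understands drop.
            return Action.DROP
        # FNE: mark only where an ECN-aware egress is assured and FNE
        # packets are known to be rate limited; otherwise drop.
        if controlled_env and fne_rate_limited:
            return Action.MARK
        return Action.DROP
    # Any other codepoint belongs to a re-ECN-capable transport.
    return Action.MARK
```

A router outside a controlled environment would call this with
controlled_env=False and so fall back to drop for FNE packets.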
4.3.5. Extensions

If a different signalling system, such as NSIS, were used, but it
provided admission control in a similar way, using pre-congestion
notification (e.g. with RMD [NSIS-RMD]) we believe re-ECN could be
used to protect against misbehaving networks in the same way as
proposed above.

5. Emulating Border Policing with Re-ECN

Note that the re-ECN protocol described in Section 4 above would
require standardisation, whereas operators acting in their own
interests would be expected to deploy policing and monitoring
functions similar to those proposed in the sections below without any
further need for standardisation by the IETF. Flexibility is
expected in exactly how policing and monitoring is done.

5.1. Informal Terminology

In the rest of this memo, where the context makes it clear, we will
sometimes loosely use the term `congestion' rather than using the
stricter `downstream pre-congestion'. Also we will loosely talk of
positive or negative flows, meaning flows where the moving average of
the downstream pre-congestion metric is persistently positive or
negative. The notion of a negative metric arises because it is
derived by subtracting one metric from another. Of course actual
downstream congestion cannot be negative, only the metric can
skipping to change at page 23, line 7 skipping to change at page 26, line 5
0. Blanking the RE flag increments the worth of a packet to +1.
Congestion marking a packet decrements its worth (whether admission
marking or pre-emption marking). Congestion marking a previously
blanked packet cancels out the positive and negative worth of each
marking (a worth of 0). The FNE codepoint is an exception. It has
the same positive worth as a packet with the Re-Echo codepoint. The
table below specifies unambiguously the worth of each extended ECN
codepoint. Note the order is different from the previous table to
emphasise how congestion marking processes decrement the worth.

+---------+-------+-----------------+-------+-----------------------+
| ECN     | RE    | Extended ECN    | Worth | Re-ECN meaning        |
| field   | flag  | codepoint       |       |                       |
+---------+-------+-----------------+-------+-----------------------+
| 00      | 0     | Not-RECT        | n/a   | Not re-ECN-capable    |
|         |       |                 |       | transport             |
| 01      | 0     | Re-Echo         | +1    | Re-echoed congestion  |
|         |       |                 |       | and RECT              |
| 10      | 0     | AM(0)           | 0     | Admission Marking     |
|         |       |                 |       | with Re-Echo          |
| 11      | 0     | PM(0)           | 0     | Pre-emption Marking   |
|         |       |                 |       | with Re-Echo          |
| 00      | 1     | FNE             | +1    | Feedback not          |
|         |       |                 |       | established           |
| 01      | 1     | RECT            | 0     | Re-ECN capable        |
|         |       |                 |       | transport             |
| 10      | 1     | AM(-1)          | -1    | Admission Marking     |
|         |       |                 |       |                       |
| 11      | 1     | PM(-1)          | -1    | Pre-emption Marking   |
+---------+-------+-----------------+-------+-----------------------+

Table 5: 'Worth' of Extended ECN Codepoints
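
As an informal aid, the worth values in Table 5 can be written as a
lookup keyed on the (ECN field, RE flag) pair. The names WORTH,
packet_worth and flow_worth are illustrative only, and representing
the "n/a" entry as None is an assumption of this sketch.

```python
from typing import Iterable, Optional, Tuple

# Worth of each extended ECN codepoint, transcribed from Table 5.
# Key: (ECN field as a bit string, RE flag). None stands for "n/a".
WORTH = {
    ("00", 0): None,  # Not-RECT
    ("01", 0): +1,    # Re-Echo
    ("10", 0): 0,     # AM(0)
    ("11", 0): 0,     # PM(0)
    ("00", 1): +1,    # FNE
    ("01", 1): 0,     # RECT
    ("10", 1): -1,    # AM(-1)
    ("11", 1): -1,    # PM(-1)
}


def packet_worth(ecn: str, re_flag: int) -> Optional[int]:
    """Return the worth of one packet, or None if it is not re-ECN-capable."""
    return WORTH[(ecn, re_flag)]


def flow_worth(packets: Iterable[Tuple[str, int]]) -> int:
    """Sum the worth of a sequence of packets, skipping Not-RECT packets."""
    total = 0
    for ecn, re_flag in packets:
        worth = packet_worth(ecn, re_flag)
        if worth is not None:
            total += worth
    return total
```

For example, one Re-Echo packet and one AM(-1) packet sum to a worth
of zero, which is the balance an honest flow is expected to show on
average.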
5.2. Policing Overview

It will be recalled that downstream congestion can be found by
subtracting upstream congestion from path congestion. Figure 4
displays the difference between the two plots in Figure 3 to show
downstream pre-congestion across the same path through the Internet.
skipping to change at page 24, line 41 skipping to change at page 27, line 41
sanctions to flows if downstream congestion goes negative before the
egress gateway. The upward arrow at Domain C's border with the
egress gateway represents the incentive the sanctions would create to
prevent negative traffic. The same upward pressure can be applied at
any domain border (arrows not shown).

Any flow that persistently goes negative by the time it leaves a
domain must not have been marked correctly in the first place. A
domain that discovers such a flow can adopt a range of strategies to
protect itself. Which strategy it uses will depend on policy,
because it cannot immediately assume malice--there may be an innocent
configuration error somewhere in the system.

This memo does not propose to standardise any particular mechanism to
detect persistently negative flows, but Section 5.5 does give
examples. Note that we have used the term flow, but there will be no
need to burrow into the transport layer for port numbers; identifiers
visible in the network layer will be sufficient (IP address pair,
DSCP, protocol ID). The appendix also gives a mechanism to bound the
required flow state, preventing state exhaustion attacks.

Of course, some domains may trust other domains to comply with
skipping to change at page 26, line 28 skipping to change at page 29, line 28
price to pre-congestion itself. Then the usage element of the
interconnection contract would directly relate to the volume of pre-
congestion caused by the upstream network.

The direction of penalties and charges relative to the direction of
traffic flow is a constant source of confusion. Typically, where
capacity charges are concerned, lower tier customer networks pay
higher tier provider networks. So money flows from the edges to the
middle of the internetwork, towards greater connectivity,
irrespective of the flow of data. But we advise that penalties or
charges for usage should follow the same direction as the data flow--
the direction of control at the network layer. Otherwise a network
lays itself open to `denial of funds' attacks. So, where a tier 2
provider sends data into a tier 3 customer network, we would expect
the penalty clauses for sending too much pre-congestion to be against
the tier 2 network, even though it is the provider.

It may help to remember that data will be flowing in the other
direction too. So the provider network has as much opportunity to
levy usage penalties as its customer, and it can set the price or
strength of its own penalties higher if it chooses. Usage charges in
both directions tend to cancel each other out, which confirms that
usage-charging is less to do with revenue raising and more to do with
encouraging load control discipline in order to smooth peaks and
troughs, improving utilisation and quality.
skipping to change at page 28, line 50 skipping to change at page 31, line 50
cheater, because the penalties are at least proportionate to the
level of the cheat. If an edge network operator is selling
reservations at a large profit over the congestion cost, these pre-
congestion penalties will not be sufficient to ensure networks in the
middle get a share of those profits, but at least they can cover
their costs.

We will now explain with an example. When a whole inter-network is
operating at normal (typically very low) congestion, the pre-
congestion marking from virtual queues will be a little higher than
if the real queues had been used--still low, but more noticeable.
But low congestion levels do not imply that usage _charges_ must also
be low. Usage charges will depend on the _price_ L as well.

If the metric of the usage element of an interconnection agreement
was changed from pure volume to pre-congested volume, one would
expect the price of pre-congestion to be arranged so that the total
usage charge remained about the same. So, if an average pre-
congestion fraction turned out to be 1/1000, one would expect that
the price L (per octet) of pre-congestion would be about 1000 times
the previously used (per octet) price for volume. We should add that
a switch to pre-congestion is unlikely to exactly maintain the same
overall level of usage charges, but this argument will be
approximately true, because usage charge will rise to at least the
level the market finds necessary to push back against usage.
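
The arithmetic of this example can be checked with a few lines of
code. The sketch below is illustrative only: the function names and
the sample volume price are assumptions, and exact fractions are used
simply to keep the comparison free of rounding.

```python
from fractions import Fraction


def equivalent_precongestion_price(volume_price: Fraction,
                                   precongestion_fraction: Fraction) -> Fraction:
    """Price L per pre-congested octet that keeps the total charge unchanged."""
    return volume_price / precongestion_fraction


def usage_charge(volume_octets: int,
                 precongestion_fraction: Fraction,
                 price_L: Fraction) -> Fraction:
    """Charge for the pre-congested share of the forwarded volume."""
    return volume_octets * precongestion_fraction * price_L


# Hypothetical figures: the volume price is an arbitrary illustrative unit.
volume_price = Fraction(1, 10**9)   # per octet
fraction = Fraction(1, 1000)        # average pre-congestion fraction
price_L = equivalent_precongestion_price(volume_price, fraction)

volume = 10**12                     # octets sent in the accounting period
old_charge = volume * volume_price
new_charge = usage_charge(volume, fraction, price_L)
```

With these figures price_L is 1000 times volume_price and the two
charges are equal. If the pre-congestion fraction were to double at
the same volume, the charge would double too.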
From the above example it can be seen why a 1000x higher price will
make operators become acutely sensitive to the congestion they cause
in other networks, which is of course the desired effect; to
encourage networks to _control_ the congestion they allow their users
to cause to others.

If any network sends even one flow at a higher rate, it will
immediately have to pay proportionately more usage charges. Because
there is no knowledge of reservations within the Diffserv region, no
interior router can police whether the rate of each flow is greater
than each reservation. So the system doesn't truly emulate rate-
policing of each flow. But there is no incentive to pack a higher
rate into a reservation, because the charges are directly
proportional to rate, irrespective of the reservations.
skipping to change at page 30, line 8 skipping to change at page 33, line 8
5.5. Sanctioning Dishonest Marking

As CL traffic leaves the last network before the egress gateway
(domain C) the RE blanking fraction should match the congestion
marking fraction, when averaged over a sufficiently long duration
(perhaps ~10s to allow a few rounds of feedback through regular
signalling of new and refreshed reservations).
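
The balance described above can be sketched as a comparison of two
fractions over one averaging window. This is not the algorithm of
Appendix A.3; it is a minimal illustration in which the function
names, the packet representation as (ECN field, RE flag) pairs and the
tolerance value are all assumptions.

```python
from typing import Iterable, Tuple


def downstream_balance(packets: Iterable[Tuple[str, int]]) -> float:
    """RE blanking fraction minus congestion marking fraction for a window.

    Packets are (ecn, re_flag) pairs. Not-RECT packets ('00', 0) are
    ignored. FNE packets ('00', 1) count as positive, as in Table 5.
    """
    total = blanked = marked = 0
    for ecn, re_flag in packets:
        if ecn == "00" and re_flag == 0:
            continue                      # not re-ECN-capable
        total += 1
        if ecn == "00" or re_flag == 0:
            blanked += 1                  # FNE, or RE flag blanked
        if ecn in ("10", "11"):
            marked += 1                   # admission or pre-emption marked
    if total == 0:
        return 0.0
    return (blanked - marked) / total


def is_negative(packets: Iterable[Tuple[str, int]],
                tolerance: float = 0.001) -> bool:
    """True if the window's balance falls below a hypothetical tolerance."""
    return downstream_balance(packets) < -tolerance
```

A flow would only be treated as dishonest if it stayed negative over
successive windows, since a single window can be negative by chance.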
To protect itself, domain C should install a monitor at its egress.
The monitor aims to detect flows of CL packets that are persistently
negative. If flows are positive, domain C need take no action--this
simply means an upstream network must be paying more penalties than
it needs to. Appendix A.3 gives a suggested algorithm for the
monitor, meeting the criteria below.

o It SHOULD introduce minimal false positives for honest flows;

o It SHOULD quickly detect and sanction dishonest flows (minimal
  false negatives);

o It MUST be invulnerable to state exhaustion attacks from malicious
skipping to change at page 31, line 49 skipping to change at page 34, line 49
5.6. Border Mechanisms

5.6.1. Border Accounting Mechanisms

One of the main design goals of re-ECN was for border security
mechanisms to be as simple as possible, otherwise they would become
the pinch-points that limit scalability of the whole internetwork.
As the title of this memo suggests, we want to avoid per-flow
processing at borders. We also want to keep to passive mechanisms
that can monitor traffic in parallel to forwarding, rather than
having to filter traffic inline--in series with forwarding. As data
rates continue to rise, we suspect that all-optical interconnection
between networks will soon be a requirement. So we want to avoid any
new need for buffering (even though border filtering is current
practice for other reasons, we don't want to make it even less likely
that we will ever get rid of it).

So far, we have been able to keep the border mechanisms simple,
despite having had to harden them against some subtle attacks on the
re-ECN design. The mechanisms are still passive and avoid per-flow
processing, although we do use filtering as a fail-safe to
skipping to change at page 34, line 18 skipping to change at page 37, line 18
negative flows may not be easy, just the single step of neutralising
their polluting effect on congestion metrics removes all the gains
networks could otherwise make from mounting dummy traffic attacks on
each other. This puts all networks on the same side (only with
respect to negative flows of course), rather than being pitted
against each other. The network where this flow goes negative as
well as all the networks downstream lose out from not being
reimbursed for any congestion this flow causes. So they all have an
interest in getting rid of these negative flows. Networks forwarding
a flow before it goes negative aren't strictly on the same side, but
they are disinterested bystanders--they don't care that the flow goes
negative downstream, but at least they can't actively gain from
making it go negative. The problem becomes localised so that once a
flow goes negative, all the networks from where it happens and beyond
downstream each have a small problem, each can detect it has a
problem and each can get rid of the problem if it chooses to. But
negative flows can no longer be used for any new attacks.

Once an unbiased estimate of the effect of negative flows can be
made, the problem reduces to detecting and preferably removing flows
that have gone negative as soon as possible. But importantly,
complete eradication of negative flows is no longer critical--best
endeavours will be sufficient.

Note that the guiding principle behind all the above discussion is
that any gain from subverting the protocol should be precisely
neutralised, rather than punished. If a gain is punished to a
greater extent than is sufficient to neutralise it, it will most
likely open up a new vulnerability, where the amplifying effect of
the punishment mechanism can be turned on others.

For instance, if possible, flows should be removed as soon as they go
skipping to change at page 35, line 16 skipping to change at page 38, line 16
5.6.2. Competitive Routing

With the above penalty system, each domain seems to have a perverse
incentive to fake pre-congestion. For instance domain B profits from
the difference between penalties it receives at its ingress (its
revenue) and those it pays at its egress (its cost). So if B
overstates internal pre-congestion it seems to increase its profit.
However, we can assume that domain A could bypass B, routing through
other domains to reach the egress. So the competitive discipline of
least-cost routing can ensure that any domain tempted to fake pre-
congestion for profit risks losing _all_ its incoming traffic. The
least congested route would eventually be able to win this
competitive game, only as long as it didn't declare more fake pre-
congestion than the next most competitive route.
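
The competitive argument can be made concrete with a toy route choice.
The sketch below assumes, purely for illustration, that domain A sees
one pre-congestion charge per candidate route and always picks the
cheapest. The route names and charge values are hypothetical, and real
route selection would weigh other factors as well.

```python
from typing import Mapping


def cheapest_route(route_charges: Mapping[str, float]) -> str:
    """Return the route with the lowest declared pre-congestion charge."""
    return min(route_charges, key=route_charges.get)


# Honest declarations: the route through B is the least congested.
honest = {"via B": 0.002, "via D": 0.003}

# B overstates a little, but stays below the next most competitive route.
mild_fake = {"via B": 0.0029, "via D": 0.003}

# B overstates beyond the next most competitive route and loses the traffic.
greedy_fake = {"via B": 0.004, "via D": 0.003}
```

In this toy model B keeps A's traffic in the first two cases and loses
all of it in the third, which is the bound on faking described above.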
The competitive effect of interdomain routing might be weaker nearer
to the egress. For instance, C may be the only route B can take to
reach the ultimate receiver. And if C over-penalises B, the egress
gateway and the ultimate receiver seem to have no incentive to move
their terminating attachment to another network, because only B and
those upstream of B suffer the higher penalties. However, we must
remember that we are only looking at the money flows at the
unidirectional network layer. There are likely to be all sorts of
higher level business models constructed over the top of these low
level 'sender-pays' penalties. For instance, we might expect a
session layer charging model where the session originator pays for a
pair of duplex flows, one as receiver and one as sender.
Traditionally this has been a common model for telephony and we might
expect it to be used, at least sometimes, for other media such as
video. Wherever such a model is used, the data receiver will be
directly affected if its sessions terminate through a network like C
that fakes congestion to over-penalise B. So end-customers will
experience a direct competitive pressure to switch to cheaper
networks, away from networks like C that try to over-penalise B.
This memo does not need to standardise any particular mechanism for
routing based on re-ECN. Goldenberg et al [Smart_rtg] refer to
various commercial products and present their own algorithms for
moving traffic between multi-homed routes based on usage charges.
None of these systems requires any changes to standard protocols
because the choice between the available border gateway protocol
(BGP) routes is based on a combination of local knowledge of the
charging regime and local measurement of traffic levels. If, as we
propose, charges or penalties were based on the level of re-ECN
measured in passing traffic, a similar optimisation could be achieved
skipping to change at page 36, line 30 skipping to change at page 39, line 50
interface. Then subsequent packets matching the same source and
destination address and DSCP should be monitored. If the RE
blanking fraction minus the congestion marking fraction is
persistently negative, a management alarm SHOULD be raised, and
the flow MAY be automatically subject to focused drop.

Both these mechanisms rely on the fact that highly positive (or
negative) flows will appear more quickly in the sample by selecting
randomly solely from positive (or negative) packets.
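
The sampling property relied on here can be demonstrated with a small
simulation. It is a sketch under stated assumptions, not the border
mechanism itself: packets are reduced to hypothetical (flow id, worth)
pairs, and the flow names, packet counts and seed are invented for the
example.

```python
import random
from collections import Counter
from typing import Hashable, List, Tuple


def sample_negative_flows(packets: List[Tuple[Hashable, int]],
                          n_samples: int,
                          rng: random.Random) -> Counter:
    """Count how often each flow is hit when sampling only negative packets.

    Each sample picks one negative packet uniformly at random, so a flow
    is selected in proportion to the number of negative packets it sends.
    """
    negative = [flow for flow, worth in packets if worth < 0]
    if not negative:
        return Counter()
    return Counter(rng.choice(negative) for _ in range(n_samples))


# Hypothetical traffic: 'heavy' sends nine times as many negative
# packets as 'light'; neutral packets are never candidates.
traffic = ([("heavy", -1)] * 90 + [("light", -1)] * 10
           + [("heavy", 0)] * 900 + [("light", 0)] * 900)
hits = sample_negative_flows(traffic, 1000, random.Random(1))
```

Because selection is proportional to the number of negative packets,
the highly negative flow is expected to dominate the sample and so to
be picked up for monitoring first.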
Note that there is no assumption that /users/ behave rationally. The Note that there is no assumption that _users_ behave rationally. The
system is protected from the vagaries of irrational user behaviour by system is protected from the vagaries of irrational user behaviour by
the ingress gateways, which transform internal penalties into a the ingress gateways, which transform internal penalties into a
deterministic, admission control mechanism that prevents users from deterministic, admission control mechanism that prevents users from
misbehaving, by directly engineered means. misbehaving, by directly engineered means.
6. Analysis 6. Analysis
The domains in Figure 1 are not expected to be completely malicious The domains in Figure 1 are not expected to be completely malicious
towards each other. After all, we can assume that they are all co- towards each other. After all, we can assume that they are all co-
operating to provide an internetworking service to the benefit of operating to provide an internetworking service to the benefit of
each of them and their customers. Otherwise their routing polices each of them and their customers. Otherwise their routing polices
would not interconnect them in the first place. However, we assume would not interconnect them in the first place. However, we assume
that they are also competitors of each other. So a network may try that they are also competitors of each other. So a network may try
to contravene our proposed protocol if it would gain or make a to contravene our proposed protocol if it would gain or make a
competitor lose, or both, but only if it can do so without being competitor lose, or both, but only if it can do so without being
caught. Therefore we do not have to consider every possible random caught. Therefore we do not have to consider every possible random
attack one network could launch on the traffic of another, given attack one network could launch on the traffic of another, given
that one network can always drop or corrupt packets that it that one network can always drop or corrupt packets that it
forwards on behalf of another. forwards on behalf of another.
Therefore, we only consider new opportunities for /gainful/ attack Therefore, we only consider new opportunities for _gainful_ attack
that our proposal introduces. But to a certain extent we can also that our proposal introduces. But to a certain extent we can also
rely on the in-depth defences we have described (Section 5.6.3) rely on the in-depth defences we have described (Section 5.6.3)
intended to mitigate the potential impact if one network accidentally intended to mitigate the potential impact if one network accidentally
misconfigures the workings of this protocol. misconfigures the workings of this protocol.
The ingress and egress gateways are shown in the most generic The ingress and egress gateways are shown in the most generic
arrangement possible in Figure 1, without any surrounding network. arrangement possible in Figure 1, without any surrounding network.
This allows us to consider more specific cases where these gateways This allows us to consider more specific cases where these gateways
and a neighbouring network are operated by the same player. As well and a neighbouring network are operated by the same player. As well
as cases where the same player operates neighbouring networks, we as cases where the same player operates neighbouring networks, we
skipping to change at page 38, line 11 skipping to change at page 41, line 30
o If the ingress gateway does not declare downstream pre-congestion o If the ingress gateway does not declare downstream pre-congestion
high enough on average, it will `hit the ground before the high enough on average, it will `hit the ground before the
runway', going negative and triggering sanctions, either directly runway', going negative and triggering sanctions, either directly
against the traffic or against the ingress gateway at a management against the traffic or against the ingress gateway at a management
level level
An executive summary of our security analysis can be stated in three An executive summary of our security analysis can be stated in three
parts, distinguished by the type of collusion considered. parts, distinguished by the type of collusion considered.
Neighbour-only Middle-Middle Collusion: Here there is no collusion or Neighbour-only Middle-Middle Collusion: Here there is no collusion
collusion is limited to neighbours in the feedback loop. In other or collusion is limited to neighbours in the feedback loop. In
words, two neighbouring networks can be assumed to act as one. Or other words, two neighbouring networks can be assumed to act as
the egress gateway might collude with domain C. Or the ingress one. Or the egress gateway might collude with domain C. Or the
gateway might collude with domain A. Or ingress and egress ingress gateway might collude with domain A. Or ingress and egress
gateways might collude with each other. gateways might collude with each other.
In these cases where only neighbours in the feedback loop collude, In these cases where only neighbours in the feedback loop collude,
we conclude that all parties have a positive incentive to declare we conclude that all parties have a positive incentive to declare
downstream pre-congestion truthfully, and the ingress gateway has downstream pre-congestion truthfully, and the ingress gateway has
a positive incentive to invoke admission control when congestion a positive incentive to invoke admission control when congestion
rises above the admission threshold in any network in the region rises above the admission threshold in any network in the region
(including its own). No party has an incentive to send more (including its own). No party has an incentive to send more
traffic than declared in reservation signalling (even though only traffic than declared in reservation signalling (even though only
the gateways read this signalling). In short, no party can gain the gateways read this signalling). In short, no party can gain
skipping to change at page 39, line 16 skipping to change at page 42, line 34
incentive to break it have mounted a full analysis. incentive to break it have mounted a full analysis.
7. Incremental Deployment 7. Incremental Deployment
We believe ECN has so far not been widely deployed because it We believe ECN has so far not been widely deployed because it
requires widespread end system and network deployment just to achieve requires widespread end system and network deployment just to achieve
a marginal improvement in performance. The ability to offer a new a marginal improvement in performance. The ability to offer a new
service (admission control) would be a much stronger driver for ECN service (admission control) would be a much stronger driver for ECN
deployment. deployment.
As stated in the introduction, the aim of this memo is to "build in As stated in the introduction, the aim of this memo is to "Design in
security from the start" when admission control is based on pre- security from the start" when admission control is based on pre-
congestion notification. However, the proposal has been designed so congestion notification. The proposal has been designed so that
that security can be added some time after first deployment. Given security can be added some time after first deployment, but only if
admission control based on pre-congestion notification requires few the PCN wire protocol encoding is defined with the foresight to
changes to standards, it should be deployable fairly soon. However, accommodate the extended set of codepoints defined in this document.
re-ECN requires a change to IP, which may take a little longer. Given admission control based on pre-congestion notification requires
few changes to standards, it should be deployable fairly soon.
However, re-ECN requires a change to IP, which may take a little
longer.
We expect that initial deployments of PCN-based admission control We expect that initial deployments of PCN-based admission control
will be confined to single networks, or to clubs of networks that will be confined to single networks, or to clubs of networks that
trust each other. The proposal in this memo will only become trust each other. The proposal in this memo will only become
relevant once networks with conflicting interests wish to relevant once networks with conflicting interests wish to
interconnect their admission controlled services, but without the interconnect their admission controlled services, but without the
scalability constraints of per-flow border policing. It will not be scalability constraints of per-flow border policing. It will not be
possible to use re-ECN, even in a controlled environment between possible to use re-ECN, even in a controlled environment between
consenting operators, unless it is standardised into IP. Given the consenting operators, unless it is standardised into IP. Given the
IPv4 header has limited space for further changes, current IESG IPv4 header has limited space for further changes, current IESG
policy [{ToDo: ref?}] is not to allow experimental use of codepoints policy [RFC4727] is not to allow experimental use of codepoints in
in the IPv4 header, as whenever an experiment isn't taken up, the the IPv4 header, as whenever an experiment isn't taken up, the space
space it used tends to be impossible to reclaim. it used tends to be impossible to reclaim.
If PCN-based admission control is deployed before re-ECN is If PCN-based admission control is deployed before re-ECN is
standardised into IP, wherever a network (or club of networks) standardised into IP, wherever a network (or club of networks)
connects to another network (or club of networks) with conflicting connects to another network (or club of networks) with conflicting
interests, they will place a gateway between the two regions that interests, they will place a gateway between the two regions that
does per-flow rate policing and admission control. If re-ECN is does per-flow rate policing and admission control. If re-ECN is
eventually standardised into IP, it will be possible for these eventually standardised into IP, it will be possible for these
separate regions to upgrade all their gateways to use re-ECN before separate regions to upgrade all their gateways to use re-ECN before
removing the per-flow policing gateways between them. Given the removing the per-flow policing gateways between them. Given the
edge-to-edge deployment model of PCN-based admission control, it is edge-to-edge deployment model of PCN-based admission control, it is
skipping to change at page 40, line 30 skipping to change at page 44, line 4
causes in a remote network. This is the problem that has previously causes in a remote network. This is the problem that has previously
made it so hard to provide scalable admission control. made it so hard to provide scalable admission control.
The case for using re-feedback (a generalisation of re-ECN) to police The case for using re-feedback (a generalisation of re-ECN) to police
congestion response and provide QoS is made in [Re-fb]. Essentially, congestion response and provide QoS is made in [Re-fb]. Essentially,
the insight is that congestion is a factor that crosses layers from the insight is that congestion is a factor that crosses layers from
the physical upwards. Therefore re-feedback polices congestion where the physical upwards. Therefore re-feedback polices congestion where
it emerges from a physical interface between networks. This is it emerges from a physical interface between networks. This is
achieved by bringing the congestion information to the interface, achieved by bringing the congestion information to the interface,
rather than examining packet addressing where there is congestion. rather than examining packet addressing where there is congestion.
Then congestion crossing the physical interface at a border can be Then congestion crossing the physical interface at a border can be
policed at the interface, rather than policing the congestion on policed at the interface, rather than policing the congestion on
packets that claim to come from an address (which may be spoofed). packets that claim to come from an address (which may be spoofed).
Also, re-feedback works in the network layer independently of other Also, re-feedback works in the network layer independently of other
layers---despite its name re-feedback does not actually require layers--despite its name re-feedback does not actually require
feedback. It requires a source to act conservatively before it gets feedback. It requires a source to act conservatively before it gets
feedback. feedback.
On the subject of lack of feedback, the feedback not established On the subject of lack of feedback, the feedback not established
(FNE) codepoint is motivated by arguments for a state set-up bit in (FNE) codepoint is motivated by arguments for a state set-up bit in
IP to prevent state exhaustion attacks. This idea was first put IP to prevent state exhaustion attacks. This idea was first put
forward informally by David Clark and documented by Handley and forward informally by David Clark and documented by Handley and
Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams
should signal explicitly when they require state to be created in the should signal explicitly when they require state to be created in the
network layer or the layer above (e.g. at flow start). Then a node network layer or the layer above (e.g. at flow start). Then a node
skipping to change at page 41, line 49 skipping to change at page 45, line 22
9. Security Considerations 9. Security Considerations
This whole memo concerns the security of a scalable admission control This whole memo concerns the security of a scalable admission control
system. In particular the analysis section. Below some specific system. In particular the analysis section. Below some specific
security issues are mentioned that did not belong elsewhere or which security issues are mentioned that did not belong elsewhere or which
comment on the overall robustness of the security provided by the comment on the overall robustness of the security provided by the
design. design.
Firstly, we must repeat the statement of applicability in the Firstly, we must repeat the statement of applicability in the
analysis: that we only consider new opportunities for /gainful/ analysis: that we only consider new opportunities for _gainful_
attack that our proposal introduces, particularly if the attacker can attack that our proposal introduces, particularly if the attacker can
avoid being identified. Despite only involving a few bits, there is avoid being identified. Despite only involving a few bits, there is
sufficient complexity in the whole system that there are probably sufficient complexity in the whole system that there are probably
numerous possibilities for other attacks. However, as far as we are numerous possibilities for other attacks. However, as far as we are
aware, none reap any benefit to the attacker. For instance, it would aware, none reap any benefit to the attacker. For instance, it would
be possible for a downstream network to remove the congestion be possible for a downstream network to remove the congestion
markings introduced by an upstream network, but it would only lose markings introduced by an upstream network, but it would only lose
out on the penalties it could apply to a downstream network. out on the penalties it could apply to a downstream network.
When one network forwards a neighbouring network's traffic it will When one network forwards a neighbouring network's traffic it will
skipping to change at page 42, line 42 skipping to change at page 46, line 14
flow pre-emption are similar to those for admission control. flow pre-emption are similar to those for admission control.
Finally, it may seem that the 8 codepoints that have been made Finally, it may seem that the 8 codepoints that have been made
available by extending the ECN field with the RE flag have been used available by extending the ECN field with the RE flag have been used
rather wastefully. In effect the RE flag has been used as an rather wastefully. In effect the RE flag has been used as an
orthogonal single bit in nearly all cases. The only exception being orthogonal single bit in nearly all cases. The only exception being
when the ECN field is cleared to "00". The mapping of the codepoints when the ECN field is cleared to "00". The mapping of the codepoints
in an earlier version of this proposal used the codepoint space more in an earlier version of this proposal used the codepoint space more
efficiently, but the scheme became vulnerable to a network operator efficiently, but the scheme became vulnerable to a network operator
focusing its congestion marking to mark more positive than neutral focusing its congestion marking to mark more positive than neutral
packets in order to reduce its penalties. packets in order to reduce its penalties (see Appendix B of
[Re-TCP]).
With the scheme as now proposed, once the RE flag is set or cleared With the scheme as now proposed, once the RE flag is set or cleared
by the sender or its proxy, it should not be written by the network, by the sender or its proxy, it should not be written by the network,
only read. So the gateways can detect if any network maliciously only read. So the gateways can detect if any network maliciously
alters the RE flag. IPSec AH integrity checking does not cover the alters the RE flag. IPSec AH integrity checking does not cover the
IPv4 option flags (they were considered mutable---even the one we IPv4 option flags (they were considered mutable--even the one we
propose using for the RE flag that was `currently unused' when IPSec propose using for the RE flag that was `currently unused' when IPSec
was defined). But it would be sufficient for a pair of gateways to was defined). But it would be sufficient for a pair of gateways to
make random checks on whether the RE flag was the same when it make random checks on whether the RE flag was the same when it
reached the egress gateway as when it left the ingress. Indeed, if reached the egress gateway as when it left the ingress. Indeed, if
IPSec AH had covered the RE flag, any network intending to alter IPSec AH had covered the RE flag, any network intending to alter
sufficient RE flags to make a gain would have focused its alterations sufficient RE flags to make a gain would have focused its alterations
on packets without authenticating headers (AHs). on packets without authenticating headers (AHs).
No cryptographic algorithms have been harmed in the making of this No cryptographic algorithms have been harmed in the making of this
proposal. proposal.
skipping to change at page 43, line 22 skipping to change at page 46, line 43
10. IANA Considerations 10. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
11. Conclusions 11. Conclusions
This memo builds on a promising technique to solve the classic This memo builds on a promising technique to solve the classic
problem of making flow admission control scale to any size network. problem of making flow admission control scale to any size network.
It involves the use of Diffserv in a deployment model that uses pre- It involves the use of Diffserv in a deployment model that uses pre-
congestion notification feedback to control admission into a network congestion notification feedback to control admission into a network
path [CL-deploy]. However as it stands, that deployment model path [PCN-arch]. However as it stands, that deployment model depends
depends on all network domains trusting each other to comply with the on all network domains trusting each other to comply with the
protocols, invoking admission control and flow pre-emption when protocols, invoking admission control and flow pre-emption when
requested. requested.
We propose that the congestion feedback used in that deployment model We propose that the congestion feedback used in that deployment model
should be re-echoed into the forward data path, by making a trivial should be re-echoed into the forward data path, by making a trivial
modification to the ingress gateway. We then explain how the modification to the ingress gateway. We then explain how the
resulting downstream pre-congestion metric in packets can be resulting downstream pre-congestion metric in packets can be
monitored in bulk at borders to sufficiently emulate flow rate monitored in bulk at borders to sufficiently emulate flow rate
policing. policing.
We claim the result of combining these two approaches is an admission We claim the result of combining these two approaches is an admission
control system that scales to any size network /and/ any number of control system that scales to any size network _and_ any number of
interconnected networks, even if they all act in their own interests. interconnected networks, even if they all act in their own interests.
This proposal aims to convince its readers to "Design in Security This proposal aims to convince its readers to "Design in Security
from the start," by building modified ingress gateways from day one, from the start," by ensuring the PCN wire protocol encoding can
accommodate the extended set of codepoints defined in this document,
even if border policing is not needed at first. This way, we will even if border policing is not needed at first. This way, we will
not build ourselves tomorrow's legacy problem. not build ourselves tomorrow's legacy problem.
Re-echoing congestion feedback is based on a principled technique Re-echoing congestion feedback is based on a principled technique
called Re-ECN [Re-TCP], designed to add accountability for causing called Re-ECN [Re-TCP], designed to add accountability for causing
congestion to the general-purpose IP datagram service. Re-ECN congestion to the general-purpose IP datagram service. Re-ECN
proposes to consume the last completely unused bit in the basic IPv4 proposes to consume the last completely unused bit in the basic IPv4
header. header.
12. Acknowledgements 12. Acknowledgements
All the following have given helpful comments and some may become co- All the following have given helpful comments and some may become co-
authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve
Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard, Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard,
Carla Di Cairano-Gilfedder (BT), Mark Handley (who identified the Carla Di Cairano-Gilfedder (BT), Mark Handley (who identified the
excess canceled packets attack), Stephen Hailes, Adam Greenhalgh excess canceled packets attack), Stephen Hailes, Adam Greenhalgh
(UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz, (UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz,
Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill Lehr, Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill Lehr,
Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy
traffic attacks), Sally Floyd (ICIR) and comments from participants traffic attacks), Sally Floyd (ICIR) and comments from participants
in the CFP/CRN inter-provider QoS and broadband working groups. in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant
Internet working groups.
13. Comments Solicited 13. Comments Solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
addressed to the IETF Transport Area working group's mailing list addressed to the IETF Transport Area working group's mailing list
<tsvwg@ietf.org>, and/or to the authors. <tsvwg@ietf.org>, and/or to the authors.
14. References 14. References
14.1. Normative References 14.1. Normative References
[PCN] Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F., [PCN] Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,
Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley, Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley,
S., Westberg, L., Bader, A., and G. Karagiannis, "Pre- S., Westberg, L., Bader, A., and G. Karagiannis, "Pre-
Congestion Notification Marking", Congestion Notification Marking",
draft-briscoe-tsvwg-cl-phb-02 (work in progress), draft-briscoe-tsvwg-cl-phb-03 (work in progress),
June 2006. October 2006.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2211] Wroclawski, J., "Specification of the Controlled-Load [RFC2211] Wroclawski, J., "Specification of the Controlled-Load
Network Element Service", RFC 2211, September 1997. Network Element Service", RFC 2211, September 1997.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001. RFC 3168, September 2001.
skipping to change at page 45, line 10 skipping to change at page 48, line 36
Stiliadis, "An Expedited Forwarding PHB (Per-Hop Stiliadis, "An Expedited Forwarding PHB (Per-Hop
Behavior)", RFC 3246, March 2002. Behavior)", RFC 3246, March 2002.
[RSVP-ECN] [RSVP-ECN]
Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P., Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P.,
Babiarz, J., and K. Chan, "RSVP Extensions for Admission Babiarz, J., and K. Chan, "RSVP Extensions for Admission
Control over Diffserv using Pre-congestion Notification", Control over Diffserv using Pre-congestion Notification",
draft-lefaucheur-rsvp-ecn-01 (work in progress), draft-lefaucheur-rsvp-ecn-01 (work in progress),
June 2006. June 2006.
[Re-TCP] Briscoe, B., Jacquet, A., and A. Salvatori, "Re-ECN: [Re-TCP] Briscoe, B., Jacquet, A., Salvatori, A., and M. Koyabe,
Adding Accountability for Causing Congestion to TCP/IP", "Re-ECN: Adding Accountability for Causing Congestion to
draft-briscoe-tsvwg-re-ecn-tcp-02 (work in progress), TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-04 (work in
June 2006. progress), June 2007.
14.2. Informative References 14.2. Informative References
[CL-deploy]
Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,
Charny, A., Babiarz, J., Chan, K., Westberg, L., Bader,
A., and G. Karagiannis, "A Deployment Model for Admission
Control over DiffServ using Pre-Congestion Notification",
draft-briscoe-tsvwg-cl-architecture-03 (work in progress),
June 2006.
[CLoop_pol] [CLoop_pol]
Salvatori, A., "Closed Loop Traffic Policing", Politecnico Salvatori, A., "Closed Loop Traffic Policing", Politecnico
Torino and Institut Eurecom Masters Thesis , Torino and Institut Eurecom Masters Thesis ,
September 2005. September 2005.
[ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain [ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain
Routeing", Proc Internet Charging and QoS Technology Routeing", Proc Internet Charging and QoS Technology
Workshop (ICQT'03) pp308--317, September 2003, <http:// Workshop (ICQT'03) pp308--317, September 2003, <http://
research.microsoft.com/users/mort/publications.aspx>. research.microsoft.com/users/mort/publications.aspx>.
[ECN-MPLS] [ECN-MPLS]
Bruce, B., Briscoe, B., and J. Tay, "Explicit Congestion Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
Marking in MPLS", draft-davie-ecn-mpls-00 (work in Marking in MPLS", draft-ietf-tsvwg-ecn-mpls-01 (work in
progress), June 2006. progress), June 2007.
[IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP [IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP
Quality of Service Interconnect", BT Technology Journal Quality of Service Interconnect", BT Technology Journal
(BTTJ) 23(2)171--195, April 2005, (BTTJ) 23(2)171--195, April 2005,
<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>. <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>.
[NSIS-RMD] [NSIS-RMD]
Bader, A., Westberg, L., Karagiannis, G., Kappler, C., and Bader, A., Westberg, L., Karagiannis, G., Kappler, C., and
T. Phelan, "RMD-QOSM - The Resource Management in Diffserv T. Phelan, "RMD-QOSM - The Resource Management in Diffserv
QOS Model", draft-ietf-nsis-rmd-06 (work in progress), QOS Model", draft-ietf-nsis-rmd-09 (work in progress),
February 2006. March 2007.
[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. [PCN-arch]
Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R.,
Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion
Notification Architecture",
draft-eardley-pcn-architecture-00 (work in progress),
June 2007.
[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
Functional Specification", RFC 2205, September 1997. Functional Specification", RFC 2205, September 1997.
[RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC [RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC
Data Flows", RFC 2207, September 1997. Data Flows", RFC 2207, September 1997.
[RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell,
M., Romanow, A., Weinrib, A., and L. Zhang, "Resource M., Romanow, A., Weinrib, A., and L. Zhang, "Resource
ReSerVation Protocol (RSVP) Version 1 Applicability ReSerVation Protocol (RSVP) Version 1 Applicability
Statement Some Guidelines on Deployment", RFC 2208, Statement Some Guidelines on Deployment", RFC 2208,
skipping to change at page 46, line 29 skipping to change at page 50, line 5
[RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., [RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
Speer, M., Braden, R., Davie, B., Wroclawski, J., and E. Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
Felstaine, "A Framework for Integrated Services Operation Felstaine, "A Framework for Integrated Services Operation
over Diffserv Networks", RFC 2998, November 2000. over Diffserv Networks", RFC 2998, November 2000.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces", Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, June 2003. RFC 3540, June 2003.
[RFC4727] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4,
ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006.
[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
Salvatori, A., Soppera, A., and M. Koyabe, "Policing Salvatori, A., Soppera, A., and M. Koyabe, "Policing
Congestion Response in an Internetwork Using Re-Feedback", Congestion Response in an Internetwork Using Re-Feedback",
ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
www.acm.org/sigs/sigcomm/sigcomm2005/ www.acm.org/sigs/sigcomm/sigcomm2005/
techprog.html#session8>. techprog.html#session8>.
[Smart_rtg] [Smart_rtg]
Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang,
"Optimizing Cost and Performance for Multihoming", ACM "Optimizing Cost and Performance for Multihoming", ACM
skipping to change at page 47, line 24 skipping to change at page 51, line 8
sends with the RE flag blanked. Z_0 will also take account of the sends with the RE flag blanked. Z_0 will also take account of the
sustainable rate reported during the flow pre-emption process, if sustainable rate reported during the flow pre-emption process, if
necessary. necessary.
A suitable pseudo-code algorithm for the ingress gateway is as A suitable pseudo-code algorithm for the ingress gateway is as
follows: follows:
==================================================================== ====================================================================
B_i = 0 /* interblank volume */ B_i = 0 /* interblank volume */
for each PCN-capable packet { for each PCN-capable packet {
b = readLength() /* set b to packet size */ b = readLength(packet) /* set b to packet size */
B_i += b /* accumulate interblank volume */ B_i += b /* accumulate interblank volume */
if B_i < b * Z_0 { /* test whether interblank volume... */ if B_i < b * Z_0 { /* test whether interblank volume... */
writeRE(1) writeRE(1)
} else { /* ...exceeds blank RE spacing * pkt size*/ } else { /* ...exceeds blank RE spacing * pkt size*/
writeRE(0) /* ...and if so, clear RE */ writeRE(0) /* ...and if so, clear RE */
B_i = 0 /* ...and re-set interblank volume */ B_i = 0 /* ...and re-set interblank volume */
} }
} }
==================================================================== ====================================================================
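For readers who want to experiment with the blanking pattern, the pseudo-code above can be rendered as the short Python sketch below.  The function name blank_re_flags and its arguments are illustrative only and do not appear in the draft: readLength() is replaced by iterating over a list of packet sizes, and writeRE() is replaced by yielding the flag value that would be written (1 for RE set, 0 for RE blanked).

```python
def blank_re_flags(packet_sizes, z0):
    """Yield the RE flag value for each packet size in turn.

    packet_sizes: iterable of packet lengths in bytes (stands in for
                  readLength() on each PCN-capable packet).
    z0:           blank RE spacing Z_0, taken here as a given input.
    """
    interblank = 0                     # B_i, the interblank volume
    for b in packet_sizes:
        interblank += b                # accumulate interblank volume
        if interblank < b * z0:        # spacing not yet reached
            yield 1                    # writeRE(1): leave RE set
        else:                          # spacing reached or exceeded
            yield 0                    # writeRE(0): blank RE
            interblank = 0             # re-set interblank volume


# Example: equal-sized packets with Z_0 = 4
flags = list(blank_re_flags([1500] * 8, 4))
```

With equal-sized packets and Z_0 = 4, as in the example, every fourth packet is blanked, so flags is [1, 1, 1, 0, 1, 1, 1, 0].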
skipping to change at page 48, line 37 skipping to change at page 52, line 17
A.2.2. Inflation Factor for Persistently Negative Flows A.2.2. Inflation Factor for Persistently Negative Flows
The following process is suggested to complement the simple algorithm The following process is suggested to complement the simple algorithm
above in order to protect against the various attacks from above in order to protect against the various attacks from
persistently negative flows described in Section 5.6.1. As explained persistently negative flows described in Section 5.6.1. As explained
in that section, the most important and first step is to estimate the in that section, the most important and first step is to estimate the
contribution of persistently negative flows to the bulk volume of contribution of persistently negative flows to the bulk volume of
downstream pre-congestion and to inflate this bulk volume as if these downstream pre-congestion and to inflate this bulk volume as if these
flows weren't there. The process below has been designed to give an flows weren't there. The process below has been designed to give an
unboased estimate, but it may be possible to define other processes unbiased estimate, but it may be possible to define other processes
that achieve similar ends. that achieve similar ends.
While the above simple metering algorithm is counting the bulk of While the above simple metering algorithm is counting the bulk of
traffic over an accounting period, the meter should also select a traffic over an accounting period, the meter should also select a
subset of the whole flow ID space that is small enough to be able to subset of the whole flow ID space that is small enough to be able to
realistically measure but large enough to give a realistic sample. realistically measure but large enough to give a realistic sample.
Many different samples of different subsets of the ID space should be Many different samples of different subsets of the ID space should be
taken at different times during the accounting period, preferably taken at different times during the accounting period, preferably
covering the whole ID space. During each sample, the meter should covering the whole ID space. During each sample, the meter should
count the volume of positive packets and subtract the volume of count the volume of positive packets and subtract the volume of
skipping to change at page 51, line 5 skipping to change at page 54, line 5
BT & UCL BT & UCL
B54/77, Adastral Park B54/77, Adastral Park
Martlesham Heath Martlesham Heath
Ipswich IP5 3RE Ipswich IP5 3RE
UK UK
Phone: +44 1473 645196 Phone: +44 1473 645196
Email: bob.briscoe@bt.com Email: bob.briscoe@bt.com
URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/
Intellectual Property Statement Full Copyright Statement
Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79. found in BCP 78 and BCP 79.
skipping to change at page 51, line 29 skipping to change at page 54, line 45
such proprietary rights by implementers or users of this such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr. http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at this standard. Please address the information to the IETF at
ietf-ipr@ietf.org. ietf-ipr@ietf.org.
Disclaimer of Validity Acknowledgments
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is provided by the IETF
Internet Society. Administrative Support Activity (IASA). This document was produced
using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
RFC-2629 XML format.
 End of changes. 85 change blocks. 
227 lines changed or deleted 347 lines changed or added
