draft-briscoe-re-pcn-border-cheat-01.txt | draft-briscoe-re-pcn-border-cheat-02.txt | |||
---|---|---|---|---|
PCN Working Group B. Briscoe | PCN Working Group B. Briscoe | |||
Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
Intended status: Informational February 25, 2008 | Intended status: Standards Track September 13, 2008 | |||
Expires: August 28, 2008 | Expires: March 17, 2009 | |||
Emulating Border Flow Policing using Re-ECN on Bulk Data | Emulating Border Flow Policing using Re-PCN on Bulk Data | |||
draft-briscoe-re-pcn-border-cheat-01 | draft-briscoe-re-pcn-border-cheat-02 | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on August 28, 2008. | This Internet-Draft will expire on March 17, 2009. | |||
Copyright Notice | ||||
Copyright (C) The IETF Trust (2008). | ||||
Abstract | Abstract | |||
Scaling per flow admission control to the Internet is a hard problem. | Scaling per flow admission control to the Internet is a hard problem. | |||
A recently proposed approach combines Diffserv and pre-congestion | The approach of combining Diffserv and pre-congestion notification | |||
notification (PCN) to provide a service slightly better than Intserv | (PCN) provides a service slightly better than Intserv controlled load | |||
controlled load. It scales to networks of any size, but only if | that scales to networks of any size without needing Diffserv's usual | |||
domains trust each other to comply with admission control and rate | overprovisioning, but only if domains trust each other to comply with | |||
policing. This memo claims to solve this trust problem without | admission control and rate policing. This memo claims to solve this | |||
losing scalability. It describes bulk border policing that provides | trust problem without losing scalability. It provides a sufficient | |||
a sufficient emulation of per-flow policing with the help of another | emulation of per-flow policing at borders but with only passive bulk | |||
recently proposed extension to ECN, involving re-echoing ECN feedback | metering rather than per-flow processing. Measurements are | |||
(re-ECN). With only passive bulk measurements at borders, sanctions | sufficient to apply penalties against cheating neighbour networks. | |||
can be applied against cheating networks. | ||||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 9 | 2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 11 | |||
3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.1. The Traditional Per-flow Policing Problem . . . . . . . . 10 | 3.1. The Traditional Per-flow Policing Problem . . . . . . . . 11 | |||
3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 12 | 3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 14 | |||
4. Re-ECN Protocol for an RSVP (or similar) Transport . . . . . . 14 | 4. Re-ECN Protocol in IP with Two Congestion Marking Levels . . . 17 | |||
4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 14 | 4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 17 | |||
4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | 4.2. Re-PCN Abstracted Network Layer Wire Protocol (IPv4 or | |||
v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 16 | 4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 18 | |||
4.2.2. Re-ECN Combined with Pre-Congestion Notification | 4.2.2. Re-ECN Combined with Pre-Congestion Notification | |||
(re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 18 | (re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 20 | 4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 22 | |||
4.3.1. Protocol Operation for an Established Flow . . . . . . 20 | 4.3.1. Protocol Operation for an Established Flow . . . . . . 23 | |||
4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 21 | 4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 24 | |||
4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 22 | 4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 26 | |||
4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 23 | 4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 26 | |||
4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 25 | 4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 28 | |||
5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 25 | 5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 28 | |||
5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 25 | 5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 28 | |||
5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 26 | 5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 30 | |||
5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 28 | 5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 31 | |||
5.4. Emulation of Per-Flow Rate Policing: Rationale and | 5.4. Emulation of Per-Flow Rate Policing: Rationale and | |||
Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 31 | Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 32 | 5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 36 | |||
5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 34 | 5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 38 | |||
5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 34 | 5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 38 | |||
5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 38 | 5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 41 | |||
5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 39 | 5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 42 | |||
6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 | 6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 42 | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 46 | |||
8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 43 | 8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 47 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 45 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 49 | |||
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46 | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 50 | |||
11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 46 | 11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 50 | |||
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47 | 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 51 | |||
13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 47 | 13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 52 | |||
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 48 | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 52 | |||
14.1. Normative References . . . . . . . . . . . . . . . . . . . 48 | 14.1. Normative References . . . . . . . . . . . . . . . . . . . 52 | |||
14.2. Informative References . . . . . . . . . . . . . . . . . . 48 | 14.2. Informative References . . . . . . . . . . . . . . . . . . 53 | |||
Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 50 | Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 55 | |||
A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 50 | A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 55 | |||
A.2. Downstream Congestion Metering Algorithms . . . . . . . . 51 | A.2. Downstream Congestion Metering Algorithms . . . . . . . . 56 | |||
A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 51 | A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 56 | |||
A.2.2. Inflation Factor for Persistently Negative Flows . . . 52 | A.2.2. Inflation Factor for Persistently Negative Flows . . . 56 | |||
A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 52 | A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 57 | |||
Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 53 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 57 | |||
Intellectual Property and Copyright Statements . . . . . . . . . . 54 | Intellectual Property and Copyright Statements . . . . . . . . . . 59 | |||
Status (to be removed by the RFC Editor) | Status (to be removed by the RFC Editor) | |||
The IETF PCN working group is initially chartered to consider PCN | ||||
domains only under a single trust authority. However, after its | ||||
initial work is complete the charter says the working group may re- | ||||
charter to consider concatenated Diffserv domains, amongst other new | ||||
work items. The charter ends by stating "The details of these work | ||||
items are outside the scope of the initial phase; but the WG may | ||||
consider their requirements to design components that are | ||||
sufficiently general to support such extensions in the future." | ||||
This memo is therefore contributed to describe how PCN could be | ||||
extended to inter-domain. We wanted to document the solution to | ||||
reduce the chances that something else eats up the codepoint space | ||||
needed before PCN re-charters to consider inter-domain. Losing the | ||||
chance to standardise this simple, scalable solution to the problem | ||||
of inter-domain flow admission control would be unfortunate | ||||
(understatement), given it took years to find, and even then it was | ||||
very difficult to find codepoint space for it. | ||||
The scheme described here (Section 4) requires the PCN ingress | ||||
gateway to re-echo any PCN feedback it receives back into the forward | ||||
stream of IP packets (hence we call this scheme re-PCN). Re-PCN | ||||
works in a very similar way to the re-ECN proposal on which it is | ||||
based [I-D.briscoe-tsvwg-re-ecn-tcp], the only difference being that | ||||
PCN might encode three states of congestion, whereas ECN encodes two. | ||||
This document is written to stand alone from re-ECN, so that readers | ||||
do not have to read [I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
The authors seek comments from the Internet community on whether | ||||
combining PCN and re-ECN to create re-PCN in this way is a sufficient | ||||
solution to the problem of scaling microflow admission control to the | ||||
Internet as a whole. Here we emphasise that scaling is not just an | ||||
issue of numbers of flows, but also the number of security entities-- | ||||
networks and users--who may all have conflicting interests. | ||||
This memo is posted as an Internet-Draft with the intent to | This memo is posted as an Internet-Draft with the intent to | |||
eventually be broken down into two documents: one for the standards | eventually be broken down into two documents: one for the standards | |||
track and one for informational status. But until it becomes an item | track and one for informational status. But until it becomes an item | |||
of IETF working group business the whole proposal has been kept | of IETF working group business the whole proposal has been kept | |||
together to aid understanding. Only the text of Section 4 of this | together to aid understanding. Only the text of Section 4 of this | |||
document requires standardisation. The rest of the sections describe | document is intended to be normative (requiring standardisation). | |||
how a system might be built from these protocols by the operators of | The rest of the sections are merely informative, describing how a | |||
an internetwork. Note in particular that the policing and monitoring | system might be built from these protocols by the operators of an | |||
internetwork. Note in particular that the policing and monitoring | ||||
functions proposed for the trust boundaries between operators would | functions proposed for the trust boundaries between operators would | |||
not need standardisation by the IETF. They simply represent one way | not need standardisation by the IETF. They simply represent one | |||
that the proposed protocols could be used to extend the PCN | possible way that the proposed protocols could be used to extend the | |||
architecture [I-D.ietf-pcn-architecture] to span multiple domains | PCN architecture [I-D.ietf-pcn-architecture] to span multiple domains | |||
without mutual trust between the operators. | without mutual trust between the operators. | |||
To realise the system described, this document also depends on | Dependencies (to be removed by the RFC Editor) | |||
standardisation of three other documents currently being discussed | ||||
(but not on the standards track) in the IETF Transport Area: pre- | ||||
congestion notification (PCN) marking on interior nodes [PCN]; | ||||
feedback of aggregate PCN measurements by suitably extending the | ||||
admission control signalling protocol (e.g. RSVP) [RSVP-ECN]; and | ||||
re-insertion of the feedback into the forward stream of IP packets by | ||||
the PCN ingress gateway in a similar way to that proposed for a TCP | ||||
source [Re-TCP]. | ||||
The authors seek comments from the Internet community on whether | To realise the system described, this document also depends on other | |||
combining PCN and re-ECN in this way is a sufficient solution to the | documents chartered in the IETF Transport Area progressing along the | |||
problem of scaling microflow admission control to the Internet as a | standards track: | |||
whole, even though such scaling must take account of the increasing | ||||
numbers of networks and users who may all have conflicting interests. | o Pre-congestion notification (PCN) marking on interior nodes | |||
[I-D.eardley-pcn-marking-behaviour], chartered for standardisation | ||||
in the PCN w-g; | ||||
o The baseline encoding of pre-congestion notification in the IP | ||||
header [I-D.moncaster-pcn-baseline-encoding], also chartered for | ||||
standardisation in the PCN w-g; | ||||
o Feedback of aggregate PCN measurements by suitably extending the | ||||
admission control signalling protocol (e.g. RSVP extension | ||||
[RSVP-ECN] or NSIS extension [I-D.arumaithurai-nsis-pcn]). | ||||
The baseline encoding makes no new demands on codepoint space in the | ||||
IP header but provides just two PCN encoding states (not marked and | ||||
marked). The PCN architecture recognises that operators might want | ||||
PCN marking to trigger two functions (admission control and flow | ||||
termination) at different levels of pre-congestion, which seems to | ||||
require three encoding states. A scheme has been proposed | ||||
[I-D.charny-pcn-single-marking] that can do both functions with just | ||||
two encoding states, but simulations have shown it performs poorly | ||||
under certain conditions that might be typical. As it seems likely | ||||
that PCN might need three encoding states to be fully operational, we | ||||
want to be sure that three encoding states can be extended to work | ||||
inter-domain. Therefore, we have defined a three-state extension | ||||
encoding scheme in this document, then we have added the re-PCN | ||||
scheme to it. The three-state encoding we have chosen depends on | ||||
standardisation of yet another document in the IETF Transport Area: | ||||
o Propagation beyond the tunnel decapsulator of any changes in the | ||||
ECN field to ECT(0) or ECT(1) made within a tunnel (the ideal | ||||
decapsulation rules of [I-D.briscoe-tsvwg-ecn-tunnel]). | ||||
Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
Full diffs of incremental changes between drafts are available at | Full diffs of incremental changes between drafts are available at | |||
URL: <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#repcn> | URL: <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#repcn> | |||
Changes from <draft-briscoe-re-pcn-border-cheat-01> to | ||||
<draft-briscoe-re-pcn-border-cheat-02> (current version): | ||||
Considerably updated the 'Status' note to explain the | ||||
relationship of this draft to other documents in the IETF | ||||
process (or not) and to chartered PCN w-g activity. | ||||
Split out the dependencies into a separate note and added | ||||
dependencies on new PCN documents in progress. | ||||
Made scalability motivation in the introduction clearer, | ||||
explaining why Diffserv over-provisioning doesn't scale unless | ||||
PCN is used. | ||||
Clarified that the standards action in Section 4 is to define | ||||
the meanings of the combination of fields in the IP header: the | ||||
RE flag and 2-level congestion marking in the ECN field. And | ||||
that it is not characterised by a particular feedback style in | ||||
the transport. | ||||
Switched round the two ECT codepoints to be compatible with the | ||||
new PCN baseline encoding and used less confusing naming for | ||||
re-PCN codepoints (Section 4). | ||||
Generalised rules for encoding probes when bootstrapping or re- | ||||
starting aggregates & flows (Section 4.3.2). | ||||
Downgraded drop sanction behaviour from MUST to conditional | ||||
SHOULD (Section 5.5). | ||||
Added incremental deployment safety justification for choice of | ||||
which way round the RE flag works (Section 7). | ||||
Added possible vulnerability to brief attacks and possible | ||||
solution to security considerations (Section 9). | ||||
Updated references and terminology, particularly taking account | ||||
of recent new PCN w-g documents. | ||||
Replaced suggested Ingress Gateway Algorithm for Blanking the | ||||
RE flag (Appendix A.1). | ||||
Clarifications throughout. | ||||
Changes from <draft-briscoe-re-pcn-border-cheat-00> to | Changes from <draft-briscoe-re-pcn-border-cheat-00> to | |||
<draft-briscoe-re-pcn-border-cheat-01> (current version): | <draft-briscoe-re-pcn-border-cheat-01>: | |||
Updated references. | Updated references. | |||
Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-01> | Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-01> | |||
to <draft-briscoe-re-pcn-border-cheat-00>: | to <draft-briscoe-re-pcn-border-cheat-00>: | |||
Changed filename to associate it with the new IETF PCN w-g, rather | Changed filename to associate it with the new IETF PCN w-g, | |||
than the TSVWG w-g. | rather than the TSVWG w-g. | |||
Introduction: Clarified that bulk policing only replaces per-flow | Introduction: Clarified that bulk policing only replaces per- | |||
policing at interior inter-domain borders, while per-flow policing | flow policing at interior inter-domain borders, while per-flow | |||
is still needed at the access interface to the internetwork. Also | policing is still needed at the access interface to the | |||
clarified that the aim is to neutralise any gains from cheating | internetwork. Also clarified that the aim is to neutralise any | |||
using local bilateral contracts between neighbouring networks, | gains from cheating using local bilateral contracts between | |||
rather than merely identifying remote cheaters. | neighbouring networks, rather than merely identifying remote | |||
cheaters. | ||||
Section 3.1: Described the traditional per-flow policing problem | Section 3.1: Described the traditional per-flow policing | |||
with inter-domain reservations more precisely, particularly with | problem with inter-domain reservations more precisely, | |||
respect to direction of reservations and of traffic flows. | particularly with respect to direction of reservations and of | |||
traffic flows. | ||||
Clarified status of Section 5 onwards, in particular that policers | Clarified status of Section 5 onwards, in particular that | |||
and monitors would not need standardisation, but that the protocol | policers and monitors would not need standardisation, but that | |||
in Section 4 would require standardisation. | the protocol in Section 4 would require standardisation. | |||
Section 5.6.2 on competitive routing: Added discussion of direct | Section 5.6.2 on competitive routing: Added discussion of | |||
incentives for a receiver to switch to a different provider even | direct incentives for a receiver to switch to a different | |||
if the provider has a termination monopoly. | provider even if the provider has a termination monopoly. | |||
Clarified that "Designing in security from the start" merely means | Clarified that "Designing in security from the start" merely | |||
allowing codepoint space in the PCN protocol encoding. There is | means allowing codepoint space in the PCN protocol encoding. | |||
no need to actually implement inter-domain security mechanisms for | There is no need to actually implement inter-domain security | |||
solutions confined to a single domain. | mechanisms for solutions confined to a single domain. | |||
Updated some references and added a ref to the Security | Updated some references and added a ref to the Security | |||
Considerations, as well as other minor corrections and | Considerations, as well as other minor corrections and | |||
improvements. | improvements. | |||
Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to | Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to | |||
<draft-briscoe-tsvwg-re-ecn-border-cheat-01>: | <draft-briscoe-tsvwg-re-ecn-border-cheat-01>: | |||
Added subsection on Border Accounting Mechanisms (Section 5.6.1). | Added subsection on Border Accounting Mechanisms | |||
(Section 5.6.1). | ||||
Section 4.2 on the re-ECN wire protocol clarified and re-organised | Section 4.2 on the re-ECN wire protocol clarified and re- | |||
to separately discuss re-ECN for default ECN marking and for pre- | organised to separately discuss re-ECN for default ECN marking | |||
congestion marking (PCN). | and for pre-congestion marking (PCN). | |||
Router Forwarding Behaviour subsection added to re-organised | Router Forwarding Behaviour subsection added to re-organised | |||
section on Protocol Operation (Section 4.3). Extensions section | section on Protocol Operation (Section 4.3). Extensions | |||
moved within Protocol Operations. | section moved within Protocol Operations. | |||
Emulating Border Policing (Section 5) reorganised, starting with a | Emulating Border Policing (Section 5) reorganised, starting | |||
new Terminology subsection heading, and a simplified overview | with a new Terminology subsection heading, and a simplified | |||
section. Added a large new subsection on Border Accounting | overview section. Added a large new subsection on Border | |||
Mechanisms within a new section bringing together other | Accounting Mechanisms within a new section bringing together | |||
subsections on Border Mechanisms generally (Section 5.6). Some | other subsections on Border Mechanisms generally (Section 5.6). | |||
text moved from old subsections into these new ones. | Some text moved from old subsections into these new ones. | |||
Added section on Incremental Deployment (Section 7), drawing | Added section on Incremental Deployment (Section 7), drawing | |||
together relevant points about deployment made throughout. | together relevant points about deployment made throughout. | |||
Sections on Design Rationale (Section 8) and Security | Sections on Design Rationale (Section 8) and Security | |||
Considerations (Section 9) expanded with some new material, | Considerations (Section 9) expanded with some new material, | |||
including new attacks and their defences. | including new attacks and their defences. | |||
Suggested Border Metering Algorithms improved (Appendix A.2) for | Suggested Border Metering Algorithms improved (Appendix A.2) | |||
resilience to newly identified attacks. | for resilience to newly identified attacks. | |||
1. Introduction | 1. Introduction | |||
The Internet community largely lost interest in the Intserv | The Internet community largely lost interest in the Intserv | |||
architecture after it was clarified that it would be unlikely to | architecture after it was clarified that it would be unlikely to | |||
scale to the whole Internet [RFC2208]. Although Intserv mechanisms | scale to the whole Internet [RFC2208]. Although Intserv mechanisms | |||
proved impractical, the bandwidth reservation service it aimed to | proved impractical, the bandwidth reservation service it aimed to | |||
offer is still very much required. | offer is still very much required. | |||
A recently proposed approach [I-D.ietf-pcn-architecture] combines | A recently proposed approach [I-D.ietf-pcn-architecture] combines | |||
Diffserv and pre-congestion notification (PCN) to provide a service | Diffserv and pre-congestion notification (PCN) to provide a service | |||
slightly better than Intserv controlled load [RFC2211]. It scales to | slightly better than Intserv controlled load [RFC2211]. PCN does not | |||
any size network, but only if domains trust their neighbours to have | require the considerable over-provisioning that is normally required | |||
checked that upstream customers aren't taking more bandwidth than | for admission control over Diffserv [RFC2998] to be robust against | |||
they reserved, either accidentally or deliberately. This memo | re-routes or variation in the traffic matrix. It has been proved | |||
describes border policing measures so that one network can protect | that Diffserv's over-provisioning requirement grows linearly with the | |||
its interests, even if networks around it are deliberately trying to | network diameter in hops [QoS_scale]. | |||
cheat. The approach provides a sufficient emulation of flow rate | ||||
policing at trust boundaries but without per-flow processing. The | ||||
emulation is not perfect, but it is sufficient to ensure that the | ||||
punishment is at least proportionate to the severity of the cheat. | ||||
Per-flow rate policing for each reservation is still expected to be | ||||
used at the access edge of the internetwork, but at the borders | ||||
between networks bulk policing can be used to emulate per-flow | ||||
policing. | ||||
The aim is to be able to scale controlled load service to any number | A number of PCN domains can be concatenated into a larger PCN region | |||
of endpoints, even though such scaling must take account of the | without any per-flow processing between them, but only if each domain | |||
increasing numbers of networks and users who may all have conflicting | trusts the ingress network to have checked that upstream customers | |||
interests. To achieve such scaling, this memo combines two recent | aren't taking more bandwidth than they reserved, either accidentally | |||
proposals, both of which it briefly recaps: | or deliberately. Unfortunately, networks can gain considerably by | |||
breaking this trust. One way for a network to protect itself against | ||||
others is to handle flow signalling at its own border and police | ||||
traffic against reservations itself. However, this reintroduces the | ||||
per-flow unscalability at borders that Intserv over Diffserv suffers | ||||
from. | ||||
o A deployment model for admission control over Diffserv using pre- | This memo describes a protocol called re-PCN that enables bulk border | |||
congestion notification [I-D.ietf-pcn-architecture] describes how | measurements so that one network can protect its interests, even if | |||
bulk pre-congestion notification on routers within an edge-to-edge | networks around it are deliberately trying to cheat. The approach | |||
Diffserv region can emulate the precision of per-flow admission | provides a sufficient emulation of flow rate policing at trust | |||
control to provide controlled load service without unscalable per- | boundaries but without per-flow processing. Per-flow rate policing | |||
flow processing; | for each reservation is still expected to be used at the access edge | |||
of the internetwork, but at the borders between networks bulk | ||||
policing can be used to emulate per-flow policing. The emulation is | ||||
not perfect, but it is sufficient to ensure that the punishment is at | ||||
least proportionate to the severity of the cheat. Re-PCN requires | ||||
neither the unscalable over-provisioning of Diffserv nor the per- | ||||
flow processing at borders of Intserv over Diffserv. | ||||
o Re-ECN: Adding Accountability to TCP/IP [Re-TCP]. The trick that | It should therefore scale controlled load service to the whole | |||
addresses cheating at borders is to recognise that border policing | internetwork without the cost of Diffserv's linearly increasing over- | |||
is mainly necessary because cheating upstream networks will admit | provisioning, or the cost of per-flow policing at each border. To | |||
traffic when they shouldn't only as long as they don't directly | achieve such scaling, this memo combines two recent proposals, both | |||
experience the downstream congestion their misbehaviour can cause. | of which it briefly recaps: | |||
The re-ECN protocol requires upstream nodes to declare expected | ||||
downstream congestion in all forwarded packets and it makes it in | o The pre-congestion notification (PCN) | |||
their interests to declare it honestly. Operators can then | architecture [I-D.ietf-pcn-architecture] describes how bulk pre- | |||
monitor downstream congestion in bulk at borders to emulate | congestion notification on routers within an edge-to-edge Diffserv | |||
policing. | region can emulate the precision of per-flow admission control to | |||
provide controlled load service without unscalable per-flow | ||||
processing; | ||||
o Re-ECN: Adding Accountability to TCP/ | ||||
IP [I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
We coin the term re-PCN for the combination of PCN and re-ECN. | ||||
The trick that addresses cheating at borders is to recognise that | ||||
border policing is mainly necessary because cheating upstream | ||||
networks will admit traffic when they shouldn't only as long as they | ||||
don't directly experience the downstream congestion their | ||||
misbehaviour can cause. The re-ECN protocol ensures a network can be | ||||
made to experience the congestion it causes in other networks. Re- | ||||
ECN requires the sending node to declare expected downstream | ||||
congestion in all packets and it makes it in its interest to declare | ||||
this honestly. At the border between upstream network 'A' and | ||||
downstream network 'B' (say), both networks can monitor packets | ||||
crossing the border to measure how much congestion 'A' is causing in | ||||
'B' and beyond. 'B' can then include a limit or penalty based on | ||||
this metric in its contract with 'A'. This is how 'A' experiences | ||||
the effect of congestion it causes in other networks. 'A' no longer | ||||
gains by admitting traffic when it shouldn't, which is why we can say | ||||
re-PCN emulates flow policing, even though it doesn't measure flows. | ||||
The aim is not to enable a network to _identify_ some remote cheating | The aim is not to enable a network to _identify_ some remote cheating | |||
party, which would rarely be useful given the victim network would be | party, which would rarely be useful given the victim network would be | |||
unlikely to be able to seek redress from a cheater in some remote | unlikely to be able to seek redress from a cheater in some remote | |||
part of the world with whom no direct contractual relationship | part of the world with whom no direct contractual relationship | |||
exists. Rather the aim is to ensure that any gain from cheating will | exists. Rather the aim is to ensure that any gain from cheating will | |||
be cancelled out by penalties applied to the cheating party by its | be cancelled out by penalties applied to the cheating party by its | |||
local network. Further, the solution ensures each of the chain of | local network. Further, the solution ensures each of the chain of | |||
networks between the cheater and the victim will lose out if it | networks between the cheater and the victim will lose out if it | |||
doesn't apply penalties to its neighbour. Thus the solution builds | doesn't apply penalties to its neighbour. Thus the solution builds | |||
on the local bilateral contractual relationships that already exist | on the local bilateral contractual relationships that already exist | |||
between neighbouring networks. | between neighbouring networks. | |||
Rather than the end-to-end arrangement used when re-ECN was specified | Rather than the end-to-end arrangement used when re-ECN was specified | |||
for the TCP transport [Re-TCP], this memo specifies re-ECN in an | for the TCP transport [I-D.briscoe-tsvwg-re-ecn-tcp], this memo | |||
edge-to-edge arrangement, making it applicable to the above | specifies re-ECN in an edge-to-edge arrangement, making it applicable | |||
deployment model for admission control over Diffserv. Also, rather | to deployment models where admission control over Diffserv is based | |||
than using a TCP transport for regular congestion feedback, this memo | on pre-congestion notification. Also, rather than using a TCP | |||
specifies re-ECN using RSVP as the transport for feedback [RSVP-ECN]. | transport for regular congestion feedback, this memo specifies re-ECN | |||
A similar deployment model, but with a different transport for | using RSVP as the transport for feedback [RSVP-ECN]. RSVP is used to | |||
signalling congestion feedback could be used (e.g. Arumaithurai | be concrete, but a similar deployment model with a different | |||
[I-D.arumaithurai-nsis-pcn] and RMD [I-D.ietf-nsis-rmd] use NSIS). | transport for signalling congestion feedback could be used (e.g. | |||
Arumaithurai [I-D.arumaithurai-nsis-pcn] and RMD [I-D.ietf-nsis-rmd] | ||||
both use NSIS). | ||||
This memo aims to do two things: i) define how to apply the re-ECN | This memo aims to do two things: i) define how to apply the re-PCN | |||
protocol to the admission control over Diffserv scenario; and ii) | protocol to the admission control over Diffserv scenario; and ii) | |||
explain why re-ECN sufficiently emulates border policing in that | explain why re-PCN sufficiently emulates border policing in that | |||
scenario. Most of the memo is taken up with the second aim; | scenario. Most of the memo is taken up with the second aim; | |||
explaining why it works. Applying re-ECN to the scenario actually | explaining why it works. Applying re-PCN to the scenario actually | |||
involves quite a trivial modification to the ingress gateway. That | involves quite a trivial modification to the ingress gateway. That | |||
modification can be added to gateways later, so our immediate goal is | modification can be added to gateways later, so our immediate goal is | |||
to convince everyone to have the foresight to define the PCN wire | to convince everyone to have the foresight to define the PCN wire | |||
protocol encoding to accommodate the extended codepoints defined in | protocol encoding to accommodate the extended codepoints defined in | |||
this document, whether first deployments require border policing or | this document, whether first deployments require border policing or | |||
not. Otherwise, when we want to add policing, we will have built | not. Otherwise, when we want to add policing, we will have built | |||
ourselves a legacy problem. In other words, we aim to convince | ourselves a legacy problem. In other words, we aim to convince | |||
people to "Design in security from the start." | people to "Design in security from the start." | |||
The body of this memo is structured as follows: | The body of this memo is structured as follows: | |||
Section 3 describes the border policing problem. We recap the | Section 3 describes the border policing problem. We recap the | |||
traditional, unscalable view of how to solve the problem, and we | traditional, unscalable view of how to solve the problem, and we | |||
recap the admission control solution which has the scalability we | recap the admission control solution which has the scalability we | |||
do not want to lose when we add border policing; | do not want to lose when we add border policing; | |||
Section 4 specifies the re-ECN protocol solution in detail; | Section 4 specifies the re-PCN protocol solution in detail; | |||
Section 5 explains how to use the protocol to emulate border | Section 5 explains how to use the protocol to emulate border | |||
policing, and why it works; | policing, and why it works; | |||
Section 6 analyses the security of the proposed solution; | Section 6 analyses the security of the proposed solution; | |||
Section 8 explains the sometimes subtle rationale behind our | Section 8 explains the sometimes subtle rationale behind our | |||
design decisions; | design decisions; | |||
Section 9 comments on the overall robustness of the security | Section 9 comments on the overall robustness of the security | |||
skipping to change at page 10, line 49 | skipping to change at page 12, line 41 | |||
were permitted, the ability of admission control to give assurances | were permitted, the ability of admission control to give assurances | |||
to other flows will break. | to other flows will break. | |||
Just as sources need not be trusted to keep within the requested flow | Just as sources need not be trusted to keep within the requested flow | |||
spec, whole networks might also try to cheat. We will now set up a | spec, whole networks might also try to cheat. We will now set up a | |||
concrete scenario to illustrate such cheats. Imagine reservations | concrete scenario to illustrate such cheats. Imagine reservations | |||
for unidirectional flows, through at least two networks, an edge | for unidirectional flows, through at least two networks, an edge | |||
network and its downstream transit provider. Imagine the edge | network and its downstream transit provider. Imagine the edge | |||
network charges its retail customers per reservation but also has to | network charges its retail customers per reservation but also has to | |||
pay its transit provider a charge per reservation. Typically, both | pay its transit provider a charge per reservation. Typically, both | |||
its selling and buying charges might depend on the duration and rate | the charges for buying from the transit and selling to the retail | |||
of each reservation. The level of the actual selling and buying | customer might depend on the duration and rate of each reservation. | |||
prices is irrelevant to our discussion (most likely the network will | The level of the actual selling and buying prices is irrelevant to | |||
sell at a higher price than it buys, of course). | our discussion (most likely the network will sell at a higher price | |||
than it buys, of course). | ||||
A cheating ingress network could systematically reduce the size of | A cheating ingress network could systematically reduce the size of | |||
its retail customers' reservation signalling requests (e.g. the | its retail customers' reservation signalling requests (e.g. the | |||
SENDER_TSPEC object in RSVP's PATH message) before forwarding them to | SENDER_TSPEC object in RSVP's PATH message) before forwarding them to | |||
its transit provider and systematically reinstate the responses on | its transit provider and systematically reinstate the responses on | |||
the way back (e.g. the FLOWSPEC object in RSVP's RESV message). It | the way back (e.g. the FLOWSPEC object in RSVP's RESV message). It | |||
would then receive an honest income from its upstream retail customer | would then receive an honest income from its upstream retail customer | |||
but only pay for fraudulently smaller reservations downstream. A | but only pay for fraudulently smaller reservations downstream. A | |||
similar but opposite trick (increasing the TSPEC and decreasing the | similar but opposite trick (increasing the TSPEC and decreasing the | |||
FLOWSPEC) could be perpetrated by the receiver's access network if | FLOWSPEC) could be perpetrated by the receiver's access network if | |||
the reservation was paid for by the receiver. | the reservation was paid for by the receiver. | |||
Equivalently, a cheating ingress network may feed the traffic from a | Equivalently, a cheating ingress network may feed the traffic from a | |||
number of flows into an aggregate reservation over the transit that | number of flows into an aggregate reservation over the transit that | |||
is smaller than the total of all the flows. Because of these fraud | is smaller than the total of all the flows. Because of these fraud | |||
possibilities, in traditional QoS reservation architectures the | possibilities, in traditional QoS reservation architectures the | |||
downstream network polices at each border. The policer checks that | downstream network polices traffic at each border. The policer | |||
the actual sent data rate of each flow is within the signalled | checks that the actual sent data rate of each flow is within the | |||
reservation. | signalled reservation. | |||
Reservation signalling could be authenticated end to end, but this | Reservation signalling could be authenticated end to end, but this | |||
wouldn't prevent the aggregation cheat just described. For this | wouldn't prevent the aggregation cheat just described. For this | |||
reason, and to avoid the need for a global PKI, signalling integrity | reason, and to avoid the need for a global PKI, signalling integrity | |||
is typically only protected on a hop-by-hop basis [RFC2747]. | is typically only protected on a hop-by-hop basis [RFC2747]. | |||
A variant of the above cheat is where a router in an honest | A variant of the above cheat is where a router in an honest | |||
downstream network denies admission to a new reservation, but a | downstream network denies admission to a new reservation, but a | |||
cheating upstream network still admits the flow. For instance, the | cheating upstream network still admits the flow. For instance, the | |||
networks may be using Diffserv internally, but Intserv admission | networks may be using Diffserv internally, but Intserv admission | |||
skipping to change at page 12, line 47 | skipping to change at page 14, line 45 | |||
<-------- edge-to-edge signalling -------> | <-------- edge-to-edge signalling -------> | |||
(for admission control) | (for admission control) | |||
<-------------------end-to-end QoS signalling protocol-------------> | <-------------------end-to-end QoS signalling protocol-------------> | |||
Figure 1: Generic Scenario (see text for explanation of terms) | Figure 1: Generic Scenario (see text for explanation of terms) | |||
An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1) | An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1) | |||
connect the interior Diffserv region to the edge access networks | connect the interior Diffserv region to the edge access networks | |||
where routers (not shown) use per-flow reservation processing. | where routers (not shown) use per-flow reservation processing. | |||
Within the Diffserv region are three interior domains, A, B and C, as | Within the Diffserv region are three interior domains, 'A', 'B' and | |||
well as the inward facing interfaces of the ingress and egress | 'C', as well as the inward facing interfaces of the ingress and | |||
gateways. An ingress and egress border router (BR) is shown | egress gateways. An ingress and egress border router (BR) is shown | |||
interconnecting each interior domain with the next. There may be | interconnecting each interior domain with the next. There will | |||
other interior routers (not shown) within each interior domain. | typically be other interior routers (not shown) within each interior | |||
domain. | ||||
In two paragraphs we now briefly recap how pre-congestion | In two paragraphs we now briefly recap how pre-congestion | |||
notification is intended to be used to control flow admission to a | notification is intended to be used to control flow admission to a | |||
large Diffserv region. The first paragraph describes data plane | large Diffserv region. The first paragraph describes data plane | |||
functions and the second describes signalling in the control plane. | functions and the second describes signalling in the control plane. | |||
We omit many details from [I-D.ietf-pcn-architecture] including | We omit many details from [I-D.ietf-pcn-architecture] including | |||
behaviour during routing changes. For brevity here we assume other | behaviour during routing changes. For brevity here we assume other | |||
flows are already in progress across a path through the Diffserv | flows are already in progress across a path through the Diffserv | |||
region before a new one arrives, but how bootstrap works is described | region before a new one arrives, but how bootstrap works is described | |||
in Section 4.3.2. | in Section 4.3.2. | |||
Figure 1 shows a single simplex reserved flow from the sending (Sx) | Figure 1 shows a single simplex reserved flow from the sending (Sx) | |||
end host to the receiving (Rx) end host. The ingress gateway polices | end host to the receiving (Rx) end host. The ingress gateway polices | |||
incoming traffic within its admitted reservation and remarks it to | incoming traffic and colours conforming traffic within an admitted | |||
turn on an ECN-capable codepoint [RFC3168] and the controlled load | reservation to a combination of Diffserv codepoint and ECN field that | |||
(CL) Diffserv codepoint. Together, these codepoints define which | defines the traffic as 'PCN-enabled'. This redefines the meaning of | |||
traffic is entitled to the enhanced scheduling of the CL behaviour | the ECN field as a PCN field, which is largely the same as ECN | |||
aggregate on routers within the Diffserv region. The CL PHB of | [RFC3168], but with slightly different semantics defined in | |||
interior routers consists of a scheduling behaviour and a new ECN | [I-D.moncaster-pcn-baseline-encoding] (or various extensions that are | |||
marking behaviour that we call `pre-congestion notification' [PCN]. | currently experimental). The Diffserv region is called a PCN-region | |||
The CL PHB simply re-uses the definition of expedited forwarding | because all the queues within it are PCN-enabled. This means the | |||
(EF) [RFC3246] for its scheduling behaviour. But it incorporates a | per-hop behaviour they apply to PCN-enabled traffic consists of both | |||
new ECN marking behaviour, which sets the ECN field of an increasing | a scheduling behaviour and a new ECN marking behaviour that we call | |||
number of CL packets to the admission marked (AM) codepoint as they | `pre-congestion notification' [I-D.eardley-pcn-marking-behaviour]. A | |||
approach a threshold rate that is lower than the line rate. The use | PCN-enabled queue typically re-uses the definition of expedited | |||
of virtual queues ensures real queues have hardly built up any | forwarding (EF) [RFC3246] for its scheduling behaviour. The new | |||
congestion delay. The level of marking detected at the egress of the | congestion marking behaviour sets the PCN field of an increasing | |||
Diffserv region is then used by the signalling system in order to | proportion of PCN packets to the PCN-marked (PM) codepoint | |||
determine admission control as follows. | [I-D.moncaster-pcn-baseline-encoding] as their load approaches a | |||
threshold rate that is lower than the line rate | ||||
[I-D.eardley-pcn-marking-behaviour]. This can be achieved with an | ||||
algorithm similar to a token-bucket called a virtual queue. The aim | ||||
is for a queue to start marking PCN traffic to trigger admission | ||||
control before the real queue builds up any congestion delay. The | ||||
level of a queue's pre-congestion marking is detected at the egress | ||||
of the Diffserv region and used by the signalling system to control | ||||
admission of further traffic that would otherwise overload that | ||||
queue, as follows. | ||||
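The virtual queue idea above can be sketched in a few lines of Python. This is only an illustration, not the marking algorithm of [I-D.eardley-pcn-marking-behaviour]: the class name, the parameter names and the simple step threshold are all assumptions made for the example. The counter drains at a configured rate below the line rate, so it starts marking while the real queue would still be nearly empty.

```python
from dataclasses import dataclass


@dataclass
class VirtualQueue:
    """Token-bucket-like counter drained at a rate below the line rate."""

    drain_rate_bps: float       # hypothetical configured rate, below the line rate
    mark_threshold_bits: float  # hypothetical backlog above which packets are marked
    backlog_bits: float = 0.0
    last_time_s: float = 0.0

    def on_packet(self, now_s: float, size_bits: int) -> bool:
        """Account for one arriving packet; return True if it should be PCN-marked."""
        elapsed = now_s - self.last_time_s
        # Drain the virtual backlog for the time since the previous packet.
        self.backlog_bits = max(0.0, self.backlog_bits - elapsed * self.drain_rate_bps)
        self.last_time_s = now_s
        self.backlog_bits += size_bits
        return self.backlog_bits > self.mark_threshold_bits


def marked_fraction(vq: VirtualQueue, arrival_rate_bps: float,
                    packets: int = 1000, size_bits: int = 8000) -> float:
    """Feed evenly spaced packets at a constant rate and report the marked fraction."""
    gap_s = size_bits / arrival_rate_bps
    marks = sum(vq.on_packet((i + 1) * gap_s, size_bits) for i in range(packets))
    return marks / packets
```

With a constant load below the drain rate the sketch never marks, and with a sustained load above it almost every packet ends up marked. A real marking behaviour would ramp the marked proportion more gradually.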
The end-to-end QoS signalling (e.g. RSVP) for a new reservation | The end-to-end QoS signalling for a new reservation (to be concrete | |||
takes one giant hop from ingress to egress gateway, because interior | we will use RSVP) takes one giant hop from ingress to egress gateway, | |||
routers within the Diffserv region are configured to ignore RSVP. | because interior routers within the Diffserv region are configured to | |||
The egress gateway holds flow state because it takes part in the end- | ignore RSVP. The egress gateway holds flow state because it takes | |||
to-end reservation. So it can classify all packets by flow and it | part in the end-to-end reservation. So it can classify all packets | |||
can identify all flows that have the same previous RSVP hop (a CL- | by flow and it can identify all flows that have the same previous | |||
region-aggregate). For each CL-region-aggregate of flows in | RSVP hop (an ingress-egress-aggregate). For each ingress-egress- | |||
progress, the egress gateway maintains a per-packet moving average of | aggregate of flows in progress, the egress gateway maintains a per- | |||
the fraction of pre-congestion-marked traffic. Once an RSVP PATH | packet moving average of the fraction of pre-congestion-marked | |||
message for a new reservation has hopped across the Diffserv region | traffic. Once an RSVP PATH message for a new reservation has hopped | |||
and reached the destination, an RSVP RESV message is returned. As | across the Diffserv region and reached the destination, an RSVP RESV | |||
the RESV message passes, the egress gateway piggy-backs the relevant | message is returned. As the RESV message passes, the egress gateway | |||
pre-congestion level onto it [RSVP-ECN]. Again, interior routers | piggy-backs the relevant pre-congestion level onto it [RSVP-ECN]. | |||
ignore the RSVP message, but the ingress gateway strips off the pre- | Again, interior routers ignore the RSVP message, but the ingress | |||
congestion level. If the pre-congestion level is above a threshold, | gateway strips off the pre-congestion level. If the pre-congestion | |||
the ingress gateway denies admission to the new reservation, | level is above a threshold, the ingress gateway denies admission to | |||
otherwise it returns the original RESV signal back towards the data | the new reservation, otherwise it returns the original RESV signal | |||
sender. | back towards the data sender. | |||
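The control-plane steps just described can be sketched as follows. This is a minimal illustration under stated assumptions: the class and function names, the exponentially weighted form of the moving average and the weight value are hypothetical, and the RSVP message handling itself is not modelled.

```python
class EgressAggregateMeter:
    """Per-packet moving average of the marked fraction for one aggregate."""

    def __init__(self, weight: float = 0.01):
        self.weight = weight  # hypothetical smoothing weight
        self.level = 0.0      # current pre-congestion level, between 0 and 1

    def on_packet(self, pcn_marked: bool) -> float:
        """Update the average with one packet and return the new level."""
        sample = 1.0 if pcn_marked else 0.0
        self.level += self.weight * (sample - self.level)
        return self.level


def ingress_admits(pre_congestion_level: float, threshold: float) -> bool:
    """Deny admission when the level piggy-backed on the RESV is above the threshold."""
    return pre_congestion_level <= threshold
```

The egress gateway would keep one such meter per aggregate and copy its current level onto each passing RESV message. The ingress gateway only needs the comparison in `ingress_admits`.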
Once a reservation is admitted, its traffic will always receive low | Once a reservation is admitted, its traffic will always receive low | |||
delay service for the duration of the reservation. This is because | delay service for the duration of the reservation. This is because | |||
ingress gateways ensure that traffic not under a reservation cannot | ingress gateways ensure that traffic not under a reservation cannot | |||
pass into the Diffserv region with the CL DSCP set. So non-reserved | pass into the PCN-region with a Diffserv codepoint that gives it | |||
traffic will always be treated with a lower priority PHB at each | priority over the capacity used for PCN traffic. | |||
interior router. And even if some disaster re-routes traffic after | ||||
it has been admitted, if the traffic through any resource tips over a | Even if some disaster re-routes traffic after it has been admitted, | |||
fail-safe threshold, pre-congestion notification will trigger flow | if the PCN traffic through any PCN resource tips over a higher, fail- | |||
pre-emption to very quickly bring every router within the whole | safe threshold, pre-congestion notification can trigger flow | |||
Diffserv region back below its operating point. | termination to very quickly bring every router within the whole PCN- | |||
region back below its operating point. The same marking process and | ||||
ECN codepoint can be used for both admission control and flow | ||||
termination, by simply triggering them at different fractions of | ||||
marking [I-D.charny-pcn-single-marking]. However, simulations have | ||||
confirmed that this approach is not robust in all circumstances that | ||||
might typically be encountered, so approaches with two thresholds and | ||||
two congestion encodings are expected to be required in production | ||||
networks. | ||||
The whole admission control system just described deliberately | The whole admission control system just described deliberately | |||
confines per-flow processing to the access edges of the network, | confines per-flow processing to the access edges of the network, | |||
where it will not limit the system's scalability. But ideally we | where it will not limit the system's scalability. But ideally we | |||
want to extend this approach to multiple networks, to take even more | want to extend this approach to multiple networks, to take even more | |||
advantage of its scaling potential. We would still need per-flow | advantage of its scaling potential. We would still need per-flow | |||
processing at the access edges of each network, but not at the high | processing at the access edges of each network, but not at the high | |||
speed interfaces where they interconnect. Even though such an | speed interfaces where they interconnect. Even though such an | |||
admission control system would work technically, it would gain us no | admission control system would work technically, it would gain us no | |||
scaling advantage if each network also wanted to police the rate of | scaling advantage if each network also wanted to police the rate of | |||
each admitted flow for itself--border routers would still have to do | each admitted flow for itself--border routers would still have to do | |||
complex packet operations per-flow anyway, given they don't trust | complex packet operations per-flow anyway, given they don't trust | |||
upstream networks to do their policing for them. | upstream networks to do their policing for them. | |||
This memo describes how to emulate per-flow rate policing using bulk | This memo describes how to emulate per-flow rate policing using bulk | |||
mechanisms at border routers, so the full scalability potential of | mechanisms at border routers. Otherwise the full scalability | |||
pre-congestion notification is not limited by the need for per-flow | potential of pre-congestion notification would be limited by the need | |||
policing mechanisms at borders, which would make borders the most | for per-flow policing mechanisms at borders, which would make borders | |||
cost-critical pinch-points. Then we can achieve the long sought-for | the most cost-critical pinch-points. Instead we can achieve the long | |||
vision of secure Internet-wide bandwidth reservations without needing | sought-for vision of secure Internet-wide bandwidth reservations | |||
per-flow processing at all in core and border routers--where | without over-generous provisioning or per-flow processing. We still | |||
scalability is most critical. | use per-flow processing at the edge routers closest to the end-user, | |||
but we need no per-flow processing at all in core _or border | ||||
routers_--where scalability is most critical. | ||||
4. Re-ECN Protocol for an RSVP (or similar) Transport | 4. Re-ECN Protocol in IP with Two Congestion Marking Levels | |||
4.1. Protocol Overview | 4.1. Protocol Overview | |||
First we need to recap the way routers accumulate congestion marking | First we need to recap the way routers accumulate PCN congestion | |||
along a path. Each ECN-capable router marks some packets with CE, | marking along a path (it accumulates the same way as ECN). Each PCN- | |||
the marking probability increasing with the length of the queue at | capable queue into a link might mark some packets with a PCN-marked | |||
its egress link. The only difference with pre-congestion | (PM) codepoint, the marking probability increasing with the length of | |||
marking [PCN] is that marking is based on the length of a virtual | the queue [I-D.eardley-pcn-marking-behaviour]. With a series of PCN- | |||
queue, so that the real queue occupancy can remain very low. We will | capable routers on a path, a stream of packets accumulates the | |||
use the terms congestion and pre-congestion interchangeably in the | fraction of PCN markings that each queue adds. The combined effect | |||
following unless it is important to distinguish between them. | of the packet marking of all the queues along the path signals | |||
congestion of the whole path to the receiver. So, for example, if | ||||
one queue early in a path is marking 1% of packets and another later | ||||
in a path is marking 2%, flows that pass through both queues will | ||||
experience approximately 3% marking over a sequence of packets. | ||||
With multiple ECN-capable routers on a path, the ECN field | (Note: Whenever the word 'congestion' is used in this document it | |||
accumulates the fraction of CE marking that each router adds. The | should be taken to mean congestion of the virtual resource assigned | |||
combined effect of the packet marking of all the routers along the | for use by PCN-traffic. This avoids cumbersome repetition of the | |||
path signals congestion of the whole path to the receiver. So, for | strictly correct term 'pre-congestion'.) | |||
example, if one router early in a path is marking 1% of packets and | ||||
another later in a path is marking 2%, flows that pass through both | ||||
routers will experience approximately 3% marking. | ||||
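The "approximately 3%" figure can be checked with a short calculation. The sketch below assumes each queue marks independently and that a packet, once marked, stays marked. The function name is illustrative only.

```python
def path_marking_fraction(queue_fractions):
    """Combine per-queue marking fractions into the fraction seen at the end of the path."""
    unmarked = 1.0
    for fraction in queue_fractions:
        # A packet arrives unmarked only if every queue leaves it unmarked.
        unmarked *= 1.0 - fraction
    return 1.0 - unmarked


combined = path_marking_fraction([0.01, 0.02])  # 1 - 0.99 * 0.98, about 0.0298
```

For small fractions the result is very close to the plain sum of the per-queue fractions, which is why the text can speak of marking accumulating along the path.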
The packets crossing an inter-domain trust boundary within the | The packets crossing an inter-domain trust boundary within the PCN- | |||
Diffserv region will all have come from different ingress gateways | region will all have come from different ingress gateways and will | |||
and will all be destined for different egress gateways. We will show | all be destined for different egress gateways. We will show that the | |||
that the key to policing against theft of service is for a border | key to policing against theft of service is for a border router to be | |||
router to be able to directly measure the congestion that is about to | able to directly measure the congestion that is about to be caused by | |||
be caused by the traffic it forwards. That is, it can measure | the packets it forwards into any of the downstream paths between | |||
locally the congestion on each of the downstream paths between itself | itself and the egress gateways that each packet is destined for. The | |||
and the egress gateways that its traffic is destined for. | purpose of the re-PCN protocol is to make packets automatically carry | |||
this information, which then merely needs to be counted locally at | ||||
the border. | ||||
With the original ECN protocol, if CE markings crossing the border | With the original PCN protocol, if a border router (e.g. that between | |||
had been counted over a period, they would have represented the | domains 'A' & 'B' in Figure 2) counts PCN markings crossing the border | |||
accumulated upstream congestion that had already been experienced by | over a period, they represent the accumulated congestion that has | |||
those packets. The general idea of re-ECN is for the ingress gateway | already been experienced by those packets (congestion upstream of the | |||
to continuously encode path congestion into the IP header where, in | border, u). The idea of re-PCN is to make the ingress gateway | |||
this case, `path' means from ingress to egress gateway. Then at any | continuously encode the path congestion it knows into a new field in | |||
point on that path (e.g. between domains A & B in Figure 2 below), IP | the IP header (in this case, `path' means the path from the ingress | |||
headers can be monitored to subtract upstream congestion from | to the egress gateway). This new field is _not_ altered by queues | |||
expected path congestion in order to give the expected downstream | along the path. Then at any point on that path (e.g. between domains | |||
congestion still to be experienced until the egress gateway. | 'A' & 'B'), IP headers can be monitored to measure both expected path | |||
congestion, p, and upstream congestion, u. Then congestion expected | ||||
downstream of the border, v, can be derived simply by subtracting | ||||
upstream congestion from expected path congestion. That is v ~= p - | ||||
u. | ||||
Importantly, it turns out that there is no need to monitor downstream | Importantly, it turns out that there is no need to monitor downstream | |||
congestion on a per-flow basis. We will show that accounting for it | congestion on a per-flow, per-path or per-aggregate basis. We will | |||
in bulk across all flows will be sufficient. | show that accounting for it in bulk by counting the volume of all | |||
marked packets will be sufficient. | ||||
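The bulk measurement at a border can be sketched as a pair of byte counters with no per-flow state. This is an illustration only: the actual codepoints are defined in later sections, so the boolean `declares_path_congestion` is a hypothetical stand-in for whatever the ingress gateway encodes in the new header field, and all names here are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BorderMeter:
    """Bulk byte counters kept at a border router; no per-flow state."""

    total_bytes: int = 0
    path_bytes: int = 0      # bytes in packets carrying the ingress declaration
    upstream_bytes: int = 0  # bytes in packets already PCN-marked upstream

    def on_packet(self, size_bytes: int, declares_path_congestion: bool,
                  pcn_marked: bool) -> None:
        """Count one packet crossing the border."""
        self.total_bytes += size_bytes
        if declares_path_congestion:
            self.path_bytes += size_bytes
        if pcn_marked:
            self.upstream_bytes += size_bytes

    def downstream_congestion(self) -> float:
        """Return v ~= p - u as fractions of the volume seen so far."""
        if self.total_bytes == 0:
            return 0.0
        p = self.path_bytes / self.total_bytes
        u = self.upstream_bytes / self.total_bytes
        return p - u
```

For example, if 3% of the volume crossing the border carries the declaration and 1% is already marked, the meter reports expected downstream congestion of about 2%, regardless of how many flows or aggregates the packets belong to.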
_____________________________________ | _____________________________________ | |||
_|__ ______ ______ ______ _|__ | _|__ ______ ______ ______ _|__ | |||
| | | A | | B | | C | | | | | | | A | | B | | C | | | | |||
+----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+ | +----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+ | |||
| | |B| |B| |B| |B| |B| |B| | | | | | |B| |B| |B| |B| |B| |B| | | | |||
|Ingr|==|R| |R|==|R| |R|==|R| |R|==|Egr | | |Ingr|==|R| |R|==|R| |R|==|R| |R|==|Egr | | |||
|G/W | | | | |: | | | | | | | | |G/W | | |G/W | | | | |: | | | | | | | | |G/W | | |||
+----+ +-+ +-+: +-+ +-+ +-+ +-+ +----+ | +----+ +-+ +-+: +-+ +-+ +-+ +-+ +----+ | |||
| | | |: | | | | | | | | | | |: | | | | | | | |||
skipping to change at page 16, line 26 | skipping to change at page 18, line 34 | |||
: | : | |||
| : | | | : | | |||
|<-upstream-->:<-expected downstream->| | |<-upstream-->:<-expected downstream->| | |||
| congestion : congestion | | | congestion : congestion | | |||
| u v ~= p - u | | | u v ~= p - u | | |||
| | | | | | |||
|<--- expected path congestion, p --->| | |<--- expected path congestion, p --->| | |||
Figure 2: Re-ECN concept | Figure 2: Re-ECN concept | |||
4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | 4.2. Re-PCN Abstracted Network Layer Wire Protocol (IPv4 or v6) | |||
In this section we define the names of the various codepoints of the | In this section we define the names of the various codepoints of the | |||
re-ECN protocol when used with pre-congestion notification, deferring | extended ECN field when used with pre-congestion notification, | |||
description of their semantics to the following sections. But first | deferring description of their semantics to the following sections. | |||
we recap the re-ECN wire protocol proposed in [Re-TCP]. | But first we recap the re-ECN wire protocol proposed in | |||
[I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
4.2.1. Re-ECN Recap | 4.2.1. Re-ECN Recap | |||
Re-ECN uses the two-bit ECN field broadly as in RFC3168 [RFC3168]. | Re-ECN uses the two-bit ECN field broadly as in RFC3168 [RFC3168]. | |||
It also uses a new re-ECN extension (RE) flag. The actual position | It also uses a new re-ECN extension (RE) flag. The actual position | |||
of the RE flag is different between IPv4 & v6 headers so we will use | of the RE flag is different between IPv4 & v6 headers so we will use | |||
an abstraction of the IPv4 and v6 wire protocols by just calling it | an abstraction of the IPv4 and v6 wire protocols by just calling it | |||
the RE flag. [Re-TCP] proposes using bit 48 (currently unused) in | the RE flag. [I-D.briscoe-tsvwg-re-ecn-tcp] proposes using bit 48 | |||
the IPv4 header for the RE flag, while for IPv6 it proposes an ECN | (currently unused) in the IPv4 header for the RE flag, while for IPv6 | |||
extension header. | it proposes a congestion extension header. | |||
Unlike the ECN field, the RE flag is intended to be set by the sender | Unlike the ECN field, the RE flag is intended to be set by the sender | |||
and remain unchanged along the path, although it can be read by | and remain unchanged along the path, although it can be read by | |||
network elements that understand the re-ECN protocol. In the | network elements that understand the re-ECN protocol. In the | |||
scenario used in this memo, the ingress gateway acts as a proxy for | scenario used in this memo, the ingress gateway is the 'sender' as | |||
the sender, setting the RE flag as permitted in the specification of | far as the scope of the PCN region is concerned, so it sets the RE | |||
re-ECN. | flag (as permitted for sender proxies in the specification of re- | |||
ECN). | ||||
Note that general-purpose routers do not have to read the RE flag, | Note that general-purpose routers do not have to read the RE flag, | |||
only special policing elements at borders do. And no general-purpose | only special policing elements at borders do. And no general-purpose | |||
routers have to change the RE flag, although the ingress and egress | routers have to change the RE flag, although the ingress and egress | |||
gateways do because in the edge-to-edge deployment model we are | gateways do because in the edge-to-edge deployment model we are | |||
using, they act as proxies for the endpoints. Therefore the RE flag | using, they act as the endpoints of the PCN region. Therefore the RE | |||
does not even have to be visible to interior routers. So the RE flag | flag does not even have to be visible to interior routers. So the RE | |||
has no implications on protocols like MPLS. Congested label | flag has no implications on protocols like MPLS. Congested label | |||
switching routers (LSRs) would have to be able to notify their | switching routers (LSRs) would have to be able to notify their | |||
congestion with an ECN/PCN codepoint in the MPLS shim [RFC5129], but | congestion with an ECN/PCN codepoint in the MPLS shim [RFC5129], but | |||
like any interior IP router, they can be oblivious to the RE flag, | like any interior IP router, they can be oblivious to the RE flag, | |||
which need only be read by border policing functions. | which need only be read by border policing functions. | |||
Although the RE flag is a separate, single bit field, it can be read | Although the RE flag is a separate single bit field, it can be read | |||
as an extension to the two-bit ECN field; the three concatenated bits | as an extension to the two-bit ECN field; the three concatenated bits | |||
in what we will call the extended ECN field (EECN) make eight | in what we will call the extended ECN field (EECN) make eight | |||
codepoints available. When the RE flag setting is "don't care", we | codepoints available. When the RE flag setting is "don't care", we | |||
use the RFC3168 names of the ECN codepoints, but [Re-TCP] proposes | use the RFC3168 names of the ECN codepoints, but | |||
the following six codepoint names for when there is a need to be more | [I-D.briscoe-tsvwg-re-ecn-tcp] proposes the following six codepoint | |||
specific. | names for when there is a need to be more specific. | |||
+--------+-------------+-------+-------------+----------------------+ | +--------+-------------+-------+-------------+----------------------+ | |||
| ECN | RFC3168 | RE | Extended | Re-ECN meaning | | | ECN | RFC3168 | RE | Extended | Re-ECN meaning | | |||
| field | codepoint | flag | ECN | | | | field | codepoint | flag | ECN | | | |||
| | | | codepoint | | | | | | | codepoint | | | |||
+--------+-------------+-------+-------------+----------------------+ | +--------+-------------+-------+-------------+----------------------+ | |||
| 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | |||
| | | | | transport | | | | | | | transport | | |||
| 00 | Not-ECT | 1 | FNE | Feedback not | | | 00 | Not-ECT | 1 | FNE | Feedback not | | |||
| | | | | established | | | | | | | established | | |||
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | ||||
| | | | | and RECT | | ||||
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | ||||
| | | | | transport | | ||||
| 10 | ECT(0) | 0 | --- | Legacy ECN use | | | 10 | ECT(0) | 0 | --- | Legacy ECN use | | |||
| | | | | only | | | | | | | only | | |||
| 10 | ECT(0) | 1 | --CU-- | Currently unused | | | 10 | ECT(0) | 1 | --CU-- | Currently unused | | |||
| | | | | | | | | | | | | | |||
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | ||||
| | | | | and RECT | | ||||
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | ||||
| | | | | transport | | ||||
| 11 | CE | 0 | CE(0) | Congestion | | | 11 | CE | 0 | CE(0) | Congestion | | |||
| | | | | experienced with | | | | | | | experienced with | | |||
| | | | | Re-Echo | | | | | | | Re-Echo | | |||
| 11 | CE | 1 | CE(-1) | Congestion | | | 11 | CE | 1 | CE(-1) | Congestion | | |||
| | | | | experienced | | | | | | | experienced | | |||
+--------+-------------+-------+-------------+----------------------+ | +--------+-------------+-------+-------------+----------------------+ | |||
Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re- | Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re- | |||
ECN | ECN | |||
4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN) | 4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN) | |||
As permitted by the ECN specification [RFC3168], a proposal is | As permitted by the ECN specification [RFC3168] and by the guidelines | |||
currently being advanced in the IETF to define different semantics | for specifying alternative semantics for the ECN field [RFC4774], a | |||
for how routers might mark the ECN field of certain packets. The | proposal is currently being advanced in the IETF to define different | |||
idea is to be able to notify congestion when the router's load | semantics for how queues might mark the ECN field of certain packets. | |||
The idea is to be able to notify congestion when the queue's load | ||||
approaches a logical limit, rather than the physical limit of the | approaches a logical limit, rather than the physical limit of the | |||
line. This new marking is called pre-congestion notification [PCN] | line. This new marking is called pre-congestion | |||
and we will use the term PCN-enabled router for a router that can | notification [I-D.eardley-pcn-marking-behaviour] and we will use the | |||
apply pre-congestion notification marking to the ECN fields of | term PCN-enabled queue for a queue that can apply pre-congestion | |||
packets. | notification marking to the ECN fields of packets. | |||
[RFC3168] recommends that a packet's Diffserv codepoint should | [RFC3168] recommends that a packet's Diffserv codepoint should | |||
determine which type of ECN marking it receives. A Diffserv per-hop | determine which type of ECN marking it receives. A PCN-capable | |||
behaviour (PHB) can specify that routers should apply pre-congestion | packet must meet two conditions; it must carry a DSCP that has been | |||
notification marking to PCN-capable packets. We will call this a | associated with PCN marking and it must carry an ECN field that turns | |||
PCN-enhanced PHB. A PCN-capable packet must meet two conditions, it | on PCN marking. | |||
must carry a DSCP that maps to a PCN-enhanced PHB and it must carry | ||||
an ECN field that turns on PCN marking. | ||||
As an example, the controlled load (CL) PHB might specify expedited | As an example, a packet carrying the VOICE-ADMIT | |||
forwarding as its scheduling behaviour and PCN marking as its | [I-D.ietf-tsvwg-admitted-realtime-dscp] DSCP would be associated with | |||
congestion marking behaviour. Then we would say the CL PHB is a PCN- | expedited forwarding [RFC3246] as its scheduling behaviour and pre- | |||
enhanced PHB, and that packets with a DSCP that maps to the CL PHB | congestion notification as its congestion marking behaviour. PCN | |||
and with ECN turned on are PCN-capable packets. | would only be turned on within a PCN-region by an ECN codepoint other | |||
than Not-ECT (00). Then we would describe packets with the VOICE- | ||||
ADMIT DSCP and with ECN turned on as PCN-capable packets. | ||||
[PCN] actually proposes that two logical limits should be used for | [I-D.eardley-pcn-marking-behaviour] actually proposes that two | |||
pre-congestion notification, with the higher limit as a back-stop for | logical limits can be used for pre-congestion notification, with the | |||
dealing with anomalous events. It envisages PCN will be used for | higher limit as a back-stop for dealing with anomalous events. It | |||
admission control of inelastic real-time traffic, so marking at the | envisages PCN will be used for admission control of inelastic real-time | |||
lower limit will trigger admission control, while at the higher limit | traffic, so marking at the lower limit will trigger admission | |||
it will trigger flow pre-emption. | control, while at the higher limit it will trigger flow termination. | |||
Because it needs two types of congestion marking, PCN seems to need | Because it needs two types of congestion marking, PCN needs four | |||
five states: Not-ECT, ECT (ECN-capable transport), the ECN Nonce, | states: Not PCN-capable (Not-PCN), PCN-capable but not PCN-marked | |||
Admission Marking (AM) and Flow Pre-emption Marking (PM). [PCN] | (NM), Admission Marked (AM) and Flow Termination Marked (TM). A | |||
proposes various alternative encodings of the ECN field, attempting | proposed encoding of the four required PCN states is shown on the | |||
various compromises to fit these five states into the four available | left of Table 2. Note that these codepoints of the ECN field only | |||
ECN codepoints. | take on the semantics of pre-congestion notification if they are | |||
combined with a Diffserv codepoint that the operator has configured | ||||
to be associated with PCN marking. | ||||
One of the five states to make room for is the ECN Nonce [RFC3540], | This encoding only correctly traverses an IP in IP tunnel if the | |||
but the capability we describe in this memo supersedes any need for | ideal decapsulation rules in [I-D.briscoe-tsvwg-ecn-tunnel] are | |||
the Nonce. The ECN Nonce is an elegant scheme, but it only allows a | followed when combining the ECN fields of the outer and inner | |||
sending node (or its proxy) to detect suppression of congestion | headers. If instead the decapsulation rules in [RFC3168] or | |||
marking in the feedback loop. Thus the Nonce requires the sender or | [RFC4301] are followed, any admission marking applied to an outer | |||
its proxy to be trusted to respond correctly to congestion. But this | header will be incorrectly removed on decapsulation at the tunnel | |||
is precisely the main cheat we want to protect against (as well as | egress. | |||
many others). | ||||
One of the compromise protocol encodings that [PCN] explores | The RFC3168 ECN field includes space for the experimental ECN | |||
("Alternative 5") leaves out support for the ECN Nonce. Therefore we | Nonce [RFC3540], which seems to require a fifth state if it is also | |||
use that one. This encoding of PCN markings is shown on the left of | needed with re-PCN. But re-PCN supersedes any need for the Nonce | |||
Table 2. Note that these codepoints of the ECN field only take on | within the PCN-region. The ECN Nonce is an elegant scheme, but it | |||
the semantics of pre-congestion notification if they are combined | only allows a sending node (or its proxy) to detect suppression of | |||
with a Diffserv codepoint that the operator has configured to cause | congestion marking in the feedback loop. Thus the Nonce requires the | |||
PCN marking, by mapping it to a PCN-enhanced PHB. | sender (or in our case the PCN ingress) to be trusted to respond | |||
correctly to congestion. But this is precisely the main cheat we | ||||
want to protect against (as well as many others). Also, the ECN | ||||
nonce only works once the receiver has placed packets in the same | ||||
order as they left the ingress, which cannot be done by an edge node | ||||
without adding unnecessary edge-edge packet ordering. Nonetheless, | ||||
if the ECN nonce were in use outside the PCN region (end-to-end), the | ||||
ingress would have to tunnel the arriving IP header across the PCN | ||||
region ([I-D.ietf-pcn-architecture]). | ||||
For the rest of this memo, we will not distinguish between Admission | For the rest of this memo, we will call both Admission Marking and | |||
Marking and Pre-emption Marking unless we need to be specific. We | Termination Marking "congestion marking" or "PCN marking" unless | |||
will call both "congestion marking". With the above encoding, | we need to be specific. With the above encoding, | |||
congestion marking can be read to mean any packet with the left-most | congestion marking can be read to mean any packet with the right-most | |||
bit of the ECN field set. | bit of the ECN field set. | |||
The re-ECN protocol can be used to control misbehaving sources | The re-ECN protocol can be used to control misbehaving sources | |||
whether congestion is with respect to a logical threshold (PCN) or | whether congestion is with respect to a logical threshold (PCN) or | |||
the physical line rate (ECN). In either case the RE flag can be used | the physical line rate (ECN). In either case the RE flag can be used | |||
to create an extended ECN field. For PCN-capable packets, the 8 | to create an extended ECN field. For PCN-capable packets, the 8 | |||
possible encodings of this 3-bit extended ECN (EECN) field are | possible encodings of this 3-bit extended PCN (EPCN) field are | |||
defined on the right of Table 2 below. The purposes of these | defined on the right of Table 2 below. The purposes of these | |||
different codepoints will be introduced in subsequent sections. | different codepoints will be introduced in subsequent sections. | |||
+-------+-----------------+------+--------------+-------------------+ | +--------+-----------+-------+-----------------+--------------------+ | |||
| ECN | PCN codepoint | RE | Extended ECN | Re-ECN meaning | | | ECN | PCN | RE | Extended PCN | Re-PCN meaning | | |||
| field | (Alternative 5) | flag | codepoint | | | | field | codepoint | flag | codepoint | | | |||
+-------+-----------------+------+--------------+-------------------+ | +--------+-----------+-------+-----------------+--------------------+ | |||
| 00 | Not-ECT | 0 | Not-RECT | Not | | | 00 | Not-PCN | 0 | Not-PCN | Not PCN-capable | | |||
| | | | | re-ECN-capable | | ||||
| | | | | transport | | | | | | | transport | | |||
| 00 | Not-ECT | 1 | FNE | Feedback not | | | 00 | Not-PCN | 1 | FNE | Feedback not | | |||
| | | | | established | | | | | | | established | | |||
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed | | | 10 | NM | 0 | Re-PCT-Echo | Re-echoed | | |||
| | | | | congestion and | | | | | | | congestion and | | |||
| | | | | RECT | | | | | | | Re-PCT | | |||
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | | 10 | NM | 1 | Re-PCT | Re-PCN capable | | |||
| | | | | transport | | | | | | | transport | | |||
| 10 | AM | 0 | AM(0) | Admission Marking | | | 01 | AM | 0 | AM(0) | Admission Marking | | |||
| | | | | with Re-Echo | | | | | | | with Re-Echo | | |||
| 10 | AM | 1 | AM(-1) | Admission Marking | | | 01 | AM | 1 | AM(-1) | Admission Marking | | |||
| | | | | | | | | | | | | | |||
| 11 | PM | 0 | PM(0) | Pre-emption | | | 11 | TM | 0 | TM(0) | Termination | | |||
| | | | | Marking with | | | | | | | Marking with | | |||
| | | | | Re-Echo | | | | | | | Re-Echo | | |||
| 11 | PM | 1 | PM(-1) | Pre-emption | | | 11 | TM | 1 | TM(-1) | Termination | | |||
| | | | | Marking | | | | | | | Marking | | |||
+-------+-----------------+------+--------------+-------------------+ | +--------+-----------+-------+-----------------+--------------------+ | |||
Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre- | Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre- | |||
congestion Notification (PCN) | congestion Notification (PCN) | |||
Note that Table 2 shows re-PCN uses ECT(0) but Table 1 shows re-ECN | ||||
uses ECT(1) for the unmarked state. The difference is intended-- | ||||
although it makes it harder to remember the two schemes, it makes | ||||
them both safer during incremental deployment. | ||||
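As an illustration only, the right-hand (-02) column of Table 2 can be sketched as a lookup from the two-bit ECN field and the RE flag to an extended PCN codepoint name. The names `EPCN_CODEPOINTS`, `epcn_name` and `is_congestion_marked` are hypothetical and do not appear in either draft; the sketch also assumes the packet already carries a Diffserv codepoint that the operator has associated with PCN marking.

```python
# Hypothetical sketch of Table 2 (right-hand, -02 column).
# Key: (two-bit ECN field, RE flag) -> extended PCN codepoint name.
EPCN_CODEPOINTS = {
    (0b00, 0): "Not-PCN",
    (0b00, 1): "FNE",
    (0b10, 0): "Re-PCT-Echo",
    (0b10, 1): "Re-PCT",
    (0b01, 0): "AM(0)",
    (0b01, 1): "AM(-1)",
    (0b11, 0): "TM(0)",
    (0b11, 1): "TM(-1)",
}


def epcn_name(ecn: int, re_flag: int) -> str:
    """Return the extended PCN codepoint name for an ECN field and RE flag."""
    return EPCN_CODEPOINTS[(ecn & 0b11, re_flag & 0b1)]


def is_congestion_marked(ecn: int) -> bool:
    """With the -02 encoding, AM and TM both have the right-most ECN bit set."""
    return bool(ecn & 0b01)
```

With the -01 encoding on the left of Table 2 the test would instead be on the left-most bit, so the helper above applies to the -02 column only.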
4.3. Protocol Operation | 4.3. Protocol Operation | |||
4.3.1. Protocol Operation for an Established Flow | 4.3.1. Protocol Operation for an Established Flow | |||
The re-ECN protocol involves a simple tweak to the action of the | The re-PCN protocol involves a simple addition to the action of the | |||
gateway at the ingress edge of the CL region. In the deployment | gateway at the ingress edge of the PCN region (the PCN-ingress-node). | |||
model just described [I-D.ietf-pcn-architecture], for each active | But first we will recap how PCN works without the addition. For each | |||
traffic aggregate across the CL region (CL-region-aggregate) the | active traffic aggregate across a PCN region (ingress-egress- | |||
ingress gateway will hold a fairly recent Congestion-Level-Estimate | aggregate) the egress gateway measures the level of PCN marking and | |||
that the egress gateway will have fed back to it, piggybacked on the | feeds it back to the ingress piggy-backed as 'PCN-feedback- | |||
signalling that sets up each flow. For instance, one aggregate might | information' on any control signal passing between the nodes (e.g. | |||
have been experiencing 3% pre-congestion (that is, congestion marked | every flow set-up, refresh or tear-down). Therefore the ingress | |||
octets whether Admission Marked or Pre-emption Marked). In this | gateway will always hold a fairly recent (typically at most 30 seconds) | |||
case, the ingress gateway MUST clear the RE flag to "0" for the same | estimate of the ingress-egress-aggregate congestion level. For | |||
percentage of octets of CL-packets (3%) and set it to "1" in the rest | instance, one aggregate might have been experiencing 3% pre- | |||
congestion (that is, congestion marked octets whether Admission | ||||
Marked or Termination Marked). | ||||
To comply with the re-PCN protocol, for all PCN packets in each | ||||
ingress-egress-aggregate the ingress gateway MUST clear the RE flag | ||||
to "0" for the same percentage of octets as its current estimate of | ||||
congestion on the aggregate (e.g. 3%) and set it to "1" in the rest | ||||
(97%). Appendix A.1 gives a simple pseudo-code algorithm that the | (97%). Appendix A.1 gives a simple pseudo-code algorithm that the | |||
ingress gateway may use to do this. | ingress gateway may use to do this. | |||
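The blanking rule above can be sketched as follows. This is not the Appendix A.1 algorithm, which is not reproduced in this section; it is one possible approach, assuming a simple octet-credit accumulator, and the class name `REBlanker` and its method are hypothetical.

```python
class REBlanker:
    """Hypothetical per-aggregate RE flag setter for an ingress gateway.

    Blanks (clears to 0) the RE flag on roughly the same fraction of
    octets as the current congestion estimate for the aggregate, and
    sets it to 1 on the rest.
    """

    def __init__(self, congestion_estimate: float) -> None:
        self.estimate = congestion_estimate  # e.g. 0.03 for 3%
        self.credit = 0.0                    # octets owed to blanking

    def update_estimate(self, congestion_estimate: float) -> None:
        """Install a newer estimate fed back from the egress."""
        self.estimate = congestion_estimate

    def re_flag(self, packet_octets: int) -> int:
        """Return the RE flag value (0 = blanked, 1 = set) for one packet."""
        self.credit += packet_octets * self.estimate
        if self.credit >= packet_octets:
            self.credit -= packet_octets
            return 0
        return 1
```

An accumulator is used here rather than a per-packet random draw so that the blanked octet fraction tracks the estimate closely even over short runs; a randomised choice would be an equally plausible design.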
The RE flag is set and cleared this way round for incremental | The RE flag is set and cleared this way round for incremental | |||
deployment reasons (see [Re-TCP]). To avoid confusion we will use | deployment reasons (see Section 7). To avoid confusion we will use | |||
the term `blanking' (rather than marking) when the RE flag is cleared | the term `blanking' (rather than marking) when the RE flag is cleared | |||
to "0", so we will talk of the `RE blanking fraction' as the fraction | to "0", so we will talk of the `RE blanking fraction' as the fraction | |||
of octets with the RE flag cleared to "0". | of octets with the RE flag cleared to "0". | |||
^ | ^ | |||
| | | | |||
| RE blanking fraction | | RE blanking fraction | |||
3% | +----------------------------+====+ | 3% | +----------------------------+====+ | |||
| | | | | | | | | | |||
2% | | | | | 2% | | | | | |||
skipping to change at page 20, line 51 | skipping to change at page 24, line 6 | |||
| ^ ^ | | | ^ ^ | | |||
ingress | | egress | ingress | | egress | |||
1.00% 2.00% marking fraction | 1.00% 2.00% marking fraction | |||
Figure 3: Example Extended ECN codepoint Marking fractions | Figure 3: Example Extended ECN codepoint Marking fractions | |||
(Imprecise) | (Imprecise) | |||
Figure 3 illustrates our example. The horizontal axis represents the | Figure 3 illustrates our example. The horizontal axis represents the | |||
index of each congestible resource (typically queues) along a path | index of each congestible resource (typically queues) along a path | |||
through the Internet. The two superimposed plots show the fraction | through the Internet. The two superimposed plots show the fraction | |||
of each ECN codepoint observed along this path, assuming there are | of each extended PCN codepoint observed along this path, assuming | |||
two congested routers somewhere within domains A and C. And Table 3 | there are two congested routers somewhere within domains A and C. And | |||
below shows the downstream pre-congestion measured at various border | Table 3 below shows the downstream pre-congestion measured at various | |||
observation points along the path. Figure 4 (later) shows the same | border observation points along the path. Figure 4 (later) shows the | |||
results of these subtractions, but in graphical form like the above | same results of these subtractions, but in graphical form like the | |||
figure. The tabulated figures are actually reasonable approximations | above figure. The tabulated figures are actually reasonable | |||
derived from more precise formulae given in Appendix A of [Re-TCP]. | approximations derived from more precise formulae given in Appendix A | |||
The RE flag is not changed by interior routers, so it can be seen | of [I-D.briscoe-tsvwg-re-ecn-tcp]. The RE flag is not changed by | |||
that it acts as a reference against which the congestion marking | interior routers, so it can be seen that it acts as a reference | |||
fraction can be compared along the path. | against which the congestion marking fraction can be compared along | |||
the path. | ||||
+--------------------------+---------------------------------------+ | +--------------------------+---------------------------------------+ | |||
| Border observation point | Approximate Downstream pre-congestion | | | Border observation point | Approximate Downstream pre-congestion | | |||
+--------------------------+---------------------------------------+ | +--------------------------+---------------------------------------+ | |||
| ingress -- A | 3% - 0% = 3% | | | ingress -- A | 3% - 0% = 3% | | |||
| A -- B | 3% - 1% = 2% | | | A -- B | 3% - 1% = 2% | | |||
| B -- C | 3% - 1% = 2% | | | B -- C | 3% - 1% = 2% | | |||
| C -- egress | 3% - 3% = 0% | | | C -- egress | 3% - 3% = 0% | | |||
+--------------------------+---------------------------------------+ | +--------------------------+---------------------------------------+ | |||
skipping to change at page 21, line 36 | skipping to change at page 24, line 40 | |||
aggregate using the most recent feedback from the relevant egress, | aggregate using the most recent feedback from the relevant egress, | |||
arriving with each new reservation, or each refresh. These updates | arriving with each new reservation, or each refresh. These updates | |||
arrive relatively infrequently compared to the speed with which | arrive relatively infrequently compared to the speed with which | |||
congestion changes. Although this feedback will always be out of | congestion changes. Although this feedback will always be out of | |||
date, on average positive errors should cancel out negative over a | date, on average positive errors should cancel out negative over a | |||
sufficiently long duration. | sufficiently long duration. | |||
In summary, the network adds pre-congestion marking in the forward | In summary, the network adds pre-congestion marking in the forward | |||
data path, the egress feeds its level back to the ingress in RSVP (or | data path, the egress feeds its level back to the ingress in RSVP (or | |||
similar signalling), then the ingress gateway re-echoes it into the | similar signalling), then the ingress gateway re-echoes it into the | |||
forward data path by blanking the RE flag. Hence the name re-ECN. | forward data path by blanking the RE flag. Then at any border within | |||
Then at any border within the Diffserv region, the pre-congestion | the PCN-region, the pre-congestion marking that every passing packet | |||
marking that every passing packet will be expected to experience | will be expected to experience downstream can be measured to be the | |||
downstream can be measured to be the RE blanking fraction minus the | RE blanking fraction minus the congestion marking fraction. | |||
congestion marking fraction. | ||||
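The border measurement summarised above can be sketched as a bulk octet counter. This is a minimal illustration under stated assumptions: the class `BorderMeter` and its methods are hypothetical, the caller is assumed to pass only PCN-capable packets, and any special treatment of FNE packets is ignored.

```python
class BorderMeter:
    """Hypothetical bulk meter at a border between two domains.

    Counts octets over all passing PCN packets, without per-flow or
    per-aggregate state, and estimates expected downstream congestion
    as the RE blanking fraction minus the congestion marking fraction.
    """

    def __init__(self) -> None:
        self.total_octets = 0
        self.blanked_octets = 0  # RE flag cleared to 0
        self.marked_octets = 0   # Admission or Termination Marked

    def observe(self, octets: int, re_flag: int, congestion_marked: bool) -> None:
        """Account for one passing packet."""
        self.total_octets += octets
        if re_flag == 0:
            self.blanked_octets += octets
        if congestion_marked:
            self.marked_octets += octets

    def downstream_congestion(self) -> float:
        """Return v ~= p - u as a fraction of observed octets."""
        if self.total_octets == 0:
            return 0.0
        return (self.blanked_octets - self.marked_octets) / self.total_octets
```

Feeding the meter traffic that matches the A -- B row of Table 3 (3% of octets blanked, 1% congestion marked) should give a downstream estimate of about 2%.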
4.3.2. Aggregate Bootstrap | 4.3.2. Aggregate Bootstrap | |||
When a new reservation PATH message arrives at the egress, if there | When a new reservation PATH message arrives at the egress, if there | |||
are currently no flows in progress from the same ingress, there will | are currently no flows in progress from the same ingress, there will | |||
be no state maintaining the current level of pre-congestion marking | be no state maintaining the current level of pre-congestion marking | |||
for the aggregate. While the reservation signalling continues onward | for the aggregate. In the case of RSVP reservation signalling, while | |||
towards the receiving host, the egress gateway returns an RSVP | the signal continues onward towards the receiving host, the egress | |||
message to the ingress with a flag [RSVP-ECN] asking the ingress to | gateway can return an RSVP message to the ingress with a | |||
send a specified number of data probes between them. This bootstrap | flag [RSVP-ECN] asking the ingress to send a specified number of data | |||
behaviour is all described in the deployment | probes between them. The more general possibilities for bootstrap | |||
model [I-D.ietf-pcn-architecture]. | behaviour are described in the PCN | |||
architecture [I-D.ietf-pcn-architecture], including using the | ||||
reservation signal itself as a probe. | ||||
However, with our new re-ECN scheme, the ingress does not know what | However, with our new re-PCN scheme, the ingress does not know what | |||
proportion of the data probes should have the RE flag blanked, | proportion of the data probes should have the RE flag blanked, | |||
because it has no estimate yet of pre-congestion for the path across | because it has no estimate yet of pre-congestion for the path across | |||
the Diffserv region. | the PCN-region. | |||
To be conservative, following the guidance for specifying other re- | To be conservative, following the guidance for specifying other re- | |||
ECN transports in [Re-TCP], the ingress SHOULD set the FNE codepoint | ECN transports in [I-D.briscoe-tsvwg-re-ecn-tcp], the ingress SHOULD | |||
of the extended ECN header in all probe packets (Table 2). As per | set the FNE codepoint of the extended PCN header in all probe packets | |||
the deployment model, the egress gateway measures the fraction of | (Table 2). As per the PCN deployment model, the egress gateway | |||
congestion-marked probe octets and feeds back the resulting pre- | measures the fraction of congestion-marked probe octets and feeds | |||
congestion level to the ingress, piggy-backed on the returning | back the resulting pre-congestion level to the ingress, piggy-backed | |||
reservation response (RESV) for the new flow. Probe packets are | on the returning reservation response (RESV) for the new flow. Probe | |||
identifiable by the egress because they have the ingress as the | packets are identifiable by the egress because they carry the FNE | |||
source and the egress as the destination in the IP header. | codepoint. | |||
It may seem inadvisable to expect the FNE codepoint to be set on | It may seem inadvisable to expect the FNE codepoint to be set on | |||
probes, given legacy firewalls etc. might discard such packets | probes, given legacy firewalls etc. might discard such packets | |||
(because this flag had no previous legitimate use). However, in the | (because this flag had no previous legitimate use). However, in the | |||
deployment scenarios envisaged, each domain in the Diffserv region | deployment scenarios envisaged, each domain in the PCN-region has to | |||
has to be explicitly configured to support the controlled load | be explicitly configured to support the admission controlled service. | |||
service. So, before deploying the service, the operator MUST | So, before deploying the service, the operator MUST reconfigure such | |||
reconfigure such a misbehaving middlebox to allow through packets | a badly implemented middlebox to allow through packets with the RE | |||
with the RE flag set. | flag set. | |||
Note that we have said SHOULD rather than MUST for the FNE setting | Note that we have said SHOULD rather than MUST for the FNE setting | |||
behaviour of the ingress for probe packets. This entertains the | behaviour of the ingress for probe packets. This entertains the | |||
possibility of an ingress implementation having the benefit of other | possibility of an ingress implementation having the benefit of other | |||
knowledge of the path, which it re-uses for a newly starting | knowledge of the path, which it re-uses for a newly starting | |||
aggregate. For instance, it may hold cached information from a | aggregate. For instance, it may hold cached information from a | |||
recent use of the aggregate that is still sufficiently current to be | recent use of the aggregate that is still sufficiently current to be | |||
useful. | useful. If not all probe packets are set to FNE, the ingress will | |||
have to ensure probe packets are identifiable by some other means, | ||||
perhaps by using the egress as the destination address. | ||||
It might seem pedantic worrying about these few probe packets, but | It might seem pedantic worrying about these few probe packets, but | |||
this behaviour ensures the system is safe, even if the proportion of | this behaviour ensures the system is safe, even if the proportion of | |||
probe packets becomes large. | probe packets becomes large. | |||
4.3.3. Flow Bootstrap | 4.3.3. Flow Bootstrap | |||
It might be expected that a new flow within an active aggregate would | It might be expected that a new flow within an active aggregate would | |||
need no special bootstrap behaviour. If there was an aggregate | need no special bootstrap behaviour. If there was an aggregate | |||
already in progress between the gateways the new flow was about to | already in progress between the gateways the new flow was about to | |||
skipping to change at page 23, line 21 | skipping to change at page 26, line 32 | |||
that sanctions may be too strict at the interface before the egress | that sanctions may be too strict at the interface before the egress | |||
gateway. It will often be possible to apply sanctions at the | gateway. It will often be possible to apply sanctions at the | |||
granularity of aggregates rather than flows, but in an internetworked | granularity of aggregates rather than flows, but in an internetworked | |||
environment it cannot be guaranteed that aggregates will be | environment it cannot be guaranteed that aggregates will be | |||
identifiable in remote networks. So setting FNE at the start of each | identifiable in remote networks. So setting FNE at the start of each | |||
flow is a safe strategy. For instance, a remote network may have | flow is a safe strategy. For instance, a remote network may have | |||
equal cost multi-path (ECMP) routing enabled, causing different flows | equal cost multi-path (ECMP) routing enabled, causing different flows | |||
between the same gateways to traverse different paths. | between the same gateways to traverse different paths. | |||
After an idle period of more than 1 second, the ingress gateway | After an idle period of more than 1 second, the ingress gateway | |||
SHOULD set the EECN field of the next packet it sends to FNE. This | SHOULD set the EPCN field of the next packet it sends to FNE. This | |||
allows the design of network policers to be deterministic (see | allows the design of network policers to be deterministic (see | |||
[Re-TCP]). | [I-D.briscoe-tsvwg-re-ecn-tcp]). | |||
However, if the ingress gateway can guarantee that the network(s) | However, if the ingress gateway can guarantee that the network(s) | |||
that will carry the flow to its egress gateway all use a common | that will carry the flow to its egress gateway all use a common | |||
identifier for the aggregate (e.g. a single MPLS network without ECMP | identifier for the aggregate (e.g. a single MPLS network without ECMP | |||
routing), it MAY NOT set FNE when it adds a new flow to an active | routing), it MAY NOT set FNE when it adds a new flow to an active | |||
aggregate. And an FNE packet need only be sent if a whole aggregate | aggregate. And an FNE packet need only be sent if a whole aggregate | |||
has been idle for more than 1 second. | has been idle for more than 1 second. | |||
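The idle rule above can be sketched as a small piece of ingress state. This is only an illustration under stated assumptions: the class name, the method name and the `per_aggregate` switch are hypothetical, and treating the very first packet seen for a key as FNE is our reading of the bootstrap behaviour, not something the draft spells out in this form.

```python
IDLE_THRESHOLD_S = 1.0  # "more than 1 second", taken from the text above


class FneScheduler:
    """Hypothetical ingress helper: should the next packet be sent as FNE?"""

    def __init__(self, per_aggregate=False):
        # per_aggregate=True models the relaxed case where every network on
        # the path uses a common identifier for the aggregate, so FNE is only
        # needed after the whole aggregate has been idle.
        self.per_aggregate = per_aggregate
        self.last_sent = {}  # key -> time the previous packet was sent

    def next_is_fne(self, aggregate_id, flow_id, now):
        key = aggregate_id if self.per_aggregate else (aggregate_id, flow_id)
        last = self.last_sent.get(key)
        self.last_sent[key] = now
        # Assumption: the first packet for a key is treated like an idle restart.
        return last is None or (now - last) > IDLE_THRESHOLD_S
```

With the default per-flow setting, a new flow joining an active aggregate starts with FNE; with `per_aggregate=True` it does not, unless the aggregate as a whole has been idle for more than the threshold.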
4.3.4. Router Forwarding Behaviour | 4.3.4. Router Forwarding Behaviour | |||
Adding re-ECN works well without modifying the forwarding behaviour | Adding re-PCN works well with the regular PCN forwarding behaviour of | |||
of any routers. However, below, two changes are proposed when | interior queues. However, below, two optional changes are proposed | |||
forwarding packets with a per-hop-behaviour that requires pre- | when forwarding packets with a per-hop-behaviour that requires pre- | |||
congestion notification: | congestion notification: | |||
Preferential drop: When a router cannot avoid dropping ECN-capable | Preferential drop: When a router cannot avoid dropping PCN-capable | |||
packets, preferential dropping of packets with different extended | packets, preferential dropping of packets with different extended | |||
ECN codepoints SHOULD be implemented between packets within a PHB | PCN codepoints SHOULD be implemented between packets within a PHB | |||
that uses PCN marking. The drop preference order to use is | that uses PCN marking. The drop preference order to use is | |||
defined in Table 4. Note that to reduce configuration complexity, | defined in Table 4. Note that to reduce configuration complexity, | |||
Re-Echo and FNE MAY be given the same drop preference, but if | Re-PCT-Echo and FNE MAY be given the same drop preference, but if | |||
feasible, FNE should be dropped in preference to Re-Echo. | feasible, FNE SHOULD be dropped in preference to Re-PCT-Echo. | |||
+---------+-------+----------------+---------+----------------------+ | ||||
| ECN | RE | Extended ECN | Drop | Re-ECN meaning | | ||||
| field | flag | codepoint | Pref | | | ||||
+---------+-------+----------------+---------+----------------------+ | ||||
| 01 | 0 | Re-Echo | 5/4 | Re-echoed congestion | | ||||
| | | | | and RECT | | ||||
| 00 | 1 | FNE | 4 | Feedback not | | ||||
| | | | | established | | ||||
| 01 | 1 | RECT | 3 | Re-ECN capable | | ||||
| | | | | transport | | ||||
| 10 | 0 | AM(0) | 3 | Admission Marking | | ||||
| | | | | with Re-Echo | | ||||
| 10 | 1 | AM(-1) | 3 | Admission Marking | | ||||
| | | | | | | ||||
| 11 | 0 | PM(0) | 2 | Pre-emption Marking | | ||||
| | | | | with Re-Echo | | ||||
| 11 | 1 | PM(-1) | 2 | Pre-emption Marking | | ||||
| | | | | | | ||||
| 00 | 0 | Not-RECT | 1 | Not re-ECN-capable | | ||||
| | | | | transport | | ||||
+---------+-------+----------------+---------+----------------------+ | ||||
Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st) | ||||
Given this proposal is being advanced at the same time as PCN | If this proposal were advanced at the same time as PCN itself, we | |||
itself, we strongly RECOMMEND that preferential drop based on | would recommend that preferential drop based on extended PCN | |||
extended ECN codepoint is added to router forwarding at the same | codepoint SHOULD be added to router forwarding at the same time as | |||
time as PCN marking. Preferential dropping can be difficult to | PCN marking. Preferential dropping can be difficult to implement, | |||
implement, but we strongly RECOMMEND this security-related re-ECN | but we RECOMMEND this security-related re-PCN improvement where | |||
improvement where feasible as it is an effective defence against | feasible as it is an effective defence against flooding attacks. | |||
flooding attacks. | ||||
Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE | Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE | |||
flag as well as the ECN field to decide whether to drop or mark | flag as well as the ECN field to decide whether to drop or mark | |||
PCN DSCPs. They MUST choose drop if the codepoint of this | PCN DSCPs. They MUST choose drop if the codepoint of this | |||
extended ECN field is Not-RECT. Otherwise they SHOULD mark | extended ECN field is Not-PCN. Otherwise they SHOULD mark | |||
(unless, of course, buffer space is exhausted). | (unless, of course, buffer space is exhausted). | |||
A PCN-capable router MUST NOT ever congestion mark a packet | A PCN-capable router MUST NOT ever congestion mark a packet | |||
carrying the Not-RECT codepoint because the transport will only | carrying the Not-PCN codepoint because the transport will only | |||
understand drop, not congestion marking. But a PCN-capable router | understand drop, not congestion marking. But a PCN-capable router | |||
can mark rather than drop an FNE packet, even though its ECN field | can mark rather than drop an FNE packet, even though its ECN field | |||
when looked at in isolation is '00' which appears to be a legacy | when looked at in isolation is '00' which appears to be a legacy | |||
Not-ECT packet. Therefore, if a packet's RE flag is '1', even if | Not-ECT packet. Therefore, if a packet's RE flag is '1', even if | |||
its ECN field is '00', a PCN-enabled router SHOULD use congestion | its ECN field is '00', a PCN-enabled router SHOULD use congestion | |||
marking. This allows the `feedback not established' (FNE) | marking. This allows the `feedback not established' (FNE) | |||
codepoint to be used for probe packets, in order to pick up PCN | codepoint to be used for probe packets, in order to pick up PCN | |||
marking when bootstrapping an aggregate. | marking when bootstrapping an aggregate. | |||
ECN marking rather than dropping of FNE packets MUST only be | PCN marking rather than dropping of FNE packets MUST only be | |||
deployed in controlled environments, such as that in | deployed in controlled environments, such as that in | |||
[I-D.ietf-pcn-architecture], where the presence of an egress node | [I-D.ietf-pcn-architecture], where the presence of an egress node | |||
that understands ECN marking is assured. Congestion events might | that understands PCN marking is assured. Congestion events might | |||
otherwise be ignored if the receiver only understands drop, rather | otherwise be ignored if the receiver only understands drop, rather | |||
than ECN marking. This is because there is no guarantee that ECN | than PCN marking. This is because there is no guarantee that PCN | |||
capability has been negotiated if feedback is not established | capability has been negotiated if feedback is not established | |||
(FNE). Also, [Re-TCP] places the strong condition that a router | (FNE). Also, [I-D.briscoe-tsvwg-re-ecn-tcp] places the strong | |||
MUST apply drop rather than marking to FNE packets unless it can | condition that a router MUST apply drop rather than marking to FNE | |||
guarantee that FNE packets are rate limited either locally or | packets unless it can guarantee that FNE packets are rate limited | |||
upstream. | either locally or upstream. | |||
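The mark-or-drop rules in this item can be summarised as a short decision function. It is a sketch, not a router implementation: the function name and the `fne_marking_allowed` parameter are hypothetical, and the function is assumed to be called only once the queue has already decided that this packet needs a congestion signal. Field values follow the codepoints quoted in the text ('00' with RE = 0 is Not-PCN, '00' with RE = 1 is FNE).

```python
def congestion_action(pcn_field, re_flag, buffer_exhausted=False,
                      fne_marking_allowed=False):
    """Return "mark" or "drop" for a packet that must be signalled.

    fne_marking_allowed is a hypothetical configuration flag. It stands for
    "this is a controlled environment with an egress that understands
    marking, and FNE packets are rate limited locally or upstream".
    """
    if buffer_exhausted:
        return "drop"              # nothing else is possible
    if pcn_field == 0b00 and re_flag == 0:
        return "drop"              # Not-PCN: transport only understands drop
    if pcn_field == 0b00 and re_flag == 1:
        # FNE: marking is only safe under the conditions described above
        return "mark" if fne_marking_allowed else "drop"
    return "mark"                  # all other codepoints SHOULD be marked
```

Defaulting `fne_marking_allowed` to false is a deliberately conservative choice, so that FNE packets are dropped unless an operator has explicitly asserted the controlled-environment conditions.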
+---------+-------+-----------------+---------+---------------------+ | ||||
| PCN | RE | Extended PCN | Drop | Re-PCN meaning | | ||||
| field | flag | codepoint | Pref | | | ||||
+---------+-------+-----------------+---------+---------------------+ | ||||
| 10 | 0 | Re-PCT-Echo | 5/4 | Re-echoed | | ||||
| | | | | congestion and | | ||||
| | | | | Re-PCT | | ||||
| 00 | 1 | FNE | 4 | Feedback not | | ||||
| | | | | established | | ||||
| 10 | 1 | Re-PCT | 3 | Re-PCN capable | | ||||
| | | | | transport | | ||||
| 01 | 0 | AM(0) | 3 | Admission Marking | | ||||
| | | | | with Re-Echo | | ||||
| 01 | 1 | AM(-1) | 3 | Admission Marking | | ||||
| | | | | | | ||||
| 11 | 0 | TM(0) | 2 | Termination Marking | | ||||
| | | | | with Re-Echo | | ||||
| 11 | 1 | TM(-1) | 2 | Termination Marking | | ||||
| | | | | | | ||||
| 00 | 0 | Not-PCN | 1 | Not PCN-capable | | ||||
| | | | | transport | | ||||
+---------+-------+-----------------+---------+---------------------+ | ||||
Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st) | ||||
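As an illustration of how the drop preferences in the -02 version of Table 4 might be applied, the following sketch picks the packet to discard when a drop cannot be avoided. The function names and the `merge_echo_and_fne` option are hypothetical; the option models the permitted simplification of giving Re-PCT-Echo and FNE the same preference (the "5/4" entry in the table).

```python
# (PCN field, RE flag) -> drop preference from Table 4 (1 = drop first)
DROP_PREF = {
    (0b10, 0): 5,  # Re-PCT-Echo (the table lists "5/4")
    (0b00, 1): 4,  # FNE
    (0b10, 1): 3,  # Re-PCT
    (0b01, 0): 3,  # AM(0)
    (0b01, 1): 3,  # AM(-1)
    (0b11, 0): 2,  # TM(0)
    (0b11, 1): 2,  # TM(-1)
    (0b00, 0): 1,  # Not-PCN
}


def drop_pref(pcn_field, re_flag, merge_echo_and_fne=False):
    pref = DROP_PREF[(pcn_field, re_flag)]
    if merge_echo_and_fne and pref == 5:
        return 4  # Re-PCT-Echo treated the same as FNE
    return pref


def pick_victim(queue, merge_echo_and_fne=False):
    """Return the index of the packet to drop from a non-empty queue.

    queue is a list of (pcn_field, re_flag) tuples. The packet with the
    lowest preference number is chosen. Breaking ties in favour of the
    earliest packet is an arbitrary choice made for this sketch.
    """
    return min(range(len(queue)),
               key=lambda i: drop_pref(*queue[i], merge_echo_and_fne))
```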
4.3.5. Extensions | 4.3.5. Extensions | |||
If a different signalling system, such as NSIS, were used, but it | If a different signalling system, such as NSIS, were used but it | |||
provided admission control in a similar way, using pre-congestion | provided admission control in a similar way using pre-congestion | |||
notification (e.g. Arumaithurai [I-D.arumaithurai-nsis-pcn] or | notification (e.g. Arumaithurai [I-D.arumaithurai-nsis-pcn] or | |||
RMD [I-D.ietf-nsis-rmd]) we believe re-ECN could be used to protect | RMD [I-D.ietf-nsis-rmd]), we believe re-PCN could be used to protect | |||
against misbehaving networks in the same way as proposed above. | against misbehaving networks in the same way as proposed above. | |||
5. Emulating Border Policing with Re-ECN | 5. Emulating Border Policing with Re-ECN | |||
Note that the re-ECN protocol described in Section 4 above would | The following sections are informative, not normative. The re-PCN | |||
require standardisation, whereas operators acting in their own | protocol described in Section 4 above would require standardisation, | |||
interests would be expected to deploy policing and monitoring | whereas operators acting in their own interests would be expected to | |||
functions similar to those proposed in the sections below without any | deploy policing and monitoring functions similar to those proposed in | |||
further need for standardisation by the IETF. Flexibility is | the sections below without any further need for standardisation by | |||
expected in exactly how policing and monitoring is done. | the IETF. Flexibility is expected in exactly how policing and | |||
monitoring is done. | ||||
5.1. Informal Terminology | 5.1. Informal Terminology | |||
In the rest of this memo, where the context makes it clear, we will | In the rest of this memo, where the context makes it clear, we will | |||
sometimes loosely use the term `congestion' rather than using the | sometimes loosely use the term `congestion' rather than using the | |||
stricter `downstream pre-congestion'. Also we will loosely talk of | stricter `downstream pre-congestion'. Also we will loosely talk of | |||
positive or negative flows, meaning flows where the moving average of | positive or negative flows, meaning flows where the moving average of | |||
the downstream pre-congestion metric is persistently positive or | the downstream pre-congestion metric is persistently positive or | |||
negative. The notion of a negative metric arises because it is | negative. The notion of a negative metric arises because it is | |||
derived by subtracting one metric from another. Of course actual | derived by subtracting one metric from another. Of course actual | |||
downstream congestion cannot be negative, only the metric can | downstream congestion cannot be negative, only the metric can | |||
(whether due to time lags or deliberate malice). | (whether due to time lags or deliberate malice). | |||
Just as we will loosely talk of positive and negative flows, we will | Just as we will loosely talk of positive and negative flows, we will | |||
also talk of positive or negative packets, meaning packets that | also talk of positive or negative packets, meaning packets that | |||
contribute positively or negatively to downstream pre-congestion. | contribute positively or negatively to downstream pre-congestion. | |||
Therefore packets can be considered to have a `worth' of +1, 0 or -1, | Therefore packets can be considered to have a `worth' of +1, 0 or -1, | |||
which, when multiplied by their size, indicates their contribution to | which, when multiplied by their size, indicates their contribution to | |||
downstream congestion. Packets will usually be sent with a worth of | downstream congestion. Packets will usually be initialised by the | |||
0. Blanking the RE flag increments the worth of a packet to +1. | PCN ingress with a worth of 0. Blanking the RE flag increments the | |||
Congestion marking a packet decrements its worth (whether admission | worth of a packet to +1. Congestion marking a packet decrements its | |||
marking or pre-emption marking). Congestion marking a previously | worth (whether admission marking or termination marking). Congestion | |||
blanked packet cancel out the positive and negative worth of each | marking a previously blanked packet cancels out the positive worth | |||
marking (a worth of 0). The FNE codepoint is an exception. It has | with the negative worth of the congestion marking (resulting in a | |||
the same positive worth as a packet with the Re-Echo codepoint. The | packet worth 0). The FNE codepoint is an exception. It has the same | |||
table below specifies unambiguously the worth of each extended ECN | positive worth as a packet with the Re-PCT-Echo codepoint. The table | |||
below specifies unambiguously the worth of each extended PCN | ||||
codepoint. Note the order is different from the previous table to | codepoint. Note the order is different from the previous table to | |||
emphasise how congestion marking processes decrement the worth. | emphasise how congestion marking processes decrement the worth (with | |||
the exception of FNE). | ||||
+---------+-------+-----------------+-------+-----------------------+ | +---------+-------+------------------+-------+----------------------+ | |||
| ECN | RE | Extended ECN | Worth | Re-ECN meaning | | | ECN | RE | Extended PCN | Worth | Re-PCN meaning | | |||
| field | flag | codepoint | | | | | field | flag | codepoint | | | | |||
+---------+-------+-----------------+-------+-----------------------+ | +---------+-------+------------------+-------+----------------------+ | |||
| 00 | 0 | Not-RECT | n/a | Not re-ECN-capable | | | 00 | 0 | Not-PCN | n/a | Not PCN-capable | | |||
| | | | | transport | | | | | | | transport | | |||
| 01 | 0 | Re-Echo | +1 | Re-echoed congestion | | | 10 | 0 | Re-PCT-Echo | +1 | Re-echoed congestion | | |||
| | | | | and RECT | | | | | | | and Re-PCT | | |||
| 10 | 0 | AM(0) | 0 | Admission Marking | | | 01 | 0 | AM(0) | 0 | Admission Marking | | |||
| | | | | with Re-Echo | | | | | | | with Re-Echo | | |||
| 11 | 0 | PM(0) | 0 | Pre-emption Marking | | | 11 | 0 | TM(0) | 0 | Termination Marking | | |||
| | | | | with Re-Echo | | | | | | | with Re-Echo | | |||
| 00 | 1 | FNE | +1 | Feedback not | | | 00 | 1 | FNE | +1 | Feedback not | | |||
| | | | | established | | | | | | | established | | |||
| 01 | 1 | RECT | 0 | Re-ECN capable | | | 10 | 1 | Re-PCT | 0 | Re-PCN capable | | |||
| | | | | transport | | | | | | | transport | | |||
| 10 | 1 | AM(-1) | -1 | Admission Marking | | | 01 | 1 | AM(-1) | -1 | Admission Marking | | |||
| | | | | | | | | | | | | | |||
| 11 | 1 | PM(-1) | -1 | Pre-emption Marking | | | 11 | 1 | TM(-1) | -1 | Termination Marking | | |||
+---------+-------+-----------------+-------+-----------------------+ | +---------+-------+------------------+-------+----------------------+ | |||
Table 5: 'Worth' of Extended ECN Codepoints | Table 5: 'Worth' of Extended ECN Codepoints | |||
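To make the notion of `worth' concrete, the -02 version of Table 5 can be written as a lookup keyed on the two header fields. This is only a sketch: the names are hypothetical, and treating Not-PCN packets as contributing nothing is our assumption, since the table lists their worth as n/a.

```python
# (PCN field, RE flag) -> worth, from the -02 column of Table 5
WORTH = {
    (0b00, 0): None,  # Not-PCN: n/a
    (0b10, 0): +1,    # Re-PCT-Echo
    (0b01, 0): 0,     # AM(0)
    (0b11, 0): 0,     # TM(0)
    (0b00, 1): +1,    # FNE (the exception: RE flag set, yet positive)
    (0b10, 1): 0,     # Re-PCT
    (0b01, 1): -1,    # AM(-1)
    (0b11, 1): -1,    # TM(-1)
}


def contribution(pcn_field, re_flag, size_octets):
    """Contribution to downstream pre-congestion: worth times packet size."""
    worth = WORTH[(pcn_field, re_flag)]
    if worth is None:
        return 0  # assumption: Not-PCN packets are simply not counted
    return worth * size_octets
```

Read this way, admission marking a Re-PCT-Echo packet ('10' to '01' with RE = 0) moves its worth from +1 to 0, and admission marking a Re-PCT packet moves it from 0 to -1, which matches the statement that congestion marking decrements the worth.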
5.2. Policing Overview | 5.2. Policing Overview | |||
It will be recalled that downstream congestion can be found by | It will be recalled that downstream congestion can be found by | |||
subtracting upstream congestion from path congestion. Figure 4 | subtracting upstream congestion from path congestion. Figure 4 | |||
displays the difference between the two plots in Figure 3 to show | displays the difference between the two plots in Figure 3 to show | |||
downstream pre-congestion across the same path through the Internet. | downstream pre-congestion across the same path through the Internet. | |||
To emulate border policing, the general idea is for each domain to | To emulate border policing, the general idea is for each domain to | |||
skipping to change at page 27, line 30 | skipping to change at page 30, line 46 | |||
1.00% 2.00%: pre-congestion | 1.00% 2.00%: pre-congestion | |||
| | | | |||
sanctions | sanctions | |||
Figure 4: Policing Framework, showing creation of opposing pressures | Figure 4: Policing Framework, showing creation of opposing pressures | |||
to under-declare and over-declare downstream pre-congestion, using | to under-declare and over-declare downstream pre-congestion, using | |||
penalties and sanctions | penalties and sanctions | |||
These penalties seem to encourage everyone to understate downstream | These penalties seem to encourage everyone to understate downstream | |||
congestion in order to reduce the penalties they incur. But a | congestion in order to reduce the penalties they incur. But a | |||
balancing pressure is introduced by the last domain, which applies | balancing pressure is introduced by the last domain (strictly by any | |||
sanctions to flows if downstream congestion goes negative before the | domain), which applies sanctions to flows if downstream congestion | |||
egress gateway. The upward arrow at Domain C's border with the | goes negative before the egress gateway. The upward arrow at Domain | |||
egress gateway represents the incentive the sanctions would create to | C's border with the egress gateway represents the incentive the | |||
prevent negative traffic. The same upward pressure can be applied at | sanctions would create to prevent negative traffic. The same upward | |||
any domain border (arrows not shown). | pressure can be applied at any domain border (arrows not shown). | |||
Any flow that persistently goes negative by the time it leaves a | Any flow that persistently goes negative by the time it leaves a | |||
domain must not have been marked correctly in the first place. A | domain must not have been marked correctly in the first place. A | |||
domain that discovers such a flow can adopt a range of strategies to | domain that discovers such a flow can adopt a range of strategies to | |||
protect itself. Which strategy it uses will depend on policy, | protect itself. Which strategy it uses will depend on policy, | |||
because it cannot immediately assume malice--there may be an innocent | because it cannot immediately assume malice--there may be an innocent | |||
configuration error somewhere in the system. | configuration error somewhere in the system. | |||
This memo does not propose to standardise any particular mechanism to | This memo does not propose to standardise any particular mechanism to | |||
detect persistently negative flows, but Section 5.5 does give | detect persistently negative flows, but Section 5.5 does give | |||
examples. Note that we have used the term flow, but there will be no | examples. Note that we have used the term flow, but there will be no | |||
need to bury into the transport layer for port numbers; identifiers | need to bury into the transport layer for port numbers; identifiers | |||
visible in the network layer will be sufficient (IP address pair, | visible in the network layer will be sufficient (IP address pair, | |||
DSCP, protocol ID). The appendix also gives a mechanism to bound the | DSCP, protocol ID). The appendix also gives a mechanism to limit the | |||
required flow state, preventing state exhaustion attacks. | required flow state, preventing state exhaustion attacks. | |||
Of course, some domains may trust other domains to comply with | Of course, some domains may trust other domains to comply with | |||
admission control without applying sanctions or penalties. In these | admission control without applying sanctions or penalties. In these | |||
cases, the protocol should still be used but no penalties need be | cases, the protocol should still be used but no penalties need be | |||
applied. The re-ECN protocol ensures downstream pre-congestion | applied. The re-PCN protocol ensures downstream pre-congestion | |||
marking is passed on correctly whether or not penalties are applied | marking is passed on correctly whether or not penalties are applied | |||
to it, so the system works just as well with a mixture of some | to it, so the system works just as well with a mixture of some | |||
domains trusting each other and others not. | domains trusting each other and others not. | |||
Providers should be free to agree the contractual terms they wish | Providers should be free to agree the contractual terms they wish | |||
between themselves, so this memo does not propose to standardise how | between themselves, so this memo does not propose to standardise how | |||
these penalties would be applied. It is sufficient to standardise | these penalties would be applied. It is sufficient to standardise | |||
the re-ECN protocol so the downstream pre-congestion metric is | the re-PCN protocol so the downstream pre-congestion metric is | |||
available if providers choose to use it. However, the next section | available if providers choose to use it. However, the next section | |||
(Section 5.3) gives some examples of how these penalties might be | (Section 5.3) gives some examples of how these penalties might be | |||
implemented. | implemented. | |||
5.3. Pre-requisite Contractual Arrangements | 5.3. Pre-requisite Contractual Arrangements | |||
The re-ECN protocol has been chosen to solve the policing problem | The re-PCN protocol has been chosen to solve the policing problem | |||
because it embeds a downstream pre-congestion metric in passing CL | because it embeds a downstream pre-congestion metric in passing PCN | |||
traffic that is difficult to lie about and can be measured in bulk. | traffic that is difficult to lie about and can be measured in bulk. | |||
The ability to emulate border policing depends on network operators | The ability to emulate border policing depends on network operators | |||
choosing to use this metric as one of the elements in their contracts | choosing to use this metric as one of the elements in their contracts | |||
with each other. | with each other. | |||
Already many inter-domain agreements involve a capacity and a usage | Already many inter-domain agreements involve a capacity and a usage | |||
element. The usage element may be based on volume or various | element. The usage element may be based on volume or various | |||
measures of peak demand. We expect that those network operators who | measures of peak demand. We expect that those network operators who | |||
choose to use pre-congestion notification for admission control would | choose to use pre-congestion notification for admission control would | |||
also be willing to consider using this downstream pre-congestion | also be willing to consider using this downstream pre-congestion | |||
metric as a usage element in their interconnection contracts for | metric as a usage element in their interconnection contracts for | |||
admission controlled (CL) traffic. | admission controlled (PCN) traffic. | |||
Congestion (or pre-congestion) has the dimension of [octet], being | Congestion (or pre-congestion) has the dimension of [octet], being | |||
the product of volume transferred [octet] and the congestion fraction | the product of volume transferred [octet] and the congestion fraction | |||
[dimensionless], which is the fraction of the offered load that the | [dimensionless], which is the fraction of the offered load that the | |||
network isn't able to serve (or would rather not serve in the case of | network isn't able to serve (or would rather not serve in the case of | |||
pre-congestion). Measuring downstream congestion gives a measure of | pre-congestion). Measuring downstream congestion gives a measure of | |||
the volume transferred but modulated by congestion expected | the volume transferred but modulated by congestion expected | |||
downstream. So volume transferred during off-peak periods counts as | downstream. So volume transferred during off-peak periods counts as | |||
nearly nothing, while volume transferred at peak times counts very | nearly nothing, while volume transferred at peak times or over | |||
highly. The re-ECN protocol allows one network to measure how much | temporarily congested links counts very highly. The re-PCN protocol | |||
pre-congestion has been `dumped' into it by another network. And | allows one network to measure how much pre-congestion has been | |||
then in turn how much of that pre-congestion it dumped into the next | `dumped' into it by another network. And then in turn how much of | |||
downstream network. | that pre-congestion it dumped into the next downstream network. | |||
Section 5.6 describes mechanisms for calculating border penalties | Section 5.6 describes mechanisms for calculating border penalties | |||
referring to Appendix A.2 for suggested metering algorithms for | referring to Appendix A.2 for suggested metering algorithms for | |||
downstream congestion at a border router. Conceptually, it could | downstream congestion at a border router. Conceptually, it could | |||
hardly be simpler. It broadly involves accumulating the volume of | hardly be simpler. It broadly involves accumulating the volume of | |||
packets with the RE flag blanked and the volume of those with | packets with the RE flag blanked and the volume of those with | |||
congestion marking then subtracting the two. | congestion marking then subtracting the two. | |||
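The accumulate-and-subtract idea can be sketched in a few lines. This is not the metering algorithm of Appendix A.2, which is not reproduced here; it is a minimal illustration with hypothetical names, using the -02 codepoint values. Counting FNE packets as positive even though their RE flag is set, and ignoring Not-PCN packets, are assumptions of this sketch.

```python
def downstream_precongestion(packets):
    """Bulk downstream pre-congestion, in octets, for an iterable of
    (pcn_field, re_flag, size_octets) tuples seen at a border."""
    positive = 0  # octets in packets with the RE flag blanked (or FNE)
    negative = 0  # octets in congestion-marked packets
    for pcn_field, re_flag, size in packets:
        if (pcn_field, re_flag) == (0b00, 0):
            continue                      # Not-PCN: not metered (assumption)
        if re_flag == 0 or pcn_field == 0b00:
            positive += size              # RE blanked, or the FNE exception
        if pcn_field in (0b01, 0b11):
            negative += size              # admission or termination marked
    return positive - negative
```

A packet that is both blanked and marked is added to both counters, so it cancels out, which is consistent with a worth of 0 for such packets.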
Once this downstream pre-congestion metric is available, operators | Once this downstream pre-congestion metric is available, operators | |||
are free to choose how they incorporate it into their interconnection | are free to choose how they incorporate it into their interconnection | |||
skipping to change at page 30, line 9 | skipping to change at page 33, line 25 | |||
other words, penalties are always paid in the same direction as the | other words, penalties are always paid in the same direction as the | |||
data, and never against the data flow, even if downstream congestion | data, and never against the data flow, even if downstream congestion | |||
seems to be negative. This is consistent with the definition of | seems to be negative. This is consistent with the definition of | |||
physical congestion; when a resource is underutilised, it is not | physical congestion; when a resource is underutilised, it is not | |||
negatively congested. Its congestion is just zero. So, although | negatively congested. Its congestion is just zero. So, although | |||
short periods of negative marking can be tolerated to correct | short periods of negative marking can be tolerated to correct | |||
temporary over-declarations due to lags in the feedback system, | temporary over-declarations due to lags in the feedback system, | |||
persistent downstream negative congestion can have no physical | persistent downstream negative congestion can have no physical | |||
meaning and therefore must signify a problem. The incentive for | meaning and therefore must signify a problem. The incentive for | |||
domains not to tolerate persistently negative traffic depends on this | domains not to tolerate persistently negative traffic depends on this | |||
principle that penalties must never be paid against the data flow. | principle that negative penalties must never be paid for negative | |||
congestion. | ||||
Also note that at the last egress of the Diffserv region, domain C | Also note that at the last egress of the PCN-region, domain C should | |||
should not agree to pay any penalties to the egress gateway for pre- | not agree to pay any penalties to the egress gateway for pre- | |||
congestion passed to the egress gateway. Downstream pre-congestion | congestion passed to the egress gateway. Downstream pre-congestion | |||
to the egress gateway should have reached zero here. If domain C | to the egress gateway should have reached zero here. If domain C | |||
were to agree to pay for any remaining downstream pre-congestion, it | were to agree to pay for any remaining downstream pre-congestion, it | |||
would give the egress gateway an incentive to over-declare pre- | would give the egress gateway an incentive to over-declare pre- | |||
congestion feedback and take the resulting profit from domain C. | congestion feedback and take the resulting profit from domain C. | |||
To focus the discussion, from now on, unless otherwise stated, we | To focus the discussion, from now on, unless otherwise stated, we | |||
will assume a downstream network charges its upstream neighbour in | will assume a downstream network charges its upstream neighbour in | |||
proportion to the pre-congestion it sends (V_b in the notation of | proportion to the pre-congestion it sends (V_b in the notation of | |||
Appendix A.2). Effectively tiered thresholds would be just more | Appendix A.2). Effectively tiered thresholds would be just more | |||
coarse-grained approximations of the fine-grained case we choose to | coarse-grained approximations of the fine-grained case we choose to | |||
examine. If these neighbours had previously agreed that the (fixed) | examine. If these neighbours had previously agreed that the (fixed) | |||
price per octet of pre-congestion would be L, then the bill at the | price per octet of pre-congestion would be L, then the bill at the | |||
end of the month would simply be the product L*V_b, plus any fixed | end of the month would simply be the product L*V_b, plus any fixed | |||
charges they may also have agreed. | charges they may also have agreed. | |||
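
As an illustration only, the monthly bill described above can be sketched as follows. The function and parameter names are hypothetical and do not come from the draft. Clamping negative V_b to zero is an assumption, made here to reflect the principle stated earlier that penalties are never paid for negative congestion.

```python
def monthly_bill(price_per_octet, precongestion_octets, fixed_charges=0):
    """Sketch of the bill L*V_b plus any fixed charges.

    price_per_octet      -- L, the agreed (fixed) price per octet of pre-congestion
    precongestion_octets -- V_b, the pre-congestion volume sent over the border
    fixed_charges        -- any fixed charges the neighbours also agreed (assumed 0)
    """
    if price_per_octet < 0 or fixed_charges < 0:
        raise ValueError("prices and fixed charges must be non-negative")
    # Assumption: a negative V_b earns no refund, so it is clamped to zero.
    billable_octets = max(0, precongestion_octets)
    return price_per_octet * billable_octets + fixed_charges
```

For example, with L = 3 (in some currency unit per octet), V_b = 1000 octets and fixed charges of 50, the sketch gives a bill of 3050.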
We are well aware that the IETF tries to avoid standardising | We are well aware that the IETF tries to avoid standardising | |||
technology that depends on a particular business model. Indeed, this | technology that depends on a particular business model. Indeed, this | |||
principle is at the heart of all our own work. Our aim here is to | principle is at the heart of all our own work. Our aim here is to | |||
make a new metric available that we believe is superior to all | make a new metric available that we believe is superior to all | |||
existing metrics. Then, our aim is to show that border policing can | existing metrics. Then, our aim is to show that bulk border policing | |||
at least work with the one model we have just outlined. We assume | can at least work with the one model we have just outlined. Of | |||
that operators might then experiment with the metric in other models. | course, operators are free to complement this pre-congestion-based | |||
Of course, operators are free to complement this pre-congestion-based | ||||
usage element of their charges with traditional capacity charging, | usage element of their charges with traditional capacity charging, | |||
and we expect they will. | and we expect they will. But if operators don't want to use this | |||
business model at all, they don't have to do bulk border policing. | ||||
We also assume that operators might experiment with the metric in | ||||
other models. | ||||
Also note well that everything we discuss in this memo only concerns | Also note well that everything we discuss in this memo only concerns | |||
interconnection within the Diffserv region. ISPs are free to sell or | interconnection within the PCN-region. ISPs are free to sell or give | |||
give away reservations however they want on the retail market. But | away reservations however they want on the retail market. But of | |||
of course, interconnection charges will have a bearing on that. | course, interconnection charges will have a bearing on that. Indeed, | |||
Indeed, in the present scenario, the ingress gateway effectively | in the present scenario, the ingress gateway effectively sells | |||
sells reservations on one side and buys congestion penalties on the | reservations on one side and buys congestion penalties on the other. | |||
other. As congestion rises, one can imagine the gateway discovering | As congestion rises, one can imagine the gateway discovering that | |||
that congestion penalties have risen higher than the (probably fixed) | congestion penalties have risen higher than the (probably fixed) | |||
revenue it will earn from selling the next flow reservation. This | revenue it will earn from selling the next flow reservation. This | |||
encourages the gateway to cut its losses by blocking new calls, which | encourages the gateway to cut its losses by blocking new calls, which | |||
is why we believe downstream congestion penalties can emulate per- | is why we believe downstream congestion penalties can emulate per- | |||
flow rate policing at borders, as the next section explains. | flow rate policing at borders, as the next section explains. | |||
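
The gateway's reasoning in the paragraph above can be sketched as a simple comparison. This is a minimal sketch under stated assumptions, not a mechanism defined by this memo: the names are hypothetical, and the expected penalty is assumed to be the price per octet times the current pre-congestion marking fraction times the octets the new flow is expected to send.

```python
def should_admit(reservation_revenue, price_per_octet,
                 marking_fraction, expected_flow_octets):
    """Return True if selling the next reservation is still worthwhile.

    Assumption: the expected congestion penalty for the new flow is
    L * (fraction of octets pre-congestion marked) * (octets the flow sends).
    """
    if not 0.0 <= marking_fraction <= 1.0:
        raise ValueError("marking_fraction must lie in [0, 1]")
    expected_penalty = price_per_octet * marking_fraction * expected_flow_octets
    # Block the new call once the penalty would exceed the (probably fixed) revenue.
    return reservation_revenue > expected_penalty
```

With a fixed revenue of 1.0 per reservation, a price of 0.001 per octet and a flow of 10,000 octets, the sketch admits the flow at a 1% marking fraction but blocks it at 50%.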
5.4. Emulation of Per-Flow Rate Policing: Rationale and Limits | 5.4. Emulation of Per-Flow Rate Policing: Rationale and Limits | |||
The important feature of charging in proportion to congestion volume | The important feature of charging in proportion to congestion volume | |||
is that the penalty aggregates and disaggregates correctly along with | is that the penalty aggregates and disaggregates correctly along with | |||
packet flows. This is because the penalty rises linearly with bit | packet flows. This is because the penalty rises linearly with bit | |||
skipping to change at page 31, line 36 | skipping to change at page 35, line 7 | |||
utilisation of a particular resource. So if someone tries to push | utilisation of a particular resource. So if someone tries to push | |||
another flow into a path that is already signalling enough pre- | another flow into a path that is already signalling enough pre- | |||
congestion to warrant admission control, the penalty will be a lot | congestion to warrant admission control, the penalty will be a lot | |||
greater than it would have been to add the same flow to a less | greater than it would have been to add the same flow to a less | |||
congested path. This makes the incentive system fairly insensitive | congested path. This makes the incentive system fairly insensitive | |||
to the actual level of pre-congestion for triggering admission | to the actual level of pre-congestion for triggering admission | |||
control that each ingress chooses. The deterrent against exceeding | control that each ingress chooses. The deterrent against exceeding | |||
whatever threshold is chosen rises very quickly with a small amount | whatever threshold is chosen rises very quickly with a small amount | |||
of cheating. | of cheating. | |||
These are the properties that allow re-ECN to emulate per-flow border | These are the properties that allow re-PCN to emulate per-flow border | |||
policing of both rate and admission control. It is not a perfect | policing of both rate and admission control. It is not a perfect | |||
emulation of per-flow border policing, but we claim it is sufficient | emulation of per-flow border policing, but we claim it is sufficient | |||
to at least ensure the cost to others of a cheat is borne by the | to at least ensure the cost to others of a cheat is borne by the | |||
cheater, because the penalties are at least proportionate to the | cheater, because the penalties are at least proportionate to the | |||
level of the cheat. If an edge network operator is selling | level of the cheat. If an edge network operator is selling | |||
reservations at a large profit over the congestion cost, these pre- | reservations at a large profit over the congestion cost, these pre- | |||
congestion penalties will not be sufficient to ensure networks in the | congestion penalties will not be sufficient to ensure networks in the | |||
middle get a share of those profits, but at least they can cover | middle get a share of those profits, but at least they can cover | |||
their costs. | their costs. | |||
skipping to change at page 32, line 20 | skipping to change at page 35, line 40 | |||
the price L (per octet) of pre-congestion would be about 1000 times | the price L (per octet) of pre-congestion would be about 1000 times | |||
the previously used (per octet) price for volume. We should add that | the previously used (per octet) price for volume. We should add that | |||
a switch to pre-congestion is unlikely to exactly maintain the same | a switch to pre-congestion is unlikely to exactly maintain the same | |||
overall level of usage charges, but this argument will be | overall level of usage charges, but this argument will be | |||
approximately true, because usage charge will rise to at least the | approximately true, because usage charge will rise to at least the | |||
level the market finds necessary to push back against usage. | level the market finds necessary to push back against usage. | |||
From the above example it can be seen why a 1000x higher price will | From the above example it can be seen why a 1000x higher price will | |||
make operators become acutely sensitive to the congestion they cause | make operators become acutely sensitive to the congestion they cause | |||
in other networks, which is of course the desired effect; to | in other networks, which is of course the desired effect; to | |||
encourage networks to _control_ the congestion they allow their users | encourage networks to _avoid_ the congestion they allow their users | |||
to cause to others. | to cause to others. | |||
If any network sends even one flow at higher rate, they will | If any network sends even one flow at higher rate, they will | |||
immediately have to pay proportionately more usage charges. Because | immediately have to pay proportionately more usage charges. Because | |||
there is no knowledge of reservations within the Diffserv region, no | there is no knowledge of reservations within the PCN-region, no | |||
interior router can police whether the rate of each flow is greater | interior router can police whether the rate of each flow is greater | |||
than each reservation. So the system doesn't truly emulate rate- | than each reservation. So the system doesn't truly emulate rate- | |||
policing of each flow. But there is no incentive to pack a higher | policing of each flow. But there is no incentive to pack a higher | |||
rate into a reservation, because the charges are directly | rate into a reservation, because the charges are directly | |||
proportional to rate, irrespective of the reservations. | proportional to rate, irrespective of the reservations. | |||
However, if virtual queues start to fill on any path, even though | However, if virtual queues start to fill on any path, even though | |||
real queues will still be able to provide low latency service, pre- | real queues will still be able to provide low latency service, pre- | |||
congestion marking will rise fairly quickly. It may eventually reach | congestion marking will rise fairly quickly. It may eventually reach | |||
the threshold where the ingress gateway would deny admission to new | the threshold where the ingress gateway would deny admission to new | |||
skipping to change at page 32, line 49 | skipping to change at page 36, line 22 | |||
control should have been invoked. The ingress gateway will have to | control should have been invoked. The ingress gateway will have to | |||
pay the penalty for such an extremely high pre-congestion level, so | pay the penalty for such an extremely high pre-congestion level, so | |||
the pressure to invoke admission control should become unbearable. | the pressure to invoke admission control should become unbearable. | |||
The above mechanisms protect against rational operators. In | The above mechanisms protect against rational operators. In | |||
Section 5.6.3 we discuss how networks can protect themselves from | Section 5.6.3 we discuss how networks can protect themselves from | |||
accidental or deliberate misconfiguration in neighbouring networks. | accidental or deliberate misconfiguration in neighbouring networks. | |||
5.5. Sanctioning Dishonest Marking | 5.5. Sanctioning Dishonest Marking | |||
As CL traffic leaves the last network before the egress gateway | As PCN traffic leaves the last network before the egress gateway | |||
(domain C) the RE blanking fraction should match the congestion | (domain 'C' in Figure 4) the RE blanking fraction should match the | |||
marking fraction, when averaged over a sufficiently long duration | congestion marking fraction, when averaged over a sufficiently long | |||
(perhaps ~10s to allow a few rounds of feedback through regular | duration (perhaps ~10s to allow a few rounds of feedback through | |||
signalling of new and refreshed reservations). | regular signalling of new and refreshed reservations). | |||
To protect itself, domain C should install a monitor at its egress. | To protect itself, domain 'C' should install a monitor at its egress. | |||
It aims to detect flows of CL packets that are persistently negative. | It aims to detect flows of PCN packets that are persistently | |||
If flows are positive, domain C need take no action--this simply | negative. If flows are positive, domain 'C' need take no action-- | |||
means an upstream network must be paying more penalties than it needs | this simply means an upstream network must be paying more penalties | |||
to. Appendix A.3 gives a suggested algorithm for the monitor, | than it needs to. Appendix A.3 gives a suggested algorithm for the | |||
meeting the criteria below. | monitor, meeting the criteria below. | |||
o It SHOULD introduce minimal false positives for honest flows; | o It SHOULD introduce minimal false positives for honest flows; | |||
o It SHOULD quickly detect and sanction dishonest flows (minimal | o It SHOULD quickly detect and sanction dishonest flows (minimal | |||
false negatives); | false negatives); | |||
o It MUST be invulnerable to state exhaustion attacks from malicious | o It MUST be invulnerable to state exhaustion attacks from malicious | |||
sources. For instance, if the dropper uses flow-state, it should | sources. For instance, if the dropper uses flow-state, it should | |||
not be possible for a source to send numerous packets, each with a | not be possible for a source to send numerous packets, each with a | |||
different flow ID, to force the dropper to exhaust its memory | different flow ID, to force the dropper to exhaust its memory | |||
capacity; | capacity; | |||
o It MUST introduce sufficient loss in goodput so that malicious | o If drop is used as a sanction, it SHOULD introduce sufficient loss | |||
sources cannot play off losses in the egress dropper against | in goodput so that malicious sources cannot play off losses in the | |||
higher allowed throughput. Salvatori [CLoop_pol] describes this | egress dropper against higher allowed throughput. | |||
attack, which involves the source understating path congestion | Salvatori [CLoop_pol] describes this attack, which involves the | |||
then inserting forward error correction (FEC) packets to | source understating path congestion then inserting forward error | |||
compensate expected losses. | correction (FEC) packets to compensate expected losses. | |||
Note that the monitor operates on flows but with careful design we | Note that the monitor operates on flows but with careful design we | |||
can avoid per-flow state. This is why we have been careful to ensure | can avoid per-flow state. This is why we have been careful to ensure | |||
that all flows MUST start with a packet marked with the FNE | that all flows MUST start with a packet marked with the FNE | |||
codepoint. If a flow does not start with the FNE codepoint, a | codepoint. If a flow does not start with the FNE codepoint, a | |||
monitor is likely to treat it unfavourably. This risk makes it worth | monitor is likely to treat it unfavourably. This risk makes it worth | |||
setting the FNE codepoint at the start of a flow, even though there | setting the FNE codepoint at the start of a flow, even though there | |||
is a cost to setting FNE (positive `worth'). | is a cost to setting FNE (positive `worth'). | |||
Starting flows with an FNE packet also means that a monitor will be | Starting flows with an FNE packet also means that a monitor will be | |||
skipping to change at page 34, line 9 | skipping to change at page 37, line 31 | |||
across flows, a monitor MUST ignore packets with the FNE codepoint | across flows, a monitor MUST ignore packets with the FNE codepoint | |||
set. An ingress gateway sets the FNE codepoint when it does not have | set. An ingress gateway sets the FNE codepoint when it does not have | |||
the benefit of feedback from the egress. So counting packets with | the benefit of feedback from the egress. So counting packets with | |||
FNE cleared would be likely to make the average unnecessarily | FNE cleared would be likely to make the average unnecessarily | |||
positive, providing headroom (or should we say footroom?) for | positive, providing headroom (or should we say footroom?) for | |||
dishonest (negative) traffic. | dishonest (negative) traffic. | |||
If the monitor detects a persistently negative flow, it could drop | If the monitor detects a persistently negative flow, it could drop | |||
sufficient negative and neutral packets to force the flow to not be | sufficient negative and neutral packets to force the flow to not be | |||
negative. This is the approach taken for the `egress dropper' in | negative. This is the approach taken for the `egress dropper' in | |||
[Re-TCP], but for the scenario in this memo, where everyone would | [I-D.briscoe-tsvwg-re-ecn-tcp], but for the scenario in this memo, | |||
expect everyone else to keep to the protocol, a management alarm | where everyone would expect everyone else to keep to the protocol, a | |||
SHOULD be raised on detecting persistently negative traffic and any | management alarm SHOULD be raised on detecting persistently negative | |||
automatic sanctions taken SHOULD be logged. Even if the chosen | traffic and any automatic sanctions taken SHOULD be logged. Even if | |||
policy is to take no automatic action, the cause can then be | the chosen policy is to take no automatic action, the cause can then | |||
investigated manually. | be investigated manually. | |||
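
The alarm-and-log policy above could look roughly like the following sketch. It is not the algorithm of Appendix A.3. The class name, the marking labels (taken from the -02 naming), the cumulative-balance test for "persistently negative" and the threshold value are all assumptions made for illustration. The sketch keeps no per-flow state and ignores FNE packets, as required of a monitor that averages across flows.

```python
import logging

POSITIVE = {"Re-PCT-Echo"}          # positive worth (FNE is deliberately excluded)
NEGATIVE = {"AM(-1)", "TM(-1)"}     # negative worth

class NegativeTrafficMonitor:
    """Aggregate monitor: raises an alarm when traffic is persistently negative."""

    def __init__(self, alarm_threshold_octets=10_000):
        self.balance = 0                      # positive minus negative octets
        self.threshold = alarm_threshold_octets
        self.alarms = 0

    def observe(self, marking, size_octets):
        if marking == "FNE":
            return                            # a monitor MUST ignore FNE packets
        if marking in POSITIVE:
            self.balance += size_octets
        elif marking in NEGATIVE:
            self.balance -= size_octets
        # Assumption: a deficit beyond the threshold counts as persistent.
        if self.balance < -self.threshold:
            self.alarms += 1
            logging.warning("persistently negative traffic: balance %d octets",
                            self.balance)
            self.balance = 0                  # restart the measurement
```

The sketch only logs. Whether any automatic sanction follows the alarm is left to the operator's policy.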
Then all ingresses cannot understate downstream pre-congestion | Then all ingresses cannot understate downstream pre-congestion | |||
without their action being logged. So network operators can deal | without their action being logged. So network operators can deal | |||
with offending networks at the human level, out of band. As a last | with offending networks at the human level, out of band. As a last | |||
resort, perhaps where the ingress gateway address seems to have been | resort, perhaps where the ingress gateway address seems to have been | |||
spoofed in the signalling, packets can be dropped. Drops could be | spoofed in the signalling, packets can be dropped. Drops could be | |||
focused on just sufficient packets in misbehaving flows to remove the | focused on just sufficient packets in misbehaving flows to remove the | |||
negative bias while doing minimal harm. | negative bias while doing minimal harm. | |||
A future version of this memo may define a control message that could | A future version of this memo may define a control message that could | |||
skipping to change at page 34, line 43 | skipping to change at page 38, line 17 | |||
traffic caused sufficient congestion to lead to drop but they | traffic caused sufficient congestion to lead to drop but they | |||
understated path congestion to avoid penalties for causing high | understated path congestion to avoid penalties for causing high | |||
congestion, the preferential drop recommendations in Section 4.3.4 | congestion, the preferential drop recommendations in Section 4.3.4 | |||
would at least ensure that these flows would always be dropped before | would at least ensure that these flows would always be dropped before | |||
honest flows. | honest flows. | |||
5.6. Border Mechanisms | 5.6. Border Mechanisms | |||
5.6.1. Border Accounting Mechanisms | 5.6.1. Border Accounting Mechanisms | |||
One of the main design goals of re-ECN was for border security | One of the main design goals of re-PCN was for border security | |||
mechanisms to be as simple as possible, otherwise they would become | mechanisms to be as simple as possible, otherwise they would become | |||
the pinch-points that limit scalability of the whole internetwork. | the pinch-points that limit scalability of the whole internetwork. | |||
As the title of this memo suggests, we want to avoid per-flow | As the title of this memo suggests, we want to avoid per-flow | |||
processing at borders. We also want to keep to passive mechanisms | processing at borders. We also want to keep to passive mechanisms | |||
that can monitor traffic in parallel to forwarding, rather than | that can monitor traffic in parallel to forwarding, rather than | |||
having to filter traffic inline--in series with forwarding. As data | having to filter traffic inline--in series with forwarding. As data | |||
rates continue to rise, we suspect that all-optical interconnection | rates continue to rise, we suspect that all-optical interconnection | |||
between networks will soon be a requirement. So we want to avoid any | between networks will soon be a requirement. So we want to avoid any | |||
new need for buffering (even though border filtering is current | new need for buffering (even though border filtering is current | |||
practice for other reasons, we don't want to make it even less likely | practice for other reasons, we don't want to make it even less likely | |||
that we will ever get rid of it). | that we will ever get rid of it). | |||
So far, we have been able to keep the border mechanisms simple, | So far, we have been able to keep the border mechanisms simple, | |||
despite having had to harden them against some subtle attacks on the | despite having had to harden them against some subtle attacks on the | |||
re-ECN design. The mechanisms are still passive and avoid per-flow | re-PCN design. The mechanisms are still passive and avoid per-flow | |||
processing, although we do use filtering as a fail-safe to | processing, although we do use filtering as a fail-safe to | |||
temporarily shield against extreme events in other networks, such as | temporarily shield against extreme events in other networks, such as | |||
accidental misconfigurations (Section 5.6.3). | accidental misconfigurations (Section 5.6.3). | |||
The basic accounting mechanism at each border interface simply | The basic accounting mechanism at each border interface simply | |||
involves accumulating the volume of packets with positive worth (Re- | involves accumulating the volume of packets with positive worth (Re- | |||
Echo and FNE), and subtracting the volume of those with negative | PCT-Echo and FNE), and subtracting the volume of those with negative | |||
worth: AM(-1) and PM(-1). Even though this mechanism takes no regard | worth: AM(-1) and TM(-1). Even though this mechanism takes no regard | |||
of flows, over an accounting period (say a month) this subtraction | of flows, over an accounting period (say a month) this subtraction | |||
will account for the downstream congestion caused by all the flows | will account for the downstream congestion caused by all the flows | |||
traversing the interface, wherever they come from, and wherever they | traversing the interface, wherever they come from, and wherever they | |||
go to. The two networks can agree to use this metric however they | go to. The two networks can agree to use this metric however they | |||
wish to determine some congestion-related penalty against the | wish to determine some congestion-related penalty against the | |||
upstream network (see Section 5.3 for examples). Although the | upstream network (see Section 5.3 for examples). Although the | |||
algorithm could hardly be simpler, it is spelled out using pseudo- | algorithm could hardly be simpler, it is spelled out using pseudo- | |||
code in Appendix A.2.1. | code in Appendix A.2.1. | |||
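
For readers without the appendix to hand, the basic accounting step can be sketched as below. This is an independent illustration, not a reproduction of Appendix A.2.1. The string labels for the markings follow the -02 naming and, like the function name, are assumptions of the sketch. Packets with any other marking are treated as neutral.

```python
POSITIVE_WORTH = {"Re-PCT-Echo", "FNE"}
NEGATIVE_WORTH = {"AM(-1)", "TM(-1)"}

def downstream_congestion_volume(packets):
    """Bulk accounting at one border interface.

    packets -- iterable of (marking, size_in_octets) pairs seen during the
               accounting period; no flow identifiers are needed.
    Returns the volume with positive worth minus the volume with negative worth.
    """
    volume = 0
    for marking, size in packets:
        if marking in POSITIVE_WORTH:
            volume += size
        elif marking in NEGATIVE_WORTH:
            volume -= size
        # Neutral packets do not change the measure.
    return volume
```

For example, packets of 1500 and 100 octets with positive worth, 1500 and 40 octets with negative worth and one neutral 1500-octet packet give a downstream congestion volume of 60 octets.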
Various attempts to subvert the re-ECN design have been made. In all | Various attempts to subvert the re-ECN design have been made. In all | |||
skipping to change at page 36, line 22 | skipping to change at page 39, line 42 | |||
o A network can simply create its own dummy traffic to congest | o A network can simply create its own dummy traffic to congest | |||
another network, perhaps causing it to lose business at no cost to | another network, perhaps causing it to lose business at no cost to | |||
the attacking network. This is a form of denial of service | the attacking network. This is a form of denial of service | |||
perpetrated by one network on another. The preferential drop | perpetrated by one network on another. The preferential drop | |||
measures in Section 4.3.4 provide crude protection against such | measures in Section 4.3.4 provide crude protection against such | |||
attacks, but we are not overly worried about more accurate | attacks, but we are not overly worried about more accurate | |||
prevention measures, because it is already possible for networks | prevention measures, because it is already possible for networks | |||
to DoS other networks on the general Internet, but they generally | to DoS other networks on the general Internet, but they generally | |||
don't because of the grave consequences of being found out. We | don't because of the grave consequences of being found out. We | |||
are only concerned if re-ECN increases the motivation for such an | are only concerned if re-PCN increases the motivation for such an | |||
attack, as in the next example. | attack, as in the next example. | |||
o A network can just generate negative traffic and send it over its | o A network can just generate negative traffic and send it over its | |||
border with a neighbour to reduce the overall penalties that it | border with a neighbour to reduce the overall penalties that it | |||
should pay to that neighbour. It could even initialise the TTL so | should pay to that neighbour. It could even initialise the TTL so | |||
it expired shortly after entering the neighbouring network, | it expired shortly after entering the neighbouring network, | |||
reducing the chance of detection further downstream. This attack | reducing the chance of detection further downstream. This attack | |||
need not be motivated by a desire to deny service and indeed need | need not be motivated by a desire to deny service and indeed need | |||
not cause denial of service. A network's main motivator would | not cause denial of service. A network's main motivator would | |||
most likely be to reduce the penalties it pays to a neighbour. | most likely be to reduce the penalties it pays to a neighbour. | |||
But, the prospect of financial gain might tempt the network into | But, the prospect of financial gain might tempt the network into | |||
mounting a DoS attack on the other network as well, given the gain | mounting a DoS attack on the other network as well, given the gain | |||
would offset some of the risk of being detected. | would offset some of the risk of being detected. | |||
Note that we have not included DoS by Internet hosts in the above | Note that we have not included DoS by Internet hosts in the above | |||
list of attacks, because we have restricted ourselves to a scenario | list of attacks, because we have restricted ourselves to a scenario | |||
with edge-to-edge admission control across a Diffserv region. In | with edge-to-edge admission control across a PCN-region. In this | |||
this case, the edge ingress gateways insulate the Diffserv region | case, the edge ingress gateways insulate the PCN-region from DoS by | |||
from DoS by Internet hosts. Re-ECN resists more general DoS attacks, | Internet hosts. Re-ECN resists more general DoS attacks, but this is | |||
but this is discussed in [Re-TCP]. | discussed in [I-D.briscoe-tsvwg-re-ecn-tcp]. | |||
The first step towards a solution to all these problems with negative | The first step towards a solution to all these problems with negative | |||
flows is to be able to estimate the contribution they make to | flows is to be able to estimate the contribution they make to | |||
downstream congestion at a border and to correct the measure | downstream congestion at a border and to correct the measure | |||
accordingly. Although ideally we want to remove negative flows | accordingly. Although ideally we want to remove negative flows | |||
themselves, perhaps surprisingly, the most effective first step is to | themselves, perhaps surprisingly, the most effective first step is to | |||
cancel out the polluting effect negative flows have on the measure of | cancel out the polluting effect negative flows have on the measure of | |||
downstream congestion at a border. It is more important to get an | downstream congestion at a border. It is more important to get an | |||
unbiased estimate of their effect, than to try to remove them all. A | unbiased estimate of their effect, than to try to remove them all. A | |||
suggested algorithm to give an unbiased estimate of the contribution | suggested algorithm to give an unbiased estimate of the contribution | |||
from negative flows to the downstream congestion measure is given in | from negative flows to the downstream congestion measure is given in | |||
Appendix A.2.2. | Appendix A.2.2. | |||
Although making an accurate assessment of the contribution from | Although making an accurate assessment of the contribution from | |||
negative flows may not be easy, just the single step of neutralising | negative flows may not be easy, just the single step of neutralising | |||
their polluting effect on congestion metrics removes all the gains | their polluting effect on congestion metrics removes all the gains | |||
networks could otherwise make from mounting dummy traffic attacks on | networks could otherwise make from mounting dummy traffic attacks on | |||
each other. This puts all networks on the same side (only with | each other. This puts all networks on the same side (only with | |||
respect to negative flows of course), rather than being pitched | respect to negative flows of course), rather than being pitched | |||
against each other. The network where this flow goes negative as | against each other. The network where a flow goes negative as well | |||
well as all the networks downstream lose out from not being | as all the networks downstream lose out from not being reimbursed for | |||
reimbursed for any congestion this flow causes. So they all have an | any congestion this flow causes. So they all have an interest in | |||
interest in getting rid of these negative flows. Networks forwarding | getting rid of these negative flows. Networks forwarding a flow | |||
a flow before it goes negative aren't strictly on the same side, but | before it goes negative aren't strictly on the same side, but they | |||
they are disinterested bystanders--they don't care that the flow goes | are disinterested bystanders--they don't care that the flow goes | |||
negative downstream, but at least they can't actively gain from | negative downstream, but at least they can't actively gain from | |||
making it go negative. The problem becomes localised so that once a | making it go negative. The problem becomes localised so that once a | |||
flow goes negative, all the networks from where it happens and beyond | flow goes negative, all the networks from where it happens and beyond | |||
downstream each have a small problem, each can detect it has a | downstream each have a small problem, each can detect it has a | |||
problem and each can get rid of the problem if it chooses to. But | problem and each can get rid of the problem if it chooses to. But | |||
negative flows can no longer be used for any new attacks. | negative flows can no longer be used for any new attacks. | |||
Once an unbiased estimate of the effect of negative flows can be | Once an unbiased estimate of the effect of negative flows can be | |||
made, the problem reduces to detecting and preferably removing flows | made, the problem reduces to detecting and preferably removing flows | |||
that have gone negative as soon as possible. But importantly, | that have gone negative as soon as possible. But importantly, | |||
skipping to change at page 37, line 48 | skipping to change at page 41, line 21 | |||
For instance, if possible, flows should be removed as soon as they go | For instance, if possible, flows should be removed as soon as they go | |||
negative, but we do NOT RECOMMEND any attempts to discard such flows | negative, but we do NOT RECOMMEND any attempts to discard such flows | |||
further upstream while they are still positive. Such over-zealous | further upstream while they are still positive. Such over-zealous | |||
push-back is unnecessary and potentially dangerous. These flows have | push-back is unnecessary and potentially dangerous. These flows have | |||
paid their `fare' up to the point they go negative, so there is no | paid their `fare' up to the point they go negative, so there is no | |||
harm in delivering them that far. If someone downstream asks for a | harm in delivering them that far. If someone downstream asks for a | |||
flow to be dropped as near to the source as possible, because they | flow to be dropped as near to the source as possible, because they | |||
say it is going to become negative later, an upstream node cannot | say it is going to become negative later, an upstream node cannot | |||
test the truth of this assertion. Rather than have to authenticate | test the truth of this assertion. Rather than have to authenticate | |||
such messages, re-ECN has been designed so that flows can be dropped | such messages, re-PCN has been designed so that flows can be dropped | |||
solely based on locally measurable evidence. A message hinting that | solely based on locally measurable evidence. A message hinting that | |||
a flow should be watched closely to test for negativity is fine. But | a flow should be watched closely to test for negativity is fine. But | |||
not a message that claims that a positive flow will go negative | not a message that claims that a positive flow will go negative | |||
later, so it should be dropped. . | later, so it should be dropped. | |||
5.6.2. Competitive Routing | 5.6.2. Competitive Routing | |||
With the above penalty system, each domain seems to have a perverse | With the above penalty system, each domain seems to have a perverse | |||
incentive to fake pre-congestion. For instance domain B profits from | incentive to fake pre-congestion. For instance domain 'B' profits | |||
the difference between penalties it receives at its ingress (its | from the difference between penalties it receives at its ingress (its | |||
revenue) and those it pays at its egress (its cost). So if B | revenue) and those it pays at its egress (its cost). So if 'B' | |||
overstates internal pre-congestion it seems to increase its profit. | overstates internal pre-congestion it seems to increase its profit. | |||
However, we can assume that domain A could bypass B, routing through | However, we can assume that domain 'A' could bypass 'B', routing | |||
other domains to reach the egress. So the competitive discipline of | through other domains to reach the egress. So the competitive | |||
least-cost routing can ensure that any domain tempted to fake pre- | discipline of least-cost routing can ensure that any domain tempted | |||
congestion for profit risks losing _all_ its incoming traffic. The | to fake pre-congestion for profit risks losing _all_ its incoming | |||
least congested route would eventually be able to win this | traffic. The least congested route would eventually be able to win | |||
competitive game, only as long as it didn't declare more fake pre- | this competitive game, only as long as it didn't declare more fake | |||
congestion than the next most competitive route. | pre-congestion than the next most competitive route. | |||
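The competitive argument above can be illustrated with a minimal sketch. It is not part of the memo. The names `Route`, `domain_profit` and `least_cost_route`, the integer "marks per 1000 packets" units, and the simplification that an upstream domain picks whichever route declares the least total pre-congestion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Route:
    """A candidate route towards the egress (hypothetical model)."""
    name: str
    true_marks: int  # genuine pre-congestion marks per 1000 packets
    fake_marks: int  # extra marks a domain adds to inflate penalties

    @property
    def declared_marks(self) -> int:
        # What an upstream neighbour actually observes and is penalised for.
        return self.true_marks + self.fake_marks


def domain_profit(ingress_penalties: int, egress_penalties: int) -> int:
    """Penalties received at the ingress (revenue) minus those paid at the egress (cost)."""
    return ingress_penalties - egress_penalties


def least_cost_route(routes: List[Route]) -> Route:
    """Pick the route with the lowest declared pre-congestion."""
    return min(routes, key=lambda route: route.declared_marks)
```

In this model a domain that fakes pre-congestion only keeps its incoming traffic while its declared level stays below that of the next most competitive route, which is the bound stated in the text.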
The competitive effect of interdomain routing might be weaker nearer | The competitive effect of interdomain routing might be weaker nearer | |||
to the egress. For instance, C may be the only route B can take to | to the egress. For instance, 'C' may be the only route 'B' can take | |||
reach the ultimate receiver. And if C over-penalises B, the egress | to reach the ultimate receiver. And if 'C' over-penalises 'B', the | |||
gateway and the ultimate receiver seem to have no incentive to move | egress gateway and the ultimate receiver seem to have no incentive to | |||
their terminating attachment to another network, because only B and | move their terminating attachment to another network, because only | |||
those upstream of B suffer the higher penalties. However, we must | 'B' and those upstream of 'B' suffer the higher penalties. However, | |||
remember that we are only looking at the money flows at the | we must remember that we are only looking at the money flows at the | |||
unidirectional network layer. There are likely to be all sorts of | unidirectional network layer. There are likely to be all sorts of | |||
higher level business models constructed over the top of these low | higher level business models constructed over the top of these low | |||
level 'sender-pays' penalties. For instance, we might expect a | level 'sender-pays' penalties. For instance, we might expect a | |||
session layer charging model where the session originator pays for a | session layer charging model where the session originator pays for a | |||
pair of duplex flows, one as receiver and one as sender. | pair of duplex flows, one as receiver and one as sender. | |||
Traditionally this has been a common model for telephony and we might | Traditionally this has been a common model for telephony and we might | |||
expect it to be used, at least sometimes, for other media such as | expect it to be used, at least sometimes, for other media such as | |||
video. Wherever such a model is used, the data receiver will be | video. Wherever such a model is used, the data receiver will be | |||
directly affected if its sessions terminate through a network like C | directly affected if its sessions terminate through a network like | |||
that fakes congestion to over-penalise B. So end-customers will | 'C' that fakes congestion to over-penalise 'B'. So end-customers | |||
experience a direct competitive pressure to switch to cheaper | will experience a direct competitive pressure to switch to cheaper | |||
networks, away from networks like C that try to over-penalise B. | networks, away from networks like 'C' that try to over-penalise 'B'. | |||
This memo does not need to standardise any particular mechanism for | This memo does not need to standardise any particular mechanism for | |||
routing based on re-ECN. Goldenberg et al [Smart_rtg] refers to | routing based on re-PCN. Goldenberg et al [Smart_rtg] refers to | |||
various commercial products and presents its own algorithms for | various commercial products and presents its own algorithms for | |||
moving traffic between multi-homed routes based on usage charges. | moving traffic between multi-homed routes based on usage charges. | |||
None of these systems require any changes to standard protocols | None of these systems require any changes to standard protocols | |||
because the choice between the available border gateway protocol | because the choice between the available border gateway protocol | |||
(BGP) routes is based on a combination of local knowledge of the | (BGP) routes is based on a combination of local knowledge of the | |||
charging regime and local measurement of traffic levels. If, as we | charging regime and local measurement of traffic levels. If, as we | |||
propose, charges or penalties were based on the level of re-ECN | propose, charges or penalties were based on the level of re-PCN | |||
measured in passing traffic, a similar optimisation could be achieved | measured locally in passing traffic, a similar optimisation could be | |||
without requiring any changes to standard routing protocols. | achieved without requiring any changes to standard routing protocols. | |||
We must be clear that applying pre-congestion-based routing to this | We must be clear that applying pre-congestion-based routing to this | |||
admission control system remains an open research issue. Traffic | admission control system remains an open research issue. Traffic | |||
engineering based on congestion requires careful damping to avoid | engineering based on congestion requires careful damping to avoid | |||
oscillations, and should not be attempted without adult supervision | oscillations, and should not be attempted without adult supervision | |||
:) Mortier & Pratt [ECN-BGP] have analysed traffic engineering based | :) Mortier & Pratt [ECN-BGP] have analysed traffic engineering based | |||
on congestion. But without the benefit of re-ECN, they had to add a | on congestion. But without the benefit of re-ECN or re-PCN, they had | |||
path attribute to BGP to advertise a route's downstream congestion | to add a path attribute to BGP to advertise a route's downstream | |||
(actually they proposed that BGP should advertise the charge for | congestion (actually they proposed that BGP should advertise the | |||
congestion, which we believe wrongly embeds an assumption into BGP | charge for congestion, which we believe wrongly embeds an assumption | |||
that the only thing to do with congestion is charge for it). | into BGP that the only thing to do with congestion is charge for it). | |||
5.6.3. Fail-safes | 5.6.3. Fail-safes | |||
The mechanisms described so far create incentives for rational | The mechanisms described so far create incentives for rational | |||
operators to behave. That is, one operator aims to make another | operators to behave. That is, one operator aims to make another | |||
behave responsibly by applying penalties and expects a rational | behave responsibly by applying penalties and expects a rational | |||
response (i.e. one that trades off costs against benefits). It is | response (i.e. one that trades off costs against benefits). It is | |||
usually reasonable to assume that other network operators will behave | usually reasonable to assume that other network operators will behave | |||
rationally (policy routing can avoid those that might not). But this | rationally (policy routing can avoid those that might not). But this | |||
approach does not protect against the misconfigurations and accidents | approach does not protect against the misconfigurations and accidents | |||
skipping to change at page 40, line 16 | skipping to change at page 43, line 40 | |||
6. Analysis | 6. Analysis | |||
The domains in Figure 1 are not expected to be completely malicious | The domains in Figure 1 are not expected to be completely malicious | |||
towards each other. After all, we can assume that they are all co- | towards each other. After all, we can assume that they are all co- | |||
operating to provide an internetworking service to the benefit of | operating to provide an internetworking service to the benefit of | |||
each of them and their customers. Otherwise their routing policies | each of them and their customers. Otherwise their routing policies | |||
would not interconnect them in the first place. However, we assume | would not interconnect them in the first place. However, we assume | |||
that they are also competitors of each other. So a network may try | that they are also competitors of each other. So a network may try | |||
to contravene our proposed protocol if it would gain or make a | to contravene our proposed protocol if it would gain or make a | |||
competitor lose, or both, but only if it can do so without being | competitor lose, or both. But only if it can do so without being | |||
caught. Therefore we do not have to consider every possible random | caught. Therefore we do not have to consider every possible random | |||
attack one network could launch on the traffic of another, given that | attack one network could launch on the traffic of another, given that | |||
one network can anyway drop or corrupt packets that it | one network can anyway drop or corrupt packets that it | |||
forwards on behalf of another. | forwards on behalf of another. | |||
Therefore, we only consider new opportunities for _gainful_ attack | Therefore, we only consider new opportunities for _gainful_ attack | |||
that our proposal introduces. But to a certain extent we can also | that our proposal introduces. But to a certain extent we can also | |||
rely on the in-depth defences we have described (Section 5.6.3) | rely on the in-depth defences we have described (Section 5.6.3) | |||
intended to mitigate the potential impact of one network accidentally | intended to mitigate the potential impact of one network accidentally | |||
misconfiguring the workings of this protocol. | misconfiguring the workings of this protocol. | |||
skipping to change at page 40, line 39 | skipping to change at page 44, line 16 | |||
arrangement possible in Figure 1, without any surrounding network. | arrangement possible in Figure 1, without any surrounding network. | |||
This allows us to consider more specific cases where these gateways | This allows us to consider more specific cases where these gateways | |||
and a neighbouring network are operated by the same player. As well | and a neighbouring network are operated by the same player. As well | |||
as cases where the same player operates neighbouring networks, we | as cases where the same player operates neighbouring networks, we | |||
will also consider cases where the two gateways collude as one player | will also consider cases where the two gateways collude as one player | |||
and where the sender and receiver collude as one. Collusion of other | and where the sender and receiver collude as one. Collusion of other | |||
sets of domains is less likely, but we will consider such cases. In | sets of domains is less likely, but we will consider such cases. In | |||
the general case, we will assume none of the nine trust domains | the general case, we will assume none of the nine trust domains | |||
across the figure fully trust any of the others. | across the figure fully trust any of the others. | |||
As we only propose to change routers within the Diffserv region, we | As we only propose to change routers within the PCN-region, we assume | |||
assume the operators of networks outside the region will be doing | the operators of networks outside the region will be doing per-flow | |||
per-flow policing. That is, we assume the networks outside the | policing. That is, we assume the networks outside the PCN-region and | |||
Diffserv region and the gateways around its edges can protect | the gateways around its edges can protect themselves. So given we | |||
themselves. So given we are proposing to remove flow policing from | are proposing to remove flow policing from some networks, our primary | |||
some networks, our primary concern must be to protect networks that | concern must be to protect networks that don't do per-flow policing | |||
don't do per-flow policing (the potential `victims') from those that | (the potential `victims') from those that do (the `enemy'). The | |||
do (the `enemy'). The ingress and egress gateways are the only way | ingress and egress gateways are the only way the outer enemy can get | |||
the outer enemy can get at the middle victim, so we can consider the | at the middle victim, so we can consider the gateways as the | |||
gateways as the representatives of the enemy as far as domains A, B | representatives of the enemy as far as domains 'A', 'B' and 'C' are | |||
and C are concerned. We will call this trust scenario `edges against | concerned. We will call this trust scenario `edges against middles'. | |||
middles'. | ||||
Earlier in this memo, we outlined the classic border rate policing | Earlier in this memo, we outlined the classic border rate policing | |||
problem (Section 3). It will now be useful to reiterate the | problem (Section 3). It will now be useful to reiterate the | |||
motivations that are the root cause of the problem. The more | motivations that are the root cause of the problem. The more | |||
reservations a gateway can allow, the more revenue it receives. The | reservations a gateway can allow, the more revenue it receives. The | |||
middle networks want the edges to comply with the admission control | middle networks want the edges to comply with the admission control | |||
protocol when they become so congested that their service to others | protocol when they become so congested that their service to others | |||
might suffer. The middle networks also want to ensure the edges | might suffer. The middle networks also want to ensure the edges | |||
cannot steal more service from them than they are entitled to. | cannot steal more service from them than they are entitled to. | |||
In the context of this `edges against middles' scenario, the re-ECN | In the context of this `edges against middles' scenario, the re-PCN | |||
protocol has two main effects: | protocol has two main effects: | |||
o The more pre-congestion there is on a path across the Diffserv | o The more pre-congestion there is on a path across the PCN-region, | |||
region, the higher the ingress gateway must declare downstream | the higher the ingress gateway must declare downstream pre- | |||
pre-congestion. | congestion. | |||
o If the ingress gateway does not declare downstream pre-congestion | o If the ingress gateway does not declare downstream pre-congestion | |||
high enough on average, it will `hit the ground before the | high enough on average, it will `hit the ground before the | |||
runway', going negative and triggering sanctions, either directly | runway', going negative and triggering sanctions, either directly | |||
against the traffic or against the ingress gateway at a management | against the traffic or against the ingress gateway at a management | |||
level. | level. | |||
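The two effects listed above can be sketched numerically. This is an illustrative model only, not the memo's algorithm: it assumes that downstream pre-congestion at each border is the ingress gateway's declaration minus the marking accumulated so far, counted in whole marks per 1000 packets. The function names and the per-domain marking figures are hypothetical.

```python
from typing import List, Optional


def downstream_precongestion(declared: int, marks_per_domain: List[int]) -> List[int]:
    """Balance seen at the border after each domain on the path.

    declared         -- marks per 1000 packets declared by the ingress gateway
    marks_per_domain -- marks per 1000 packets added by each domain in path order
    """
    balances = []
    balance = declared
    for marks in marks_per_domain:
        balance -= marks          # each domain's marking uses up some of the declaration
        balances.append(balance)
    return balances


def first_negative_border(declared: int, marks_per_domain: List[int]) -> Optional[int]:
    """Index of the first border where the flow has gone negative, or None."""
    for index, balance in enumerate(downstream_precongestion(declared, marks_per_domain)):
        if balance < 0:
            return index
    return None
```

Under these assumptions a declaration that matches the whole-path marking reaches zero exactly at the last border, while an under-declaration "hits the ground before the runway" at the first border where the running balance drops below zero. That border is the point at which sanctions could be triggered on local evidence alone.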
An executive summary of our security analysis can be stated in three | An executive summary of our security analysis can be stated in three | |||
parts, distinguished by the type of collusion considered. | parts, distinguished by the type of collusion considered. | |||
Neighbour-only Middle-Middle Collusion: Here there is no collusion | Neighbour-only Middle-Middle Collusion: Here there is no collusion | |||
or collusion is limited to neighbours in the feedback loop. In | or collusion is limited to neighbours in the feedback loop. In | |||
other words, two neighbouring networks can be assumed to act as | other words, two neighbouring networks can be assumed to act as | |||
one. Or the egress gateway might collude with domain C. Or the | one. Or the egress gateway might collude with domain 'C'. Or the | |||
ingress gateway might collude with domain A. Or ingress and egress | ingress gateway might collude with domain 'A'. Or ingress and | |||
gateways might collude with each other. | egress gateways might collude with each other. | |||
In these cases where only neighbours in the feedback loop collude, | In these cases where only neighbours in the feedback loop collude, | |||
we conclude that all parties have a positive incentive to declare | we conclude that all parties have a positive incentive to declare | |||
downstream pre-congestion truthfully, and the ingress gateway has | downstream pre-congestion truthfully, and the ingress gateway has | |||
a positive incentive to invoke admission control when congestion | a positive incentive to invoke admission control when congestion | |||
rises above the admission threshold in any network in the region | rises above the admission threshold in any network in the region | |||
(including its own). No party has an incentive to send more | (including its own). No party has an incentive to send more | |||
traffic than declared in reservation signalling (even though only | traffic than declared in reservation signalling (even though only | |||
the gateways read this signalling). In short, no party can gain | the gateways read this signalling). In short, no party can gain | |||
at the expense of another. | at the expense of another. | |||
Non-neighbour Middle-Middle Collusion: In the case of other forms of | Non-neighbour Middle-Middle Collusion: In the case of other forms of | |||
collusion between middle networks (e.g. between domain A and C) it | collusion between middle networks (e.g. between domain 'A' and | |||
would be possible for say A & C to create a tunnel between | 'C') it would be possible for say 'A' & 'C' to create a tunnel | |||
themselves so that A would gain at the expense of B. But C would | between themselves so that 'A' would gain at the expense of 'B'. | |||
then lose the gain that A had made. Therefore the value to A & C | But 'C' would then lose the gain that 'A' had made. Therefore the | |||
of colluding to mount this attack seems questionable. It is made | value to 'A' & 'C' of colluding to mount this attack seems | |||
more questionable, because the attack can be statistically | questionable. It is made more questionable, because the attack | |||
detected by B using the second `defence in depth' mechanism | can be statistically detected by 'B' using the second `defence in | |||
mentioned already. Note that C can defend itself from being | depth' mechanism mentioned already. Note that 'C' can defend | |||
attacked through a tunnel by treating the tunnel end point as a | itself from being attacked through a tunnel by treating the tunnel | |||
direct link to a neighbouring network (e.g. as if A were a | end point as a direct link to a neighbouring network (e.g. as if | |||
neighbour of C, via the tunnel), which falls back to the safety of | 'A' were a neighbour of 'C', via the tunnel), which falls back to | |||
the neighbour-only scenario. | the safety of the neighbour-only scenario. | |||
Middle-Edge Collusion: Collusion between networks or gateways within | Middle-Edge Collusion: Collusion between networks or gateways within | |||
the Diffserv region and networks or users outside the region has | the PCN-region and networks or users outside the region has not | |||
not yet been fully analysed. The presence of full per-flow | yet been fully analysed. The presence of full per-flow policing | |||
policing at the ingress gateway seems to make this a less likely | at the ingress gateway seems to make this a less likely source of | |||
source of a successful attack. | a successful attack. | |||
{ToDo: Due to lack of time, the full write up of the security | {ToDo: Due to lack of time, the full write up of the security | |||
analysis is deferred to the next version of this memo.} | analysis is deferred to the next version of this memo.} | |||
Finally, it is well known that the best person to analyse the | Finally, it is well known that the best person to analyse the | |||
security of a system is not the designer. Therefore, our confident | security of a system is not the designer. Therefore, our confident | |||
claims must be hedged with doubt until others with perhaps a greater | claims must be hedged with doubt until others with perhaps a greater | |||
incentive to break it have mounted a full analysis. | incentive to break it have mounted a full analysis. | |||
7. Incremental Deployment | 7. Incremental Deployment | |||
We believe ECN has so far not been widely deployed because it | We believe ECN has so far not been widely deployed because it | |||
requires widespread end system and network deployment just to achieve | requires end system and widespread network deployment just to achieve | |||
a marginal improvement in performance. The ability to offer a new | a marginal improvement in performance. The ability to offer a new | |||
service (admission control) would be a much stronger driver for ECN | service (admission control) would be a much stronger driver for ECN | |||
deployment. | deployment. | |||
As stated in the introduction, the aim of this memo is to "Design in | As stated in the introduction, the aim of this memo is to "Design in | |||
security from the start" when admission control is based on pre- | security from the start" when admission control is based on pre- | |||
congestion notification. The proposal has been designed so that | congestion notification. The proposal has been designed so that | |||
security can be added some time after first deployment, but only if | security can be added some time after first deployment, but only if | |||
the PCN wire protocol encoding is defined with the foresight to | the PCN wire protocol encoding is defined with the foresight to | |||
accommodate the extended set of codepoints defined in this document. | accommodate the extended set of codepoints defined in this document. | |||
Given admission control based on pre-congestion notification requires | Given admission control based on pre-congestion notification requires | |||
few changes to standards, it should be deployable fairly soon. | few changes to standards, it should be deployable fairly soon. | |||
However, re-ECN requires a change to IP, which may take a little | However, re-PCN requires a change to IP, which may take a little | |||
longer. | longer :) | |||
We expect that initial deployments of PCN-based admission control | We expect that initial deployments of PCN-based admission control | |||
will be confined to single networks, or to clubs of networks that | will be confined to single networks, or to clubs of networks that | |||
trust each other. The proposal in this memo will only become | trust each other. The proposal in this memo will only become | |||
relevant once networks with conflicting interests wish to | relevant once networks with conflicting interests wish to | |||
interconnect their admission controlled services, but without the | interconnect their admission controlled services, but without the | |||
scalability constraints of per-flow border policing. It will not be | scalability constraints of per-flow border policing. It will not be | |||
possible to use re-ECN, even in a controlled environment between | possible to use re-PCN, even in a controlled environment between | |||
consenting operators, unless it is standardised into IP. Given the | consenting operators, unless it is standardised into IP. Given the | |||
IPv4 header has limited space for further changes, current IESG | IPv4 header has limited space for further changes, current IESG | |||
policy [RFC4727] is not to allow experimental use of codepoints in | policy [RFC4727] is not to allow experimental use of codepoints in | |||
the IPv4 header, as whenever an experiment isn't taken up, the space | the IPv4 header, as whenever an experiment isn't taken up, the space | |||
it used tends to be impossible to reclaim. | it used tends to be impossible to reclaim. Therefore, for IPv4 at | |||
least, we will need to find a way to run an experiment so that the | ||||
header fields it uses can be reclaimed if the experiment is not a | ||||
success. | ||||
If PCN-based admission control is deployed before re-ECN is | If PCN-based admission control is deployed before re-PCN is | |||
standardised into IP, wherever a networks (or club of networks) | standardised into IP, wherever a network (or club of networks) | |||
connects to another network (or club of networks) with conflicting | connects to another network (or club of networks) with conflicting | |||
interests, they will place a gateway between the two regions that | interests, they will place a gateway between the two regions that | |||
does per-flow rate policing and admission control. If re-ECN is | does per-flow rate policing and admission control. If re-PCN is | |||
eventually standardised into IP, it will be possible for these | eventually standardised into IP, it will be possible for these | |||
separate regions to upgrade all their gateways to use re-ECN before | separate regions to upgrade all their ingress gateways to support re- | |||
removing the per-flow policing gateways between them. Given the | PCN before removing the per-flow policing gateways between them. | |||
edge-to-edge deployment model of PCN-based admission control, it is | Given the edge-to-edge deployment model of PCN-based admission | |||
reasonable to imagine this incremental deployment model without | control, it is reasonable to expect incremental deployment of re-PCN | |||
needing to cater for partial deployment of re-ECN in just some of the | will be feasible on a domain-by-domain basis, without needing to | |||
gateways around one Diffserv region. | cater for partial deployment of re-PCN in just some of the gateways | |||
around one PCN-domain. | ||||
Only the edge gateways around a Diffserv region have to be upgraded | Nonetheless, if the upgrade of one ingress gateway is accidentally | |||
to add re-ECN support, not interior routers. It is also necessary to | overlooked, the RE flag has been defined the safe way round for the | |||
add the mechanisms that use re-ECN to secure a network against | default legacy behaviour (leaving RE cleared as "0"). A legacy | |||
misbehaving gateways and networks. Specifically, these are the | ingress will appear to be declaring a high level of pre-congestion | |||
border mechanisms (Section 5.6) and the mechanisms to sanction | into the aggregate. The fail-safe border mechanism in Section 5.6.3 | |||
dishonest marking (Section 5.5). | might trigger management alarms (which would help in tracking down | |||
the need to upgrade the ingress), but all packets would continue to | ||||
be delivered safely, as overstatement of downstream congestion | ||||
requires no sanction. | ||||
Only the ingress edge gateways around a PCN-region have to be | ||||
upgraded to add re-PCN support, not interior routers. It is also | ||||
necessary to add the mechanisms that monitor re-PCN to secure a | ||||
network against misbehaving gateways and networks. Specifically, | ||||
these are the border mechanisms (Section 5.6) and the mechanisms to | ||||
sanction dishonest marking (Section 5.5). | ||||
We also RECOMMEND adding improvements to forwarding on interior | We also RECOMMEND adding improvements to forwarding on interior | |||
routers (Section 4.3.4). But the system works whether all, some or | routers (Section 4.3.4). But the system works whether all, some or | |||
none are upgraded, so interior routers may be upgraded in a piecemeal | none are upgraded, so interior routers may be upgraded in a piecemeal | |||
fashion at any time. | fashion at any time. | |||
8. Design Choices and Rationale | 8. Design Choices and Rationale | |||
The primary insight of this work is that downstream congestion is the | The primary insight of this work is that downstream congestion is the | |||
metric that would be most useful to control an internetwork, and | metric that would be most useful to control an internetwork, and | |||
particularly to police how one network responds to the congestion it | particularly to police how one network responds to the congestion it | |||
causes in a remote network. This is the problem that has previously | causes in a remote network. This is the problem that has previously | |||
made it so hard to provide scalable admission control. | made it so hard to provide scalable admission control. | |||
The case for using re-feedback (a generalisation of re-ECN) to police | The case for using re-feedback (a generalisation of re-ECN) to police | |||
congestion response and provide QoS is made in [Re-fb]. Essentially, | congestion response and provide QoS is made in [Re-fb]. Essentially, | |||
the insight is that congestion is a factor that crosses layers from | the insight is that congestion is a factor that crosses layers from | |||
the physical upwards. Therefore re-feedback polices congestion where | the physical upwards. Therefore re-feedback polices congestion as it | |||
it emerges from a physical interface between networks. This is | crosses the physical interface between networks. This is achieved by | |||
achieved by bringing the congestion information to the interface, | bringing information about congestion of resources later on the path | |||
rather than examining packet addressing where there is congestion. | to the interface, rather than trying to deal with congestion where it | |||
happens by examining the notoriously unreliable source address in | ||||
Then congestion crossing the physical interface at a border can be | packets. Then congestion crossing the physical interface at a border | |||
policed at the interface, rather than policing the congestion on | can be policed at the interface, rather than policing the congestion | |||
packets that claim to come from an address (which may be spoofed). | on packets that claim to come from an address (which may be spoofed). | |||
Also, re-feedback works in the network layer independently of other | Also, re-feedback works in the network layer independently of other | |||
layers--despite its name re-feedback does not actually require | layers--despite its name re-feedback does not actually require | |||
feedback. It requires a source to act conservatively before it gets | feedback. It makes a source act conservatively before it gets | |||
feedback. | feedback. | |||
On the subject of lack of feedback, the feedback not established | On the subject of lack of feedback, the feedback not established | |||
(FNE) codepoint is motivated by arguments for a state set-up bit in | (FNE) codepoint is motivated by arguments for a state set-up bit in | |||
IP to prevent state exhaustion attacks. This idea was first put | IP to prevent state exhaustion attacks. This idea was first put | |||
forward informally by David Clark and documented by Handley and | forward informally by David Clark and developed by Handley and | |||
Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams | Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams | |||
should signal explicitly when they require state to be created in the | should signal explicitly when they require state to be created in the | |||
network layer or the layer above (e.g. at flow start). Then a node | network layer or the layer above (e.g. at flow start). Then a node | |||
can refuse to create any state unless a datagram declares this | can refuse to create any state unless a datagram declares this | |||
intent. We believe the proposed FNE codepoint serves the same | intent. We believe the proposed FNE codepoint serves the same | |||
purpose as the proposed state-set-up bit, but it has been overloaded | purpose as the proposed state set-up bit, but it has been overloaded | |||
with a more specific purpose, using it on more packets than just the | with a more specific purpose, using it on more packets than just the | |||
first in a flow, but never less (i.e. it is idempotent). In effect | first in a flow, but never less (i.e. it is idempotent). In effect | |||
the FNE codepoint serves the purpose of a `soft-state set-up | the FNE codepoint serves the purpose of a `soft-state set-up | |||
codepoint'. | codepoint'. | |||
The re-feedback paper [Re-fb] also makes the case for converting the | The re-feedback paper [Re-fb] also makes the case for converting the | |||
economic interpretation of congestion into hard engineering | economic interpretation of congestion into hard engineering | |||
mechanism, which is the basis of the approach used in this memo. The | mechanism, which is the basis of the approach used in this memo. The | |||
admission control gateways around the Diffserv region use hard | admission control gateways around the PCN-region use hard | |||
engineering, not incentives, to prevent end users from sending more | engineering, not incentives, to prevent end users from sending more | |||
traffic than they have reserved. Incentive-based mechanisms are only | traffic than they have reserved. Incentive-based mechanisms are only | |||
used between networks, because they are expected to respond to | used between networks, because they are expected to respond to | |||
incentives more rationally than end-users can be expected to. | incentives more rationally than end-users can be expected to. | |||
However, even then, a network can use fail-safes to protect itself | However, even then, a network can use fail-safes to protect itself | |||
from excessively unusual behaviour by neighbouring networks, whether | from excessively unusual behaviour by neighbouring networks, whether | |||
due to an accidental misconfiguration or malicious intent. | due to an accidental misconfiguration or malicious intent. | |||
The guiding principle behind the incentive-based approach used | The guiding principle behind the incentive-based approach used | |||
between networks is that any gain from subverting the protocol should | between networks is that any gain from subverting the protocol should | |||
skipping to change at page 45, line 5 | skipping to change at page 48, line 44 | |||
will most likely open up a new vulnerability, where the amplifying | will most likely open up a new vulnerability, where the amplifying | |||
effect of the punishment mechanism can be turned on others. | effect of the punishment mechanism can be turned on others. | |||
The re-feedback paper also makes the case against the use of | The re-feedback paper also makes the case against the use of | |||
congestion charging to police congestion if it is based on classic | congestion charging to police congestion if it is based on classic | |||
feedback (where only upstream congestion is visible to network | feedback (where only upstream congestion is visible to network | |||
elements). It argues this would open up receiving networks to | elements). It argues this would open up receiving networks to | |||
`denial of funds' attacks and would require end users to accept | `denial of funds' attacks and would require end users to accept | |||
dynamic pricing (which few would). | dynamic pricing (which few would). | |||
Re-ECN has been deliberately designed to simplify policing at the | Re-PCN has been deliberately designed to simplify policing at the | |||
borders between networks. These trust boundaries are the critical | borders between networks. These trust boundaries are the critical | |||
pinch-points that will limit the scalability of the whole | pinch-points that will limit the scalability of the whole | |||
internetwork unless the overall design minimises the complexity of | internetwork unless the overall design minimises the complexity of | |||
security functions at these borders. The border mechanisms described | security functions at these borders. The border mechanisms described | |||
in this memo run passively in parallel to data forwarding and they do | in this memo run passively in parallel to data forwarding and they do | |||
not require per-flow processing. | not require per-flow processing. | |||
9. Security Considerations | 9. Security Considerations | |||
This whole memo concerns the security of a scalable admission control | This whole memo concerns the security of a scalable admission control | |||
skipping to change at page 45, line 39 | skipping to change at page 49, line 31 | |||
markings introduced by an upstream network, but it would only lose | markings introduced by an upstream network, but it would only lose | |||
out on the penalties it could apply to a downstream network. | out on the penalties it could apply to a downstream network. | |||
When one network forwards a neighbouring network's traffic it will | When one network forwards a neighbouring network's traffic it will | |||
always be possible to cause damage by dropping or corrupting it. | always be possible to cause damage by dropping or corrupting it. | |||
Therefore we do not believe networks would set their routing policies | Therefore we do not believe networks would set their routing policies | |||
to interconnect in the first place if they didn't trust the other | to interconnect in the first place if they didn't trust the other | |||
networks not to arbitrarily damage their traffic. | networks not to arbitrarily damage their traffic. | |||
Having said this, we do want to highlight some of the weaker parts of | Having said this, we do want to highlight some of the weaker parts of | |||
our argument. We have argued that networks will be dissuaded from | our argument. | |||
faking congestion marking by the possibility that upstream networks | ||||
will route round them. As we have said, these arguments are based on | o We have argued that networks will be dissuaded from faking | |||
congestion marking by the possibility that upstream networks will | ||||
route round them. As we have said, these arguments are based on | ||||
fairly delicate assumptions and will remain fairly tenuous until | fairly delicate assumptions and will remain fairly tenuous until | |||
proved in practice, particularly close to the egress where less | proved in practice, particularly close to the egress where less | |||
competitive routing is likely. | competitive routing is likely. | |||
We should also point out that the approach in this memo was only | o Given the congestion feedback system is piggy-backed on flow | |||
signalling, which can be fairly infrequent, sanctions may not be | ||||
appropriate until a flow has been persistently negative for | ||||
perhaps 20s. This may allow brief attacks to go unpunished. | ||||
However, vulnerability to brief attacks may be reduced if the | ||||
egress triggers asynchronous feedback when the congestion level on | ||||
an aggregate has risen sufficiently since the last feedback, | ||||
rather than waiting for the next opportunity to piggy-back on a | ||||
signal. | ||||
o We should also point out that the approach in this memo was only | ||||
designed to be robust for admission control. We do not claim the | designed to be robust for admission control. We do not claim the | |||
incentives will always be strong enough to force correct flow pre- | incentives will always be strong enough to force correct flow | |||
emption behaviour. This is because a user will tend to perceive much | termination behaviour. This is because a user will tend to | |||
greater loss in value if a flow is pre-empted than if admission is | perceive much greater loss in value if a flow is terminated than | |||
denied at the start. However, in general the incentives for correct | if admission is denied at the start. However, in general the | |||
flow pre-emption are similar to those for admission control. | incentives for correct flow termination are similar to those for | |||
admission control. | ||||
Finally, it may seem that the 8 codepoints that have been made | Finally, it may seem that the 8 codepoints that have been made | |||
available by extending the ECN field with the RE flag have been used | available by extending the ECN field with the RE flag have been used | |||
rather wastefully. In effect the RE flag has been used as an | rather wastefully. In effect the RE flag has been used as an | |||
orthogonal single bit in nearly all cases. The only exception is | orthogonal single bit in nearly all cases. The only exception is | |||
when the ECN field is cleared to "00". The mapping of the codepoints | when the ECN field is cleared to "00". The mapping of the codepoints | |||
in an earlier version of this proposal used the codepoint space more | in an earlier version of this proposal used the codepoint space more | |||
efficiently, but the scheme became vulnerable to a network operator | efficiently, but the scheme became vulnerable to a network operator | |||
focusing its congestion marking to mark more positive than neutral | focusing its congestion marking to mark more positive than neutral | |||
packets in order to reduce its penalties (see Appendix B of | packets in order to reduce its penalties (see Appendix B of | |||
[Re-TCP]). | [I-D.briscoe-tsvwg-re-ecn-tcp]). | |||
With the scheme as now proposed, once the RE flag is set or cleared | With the scheme as now proposed, once the RE flag is set or cleared | |||
by the sender or its proxy, it should not be written by the network, | by the sender or its proxy, it should not be written by the network, | |||
only read. So the gateways can detect if any network maliciously | only read. So the gateways can detect if any network maliciously | |||
alters the RE flag. IPSec AH integrity checking does not cover the | alters the RE flag. IPSec AH integrity checking does not cover the | |||
IPv4 option flags (they were considered mutable--even the one we | IPv4 option flags (they were considered mutable--even the one we | |||
propose using for the RE flag that was `currently unused' when IPSec | propose using for the RE flag that was `currently unused' when IPSec | |||
was defined). But it would be sufficient for a pair of gateways to | was defined). But it would be sufficient for a pair of gateways to | |||
make random checks on whether the RE flag was the same when it | make random checks on whether the RE flag was the same when it | |||
reached the egress gateway as when it left the ingress. Indeed, if | reached the egress gateway as when it left the ingress. Indeed, if | |||
IPSec AH had covered the RE flag, any network intending to alter | IPSec AH had covered the RE flag, any network intending to alter | |||
sufficient RE flags to make a gain would have focused its alterations | sufficient RE flags to make a gain would have focused its alterations | |||
on packets without authenticating headers (AHs). | on packets without authenticating headers (AHs). | |||
No cryptographic algorithms have been harmed in the making of this | Therefore, no cryptographic algorithms have been exploited in the | |||
proposal. | making of this proposal. | |||
10. IANA Considerations | 10. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
11. Conclusions | 11. Conclusions | |||
This memo builds on a promising technique to solve the classic | This memo solves the classic problem of making flow admission control | |||
problem of making flow admission control scale to any size network. | scale to any size network. It builds on a technique, called PCN, | |||
It involves the use of Diffserv in a deployment model that uses pre- | which involves the use of Diffserv in a domain and uses pre- | |||
congestion notification feedback to control admission into a network | congestion notification feedback to control admission into each | |||
path [I-D.ietf-pcn-architecture]. However as it stands, that | network path across the domain [I-D.ietf-pcn-architecture]. | |||
deployment model depends on all network domains trusting each other | ||||
to comply with the protocols, invoking admission control and flow | ||||
pre-emption when requested. | ||||
We propose that the congestion feedback used in that deployment model | Without PCN, Diffserv requires over-provisioning that must grow | |||
should be re-echoed into the forward data path, by making a trivial | linearly with network diameter to cater for variation in the traffic | |||
modification to the ingress gateway. We then explain how the | matrix. However, even with PCN, multiple network domains can only | |||
join together into one larger PCN region if all domains trust each | ||||
other to comply with the protocols, invoking admission control and | ||||
flow termination when requested. Domains could join together and | ||||
still police flows at their borders by requiring reservation | ||||
signalling to touch each border and only use PCN internally to each | ||||
domain. But the per-flow processing at borders would still limit | ||||
scalability. | ||||
Instead, this memo proposes a technique called re-PCN which enables a | ||||
PCN region to extend across multiple domains, without unscalable per- | ||||
flow processing at borders, and still without the need for linear | ||||
growth in capacity over-provisioning as the hop-diameter of the | ||||
Diffserv region grows. | ||||
We propose that the congestion feedback used for PCN-based admission | ||||
control should be re-echoed into the forward data path, by making a | ||||
trivial modification to the ingress gateway. We then explain how the | ||||
resulting downstream pre-congestion metric in packets can be | resulting downstream pre-congestion metric in packets can be | |||
monitored in bulk at borders to sufficiently emulate flow rate | monitored in bulk at borders to sufficiently emulate flow rate | |||
policing. | policing. | |||
We claim the result of combining these two approaches is an admission | We claim the result of combining these two approaches is an admission | |||
control system that scales to any size network _and_ any number of | control system that scales to any size network _and_ any number of | |||
interconnected networks, even if they all act in their own interests. | interconnected networks, even if they all act in their own interests. | |||
This proposal aims to convince its readers to "Design in Security | This proposal aims to convince its readers to "Design in Security | |||
from the start," by ensuring the PCN wire protocol encoding can | from the start," by ensuring the PCN wire protocol encoding can | |||
accommodate the extended set of codepoints defined in this document, | accommodate the extended set of codepoints defined in this document, | |||
even if border policing is not needed at first. This way, we will | even if per-flow policing is used at first rather than the bulk | |||
not build ourselves tomorrow's legacy problem. | border policing described here. This way, we will not build | |||
ourselves tomorrow's legacy problem. | ||||
Re-echoing congestion feedback is based on a principled technique | Re-echoing congestion feedback is based on a principled technique | |||
called Re-ECN [Re-TCP], designed to add accountability for causing | called Re-ECN [I-D.briscoe-tsvwg-re-ecn-tcp], designed to add | |||
congestion to the general-purpose IP datagram service. Re-ECN | accountability for causing congestion to the general-purpose IP | |||
proposes to consume the last completely unused bit in the basic IPv4 | datagram service. Re-ECN proposes to consume the last completely | |||
header. | unused bit in the basic IPv4 header, or an extension header in | |||
IPv6. | ||||
12. Acknowledgements | 12. Acknowledgements | |||
All the following have given helpful comments and some may become co- | All the following have given helpful comments either on re-PCN or on | |||
authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve | relevant parts of re-ECN that re-PCN uses: Arnaud Jacquet, Alessandro | |||
Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard, | Salvatori, Steve Rudkin, David Songhurst, John Davey, Ian Self, | |||
Carla Di Cairano-Gilfedder (BT), Mark Handley (who identified the | Anthony Sheppard, Carla Di Cairano-Gilfedder (BT), Mark Handley (who | |||
excess canceled packets attack), Stephen Hailes, Adam Greenhalgh | identified the excess canceled packets attack), Stephen Hailes, Adam | |||
(UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz, | Greenhalgh (UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef | |||
Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill Lehr, | Babiarz, Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill | |||
Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy | Lehr, Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy | |||
traffic attacks), Sally Floyd (ICIR) and comments from participants | traffic attacks), Sally Floyd (ICIR) and comments from participants | |||
in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant | in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant | |||
Internet working groups. | Internet working groups. | |||
13. Comments Solicited | 13. Comments Solicited | |||
Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
addressed to the IETF Congestion and Pre-Congestion Notification | addressed to the IETF Congestion and Pre-Congestion Notification | |||
working group's mailing list <pcn@ietf.org>, and/or to the author(s). | working group's mailing list <pcn@ietf.org>, and/or to the author(s). | |||
14. References | 14. References | |||
14.1. Normative References | 14.1. Normative References | |||
[PCN] Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F., | [I-D.briscoe-tsvwg-ecn-tunnel] | |||
Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley, | Briscoe, B., "Layered Encapsulation of Congestion | |||
S., Westberg, L., Bader, A., and G. Karagiannis, "Pre- | Notification", draft-briscoe-tsvwg-ecn-tunnel-01 (work in | |||
Congestion Notification Marking", | progress), July 2008. | |||
draft-briscoe-tsvwg-cl-phb-03 (work in progress), | ||||
October 2006. | [I-D.briscoe-tsvwg-re-ecn-tcp] | |||
Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | ||||
"Re-ECN: Adding Accountability for Causing Congestion to | ||||
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-06 (work in | ||||
progress), August 2008. | ||||
[I-D.eardley-pcn-marking-behaviour] | ||||
Eardley, P., "Marking behaviour of PCN-nodes", | ||||
draft-eardley-pcn-marking-behaviour-01 (work in progress), | ||||
June 2008. | ||||
[I-D.moncaster-pcn-baseline-encoding] | ||||
Moncaster, T., Briscoe, B., and M. Menth, "Baseline | ||||
Encoding and Transport of Pre-Congestion Information", | ||||
draft-moncaster-pcn-baseline-encoding-02 (work in | ||||
progress), July 2008. | ||||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC2211] Wroclawski, J., "Specification of the Controlled-Load | [RFC2211] Wroclawski, J., "Specification of the Controlled-Load | |||
Network Element Service", RFC 2211, September 1997. | Network Element Service", RFC 2211, September 1997. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
RFC 3168, September 2001. | RFC 3168, September 2001. | |||
[RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | |||
J., Courtney, W., Davari, S., Firoiu, V., and D. | J., Courtney, W., Davari, S., Firoiu, V., and D. | |||
Stiliadis, "An Expedited Forwarding PHB (Per-Hop | Stiliadis, "An Expedited Forwarding PHB (Per-Hop | |||
Behavior)", RFC 3246, March 2002. | Behavior)", RFC 3246, March 2002. | |||
[RSVP-ECN] | [RFC4774] Floyd, S., "Specifying Alternate Semantics for the | |||
Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P., | Explicit Congestion Notification (ECN) Field", BCP 124, | |||
Babiarz, J., and K. Chan, "RSVP Extensions for Admission | RFC 4774, November 2006. | |||
Control over Diffserv using Pre-congestion Notification", | ||||
draft-lefaucheur-rsvp-ecn-01 (work in progress), | ||||
June 2006. | ||||
[Re-TCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | ||||
"Re-ECN: Adding Accountability for Causing Congestion to | ||||
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-05 (work in | ||||
progress), January 2008. | ||||
14.2. Informative References | 14.2. Informative References | |||
[CLoop_pol] | [CLoop_pol] | |||
Salvatori, A., "Closed Loop Traffic Policing", Politecnico | Salvatori, A., "Closed Loop Traffic Policing", Politecnico | |||
Torino and Institut Eurecom Masters Thesis, | Torino and Institut Eurecom Masters Thesis, | |||
September 2005. | September 2005. | |||
[ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain | [ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain | |||
Routeing", Proc Internet Charging and QoS Technology | Routeing", Proc Internet Charging and QoS Technology | |||
Workshop (ICQT'03) pp308--317, September 2003, <http:// | Workshop (ICQT'03) pp308--317, September 2003, <http:// | |||
research.microsoft.com/users/mort/publications.aspx>. | research.microsoft.com/users/mort/publications.aspx>. | |||
[I-D.arumaithurai-nsis-pcn] | [I-D.arumaithurai-nsis-pcn] | |||
Arumaithurai, M., "NSIS PCN-QoSM: A Quality of Service | Arumaithurai, M., "NSIS PCN-QoSM: A Quality of Service | |||
Model for Pre-Congestion Notification (PCN)", | Model for Pre-Congestion Notification (PCN)", | |||
draft-arumaithurai-nsis-pcn-00 (work in progress), | draft-arumaithurai-nsis-pcn-00 (work in progress), | |||
September 2007. | September 2007. | |||
[I-D.charny-pcn-single-marking] | ||||
Charny, A., Zhang, X., Faucheur, F., and V. Liatsos, "Pre- | ||||
Congestion Notification Using Single Marking for Admission | ||||
and Termination", draft-charny-pcn-single-marking-03 | ||||
(work in progress), November 2007. | ||||
[I-D.ietf-nsis-rmd] | [I-D.ietf-nsis-rmd] | |||
Bader, A., "RMD-QOSM - The Resource Management in Diffserv | Bader, A., "RMD-QOSM - The Resource Management in Diffserv | |||
QOS Model", draft-ietf-nsis-rmd-12 (work in progress), | QOS Model", draft-ietf-nsis-rmd-12 (work in progress), | |||
November 2007. | November 2007. | |||
[I-D.ietf-pcn-architecture] | [I-D.ietf-pcn-architecture] | |||
Eardley, P., "Pre-Congestion Notification Architecture", | Eardley, P., "Pre-Congestion Notification (PCN) | |||
draft-ietf-pcn-architecture-03 (work in progress), | Architecture", draft-ietf-pcn-architecture-06 (work in | |||
February 2008. | progress), September 2008. | |||
[I-D.ietf-tsvwg-admitted-realtime-dscp] | ||||
Baker, F., Polk, J., and M. Dolly, "DSCPs for Capacity- | ||||
Admitted Traffic", | ||||
draft-ietf-tsvwg-admitted-realtime-dscp-04 (work in | ||||
progress), February 2008. | ||||
[IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP | [IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP | |||
Quality of Service Interconnect", BT Technology Journal | Quality of Service Interconnect", BT Technology Journal | |||
(BTTJ) 23(2)171--195, April 2005, | (BTTJ) 23(2)171--195, April 2005, | |||
<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>. | <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>. | |||
[QoS_scale] | ||||
Reid, A., "Economics and Scalability of QoS Solutions", BT | ||||
Technology Journal (BTTJ) 23(2)97--117, April 2005. | ||||
[RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. | [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. | |||
Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 | Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 | |||
Functional Specification", RFC 2205, September 1997. | Functional Specification", RFC 2205, September 1997. | |||
[RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC | [RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC | |||
Data Flows", RFC 2207, September 1997. | Data Flows", RFC 2207, September 1997. | |||
[RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, | [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, | |||
M., Romanow, A., Weinrib, A., and L. Zhang, "Resource | M., Romanow, A., Weinrib, A., and L. Zhang, "Resource | |||
ReSerVation Protocol (RSVP) Version 1 Applicability | ReSerVation Protocol (RSVP) Version 1 Applicability | |||
skipping to change at page 49, line 51 | skipping to change at page 54, line 43 | |||
[RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., | [RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., | |||
Speer, M., Braden, R., Davie, B., Wroclawski, J., and E. | Speer, M., Braden, R., Davie, B., Wroclawski, J., and E. | |||
Felstaine, "A Framework for Integrated Services Operation | Felstaine, "A Framework for Integrated Services Operation | |||
over Diffserv Networks", RFC 2998, November 2000. | over Diffserv Networks", RFC 2998, November 2000. | |||
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
Congestion Notification (ECN) Signaling with Nonces", | Congestion Notification (ECN) Signaling with Nonces", | |||
RFC 3540, June 2003. | RFC 3540, June 2003. | |||
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the | ||||
Internet Protocol", RFC 4301, December 2005. | ||||
[RFC4727] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | [RFC4727] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | |||
ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006. | ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006. | |||
[RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion | [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion | |||
Marking in MPLS", RFC 5129, January 2008. | Marking in MPLS", RFC 5129, January 2008. | |||
[RSVP-ECN] | ||||
Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P., | ||||
Babiarz, J., and K. Chan, "RSVP Extensions for Admission | ||||
Control over Diffserv using Pre-congestion Notification", | ||||
draft-lefaucheur-rsvp-ecn-01 (work in progress), | ||||
June 2006. | ||||
[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | |||
Salvatori, A., Soppera, A., and M. Koyabe, "Policing | Salvatori, A., Soppera, A., and M. Koyabe, "Policing | |||
Congestion Response in an Internetwork Using Re-Feedback", | Congestion Response in an Internetwork Using Re-Feedback", | |||
ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | |||
www.acm.org/sigs/sigcomm/sigcomm2005/ | www.acm.org/sigs/sigcomm/sigcomm2005/ | |||
techprog.html#session8>. | techprog.html#session8>. | |||
[Smart_rtg] | [Smart_rtg] | |||
Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, | Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, | |||
"Optimizing Cost and Performance for Multihoming", ACM | "Optimizing Cost and Performance for Multihoming", ACM | |||
skipping to change at page 50, line 31 | skipping to change at page 55, line 35 | |||
[Steps_DoS] | [Steps_DoS] | |||
Handley, M. and A. Greenhalgh, "Steps towards a DoS- | Handley, M. and A. Greenhalgh, "Steps towards a DoS- | |||
resistant Internet Architecture", Proc. ACM SIGCOMM | resistant Internet Architecture", Proc. ACM SIGCOMM | |||
workshop on Future directions in network architecture | workshop on Future directions in network architecture | |||
(FDNA'04) pp 49--56, August 2004. | (FDNA'04) pp 49--56, August 2004. | |||
Appendix A. Implementation | Appendix A. Implementation | |||
A.1. Ingress Gateway Algorithm for Blanking the RE flag | A.1. Ingress Gateway Algorithm for Blanking the RE flag | |||
The ingress gateway receives regular feedback reporting the fraction | The ingress gateway receives regular feedback 'PCN-feedback- | |||
of congestion marked octets for each aggregate arriving at the | information' reporting the fraction of congestion marked octets for | |||
egress. So for each aggregate it should blank the RE flag on the | each aggregate arriving at the egress. So for each aggregate it | |||
same fraction of octets. It is more efficient to calculate the | should blank the RE flag on this fraction of octets. A suitable | |||
reciprocal of this fraction when the signalling arrives, Z_0 = (1 / | pseudo-code algorithm for the ingress gateway is as follows: | |||
Congestion-Level-Estimate). Z_0 will be the number of octets of | ||||
packets the ingress should send with the RE flag set between those it | ||||
sends with the RE flag blanked. Z_0 will also take account of the | ||||
sustainable rate reported during the flow pre-emption process, if | ||||
necessary. | ||||
A suitable pseudo-code algorithm for the ingress gateway is as | ||||
follows: | ||||
==================================================================== | ==================================================================== | |||
B_i = 0 /* interblank volume */ | for each PCN-capable-packet { | |||
for each PCN-capable packet { | if RAND(0,1) <= PCN-feedback-information | |||
b = readLength(packet) /* set b to packet size */ | writeRE(0); | |||
B_i += b /* accumulate interblank volume */ | else | |||
if B_i < b * Z_0 { /* test whether interblank volume... */ | writeRE(1); | |||
writeRE(1) | ||||
} else { /* ...exceeds blank RE spacing * pkt size*/ | ||||
writeRE(0) /* ...and if so, clear RE */ | ||||
B_i = 0 /* ...and re-set interblank volume */ | ||||
} | ||||
} | } | |||
==================================================================== | ==================================================================== | |||
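The two variants of the ingress blanking algorithm shown above can be compared with a minimal Python sketch. It is illustrative only and not part of either draft: the function names, the use of a list of packet sizes in place of a packet stream, and the returned list of RE flag values (1 = set, 0 = blanked) are all assumptions made for this example. The first function follows the interblank-volume approach of the -01 column, the second the per-packet random test of the -02 column.

```python
import random


def blank_re_deterministic(packet_sizes, congestion_level):
    """-01 style: blank RE once the interblank volume reaches b * Z_0."""
    flags = []
    interblank = 0  # B_i: octets sent since the last blanked packet
    for b in packet_sizes:
        interblank += b
        # Z_0 = 1 / congestion level; with zero congestion nothing is blanked
        if congestion_level <= 0 or interblank < b / congestion_level:
            flags.append(1)  # writeRE(1)
        else:
            flags.append(0)  # writeRE(0) and restart the interblank volume
            interblank = 0
    return flags


def blank_re_probabilistic(packet_sizes, congestion_level, rng):
    """-02 style: blank RE on each packet with probability congestion_level."""
    return [0 if rng.random() <= congestion_level else 1 for _ in packet_sizes]


if __name__ == "__main__":
    sizes = [100] * 8
    print(blank_re_deterministic(sizes, 0.25))
    print(blank_re_probabilistic(sizes, 0.25, random.Random(42)))
```

With equal-sized packets and a reported congestion level of 0.25, the deterministic variant blanks every fourth packet, whereas the probabilistic variant only approaches that fraction over many packets but needs no per-aggregate counter.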
A.2. Downstream Congestion Metering Algorithms | A.2. Downstream Congestion Metering Algorithms | |||
A.2.1. Bulk Downstream Congestion Metering Algorithm | A.2.1. Bulk Downstream Congestion Metering Algorithm | |||
To meter the bulk amount of downstream pre-congestion in traffic | To meter the bulk amount of downstream pre-congestion in traffic | |||
crossing an inter-domain border, an algorithm is needed that | crossing an inter-domain border, an algorithm is needed that | |||
accumulates the size of positive packets and subtracts the size of | accumulates the size of positive packets and subtracts the size of | |||
skipping to change at page 51, line 40 | skipping to change at page 56, line 26 | |||
B: total data volume (in case it is needed) | B: total data volume (in case it is needed) | |||
A suitable pseudo-code algorithm for a border router is as follows: | A suitable pseudo-code algorithm for a border router is as follows: | |||
==================================================================== | ==================================================================== | |||
V_b = 0 | V_b = 0 | |||
B = 0 | B = 0 | |||
for each PCN-capable packet { | for each PCN-capable packet { | |||
b = readLength(packet) /* set b to packet size */ | b = readLength(packet) /* set b to packet size */ | |||
B += b /* accumulate total volume */ | B += b /* accumulate total volume */ | |||
if readEECN(packet) == (Re-Echo || FNE) { | if readEPCN(packet) == (Re-PCT-Echo || FNE) { | |||
V_b += b /* increment... */ | V_b += b /* increment... */ | |||
} elseif readEECN(packet) == ( AM(-1) || PM(-1) ) { | } elseif readEPCN(packet) == ( AM(-1) || TM(-1) ) { | |||
V_b -= b /* ...or decrement V_b... */ | V_b -= b /* ...or decrement V_b... */ | |||
} /*...depending on EECN field */ | } /*...depending on EPCN field */ | |||
} | } | |||
==================================================================== | ==================================================================== | |||
At the end of an accounting period this counter V_b represents the | At the end of an accounting period this counter V_b represents the | |||
pre-congestion volume that penalties could be applied to, as | pre-congestion volume that penalties could be applied to, as | |||
described in Section 5.3. | described in Section 5.3. | |||
For instance, accumulated volume of pre-congestion through a border | For instance, accumulated volume of pre-congestion through a border | |||
interface over a month might be V_b = 5PB (petabyte = 10^15 byte). | interface over a month might be V_b = 5TB (terabyte = 10^12 byte). | |||
This might have resulted from an average downstream pre-congestion | This might have resulted from an average downstream pre-congestion | |||
level of 1% on an accumulated total data volume of B = 500PB. | level of 0.001% on an accumulated total data volume of B = 500PB | |||
 | (petabyte = 10^15 byte). | |||
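The border metering loop and the worked figures above can be checked with a short Python sketch. It is only a sketch under stated assumptions: packets are represented as hypothetical (size, codepoint) pairs, the codepoint labels are plain strings copied from the -02 pseudo-code, and the function and constant names are invented for the example.

```python
# Codepoint labels as written in the -02 pseudo-code; representing them
# as strings is an assumption made purely for illustration.
POSITIVE = frozenset({"Re-PCT-Echo", "FNE"})
NEGATIVE = frozenset({"AM(-1)", "TM(-1)"})


def bulk_meter(packets, positive=POSITIVE, negative=NEGATIVE):
    """Return (V_b, B) for an iterable of (size_in_octets, codepoint) pairs."""
    v_b = 0   # downstream pre-congestion volume
    b_total = 0   # total data volume
    for size, codepoint in packets:
        b_total += size              # accumulate total volume
        if codepoint in positive:
            v_b += size              # increment on positive packets
        elif codepoint in negative:
            v_b -= size              # decrement on negative packets
    return v_b, b_total


PB = 10 ** 15   # petabyte
TB = 10 ** 12   # terabyte

# -01 figures: 1% of 500PB; -02 figures: 0.001% of 500PB.
v_b_01 = 500 * PB // 100
v_b_02 = 500 * PB // 100_000
```

Passing the -01 labels (Re-Echo and FNE as positive, AM(-1) and PM(-1) as negative) through the optional arguments gives the corresponding -01 behaviour. Both sets of example figures are internally consistent: 1% of 500PB is 5PB and 0.001% of 500PB is 5TB.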
A.2.2. Inflation Factor for Persistently Negative Flows | A.2.2. Inflation Factor for Persistently Negative Flows | |||
The following process is suggested to complement the simple algorithm | The following process is suggested to complement the simple algorithm | |||
above in order to protect against the various attacks from | above in order to protect against the various attacks from | |||
persistently negative flows described in Section 5.6.1. As explained | persistently negative flows described in Section 5.6.1. As explained | |||
in that section, the most important and first step is to estimate the | in that section, the most important and first step is to estimate the | |||
contribution of persistently negative flows to the bulk volume of | contribution of persistently negative flows to the bulk volume of | |||
downstream pre-congestion and to inflate this bulk volume as if these | downstream pre-congestion and to inflate this bulk volume as if these | |||
flows weren't there. The process below has been designed to give an | flows weren't there. The process below has been designed to give an | |||
unbiased estimate, but it may be possible to define other processes | unbiased estimate, but it may be possible to define other processes | |||
that achieve similar ends. | that achieve similar ends. | |||
While the above simple metering algorithm is counting the bulk of | While the above simple metering algorithm (Appendix A.2) is counting | |||
traffic over an accounting period, the meter should also select a | the bulk of traffic over an accounting period, the meter should also | |||
subset of the whole flow ID space that is small enough to be able to | select a subset of the whole flow ID space that is small enough to be | |||
realistically measure but large enough to give a realistic sample. | able to realistically measure but large enough to give a realistic | |||
Many different samples of different subsets of the ID space should be | sample. Many different samples of different subsets of the ID space | |||
taken at different times during the accounting period, preferably | should be taken at different times during the accounting period, | |||
covering the whole ID space. During each sample, the meter should | preferably covering the whole ID space. During each sample, the | |||
count the volume of positive packets and subtract the volume of | meter should count the volume of positive packets and subtract the | |||
negative, maintaining a separate account for each flow in the sample. | volume of negative, maintaining a separate account for each flow in | |||
It should run a lot longer than the large majority of flows, to avoid | the sample. It should run a lot longer than the large majority of | |||
a bias from missing the starts and ends of flows, which tend to be | flows, to avoid a bias from missing the starts and ends of flows, | |||
positive and negative respectively. | which tend to be positive and negative respectively. | |||
Once the accounting period finishes, the meter should calculate the | Once the accounting period finishes, the meter should calculate the | |||
total of the accounts V_{bI} for the subset of flows I in the sample, | total of the accounts V_{bI} for the subset of flows I in the sample, | |||
and the total of the accounts V_{fI} excluding flows with a negative | and the total of the accounts V_{fI} excluding flows with a negative | |||
account from the subset I. Then the weighted mean of all these | account from the subset I. Then the weighted mean of all these | |||
samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | |||
V_{bI}. | V_{bI}. | |||
If V_b is the result of the bulk accounting algorithm over the | If V_b is the result of the bulk accounting algorithm over the | |||
accounting period (Appendix A.2.1) it can be inflated by this factor | accounting period (Appendix A.2.1) it can be inflated by this factor | |||
a_S to get a good unbiased estimate of the volume of downstream | a_S to get a good unbiased estimate of the volume of downstream | |||
congestion over the accounting period a_S.V_b, without being polluted | congestion over the accounting period a_S.V_b, without being polluted | |||
by the effect of persistently negative flows. | by the effect of persistently negative flows. | |||
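The inflation factor calculation described above might be sketched in Python as follows. The data layout (one dictionary of per-flow accounts for each sample I) and the function name are assumptions made for illustration, and the check for a non-positive denominator is an added safeguard that the draft does not discuss.

```python
def inflation_factor(samples):
    """Compute a_S = sum_I V_fI / sum_I V_bI.

    samples: list of dicts, one per sample I, mapping a flow ID to that
    flow's account (volume of positive packets minus volume of negative).
    """
    sum_v_f = 0   # totals excluding flows with a negative account
    sum_v_b = 0   # totals over all flows in each sample
    for accounts in samples:
        sum_v_b += sum(accounts.values())
        sum_v_f += sum(v for v in accounts.values() if v >= 0)
    if sum_v_b <= 0:
        # Hypothetical safeguard: the ratio is not meaningful here.
        raise ValueError("sampled accounts do not sum to a positive volume")
    return sum_v_f / sum_v_b


# Example: two samples, one containing a persistently negative flow.
example_samples = [{"flow-a": 300, "flow-b": -100},
                   {"flow-c": 200, "flow-d": 0}]
a_s = inflation_factor(example_samples)
inflated_v_b = a_s * 4000   # inflate a bulk counter V_b = 4000 octets
```

In this example the negative flow removes 100 octets from a sampled total that would otherwise be 500, so a_S = 500 / 400 = 1.25 and a bulk count of 4000 octets is inflated to 5000.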
A.3. Algorithm for Sanctioning Negative Traffic | A.3. Algorithm for Sanctioning Negative Traffic | |||
{ToDo: Write up algorithms similar to Appendix D of [Re-TCP] for the | {ToDo: Write up algorithms similar to Appendix E of | |||
negative flow monitor with flow management algorithm and the variant | [I-D.briscoe-tsvwg-re-ecn-tcp] for the negative flow monitor with | |||
with bounded flow state.} | flow management algorithm and the variant with bounded flow state.} | |||
Author's Address | Author's Address | |||
Bob Briscoe | Bob Briscoe | |||
BT & UCL | BT & UCL | |||
B54/77, Adastral Park | B54/77, Adastral Park | |||
Martlesham Heath | Martlesham Heath | |||
Ipswich IP5 3RE | Ipswich IP5 3RE | |||
UK | UK | |||
skipping to change at page 54, line 45 | skipping to change at page 59, line 45 | |||
such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
Acknowledgments | Acknowledgment | |||
Funding for the RFC Editor function is provided by the IETF | This document was produced using xml2rfc v1.33 (of | |||
Administrative Support Activity (IASA). This document was produced | http://xml.resource.org/) from a source in RFC-2629 XML format. | |||
using xml2rfc v1.32 (of http://xml.resource.org/) from a source in | ||||
RFC-2629 XML format. | ||||
End of changes. 194 change blocks. | ||||
756 lines changed or deleted | 1004 lines changed or added | |||