| draft-briscoe-re-pcn-border-cheat-01.txt | draft-briscoe-re-pcn-border-cheat-02.txt | |||
|---|---|---|---|---|
| PCN Working Group B. Briscoe | PCN Working Group B. Briscoe | |||
| Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
| Intended status: Informational February 25, 2008 | Intended status: Standards Track September 13, 2008 | |||
| Expires: August 28, 2008 | Expires: March 17, 2009 | |||
| Emulating Border Flow Policing using Re-ECN on Bulk Data | Emulating Border Flow Policing using Re-PCN on Bulk Data | |||
| draft-briscoe-re-pcn-border-cheat-01 | draft-briscoe-re-pcn-border-cheat-02 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on August 28, 2008. | This Internet-Draft will expire on March 17, 2009. | |||
| Copyright Notice | ||||
| Copyright (C) The IETF Trust (2008). | ||||
| Abstract | Abstract | |||
| Scaling per flow admission control to the Internet is a hard problem. | Scaling per flow admission control to the Internet is a hard problem. | |||
| A recently proposed approach combines Diffserv and pre-congestion | The approach of combining Diffserv and pre-congestion notification | |||
| notification (PCN) to provide a service slightly better than Intserv | (PCN) provides a service slightly better than Intserv controlled load | |||
| controlled load. It scales to networks of any size, but only if | that scales to networks of any size without needing Diffserv's usual | |||
| domains trust each other to comply with admission control and rate | overprovisioning, but only if domains trust each other to comply with | |||
| policing. This memo claims to solve this trust problem without | admission control and rate policing. This memo claims to solve this | |||
| losing scalability. It describes bulk border policing that provides | trust problem without losing scalability. It provides a sufficient | |||
| a sufficient emulation of per-flow policing with the help of another | emulation of per-flow policing at borders but with only passive bulk | |||
| recently proposed extension to ECN, involving re-echoing ECN feedback | metering rather than per-flow processing. Measurements are | |||
| (re-ECN). With only passive bulk measurements at borders, sanctions | sufficient to apply penalties against cheating neighbour networks. | |||
| can be applied against cheating networks. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 7 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 9 | 2. Requirements Notation . . . . . . . . . . . . . . . . . . . . 11 | |||
| 3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 3. The Problem . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 3.1. The Traditional Per-flow Policing Problem . . . . . . . . 10 | 3.1. The Traditional Per-flow Policing Problem . . . . . . . . 11 | |||
| 3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 12 | 3.2. Generic Scenario . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 4. Re-ECN Protocol for an RSVP (or similar) Transport . . . . . . 14 | 4. Re-ECN Protocol in IP with Two Congestion Marking Levels . . . 17 | |||
| 4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 14 | 4.1. Protocol Overview . . . . . . . . . . . . . . . . . . . . 17 | |||
| 4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | 4.2. Re-PCN Abstracted Network Layer Wire Protocol (IPv4 or | |||
| v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 16 | 4.2.1. Re-ECN Recap . . . . . . . . . . . . . . . . . . . . . 18 | |||
| 4.2.2. Re-ECN Combined with Pre-Congestion Notification | 4.2.2. Re-ECN Combined with Pre-Congestion Notification | |||
| (re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 18 | (re-PCN) . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| 4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 20 | 4.3. Protocol Operation . . . . . . . . . . . . . . . . . . . . 22 | |||
| 4.3.1. Protocol Operation for an Established Flow . . . . . . 20 | 4.3.1. Protocol Operation for an Established Flow . . . . . . 23 | |||
| 4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 21 | 4.3.2. Aggregate Bootstrap . . . . . . . . . . . . . . . . . 24 | |||
| 4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 22 | 4.3.3. Flow Bootstrap . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 23 | 4.3.4. Router Forwarding Behaviour . . . . . . . . . . . . . 26 | |||
| 4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 25 | 4.3.5. Extensions . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 25 | 5. Emulating Border Policing with Re-ECN . . . . . . . . . . . . 28 | |||
| 5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 25 | 5.1. Informal Terminology . . . . . . . . . . . . . . . . . . . 28 | |||
| 5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 26 | 5.2. Policing Overview . . . . . . . . . . . . . . . . . . . . 30 | |||
| 5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 28 | 5.3. Pre-requisite Contractual Arrangements . . . . . . . . . . 31 | |||
| 5.4. Emulation of Per-Flow Rate Policing: Rationale and | 5.4. Emulation of Per-Flow Rate Policing: Rationale and | |||
| Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 31 | Limits . . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 32 | 5.5. Sanctioning Dishonest Marking . . . . . . . . . . . . . . 36 | |||
| 5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 34 | 5.6. Border Mechanisms . . . . . . . . . . . . . . . . . . . . 38 | |||
| 5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 34 | 5.6.1. Border Accounting Mechanisms . . . . . . . . . . . . . 38 | |||
| 5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 38 | 5.6.2. Competitive Routing . . . . . . . . . . . . . . . . . 41 | |||
| 5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 39 | 5.6.3. Fail-safes . . . . . . . . . . . . . . . . . . . . . . 42 | |||
| 6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 | 6. Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 | |||
| 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 42 | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 46 | |||
| 8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 43 | 8. Design Choices and Rationale . . . . . . . . . . . . . . . . . 47 | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 45 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 49 | |||
| 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 46 | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 50 | |||
| 11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 46 | 11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 50 | |||
| 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 47 | 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 51 | |||
| 13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 47 | 13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 52 | |||
| 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 48 | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 52 | |||
| 14.1. Normative References . . . . . . . . . . . . . . . . . . . 48 | 14.1. Normative References . . . . . . . . . . . . . . . . . . . 52 | |||
| 14.2. Informative References . . . . . . . . . . . . . . . . . . 48 | 14.2. Informative References . . . . . . . . . . . . . . . . . . 53 | |||
| Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 50 | Appendix A. Implementation . . . . . . . . . . . . . . . . . . . 55 | |||
| A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 50 | A.1. Ingress Gateway Algorithm for Blanking the RE flag . . . . 55 | |||
| A.2. Downstream Congestion Metering Algorithms . . . . . . . . 51 | A.2. Downstream Congestion Metering Algorithms . . . . . . . . 56 | |||
| A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 51 | A.2.1. Bulk Downstream Congestion Metering Algorithm . . . . 56 | |||
| A.2.2. Inflation Factor for Persistently Negative Flows . . . 52 | A.2.2. Inflation Factor for Persistently Negative Flows . . . 56 | |||
| A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 52 | A.3. Algorithm for Sanctioning Negative Traffic . . . . . . . . 57 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 53 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 57 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 54 | Intellectual Property and Copyright Statements . . . . . . . . . . 59 | |||
| Status (to be removed by the RFC Editor) | Status (to be removed by the RFC Editor) | |||
| The IETF PCN working group is initially chartered to consider PCN | ||||
| domains only under a single trust authority. However, after its | ||||
| initial work is complete the charter says the working group may re- | ||||
| charter to consider concatenated Diffserv domains, amongst other new | ||||
| work items. The charter ends by stating "The details of these work | ||||
| items are outside the scope of the initial phase; but the WG may | ||||
| consider their requirements to design components that are | ||||
| sufficiently general to support such extensions in the future." | ||||
| This memo is therefore contributed to describe how PCN could be | ||||
| extended to inter-domain. We wanted to document the solution to | ||||
| reduce the chances that something else eats up the codepoint space | ||||
| needed before PCN re-charters to consider inter-domain. Losing the | ||||
| chance to standardise this simple, scalable solution to the problem | ||||
| of inter-domain flow admission control would be unfortunate | ||||
| (understatement), given it took years to find, and even then it was | ||||
| very difficult to find codepoint space for it. | ||||
| The scheme described here (Section 4) requires the PCN ingress | ||||
| gateway to re-echo any PCN feedback it receives back into the forward | ||||
| stream of IP packets (hence we call this scheme re-PCN). Re-PCN | ||||
| works in a very similar way to the re-ECN proposal on which it is | ||||
| based [I-D.briscoe-tsvwg-re-ecn-tcp], the only difference being that | ||||
| PCN might encode three states of congestion, whereas ECN encodes two. | ||||
| This document is written to stand alone from re-ECN, so that readers | ||||
| do not have to read [I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
| The authors seek comments from the Internet community on whether | ||||
| combining PCN and re-ECN to create re-PCN in this way is a sufficient | ||||
| solution to the problem of scaling microflow admission control to the | ||||
| Internet as a whole. Here we emphasise that scaling is not just an | ||||
| issue of numbers of flows, but also the number of security entities-- | ||||
| networks and users--who may all have conflicting interests. | ||||
| This memo is posted as an Internet-Draft with the intent to | This memo is posted as an Internet-Draft with the intent to | |||
| eventually be broken down into two documents: one for the standards | eventually be broken down into two documents: one for the standards | |||
| track and one for informational status. But until it becomes an item | track and one for informational status. But until it becomes an item | |||
| of IETF working group business the whole proposal has been kept | of IETF working group business the whole proposal has been kept | |||
| together to aid understanding. Only the text of Section 4 of this | together to aid understanding. Only the text of Section 4 of this | |||
| document requires standardisation. The rest of the sections describe | document is intended to be normative (requiring standardisation). | |||
| how a system might be built from these protocols by the operators of | The rest of the sections are merely informative, describing how a | |||
| an internetwork. Note in particular that the policing and monitoring | system might be built from these protocols by the operators of an | |||
| internetwork. Note in particular that the policing and monitoring | ||||
| functions proposed for the trust boundaries between operators would | functions proposed for the trust boundaries between operators would | |||
| not need standardisation by the IETF. They simply represent one way | not need standardisation by the IETF. They simply represent one | |||
| that the proposed protocols could be used to extend the PCN | possible way that the proposed protocols could be used to extend the | |||
| architecture [I-D.ietf-pcn-architecture] to span multiple domains | PCN architecture [I-D.ietf-pcn-architecture] to span multiple domains | |||
| without mutual trust between the operators. | without mutual trust between the operators. | |||
| To realise the system described, this document also depends on | Dependencies (to be removed by the RFC Editor) | |||
| standardisation of three other documents currently being discussed | ||||
| (but not on the standards track) in the IETF Transport Area: pre- | ||||
| congestion notification (PCN) marking on interior nodes [PCN]; | ||||
| feedback of aggregate PCN measurements by suitably extending the | ||||
| admission control signalling protocol (e.g. RSVP) [RSVP-ECN]; and | ||||
| re-insertion of the feedback into the forward stream of IP packets by | ||||
| the PCN ingress gateway in a similar way to that proposed for a TCP | ||||
| source [Re-TCP]. | ||||
| The authors seek comments from the Internet community on whether | To realise the system described, this document also depends on other | |||
| combining PCN and re-ECN in this way is a sufficient solution to the | documents chartered in the IETF Transport Area progressing along the | |||
| problem of scaling microflow admission control to the Internet as a | standards track: | |||
| whole, even though such scaling must take account of the increasing | ||||
| numbers of networks and users who may all have conflicting interests. | o Pre-congestion notification (PCN) marking on interior nodes | |||
| [I-D.eardley-pcn-marking-behaviour], chartered for standardisation | ||||
| in the PCN w-g; | ||||
| o The baseline encoding of pre-congestion notification in the IP | ||||
| header [I-D.moncaster-pcn-baseline-encoding], also chartered for | ||||
| standardisation in the PCN w-g; | ||||
| o Feedback of aggregate PCN measurements by suitably extending the | ||||
| admission control signalling protocol (e.g. RSVP extension | ||||
| [RSVP-ECN] or NSIS extension [I-D.arumaithurai-nsis-pcn]). | ||||
| The baseline encoding makes no new demands on codepoint space in the | ||||
| IP header but provides just two PCN encoding states (not marked and | ||||
| marked). The PCN architecture recognises that operators might want | ||||
| PCN marking to trigger two functions (admission control and flow | ||||
| termination) at different levels of pre-congestion, which seems to | ||||
| require three encoding states. A scheme has been proposed | ||||
| [I-D.charny-pcn-single-marking] that can do both functions with just | ||||
| two encoding states, but simulations have shown it performs poorly | ||||
| under certain conditions that might be typical. As it seems likely | ||||
| that PCN might need three encoding states to be fully operational, we | ||||
| want to be sure that three encoding states can be extended to work | ||||
| inter-domain. Therefore, we have defined a three-state extension | ||||
| encoding scheme in this document, then we have added the re-PCN | ||||
| scheme to it. The three-state encoding we have chosen depends on | ||||
| standardisation of yet another document in the IETF Transport Area: | ||||
| o Propagation beyond the tunnel decapsulator of any changes in the | ||||
| ECN field to ECT(0) or ECT(1) made within a tunnel (the ideal | ||||
| decapsulation rules of [I-D.briscoe-tsvwg-ecn-tunnel]). | ||||
| Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
| Full diffs of incremental changes between drafts are available at | Full diffs of incremental changes between drafts are available at | |||
| URL: <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#repcn> | URL: <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#repcn> | |||
| Changes from <draft-briscoe-re-pcn-border-cheat-01> to | ||||
| <draft-briscoe-re-pcn-border-cheat-02> (current version): | ||||
| Considerably updated the 'Status' note to explain the | ||||
| relationship of this draft to other documents in the IETF | ||||
| process (or not) and to chartered PCN w-g activity. | ||||
| Split out the dependencies into a separate note and added | ||||
| dependencies on new PCN documents in progress. | ||||
| Made scalability motivation in the introduction clearer, | ||||
| explaining why Diffserv over-provisioning doesn't scale unless | ||||
| PCN is used. | ||||
| Clarified that the standards action in Section 4 is to define | ||||
| the meanings of the combination of fields in the IP header: the | ||||
| RE flag and 2-level congestion marking in the ECN field. And | ||||
| that it is not characterised by a particular feedback style in | ||||
| the transport. | ||||
| Switched round the two ECT codepoints to be compatible with the | ||||
| new PCN baseline encoding and used less confusing naming for | ||||
| re-PCN codepoints (Section 4). | ||||
| Generalised rules for encoding probes when bootstrapping or re- | ||||
| starting aggregates & flows (Section 4.3.2). | ||||
| Downgraded drop sanction behaviour from MUST to conditional | ||||
| SHOULD (Section 5.5). | ||||
| Added incremental deployment safety justification for choice of | ||||
| which way round the RE flag works (Section 7). | ||||
| Added possible vulnerability to brief attacks and possible | ||||
| solution to security considerations (Section 9). | ||||
| Updated references and terminology, particularly taking account | ||||
| of recent new PCN w-g documents. | ||||
| Replaced suggested Ingress Gateway Algorithm for Blanking the | ||||
| RE flag (Appendix A.1). | ||||
| Clarifications throughout. | ||||
| Changes from <draft-briscoe-re-pcn-border-cheat-00> to | Changes from <draft-briscoe-re-pcn-border-cheat-00> to | |||
| <draft-briscoe-re-pcn-border-cheat-01> (current version): | <draft-briscoe-re-pcn-border-cheat-01>: | |||
| Updated references. | Updated references. | |||
| Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-01> | Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-01> | |||
| to <draft-briscoe-re-pcn-border-cheat-00>: | to <draft-briscoe-re-pcn-border-cheat-00>: | |||
| Changed filename to associate it with the new IETF PCN w-g, rather | Changed filename to associate it with the new IETF PCN w-g, | |||
| than the TSVWG w-g. | rather than the TSVWG w-g. | |||
| Introduction: Clarified that bulk policing only replaces per-flow | Introduction: Clarified that bulk policing only replaces per- | |||
| policing at interior inter-domain borders, while per-flow policing | flow policing at interior inter-domain borders, while per-flow | |||
| is still needed at the access interface to the internetwork. Also | policing is still needed at the access interface to the | |||
| clarified that the aim is to neutralise any gains from cheating | internetwork. Also clarified that the aim is to neutralise any | |||
| using local bilateral contracts between neighbouring networks, | gains from cheating using local bilateral contracts between | |||
| rather than merely identifying remote cheaters. | neighbouring networks, rather than merely identifying remote | |||
| cheaters. | ||||
| Section 3.1: Described the traditional per-flow policing problem | Section 3.1: Described the traditional per-flow policing | |||
| with inter-domain reservations more precisely, particularly with | problem with inter-domain reservations more precisely, | |||
| respect to direction of reservations and of traffic flows. | particularly with respect to direction of reservations and of | |||
| traffic flows. | ||||
| Clarified status of Section 5 onwards, in particular that policers | Clarified status of Section 5 onwards, in particular that | |||
| and monitors would not need standardisation, but that the protocol | policers and monitors would not need standardisation, but that | |||
| in Section 4 would require standardisation. | the protocol in Section 4 would require standardisation. | |||
| Section 5.6.2 on competitive routing: Added discussion of direct | Section 5.6.2 on competitive routing: Added discussion of | |||
| incentives for a receiver to switch to a different provider even | direct incentives for a receiver to switch to a different | |||
| if the provider has a termination monopoly. | provider even if the provider has a termination monopoly. | |||
| Clarified that "Designing in security from the start" merely means | Clarified that "Designing in security from the start" merely | |||
| allowing codepoint space in the PCN protocol encoding. There is | means allowing codepoint space in the PCN protocol encoding. | |||
| no need to actually implement inter-domain security mechanisms for | There is no need to actually implement inter-domain security | |||
| solutions confined to a single domain. | mechanisms for solutions confined to a single domain. | |||
| Updated some references and added a ref to the Security | Updated some references and added a ref to the Security | |||
| Considerations, as well as other minor corrections and | Considerations, as well as other minor corrections and | |||
| improvements. | improvements. | |||
| Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to | Changes from <draft-briscoe-tsvwg-re-ecn-border-cheat-00> to | |||
| <draft-briscoe-tsvwg-re-ecn-border-cheat-01>: | <draft-briscoe-tsvwg-re-ecn-border-cheat-01>: | |||
| Added subsection on Border Accounting Mechanisms (Section 5.6.1) | Added subsection on Border Accounting Mechanisms | |||
| (Section 5.6.1) | ||||
| Section 4.2 on the re-ECN wire protocol clarified and re-organised | Section 4.2 on the re-ECN wire protocol clarified and re- | |||
| to separately discuss re-ECN for default ECN marking and for pre- | organised to separately discuss re-ECN for default ECN marking | |||
| congestion marking (PCN). | and for pre-congestion marking (PCN). | |||
| Router Forwarding Behaviour subsection added to re-organised | Router Forwarding Behaviour subsection added to re-organised | |||
| section on Protocol Operation (Section 4.3). Extensions section | section on Protocol Operation (Section 4.3). Extensions | |||
| moved within Protocol Operations. | section moved within Protocol Operations. | |||
| Emulating Border Policing (Section 5) reorganised, starting with a | Emulating Border Policing (Section 5) reorganised, starting | |||
| new Terminology subsection heading, and a simplified overview | with a new Terminology subsection heading, and a simplified | |||
| section. Added a large new subsection on Border Accounting | overview section. Added a large new subsection on Border | |||
| Mechanisms within a new section bringing together other | Accounting Mechanisms within a new section bringing together | |||
| subsections on Border Mechanisms generally (Section 5.6). Some | other subsections on Border Mechanisms generally (Section 5.6). | |||
| text moved from old subsections into these new ones. | Some text moved from old subsections into these new ones. | |||
| Added section on Incremental Deployment (Section 7), drawing | Added section on Incremental Deployment (Section 7), drawing | |||
| together relevant points about deployment made throughout. | together relevant points about deployment made throughout. | |||
| Sections on Design Rationale (Section 8) and Security | Sections on Design Rationale (Section 8) and Security | |||
| Considerations (Section 9) expanded with some new material, | Considerations (Section 9) expanded with some new material, | |||
| including new attacks and their defences. | including new attacks and their defences. | |||
| Suggested Border Metering Algorithms improved (Appendix A.2) for | Suggested Border Metering Algorithms improved (Appendix A.2) | |||
| resilience to newly identified attacks. | for resilience to newly identified attacks. | |||
| 1. Introduction | 1. Introduction | |||
| The Internet community largely lost interest in the Intserv | The Internet community largely lost interest in the Intserv | |||
| architecture after it was clarified that it would be unlikely to | architecture after it was clarified that it would be unlikely to | |||
| scale to the whole Internet [RFC2208]. Although Intserv mechanisms | scale to the whole Internet [RFC2208]. Although Intserv mechanisms | |||
| proved impractical, the bandwidth reservation service it aimed to | proved impractical, the bandwidth reservation service it aimed to | |||
| offer is still very much required. | offer is still very much required. | |||
| A recently proposed approach [I-D.ietf-pcn-architecture] combines | A recently proposed approach [I-D.ietf-pcn-architecture] combines | |||
| Diffserv and pre-congestion notification (PCN) to provide a service | Diffserv and pre-congestion notification (PCN) to provide a service | |||
| slightly better than Intserv controlled load [RFC2211]. It scales to | slightly better than Intserv controlled load [RFC2211]. PCN does not | |||
| any size network, but only if domains trust their neighbours to have | require the considerable over-provisioning that is normally required | |||
| checked that upstream customers aren't taking more bandwidth than | for admission control over Diffserv [RFC2998] to be robust against | |||
| they reserved, either accidentally or deliberately. This memo | re-routes or variation in the traffic matrix. It has been proved | |||
| describes border policing measures so that one network can protect | that Diffserv's over-provisioning requirement grows linearly with the | |||
| its interests, even if networks around it are deliberately trying to | network diameter in hops [QoS_scale]. | |||
| cheat. The approach provides a sufficient emulation of flow rate | ||||
| policing at trust boundaries but without per-flow processing. The | ||||
| emulation is not perfect, but it is sufficient to ensure that the | ||||
| punishment is at least proportionate to the severity of the cheat. | ||||
| Per-flow rate policing for each reservation is still expected to be | ||||
| used at the access edge of the internetwork, but at the borders | ||||
| between networks bulk policing can be used to emulate per-flow | ||||
| policing. | ||||
| The aim is to be able to scale controlled load service to any number | A number of PCN domains can be concatenated into a larger PCN region | |||
| of endpoints, even though such scaling must take account of the | without any per-flow processing between them, but only if each domain | |||
| increasing numbers of networks and users who may all have conflicting | trusts the ingress network to have checked that upstream customers | |||
| interests. To achieve such scaling, this memo combines two recent | aren't taking more bandwidth than they reserved, either accidentally | |||
| proposals, both of which it briefly recaps: | or deliberately. Unfortunately, networks can gain considerably by | |||
| breaking this trust. One way for a network to protect itself against | ||||
| others is to handle flow signalling at its own border and police | ||||
| traffic against reservations itself. However, this reintroduces the | ||||
| per-flow unscalability at borders that Intserv over Diffserv suffers | ||||
| from. | ||||
| o A deployment model for admission control over Diffserv using pre- | This memo describes a protocol called re-PCN that enables bulk border | |||
| congestion notification [I-D.ietf-pcn-architecture] describes how | measurements so that one network can protect its interests, even if | |||
| bulk pre-congestion notification on routers within an edge-to-edge | networks around it are deliberately trying to cheat. The approach | |||
| Diffserv region can emulate the precision of per-flow admission | provides a sufficient emulation of flow rate policing at trust | |||
| control to provide controlled load service without unscalable per- | boundaries but without per-flow processing. Per-flow rate policing | |||
| flow processing; | for each reservation is still expected to be used at the access edge | |||
| of the internetwork, but at the borders between networks bulk | ||||
| policing can be used to emulate per-flow policing. The emulation is | ||||
| not perfect, but it is sufficient to ensure that the punishment is at | ||||
| least proportionate to the severity of the cheat. Re-PCN requires | ||||
| neither the unscalable over-provisioning of Diffserv nor the per- | ||||
| flow processing at borders of Intserv over Diffserv. | ||||
| o Re-ECN: Adding Accountability to TCP/IP [Re-TCP]. The trick that | It should therefore scale controlled load service to the whole | |||
| addresses cheating at borders is to recognise that border policing | internetwork without the cost of Diffserv's linearly increasing over- | |||
| is mainly necessary because cheating upstream networks will admit | provisioning, or the cost of per-flow policing at each border. To | |||
| traffic when they shouldn't only as long as they don't directly | achieve such scaling, this memo combines two recent proposals, both | |||
| experience the downstream congestion their misbehaviour can cause. | of which it briefly recaps: | |||
| The re-ECN protocol requires upstream nodes to declare expected | ||||
| downstream congestion in all forwarded packets and it makes it in | o The pre-congestion notification (PCN) | |||
| their interests to declare it honestly. Operators can then | architecture [I-D.ietf-pcn-architecture] describes how bulk pre- | |||
| monitor downstream congestion in bulk at borders to emulate | congestion notification on routers within an edge-to-edge Diffserv | |||
| policing. | region can emulate the precision of per-flow admission control to | |||
| provide controlled load service without unscalable per-flow | ||||
| processing; | ||||
| o Re-ECN: Adding Accountability to TCP/ | ||||
| IP [I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
| We coin the term re-PCN for the combination of PCN and re-ECN. | ||||
| The trick that addresses cheating at borders is to recognise that | ||||
| border policing is mainly necessary because cheating upstream | ||||
| networks will admit traffic when they shouldn't only as long as they | ||||
| don't directly experience the downstream congestion their | ||||
| misbehaviour can cause. The re-ECN protocol ensures a network can be | ||||
| made to experience the congestion it causes in other networks. Re- | ||||
| ECN requires the sending node to declare expected downstream | ||||
| congestion in all packets and it makes it in its interest to declare | ||||
| this honestly. At the border between upstream network 'A' and | ||||
| downstream network 'B' (say), both networks can monitor packets | ||||
| crossing the border to measure how much congestion 'A' is causing in | ||||
| 'B' and beyond. 'B' can then include a limit or penalty based on | ||||
| this metric in its contract with 'A'. This is how 'A' experiences | ||||
| the effect of congestion it causes in other networks. 'A' no longer | ||||
| gains by admitting traffic when it shouldn't, which is why we can say | ||||
| re-PCN emulates flow policing, even though it doesn't measure flows. | ||||
| The aim is not to enable a network to _identify_ some remote cheating | The aim is not to enable a network to _identify_ some remote cheating | |||
| party, which would rarely be useful given the victim network would be | party, which would rarely be useful given the victim network would be | |||
| unlikely to be able to seek redress from a cheater in some remote | unlikely to be able to seek redress from a cheater in some remote | |||
| part of the world with whom no direct contractual relationship | part of the world with whom no direct contractual relationship | |||
| exists. Rather the aim is to ensure that any gain from cheating will | exists. Rather the aim is to ensure that any gain from cheating will | |||
| be cancelled out by penalties applied to the cheating party by its | be cancelled out by penalties applied to the cheating party by its | |||
| local network. Further, the solution ensures each of the chain of | local network. Further, the solution ensures each of the chain of | |||
| networks between the cheater and the victim will lose out if it | networks between the cheater and the victim will lose out if it | |||
| doesn't apply penalties to its neighbour. Thus the solution builds | doesn't apply penalties to its neighbour. Thus the solution builds | |||
| on the local bilateral contractual relationships that already exist | on the local bilateral contractual relationships that already exist | |||
| between neighbouring networks. | between neighbouring networks. | |||
| Rather than the end-to-end arrangement used when re-ECN was specified | Rather than the end-to-end arrangement used when re-ECN was specified | |||
| for the TCP transport [Re-TCP], this memo specifies re-ECN in an | for the TCP transport [I-D.briscoe-tsvwg-re-ecn-tcp], this memo | |||
| edge-to-edge arrangement, making it applicable to the above | specifies re-ECN in an edge-to-edge arrangement, making it applicable | |||
| deployment model for admission control over Diffserv. Also, rather | to deployment models where admission control over Diffserv is based | |||
| than using a TCP transport for regular congestion feedback, this memo | on pre-congestion notification. Also, rather than using a TCP | |||
| specifies re-ECN using RSVP as the transport for feedback [RSVP-ECN]. | transport for regular congestion feedback, this memo specifies re-ECN | |||
| A similar deployment model, but with a different transport for | using RSVP as the transport for feedback [RSVP-ECN]. RSVP is used to | |||
| signalling congestion feedback could be used (e.g. Arumaithurai | be concrete, but a similar deployment model with a different | |||
| [I-D.arumaithurai-nsis-pcn] and RMD [I-D.ietf-nsis-rmd] use NSIS). | transport for signalling congestion feedback could be used (e.g. | |||
| Arumaithurai [I-D.arumaithurai-nsis-pcn] and RMD [I-D.ietf-nsis-rmd] | ||||
| both use NSIS). | ||||
| This memo aims to do two things: i) define how to apply the re-ECN | This memo aims to do two things: i) define how to apply the re-PCN | |||
| protocol to the admission control over Diffserv scenario; and ii) | protocol to the admission control over Diffserv scenario; and ii) | |||
| explain why re-ECN sufficiently emulates border policing in that | explain why re-PCN sufficiently emulates border policing in that | |||
| scenario. Most of the memo is taken up with the second aim; | scenario. Most of the memo is taken up with the second aim; | |||
| explaining why it works. Applying re-ECN to the scenario actually | explaining why it works. Applying re-PCN to the scenario actually | |||
| involves quite a trivial modification to the ingress gateway. That | involves quite a trivial modification to the ingress gateway. That | |||
| modification can be added to gateways later, so our immediate goal is | modification can be added to gateways later, so our immediate goal is | |||
| to convince everyone to have the foresight to define the PCN wire | to convince everyone to have the foresight to define the PCN wire | |||
| protocol encoding to accommodate the extended codepoints defined in | protocol encoding to accommodate the extended codepoints defined in | |||
| this document, whether first deployments require border policing or | this document, whether first deployments require border policing or | |||
| not. Otherwise, when we want to add policing, we will have built | not. Otherwise, when we want to add policing, we will have built | |||
| ourselves a legacy problem. In other words, we aim to convince | ourselves a legacy problem. In other words, we aim to convince | |||
| people to "Design in security from the start." | people to "Design in security from the start." | |||
| The body of this memo is structured as follows: | The body of this memo is structured as follows: | |||
| Section 3 describes the border policing problem. We recap the | Section 3 describes the border policing problem. We recap the | |||
| traditional, unscalable view of how to solve the problem, and we | traditional, unscalable view of how to solve the problem, and we | |||
| recap the admission control solution which has the scalability we | recap the admission control solution which has the scalability we | |||
| do not want to lose when we add border policing; | do not want to lose when we add border policing; | |||
| Section 4 specifies the re-ECN protocol solution in detail; | Section 4 specifies the re-PCN protocol solution in detail; | |||
| Section 5 explains how to use the protocol to emulate border | Section 5 explains how to use the protocol to emulate border | |||
| policing, and why it works; | policing, and why it works; | |||
| Section 6 analyses the security of the proposed solution; | Section 6 analyses the security of the proposed solution; | |||
| Section 8 explains the sometimes subtle rationale behind our | Section 8 explains the sometimes subtle rationale behind our | |||
| design decisions; | design decisions; | |||
| Section 9 comments on the overall robustness of the security | Section 9 comments on the overall robustness of the security | |||
| skipping to change at page 10, line 49 | skipping to change at page 12, line 41 | |||
| were permitted, the ability of admission control to give assurances | were permitted, the ability of admission control to give assurances | |||
| to other flows will break. | to other flows will break. | |||
| Just as sources need not be trusted to keep within the requested flow | Just as sources need not be trusted to keep within the requested flow | |||
| spec, whole networks might also try to cheat. We will now set up a | spec, whole networks might also try to cheat. We will now set up a | |||
| concrete scenario to illustrate such cheats. Imagine reservations | concrete scenario to illustrate such cheats. Imagine reservations | |||
| for unidirectional flows, through at least two networks, an edge | for unidirectional flows, through at least two networks, an edge | |||
| network and its downstream transit provider. Imagine the edge | network and its downstream transit provider. Imagine the edge | |||
| network charges its retail customers per reservation but also has to | network charges its retail customers per reservation but also has to | |||
| pay its transit provider a charge per reservation. Typically, both | pay its transit provider a charge per reservation. Typically, both | |||
| its selling and buying charges might depend on the duration and rate | the charges for buying from the transit and selling to the retail | |||
| of each reservation. The levels of the actual selling and buying | customer might depend on the duration and rate of each reservation. | |||
| prices are irrelevant to our discussion (most likely the network will | The levels of the actual selling and buying prices are irrelevant to | |||
| sell at a higher price than it buys, of course). | our discussion (most likely the network will sell at a higher price | |||
| than it buys, of course). | ||||
| A cheating ingress network could systematically reduce the size of | A cheating ingress network could systematically reduce the size of | |||
| its retail customers' reservation signalling requests (e.g. the | its retail customers' reservation signalling requests (e.g. the | |||
| SENDER_TSPEC object in RSVP's PATH message) before forwarding them to | SENDER_TSPEC object in RSVP's PATH message) before forwarding them to | |||
| its transit provider and systematically reinstate the responses on | its transit provider and systematically reinstate the responses on | |||
| the way back (e.g. the FLOWSPEC object in RSVP's RESV message). It | the way back (e.g. the FLOWSPEC object in RSVP's RESV message). It | |||
| would then receive an honest income from its upstream retail customer | would then receive an honest income from its upstream retail customer | |||
| but only pay for fraudulently smaller reservations downstream. A | but only pay for fraudulently smaller reservations downstream. A | |||
| similar but opposite trick (increasing the TSPEC and decreasing the | similar but opposite trick (increasing the TSPEC and decreasing the | |||
| FLOWSPEC) could be perpetrated by the receiver's access network if | FLOWSPEC) could be perpetrated by the receiver's access network if | |||
| the reservation was paid for by the receiver. | the reservation was paid for by the receiver. | |||
| Equivalently, a cheating ingress network may feed the traffic from a | Equivalently, a cheating ingress network may feed the traffic from a | |||
| number of flows into an aggregate reservation over the transit that | number of flows into an aggregate reservation over the transit that | |||
| is smaller than the total of all the flows. Because of these fraud | is smaller than the total of all the flows. Because of these fraud | |||
| possibilities, in traditional QoS reservation architectures the | possibilities, in traditional QoS reservation architectures the | |||
| downstream network polices at each border. The policer checks that | downstream network polices traffic at each border. The policer | |||
| the actual sent data rate of each flow is within the signalled | checks that the actual sent data rate of each flow is within the | |||
| reservation. | signalled reservation. | |||
| Reservation signalling could be authenticated end to end, but this | Reservation signalling could be authenticated end to end, but this | |||
| wouldn't prevent the aggregation cheat just described. For this | wouldn't prevent the aggregation cheat just described. For this | |||
| reason, and to avoid the need for a global PKI, signalling integrity | reason, and to avoid the need for a global PKI, signalling integrity | |||
| is typically only protected on a hop-by-hop basis [RFC2747]. | is typically only protected on a hop-by-hop basis [RFC2747]. | |||
| A variant of the above cheat is where a router in an honest | A variant of the above cheat is where a router in an honest | |||
| downstream network denies admission to a new reservation, but a | downstream network denies admission to a new reservation, but a | |||
| cheating upstream network still admits the flow. For instance, the | cheating upstream network still admits the flow. For instance, the | |||
| networks may be using Diffserv internally, but Intserv admission | networks may be using Diffserv internally, but Intserv admission | |||
| skipping to change at page 12, line 47 | skipping to change at page 14, line 45 | |||
| <-------- edge-to-edge signalling -------> | <-------- edge-to-edge signalling -------> | |||
| (for admission control) | (for admission control) | |||
| <-------------------end-to-end QoS signalling protocol-------------> | <-------------------end-to-end QoS signalling protocol-------------> | |||
| Figure 1: Generic Scenario (see text for explanation of terms) | Figure 1: Generic Scenario (see text for explanation of terms) | |||
| An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1) | An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1) | |||
| connect the interior Diffserv region to the edge access networks | connect the interior Diffserv region to the edge access networks | |||
| where routers (not shown) use per-flow reservation processing. | where routers (not shown) use per-flow reservation processing. | |||
| Within the Diffserv region are three interior domains, A, B and C, as | Within the Diffserv region are three interior domains, 'A', 'B' and | |||
| well as the inward facing interfaces of the ingress and egress | 'C', as well as the inward facing interfaces of the ingress and | |||
| gateways. An ingress and egress border router (BR) is shown | egress gateways. An ingress and egress border router (BR) is shown | |||
| interconnecting each interior domain with the next. There may be | interconnecting each interior domain with the next. There will | |||
| other interior routers (not shown) within each interior domain. | typically be other interior routers (not shown) within each interior | |||
| domain. | ||||
| In two paragraphs we now briefly recap how pre-congestion | In two paragraphs we now briefly recap how pre-congestion | |||
| notification is intended to be used to control flow admission to a | notification is intended to be used to control flow admission to a | |||
| large Diffserv region. The first paragraph describes data plane | large Diffserv region. The first paragraph describes data plane | |||
| functions and the second describes signalling in the control plane. | functions and the second describes signalling in the control plane. | |||
| We omit many details from [I-D.ietf-pcn-architecture] including | We omit many details from [I-D.ietf-pcn-architecture] including | |||
| behaviour during routing changes. For brevity here we assume other | behaviour during routing changes. For brevity here we assume other | |||
| flows are already in progress across a path through the Diffserv | flows are already in progress across a path through the Diffserv | |||
| region before a new one arrives, but how bootstrap works is described | region before a new one arrives, but how bootstrap works is described | |||
| in Section 4.3.2. | in Section 4.3.2. | |||
| Figure 1 shows a single simplex reserved flow from the sending (Sx) | Figure 1 shows a single simplex reserved flow from the sending (Sx) | |||
| end host to the receiving (Rx) end host. The ingress gateway polices | end host to the receiving (Rx) end host. The ingress gateway polices | |||
| incoming traffic within its admitted reservation and remarks it to | incoming traffic and colours conforming traffic within an admitted | |||
| turn on an ECN-capable codepoint [RFC3168] and the controlled load | reservation to a combination of Diffserv codepoint and ECN field that | |||
| (CL) Diffserv codepoint. Together, these codepoints define which | defines the traffic as 'PCN-enabled'. This redefines the meaning of | |||
| traffic is entitled to the enhanced scheduling of the CL behaviour | the ECN field as a PCN field, which is largely the same as ECN | |||
| aggregate on routers within the Diffserv region. The CL PHB of | [RFC3168], but with slightly different semantics defined in | |||
| interior routers consists of a scheduling behaviour and a new ECN | [I-D.moncaster-pcn-baseline-encoding] (or various extensions that are | |||
| marking behaviour that we call `pre-congestion notification' [PCN]. | currently experimental). The Diffserv region is called a PCN-region | |||
| The CL PHB simply re-uses the definition of expedited forwarding | because all the queues within it are PCN-enabled. This means the | |||
| (EF) [RFC3246] for its scheduling behaviour. But it incorporates a | per-hop behaviour they apply to PCN-enabled traffic consists of both | |||
| new ECN marking behaviour, which sets the ECN field of an increasing | a scheduling behaviour and a new ECN marking behaviour that we call | |||
| number of CL packets to the admission marked (AM) codepoint as they | `pre-congestion notification' [I-D.eardley-pcn-marking-behaviour]. A | |||
| approach a threshold rate that is lower than the line rate. The use | PCN-enabled queue typically re-uses the definition of expedited | |||
| of virtual queues ensures real queues have hardly built up any | forwarding (EF) [RFC3246] for its scheduling behaviour. The new | |||
| congestion delay. The level of marking detected at the egress of the | congestion marking behaviour sets the PCN field of an increasing | |||
| Diffserv region is then used by the signalling system in order to | proportion of PCN packets to the PCN-marked (PM) codepoint | |||
| determine admission control as follows. | [I-D.moncaster-pcn-baseline-encoding] as their load approaches a | |||
| threshold rate that is lower than the line rate | ||||
| [I-D.eardley-pcn-marking-behaviour]. This can be achieved with an | ||||
| algorithm similar to a token-bucket called a virtual queue. The aim | ||||
| is for a queue to start marking PCN traffic to trigger admission | ||||
| control before the real queue builds up any congestion delay. The | ||||
| level of a queue's pre-congestion marking is detected at the egress | ||||
| of the Diffserv region and used by the signalling system to control | ||||
| admission of further traffic that would otherwise overload that | ||||
| queue, as follows. | ||||
| The end-to-end QoS signalling (e.g. RSVP) for a new reservation | The end-to-end QoS signalling for a new reservation (to be concrete | |||
| takes one giant hop from ingress to egress gateway, because interior | we will use RSVP) takes one giant hop from ingress to egress gateway, | |||
| routers within the Diffserv region are configured to ignore RSVP. | because interior routers within the Diffserv region are configured to | |||
| The egress gateway holds flow state because it takes part in the end- | ignore RSVP. The egress gateway holds flow state because it takes | |||
| to-end reservation. So it can classify all packets by flow and it | part in the end-to-end reservation. So it can classify all packets | |||
| can identify all flows that have the same previous RSVP hop (a CL- | by flow and it can identify all flows that have the same previous | |||
| region-aggregate). For each CL-region-aggregate of flows in | RSVP hop (an ingress-egress-aggregate). For each ingress-egress- | |||
| progress, the egress gateway maintains a per-packet moving average of | aggregate of flows in progress, the egress gateway maintains a per- | |||
| the fraction of pre-congestion-marked traffic. Once an RSVP PATH | packet moving average of the fraction of pre-congestion-marked | |||
| message for a new reservation has hopped across the Diffserv region | traffic. Once an RSVP PATH message for a new reservation has hopped | |||
| and reached the destination, an RSVP RESV message is returned. As | across the Diffserv region and reached the destination, an RSVP RESV | |||
| the RESV message passes, the egress gateway piggy-backs the relevant | message is returned. As the RESV message passes, the egress gateway | |||
| pre-congestion level onto it [RSVP-ECN]. Again, interior routers | piggy-backs the relevant pre-congestion level onto it [RSVP-ECN]. | |||
| ignore the RSVP message, but the ingress gateway strips off the pre- | Again, interior routers ignore the RSVP message, but the ingress | |||
| congestion level. If the pre-congestion level is above a threshold, | gateway strips off the pre-congestion level. If the pre-congestion | |||
| the ingress gateway denies admission to the new reservation, | level is above a threshold, the ingress gateway denies admission to | |||
| otherwise it returns the original RESV signal back towards the data | the new reservation, otherwise it returns the original RESV signal | |||
| sender. | back towards the data sender. | |||
| Once a reservation is admitted, its traffic will always receive low | Once a reservation is admitted, its traffic will always receive low | |||
| delay service for the duration of the reservation. This is because | delay service for the duration of the reservation. This is because | |||
| ingress gateways ensure that traffic not under a reservation cannot | ingress gateways ensure that traffic not under a reservation cannot | |||
| pass into the Diffserv region with the CL DSCP set. So non-reserved | pass into the PCN-region with a Diffserv codepoint that gives it | |||
| traffic will always be treated with a lower priority PHB at each | priority over the capacity used for PCN traffic. | |||
| interior router. And even if some disaster re-routes traffic after | ||||
| it has been admitted, if the traffic through any resource tips over a | Even if some disaster re-routes traffic after it has been admitted, | |||
| fail-safe threshold, pre-congestion notification will trigger flow | if the PCN traffic through any PCN resource tips over a higher, fail- | |||
| pre-emption to very quickly bring every router within the whole | safe threshold, pre-congestion notification can trigger flow | |||
| Diffserv region back below its operating point. | termination to very quickly bring every router within the whole PCN- | |||
| region back below its operating point. The same marking process and | ||||
| ECN codepoint can be used for both admission control and flow | ||||
| termination, by simply triggering them at different fractions of | ||||
| marking [I-D.charny-pcn-single-marking]. However, simulations have | ||||
| confirmed that this approach is not robust in all circumstances that | ||||
| might typically be encountered, so approaches with two thresholds and | ||||
| two congestion encodings are expected to be required in production | ||||
| networks. | ||||
| The whole admission control system just described deliberately | The whole admission control system just described deliberately | |||
| confines per-flow processing to the access edges of the network, | confines per-flow processing to the access edges of the network, | |||
| where it will not limit the system's scalability. But ideally we | where it will not limit the system's scalability. But ideally we | |||
| want to extend this approach to multiple networks, to take even more | want to extend this approach to multiple networks, to take even more | |||
| advantage of its scaling potential. We would still need per-flow | advantage of its scaling potential. We would still need per-flow | |||
| processing at the access edges of each network, but not at the high | processing at the access edges of each network, but not at the high | |||
| speed interfaces where they interconnect. Even though such an | speed interfaces where they interconnect. Even though such an | |||
| admission control system would work technically, it would gain us no | admission control system would work technically, it would gain us no | |||
| scaling advantage if each network also wanted to police the rate of | scaling advantage if each network also wanted to police the rate of | |||
| each admitted flow for itself--border routers would still have to do | each admitted flow for itself--border routers would still have to do | |||
| complex packet operations per-flow anyway, given they don't trust | complex packet operations per-flow anyway, given they don't trust | |||
| upstream networks to do their policing for them. | upstream networks to do their policing for them. | |||
| This memo describes how to emulate per-flow rate policing using bulk | This memo describes how to emulate per-flow rate policing using bulk | |||
| mechanisms at border routers, so the full scalability potential of | mechanisms at border routers. Otherwise the full scalability | |||
| pre-congestion notification is not limited by the need for per-flow | potential of pre-congestion notification would be limited by the need | |||
| policing mechanisms at borders, which would make borders the most | for per-flow policing mechanisms at borders, which would make borders | |||
| cost-critical pinch-points. Then we can achieve the long sought-for | the most cost-critical pinch-points. Instead we can achieve the long | |||
| vision of secure Internet-wide bandwidth reservations without needing | sought-for vision of secure Internet-wide bandwidth reservations | |||
| per-flow processing at all in core and border routers--where | without over-generous provisioning or per-flow processing. We still | |||
| scalability is most critical. | use per-flow processing at the edge routers closest to the end-user, | |||
| but we need no per-flow processing at all in core _or border | ||||
| routers_--where scalability is most critical. | ||||
| 4. Re-ECN Protocol for an RSVP (or similar) Transport | 4. Re-ECN Protocol in IP with Two Congestion Marking Levels | |||
| 4.1. Protocol Overview | 4.1. Protocol Overview | |||
| First we need to recap the way routers accumulate congestion marking | First we need to recap the way routers accumulate PCN congestion | |||
| along a path. Each ECN-capable router marks some packets with CE, | marking along a path (it accumulates the same way as ECN). Each PCN- | |||
| the marking probability increasing with the length of the queue at | capable queue into a link might mark some packets with a PCN-marked | |||
| its egress link. The only difference with pre-congestion | (PM) codepoint, the marking probability increasing with the length of | |||
| marking [PCN] is that marking is based on the length of a virtual | the queue [I-D.eardley-pcn-marking-behaviour]. With a series of PCN- | |||
| queue, so that the real queue occupancy can remain very low. We will | capable routers on a path, a stream of packets accumulates the | |||
| use the terms congestion and pre-congestion interchangeably in the | fraction of PCN markings that each queue adds. The combined effect | |||
| following unless it is important to distinguish between them. | of the packet marking of all the queues along the path signals | |||
| congestion of the whole path to the receiver. So, for example, if | ||||
| one queue early in a path is marking 1% of packets and another later | ||||
| in a path is marking 2%, flows that pass through both queues will | ||||
| experience approximately 3% marking over a sequence of packets. | ||||
| With multiple ECN-capable routers on a path, the ECN field | (Note: Whenever the word 'congestion' is used in this document it | |||
| accumulates the fraction of CE marking that each router adds. The | should be taken to mean congestion of the virtual resource assigned | |||
| combined effect of the packet marking of all the routers along the | for use by PCN-traffic. This avoids cumbersome repetition of the | |||
| path signals congestion of the whole path to the receiver. So, for | strictly correct term 'pre-congestion'.) | |||
| example, if one router early in a path is marking 1% of packets and | ||||
| another later in a path is marking 2%, flows that pass through both | ||||
| routers will experience approximately 3% marking. | ||||
| The packets crossing an inter-domain trust boundary within the | The packets crossing an inter-domain trust boundary within the PCN- | |||
| Diffserv region will all have come from different ingress gateways | region will all have come from different ingress gateways and will | |||
| and will all be destined for different egress gateways. We will show | all be destined for different egress gateways. We will show that the | |||
| that the key to policing against theft of service is for a border | key to policing against theft of service is for a border router to be | |||
| router to be able to directly measure the congestion that is about to | able to directly measure the congestion that is about to be caused by | |||
| be caused by the traffic it forwards. That is, it can measure | the packets it forwards into any of the downstream paths between | |||
| locally the congestion on each of the downstream paths between itself | itself and the egress gateways that each packet is destined for. The | |||
| and the egress gateways that its traffic is destined for. | purpose of the re-PCN protocol is to make packets automatically carry | |||
| this information, which then merely needs to be counted locally at | ||||
| the border. | ||||
| With the original ECN protocol, if CE markings crossing the border | With the original PCN protocol, if a border router (e.g. that between | |||
| had been counted over a period, they would have represented the | domains 'A' & 'B' in Figure 2) counts PCN markings crossing the border | |||
| accumulated upstream congestion that had already been experienced by | over a period, they represent the accumulated congestion that has | |||
| those packets. The general idea of re-ECN is for the ingress gateway | already been experienced by those packets (congestion upstream of the | |||
| to continuously encode path congestion into the IP header where, in | border, u). The idea of re-PCN is to make the ingress gateway | |||
| this case, `path' means from ingress to egress gateway. Then at any | continuously encode the path congestion it knows into a new field in | |||
| point on that path (e.g. between domains A & B in Figure 2 below), IP | the IP header (in this case, `path' means the path from the ingress | |||
| headers can be monitored to subtract upstream congestion from | to the egress gateway). This new field is _not_ altered by queues | |||
| expected path congestion in order to give the expected downstream | along the path. Then at any point on that path (e.g. between domains | |||
| congestion still to be experienced until the egress gateway. | 'A' & 'B'), IP headers can be monitored to measure both expected path | |||
| congestion, p, and upstream congestion, u. Then congestion expected | ||||
| downstream of the border, v, can be derived simply by subtracting | ||||
| upstream congestion from expected path congestion. That is v ~= p - | ||||
| u. | ||||
| Importantly, it turns out that there is no need to monitor downstream | Importantly, it turns out that there is no need to monitor downstream | |||
| congestion on a per-flow basis. We will show that accounting for it | congestion on a per-flow, per-path or per-aggregate basis. We will | |||
| in bulk across all flows will be sufficient. | show that accounting for it in bulk by counting the volume of all | |||
| marked packets will be sufficient. | ||||
| _____________________________________ | _____________________________________ | |||
| _|__ ______ ______ ______ _|__ | _|__ ______ ______ ______ _|__ | |||
| | | | A | | B | | C | | | | | | | A | | B | | C | | | | |||
| +----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+ | +----+ +-+ +-+ +-+ +-+ +-+ +-+ +----+ | |||
| | | |B| |B| |B| |B| |B| |B| | | | | | |B| |B| |B| |B| |B| |B| | | | |||
| |Ingr|==|R| |R|==|R| |R|==|R| |R|==|Egr | | |Ingr|==|R| |R|==|R| |R|==|R| |R|==|Egr | | |||
| |G/W | | | | |: | | | | | | | | |G/W | | |G/W | | | | |: | | | | | | | | |G/W | | |||
| +----+ +-+ +-+: +-+ +-+ +-+ +-+ +----+ | +----+ +-+ +-+: +-+ +-+ +-+ +-+ +----+ | |||
| | | | |: | | | | | | | | | | |: | | | | | | | |||
| skipping to change at page 16, line 26 | skipping to change at page 18, line 34 | |||
| : | : | |||
| | : | | | : | | |||
| |<-upstream-->:<-expected downstream->| | |<-upstream-->:<-expected downstream->| | |||
| | congestion : congestion | | | congestion : congestion | | |||
| | u v ~= p - u | | | u v ~= p - u | | |||
| | | | | | | |||
| |<--- expected path congestion, p --->| | |<--- expected path congestion, p --->| | |||
| Figure 2: Re-ECN concept | Figure 2: Re-ECN concept | |||
| 4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | 4.2. Re-PCN Abstracted Network Layer Wire Protocol (IPv4 or v6) | |||
| In this section we define the names of the various codepoints of the | In this section we define the names of the various codepoints of the | |||
| re-ECN protocol when used with pre-congestion notification, deferring | extended ECN field when used with pre-congestion notification, | |||
| description of their semantics to the following sections. But first | deferring description of their semantics to the following sections. | |||
| we recap the re-ECN wire protocol proposed in [Re-TCP]. | But first we recap the re-ECN wire protocol proposed in | |||
| [I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
| 4.2.1. Re-ECN Recap | 4.2.1. Re-ECN Recap | |||
| Re-ECN uses the two bit ECN field broadly as in RFC3168 [RFC3168]. | Re-ECN uses the two bit ECN field broadly as in RFC3168 [RFC3168]. | |||
| It also uses a new re-ECN extension (RE) flag. The actual position | It also uses a new re-ECN extension (RE) flag. The actual position | |||
| of the RE flag is different between IPv4 & v6 headers so we will use | of the RE flag is different between IPv4 & v6 headers so we will use | |||
| an abstraction of the IPv4 and v6 wire protocols by just calling it | an abstraction of the IPv4 and v6 wire protocols by just calling it | |||
| the RE flag. [Re-TCP] proposes using bit 48 (currently unused) in | the RE flag. [I-D.briscoe-tsvwg-re-ecn-tcp] proposes using bit 48 | |||
| the IPv4 header for the RE flag, while for IPv6 it proposes an ECN | (currently unused) in the IPv4 header for the RE flag, while for IPv6 | |||
| extension header. | it proposes a congestion extension header. | |||
| Unlike the ECN field, the RE flag is intended to be set by the sender | Unlike the ECN field, the RE flag is intended to be set by the sender | |||
| and remain unchanged along the path, although it can be read by | and remain unchanged along the path, although it can be read by | |||
| network elements that understand the re-ECN protocol. In the | network elements that understand the re-ECN protocol. In the | |||
| scenario used in this memo, the ingress gateway acts as a proxy for | scenario used in this memo, the ingress gateway is the 'sender' as | |||
| the sender, setting the RE flag as permitted in the specification of | far as the scope of the PCN region is concerned, so it sets the RE | |||
| re-ECN. | flag (as permitted for sender proxies in the specification of re- | |||
| ECN). | ||||
| Note that general-purpose routers do not have to read the RE flag, | Note that general-purpose routers do not have to read the RE flag, | |||
| only special policing elements at borders do. And no general-purpose | only special policing elements at borders do. And no general-purpose | |||
| routers have to change the RE flag, although the ingress and egress | routers have to change the RE flag, although the ingress and egress | |||
| gateways do because in the edge-to-edge deployment model we are | gateways do because in the edge-to-edge deployment model we are | |||
| using, they act as proxies for the endpoints. Therefore the RE flag | using, they act as the endpoints of the PCN region. Therefore the RE | |||
| does not even have to be visible to interior routers. So the RE flag | flag does not even have to be visible to interior routers. So the RE | |||
| has no implications on protocols like MPLS. Congested label | flag has no implications on protocols like MPLS. Congested label | |||
| switching routers (LSRs) would have to be able to notify their | switching routers (LSRs) would have to be able to notify their | |||
| congestion with an ECN/PCN codepoint in the MPLS shim [RFC5129], but | congestion with an ECN/PCN codepoint in the MPLS shim [RFC5129], but | |||
| like any interior IP router, they can be oblivious to the RE flag, | like any interior IP router, they can be oblivious to the RE flag, | |||
| which need only be read by border policing functions. | which need only be read by border policing functions. | |||
| Although the RE flag is a separate, single bit field, it can be read | Although the RE flag is a separate single bit field, it can be read | |||
| as an extension to the two-bit ECN field; the three concatenated bits | as an extension to the two-bit ECN field; the three concatenated bits | |||
| in what we will call the extended ECN field (EECN) make eight | in what we will call the extended ECN field (EECN) make eight | |||
| codepoints available. When the RE flag setting is "don't care", we | codepoints available. When the RE flag setting is "don't care", we | |||
| use the RFC3168 names of the ECN codepoints, but [Re-TCP] proposes | use the RFC3168 names of the ECN codepoints, but | |||
| the following six codepoint names for when there is a need to be more | [I-D.briscoe-tsvwg-re-ecn-tcp] proposes the following six codepoint | |||
| specific. | names for when there is a need to be more specific. | |||
| +--------+-------------+-------+-------------+----------------------+ | +--------+-------------+-------+-------------+----------------------+ | |||
| | ECN | RFC3168 | RE | Extended | Re-ECN meaning | | | ECN | RFC3168 | RE | Extended | Re-ECN meaning | | |||
| | field | codepoint | flag | ECN | | | | field | codepoint | flag | ECN | | | |||
| | | | | codepoint | | | | | | | codepoint | | | |||
| +--------+-------------+-------+-------------+----------------------+ | +--------+-------------+-------+-------------+----------------------+ | |||
| | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | |||
| | | | | | transport | | | | | | | transport | | |||
| | 00 | Not-ECT | 1 | FNE | Feedback not | | | 00 | Not-ECT | 1 | FNE | Feedback not | | |||
| | | | | | established | | | | | | | established | | |||
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | ||||
| | | | | | and RECT | | ||||
| | 01 | ECT(1) | 1 | RECT | Re-ECN capable | | ||||
| | | | | | transport | | ||||
| | 10 | ECT(0) | 0 | --- | Legacy ECN use | | | 10 | ECT(0) | 0 | --- | Legacy ECN use | | |||
| | | | | | only | | | | | | | only | | |||
| | 10 | ECT(0) | 1 | --CU-- | Currently unused | | | 10 | ECT(0) | 1 | --CU-- | Currently unused | | |||
| | | | | | | | | | | | | | | |||
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | ||||
| | | | | | and RECT | | ||||
| | 01 | ECT(1) | 1 | RECT | Re-ECN capable | | ||||
| | | | | | transport | | ||||
| | 11 | CE | 0 | CE(0) | Congestion | | | 11 | CE | 0 | CE(0) | Congestion | | |||
| | | | | | experienced with | | | | | | | experienced with | | |||
| | | | | | Re-Echo | | | | | | | Re-Echo | | |||
| | 11 | CE | 1 | CE(-1) | Congestion | | | 11 | CE | 1 | CE(-1) | Congestion | | |||
| | | | | | experienced | | | | | | | experienced | | |||
| +--------+-------------+-------+-------------+----------------------+ | +--------+-------------+-------+-------------+----------------------+ | |||
| Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re- | Table 1: Re-cap of Default Extended ECN Codepoints Proposed for Re- | |||
| ECN | ECN | |||
| 4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN) | 4.2.2. Re-ECN Combined with Pre-Congestion Notification (re-PCN) | |||
| As permitted by the ECN specification [RFC3168], a proposal is | As permitted by the ECN specification [RFC3168] and by the guidelines | |||
| currently being advanced in the IETF to define different semantics | for specifying alternative semantics for the ECN field [RFC4774], a | |||
| for how routers might mark the ECN field of certain packets. The | proposal is currently being advanced in the IETF to define different | |||
| idea is to be able to notify congestion when the router's load | semantics for how queues might mark the ECN field of certain packets. | |||
| The idea is to be able to notify congestion when the queue's load | ||||
| approaches a logical limit, rather than the physical limit of the | approaches a logical limit, rather than the physical limit of the | |||
| line. This new marking is called pre-congestion notification [PCN] | line. This new marking is called pre-congestion | |||
| and we will use the term PCN-enabled router for a router that can | notification [I-D.eardley-pcn-marking-behaviour] and we will use the | |||
| apply pre-congestion notification marking to the ECN fields of | term PCN-enabled queue for a queue that can apply pre-congestion | |||
| packets. | notification marking to the ECN fields of packets. | |||
| [RFC3168] recommends that a packet's Diffserv codepoint should | [RFC3168] recommends that a packet's Diffserv codepoint should | |||
| determine which type of ECN marking it receives. A Diffserv per-hop | determine which type of ECN marking it receives. A PCN-capable | |||
| behaviour (PHB) can specify that routers should apply pre-congestion | packet must meet two conditions; it must carry a DSCP that has been | |||
| notification marking to PCN-capable packets. We will call this a | associated with PCN marking and it must carry an ECN field that turns | |||
| PCN-enhanced PHB. A PCN-capable packet must meet two conditions, it | on PCN marking. | |||
| must carry a DSCP that maps to a PCN-enhanced PHB and it must carry | ||||
| an ECN field that turns on PCN marking. | ||||
| As an example, the controlled load (CL) PHB might specify expedited | As an example, a packet carrying the VOICE-ADMIT | |||
| forwarding as its scheduling behaviour and PCN marking as its | [I-D.ietf-tsvwg-admitted-realtime-dscp] DSCP would be associated with | |||
| congestion marking behaviour. Then we would say the CL PHB is a PCN- | expedited forwarding [RFC3246] as its scheduling behaviour and pre- | |||
| enhanced PHB, and that packets with a DSCP that maps to the CL PHB | congestion notification as its congestion marking behaviour. PCN | |||
| and with ECN turned on are PCN-capable packets. | would only be turned on within a PCN-region by an ECN codepoint other | |||
| than Not-ECT (00). Then we would describe packets with the VOICE- | ||||
| ADMIT DSCP and with ECN turned on as PCN-capable packets. | ||||
| [PCN] actually proposes that two logical limits should be used for | [I-D.eardley-pcn-marking-behaviour] actually proposes that two | |||
| pre-congestion notification, with the higher limit as a back-stop for | logical limits can be used for pre-congestion notification, with the | |||
| dealing with anomalous events. It envisages PCN will be used to | higher limit as a back-stop for dealing with anomalous events. It | |||
| admission control inelastic real-time traffic, so marking at the | envisages PCN will be used to admission control inelastic real-time | |||
| lower limit will trigger admission control, while at the higher limit | traffic, so marking at the lower limit will trigger admission | |||
| it will trigger flow pre-emption. | control, while at the higher limit it will trigger flow termination. | |||
| Because it needs two types of congestion marking, PCN seems to need | Because it needs two types of congestion marking, PCN needs four | |||
| five states: Not-ECT, ECT (ECN-capable transport), the ECN Nonce, | states: Not PCN-capable (Not-PCN), PCN-capable but not PCN-marked | |||
| Admission Marking (AM) and Flow Pre-emption Marking (PM). [PCN] | (NM), Admission Marked (AM) and Flow Termination Marked (TM). A | |||
| proposes various alternative encodings of the ECN field, attempting | proposed encoding of the four required PCN states is shown on the | |||
| various compromises to fit these five states into the four available | left of Table 2. Note that these codepoints of the ECN field only | |||
| ECN codepoints. | take on the semantics of pre-congestion notification if they are | |||
| combined with a Diffserv codepoint that the operator has configured | ||||
| to be associated with PCN marking. | ||||
| One of the five states to make room for is the ECN Nonce [RFC3540], | This encoding only correctly traverses an IP in IP tunnel if the | |||
| but the capability we describe in this memo supersedes any need for | ideal decapsulation rules in [I-D.briscoe-tsvwg-ecn-tunnel] are | |||
| the Nonce. The ECN Nonce is an elegant scheme, but it only allows a | followed when combining the ECN fields of the outer and inner | |||
| sending node (or its proxy) to detect suppression of congestion | headers. If instead the decapsulation rules in [RFC3168] or | |||
| marking in the feedback loop. Thus the Nonce requires the sender or | [RFC4301] are followed, any admission marking applied to an outer | |||
| its proxy to be trusted to respond correctly to congestion. But this | header will be incorrectly removed on decapsulation at the tunnel | |||
| is precisely the main cheat we want to protect against (as well as | egress. | |||
| many others). | ||||
| One of the compromise protocol encodings that [PCN] explores | The RFC3168 ECN field includes space for the experimental ECN | |||
| ("Alternative 5") leaves out support for the ECN Nonce. Therefore we | Nonce [RFC3540], which seems to require a fifth state if it is also | |||
| use that one. This encoding of PCN markings is shown on the left of | needed with re-PCN. But re-PCN supersedes any need for the Nonce | |||
| Table 2. Note that these codepoints of the ECN field only take on | within the PCN-region. The ECN Nonce is an elegant scheme, but it | |||
| the semantics of pre-congestion notification if they are combined | only allows a sending node (or its proxy) to detect suppression of | |||
| with a Diffserv codepoint that the operator has configured to cause | congestion marking in the feedback loop. Thus the Nonce requires the | |||
| PCN marking, by mapping it to a PCN-enhanced PHB. | sender (or in our case the PCN ingress) to be trusted to respond | |||
| correctly to congestion. But this is precisely the main cheat we | ||||
| want to protect against (as well as many others). Also, the ECN | ||||
| nonce only works once the receiver has placed packets in the same | ||||
| order as they left the ingress, which cannot be done by an edge node | ||||
| without adding unnecessary edge-edge packet ordering. Nonetheless, | ||||
| if the ECN nonce were in use outside the PCN region (end-to-end), the | ||||
| ingress would have to tunnel the arriving IP header across the PCN | ||||
| region ([I-D.ietf-pcn-architecture]). | ||||
| For the rest of this memo, we will not distinguish between Admission | For the rest of this memo, we will refer to both Admission Marking | |||
| Marking and Pre-emption Marking unless we need to be specific. We | and Termination Marking as "congestion marking" or "PCN | |||
| will call both "congestion marking". With the above encoding, | marking" unless we need to be specific. With the above encoding, | |||
| congestion marking can be read to mean any packet with the left-most | congestion marking can be read to mean any packet with the right-most | |||
| bit of the ECN field set. | bit of the ECN field set. | |||
| The re-ECN protocol can be used to control misbehaving sources | The re-ECN protocol can be used to control misbehaving sources | |||
| whether congestion is with respect to a logical threshold (PCN) or | whether congestion is with respect to a logical threshold (PCN) or | |||
| the physical line rate (ECN). In either case the RE flag can be used | the physical line rate (ECN). In either case the RE flag can be used | |||
| to create an extended ECN field. For PCN-capable packets, the 8 | to create an extended ECN field. For PCN-capable packets, the 8 | |||
| possible encodings of this 3-bit extended ECN (EECN) field are | possible encodings of this 3-bit extended PCN (EPCN) field are | |||
| defined on the right of Table 2 below. The purposes of these | defined on the right of Table 2 below. The purposes of these | |||
| different codepoints will be introduced in subsequent sections. | different codepoints will be introduced in subsequent sections. | |||
| +-------+-----------------+------+--------------+-------------------+ | +--------+-----------+-------+-----------------+--------------------+ | |||
| | ECN | PCN codepoint | RE | Extended ECN | Re-ECN meaning | | | ECN | PCN | RE | Extended PCN | Re-PCN meaning | | |||
| | field | (Alternative 5) | flag | codepoint | | | | field | codepoint | flag | codepoint | | | |||
| +-------+-----------------+------+--------------+-------------------+ | +--------+-----------+-------+-----------------+--------------------+ | |||
| | 00 | Not-ECT | 0 | Not-RECT | Not | | | 00 | Not-PCN | 0 | Not-PCN | Not PCN-capable | | |||
| | | | | | re-ECN-capable | | ||||
| | | | | | transport | | | | | | | transport | | |||
| | 00 | Not-ECT | 1 | FNE | Feedback not | | | 00 | Not-PCN | 1 | FNE | Feedback not | | |||
| | | | | | established | | | | | | | established | | |||
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed | | | 10 | NM | 0 | Re-PCT-Echo | Re-echoed | | |||
| | | | | | congestion and | | | | | | | congestion and | | |||
| | | | | | RECT | | | | | | | Re-PCT | | |||
| | 01 | ECT(1) | 1 | RECT | Re-ECN capable | | | 10 | NM | 1 | Re-PCT | Re-PCN capable | | |||
| | | | | | transport | | | | | | | transport | | |||
| | 10 | AM | 0 | AM(0) | Admission Marking | | | 01 | AM | 0 | AM(0) | Admission Marking | | |||
| | | | | | with Re-Echo | | | | | | | with Re-Echo | | |||
| | 10 | AM | 1 | AM(-1) | Admission Marking | | | 01 | AM | 1 | AM(-1) | Admission Marking | | |||
| | | | | | | | | | | | | | | |||
| | 11 | PM | 0 | PM(0) | Pre-emption | | | 11 | TM | 0 | TM(0) | Termination | | |||
| | | | | | Marking with | | | | | | | Marking with | | |||
| | | | | | Re-Echo | | | | | | | Re-Echo | | |||
| | 11 | PM | 1 | PM(-1) | Pre-emption | | | 11 | TM | 1 | TM(-1) | Termination | | |||
| | | | | | Marking | | | | | | | Marking | | |||
| +-------+-----------------+------+--------------+-------------------+ | +--------+-----------+-------+-----------------+--------------------+ | |||
| Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre- | Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre- | |||
| congestion Notification (PCN) | congestion Notification (PCN) | |||
| Note that Table 2 shows re-PCN uses ECT(0) but Table 1 shows re-ECN | ||||
| uses ECT(1) for the unmarked state. The difference is intended-- | ||||
| although it makes it harder to remember the two schemes, it makes | ||||
| them both safer during incremental deployment. | ||||
| 4.3. Protocol Operation | 4.3. Protocol Operation | |||
| 4.3.1. Protocol Operation for an Established Flow | 4.3.1. Protocol Operation for an Established Flow | |||
| The re-ECN protocol involves a simple tweak to the action of the | The re-PCN protocol involves a simple addition to the action of the | |||
| gateway at the ingress edge of the CL region. In the deployment | gateway at the ingress edge of the PCN region (the PCN-ingress-node). | |||
| model just described [I-D.ietf-pcn-architecture], for each active | But first we will recap how PCN works without the addition. For each | |||
| traffic aggregate across the CL region (CL-region-aggregate) the | active traffic aggregate across a PCN region (ingress-egress- | |||
| ingress gateway will hold a fairly recent Congestion-Level-Estimate | aggregate) the egress gateway measures the level of PCN marking and | |||
| that the egress gateway will have fed back to it, piggybacked on the | feeds it back to the ingress piggy-backed as 'PCN-feedback- | |||
| signalling that sets up each flow. For instance, one aggregate might | information' on any control signal passing between the nodes (e.g. | |||
| have been experiencing 3% pre-congestion (that is, congestion marked | every flow set-up, refresh or tear-down). Therefore the ingress | |||
| octets whether Admission Marked or Pre-emption Marked). In this | gateway will always hold a fairly recent (typically at most 30 sec) | |||
| case, the ingress gateway MUST clear the RE flag to "0" for the same | estimate of the ingress-egress-aggregate congestion level. For | |||
| percentage of octets of CL-packets (3%) and set it to "1" in the rest | instance, one aggregate might have been experiencing 3% pre- | |||
| congestion (that is, congestion marked octets whether Admission | ||||
| Marked or Termination Marked). | ||||
| To comply with the re-PCN protocol, for all PCN packets in each | ||||
| ingress-egress-aggregate the ingress gateway MUST clear the RE flag | ||||
| to "0" for the same percentage of octets as its current estimate of | ||||
| congestion on the aggregate (e.g. 3%) and set it to "1" in the rest | ||||
| (97%). Appendix A.1 gives a simple pseudo-code algorithm that the | (97%). Appendix A.1 gives a simple pseudo-code algorithm that the | |||
| ingress gateway may use to do this. | ingress gateway may use to do this. | |||
| The RE flag is set and cleared this way round for incremental | The RE flag is set and cleared this way round for incremental | |||
| deployment reasons (see [Re-TCP]). To avoid confusion we will use | deployment reasons (see Section 7). To avoid confusion we will use | |||
| the term `blanking' (rather than marking) when the RE flag is cleared | the term `blanking' (rather than marking) when the RE flag is cleared | |||
| to "0", so we will talk of the `RE blanking fraction' as the fraction | to "0", so we will talk of the `RE blanking fraction' as the fraction | |||
| of octets with the RE flag cleared to "0". | of octets with the RE flag cleared to "0". | |||
| ^ | ^ | |||
| | | | | |||
| | RE blanking fraction | | RE blanking fraction | |||
| 3% | +----------------------------+====+ | 3% | +----------------------------+====+ | |||
| | | | | | | | | | | |||
| 2% | | | | | 2% | | | | | |||
| skipping to change at page 20, line 51 | skipping to change at page 24, line 6 | |||
| | ^ ^ | | | ^ ^ | | |||
| ingress | | egress | ingress | | egress | |||
| 1.00% 2.00% marking fraction | 1.00% 2.00% marking fraction | |||
| Figure 3: Example Extended ECN codepoint Marking fractions | Figure 3: Example Extended ECN codepoint Marking fractions | |||
| (Imprecise) | (Imprecise) | |||
| Figure 3 illustrates our example. The horizontal axis represents the | Figure 3 illustrates our example. The horizontal axis represents the | |||
| index of each congestible resource (typically queues) along a path | index of each congestible resource (typically queues) along a path | |||
| through the Internet. The two superimposed plots show the fraction | through the Internet. The two superimposed plots show the fraction | |||
| of each ECN codepoint observed along this path, assuming there are | of each extended PCN codepoint observed along this path, assuming | |||
| two congested routers somewhere within domains A and C. And Table 3 | there are two congested routers somewhere within domains A and C. And | |||
| below shows the downstream pre-congestion measured at various border | Table 3 below shows the downstream pre-congestion measured at various | |||
| observation points along the path. Figure 4 (later) shows the same | border observation points along the path. Figure 4 (later) shows the | |||
| results of these subtractions, but in graphical form like the above | same results of these subtractions, but in graphical form like the | |||
| figure. The tabulated figures are actually reasonable approximations | above figure. The tabulated figures are actually reasonable | |||
| derived from more precise formulae given in Appendix A of [Re-TCP]. | approximations derived from more precise formulae given in Appendix A | |||
| The RE flag is not changed by interior routers, so it can be seen | of [I-D.briscoe-tsvwg-re-ecn-tcp]. The RE flag is not changed by | |||
| that it acts as a reference against which the congestion marking | interior routers, so it can be seen that it acts as a reference | |||
| fraction can be compared along the path. | against which the congestion marking fraction can be compared along | |||
| the path. | ||||
| +--------------------------+---------------------------------------+ | +--------------------------+---------------------------------------+ | |||
| | Border observation point | Approximate Downstream pre-congestion | | | Border observation point | Approximate Downstream pre-congestion | | |||
| +--------------------------+---------------------------------------+ | +--------------------------+---------------------------------------+ | |||
| | ingress -- A | 3% - 0% = 3% | | | ingress -- A | 3% - 0% = 3% | | |||
| | A -- B | 3% - 1% = 2% | | | A -- B | 3% - 1% = 2% | | |||
| | B -- C | 3% - 1% = 2% | | | B -- C | 3% - 1% = 2% | | |||
| | C -- egress | 3% - 3% = 0% | | | C -- egress | 3% - 3% = 0% | | |||
| +--------------------------+---------------------------------------+ | +--------------------------+---------------------------------------+ | |||
| skipping to change at page 21, line 36 | skipping to change at page 24, line 40 | |||
| aggregate using the most recent feedback from the relevant egress, | aggregate using the most recent feedback from the relevant egress, | |||
| arriving with each new reservation, or each refresh. These updates | arriving with each new reservation, or each refresh. These updates | |||
| arrive relatively infrequently compared to the speed with which | arrive relatively infrequently compared to the speed with which | |||
| congestion changes. Although this feedback will always be out of | congestion changes. Although this feedback will always be out of | |||
| date, on average positive errors should cancel out negative over a | date, on average positive errors should cancel out negative over a | |||
| sufficiently long duration. | sufficiently long duration. | |||
| In summary, the network adds pre-congestion marking in the forward | In summary, the network adds pre-congestion marking in the forward | |||
| data path, the egress feeds its level back to the ingress in RSVP (or | data path, the egress feeds its level back to the ingress in RSVP (or | |||
| similar signalling), then the ingress gateway re-echoes it into the | similar signalling), then the ingress gateway re-echoes it into the | |||
| forward data path by blanking the RE flag. Hence the name re-ECN. | forward data path by blanking the RE flag. Then at any border within | |||
| Then at any border within the Diffserv region, the pre-congestion | the PCN-region, the pre-congestion marking that every passing packet | |||
| marking that every passing packet will be expected to experience | will be expected to experience downstream can be measured to be the | |||
| downstream can be measured to be the RE blanking fraction minus the | RE blanking fraction minus the congestion marking fraction. | |||
| congestion marking fraction. | ||||
| 4.3.2. Aggregate Bootstrap | 4.3.2. Aggregate Bootstrap | |||
| When a new reservation PATH message arrives at the egress, if there | When a new reservation PATH message arrives at the egress, if there | |||
| are currently no flows in progress from the same ingress, there will | are currently no flows in progress from the same ingress, there will | |||
| be no state maintaining the current level of pre-congestion marking | be no state maintaining the current level of pre-congestion marking | |||
| for the aggregate. While the reservation signalling continues onward | for the aggregate. In the case of RSVP reservation signalling, while | |||
| towards the receiving host, the egress gateway returns an RSVP | the signal continues onward towards the receiving host, the egress | |||
| message to the ingress with a flag [RSVP-ECN] asking the ingress to | gateway can return an RSVP message to the ingress with a | |||
| send a specified number of data probes between them. This bootstrap | flag [RSVP-ECN] asking the ingress to send a specified number of data | |||
| behaviour is all described in the deployment | probes between them. The more general possibilities for bootstrap | |||
| model [I-D.ietf-pcn-architecture]. | behaviour are described in the PCN | |||
| architecture [I-D.ietf-pcn-architecture], including using the | ||||
| reservation signal itself as a probe. | ||||
| However, with our new re-ECN scheme, the ingress does not know what | However, with our new re-PCN scheme, the ingress does not know what | |||
| proportion of the data probes should have the RE flag blanked, | proportion of the data probes should have the RE flag blanked, | |||
| because it has no estimate yet of pre-congestion for the path across | because it has no estimate yet of pre-congestion for the path across | |||
| the Diffserv region. | the PCN-region. | |||
| To be conservative, following the guidance for specifying other re- | To be conservative, following the guidance for specifying other re- | |||
| ECN transports in [Re-TCP], the ingress SHOULD set the FNE codepoint | ECN transports in [I-D.briscoe-tsvwg-re-ecn-tcp], the ingress SHOULD | |||
| of the extended ECN header in all probe packets (Table 2). As per | set the FNE codepoint of the extended PCN header in all probe packets | |||
| the deployment model, the egress gateway measures the fraction of | (Table 2). As per the PCN deployment model, the egress gateway | |||
| congestion-marked probe octets and feeds back the resulting pre- | measures the fraction of congestion-marked probe octets and feeds | |||
| congestion level to the ingress, piggy-backed on the returning | back the resulting pre-congestion level to the ingress, piggy-backed | |||
| reservation response (RESV) for the new flow. Probe packets are | on the returning reservation response (RESV) for the new flow. Probe | |||
| identifiable by the egress because they have the ingress as the | packets are identifiable by the egress because they carry the FNE | |||
| source and the egress as the destination in the IP header. | codepoint. | |||
| It may seem inadvisable to expect the FNE codepoint to be set on | It may seem inadvisable to expect the FNE codepoint to be set on | |||
| probes, given legacy firewalls etc. might discard such packets | probes, given legacy firewalls etc. might discard such packets | |||
| (because this flag had no previous legitimate use). However, in the | (because this flag had no previous legitimate use). However, in the | |||
| deployment scenarios envisaged, each domain in the Diffserv region | deployment scenarios envisaged, each domain in the PCN-region has to | |||
| has to be explicitly configured to support the controlled load | be explicitly configured to support the admission controlled service. | |||
| service. So, before deploying the service, the operator MUST | So, before deploying the service, the operator MUST reconfigure such | |||
| reconfigure such a misbehaving middlebox to allow through packets | a badly implemented middlebox to allow through packets with the RE | |||
| with the RE flag set. | flag set. | |||
| Note that we have said SHOULD rather than MUST for the FNE setting | Note that we have said SHOULD rather than MUST for the FNE setting | |||
| behaviour of the ingress for probe packets. This entertains the | behaviour of the ingress for probe packets. This entertains the | |||
| possibility of an ingress implementation having the benefit of other | possibility of an ingress implementation having the benefit of other | |||
| knowledge of the path, which it re-uses for a newly starting | knowledge of the path, which it re-uses for a newly starting | |||
| aggregate. For instance, it may hold cached information from a | aggregate. For instance, it may hold cached information from a | |||
| recent use of the aggregate that is still sufficiently current to be | recent use of the aggregate that is still sufficiently current to be | |||
| useful. | useful. If not all probe packets are set to FNE, the ingress will | |||
| | have to ensure probe packets are identifiable by some other means, | |||
| | perhaps by using the egress as the destination address. | |||
| It might seem pedantic worrying about these few probe packets, but | It might seem pedantic worrying about these few probe packets, but | |||
| this behaviour ensures the system is safe, even if the proportion of | this behaviour ensures the system is safe, even if the proportion of | |||
| probe packets becomes large. | probe packets becomes large. | |||
| 4.3.3. Flow Bootstrap | 4.3.3. Flow Bootstrap | |||
| It might be expected that a new flow within an active aggregate would | It might be expected that a new flow within an active aggregate would | |||
| need no special bootstrap behaviour. If there was an aggregate | need no special bootstrap behaviour. If there was an aggregate | |||
| already in progress between the gateways the new flow was about to | already in progress between the gateways the new flow was about to | |||
| skipping to change at page 23, line 21 | skipping to change at page 26, line 32 | |||
| that sanctions may be too strict at the interface before the egress | that sanctions may be too strict at the interface before the egress | |||
| gateway. It will often be possible to apply sanctions at the | gateway. It will often be possible to apply sanctions at the | |||
| granularity of aggregates rather than flows, but in an internetworked | granularity of aggregates rather than flows, but in an internetworked | |||
| environment it cannot be guaranteed that aggregates will be | environment it cannot be guaranteed that aggregates will be | |||
| identifiable in remote networks. So setting FNE at the start of each | identifiable in remote networks. So setting FNE at the start of each | |||
| flow is a safe strategy. For instance, a remote network may have | flow is a safe strategy. For instance, a remote network may have | |||
| equal cost multi-path (ECMP) routing enabled, causing different flows | equal cost multi-path (ECMP) routing enabled, causing different flows | |||
| between the same gateways to traverse different paths. | between the same gateways to traverse different paths. | |||
| After an idle period of more than 1 second, the ingress gateway | After an idle period of more than 1 second, the ingress gateway | |||
| SHOULD set the EECN field of the next packet it sends to FNE. This | SHOULD set the EPCN field of the next packet it sends to FNE. This | |||
| allows the design of network policers to be deterministic (see | allows the design of network policers to be deterministic (see | |||
| [Re-TCP]). | [I-D.briscoe-tsvwg-re-ecn-tcp]). | |||
| However, if the ingress gateway can guarantee that the network(s) | However, if the ingress gateway can guarantee that the network(s) | |||
| that will carry the flow to its egress gateway all use a common | that will carry the flow to its egress gateway all use a common | |||
| identifier for the aggregate (e.g. a single MPLS network without ECMP | identifier for the aggregate (e.g. a single MPLS network without ECMP | |||
| routing), it MAY NOT set FNE when it adds a new flow to an active | routing), it MAY NOT set FNE when it adds a new flow to an active | |||
| aggregate. And an FNE packet need only be sent if a whole aggregate | aggregate. And an FNE packet need only be sent if a whole aggregate | |||
| has been idle for more than 1 second. | has been idle for more than 1 second. | |||
| 4.3.4. Router Forwarding Behaviour | 4.3.4. Router Forwarding Behaviour | |||
| Adding re-ECN works well without modifying the forwarding behaviour | Adding re-PCN works well with the regular PCN forwarding behaviour of | |||
| of any routers. However, below, two changes are proposed when | interior queues. However, below, two optional changes are proposed | |||
| forwarding packets with a per-hop-behaviour that requires pre- | when forwarding packets with a per-hop-behaviour that requires pre- | |||
| congestion notification: | congestion notification: | |||
| Preferential drop: When a router cannot avoid dropping ECN-capable | Preferential drop: When a router cannot avoid dropping PCN-capable | |||
| packets, preferential dropping of packets with different extended | packets, preferential dropping of packets with different extended | |||
| ECN codepoints SHOULD be implemented between packets within a PHB | PCN codepoints SHOULD be implemented between packets within a PHB | |||
| that uses PCN marking. The drop preference order to use is | that uses PCN marking. The drop preference order to use is | |||
| defined in Table 4. Note that to reduce configuration complexity, | defined in Table 4. Note that to reduce configuration complexity, | |||
| Re-Echo and FNE MAY be given the same drop preference, but if | Re-PCT-Echo and FNE MAY be given the same drop preference, but if | |||
| feasible, FNE should be dropped in preference to Re-Echo. | feasible, FNE SHOULD be dropped in preference to Re-PCT-Echo. | |||
| +---------+-------+----------------+---------+----------------------+ | ||||
| \| ECN \| RE \| Extended ECN \| Drop \| Re-ECN meaning \| | ||||
| \| field \| flag \| codepoint \| Pref \| \| | ||||
| +---------+-------+----------------+---------+----------------------+ | ||||
| \| 01 \| 0 \| Re-Echo \| 5/4 \| Re-echoed congestion \| | ||||
| \| \| \| \| \| and RECT \| | ||||
| \| 00 \| 1 \| FNE \| 4 \| Feedback not \| | ||||
| \| \| \| \| \| established \| | ||||
| \| 01 \| 1 \| RECT \| 3 \| Re-ECN capable \| | ||||
| \| \| \| \| \| transport \| | ||||
| \| 10 \| 0 \| AM(0) \| 3 \| Admission Marking \| | ||||
| \| \| \| \| \| with Re-Echo \| | ||||
| \| 10 \| 1 \| AM(-1) \| 3 \| Admission Marking \| | ||||
| \| \| \| \| \| \| | ||||
| \| 11 \| 0 \| PM(0) \| 2 \| Pre-emption Marking \| | ||||
| \| \| \| \| \| with Re-Echo \| | ||||
| \| 11 \| 1 \| PM(-1) \| 2 \| Pre-emption Marking \| | ||||
| \| \| \| \| \| \| | ||||
| \| 00 \| 0 \| Not-RECT \| 1 \| Not re-ECN-capable \| | ||||
| \| \| \| \| \| transport \| | ||||
| +---------+-------+----------------+---------+----------------------+ | ||||
| Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st) | ||||
| Given this proposal is being advanced at the same time as PCN | If this proposal were advanced at the same time as PCN itself, we | |||
| itself, we strongly RECOMMEND that preferential drop based on | would recommend that preferential drop based on extended PCN | |||
| extended ECN codepoint is added to router forwarding at the same | codepoint SHOULD be added to router forwarding at the same time as | |||
| time as PCN marking. Preferential dropping can be difficult to | PCN marking. Preferential dropping can be difficult to implement, | |||
| implement, but we strongly RECOMMEND this security-related re-ECN | but we RECOMMEND this security-related re-PCN improvement where | |||
| improvement where feasible as it is an effective defence against | feasible as it is an effective defence against flooding attacks. | |||
| flooding attacks. | ||||
| Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE | Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE | |||
| flag as well as the ECN field to decide whether to drop or mark | flag as well as the ECN field to decide whether to drop or mark | |||
| PCN DSCPs. They MUST choose drop if the codepoint of this | PCN DSCPs. They MUST choose drop if the codepoint of this | |||
| extended ECN field is Not-RECT. Otherwise they SHOULD mark | extended ECN field is Not-PCN. Otherwise they SHOULD mark | |||
| (unless, of course, buffer space is exhausted). | (unless, of course, buffer space is exhausted). | |||
| A PCN-capable router MUST NOT ever congestion mark a packet | A PCN-capable router MUST NOT ever congestion mark a packet | |||
| carrying the Not-RECT codepoint because the transport will only | carrying the Not-PCN codepoint because the transport will only | |||
| understand drop, not congestion marking. But a PCN-capable router | understand drop, not congestion marking. But a PCN-capable router | |||
| can mark rather than drop an FNE packet, even though its ECN field | can mark rather than drop an FNE packet, even though its ECN field | |||
| when looked at in isolation is '00' which appears to be a legacy | when looked at in isolation is '00' which appears to be a legacy | |||
| Not-ECT packet. Therefore, if a packet's RE flag is '1', even if | Not-ECT packet. Therefore, if a packet's RE flag is '1', even if | |||
| its ECN field is '00', a PCN-enabled router SHOULD use congestion | its ECN field is '00', a PCN-enabled router SHOULD use congestion | |||
| marking. This allows the `feedback not established' (FNE) | marking. This allows the `feedback not established' (FNE) | |||
| codepoint to be used for probe packets, in order to pick up PCN | codepoint to be used for probe packets, in order to pick up PCN | |||
| marking when bootstrapping an aggregate. | marking when bootstrapping an aggregate. | |||
| ECN marking rather than dropping of FNE packets MUST only be | PCN marking rather than dropping of FNE packets MUST only be | |||
| deployed in controlled environments, such as that in | deployed in controlled environments, such as that in | |||
| [I-D.ietf-pcn-architecture], where the presence of an egress node | [I-D.ietf-pcn-architecture], where the presence of an egress node | |||
| that understands ECN marking is assured. Congestion events might | that understands PCN marking is assured. Congestion events might | |||
| otherwise be ignored if the receiver only understands drop, rather | otherwise be ignored if the receiver only understands drop, rather | |||
| than ECN marking. This is because there is no guarantee that ECN | than PCN marking. This is because there is no guarantee that PCN | |||
| capability has been negotiated if feedback is not established | capability has been negotiated if feedback is not established | |||
| (FNE). Also, [Re-TCP] places the strong condition that a router | (FNE). Also, [I-D.briscoe-tsvwg-re-ecn-tcp] places the strong | |||
| MUST apply drop rather than marking to FNE packets unless it can | condition that a router MUST apply drop rather than marking to FNE | |||
| guarantee that FNE packets are rate limited either locally or | packets unless it can guarantee that FNE packets are rate limited | |||
| upstream. | either locally or upstream. | |||
| | +---------+-------+-----------------+---------+---------------------+ | |||
| | \| PCN \| RE \| Extended PCN \| Drop \| Re-PCN meaning \| | |||
| | \| field \| flag \| codepoint \| Pref \| \| | |||
| | +---------+-------+-----------------+---------+---------------------+ | |||
| | \| 10 \| 0 \| Re-PCT-Echo \| 5/4 \| Re-echoed \| | |||
| | \| \| \| \| \| congestion and \| | |||
| | \| \| \| \| \| Re-PCT \| | |||
| | \| 00 \| 1 \| FNE \| 4 \| Feedback not \| | |||
| | \| \| \| \| \| established \| | |||
| | \| 10 \| 1 \| Re-PCT \| 3 \| Re-PCN capable \| | |||
| | \| \| \| \| \| transport \| | |||
| | \| 01 \| 0 \| AM(0) \| 3 \| Admission Marking \| | |||
| | \| \| \| \| \| with Re-Echo \| | |||
| | \| 01 \| 1 \| AM(-1) \| 3 \| Admission Marking \| | |||
| | \| \| \| \| \| \| | |||
| | \| 11 \| 0 \| TM(0) \| 2 \| Termination Marking \| | |||
| | \| \| \| \| \| with Re-Echo \| | |||
| | \| 11 \| 1 \| TM(-1) \| 2 \| Termination Marking \| | |||
| | \| \| \| \| \| \| | |||
| | \| 00 \| 0 \| Not-PCN \| 1 \| Not PCN-capable \| | |||
| | \| \| \| \| \| transport \| | |||
| | +---------+-------+-----------------+---------+---------------------+ | |||
| | Table 4: Drop Preference of Extended ECN Codepoints (1 = drop 1st) | |||
| 4.3.5. Extensions | 4.3.5. Extensions | |||
| If a different signalling system, such as NSIS, were used, but it | If a different signalling system, such as NSIS, were used but it | |||
| provided admission control in a similar way, using pre-congestion | provided admission control in a similar way using pre-congestion | |||
| notification (e.g. Arumaithurai [I-D.arumaithurai-nsis-pcn] or | notification (e.g. Arumaithurai [I-D.arumaithurai-nsis-pcn] or | |||
| RMD [I-D.ietf-nsis-rmd]) we believe re-ECN could be used to protect | RMD [I-D.ietf-nsis-rmd]), we believe re-PCN could be used to protect | |||
| against misbehaving networks in the same way as proposed above. | against misbehaving networks in the same way as proposed above. | |||
| 5. Emulating Border Policing with Re-ECN | 5. Emulating Border Policing with Re-ECN | |||
| Note that the re-ECN protocol described in Section 4 above would | The following sections are informative, not normative. The re-PCN | |||
| require standardisation, whereas operators acting in their own | protocol described in Section 4 above would require standardisation, | |||
| interests would be expected to deploy policing and monitoring | whereas operators acting in their own interests would be expected to | |||
| functions similar to those proposed in the sections below without any | deploy policing and monitoring functions similar to those proposed in | |||
| further need for standardisation by the IETF. Flexibility is | the sections below without any further need for standardisation by | |||
| expected in exactly how policing and monitoring is done. | the IETF. Flexibility is expected in exactly how policing and | |||
| | monitoring is done. | |||
| 5.1. Informal Terminology | 5.1. Informal Terminology | |||
| In the rest of this memo, where the context makes it clear, we will | In the rest of this memo, where the context makes it clear, we will | |||
| sometimes loosely use the term `congestion' rather than using the | sometimes loosely use the term `congestion' rather than using the | |||
| stricter `downstream pre-congestion'. Also we will loosely talk of | stricter `downstream pre-congestion'. Also we will loosely talk of | |||
| positive or negative flows, meaning flows where the moving average of | positive or negative flows, meaning flows where the moving average of | |||
| the downstream pre-congestion metric is persistently positive or | the downstream pre-congestion metric is persistently positive or | |||
| negative. The notion of a negative metric arises because it is | negative. The notion of a negative metric arises because it is | |||
| derived by subtracting one metric from another. Of course actual | derived by subtracting one metric from another. Of course actual | |||
| downstream congestion cannot be negative, only the metric can | downstream congestion cannot be negative, only the metric can | |||
| (whether due to time lags or deliberate malice). | (whether due to time lags or deliberate malice). | |||
| Just as we will loosely talk of positive and negative flows, we will | Just as we will loosely talk of positive and negative flows, we will | |||
| also talk of positive or negative packets, meaning packets that | also talk of positive or negative packets, meaning packets that | |||
| contribute positively or negatively to downstream pre-congestion. | contribute positively or negatively to downstream pre-congestion. | |||
| Therefore packets can be considered to have a `worth' of +1, 0 or -1, | Therefore packets can be considered to have a `worth' of +1, 0 or -1, | |||
| which, when multiplied by their size, indicates their contribution to | which, when multiplied by their size, indicates their contribution to | |||
| downstream congestion. Packets will usually be sent with a worth of | downstream congestion. Packets will usually be initialised by the | |||
| 0. Blanking the RE flag increments the worth of a packet to +1. | PCN ingress with a worth of 0. Blanking the RE flag increments the | |||
| Congestion marking a packet decrements its worth (whether admission | worth of a packet to +1. Congestion marking a packet decrements its | |||
| marking or pre-emption marking). Congestion marking a previously | worth (whether admission marking or termination marking). Congestion | |||
| blanked packet cancel out the positive and negative worth of each | marking a previously blanked packet cancels out the positive worth | |||
| marking (a worth of 0). The FNE codepoint is an exception. It has | with the negative worth of the congestion marking (resulting in a | |||
| the same positive worth as a packet with the Re-Echo codepoint. The | packet worth 0). The FNE codepoint is an exception. It has the same | |||
| table below specifies unambiguously the worth of each extended ECN | positive worth as a packet with the Re-PCT-Echo codepoint. The table | |||
| | below specifies unambiguously the worth of each extended PCN | |||
| codepoint. Note the order is different from the previous table to | codepoint. Note the order is different from the previous table to | |||
| emphasise how congestion marking processes decrement the worth. | emphasise how congestion marking processes decrement the worth (with | |||
| | the exception of FNE). | |||
| +---------+-------+-----------------+-------+-----------------------+ | +---------+-------+------------------+-------+----------------------+ | |||
| \| ECN \| RE \| Extended ECN \| Worth \| Re-ECN meaning \| | \| ECN \| RE \| Extended PCN \| Worth \| Re-PCN meaning \| | |||
| \| field \| flag \| codepoint \| \| \| | \| field \| flag \| codepoint \| \| \| | |||
| +---------+-------+-----------------+-------+-----------------------+ | +---------+-------+------------------+-------+----------------------+ | |||
| \| 00 \| 0 \| Not-RECT \| n/a \| Not re-ECN-capable \| | \| 00 \| 0 \| Not-PCN \| n/a \| Not PCN-capable \| | |||
| \| \| \| \| \| transport \| | \| \| \| \| \| transport \| | |||
| \| 01 \| 0 \| Re-Echo \| +1 \| Re-echoed congestion \| | \| 10 \| 0 \| Re-PCT-Echo \| +1 \| Re-echoed congestion \| | |||
| \| \| \| \| \| and RECT \| | \| \| \| \| \| and Re-PCT \| | |||
| \| 10 \| 0 \| AM(0) \| 0 \| Admission Marking \| | \| 01 \| 0 \| AM(0) \| 0 \| Admission Marking \| | |||
| \| \| \| \| \| with Re-Echo \| | \| \| \| \| \| with Re-Echo \| | |||
| \| 11 \| 0 \| PM(0) \| 0 \| Pre-emption Marking \| | \| 11 \| 0 \| TM(0) \| 0 \| Termination Marking \| | |||
| \| \| \| \| \| with Re-Echo \| | \| \| \| \| \| with Re-Echo \| | |||
| \| 00 \| 1 \| FNE \| +1 \| Feedback not \| | \| 00 \| 1 \| FNE \| +1 \| Feedback not \| | |||
| \| \| \| \| \| established \| | \| \| \| \| \| established \| | |||
| \| 01 \| 1 \| RECT \| 0 \| Re-ECN capable \| | \| 10 \| 1 \| Re-PCT \| 0 \| Re-PCN capable \| | |||
| \| \| \| \| \| transport \| | \| \| \| \| \| transport \| | |||
| \| 10 \| 1 \| AM(-1) \| -1 \| Admission Marking \| | \| 01 \| 1 \| AM(-1) \| -1 \| Admission Marking \| | |||
| \| \| \| \| \| \| | \| \| \| \| \| \| | |||
| \| 11 \| 1 \| PM(-1) \| -1 \| Pre-emption Marking \| | \| 11 \| 1 \| TM(-1) \| -1 \| Termination Marking \| | |||
| +---------+-------+-----------------+-------+-----------------------+ | +---------+-------+------------------+-------+----------------------+ | |||
| Table 5: 'Worth' of Extended ECN Codepoints | Table 5: 'Worth' of Extended ECN Codepoints | |||
| 5.2. Policing Overview | 5.2. Policing Overview | |||
| It will be recalled that downstream congestion can be found by | It will be recalled that downstream congestion can be found by | |||
| subtracting upstream congestion from path congestion. Figure 4 | subtracting upstream congestion from path congestion. Figure 4 | |||
| displays the difference between the two plots in Figure 3 to show | displays the difference between the two plots in Figure 3 to show | |||
| downstream pre-congestion across the same path through the Internet. | downstream pre-congestion across the same path through the Internet. | |||
| To emulate border policing, the general idea is for each domain to | To emulate border policing, the general idea is for each domain to | |||
| skipping to change at page 27, line 30 | skipping to change at page 30, line 46 | |||
| 1.00% 2.00%: pre-congestion | 1.00% 2.00%: pre-congestion | |||
| \| | \| | |||
| sanctions | sanctions | |||
| Figure 4: Policing Framework, showing creation of opposing pressures | Figure 4: Policing Framework, showing creation of opposing pressures | |||
| to under-declare and over-declare downstream pre-congestion, using | to under-declare and over-declare downstream pre-congestion, using | |||
| penalties and sanctions | penalties and sanctions | |||
| These penalties seem to encourage everyone to understate downstream | These penalties seem to encourage everyone to understate downstream | |||
| congestion in order to reduce the penalties they incur. But a | congestion in order to reduce the penalties they incur. But a | |||
| balancing pressure is introduced by the last domain, which applies | balancing pressure is introduced by the last domain (strictly by any | |||
| sanctions to flows if downstream congestion goes negative before the | domain), which applies sanctions to flows if downstream congestion | |||
| egress gateway. The upward arrow at Domain C's border with the | goes negative before the egress gateway. The upward arrow at Domain | |||
| egress gateway represents the incentive the sanctions would create to | C's border with the egress gateway represents the incentive the | |||
| prevent negative traffic. The same upward pressure can be applied at | sanctions would create to prevent negative traffic. The same upward | |||
| any domain border (arrows not shown). | pressure can be applied at any domain border (arrows not shown). | |||
| Any flow that persistently goes negative by the time it leaves a | Any flow that persistently goes negative by the time it leaves a | |||
| domain must not have been marked correctly in the first place. A | domain must not have been marked correctly in the first place. A | |||
| domain that discovers such a flow can adopt a range of strategies to | domain that discovers such a flow can adopt a range of strategies to | |||
| protect itself. Which strategy it uses will depend on policy, | protect itself. Which strategy it uses will depend on policy, | |||
| because it cannot immediately assume malice--there may be an innocent | because it cannot immediately assume malice--there may be an innocent | |||
| configuration error somewhere in the system. | configuration error somewhere in the system. | |||
| This memo does not propose to standardise any particular mechanism to | This memo does not propose to standardise any particular mechanism to | |||
| detect persistently negative flows, but Section 5.5 does give | detect persistently negative flows, but Section 5.5 does give | |||
| examples. Note that we have used the term flow, but there will be no | examples. Note that we have used the term flow, but there will be no | |||
| need to bury into the transport layer for port numbers; identifiers | need to bury into the transport layer for port numbers; identifiers | |||
| visible in the network layer will be sufficient (IP address pair, | visible in the network layer will be sufficient (IP address pair, | |||
| DSCP, protocol ID). The appendix also gives a mechanism to bound the | DSCP, protocol ID). The appendix also gives a mechanism to limit the | |||
| required flow state, preventing state exhaustion attacks. | required flow state, preventing state exhaustion attacks. | |||
| Of course, some domains may trust other domains to comply with | Of course, some domains may trust other domains to comply with | |||
| admission control without applying sanctions or penalties. In these | admission control without applying sanctions or penalties. In these | |||
| cases, the protocol should still be used but no penalties need be | cases, the protocol should still be used but no penalties need be | |||
| applied. The re-ECN protocol ensures downstream pre-congestion | applied. The re-PCN protocol ensures downstream pre-congestion | |||
| marking is passed on correctly whether or not penalties are applied | marking is passed on correctly whether or not penalties are applied | |||
| to it, so the system works just as well with a mixture of some | to it, so the system works just as well with a mixture of some | |||
| domains trusting each other and others not. | domains trusting each other and others not. | |||
| Providers should be free to agree the contractual terms they wish | Providers should be free to agree the contractual terms they wish | |||
| between themselves, so this memo does not propose to standardise how | between themselves, so this memo does not propose to standardise how | |||
| these penalties would be applied. It is sufficient to standardise | these penalties would be applied. It is sufficient to standardise | |||
| the re-ECN protocol so the downstream pre-congestion metric is | the re-PCN protocol so the downstream pre-congestion metric is | |||
| available if providers choose to use it. However, the next section | available if providers choose to use it. However, the next section | |||
| (Section 5.3) gives some examples of how these penalties might be | (Section 5.3) gives some examples of how these penalties might be | |||
| implemented. | implemented. | |||
| 5.3. Pre-requisite Contractual Arrangements | 5.3. Pre-requisite Contractual Arrangements | |||
| The re-ECN protocol has been chosen to solve the policing problem | The re-PCN protocol has been chosen to solve the policing problem | |||
| because it embeds a downstream pre-congestion metric in passing CL | because it embeds a downstream pre-congestion metric in passing PCN | |||
| traffic that is difficult to lie about and can be measured in bulk. | traffic that is difficult to lie about and can be measured in bulk. | |||
| The ability to emulate border policing depends on network operators | The ability to emulate border policing depends on network operators | |||
| choosing to use this metric as one of the elements in their contracts | choosing to use this metric as one of the elements in their contracts | |||
| with each other. | with each other. | |||
| Already many inter-domain agreements involve a capacity and a usage | Already many inter-domain agreements involve a capacity and a usage | |||
| element. The usage element may be based on volume or various | element. The usage element may be based on volume or various | |||
| measures of peak demand. We expect that those network operators who | measures of peak demand. We expect that those network operators who | |||
| choose to use pre-congestion notification for admission control would | choose to use pre-congestion notification for admission control would | |||
| also be willing to consider using this downstream pre-congestion | also be willing to consider using this downstream pre-congestion | |||
| metric as a usage element in their interconnection contracts for | metric as a usage element in their interconnection contracts for | |||
| admission controlled (CL) traffic. | admission controlled (PCN) traffic. | |||
| Congestion (or pre-congestion) has the dimension of [octet], being | Congestion (or pre-congestion) has the dimension of [octet], being | |||
| the product of volume transferred [octet] and the congestion fraction | the product of volume transferred [octet] and the congestion fraction | |||
| [dimensionless], which is the fraction of the offered load that the | [dimensionless], which is the fraction of the offered load that the | |||
| network isn't able to serve (or would rather not serve in the case of | network isn't able to serve (or would rather not serve in the case of | |||
| pre-congestion). Measuring downstream congestion gives a measure of | pre-congestion). Measuring downstream congestion gives a measure of | |||
| the volume transferred but modulated by congestion expected | the volume transferred but modulated by congestion expected | |||
| downstream. So volume transferred during off-peak periods counts as | downstream. So volume transferred during off-peak periods counts as | |||
| nearly nothing, while volume transferred at peak times counts very | nearly nothing, while volume transferred at peak times or over | |||
| highly. The re-ECN protocol allows one network to measure how much | temporarily congested links counts very highly. The re-PCN protocol | |||
| pre-congestion has been `dumped' into it by another network. And | allows one network to measure how much pre-congestion has been | |||
| then in turn how much of that pre-congestion it dumped into the next | `dumped' into it by another network. And then in turn how much of | |||
| downstream network. | that pre-congestion it dumped into the next downstream network. | |||
| Section 5.6 describes mechanisms for calculating border penalties | Section 5.6 describes mechanisms for calculating border penalties | |||
| referring to Appendix A.2 for suggested metering algorithms for | referring to Appendix A.2 for suggested metering algorithms for | |||
| downstream congestion at a border router. Conceptually, it could | downstream congestion at a border router. Conceptually, it could | |||
| hardly be simpler. It broadly involves accumulating the volume of | hardly be simpler. It broadly involves accumulating the volume of | |||
| packets with the RE flag blanked and the volume of those with | packets with the RE flag blanked and the volume of those with | |||
| congestion marking then subtracting the two. | congestion marking then subtracting the two. | |||
| Once this downstream pre-congestion metric is available, operators | Once this downstream pre-congestion metric is available, operators | |||
| are free to choose how they incorporate it into their interconnection | are free to choose how they incorporate it into their interconnection | |||
| skipping to change at page 30, line 9 | skipping to change at page 33, line 25 | |||
| other words, penalties are always paid in the same direction as the | other words, penalties are always paid in the same direction as the | |||
| data, and never against the data flow, even if downstream congestion | data, and never against the data flow, even if downstream congestion | |||
| seems to be negative. This is consistent with the definition of | seems to be negative. This is consistent with the definition of | |||
| physical congestion; when a resource is underutilised, it is not | physical congestion; when a resource is underutilised, it is not | |||
| negatively congested. Its congestion is just zero. So, although | negatively congested. Its congestion is just zero. So, although | |||
| short periods of negative marking can be tolerated to correct | short periods of negative marking can be tolerated to correct | |||
| temporary over-declarations due to lags in the feedback system, | temporary over-declarations due to lags in the feedback system, | |||
| persistent downstream negative congestion can have no physical | persistent downstream negative congestion can have no physical | |||
| meaning and therefore must signify a problem. The incentive for | meaning and therefore must signify a problem. The incentive for | |||
| domains not to tolerate persistently negative traffic depends on this | domains not to tolerate persistently negative traffic depends on this | |||
| principle that penalties must never be paid against the data flow. | principle that negative penalties must never be paid for negative | |||
| congestion. | ||||
| Also note that at the last egress of the Diffserv region, domain C | Also note that at the last egress of the PCN-region, domain C should | |||
| should not agree to pay any penalties to the egress gateway for pre- | not agree to pay any penalties to the egress gateway for pre- | |||
| congestion passed to the egress gateway. Downstream pre-congestion | congestion passed to the egress gateway. Downstream pre-congestion | |||
| to the egress gateway should have reached zero here. If domain C | to the egress gateway should have reached zero here. If domain C | |||
| were to agree to pay for any remaining downstream pre-congestion, it | were to agree to pay for any remaining downstream pre-congestion, it | |||
| would give the egress gateway an incentive to over-declare pre- | would give the egress gateway an incentive to over-declare pre- | |||
| congestion feedback and take the resulting profit from domain C. | congestion feedback and take the resulting profit from domain C. | |||
| To focus the discussion, from now on, unless otherwise stated, we | To focus the discussion, from now on, unless otherwise stated, we | |||
| will assume a downstream network charges its upstream neighbour in | will assume a downstream network charges its upstream neighbour in | |||
| proportion to the pre-congestion it sends (V_b in the notation of | proportion to the pre-congestion it sends (V_b in the notation of | |||
| Appendix A.2). Effectively, tiered thresholds would be just more | Appendix A.2). Effectively, tiered thresholds would be just more | |||
| coarse-grained approximations of the fine-grained case we choose to | coarse-grained approximations of the fine-grained case we choose to | |||
| examine. If these neighbours had previously agreed that the (fixed) | examine. If these neighbours had previously agreed that the (fixed) | |||
| price per octet of pre-congestion would be L, then the bill at the | price per octet of pre-congestion would be L, then the bill at the | |||
| end of the month would simply be the product L*V_b, plus any fixed | end of the month would simply be the product L*V_b, plus any fixed | |||
| charges they may also have agreed. | charges they may also have agreed. | |||
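As a minimal sketch of the billing arithmetic just described (not part of either draft), the function below computes L*V_b plus fixed charges. The function and parameter names are hypothetical. Flooring the billable volume at zero is our reading of the principle stated earlier in this section that penalties are never paid against the data flow, even if downstream congestion seems to be negative.

```python
def monthly_bill(price_per_octet, congestion_octets, fixed_charges=0):
    """Return L * V_b plus any fixed charges for one accounting period.

    price_per_octet   -- L, the agreed price per octet of pre-congestion
    congestion_octets -- V_b, downstream pre-congestion volume at the border
    fixed_charges     -- any fixed charges the neighbours also agreed

    Assumption: a negative V_b is billed as zero, so money never flows
    against the direction of the data.
    """
    billable = max(congestion_octets, 0)
    return price_per_octet * billable + fixed_charges


# Usage: 1000 octets of pre-congestion at 2 units per octet, plus 50 fixed.
print(monthly_bill(2, 1000, 50))
# A persistently negative measurement attracts no refund, only fixed charges.
print(monthly_bill(2, -500, 50))
```

Integer units are used in the usage lines to keep the arithmetic exact; real contracts would choose their own currency units and rounding rules.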
| We are well aware that the IETF tries to avoid standardising | We are well aware that the IETF tries to avoid standardising | |||
| technology that depends on a particular business model. Indeed, this | technology that depends on a particular business model. Indeed, this | |||
| principle is at the heart of all our own work. Our aim here is to | principle is at the heart of all our own work. Our aim here is to | |||
| make a new metric available that we believe is superior to all | make a new metric available that we believe is superior to all | |||
| existing metrics. Then, our aim is to show that border policing can | existing metrics. Then, our aim is to show that bulk border policing | |||
| at least work with the one model we have just outlined. We assume | can at least work with the one model we have just outlined. Of | |||
| that operators might then experiment with the metric in other models. | course, operators are free to complement this pre-congestion-based | |||
| Of course, operators are free to complement this pre-congestion-based | ||||
| usage element of their charges with traditional capacity charging, | usage element of their charges with traditional capacity charging, | |||
| and we expect they will. | and we expect they will. But if operators don't want to use this | |||
| business model at all, they don't have to do bulk border policing. | ||||
| We also assume that operators might experiment with the metric in | ||||
| other models. | ||||
| Also note well that everything we discuss in this memo only concerns | Also note well that everything we discuss in this memo only concerns | |||
| interconnection within the Diffserv region. ISPs are free to sell or | interconnection within the PCN-region. ISPs are free to sell or give | |||
| give away reservations however they want on the retail market. But | away reservations however they want on the retail market. But of | |||
| of course, interconnection charges will have a bearing on that. | course, interconnection charges will have a bearing on that. Indeed, | |||
| Indeed, in the present scenario, the ingress gateway effectively | in the present scenario, the ingress gateway effectively sells | |||
| sells reservations on one side and buys congestion penalties on the | reservations on one side and buys congestion penalties on the other. | |||
| other. As congestion rises, one can imagine the gateway discovering | As congestion rises, one can imagine the gateway discovering that | |||
| that congestion penalties have risen higher than the (probably fixed) | congestion penalties have risen higher than the (probably fixed) | |||
| revenue it will earn from selling the next flow reservation. This | revenue it will earn from selling the next flow reservation. This | |||
| encourages the gateway to cut its losses by blocking new calls, which | encourages the gateway to cut its losses by blocking new calls, which | |||
| is why we believe downstream congestion penalties can emulate per- | is why we believe downstream congestion penalties can emulate per- | |||
| flow rate policing at borders, as the next section explains. | flow rate policing at borders, as the next section explains. | |||
| 5.4. Emulation of Per-Flow Rate Policing: Rationale and Limits | 5.4. Emulation of Per-Flow Rate Policing: Rationale and Limits | |||
| The important feature of charging in proportion to congestion volume | The important feature of charging in proportion to congestion volume | |||
| is that the penalty aggregates and disaggregates correctly along with | is that the penalty aggregates and disaggregates correctly along with | |||
| packet flows. This is because the penalty rises linearly with bit | packet flows. This is because the penalty rises linearly with bit | |||
| skipping to change at page 31, line 36 | skipping to change at page 35, line 7 | |||
| utilisation of a particular resource. So if someone tries to push | utilisation of a particular resource. So if someone tries to push | |||
| another flow into a path that is already signalling enough pre- | another flow into a path that is already signalling enough pre- | |||
| congestion to warrant admission control, the penalty will be a lot | congestion to warrant admission control, the penalty will be a lot | |||
| greater than it would have been to add the same flow to a less | greater than it would have been to add the same flow to a less | |||
| congested path. This makes the incentive system fairly insensitive | congested path. This makes the incentive system fairly insensitive | |||
| to the actual level of pre-congestion for triggering admission | to the actual level of pre-congestion for triggering admission | |||
| control that each ingress chooses. The deterrent against exceeding | control that each ingress chooses. The deterrent against exceeding | |||
| whatever threshold is chosen rises very quickly with a small amount | whatever threshold is chosen rises very quickly with a small amount | |||
| of cheating. | of cheating. | |||
| These are the properties that allow re-ECN to emulate per-flow border | These are the properties that allow re-PCN to emulate per-flow border | |||
| policing of both rate and admission control. It is not a perfect | policing of both rate and admission control. It is not a perfect | |||
| emulation of per-flow border policing, but we claim it is sufficient | emulation of per-flow border policing, but we claim it is sufficient | |||
| to at least ensure the cost to others of a cheat is borne by the | to at least ensure the cost to others of a cheat is borne by the | |||
| cheater, because the penalties are at least proportionate to the | cheater, because the penalties are at least proportionate to the | |||
| level of the cheat. If an edge network operator is selling | level of the cheat. If an edge network operator is selling | |||
| reservations at a large profit over the congestion cost, these pre- | reservations at a large profit over the congestion cost, these pre- | |||
| congestion penalties will not be sufficient to ensure networks in the | congestion penalties will not be sufficient to ensure networks in the | |||
| middle get a share of those profits, but at least they can cover | middle get a share of those profits, but at least they can cover | |||
| their costs. | their costs. | |||
| skipping to change at page 32, line 20 | skipping to change at page 35, line 40 | |||
| the price L (per octet) of pre-congestion would be about 1000 times | the price L (per octet) of pre-congestion would be about 1000 times | |||
| the previously used (per octet) price for volume. We should add that | the previously used (per octet) price for volume. We should add that | |||
| a switch to pre-congestion is unlikely to exactly maintain the same | a switch to pre-congestion is unlikely to exactly maintain the same | |||
| overall level of usage charges, but this argument will be | overall level of usage charges, but this argument will be | |||
| approximately true, because usage charge will rise to at least the | approximately true, because usage charge will rise to at least the | |||
| level the market finds necessary to push back against usage. | level the market finds necessary to push back against usage. | |||
| From the above example it can be seen why a 1000x higher price will | From the above example it can be seen why a 1000x higher price will | |||
| make operators become acutely sensitive to the congestion they cause | make operators become acutely sensitive to the congestion they cause | |||
| in other networks, which is of course the desired effect; to | in other networks, which is of course the desired effect; to | |||
| encourage networks to _control_ the congestion they allow their users | encourage networks to _avoid_ the congestion they allow their users | |||
| to cause to others. | to cause to others. | |||
| If any network sends even one flow at a higher rate, they will | If any network sends even one flow at a higher rate, they will | |||
| immediately have to pay proportionately more usage charges. Because | immediately have to pay proportionately more usage charges. Because | |||
| there is no knowledge of reservations within the Diffserv region, no | there is no knowledge of reservations within the PCN-region, no | |||
| interior router can police whether the rate of each flow is greater | interior router can police whether the rate of each flow is greater | |||
| than each reservation. So the system doesn't truly emulate rate- | than each reservation. So the system doesn't truly emulate rate- | |||
| policing of each flow. But there is no incentive to pack a higher | policing of each flow. But there is no incentive to pack a higher | |||
| rate into a reservation, because the charges are directly | rate into a reservation, because the charges are directly | |||
| proportional to rate, irrespective of the reservations. | proportional to rate, irrespective of the reservations. | |||
| However, if virtual queues start to fill on any path, even though | However, if virtual queues start to fill on any path, even though | |||
| real queues will still be able to provide low latency service, pre- | real queues will still be able to provide low latency service, pre- | |||
| congestion marking will rise fairly quickly. It may eventually reach | congestion marking will rise fairly quickly. It may eventually reach | |||
| the threshold where the ingress gateway would deny admission to new | the threshold where the ingress gateway would deny admission to new | |||
| skipping to change at page 32, line 49 | skipping to change at page 36, line 22 | |||
| control should have been invoked. The ingress gateway will have to | control should have been invoked. The ingress gateway will have to | |||
| pay the penalty for such an extremely high pre-congestion level, so | pay the penalty for such an extremely high pre-congestion level, so | |||
| the pressure to invoke admission control should become unbearable. | the pressure to invoke admission control should become unbearable. | |||
| The above mechanisms protect against rational operators. In | The above mechanisms protect against rational operators. In | |||
| Section 5.6.3 we discuss how networks can protect themselves from | Section 5.6.3 we discuss how networks can protect themselves from | |||
| accidental or deliberate misconfiguration in neighbouring networks. | accidental or deliberate misconfiguration in neighbouring networks. | |||
| 5.5. Sanctioning Dishonest Marking | 5.5. Sanctioning Dishonest Marking | |||
| As CL traffic leaves the last network before the egress gateway | As PCN traffic leaves the last network before the egress gateway | |||
| (domain C) the RE blanking fraction should match the congestion | (domain 'C' in Figure 4) the RE blanking fraction should match the | |||
| marking fraction, when averaged over a sufficiently long duration | congestion marking fraction, when averaged over a sufficiently long | |||
| (perhaps ~10s to allow a few rounds of feedback through regular | duration (perhaps ~10s to allow a few rounds of feedback through | |||
| signalling of new and refreshed reservations). | regular signalling of new and refreshed reservations). | |||
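The balance condition above can be illustrated with a short sketch. It is not the monitor algorithm of Appendix A.3. It only compares the RE blanking fraction with the congestion marking fraction over one averaging window, and all names and the tolerance parameter are assumptions made for illustration.

```python
def marking_balance(blanked_octets, marked_octets, total_octets):
    """Return (RE blanking fraction) minus (congestion marking fraction).

    All three arguments are octet counts gathered over one averaging
    window (the text suggests roughly 10 s). A result near zero means the
    two fractions match; a negative result means congestion was understated.
    """
    if total_octets <= 0:
        return 0.0  # no traffic seen, so nothing to judge
    return (blanked_octets - marked_octets) / total_octets


def is_persistently_negative(balances, tolerance=0.0):
    """True if every window in a non-empty sequence is below -tolerance.

    Assumption: 'persistent' is modelled as 'negative in every recent
    window'; a real monitor would choose its own averaging method.
    """
    return len(balances) > 0 and all(b < -tolerance for b in balances)


# Usage: three consecutive windows, each slightly negative.
windows = [marking_balance(30, 50, 1000) for _ in range(3)]
print(is_persistently_negative(windows))
```

A short negative excursion in a single window would not trigger the check, which matches the earlier remark that brief negative periods can be tolerated while feedback catches up.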
| To protect itself, domain C should install a monitor at its egress. | To protect itself, domain 'C' should install a monitor at its egress. | |||
| It aims to detect flows of CL packets that are persistently negative. | It aims to detect flows of PCN packets that are persistently | |||
| If flows are positive, domain C need take no action--this simply | negative. If flows are positive, domain 'C' need take no action-- | |||
| means an upstream network must be paying more penalties than it needs | this simply means an upstream network must be paying more penalties | |||
| to. Appendix A.3 gives a suggested algorithm for the monitor, | than it needs to. Appendix A.3 gives a suggested algorithm for the | |||
| meeting the criteria below. | monitor, meeting the criteria below. | |||
| o It SHOULD introduce minimal false positives for honest flows; | o It SHOULD introduce minimal false positives for honest flows; | |||
| o It SHOULD quickly detect and sanction dishonest flows (minimal | o It SHOULD quickly detect and sanction dishonest flows (minimal | |||
| false negatives); | false negatives); | |||
| o It MUST be invulnerable to state exhaustion attacks from malicious | o It MUST be invulnerable to state exhaustion attacks from malicious | |||
| sources. For instance, if the dropper uses flow-state, it should | sources. For instance, if the dropper uses flow-state, it should | |||
| not be possible for a source to send numerous packets, each with a | not be possible for a source to send numerous packets, each with a | |||
| different flow ID, to force the dropper to exhaust its memory | different flow ID, to force the dropper to exhaust its memory | |||
| capacity; | capacity; | |||
| o It MUST introduce sufficient loss in goodput so that malicious | o If drop is used as a sanction, it SHOULD introduce sufficient loss | |||
| sources cannot play off losses in the egress dropper against | in goodput so that malicious sources cannot play off losses in the | |||
| higher allowed throughput. Salvatori [CLoop_pol] describes this | egress dropper against higher allowed throughput. | |||
| attack, which involves the source understating path congestion | Salvatori [CLoop_pol] describes this attack, which involves the | |||
| then inserting forward error correction (FEC) packets to | source understating path congestion then inserting forward error | |||
| compensate for expected losses. | correction (FEC) packets to compensate for expected losses. | |||
| Note that the monitor operates on flows but with careful design we | Note that the monitor operates on flows but with careful design we | |||
| can avoid per-flow state. This is why we have been careful to ensure | can avoid per-flow state. This is why we have been careful to ensure | |||
| that all flows MUST start with a packet marked with the FNE | that all flows MUST start with a packet marked with the FNE | |||
| codepoint. If a flow does not start with the FNE codepoint, a | codepoint. If a flow does not start with the FNE codepoint, a | |||
| monitor is likely to treat it unfavourably. This risk makes it worth | monitor is likely to treat it unfavourably. This risk makes it worth | |||
| setting the FNE codepoint at the start of a flow, even though there | setting the FNE codepoint at the start of a flow, even though there | |||
| is a cost to setting FNE (positive `worth'). | is a cost to setting FNE (positive `worth'). | |||
| Starting flows with an FNE packet also means that a monitor will be | Starting flows with an FNE packet also means that a monitor will be | |||
| skipping to change at page 34, line 9 | skipping to change at page 37, line 31 | |||
| across flows, a monitor MUST ignore packets with the FNE codepoint | across flows, a monitor MUST ignore packets with the FNE codepoint | |||
| set. An ingress gateway sets the FNE codepoint when it does not have | set. An ingress gateway sets the FNE codepoint when it does not have | |||
| the benefit of feedback from the egress. So counting packets with | the benefit of feedback from the egress. So counting packets with | |||
| FNE cleared would be likely to make the average unnecessarily | FNE cleared would be likely to make the average unnecessarily | |||
| positive, providing headroom (or should we say footroom?) for | positive, providing headroom (or should we say footroom?) for | |||
| dishonest (negative) traffic. | dishonest (negative) traffic. | |||
| If the monitor detects a persistently negative flow, it could drop | If the monitor detects a persistently negative flow, it could drop | |||
| sufficient negative and neutral packets to force the flow to not be | sufficient negative and neutral packets to force the flow to not be | |||
| negative. This is the approach taken for the `egress dropper' in | negative. This is the approach taken for the `egress dropper' in | |||
| [Re-TCP], but for the scenario in this memo, where everyone would | [I-D.briscoe-tsvwg-re-ecn-tcp], but for the scenario in this memo, | |||
| expect everyone else to keep to the protocol, a management alarm | where everyone would expect everyone else to keep to the protocol, a | |||
| SHOULD be raised on detecting persistently negative traffic and any | management alarm SHOULD be raised on detecting persistently negative | |||
| automatic sanctions taken SHOULD be logged. Even if the chosen | traffic and any automatic sanctions taken SHOULD be logged. Even if | |||
| policy is to take no automatic action, the cause can then be | the chosen policy is to take no automatic action, the cause can then | |||
| investigated manually. | be investigated manually. | |||
| Then no ingress can understate downstream pre-congestion | Then no ingress can understate downstream pre-congestion | |||
| without their action being logged. So network operators can deal | without their action being logged. So network operators can deal | |||
| with offending networks at the human level, out of band. As a last | with offending networks at the human level, out of band. As a last | |||
| resort, perhaps where the ingress gateway address seems to have been | resort, perhaps where the ingress gateway address seems to have been | |||
| spoofed in the signalling, packets can be dropped. Drops could be | spoofed in the signalling, packets can be dropped. Drops could be | |||
| focused on just sufficient packets in misbehaving flows to remove the | focused on just sufficient packets in misbehaving flows to remove the | |||
| negative bias while doing minimal harm. | negative bias while doing minimal harm. | |||
| A future version of this memo may define a control message that could | A future version of this memo may define a control message that could | |||
| skipping to change at page 34, line 43 | skipping to change at page 38, line 17 | |||
| traffic caused sufficient congestion to lead to drop but they | traffic caused sufficient congestion to lead to drop but they | |||
| understated path congestion to avoid penalties for causing high | understated path congestion to avoid penalties for causing high | |||
| congestion, the preferential drop recommendations in Section 4.3.4 | congestion, the preferential drop recommendations in Section 4.3.4 | |||
| would at least ensure that these flows would always be dropped before | would at least ensure that these flows would always be dropped before | |||
| honest flows. | honest flows. | |||
| 5.6. Border Mechanisms | 5.6. Border Mechanisms | |||
| 5.6.1. Border Accounting Mechanisms | 5.6.1. Border Accounting Mechanisms | |||
| One of the main design goals of re-ECN was for border security | One of the main design goals of re-PCN was for border security | |||
| mechanisms to be as simple as possible, otherwise they would become | mechanisms to be as simple as possible, otherwise they would become | |||
| the pinch-points that limit scalability of the whole internetwork. | the pinch-points that limit scalability of the whole internetwork. | |||
| As the title of this memo suggests, we want to avoid per-flow | As the title of this memo suggests, we want to avoid per-flow | |||
| processing at borders. We also want to keep to passive mechanisms | processing at borders. We also want to keep to passive mechanisms | |||
| that can monitor traffic in parallel to forwarding, rather than | that can monitor traffic in parallel to forwarding, rather than | |||
| having to filter traffic inline--in series with forwarding. As data | having to filter traffic inline--in series with forwarding. As data | |||
| rates continue to rise, we suspect that all-optical interconnection | rates continue to rise, we suspect that all-optical interconnection | |||
| between networks will soon be a requirement. So we want to avoid any | between networks will soon be a requirement. So we want to avoid any | |||
| new need for buffering (even though border filtering is current | new need for buffering (even though border filtering is current | |||
| practice for other reasons, we don't want to make it even less likely | practice for other reasons, we don't want to make it even less likely | |||
| that we will ever get rid of it). | that we will ever get rid of it). | |||
| So far, we have been able to keep the border mechanisms simple, | So far, we have been able to keep the border mechanisms simple, | |||
| despite having had to harden them against some subtle attacks on the | despite having had to harden them against some subtle attacks on the | |||
| re-ECN design. The mechanisms are still passive and avoid per-flow | re-PCN design. The mechanisms are still passive and avoid per-flow | |||
| processing, although we do use filtering as a fail-safe to | processing, although we do use filtering as a fail-safe to | |||
| temporarily shield against extreme events in other networks, such as | temporarily shield against extreme events in other networks, such as | |||
| accidental misconfigurations (Section 5.6.3). | accidental misconfigurations (Section 5.6.3). | |||
| The basic accounting mechanism at each border interface simply | The basic accounting mechanism at each border interface simply | |||
| involves accumulating the volume of packets with positive worth (Re- | involves accumulating the volume of packets with positive worth (Re- | |||
| Echo and FNE), and subtracting the volume of those with negative | PCT-Echo and FNE), and subtracting the volume of those with negative | |||
| worth: AM(-1) and PM(-1). Even though this mechanism takes no regard | worth: AM(-1) and TM(-1). Even though this mechanism takes no regard | |||
| of flows, over an accounting period (say a month) this subtraction | of flows, over an accounting period (say a month) this subtraction | |||
| will account for the downstream congestion caused by all the flows | will account for the downstream congestion caused by all the flows | |||
| traversing the interface, wherever they come from, and wherever they | traversing the interface, wherever they come from, and wherever they | |||
| go to. The two networks can agree to use this metric however they | go to. The two networks can agree to use this metric however they | |||
| wish to determine some congestion-related penalty against the | wish to determine some congestion-related penalty against the | |||
| upstream network (see Section 5.3 for examples). Although the | upstream network (see Section 5.3 for examples). Although the | |||
| algorithm could hardly be simpler, it is spelled out using pseudo- | algorithm could hardly be simpler, it is spelled out using pseudo- | |||
| code in Appendix A.2.1. | code in Appendix A.2.1. | |||
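The bulk accounting step described above might look like the following sketch. It is not the pseudo-code of Appendix A.2.1. The Packet type, the BorderMeter class and the use of a signed worth value (+1, 0 or -1) in place of the actual codepoints are all hypothetical simplifications.

```python
from collections import namedtuple

# Hypothetical packet record: size in octets and worth (+1, 0 or -1).
# Positive worth stands for the re-echo and FNE markings, negative worth
# for the congestion markings named in the text.
Packet = namedtuple("Packet", ["size", "worth"])


class BorderMeter:
    """Passive bulk meter for one border interface; keeps no per-flow state."""

    def __init__(self):
        self.positive_octets = 0
        self.negative_octets = 0

    def observe(self, pkt):
        """Add the packet's size to the counter matching its worth."""
        if pkt.worth > 0:
            self.positive_octets += pkt.size
        elif pkt.worth < 0:
            self.negative_octets += pkt.size
        # Neutral packets (worth 0) do not affect the metric.

    def downstream_congestion(self):
        """Positive-worth volume minus negative-worth volume, in octets."""
        return self.positive_octets - self.negative_octets


# Usage: four packets from arbitrary flows crossing the same interface.
meter = BorderMeter()
for pkt in [Packet(1500, +1), Packet(1500, 0), Packet(500, -1), Packet(40, +1)]:
    meter.observe(pkt)
print(meter.downstream_congestion())
```

Because the meter only increments two counters, it can run in parallel with forwarding and needs no knowledge of which flow a packet belongs to.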
| Various attempts to subvert the re-ECN design have been made. In all | Various attempts to subvert the re-ECN design have been made. In all | |||
| skipping to change at page 36, line 22 | skipping to change at page 39, line 42 | |||
| o A network can simply create its own dummy traffic to congest | o A network can simply create its own dummy traffic to congest | |||
| another network, perhaps causing it to lose business at no cost to | another network, perhaps causing it to lose business at no cost to | |||
| the attacking network. This is a form of denial of service | the attacking network. This is a form of denial of service | |||
| perpetrated by one network on another. The preferential drop | perpetrated by one network on another. The preferential drop | |||
| measures in Section 4.3.4 provide crude protection against such | measures in Section 4.3.4 provide crude protection against such | |||
| attacks, but we are not overly worried about more accurate | attacks, but we are not overly worried about more accurate | |||
| prevention measures, because it is already possible for networks | prevention measures, because it is already possible for networks | |||
| to DoS other networks on the general Internet, but they generally | to DoS other networks on the general Internet, but they generally | |||
| don't because of the grave consequences of being found out. We | don't because of the grave consequences of being found out. We | |||
| are only concerned if re-ECN increases the motivation for such an | are only concerned if re-PCN increases the motivation for such an | |||
| attack, as in the next example. | attack, as in the next example. | |||
| o A network can just generate negative traffic and send it over its | o A network can just generate negative traffic and send it over its | |||
| border with a neighbour to reduce the overall penalties that it | border with a neighbour to reduce the overall penalties that it | |||
| should pay to that neighbour. It could even initialise the TTL so | should pay to that neighbour. It could even initialise the TTL so | |||
| it expired shortly after entering the neighbouring network, | it expired shortly after entering the neighbouring network, | |||
| reducing the chance of detection further downstream. This attack | reducing the chance of detection further downstream. This attack | |||
| need not be motivated by a desire to deny service and indeed need | need not be motivated by a desire to deny service and indeed need | |||
| not cause denial of service. A network's main motivator would | not cause denial of service. A network's main motivator would | |||
| most likely be to reduce the penalties it pays to a neighbour. | most likely be to reduce the penalties it pays to a neighbour. | |||
| But, the prospect of financial gain might tempt the network into | But, the prospect of financial gain might tempt the network into | |||
| mounting a DoS attack on the other network as well, given the gain | mounting a DoS attack on the other network as well, given the gain | |||
| would offset some of the risk of being detected. | would offset some of the risk of being detected. | |||
| Note that we have not included DoS by Internet hosts in the above | Note that we have not included DoS by Internet hosts in the above | |||
| list of attacks, because we have restricted ourselves to a scenario | list of attacks, because we have restricted ourselves to a scenario | |||
| with edge-to-edge admission control across a Diffserv region. In | with edge-to-edge admission control across a PCN-region. In this | |||
| this case, the edge ingress gateways insulate the Diffserv region | case, the edge ingress gateways insulate the PCN-region from DoS by | |||
| from DoS by Internet hosts. Re-ECN resists more general DoS attacks, | Internet hosts. Re-ECN resists more general DoS attacks, but this is | |||
| but this is discussed in [Re-TCP]. | discussed in [I-D.briscoe-tsvwg-re-ecn-tcp]. | |||
| The first step towards a solution to all these problems with negative | The first step towards a solution to all these problems with negative | |||
| flows is to be able to estimate the contribution they make to | flows is to be able to estimate the contribution they make to | |||
| downstream congestion at a border and to correct the measure | downstream congestion at a border and to correct the measure | |||
| accordingly. Although ideally we want to remove negative flows | accordingly. Although ideally we want to remove negative flows | |||
| themselves, perhaps surprisingly, the most effective first step is to | themselves, perhaps surprisingly, the most effective first step is to | |||
| cancel out the polluting effect negative flows have on the measure of | cancel out the polluting effect negative flows have on the measure of | |||
| downstream congestion at a border. It is more important to get an | downstream congestion at a border. It is more important to get an | |||
| unbiased estimate of their effect, than to try to remove them all. A | unbiased estimate of their effect, than to try to remove them all. A | |||
| suggested algorithm to give an unbiased estimate of the contribution | suggested algorithm to give an unbiased estimate of the contribution | |||
| from negative flows to the downstream congestion measure is given in | from negative flows to the downstream congestion measure is given in | |||
| Appendix A.2.2. | Appendix A.2.2. | |||
| Although making an accurate assessment of the contribution from | Although making an accurate assessment of the contribution from | |||
| negative flows may not be easy, just the single step of neutralising | negative flows may not be easy, just the single step of neutralising | |||
| their polluting effect on congestion metrics removes all the gains | their polluting effect on congestion metrics removes all the gains | |||
| networks could otherwise make from mounting dummy traffic attacks on | networks could otherwise make from mounting dummy traffic attacks on | |||
| each other. This puts all networks on the same side (only with | each other. This puts all networks on the same side (only with | |||
| respect to negative flows of course), rather than being pitched | respect to negative flows of course), rather than being pitched | |||
| against each other. The network where this flow goes negative as | against each other. The network where a flow goes negative as well | |||
| well as all the networks downstream lose out from not being | as all the networks downstream lose out from not being reimbursed for | |||
| reimbursed for any congestion this flow causes. So they all have an | any congestion this flow causes. So they all have an interest in | |||
| interest in getting rid of these negative flows. Networks forwarding | getting rid of these negative flows. Networks forwarding a flow | |||
| a flow before it goes negative aren't strictly on the same side, but | before it goes negative aren't strictly on the same side, but they | |||
| they are disinterested bystanders--they don't care that the flow goes | are disinterested bystanders--they don't care that the flow goes | |||
| negative downstream, but at least they can't actively gain from | negative downstream, but at least they can't actively gain from | |||
| making it go negative. The problem becomes localised so that once a | making it go negative. The problem becomes localised so that once a | |||
| flow goes negative, all the networks from where it happens and beyond | flow goes negative, all the networks from where it happens and beyond | |||
| downstream each have a small problem, each can detect it has a | downstream each have a small problem, each can detect it has a | |||
| problem and each can get rid of the problem if it chooses to. But | problem and each can get rid of the problem if it chooses to. But | |||
| negative flows can no longer be used for any new attacks. | negative flows can no longer be used for any new attacks. | |||
| Once an unbiased estimate of the effect of negative flows can be | Once an unbiased estimate of the effect of negative flows can be | |||
| made, the problem reduces to detecting and preferably removing flows | made, the problem reduces to detecting and preferably removing flows | |||
| that have gone negative as soon as possible. But importantly, | that have gone negative as soon as possible. But importantly, | |||
| skipping to change at page 37, line 48 | skipping to change at page 41, line 21 | |||
| For instance, if possible, flows should be removed as soon as they go | For instance, if possible, flows should be removed as soon as they go | |||
| negative, but we do NOT RECOMMEND any attempts to discard such flows | negative, but we do NOT RECOMMEND any attempts to discard such flows | |||
| further upstream while they are still positive. Such over-zealous | further upstream while they are still positive. Such over-zealous | |||
| push-back is unnecessary and potentially dangerous. These flows have | push-back is unnecessary and potentially dangerous. These flows have | |||
| paid their `fare' up to the point they go negative, so there is no | paid their `fare' up to the point they go negative, so there is no | |||
| harm in delivering them that far. If someone downstream asks for a | harm in delivering them that far. If someone downstream asks for a | |||
| flow to be dropped as near to the source as possible, because they | flow to be dropped as near to the source as possible, because they | |||
| say it is going to become negative later, an upstream node cannot | say it is going to become negative later, an upstream node cannot | |||
| test the truth of this assertion. Rather than have to authenticate | test the truth of this assertion. Rather than have to authenticate | |||
| such messages, re-ECN has been designed so that flows can be dropped | such messages, re-PCN has been designed so that flows can be dropped | |||
| solely based on locally measurable evidence. A message hinting that | solely based on locally measurable evidence. A message hinting that | |||
| a flow should be watched closely to test for negativity is fine. But | a flow should be watched closely to test for negativity is fine. But | |||
| not a message that claims that a positive flow will go negative | not a message that claims that a positive flow will go negative | |||
| later, so it should be dropped. . | later, so it should be dropped. | |||
| 5.6.2. Competitive Routing | 5.6.2. Competitive Routing | |||
| With the above penalty system, each domain seems to have a perverse | With the above penalty system, each domain seems to have a perverse | |||
| incentive to fake pre-congestion. For instance domain B profits from | incentive to fake pre-congestion. For instance domain 'B' profits | |||
| the difference between penalties it receives at its ingress (its | from the difference between penalties it receives at its ingress (its | |||
| revenue) and those it pays at its egress (its cost). So if B | revenue) and those it pays at its egress (its cost). So if 'B' | |||
| overstates internal pre-congestion it seems to increase its profit. | overstates internal pre-congestion it seems to increase its profit. | |||
| However, we can assume that domain A could bypass B, routing through | However, we can assume that domain 'A' could bypass 'B', routing | |||
| other domains to reach the egress. So the competitive discipline of | through other domains to reach the egress. So the competitive | |||
| least-cost routing can ensure that any domain tempted to fake pre- | discipline of least-cost routing can ensure that any domain tempted | |||
| congestion for profit risks losing _all_ its incoming traffic. The | to fake pre-congestion for profit risks losing _all_ its incoming | |||
| least congested route would eventually be able to win this | traffic. The least congested route would eventually be able to win | |||
| competitive game, only as long as it didn't declare more fake pre- | this competitive game, only as long as it didn't declare more fake | |||
| congestion than the next most competitive route. | pre-congestion than the next most competitive route. | |||
| The competitive effect of interdomain routing might be weaker nearer | The competitive effect of interdomain routing might be weaker nearer | |||
| to the egress. For instance, C may be the only route B can take to | to the egress. For instance, 'C' may be the only route 'B' can take | |||
| reach the ultimate receiver. And if C over-penalises B, the egress | to reach the ultimate receiver. And if 'C' over-penalises 'B', the | |||
| gateway and the ultimate receiver seem to have no incentive to move | egress gateway and the ultimate receiver seem to have no incentive to | |||
| their terminating attachment to another network, because only B and | move their terminating attachment to another network, because only | |||
| those upstream of B suffer the higher penalties. However, we must | 'B' and those upstream of 'B' suffer the higher penalties. However, | |||
| remember that we are only looking at the money flows at the | we must remember that we are only looking at the money flows at the | |||
| unidirectional network layer. There are likely to be all sorts of | unidirectional network layer. There are likely to be all sorts of | |||
| higher level business models constructed over the top of these low | higher level business models constructed over the top of these low | |||
| level 'sender-pays' penalties. For instance, we might expect a | level 'sender-pays' penalties. For instance, we might expect a | |||
| session layer charging model where the session originator pays for a | session layer charging model where the session originator pays for a | |||
| pair of duplex flows, one as receiver and one as sender. | pair of duplex flows, one as receiver and one as sender. | |||
| Traditionally this has been a common model for telephony and we might | Traditionally this has been a common model for telephony and we might | |||
| expect it to be used, at least sometimes, for other media such as | expect it to be used, at least sometimes, for other media such as | |||
| video. Wherever such a model is used, the data receiver will be | video. Wherever such a model is used, the data receiver will be | |||
| directly affected if its sessions terminate through a network like C | directly affected if its sessions terminate through a network like | |||
| that fakes congestion to over-penalise B. So end-customers will | 'C' that fakes congestion to over-penalise 'B'. So end-customers | |||
| experience a direct competitive pressure to switch to cheaper | will experience a direct competitive pressure to switch to cheaper | |||
| networks, away from networks like C that try to over-penalise B. | networks, away from networks like 'C' that try to over-penalise 'B'. | |||
| This memo does not need to standardise any particular mechanism for | This memo does not need to standardise any particular mechanism for | |||
| routing based on re-ECN. Goldenberg et al [Smart_rtg] refers to | routing based on re-PCN. Goldenberg et al [Smart_rtg] refers to | |||
| various commercial products and presents its own algorithms for | various commercial products and presents its own algorithms for | |||
| moving traffic between multi-homed routes based on usage charges. | moving traffic between multi-homed routes based on usage charges. | |||
| None of these systems require any changes to standard protocols | None of these systems require any changes to standard protocols | |||
| because the choice between the available border gateway protocol | because the choice between the available border gateway protocol | |||
| (BGP) routes is based on a combination of local knowledge of the | (BGP) routes is based on a combination of local knowledge of the | |||
| charging regime and local measurement of traffic levels. If, as we | charging regime and local measurement of traffic levels. If, as we | |||
| propose, charges or penalties were based on the level of re-ECN | propose, charges or penalties were based on the level of re-PCN | |||
| measured in passing traffic, a similar optimisation could be achieved | measured locally in passing traffic, a similar optimisation could be | |||
| without requiring any changes to standard routing protocols. | achieved without requiring any changes to standard routing protocols. | |||
| We must be clear that applying pre-congestion-based routing to this | We must be clear that applying pre-congestion-based routing to this | |||
| admission control system remains an open research issue. Traffic | admission control system remains an open research issue. Traffic | |||
| engineering based on congestion requires careful damping to avoid | engineering based on congestion requires careful damping to avoid | |||
| oscillations, and should not be attempted without adult supervision | oscillations, and should not be attempted without adult supervision | |||
| :) Mortier & Pratt [ECN-BGP] have analysed traffic engineering based | :) Mortier & Pratt [ECN-BGP] have analysed traffic engineering based | |||
| on congestion. But without the benefit of re-ECN, they had to add a | on congestion. But without the benefit of re-ECN or re-PCN, they had | |||
| path attribute to BGP to advertise a route's downstream congestion | to add a path attribute to BGP to advertise a route's downstream | |||
| (actually they proposed that BGP should advertise the charge for | congestion (actually they proposed that BGP should advertise the | |||
| congestion, which we believe wrongly embeds an assumption into BGP | charge for congestion, which we believe wrongly embeds an assumption | |||
| that the only thing to do with congestion is charge for it). | into BGP that the only thing to do with congestion is charge for it). | |||
| 5.6.3. Fail-safes | 5.6.3. Fail-safes | |||
| The mechanisms described so far create incentives for rational | The mechanisms described so far create incentives for rational | |||
| operators to behave. That is, one operator aims to make another | operators to behave. That is, one operator aims to make another | |||
| behave responsibly by applying penalties and expects a rational | behave responsibly by applying penalties and expects a rational | |||
| response (i.e. one that trades off costs against benefits). It is | response (i.e. one that trades off costs against benefits). It is | |||
| usually reasonable to assume that other network operators will behave | usually reasonable to assume that other network operators will behave | |||
| rationally (policy routing can avoid those that might not). But this | rationally (policy routing can avoid those that might not). But this | |||
| approach does not protect against the misconfigurations and accidents | approach does not protect against the misconfigurations and accidents | |||
| skipping to change at page 40, line 16 | skipping to change at page 43, line 40 | |||
| 6. Analysis | 6. Analysis | |||
| The domains in Figure 1 are not expected to be completely malicious | The domains in Figure 1 are not expected to be completely malicious | |||
| towards each other. After all, we can assume that they are all co- | towards each other. After all, we can assume that they are all co- | |||
| operating to provide an internetworking service to the benefit of | operating to provide an internetworking service to the benefit of | |||
| each of them and their customers. Otherwise their routing policies | each of them and their customers. Otherwise their routing policies | |||
| would not interconnect them in the first place. However, we assume | would not interconnect them in the first place. However, we assume | |||
| that they are also competitors of each other. So a network may try | that they are also competitors of each other. So a network may try | |||
| to contravene our proposed protocol if it would gain or make a | to contravene our proposed protocol if it would gain or make a | |||
| competitor lose, or both, but only if it can do so without being | competitor lose, or both. But only if it can do so without being | |||
| caught. Therefore we do not have to consider every possible random | caught. Therefore we do not have to consider every possible random | |||
| attack one network could launch on the traffic of another, given that | attack one network could launch on the traffic of another, given that | |||
| one network can always drop or corrupt packets that it | one network can always drop or corrupt packets that it | |||
| forwards on behalf of another. | forwards on behalf of another. | |||
| Therefore, we only consider new opportunities for _gainful_ attack | Therefore, we only consider new opportunities for _gainful_ attack | |||
| that our proposal introduces. But to a certain extent we can also | that our proposal introduces. But to a certain extent we can also | |||
| rely on the in-depth defences we have described (Section 5.6.3) | rely on the in-depth defences we have described (Section 5.6.3) | |||
| intended to mitigate the potential impact of one network accidentally | intended to mitigate the potential impact of one network accidentally | |||
| misconfiguring the workings of this protocol. | misconfiguring the workings of this protocol. | |||
| skipping to change at page 40, line 39 | skipping to change at page 44, line 16 | |||
| arrangement possible in Figure 1, without any surrounding network. | arrangement possible in Figure 1, without any surrounding network. | |||
| This allows us to consider more specific cases where these gateways | This allows us to consider more specific cases where these gateways | |||
| and a neighbouring network are operated by the same player. As well | and a neighbouring network are operated by the same player. As well | |||
| as cases where the same player operates neighbouring networks, we | as cases where the same player operates neighbouring networks, we | |||
| will also consider cases where the two gateways collude as one player | will also consider cases where the two gateways collude as one player | |||
| and where the sender and receiver collude as one. Collusion of other | and where the sender and receiver collude as one. Collusion of other | |||
| sets of domains is less likely, but we will consider such cases. In | sets of domains is less likely, but we will consider such cases. In | |||
| the general case, we will assume none of the nine trust domains | the general case, we will assume none of the nine trust domains | |||
| across the figure fully trust any of the others. | across the figure fully trust any of the others. | |||
| As we only propose to change routers within the Diffserv region, we | As we only propose to change routers within the PCN-region, we assume | |||
| assume the operators of networks outside the region will be doing | the operators of networks outside the region will be doing per-flow | |||
| per-flow policing. That is, we assume the networks outside the | policing. That is, we assume the networks outside the PCN-region and | |||
| Diffserv region and the gateways around its edges can protect | the gateways around its edges can protect themselves. So given we | |||
| themselves. So given we are proposing to remove flow policing from | are proposing to remove flow policing from some networks, our primary | |||
| some networks, our primary concern must be to protect networks that | concern must be to protect networks that don't do per-flow policing | |||
| don't do per-flow policing (the potential `victims') from those that | (the potential `victims') from those that do (the `enemy'). The | |||
| do (the `enemy'). The ingress and egress gateways are the only way | ingress and egress gateways are the only way the outer enemy can get | |||
| the outer enemy can get at the middle victim, so we can consider the | at the middle victim, so we can consider the gateways as the | |||
| gateways as the representatives of the enemy as far as domains A, B | representatives of the enemy as far as domains 'A', 'B' and 'C' are | |||
| and C are concerned. We will call this trust scenario `edges against | concerned. We will call this trust scenario `edges against middles'. | |||
| middles'. | ||||
| Earlier in this memo, we outlined the classic border rate policing | Earlier in this memo, we outlined the classic border rate policing | |||
| problem (Section 3). It will now be useful to reiterate the | problem (Section 3). It will now be useful to reiterate the | |||
| motivations that are the root cause of the problem. The more | motivations that are the root cause of the problem. The more | |||
| reservations a gateway can allow, the more revenue it receives. The | reservations a gateway can allow, the more revenue it receives. The | |||
| middle networks want the edges to comply with the admission control | middle networks want the edges to comply with the admission control | |||
| protocol when they become so congested that their service to others | protocol when they become so congested that their service to others | |||
| might suffer. The middle networks also want to ensure the edges | might suffer. The middle networks also want to ensure the edges | |||
| cannot steal more service from them than they are entitled to. | cannot steal more service from them than they are entitled to. | |||
| In the context of this `edges against middles' scenario, the re-ECN | In the context of this `edges against middles' scenario, the re-PCN | |||
| protocol has two main effects: | protocol has two main effects: | |||
| o The more pre-congestion there is on a path across the Diffserv | o The more pre-congestion there is on a path across the PCN-region, | |||
| region, the higher the ingress gateway must declare downstream | the higher the ingress gateway must declare downstream pre- | |||
| pre-congestion. | congestion. | |||
| o If the ingress gateway does not declare downstream pre-congestion | o If the ingress gateway does not declare downstream pre-congestion | |||
| high enough on average, it will `hit the ground before the | high enough on average, it will `hit the ground before the | |||
| runway', going negative and triggering sanctions, either directly | runway', going negative and triggering sanctions, either directly | |||
| against the traffic or against the ingress gateway at a management | against the traffic or against the ingress gateway at a management | |||
| level. | level. | |||
| An executive summary of our security analysis can be stated in three | An executive summary of our security analysis can be stated in three | |||
| parts, distinguished by the type of collusion considered. | parts, distinguished by the type of collusion considered. | |||
| Neighbour-only Middle-Middle Collusion: Here there is no collusion | Neighbour-only Middle-Middle Collusion: Here there is no collusion | |||
| or collusion is limited to neighbours in the feedback loop. In | or collusion is limited to neighbours in the feedback loop. In | |||
| other words, two neighbouring networks can be assumed to act as | other words, two neighbouring networks can be assumed to act as | |||
| one. Or the egress gateway might collude with domain C. Or the | one. Or the egress gateway might collude with domain 'C'. Or the | |||
| ingress gateway might collude with domain A. Or ingress and egress | ingress gateway might collude with domain 'A'. Or ingress and | |||
| gateways might collude with each other. | egress gateways might collude with each other. | |||
| In these cases where only neighbours in the feedback loop collude, | In these cases where only neighbours in the feedback loop collude, | |||
| we conclude that all parties have a positive incentive to declare | we conclude that all parties have a positive incentive to declare | |||
| downstream pre-congestion truthfully, and the ingress gateway has | downstream pre-congestion truthfully, and the ingress gateway has | |||
| a positive incentive to invoke admission control when congestion | a positive incentive to invoke admission control when congestion | |||
| rises above the admission threshold in any network in the region | rises above the admission threshold in any network in the region | |||
| (including its own). No party has an incentive to send more | (including its own). No party has an incentive to send more | |||
| traffic than declared in reservation signalling (even though only | traffic than declared in reservation signalling (even though only | |||
| the gateways read this signalling). In short, no party can gain | the gateways read this signalling). In short, no party can gain | |||
| at the expense of another. | at the expense of another. | |||
| Non-neighbour Middle-Middle Collusion: In the case of other forms of | Non-neighbour Middle-Middle Collusion: In the case of other forms of | |||
| collusion between middle networks (e.g. between domain A and C) it | collusion between middle networks (e.g. between domain 'A' and | |||
| would be possible for say A & C to create a tunnel between | 'C') it would be possible for say 'A' & 'C' to create a tunnel | |||
| themselves so that A would gain at the expense of B. But C would | between themselves so that 'A' would gain at the expense of 'B'. | |||
| then lose the gain that A had made. Therefore the value to A & C | But 'C' would then lose the gain that 'A' had made. Therefore the | |||
| of colluding to mount this attack seems questionable. It is made | value to 'A' & 'C' of colluding to mount this attack seems | |||
| more questionable, because the attack can be statistically | questionable. It is made more questionable, because the attack | |||
| detected by B using the second `defence in depth' mechanism | can be statistically detected by 'B' using the second `defence in | |||
| mentioned already. Note that C can defend itself from being | depth' mechanism mentioned already. Note that 'C' can defend | |||
| attacked through a tunnel by treating the tunnel end point as a | itself from being attacked through a tunnel by treating the tunnel | |||
| direct link to a neighbouring network (e.g. as if A were a | end point as a direct link to a neighbouring network (e.g. as if | |||
| neighbour of C, via the tunnel), which falls back to the safety of | 'A' were a neighbour of 'C', via the tunnel), which falls back to | |||
| the neighbour-only scenario. | the safety of the neighbour-only scenario. | |||
| Middle-Edge Collusion: Collusion between networks or gateways within | Middle-Edge Collusion: Collusion between networks or gateways within | |||
| the Diffserv region and networks or users outside the region has | the PCN-region and networks or users outside the region has not | |||
| not yet been fully analysed. The presence of full per-flow | yet been fully analysed. The presence of full per-flow policing | |||
| policing at the ingress gateway seems to make this a less likely | at the ingress gateway seems to make this a less likely source of | |||
| source of a successful attack. | a successful attack. | |||
| {ToDo: Due to lack of time, the full write up of the security | {ToDo: Due to lack of time, the full write up of the security | |||
| analysis is deferred to the next version of this memo.} | analysis is deferred to the next version of this memo.} | |||
| Finally, it is well known that the best person to analyse the | Finally, it is well known that the best person to analyse the | |||
| security of a system is not the designer. Therefore, our confident | security of a system is not the designer. Therefore, our confident | |||
| claims must be hedged with doubt until others with perhaps a greater | claims must be hedged with doubt until others with perhaps a greater | |||
| incentive to break it have mounted a full analysis. | incentive to break it have mounted a full analysis. | |||
| 7. Incremental Deployment | 7. Incremental Deployment | |||
| We believe ECN has so far not been widely deployed because it | We believe ECN has so far not been widely deployed because it | |||
| requires widespread end system and network deployment just to achieve | requires end system and widespread network deployment just to achieve | |||
| a marginal improvement in performance. The ability to offer a new | a marginal improvement in performance. The ability to offer a new | |||
| service (admission control) would be a much stronger driver for ECN | service (admission control) would be a much stronger driver for ECN | |||
| deployment. | deployment. | |||
| As stated in the introduction, the aim of this memo is to "Design in | As stated in the introduction, the aim of this memo is to "Design in | |||
| security from the start" when admission control is based on pre- | security from the start" when admission control is based on pre- | |||
| congestion notification. The proposal has been designed so that | congestion notification. The proposal has been designed so that | |||
| security can be added some time after first deployment, but only if | security can be added some time after first deployment, but only if | |||
| the PCN wire protocol encoding is defined with the foresight to | the PCN wire protocol encoding is defined with the foresight to | |||
| accommodate the extended set of codepoints defined in this document. | accommodate the extended set of codepoints defined in this document. | |||
| Given admission control based on pre-congestion notification requires | Given admission control based on pre-congestion notification requires | |||
| few changes to standards, it should be deployable fairly soon. | few changes to standards, it should be deployable fairly soon. | |||
| However, re-ECN requires a change to IP, which may take a little | However, re-PCN requires a change to IP, which may take a little | |||
| longer. | longer :) | |||
| We expect that initial deployments of PCN-based admission control | We expect that initial deployments of PCN-based admission control | |||
| will be confined to single networks, or to clubs of networks that | will be confined to single networks, or to clubs of networks that | |||
| trust each other. The proposal in this memo will only become | trust each other. The proposal in this memo will only become | |||
| relevant once networks with conflicting interests wish to | relevant once networks with conflicting interests wish to | |||
| interconnect their admission controlled services, but without the | interconnect their admission controlled services, but without the | |||
| scalability constraints of per-flow border policing. It will not be | scalability constraints of per-flow border policing. It will not be | |||
| possible to use re-ECN, even in a controlled environment between | possible to use re-PCN, even in a controlled environment between | |||
| consenting operators, unless it is standardised into IP. Given the | consenting operators, unless it is standardised into IP. Given the | |||
| IPv4 header has limited space for further changes, current IESG | IPv4 header has limited space for further changes, current IESG | |||
| policy [RFC4727] is not to allow experimental use of codepoints in | policy [RFC4727] is not to allow experimental use of codepoints in | |||
| the IPv4 header, as whenever an experiment isn't taken up, the space | the IPv4 header, as whenever an experiment isn't taken up, the space | |||
| it used tends to be impossible to reclaim. | it used tends to be impossible to reclaim. Therefore, for IPv4 at | |||
| least, we will need to find a way to run an experiment so that the | ||||
| header fields it uses can be reclaimed if the experiment is not a | ||||
| success. | ||||
| If PCN-based admission control is deployed before re-ECN is | If PCN-based admission control is deployed before re-PCN is | |||
| standardised into IP, wherever a networks (or club of networks) | standardised into IP, wherever a network (or club of networks) | |||
| connects to another network (or club of networks) with conflicting | connects to another network (or club of networks) with conflicting | |||
| interests, they will place a gateway between the two regions that | interests, they will place a gateway between the two regions that | |||
| does per-flow rate policing and admission control. If re-ECN is | does per-flow rate policing and admission control. If re-PCN is | |||
| eventually standardised into IP, it will be possible for these | eventually standardised into IP, it will be possible for these | |||
| separate regions to upgrade all their gateways to use re-ECN before | separate regions to upgrade all their ingress gateways to support re- | |||
| removing the per-flow policing gateways between them. Given the | PCN before removing the per-flow policing gateways between them. | |||
| edge-to-edge deployment model of PCN-based admission control, it is | Given the edge-to-edge deployment model of PCN-based admission | |||
| reasonable to imagine this incremental deployment model without | control, it is reasonable to expect incremental deployment of re-PCN | |||
| needing to cater for partial deployment of re-ECN in just some of the | will be feasible on a domain-by-domain basis, without needing to | |||
| gateways around one Diffserv region. | cater for partial deployment of re-PCN in just some of the gateways | |||
| around one PCN-domain. | ||||
| Only the edge gateways around a Diffserv region have to be upgraded | Nonetheless, if the upgrade of one ingress gateway is accidentally | |||
| to add re-ECN support, not interior routers. It is also necessary to | overlooked, the RE flag has been defined the safe way round for the | |||
| add the mechanisms that use re-ECN to secure a network against | default legacy behaviour (leaving RE cleared as "0"). A legacy | |||
| misbehaving gateways and networks. Specifically, these are the | ingress will appear to be declaring a high level of pre-congestion | |||
| border mechanisms (Section 5.6) and the mechanisms to sanction | into the aggregate. The fail-safe border mechanism in Section 5.6.3 | |||
| dishonest marking (Section 5.5). | might trigger management alarms (which would help in tracking down | |||
| the need to upgrade the ingress), but all packets would continue to | ||||
| be delivered safely, as overstatement of downstream congestion | ||||
| requires no sanction. | ||||
| Only the ingress edge gateways around a PCN-region have to be | ||||
| upgraded to add re-PCN support, not interior routers. It is also | ||||
| necessary to add the mechanisms that monitor re-PCN to secure a | ||||
| network against misbehaving gateways and networks. Specifically, | ||||
| these are the border mechanisms (Section 5.6) and the mechanisms to | ||||
| sanction dishonest marking (Section 5.5). | ||||
| We also RECOMMEND adding improvements to forwarding on interior | We also RECOMMEND adding improvements to forwarding on interior | |||
| routers (Section 4.3.4). But the system works whether all, some or | routers (Section 4.3.4). But the system works whether all, some or | |||
| none are upgraded, so interior routers may be upgraded in a piecemeal | none are upgraded, so interior routers may be upgraded in a piecemeal | |||
| fashion at any time. | fashion at any time. | |||
| 8. Design Choices and Rationale | 8. Design Choices and Rationale | |||
| The primary insight of this work is that downstream congestion is the | The primary insight of this work is that downstream congestion is the | |||
| metric that would be most useful to control an internetwork, and | metric that would be most useful to control an internetwork, and | |||
| particularly to police how one network responds to the congestion it | particularly to police how one network responds to the congestion it | |||
| causes in a remote network. This is the problem that has previously | causes in a remote network. This is the problem that has previously | |||
| made it so hard to provide scalable admission control. | made it so hard to provide scalable admission control. | |||
| The case for using re-feedback (a generalisation of re-ECN) to police | The case for using re-feedback (a generalisation of re-ECN) to police | |||
| congestion response and provide QoS is made in [Re-fb]. Essentially, | congestion response and provide QoS is made in [Re-fb]. Essentially, | |||
| the insight is that congestion is a factor that crosses layers from | the insight is that congestion is a factor that crosses layers from | |||
| the physical upwards. Therefore re-feedback polices congestion where | the physical upwards. Therefore re-feedback polices congestion as it | |||
| it emerges from a physical interface between networks. This is | crosses the physical interface between networks. This is achieved by | |||
| achieved by bringing the congestion information to the interface, | bringing information about congestion of resources later on the path | |||
| rather than examining packet addressing where there is congestion. | to the interface, rather than trying to deal with congestion where it | |||
| happens by examining the notoriously unreliable source address in | ||||
| Then congestion crossing the physical interface at a border can be | packets. Then congestion crossing the physical interface at a border | |||
| policed at the interface, rather than policing the congestion on | can be policed at the interface, rather than policing the congestion | |||
| packets that claim to come from an address (which may be spoofed). | on packets that claim to come from an address (which may be spoofed). | |||
| Also, re-feedback works in the network layer independently of other | Also, re-feedback works in the network layer independently of other | |||
| layers--despite its name re-feedback does not actually require | layers--despite its name re-feedback does not actually require | |||
| feedback. It requires a source to act conservatively before it gets | feedback. It makes a source act conservatively before it gets | |||
| feedback. | feedback. | |||
| On the subject of lack of feedback, the feedback not established | On the subject of lack of feedback, the feedback not established | |||
| (FNE) codepoint is motivated by arguments for a state set-up bit in | (FNE) codepoint is motivated by arguments for a state set-up bit in | |||
| IP to prevent state exhaustion attacks. This idea was first put | IP to prevent state exhaustion attacks. This idea was first put | |||
| forward informally by David Clark and documented by Handley and | forward informally by David Clark and developed by Handley and | |||
| Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams | Greenhalgh in [Steps_DoS]. The idea is that network layer datagrams | |||
| should signal explicitly when they require state to be created in the | should signal explicitly when they require state to be created in the | |||
| network layer or the layer above (e.g. at flow start). Then a node | network layer or the layer above (e.g. at flow start). Then a node | |||
| can refuse to create any state unless a datagram declares this | can refuse to create any state unless a datagram declares this | |||
| intent. We believe the proposed FNE codepoint serves the same | intent. We believe the proposed FNE codepoint serves the same | |||
| purpose as the proposed state-set-up bit, but it has been overloaded | purpose as the proposed state set-up bit, but it has been overloaded | |||
| with a more specific purpose, using it on more packets than just the | with a more specific purpose, using it on more packets than just the | |||
| first in a flow, but never less (i.e. it is idempotent). In effect | first in a flow, but never less (i.e. it is idempotent). In effect | |||
| the FNE codepoint serves the purpose of a `soft-state set-up | the FNE codepoint serves the purpose of a `soft-state set-up | |||
| codepoint'. | codepoint'. | |||
| The re-feedback paper [Re-fb] also makes the case for converting the | The re-feedback paper [Re-fb] also makes the case for converting the | |||
| economic interpretation of congestion into hard engineering | economic interpretation of congestion into hard engineering | |||
| mechanism, which is the basis of the approach used in this memo. The | mechanism, which is the basis of the approach used in this memo. The | |||
| admission control gateways around the Diffserv region use hard | admission control gateways around the PCN-region use hard | |||
| engineering, not incentives, to prevent end users from sending more | engineering, not incentives, to prevent end users from sending more | |||
| traffic than they have reserved. Incentive-based mechanisms are only | traffic than they have reserved. Incentive-based mechanisms are only | |||
| used between networks, because they are expected to respond to | used between networks, because they are expected to respond to | |||
| incentives more rationally than end-users can be expected to. | incentives more rationally than end-users can be expected to. | |||
| However, even then, a network can use fail-safes to protect itself | However, even then, a network can use fail-safes to protect itself | |||
| from excessively unusual behaviour by neighbouring networks, whether | from excessively unusual behaviour by neighbouring networks, whether | |||
| due to an accidental misconfiguration or malicious intent. | due to an accidental misconfiguration or malicious intent. | |||
| The guiding principle behind the incentive-based approach used | The guiding principle behind the incentive-based approach used | |||
| between networks is that any gain from subverting the protocol should | between networks is that any gain from subverting the protocol should | |||
| skipping to change at page 45, line 5 | skipping to change at page 48, line 44 | |||
| will most likely open up a new vulnerability, where the amplifying | will most likely open up a new vulnerability, where the amplifying | |||
| effect of the punishment mechanism can be turned on others. | effect of the punishment mechanism can be turned on others. | |||
| The re-feedback paper also makes the case against the use of | The re-feedback paper also makes the case against the use of | |||
| congestion charging to police congestion if it is based on classic | congestion charging to police congestion if it is based on classic | |||
| feedback (where only upstream congestion is visible to network | feedback (where only upstream congestion is visible to network | |||
| elements). It argues this would open up receiving networks to | elements). It argues this would open up receiving networks to | |||
| `denial of funds' attacks and would require end users to accept | `denial of funds' attacks and would require end users to accept | |||
| dynamic pricing (which few would). | dynamic pricing (which few would). | |||
| Re-ECN has been deliberately designed to simplify policing at the | Re-PCN has been deliberately designed to simplify policing at the | |||
| borders between networks. These trust boundaries are the critical | borders between networks. These trust boundaries are the critical | |||
| pinch-points that will limit the scalability of the whole | pinch-points that will limit the scalability of the whole | |||
| internetwork unless the overall design minimises the complexity of | internetwork unless the overall design minimises the complexity of | |||
| security functions at these borders. The border mechanisms described | security functions at these borders. The border mechanisms described | |||
| in this memo run passively in parallel to data forwarding and they do | in this memo run passively in parallel to data forwarding and they do | |||
| not require per-flow processing. | not require per-flow processing. | |||
| 9. Security Considerations | 9. Security Considerations | |||
| This whole memo concerns the security of a scalable admission control | This whole memo concerns the security of a scalable admission control | |||
| skipping to change at page 45, line 39 | skipping to change at page 49, line 31 | |||
| markings introduced by an upstream network, but it would only lose | markings introduced by an upstream network, but it would only lose | |||
| out on the penalties it could apply to a downstream network. | out on the penalties it could apply to a downstream network. | |||
| When one network forwards a neighbouring network's traffic it will | When one network forwards a neighbouring network's traffic it will | |||
| always be possible to cause damage by dropping or corrupting it. | always be possible to cause damage by dropping or corrupting it. | |||
| Therefore we do not believe networks would set their routing policies | Therefore we do not believe networks would set their routing policies | |||
| to interconnect in the first place if they didn't trust the other | to interconnect in the first place if they didn't trust the other | |||
| networks not to arbitrarily damage their traffic. | networks not to arbitrarily damage their traffic. | |||
| Having said this, we do want to highlight some of the weaker parts of | Having said this, we do want to highlight some of the weaker parts of | |||
| our argument. We have argued that networks will be dissuaded from | our argument. | |||
| faking congestion marking by the possibility that upstream networks | ||||
| will route round them. As we have said, these arguments are based on | o We have argued that networks will be dissuaded from faking | |||
| congestion marking by the possibility that upstream networks will | ||||
| route round them. As we have said, these arguments are based on | ||||
| fairly delicate assumptions and will remain fairly tenuous until | fairly delicate assumptions and will remain fairly tenuous until | |||
| proved in practice, particularly close to the egress where less | proved in practice, particularly close to the egress where less | |||
| competitive routing is likely. | competitive routing is likely. | |||
| We should also point out that the approach in this memo was only | o Given that the congestion feedback system is piggy-backed on flow | |||
| signalling, which can be fairly infrequent, sanctions may not be | ||||
| appropriate until a flow has been persistently negative for | ||||
| perhaps 20s. This may allow brief attacks to go unpunished. | ||||
| However, vulnerability to brief attacks may be reduced if the | ||||
| egress triggers asynchronous feedback when the congestion level on | ||||
| an aggregate has risen sufficiently since the last feedback, | ||||
| rather than waiting for the next opportunity to piggy-back on a | ||||
| signal. | ||||
| o We should also point out that the approach in this memo was only | ||||
| designed to be robust for admission control. We do not claim the | designed to be robust for admission control. We do not claim the | |||
| incentives will always be strong enough to force correct flow pre- | incentives will always be strong enough to force correct flow | |||
| emption behaviour. This is because a user will tend to perceive much | termination behaviour. This is because a user will tend to | |||
| greater loss in value if a flow is pre-empted than if admission is | perceive much greater loss in value if a flow is terminated than | |||
| denied at the start. However, in general the incentives for correct | if admission is denied at the start. However, in general the | |||
| flow pre-emption are similar to those for admission control. | incentives for correct flow termination are similar to those for | |||
| admission control. | ||||
| Finally, it may seem that the 8 codepoints that have been made | Finally, it may seem that the 8 codepoints that have been made | |||
| available by extending the ECN field with the RE flag have been used | available by extending the ECN field with the RE flag have been used | |||
| rather wastefully. In effect the RE flag has been used as an | rather wastefully. In effect the RE flag has been used as an | |||
| orthogonal single bit in nearly all cases. The only exception is | orthogonal single bit in nearly all cases. The only exception is | |||
| when the ECN field is cleared to "00". The mapping of the codepoints | when the ECN field is cleared to "00". The mapping of the codepoints | |||
| in an earlier version of this proposal used the codepoint space more | in an earlier version of this proposal used the codepoint space more | |||
| efficiently, but the scheme became vulnerable to a network operator | efficiently, but the scheme became vulnerable to a network operator | |||
| focusing its congestion marking to mark more positive than neutral | focusing its congestion marking to mark more positive than neutral | |||
| packets in order to reduce its penalties (see Appendix B of | packets in order to reduce its penalties (see Appendix B of | |||
| [Re-TCP]). | [I-D.briscoe-tsvwg-re-ecn-tcp]). | |||
| With the scheme as now proposed, once the RE flag is set or cleared | With the scheme as now proposed, once the RE flag is set or cleared | |||
| by the sender or its proxy, it should not be written by the network, | by the sender or its proxy, it should not be written by the network, | |||
| only read. So the gateways can detect if any network maliciously | only read. So the gateways can detect if any network maliciously | |||
| alters the RE flag. IPSec AH integrity checking does not cover the | alters the RE flag. IPSec AH integrity checking does not cover the | |||
| IPv4 option flags (they were considered mutable--even the one we | IPv4 option flags (they were considered mutable--even the one we | |||
| propose using for the RE flag that was `currently unused' when IPSec | propose using for the RE flag that was `currently unused' when IPSec | |||
| was defined). But it would be sufficient for a pair of gateways to | was defined). But it would be sufficient for a pair of gateways to | |||
| make random checks on whether the RE flag was the same when it | make random checks on whether the RE flag was the same when it | |||
| reached the egress gateway as when it left the ingress. Indeed, if | reached the egress gateway as when it left the ingress. Indeed, if | |||
| IPSec AH had covered the RE flag, any network intending to alter | IPSec AH had covered the RE flag, any network intending to alter | |||
| sufficient RE flags to make a gain would have focused its alterations | sufficient RE flags to make a gain would have focused its alterations | |||
| on packets without authenticating headers (AHs). | on packets without authenticating headers (AHs). | |||
| No cryptographic algorithms have been harmed in the making of this | Therefore, no cryptographic algorithms have been exploited in the | |||
| proposal. | making of this proposal. | |||
| 10. IANA Considerations | 10. IANA Considerations | |||
| This memo includes no request to IANA. | This memo includes no request to IANA. | |||
| 11. Conclusions | 11. Conclusions | |||
| This memo builds on a promising technique to solve the classic | This memo solves the classic problem of making flow admission control | |||
| problem of making flow admission control scale to any size network. | scale to any size network. It builds on a technique, called PCN, | |||
| It involves the use of Diffserv in a deployment model that uses pre- | which involves the use of Diffserv in a domain and uses pre- | |||
| congestion notification feedback to control admission into a network | congestion notification feedback to control admission into each | |||
| path [I-D.ietf-pcn-architecture]. However as it stands, that | network path across the domain [I-D.ietf-pcn-architecture]. | |||
| deployment model depends on all network domains trusting each other | ||||
| to comply with the protocols, invoking admission control and flow | ||||
| pre-emption when requested. | ||||
| We propose that the congestion feedback used in that deployment model | Without PCN, Diffserv requires over-provisioning that must grow | |||
| should be re-echoed into the forward data path, by making a trivial | linearly with network diameter to cater for variation in the traffic | |||
| modification to the ingress gateway. We then explain how the | matrix. However, even with PCN, multiple network domains can only | |||
| join together into one larger PCN region if all domains trust each | ||||
| other to comply with the protocols, invoking admission control and | ||||
| flow termination when requested. Domains could join together and | ||||
| still police flows at their borders by requiring reservation | ||||
| signalling to touch each border and only use PCN internally to each | ||||
| domain. But the per-flow processing at borders would still limit | ||||
| scalability. | ||||
| Instead, this memo proposes a technique called re-PCN which enables a | ||||
| PCN region to extend across multiple domains, without unscalable per- | ||||
| flow processing at borders, and still without the need for linear | ||||
| growth in capacity over-provisioning as the hop-diameter of the | ||||
| Diffserv region grows. | ||||
| We propose that the congestion feedback used for PCN-based admission | ||||
| control should be re-echoed into the forward data path, by making a | ||||
| trivial modification to the ingress gateway. We then explain how the | ||||
| resulting downstream pre-congestion metric in packets can be | resulting downstream pre-congestion metric in packets can be | |||
| monitored in bulk at borders to sufficiently emulate flow rate | monitored in bulk at borders to sufficiently emulate flow rate | |||
| policing. | policing. | |||
| We claim the result of combining these two approaches is an admission | We claim the result of combining these two approaches is an admission | |||
| control system that scales to any size network _and_ any number of | control system that scales to any size network _and_ any number of | |||
| interconnected networks, even if they all act in their own interests. | interconnected networks, even if they all act in their own interests. | |||
| This proposal aims to convince its readers to "Design in Security | This proposal aims to convince its readers to "Design in Security | |||
| from the start," by ensuring the PCN wire protocol encoding can | from the start," by ensuring the PCN wire protocol encoding can | |||
| accommodate the extended set of codepoints defined in this document, | accommodate the extended set of codepoints defined in this document, | |||
| even if border policing is not needed at first. This way, we will | even if per-flow policing is used at first rather than the bulk | |||
| not build ourselves tomorrow's legacy problem. | border policing described here. This way, we will not build | |||
| ourselves tomorrow's legacy problem. | ||||
| Re-echoing congestion feedback is based on a principled technique | Re-echoing congestion feedback is based on a principled technique | |||
| called Re-ECN [Re-TCP], designed to add accountability for causing | called Re-ECN [I-D.briscoe-tsvwg-re-ecn-tcp], designed to add | |||
| congestion to the general-purpose IP datagram service. Re-ECN | accountability for causing congestion to the general-purpose IP | |||
| proposes to consume the last completely unused bit in the basic IPv4 | datagram service. Re-ECN proposes to consume the last completely | |||
| header. | unused bit in the basic IPv4 header, or it uses an extension header in | |||
| IPv6. | ||||
| 12. Acknowledgements | 12. Acknowledgements | |||
| All the following have given helpful comments and some may become co- | All the following have given helpful comments either on re-PCN or on | |||
| authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve | relevant parts of re-ECN that re-PCN uses: Arnaud Jacquet, Alessandro | |||
| Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard, | Salvatori, Steve Rudkin, David Songhurst, John Davey, Ian Self, | |||
| Carla Di Cairano-Gilfedder (BT), Mark Handley (who identified the | Anthony Sheppard, Carla Di Cairano-Gilfedder (BT), Mark Handley (who | |||
| excess canceled packets attack), Stephen Hailes, Adam Greenhalgh | identified the excess canceled packets attack), Stephen Hailes, Adam | |||
| (UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz, | Greenhalgh (UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef | |||
| Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill Lehr, | Babiarz, Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill | |||
| Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy | Lehr, Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy | |||
| traffic attacks), Sally Floyd (ICIR) and comments from participants | traffic attacks), Sally Floyd (ICIR) and comments from participants | |||
| in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant | in the CFP/CRN Inter-Provider QoS, Broadband and DoS-Resistant | |||
| Internet working groups. | Internet working groups. | |||
| 13. Comments Solicited | 13. Comments Solicited | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| addressed to the IETF Congestion and Pre-Congestion Notification | addressed to the IETF Congestion and Pre-Congestion Notification | |||
| working group's mailing list <pcn@ietf.org>, and/or to the author(s). | working group's mailing list <pcn@ietf.org>, and/or to the author(s). | |||
| 14. References | 14. References | |||
| 14.1. Normative References | 14.1. Normative References | |||
| [PCN] Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F., | [I-D.briscoe-tsvwg-ecn-tunnel] | |||
| Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley, | Briscoe, B., "Layered Encapsulation of Congestion | |||
| S., Westberg, L., Bader, A., and G. Karagiannis, "Pre- | Notification", draft-briscoe-tsvwg-ecn-tunnel-01 (work in | |||
| Congestion Notification Marking", | progress), July 2008. | |||
| draft-briscoe-tsvwg-cl-phb-03 (work in progress), | ||||
| October 2006. | [I-D.briscoe-tsvwg-re-ecn-tcp] | |||
| Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | ||||
| "Re-ECN: Adding Accountability for Causing Congestion to | ||||
| TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-06 (work in | ||||
| progress), August 2008. | ||||
| [I-D.eardley-pcn-marking-behaviour] | ||||
| Eardley, P., "Marking behaviour of PCN-nodes", | ||||
| draft-eardley-pcn-marking-behaviour-01 (work in progress), | ||||
| June 2008. | ||||
| [I-D.moncaster-pcn-baseline-encoding] | ||||
| Moncaster, T., Briscoe, B., and M. Menth, "Baseline | ||||
| Encoding and Transport of Pre-Congestion Information", | ||||
| draft-moncaster-pcn-baseline-encoding-02 (work in | ||||
| progress), July 2008. | ||||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC2211] Wroclawski, J., "Specification of the Controlled-Load | [RFC2211] Wroclawski, J., "Specification of the Controlled-Load | |||
| Network Element Service", RFC 2211, September 1997. | Network Element Service", RFC 2211, September 1997. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, September 2001. | RFC 3168, September 2001. | |||
| [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | |||
| J., Courtney, W., Davari, S., Firoiu, V., and D. | J., Courtney, W., Davari, S., Firoiu, V., and D. | |||
| Stiliadis, "An Expedited Forwarding PHB (Per-Hop | Stiliadis, "An Expedited Forwarding PHB (Per-Hop | |||
| Behavior)", RFC 3246, March 2002. | Behavior)", RFC 3246, March 2002. | |||
| [RSVP-ECN] | [RFC4774] Floyd, S., "Specifying Alternate Semantics for the | |||
| Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P., | Explicit Congestion Notification (ECN) Field", BCP 124, | |||
| Babiarz, J., and K. Chan, "RSVP Extensions for Admission | RFC 4774, November 2006. | |||
| Control over Diffserv using Pre-congestion Notification", | ||||
| draft-lefaucheur-rsvp-ecn-01 (work in progress), | ||||
| June 2006. | ||||
| [Re-TCP] Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | ||||
| "Re-ECN: Adding Accountability for Causing Congestion to | ||||
| TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-05 (work in | ||||
| progress), January 2008. | ||||
| 14.2. Informative References | 14.2. Informative References | |||
| [CLoop_pol] | [CLoop_pol] | |||
| Salvatori, A., "Closed Loop Traffic Policing", Politecnico | Salvatori, A., "Closed Loop Traffic Policing", Politecnico | |||
| Torino and Institut Eurecom Masters Thesis, | Torino and Institut Eurecom Masters Thesis, | |||
| September 2005. | September 2005. | |||
| [ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain | [ECN-BGP] Mortier, R. and I. Pratt, "Incentive Based Inter-Domain | |||
| Routeing", Proc Internet Charging and QoS Technology | Routeing", Proc Internet Charging and QoS Technology | |||
| Workshop (ICQT'03) pp308--317, September 2003, <http:// | Workshop (ICQT'03) pp308--317, September 2003, <http:// | |||
| research.microsoft.com/users/mort/publications.aspx>. | research.microsoft.com/users/mort/publications.aspx>. | |||
| [I-D.arumaithurai-nsis-pcn] | [I-D.arumaithurai-nsis-pcn] | |||
| Arumaithurai, M., "NSIS PCN-QoSM: A Quality of Service | Arumaithurai, M., "NSIS PCN-QoSM: A Quality of Service | |||
| Model for Pre-Congestion Notification (PCN)", | Model for Pre-Congestion Notification (PCN)", | |||
| draft-arumaithurai-nsis-pcn-00 (work in progress), | draft-arumaithurai-nsis-pcn-00 (work in progress), | |||
| September 2007. | September 2007. | |||
| [I-D.charny-pcn-single-marking] | ||||
| Charny, A., Zhang, X., Faucheur, F., and V. Liatsos, "Pre- | ||||
| Congestion Notification Using Single Marking for Admission | ||||
| and Termination", draft-charny-pcn-single-marking-03 | ||||
| (work in progress), November 2007. | ||||
| [I-D.ietf-nsis-rmd] | [I-D.ietf-nsis-rmd] | |||
| Bader, A., "RMD-QOSM - The Resource Management in Diffserv | Bader, A., "RMD-QOSM - The Resource Management in Diffserv | |||
| QOS Model", draft-ietf-nsis-rmd-12 (work in progress), | QOS Model", draft-ietf-nsis-rmd-12 (work in progress), | |||
| November 2007. | November 2007. | |||
| [I-D.ietf-pcn-architecture] | [I-D.ietf-pcn-architecture] | |||
| Eardley, P., "Pre-Congestion Notification Architecture", | Eardley, P., "Pre-Congestion Notification (PCN) | |||
| draft-ietf-pcn-architecture-03 (work in progress), | Architecture", draft-ietf-pcn-architecture-06 (work in | |||
| February 2008. | progress), September 2008. | |||
| [I-D.ietf-tsvwg-admitted-realtime-dscp] | ||||
| Baker, F., Polk, J., and M. Dolly, "DSCPs for Capacity- | ||||
| Admitted Traffic", | ||||
| draft-ietf-tsvwg-admitted-realtime-dscp-04 (work in | ||||
| progress), February 2008. | ||||
| [IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP | [IXQoS] Briscoe, B. and S. Rudkin, "Commercial Models for IP | |||
| Quality of Service Interconnect", BT Technology Journal | Quality of Service Interconnect", BT Technology Journal | |||
| (BTTJ) 23(2)171--195, April 2005, | (BTTJ) 23(2)171--195, April 2005, | |||
| <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>. | <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>. | |||
| [QoS_scale] | ||||
| Reid, A., "Economics and Scalability of QoS Solutions", BT | ||||
| Technology Journal (BTTJ) 23(2)97--117, April 2005. | ||||
| [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. | [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. | |||
| Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 | Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 | |||
| Functional Specification", RFC 2205, September 1997. | Functional Specification", RFC 2205, September 1997. | |||
| [RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC | [RFC2207] Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC | |||
| Data Flows", RFC 2207, September 1997. | Data Flows", RFC 2207, September 1997. | |||
| [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, | [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, | |||
| M., Romanow, A., Weinrib, A., and L. Zhang, "Resource | M., Romanow, A., Weinrib, A., and L. Zhang, "Resource | |||
| ReSerVation Protocol (RSVP) Version 1 Applicability | ReSerVation Protocol (RSVP) Version 1 Applicability | |||
| skipping to change at page 49, line 51 | skipping to change at page 54, line 43 | |||
| [RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., | [RFC2998] Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L., | |||
| Speer, M., Braden, R., Davie, B., Wroclawski, J., and E. | Speer, M., Braden, R., Davie, B., Wroclawski, J., and E. | |||
| Felstaine, "A Framework for Integrated Services Operation | Felstaine, "A Framework for Integrated Services Operation | |||
| over Diffserv Networks", RFC 2998, November 2000. | over Diffserv Networks", RFC 2998, November 2000. | |||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
| Congestion Notification (ECN) Signaling with Nonces", | Congestion Notification (ECN) Signaling with Nonces", | |||
| RFC 3540, June 2003. | RFC 3540, June 2003. | |||
| [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | ||||
| Internet Protocol", RFC 4301, December 2005. | ||||
| [RFC4727] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | [RFC4727] Fenner, B., "Experimental Values In IPv4, IPv6, ICMPv4, | |||
| ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006. | ICMPv6, UDP, and TCP Headers", RFC 4727, November 2006. | |||
| [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion | [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion | |||
| Marking in MPLS", RFC 5129, January 2008. | Marking in MPLS", RFC 5129, January 2008. | |||
| [RSVP-ECN] | ||||
| Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P., | ||||
| Babiarz, J., and K. Chan, "RSVP Extensions for Admission | ||||
| Control over Diffserv using Pre-congestion Notification", | ||||
| draft-lefaucheur-rsvp-ecn-01 (work in progress), | ||||
| June 2006. | ||||
| [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | |||
| Salvatori, A., Soppera, A., and M. Koyabe, "Policing | Salvatori, A., Soppera, A., and M. Koyabe, "Policing | |||
| Congestion Response in an Internetwork Using Re-Feedback", | Congestion Response in an Internetwork Using Re-Feedback", | |||
| ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | |||
| www.acm.org/sigs/sigcomm/sigcomm2005/ | www.acm.org/sigs/sigcomm/sigcomm2005/ | |||
| techprog.html#session8>. | techprog.html#session8>. | |||
| [Smart_rtg] | [Smart_rtg] | |||
| Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, | Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, | |||
| "Optimizing Cost and Performance for Multihoming", ACM | "Optimizing Cost and Performance for Multihoming", ACM | |||
| skipping to change at page 50, line 31 | skipping to change at page 55, line 35 | |||
| [Steps_DoS] | [Steps_DoS] | |||
| Handley, M. and A. Greenhalgh, "Steps towards a DoS- | Handley, M. and A. Greenhalgh, "Steps towards a DoS- | |||
| resistant Internet Architecture", Proc. ACM SIGCOMM | resistant Internet Architecture", Proc. ACM SIGCOMM | |||
| workshop on Future directions in network architecture | workshop on Future directions in network architecture | |||
| (FDNA'04) pp 49--56, August 2004. | (FDNA'04) pp 49--56, August 2004. | |||
| Appendix A. Implementation | Appendix A. Implementation | |||
| A.1. Ingress Gateway Algorithm for Blanking the RE flag | A.1. Ingress Gateway Algorithm for Blanking the RE flag | |||
| The ingress gateway receives regular feedback reporting the fraction | The ingress gateway receives regular feedback 'PCN-feedback- | |||
| of congestion marked octets for each aggregate arriving at the | information' reporting the fraction of congestion marked octets for | |||
| egress. So for each aggregate it should blank the RE flag on the | each aggregate arriving at the egress. So for each aggregate it | |||
| same fraction of octets. It is more efficient to calculate the | should blank the RE flag on this fraction of octets. A suitable | |||
| reciprocal of this fraction when the signalling arrives, Z_0 = (1 / | pseudo-code algorithm for the ingress gateway is as follows: | |||
| Congestion-Level-Estimate). Z_0 will be the number of octets of | ||||
| packets the ingress should send with the RE flag set between those it | ||||
| sends with the RE flag blanked. Z_0 will also take account of the | ||||
| sustainable rate reported during the flow pre-emption process, if | ||||
| necessary. | ||||
| A suitable pseudo-code algorithm for the ingress gateway is as | ||||
| follows: | ||||
| ==================================================================== | ==================================================================== | |||
| B_i = 0 /* interblank volume */ | for each PCN-capable-packet { | |||
| for each PCN-capable packet { | if RAND(0,1) <= PCN-feedback-information | |||
| b = readLength(packet) /* set b to packet size */ | writeRE(0); | |||
| B_i += b /* accumulate interblank volume */ | else | |||
| if B_i < b * Z_0 { /* test whether interblank volume... */ | writeRE(1); | |||
| writeRE(1) | ||||
| } else { /* ...exceeds blank RE spacing * pkt size*/ | ||||
| writeRE(0) /* ...and if so, clear RE */ | ||||
| B_i = 0 /* ...and re-set interblank volume */ | ||||
| } | ||||
| } | } | |||
| ==================================================================== | ==================================================================== | |||
| A.2. Downstream Congestion Metering Algorithms | A.2. Downstream Congestion Metering Algorithms | |||
| A.2.1. Bulk Downstream Congestion Metering Algorithm | A.2.1. Bulk Downstream Congestion Metering Algorithm | |||
| To meter the bulk amount of downstream pre-congestion in traffic | To meter the bulk amount of downstream pre-congestion in traffic | |||
| crossing an inter-domain border, an algorithm is needed that | crossing an inter-domain border, an algorithm is needed that | |||
| accumulates the size of positive packets and subtracts the size of | accumulates the size of positive packets and subtracts the size of | |||
| skipping to change at page 51, line 40 | skipping to change at page 56, line 26 | |||
| B: total data volume (in case it is needed) | B: total data volume (in case it is needed) | |||
| A suitable pseudo-code algorithm for a border router is as follows: | A suitable pseudo-code algorithm for a border router is as follows: | |||
| ==================================================================== | ==================================================================== | |||
| V_b = 0 | V_b = 0 | |||
| B = 0 | B = 0 | |||
| for each PCN-capable packet { | for each PCN-capable packet { | |||
| b = readLength(packet) /* set b to packet size */ | b = readLength(packet) /* set b to packet size */ | |||
| B += b /* accumulate total volume */ | B += b /* accumulate total volume */ | |||
| if readEECN(packet) == (Re-Echo || FNE) { | if readEPCN(packet) == (Re-PCT-Echo || FNE) { | |||
| V_b += b /* increment... */ | V_b += b /* increment... */ | |||
| } elseif readEECN(packet) == ( AM(-1) || PM(-1) ) { | } elseif readEPCN(packet) == ( AM(-1) || TM(-1) ) { | |||
| V_b -= b /* ...or decrement V_b... */ | V_b -= b /* ...or decrement V_b... */ | |||
| } /*...depending on EECN field */ | } /*...depending on EPCN field */ | |||
| } | } | |||
| ==================================================================== | ==================================================================== | |||
| At the end of an accounting period this counter V_b represents the | At the end of an accounting period this counter V_b represents the | |||
| pre-congestion volume that penalties could be applied to, as | pre-congestion volume that penalties could be applied to, as | |||
| described in Section 5.3. | described in Section 5.3. | |||
| For instance, accumulated volume of pre-congestion through a border | For instance, accumulated volume of pre-congestion through a border | |||
| interface over a month might be V_b = 5PB (petabyte = 10^15 byte). | interface over a month might be V_b = 5TB (terabyte = 10^12 byte). | |||
| This might have resulted from an average downstream pre-congestion | This might have resulted from an average downstream pre-congestion | |||
| level of 1% on an accumulated total data volume of B = 500PB. | level of 0.001% on an accumulated total data volume of B = 500PB | |||
| (petabyte = 10^15 byte). | ||||
| A.2.2. Inflation Factor for Persistently Negative Flows | A.2.2. Inflation Factor for Persistently Negative Flows | |||
| The following process is suggested to complement the simple algorithm | The following process is suggested to complement the simple algorithm | |||
| above in order to protect against the various attacks from | above in order to protect against the various attacks from | |||
| persistently negative flows described in Section 5.6.1. As explained | persistently negative flows described in Section 5.6.1. As explained | |||
| in that section, the most important and first step is to estimate the | in that section, the most important and first step is to estimate the | |||
| contribution of persistently negative flows to the bulk volume of | contribution of persistently negative flows to the bulk volume of | |||
| downstream pre-congestion and to inflate this bulk volume as if these | downstream pre-congestion and to inflate this bulk volume as if these | |||
| flows weren't there. The process below has been designed to give an | flows weren't there. The process below has been designed to give an | |||
| unbiased estimate, but it may be possible to define other processes | unbiased estimate, but it may be possible to define other processes | |||
| that achieve similar ends. | that achieve similar ends. | |||
| While the above simple metering algorithm is counting the bulk of | While the above simple metering algorithm (Appendix A.2) is counting | |||
| traffic over an accounting period, the meter should also select a | the bulk of traffic over an accounting period, the meter should also | |||
| subset of the whole flow ID space that is small enough to be able to | select a subset of the whole flow ID space that is small enough to be | |||
| realistically measure but large enough to give a realistic sample. | able to realistically measure but large enough to give a realistic | |||
| Many different samples of different subsets of the ID space should be | sample. Many different samples of different subsets of the ID space | |||
| taken at different times during the accounting period, preferably | should be taken at different times during the accounting period, | |||
| covering the whole ID space. During each sample, the meter should | preferably covering the whole ID space. During each sample, the | |||
| count the volume of positive packets and subtract the volume of | meter should count the volume of positive packets and subtract the | |||
| negative, maintaining a separate account for each flow in the sample. | volume of negative, maintaining a separate account for each flow in | |||
| It should run a lot longer than the large majority of flows, to avoid | the sample. It should run a lot longer than the large majority of | |||
| a bias from missing the starts and ends of flows, which tend to be | flows, to avoid a bias from missing the starts and ends of flows, | |||
| positive and negative respectively. | which tend to be positive and negative respectively. | |||
| Once the accounting period finishes, the meter should calculate the | Once the accounting period finishes, the meter should calculate the | |||
| total of the accounts V_{bI} for the subset of flows I in the sample, | total of the accounts V_{bI} for the subset of flows I in the sample, | |||
| and the total of the accounts V_{fI} excluding flows with a negative | and the total of the accounts V_{fI} excluding flows with a negative | |||
| account from the subset I. Then the weighted mean of all these | account from the subset I. Then the weighted mean of all these | |||
| samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | |||
| V_{bI}. | V_{bI}. | |||
| If V_b is the result of the bulk accounting algorithm over the | If V_b is the result of the bulk accounting algorithm over the | |||
| accounting period (Appendix A.2.1) it can be inflated by this factor | accounting period (Appendix A.2.1) it can be inflated by this factor | |||
| a_S to get a good unbiased estimate of the volume of downstream | a_S to get a good unbiased estimate of the volume of downstream | |||
| congestion over the accounting period a_S.V_b, without being polluted | congestion over the accounting period a_S.V_b, without being polluted | |||
| by the effect of persistently negative flows. | by the effect of persistently negative flows. | |||
| A.3. Algorithm for Sanctioning Negative Traffic | A.3. Algorithm for Sanctioning Negative Traffic | |||
| {ToDo: Write up algorithms similar to Appendix D of [Re-TCP] for the | {ToDo: Write up algorithms similar to Appendix E of | |||
| negative flow monitor with flow management algorithm and the variant | [I-D.briscoe-tsvwg-re-ecn-tcp] for the negative flow monitor with | |||
| with bounded flow state.} | flow management algorithm and the variant with bounded flow state.} | |||
| Author's Address | Author's Address | |||
| Bob Briscoe | Bob Briscoe | |||
| BT & UCL | BT & UCL | |||
| B54/77, Adastral Park | B54/77, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| skipping to change at page 54, line 45 | skipping to change at page 59, line 45 | |||
| such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
| specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
| http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
| The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
| copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
| rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
| this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
| ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
| Acknowledgments | Acknowledgment | |||
| Funding for the RFC Editor function is provided by the IETF | This document was produced using xml2rfc v1.33 (of | |||
| Administrative Support Activity (IASA). This document was produced | http://xml.resource.org/) from a source in RFC-2629 XML format. | |||
| using xml2rfc v1.32 (of http://xml.resource.org/) from a source in | ||||
| RFC-2629 XML format. | ||||
| End of changes. 194 change blocks. | ||||
| 756 lines changed or deleted | 1004 lines changed or added | |||
This html diff was produced by rfcdiff 1.35. The latest version is available from http://tools.ietf.org/tools/rfcdiff/
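The bulk metering pseudo-code of Appendix A.2.1 and its worked figures (V_b = 5TB from a pre-congestion level of 0.001% on B = 500PB) can be checked with a short sketch. It assumes packets arrive as (length, codepoint) pairs with the codepoint given as a string; that representation, the function name and the "other" codepoint are illustrative assumptions, not part of the draft.

```python
# Codepoint names follow the -02 pseudo-code; representing them as
# strings is an assumption made for this sketch.
POSITIVE_CODEPOINTS = {"Re-PCT-Echo", "FNE"}
NEGATIVE_CODEPOINTS = {"AM(-1)", "TM(-1)"}


def meter(packets):
    """Return (V_b, B) for an iterable of (length, codepoint) pairs.

    V_b accumulates positive octets minus negative octets;
    B accumulates the total data volume.
    """
    v_b = 0
    b_total = 0
    for length, codepoint in packets:
        b_total += length
        if codepoint in POSITIVE_CODEPOINTS:
            v_b += length
        elif codepoint in NEGATIVE_CODEPOINTS:
            v_b -= length
    return v_b, b_total


# "other" is a hypothetical placeholder for any neutral codepoint.
v_b, b_total = meter([(1500, "FNE"), (1500, "other"),
                      (1500, "AM(-1)"), (1000, "Re-PCT-Echo")])

# Worked figures from the text: 0.001% is one part in 100,000.
PB = 10**15
TB = 10**12
monthly_b = 500 * PB
monthly_v_b = monthly_b // 100_000
```

The integer division keeps the check exact: 500 x 10^15 octets divided by 100,000 is 5 x 10^12 octets, which matches the 5TB quoted in the -02 text.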