| draft-briscoe-tsvwg-re-ecn-tcp-06.txt | draft-briscoe-tsvwg-re-ecn-tcp-07.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
| Intended status: Standards Track A. Jacquet | Intended status: Standards Track A. Jacquet | |||
| Expires: January 15, 2009 T. Moncaster | Expires: September 4, 2009 T. Moncaster | |||
| A. Smith | A. Smith | |||
| BT | BT | |||
| July 14, 2008 | March 3, 2009 | |||
| Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | |||
| draft-briscoe-tsvwg-re-ecn-tcp-06 | draft-briscoe-tsvwg-re-ecn-tcp-07 | |||
| Status of this Memo | Status of This Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| other groups may also distribute working documents as Internet- | other groups may also distribute working documents as Internet- | |||
| Drafts. | Drafts. | |||
| skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on January 15, 2009. | This Internet-Draft will expire on September 4, 2009. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The IETF Trust (2008). | Copyright (c) 2009 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | ||||
| This document is subject to BCP 78 and the IETF Trust's Legal | ||||
| Provisions Relating to IETF Documents in effect on the date of | ||||
| publication of this document (http://trustee.ietf.org/license-info). | ||||
| Please review these documents carefully, as they describe your rights | ||||
| and restrictions with respect to this document. | ||||
| Abstract | Abstract | |||
| This document introduces a new protocol for explicit congestion | This document introduces a new protocol for explicit congestion | |||
| notification (ECN), termed re-ECN, which can be deployed | notification (ECN), termed re-ECN, which can be deployed | |||
| incrementally around unmodified routers. It enables the upstream | incrementally around unmodified routers. The protocol works by | |||
| party at any trust boundary in the internetwork to be held | arranging an extended ECN field in each packet so that, as it crosses | |||
| responsible for the congestion they cause, or allow to be caused. | any interface in an internetwork, it will carry a truthful prediction | |||
| of congestion on the remainder of its path. The purpose of this | ||||
| So, networks can introduce straightforward accountability for | document is to specify the re-ECN protocol at the IP layer and to | |||
| congestion and policing mechanisms for incoming traffic from end- | give guidelines on any consequent changes required to transport | |||
| customers or from neighbouring network domains. The protocol works | protocols. It includes the changes required to TCP both as an | |||
| by arranging an extended ECN field in each packet so that, as it | example and as a specification. It briefly gives examples of | |||
| crosses any interface in an internetwork, it will carry a truthful | ||||
| prediction of congestion on the remainder of its path. The purpose | ||||
| of this document is to specify the re-ECN protocol at the IP layer | ||||
| and to give guidelines on any consequent changes required to | ||||
| transport protocols. It includes the changes required to TCP both as | ||||
| an example and as a specification. It also gives examples of | ||||
| mechanisms that can use the protocol to ensure data sources respond | mechanisms that can use the protocol to ensure data sources respond | |||
| correctly to congestion. And it describes example mechanisms that | correctly to congestion, and these are described more fully in a | |||
| ensure the dominant selfish strategy of both network domains and end- | companion document [re-ecn-motive]. | |||
| points will be to set the extended ECN field honestly. | ||||
| Authors' Statement: Status (to be removed by the RFC Editor) | Authors' Statement: Status (to be removed by the RFC Editor) | |||
| Although the re-ECN protocol is intended to make a simple but far- | Although the re-ECN protocol is intended to make a simple but far- | |||
| reaching change to the Internet architecture, the most immediate | reaching change to the Internet architecture, the most immediate | |||
| priority for the authors is to delay any move of the ECN nonce to | priority for the authors is to delay any move of the ECN nonce to | |||
| Proposed Standard status. The argument for this position is | Proposed Standard status. The argument for this position is | |||
| developed in Appendix I. | developed in Appendix E. | |||
| Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
| Full diffs created using the rfcdiff tool are available at | Full diffs created using the rfcdiff tool are available at | |||
| <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> | <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> | |||
| From -05 to -06 (current version): | From -06 to -07 (current version): | |||
| Clarifications made to Section 1 and Section 3. | ||||
| Minor editorial changes throughout. | ||||
| From -04 to -05: | ||||
| Completed justification for packet marking with FNE during slow- | ||||
| start (Appendix D). | ||||
| Minor editorial changes throughout. | ||||
| From -03 to -04: | ||||
| Clarified reasons for holding back ECN nonce (Section 3.3 & | ||||
| Appendix I). | ||||
| Clarified Figure 2. | ||||
| Added Section 4.1.1.1 on equivalence of drops and ECN marks. | ||||
| Improved precision of Section 5.6 on IP in IP tunnels. | ||||
| Explained that RTT fairness is possible to enforce, but unlikely to | ||||
| be required (Section 6.1.3 & Appendix F). | ||||
| Explained that bulk per-user policing should be adequate but per- | ||||
| flow policing is also possible if desired, though it is not likely | ||||
| to be necessary (Section 6.1.5 & Appendix G). | ||||
| Reinforced need for passive policing at inter-domain borders to | ||||
| enable all-optical networking (Section 6.1.6). | ||||
| Minor editorial changes throughout. | ||||
| From -02 to -03: | Major changes made following splitting this protocol document from | |||
| the related motivations document [re-ecn-motive]. | ||||
| Started guidelines for re-ECN support in DCCP and SCTP. | Significant re-ordering of remaining text. | |||
| Added annex on limitations of nonce mechanism. | New terminology introduced for clarity. | |||
| Minor editorial changes throughout. | Minor editorial changes throughout. | |||
| From -01 to -02: | ||||
| Explanation on informal terminology in Section 3.5 clarified. | ||||
| IPv6 wire protocol encoding added (Section 5.2). | ||||
| Text on (non-)issues with tunnels, encryption and link layer | ||||
| congestion notification added (Section 5.6 & Section 5.7). | ||||
| Section added giving evolvability arguments against encouraging | ||||
| bottleneck policing (Section 6.1.2). And text on re-ECN's | ||||
| evolvability by design added to Section 6.1.3 | ||||
| Text on inter-domain policing (Section 6.1.6) and inter-domain | ||||
| fail-safes (Section 6.1.7) added. | ||||
| From -00 to -01: | ||||
| Encoding of re-ECN wire protocol changed for reasons given in | ||||
| Appendix B and consequently draft substantially re-written. | ||||
| Substantial text added in sections on applications, incremental | ||||
| deployment, architectural rationale and security considerations. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 8 | 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | 4. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 3.2. Simplified Re-ECN Protocol . . . . . . . . . . . . . . . . 10 | 4.1. Simplified Re-ECN Protocol . . . . . . . . . . . . . . . . 7 | |||
| 3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | 4.1.1. Congestion Control and Policing the Protocol . . . . . 7 | |||
| v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 4.1.2. Background and Applicability . . . . . . . . . . . . . 8 | |||
| 3.4. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 12 | 4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | |||
| 3.5. Informal Terminology . . . . . . . . . . . . . . . . . . . 14 | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | 4.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 | |||
| 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | 4.4. Positive and Negative Flows . . . . . . . . . . . . . . . 12 | |||
| 4.1.1. RECN mode: Full Re-ECN capable transport . . . . . . . 17 | 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 | 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 13 | |||
| compliant ECN Receiver . . . . . . . . . . . . . . . . 20 | 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 15 | |||
| 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21 | 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 16 | |||
| 4.1.4. Extended ECN (EECN) Field Settings during Flow | 5.4. Justification for Setting the First SYN to FNE . . . . . . 17 | |||
| Start or after Idle Periods . . . . . . . . . . . . . 23 | 5.5. Control and Management . . . . . . . . . . . . . . . . . . 18 | |||
| 4.1.5. Pure ACKS, Retransmissions, Window Probes and | 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 18 | |||
| Partial ACKs . . . . . . . . . . . . . . . . . . . . . 27 | 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 19 | |||
| 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27 | 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 4.2.1. General Guidelines for Adding Re-ECN to Other | 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
| Transports . . . . . . . . . . . . . . . . . . . . . . 27 | 6. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28 | 6.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28 | 6.1.1. RECN mode: Full Re-ECN capable transport . . . . . . . 22 | |||
| 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 29 | 6.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 | |||
| 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 29 | compliant ECN Receiver . . . . . . . . . . . . . . . . 24 | |||
| 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 29 | 6.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 26 | |||
| 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30 | 6.1.4. Extended ECN (EECN) Field Settings during Flow | |||
| 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31 | Start or after Idle Periods . . . . . . . . . . . . . 27 | |||
| 5.4. Justification for Setting the First SYN to FNE . . . . . . 33 | 6.1.5. Pure ACKS, Retransmissions, Window Probes and | |||
| 5.5. Control and Management . . . . . . . . . . . . . . . . . . 34 | Partial ACKs . . . . . . . . . . . . . . . . . . . . . 31 | |||
| 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 34 | 6.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 32 | |||
| 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 35 | 6.2.1. General Guidelines for Adding Re-ECN to Other | |||
| 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 35 | Transports . . . . . . . . . . . . . . . . . . . . . . 32 | |||
| 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 36 | 6.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 32 | |||
| 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 37 | 6.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 33 | |||
| 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 37 | 6.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 33 | |||
| 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 37 | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 33 | |||
| 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 38 | 8. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 39 | 8.1. Congestion Notification Integrity . . . . . . . . . . . . 34 | |||
| 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 46 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 35 | |||
| 6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47 | 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 49 | 11. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52 | 12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 37 | |||
| 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53 | 13. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53 | 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 38 | |||
| 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53 | 14.1. Normative References . . . . . . . . . . . . . . . . . . . 38 | |||
| 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54 | 14.2. Informative References . . . . . . . . . . . . . . . . . . 39 | |||
| 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 55 | Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 41 | |||
| 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 55 | ||||
| 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 55 | ||||
| 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 56 | ||||
| 7.1. Incremental Deployment Features . . . . . . . . . . . . . 56 | ||||
| 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57 | ||||
| 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 62 | ||||
| 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 65 | ||||
| 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 65 | ||||
| 9.2. Congestion Notification Integrity . . . . . . . . . . . . 66 | ||||
| 9.3. Identifying Upstream and Downstream Congestion . . . . . . 67 | ||||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 67 | ||||
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68 | ||||
| 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 69 | ||||
| 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 69 | ||||
| 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69 | ||||
| 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 70 | ||||
| 15.1. Normative References . . . . . . . . . . . . . . . . . . . 70 | ||||
| 15.2. Informative References . . . . . . . . . . . . . . . . . . 70 | ||||
| Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 74 | ||||
| Appendix B. Justification for Two Codepoints Signifying Zero | Appendix B. Justification for Two Codepoints Signifying Zero | |||
| Worth Packets . . . . . . . . . . . . . . . . . . . . 75 | Worth Packets . . . . . . . . . . . . . . . . . . . . 43 | |||
| Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76 | Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 44 | |||
| Appendix D. Packet Marking with FNE During Flow Start . . . . . . 78 | Appendix D. Packet Marking with FNE During Flow Start . . . . . . 45 | |||
| Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 80 | Appendix E. Argument for holding back the ECN nonce . . . . . . . 47 | |||
| Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 80 | Appendix F. Alternative Terminology Used in Other Documents . . . 49 | |||
| Appendix G. Policer Designs to ensure Congestion | ||||
| Responsiveness . . . . . . . . . . . . . . . . . . . 80 | ||||
| G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 80 | ||||
| G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 82 | ||||
| Appendix H. Downstream Congestion Metering Algorithms . . . . . . 84 | ||||
| H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 84 | ||||
| H.2. Inflation Factor for Persistently Negative Flows . . . . . 85 | ||||
| Appendix I. Argument for holding back the ECN nonce . . . . . . . 86 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 88 | ||||
| Intellectual Property and Copyright Statements . . . . . . . . . . 90 | ||||
| 1. Introduction | 1. Introduction | |||
| This document aims: | This document aims to provide a complete specification of the | |||
| addition of the re-ECN protocol to IP and guidelines on how to add it | ||||
| o To provide a complete specification of the addition of the re-ECN | to transport layer protocols, including a complete specification of | |||
| protocol to IP and guidelines on how to add it to transport layer | re-ECN in TCP as an example. The motivation behind this proposal is | |||
| protocols, including a complete specification of re-ECN in TCP as | given in [re-ecn-motive], but we include a brief summary here. | |||
| an example; | ||||
| o To show how a number of hard problems become much easier to solve | ||||
| once re-ECN is available in IP. | ||||
| In ECN [RFC3168] congested queues probabilistically mark packets as | Re-ECN is intended to allow senders to inform the network of the | |||
| they approach a congested state. The receiver informs the sender | level of congestion they expect their flows to see. This information | |||
| that they have seen one or more marks. In re-ECN the sender must | is currently only visible at the transport layer. ECN [RFC3168] | |||
| predict the level of congestion on the path by re-inserting feedback | reveals the upstream congestion state of any path by monitoring the | |||
| according to the marking scheme described later in this draft. This | rate of CE marks. The receiver then informs the sender when they | |||
| results in packets that carry a prediction of downstream congestion. | have seen a marked packet. Re-ECN builds on ECN by providing new | |||
| codepoints that allow the sender to declare the level of congestion | ||||
| they expect on the forward path. It is closely related to ECN and | ||||
| indeed we define a compatibility mode to allow a re-ECN sender to | ||||
| communicate with an ECN receiver [xref]. | ||||
| If a sender understates expected congestion compared to actual | If a sender understates expected congestion compared to actual | |||
| congestion then the network could discard packets or enact some other | congestion then the network could discard packets or enact some other | |||
| sanction. A policer can also be introduced at the ingress of | sanction. A policer can also be introduced at the ingress of | |||
| networks that can limit the congestion caused (or base penalties on | networks that can limit the level of congestion being caused. | |||
| it). | ||||
| It is important to add a few key points. | ||||
| o It can be seen that it takes one round trip before any feedback is | ||||
| received. For this reason a sender must make a conservative | ||||
| prediction by transmitting IP packets with a special Feedback Not | ||||
| Established (FNE) marking. | ||||
| o It should be noted that the prediction is carried in-band in | ||||
| normal data packets and for many transports feedback can be | ||||
| carried in the normal acknowledgements or control packets. | ||||
| o The re-ECN protocol is independent of the transport. In TCP, | ||||
| acknowledgments are used to convey the feedback from receiver to | ||||
| sender. This memo concentrates on TCP as an example transport | ||||
| protocol, however the re-ECN protocol is compatible with any | ||||
| transport where feedback can be sent from receiver to sender. | ||||
| A general statement of the problem solved by re-ECN is to provide | A general statement of the problem solved by re-ECN is to provide | |||
| sufficient information in each IP datagram to be able to hold senders | sufficient information in each IP datagram to be able to hold senders | |||
| and whole networks accountable for the congestion they cause | and whole networks accountable for the congestion they cause | |||
| downstream, before they cause it. But the every-day problems that | downstream, before they cause it. But the every-day problems that | |||
| re-ECN can solve are much more recognisable than this rather generic | re-ECN can solve are much more recognisable than this rather generic | |||
| statement: mitigating distributed denial of service (DDoS); | statement: mitigating distributed denial of service (DDoS); | |||
| simplifying differentiation of quality of service (QoS); policing | simplifying differentiation of quality of service (QoS); policing | |||
| compliance to congestion control; and so on. | compliance to congestion control; and so on. | |||
| Uniquely, re-ECN manages to enable solutions to these problems | It is important to add a few key points. | |||
| without unduly stifling innovative new ways to use the Internet. | ||||
| This was a hard balance to strike, given it could be argued that DDoS | ||||
| is an innovative way to use the Internet. The most valuable insight | ||||
| was to allow each network to choose the level of constraint it wishes | ||||
| to impose. Also re-ECN has been carefully designed so that networks | ||||
| that choose to use it conservatively can protect themselves against | ||||
| the congestion caused in their network by users on other networks | ||||
| with more liberal policies. | ||||
| For instance, some network owners want to block applications like | ||||
| voice and video unless their network is compensated for the extra | ||||
| share of bottleneck bandwidth taken. These real-time applications | ||||
| tend to be unresponsive when congestion arises. Whereas elastic TCP- | ||||
| based applications back away quickly, ending up taking a much smaller | ||||
| share of congested capacity for themselves. Other network owners | ||||
| want to invest in large amounts of capacity and make their gains from | ||||
| simplicity of operation and economies of scale. | ||||
| re-ECN allows the more conservative networks to police out flows that | ||||
| have not asked to be unresponsive to congestion---not because they | ||||
| are voice or video---just because they don't respond to congestion. | ||||
| But it also allows other networks to choose not to police. | ||||
| Crucially, when flows from liberal networks cross into a conservative | ||||
| network, re-ECN enables the conservative network to apply penalties | ||||
| to its neighbouring networks for the congestion they allow to be | ||||
| caused. And these penalties can be applied to bulk data, without | ||||
| regard to flows. | ||||
| Then, if unresponsive applications become so dominant that some of | o In any standard network it always takes one round trip before any | |||
| the more liberal networks experience congestion collapse [RFC3714], | feedback is received. For this reason a sender must make a | |||
| they can change their minds and use re-ECN to apply tighter controls | conservative prediction by transmitting IP packets with a special | |||
| in order to bring congestion back under control. | Cautious marking. | |||
| re-ECN works by arranging that each packet arrives at each network | o It should be noted that the prediction is carried in-band in | |||
| element carrying a view of expected congestion on its own downstream | normal data packets and for many transports feedback can be | |||
| path, albeit averaged over multiple packets. Most usefully, | carried in the normal acknowledgements or control packets. | |||
| congestion on the remainder of the path becomes visible in the IP | ||||
| header at the first ingress. Many of the applications of re-ECN | ||||
| involve a policer at this ingress using the view of downstream | ||||
| congestion arriving in packets to police or control the packet rate. | ||||
| Importantly, the scheme is recursive: a whole network harbouring | o The re-ECN protocol is independent of the transport. In TCP, | |||
| users causing congestion in downstream networks can be held | acknowledgments are used to convey the feedback from receiver to | |||
| responsible or policed by its downstream neighbour. | sender. This memo concentrates on TCP as an example transport | |||
| protocol, however the re-ECN protocol is compatible with any | ||||
| transport where feedback can be sent from receiver to sender. | ||||
| This document is structured as follows. First an overview of the re- | This document is structured as follows. First an overview of the re- | |||
| ECN protocol is given (Section 3), outlining its attributes and | ECN protocol is given (Section 4), outlining its attributes and | |||
| explaining conceptually how it works as a whole. The two main parts | explaining conceptually how it works as a whole. The two main parts | |||
| of the document follow. That is, the protocol specification divided | of the document follow. That is, the protocol specification divided | |||
| into transport (Section 4) and network (Section 5) layers which | into network (Section 5) and transport (Section 6) layers. | |||
| contain most of the standards compliance terminology, then the | ||||
| applications re-ECN can be put to, such as policing DDoS, QoS and | ||||
| congestion control (Section 6). Although these applications do not | ||||
| require standardisation themselves, they are described in a fair | ||||
| degree of detail in order to explain how re-ECN can be used. Given | ||||
| re-ECN proposes to use the last undefined bit in the IPv4 header, we | ||||
| felt it necessary to outline the potential that re-ECN could release | ||||
| in return for being given that bit. | ||||
| Deployment issues discussed throughout the document are brought | Deployment issues discussed throughout the document are brought | |||
| together in Section 7, which is followed by a brief section | together in Section 7. Related work is discussed in Section 8. | |||
| explaining the somewhat subtle rationale for the design from an | ||||
| architectural perspective (Section 8). We end by describing related | ||||
| work (Section 9), listing security considerations (Section 10) and | ||||
| finally drawing conclusions (Section 12). | ||||
| 2. Requirements notation | 2. Requirements notation | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| This document first specifies a protocol, then describes a framework | 3. Terminology | |||
| that creates the right incentives to ensure compliance to the | ||||
| protocol. This could cause confusion because the second part of the | ||||
| document considers many cases where malicious nodes may not comply | ||||
| with the protocol. When such contingencies are described, if any of | ||||
| the above keywords are not capitalised, that is deliberate. So, for | ||||
| instance, the following two apparently contradictory sentences would | ||||
| be perfectly consistent: i) x MUST do this; ii) x may not do this. | ||||
| 3. Protocol Overview | The following terminology is used throughout this memo. Some of this | |||
| terminology is new and, to avoid confusion, Appendix F sets out all | ||||
| the alternative terminology that has been used in other re-ECN | ||||
| related documents. | ||||
| 3.1. Background and Applicability | o Neutral packet - a packet that is able to be congestion marked by | |||
| an ECN or re-ECN queue. | ||||
| o Negative packet - a Neutral packet that has been congestion marked | ||||
| by an ECN or re-ECN queue. | ||||
| o Positive packet - a packet that has been marked by the sender to | ||||
| indicate the expected level of congestion along its path. In | ||||
| general Positive packets should only be sent in response to | ||||
| feedback received from the receiver.* | ||||
| o Cancelled packet - a Positive Packet that has been congestion | ||||
| marked by an ECN or re-ECN queue. | ||||
| o Cautious packet - a packet that has been marked by the sender to | ||||
| indicate the expected level of congestion along its path. In | ||||
| general Cautious packets should be used when there is insufficient | ||||
| feedback to be confident about the congestion state of the | ||||
| network.* | ||||
| o * The difference between Positive and Cautious packets is | ||||
| explained in detail later in the document along with guidelines on | ||||
| the use of Cautious packets. | ||||
| All the above terms have related IP codepoints as defined in | ||||
| Section 5. | ||||
| 4. Protocol Overview | ||||
| 4.1. Simplified Re-ECN Protocol | ||||
| We describe here the simplified re-ECN protocol. To simplify the | ||||
| description we assume packets and segments are synonymous. | ||||
| Packets are sent from a sender to a receiver. In Figure 1 the queues | ||||
| (Q1 and Q2) are ECN enabled as per RFC 3168 [RFC3168]. If congestion | ||||
| occurs then packets are marked with the congestion experienced (CE) | ||||
| flag exactly as in the ECN protocol [RFC3168]; the routers do not | ||||
| need to be modified and do not need to know the re-ECN protocol. The | ||||
| receiver constantly informs the sender of the current count of | ||||
| Positive packets it has seen. The sender uses this information to | ||||
| determine how many Positive packets it must send into the network. | ||||
| The sender's aim is to balance the number of bytes that have been | ||||
| congestion marked with the number of Positive bytes it has sent. | ||||
| +--------- Feedback----------+ | ||||
| | | | ||||
| v | | ||||
| +---+ +----+ +----+ +---+ | ||||
| | | | | | | | | | ||||
| | S |--->| Q1 |--->| Q2 |--->| R | | ||||
| | | | | | | | | | ||||
| +---+ +----+ +----+ +---+ | ||||
| Figure 1: Simple Re-ECN | ||||
| 4.1.1. Congestion Control and Policing the Protocol | ||||
| The arrangement of the protocol ensures that packets carry a | ||||
| declaration of the amount of congestion that will be experienced on | ||||
| the path. The re-ECN protocol is orthogonal to any congestion | ||||
| control algorithms, but can be used to ensure that congestion control | ||||
| is being applied by the sender. | ||||
| In general we assume that there will be a policer at the network | ||||
| ingress which can rate limit traffic based on the amount of | ||||
| congestion declared. | ||||
| At the network egress there is a dropper which can impose sanctions on | ||||
| flows that incorrectly declare congestion. | ||||
| Policers and droppers are explained in more detail in | ||||
| [re-ecn-motive]. | ||||
| 4.1.2. Background and Applicability | ||||
| The re-ECN protocol makes no changes to and has no effect on the TCP | The re-ECN protocol makes no changes to and has no effect on the TCP | |||
| congestion control algorithm or on other rate responses to | congestion control algorithm or on other rate responses to | |||
| congestion. Re-ECN is not a new congestion control protocol, rather | congestion. Re-ECN is not a new congestion control protocol, rather | |||
| it is orthogonal to congestion control itself. Re-ECN is concerned | it is orthogonal to congestion control itself. Re-ECN is concerned | |||
| with revealing information about congestion so that users and | with revealing information about congestion so that users and | |||
| networks can be held accountable for the congestion they cause, or | networks can be held accountable for the congestion they cause, or | |||
| allow to be caused. | allow to be caused. | |||
| Re-ECN builds on ECN so we briefly recap the essentials of the ECN | Re-ECN builds on ECN so we briefly recap the essentials of the ECN | |||
| protocol [RFC3168]. Two bits in the IP protocol (v4 or v6) are | protocol [RFC3168]. Two bits in the IP protocol (v4 or v6) are | |||
| assigned to the ECN field. The sender clears the field to "00" (Not- | assigned to the ECN field. The sender clears the field to "00" (Not- | |||
| ECT) if either end-point transport is not ECN-capable. Otherwise it | ECT) if either end-point transport is not ECN-capable. Otherwise it | |||
| indicates an ECN-capable transport (ECT) using either of the two | indicates an ECN-capable transport (ECT) using either of the two | |||
| code-points "10" or "01" (ECT(0) and ECT(1) resp.). | code-points "10" or "01" (ECT(0) and ECT(1) resp.). | |||
| ECN-capable queues probabilistically set "11" if congestion is | ECN-capable queues probabilistically set this field to "11" if | |||
| experienced (CE), the marking probability increasing with the length | congestion is experienced (CE). In general this marking probability | |||
| of the queue at its egress link (typically using the RED | will increase with the length of the queue at its egress link | |||
| algorithm [RFC2309]). However, they still drop rather than mark Not- | (typically using the RED algorithm [RFC2309]). However, they still | |||
| ECT packets. With multiple ECN-capable queues on a path, a flow of | drop rather than mark Not-ECT packets. With multiple ECN-capable | |||
| packets accumulates the fraction of CE marking that each queue adds. | queues on a path, a flow of packets accumulates the fraction of CE | |||
| The combined effect of the packet marking of all the queues along the | marking that each queue adds. The combined effect of the packet | |||
| path signals congestion of the whole path to the receiver. So, for | marking of all the queues along the path signals congestion of the | |||
| example, if one queue early in a path is marking 1% of packets and | whole path to the receiver. So, for example, if one queue early in a | |||
| another later in a path is marking 2%, flows that pass through both | path is marking 1% of packets and another later in a path is marking | |||
| queues will experience approximately 3% marking (see Appendix A for a | 2%, flows that pass through both queues will experience approximately | |||
| precise treatment). | 3% marking (see Appendix A for a precise treatment). | |||
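
As a rough worked example of the accumulation described above, the sketch below combines per-queue marking fractions. It assumes each queue marks independently of the others, which is our simplification and not something this section states; the function name is hypothetical. It illustrates why 1% and 2% give approximately, rather than exactly, 3%.

```python
def combined_marking(fractions):
    """Whole-path CE fraction, assuming each queue marks independently."""
    unmarked = 1.0
    for p in fractions:
        # Probability that a packet also passes this queue unmarked.
        unmarked *= 1.0 - p
    return 1.0 - unmarked


# One queue marking 1% followed by another marking 2%.
path_fraction = combined_marking([0.01, 0.02])
print(f"{path_fraction:.4%}")
```

The small gap between this product form and the simple sum is what the text means by "approximately".
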
| The choice of two ECT code-points in the ECN field [RFC3168] | The choice of two ECT code-points in the ECN field [RFC3168] | |||
| permitted future flexibility, optionally allowing the sender to | permitted future flexibility, optionally allowing the sender to | |||
| encode the experimental ECN nonce [RFC3540] in the packet stream. | encode the experimental ECN nonce [RFC3540] in the packet stream. | |||
| The nonce is designed to allow a sender to check the integrity of | The nonce is designed to allow a sender to check the integrity of | |||
| congestion feedback. But Section 9.2 explains that it still gives no | congestion feedback. But Section 8.1 explains that it still gives no | |||
| control over how fast the sender transmits as a result of the | control over how fast the sender transmits as a result of the | |||
| feedback. On the other hand, re-ECN is designed both to ensure that | feedback. On the other hand, re-ECN is designed both to ensure that | |||
| congestion is declared honestly and that the sender's rate responds | congestion is declared honestly and that the sender's rate responds | |||
| appropriately. | appropriately. | |||
| Re-ECN is based on a feedback arrangement called `re- | Re-ECN is based on a feedback arrangement called `re- | |||
| feedback' [Re-fb]. The word is short for either receiver-aligned, | feedback' [Re-fb]. The word is short for either receiver-aligned, | |||
| re-inserted or re-echoed feedback. But it actually works even when | re-inserted or re-echoed feedback. But it actually works even when | |||
| no feedback is available. In fact it has been carefully designed to | no feedback is available. In fact it has been carefully designed to | |||
| work for single datagram flows. It also encourages aggregation of | work for single datagram flows. It also encourages aggregation of | |||
| single packet flows by congestion control proxies. Then, even if the | single packet flows by congestion control proxies. Then, even if the | |||
| traffic mix of the Internet were to become dominated by short | traffic mix of the Internet were to become dominated by short | |||
| messages, it would still be possible to control congestion | messages, it would still be possible to control congestion | |||
| effectively and efficiently. | effectively and efficiently. | |||
| Changing the Internet's feedback architecture seems to imply | Changing the Internet's feedback architecture seems to imply | |||
| considerable upheaval. But re-ECN can be deployed incrementally at | considerable upheaval. But re-ECN can be deployed incrementally at | |||
| the transport layer around unmodified queues using existing fields in | the transport layer around unmodified queues using existing fields in | |||
| IP (v4 or v6). However it does also require the last undefined bit | IP (v4 or v6). However it does also require the last undefined bit | |||
| in the IPv4 header, which it uses in combination with the 2-bit ECN | in the IPv4 header, which it uses in combination with the 2-bit ECN | |||
| field to create four new codepoints. Nonetheless, we RECOMMENDED | field to create four new codepoints. Nonetheless, we RECOMMEND | |||
| adding optional preferential drop to IP queues based on the re-ECN | adding optional preferential drop to IP queues based on the re-ECN | |||
| fields in order to improve resilience against DoS attacks. | fields in order to improve resilience against DoS attacks. | |||
| Similarly, re-ECN works best if both the sender and receiver | Similarly, re-ECN works best if both the sender and receiver | |||
| transports are re-ECN-capable, but it can work with just sender | transports are re-ECN-capable, but it can work with just sender | |||
| support. Section 7.1 summarises the incremental deployment strategy. | support (Section 6.1.2). | |||
| Before re-ECN can be considered worthy of using up the last bit in | ||||
| the IP header, we must be sure that all our claims are robust. We | ||||
| have gradually been reducing the list of outstanding issues, but the | ||||
| few that still remain are listed in Section 6.3. We expect new | ||||
| attacks may still be found, but we offer the re-ECN protocol on the | ||||
| basis that it is built on fairly solid theoretical foundations and, | ||||
| so far, it has proved possible to keep it relatively robust. | ||||
| 3.2. Simplified Re-ECN Protocol | ||||
| We describe here the simplified re-ECN protocol. In this first | ||||
| description we assume packets and segments are synonymous. | ||||
| Packets are sent from a sender to a receiver. In Figure 1 the queues | ||||
| (Q1 and Q2) are ECN enabled as per RFC 3168 [ref]. If congestion | ||||
| occurs then packets are marked with the congestion experienced (CE) | ||||
| flag exactly as in the ECN protocol [RFC3168]; the routers do not | ||||
| need to be modified and do not need to know the re-ECN protocol. On | ||||
| reception of marked packets the receiver notifies the sender of the | ||||
| current count of marked packets. Note that this is the number of | ||||
| packets marked rather than the setting of the ECE flag in ECN. The | ||||
| sender uses this information to re-echo mark packets in exact | ||||
| correspondence to the number of CE marked bytes observed at the | ||||
| receiver. | ||||
| +--------- Feedback----------+ | ||||
| | | | ||||
| v | | ||||
| +---+ +----+ +----+ +---+ | ||||
| | | RE | | | | | | | ||||
| | S |--->| Q1 |--->| Q2 |--->| R | | ||||
| | | | | | | | | | ||||
| +---+ +----+ +----+ +---+ | ||||
| Figure 1: Simple Re-ECN | ||||
| 3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | 4.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | |||
| The re-ECN wire protocol uses the two bit ECN field broadly as in | The re-ECN wire protocol uses the two bit ECN field broadly as in | |||
| RFC3168 [RFC3168] as described above, but with five differences of | RFC3168 [RFC3168] as described above, but with five differences of | |||
| detail (brought together in a list in Section 7.1). This | detail (brought together in a list in Section 7). This specification | |||
| specification defines a new re-ECN extension (RE) flag. We will | defines a new re-ECN extension (RE) flag. We will defer the | |||
| defer the definition of the actual position of the RE flag in the | definition of the actual position of the RE flag in the IPv4 & v6 | |||
| IPv4 & v6 headers until Section 5. When we don't need to choose | headers until Section 5. When we don't need to choose between IPv4 | |||
| between IPv4 and v6 wire protocols it will suffice to call it the RE | and v6 wire protocols it will suffice to call it the RE flag. | |||
| flag. | ||||
| Unlike the ECN field, the RE flag is intended to be set by the sender | Unlike the ECN field, the RE flag is intended to be set by the sender | |||
| and remain unchanged along the path, although it can be read by | and SHOULD remain unchanged along the path, although it can be read | |||
| network elements that understand the re-ECN protocol. It is feasible | by network elements that understand the re-ECN protocol. It is | |||
| that a network element MAY change the setting of the RE flag, perhaps | feasible that a network element MAY change the setting of the RE | |||
| acting as a proxy for an end-point, but such a protocol would have to | flag, perhaps acting as a proxy for an end-point, but such a protocol | |||
| be defined in another specification (e.g. [Re-PCN]). | would have to be defined in another specification (e.g. [Re-PCN]). | |||
| Although the RE flag is a separate, single bit field, it can be read | Although the RE flag is a separate, single bit field, it can be read | |||
| as an extension to the two-bit ECN field; the three concatenated bits | as an extension to the two-bit ECN field; the three concatenated bits | |||
| in what we will call the extended ECN field (EECN) giving eight | in what we will call the extended ECN field (EECN) giving eight | |||
| codepoints. We will use the RFC3168 names of the ECN codepoints to | codepoints. We will use the RFC3168 names of the ECN codepoints to | |||
| describe settings of the ECN field when the RE flag setting is "don't | describe settings of the ECN field when the RE flag setting is "don't | |||
| care", but we also define the following six extended ECN codepoint | care", but we also define the following six extended ECN codepoint | |||
| names for when we need to be more specific. | names for when we need to be more specific. | |||
| One of re-ECN's codepoints is an alternative use of the codepoint set | One of re-ECN's codepoints is an alternative use of the codepoint set | |||
| aside in RFC3168 for the ECN nonce (ECT(1)). Transports using re-ECN | aside in RFC3168 for the ECN nonce (ECT(1)). Transports using re-ECN | |||
| do not need to use the ECN nonce as long as the sender is also | do not need to use the ECN nonce as long as the sender is also | |||
| checking for transport protocol compliance | checking for transport protocol compliance | |||
| [I-D.moncaster-tcpm-rcv-cheat]. The case for doing this is given in | [I-D.moncaster-tcpm-rcv-cheat]. The case for doing this is given in | |||
| Appendix I. Two re-ECN codepoints are given compatible uses to those | Appendix E. Two re-ECN codepoints are given compatible uses to those | |||
| defined in RFC3168 (Not-ECT and CE). The other codepoint used by | defined in RFC3168 (Not-ECT and CE). The other codepoint used by | |||
| RFC3168 (ECT(0)) isn't used for re-ECN. Altogether this leaves one | RFC3168 (ECT(0)) isn't used for re-ECN. Altogether this leaves one | |||
| codepoint of the eight unused by ECN or re-ECN and available for | codepoint of the eight unused by ECN or re-ECN and available for | |||
| future use. | future use. | |||
| +-------+------------+------+--------------+------------------------+ | +--------+-------------+-------+-----------+------------------------+ | |||
| | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | | ECN | RFC3168 | RE | EECN | re-ECN meaning | | |||
| | field | codepoint | flag | codepoint | | | | field | codepoint | flag | codepoint | | | |||
| +-------+------------+------+--------------+------------------------+ | +--------+-------------+-------+-----------+------------------------+ | |||
| | 00 | Not-ECT | 0 | Not-ECT | Not re-ECN-capable | | | 00 | Not-ECT | 0 | Not-ECT | Not re-ECN-capable | | |||
| | | | | | transport | | | | | | | transport (Legacy) | | |||
| | 00 | --- | 1 | FNE | Feedback not | | | 00 | --- | 1 | FNE | Feedback not | | |||
| | | | | | established | | | | | | | established (Cautious) | | |||
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | |||
| | | | | | and RECT | | | | | | | and RECT (Positive) | | |||
| | 01 | --- | 1 | RECT | Re-ECN capable | | | 01 | --- | 1 | RECT | Re-ECN capable | | |||
| | | | | | transport | | | | | | | transport (Neutral) | | |||
| | 10 | ECT(0) | 0 | ECT(0) | RFC3168 ECN use only | | | 10 | ECT(0) | 0 | ECT(0) | RFC3168 ECN use only | | |||
| | | | | | | | | | | | | | | |||
| | 10 | --- | 1 | --CU-- | Currently unused | | | 10 | --- | 1 | --CU-- | Currently unused | | |||
| | | | | | | | | | | | | | | |||
| | 11 | CE | 0 | CE(0) | Re-Echo canceled by | | | 11 | CE | 0 | CE(0) | Re-Echo cancelled by | | |||
| | | | | | congestion experienced | | | | | | | CE (Cancelled) | | |||
| | 11 | --- | 1 | CE(-1) | Congestion experienced | | | 11 | --- | 1 | CE(-1) | Congestion Experienced | | |||
| +-------+------------+------+--------------+------------------------+ | | | | | | (Negative) | | |||
| +--------+-------------+-------+-----------+------------------------+ | ||||
| Table 1: Extended ECN Codepoints | Table 1: Extended ECN Codepoints | |||
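
The codepoint table can be read as a lookup keyed on the 2-bit ECN field and the RE flag. The following is a minimal sketch, not part of the draft: the names `EECN_CODEPOINTS`, `classify` and `congestion_mark` are hypothetical, and the terms in parentheses are taken from the -07 column of Table 1. The marking function models only what the text says about an RFC 3168 queue: it sets the ECN field to CE, leaves the RE flag alone, and drops rather than marks packets whose ECN field is "00".

```python
# (ECN field, RE flag) -> (EECN codepoint, re-ECN term), copied from Table 1.
EECN_CODEPOINTS = {
    ("00", 0): ("Not-ECT", "Legacy"),
    ("00", 1): ("FNE", "Cautious"),
    ("01", 0): ("Re-Echo", "Positive"),
    ("01", 1): ("RECT", "Neutral"),
    ("10", 0): ("ECT(0)", "RFC3168 ECN use only"),
    ("10", 1): ("--CU--", "Currently unused"),
    ("11", 0): ("CE(0)", "Cancelled"),
    ("11", 1): ("CE(-1)", "Negative"),
}


def classify(ecn_field, re_flag):
    """Return (codepoint name, term) for an extended ECN field value."""
    return EECN_CODEPOINTS[(ecn_field, re_flag)]


def congestion_mark(ecn_field, re_flag):
    """Model an RFC 3168 queue marking a packet.

    The queue sets the ECN field to CE ("11") and never touches the RE flag.
    A packet whose ECN field is "00" is dropped instead, modelled as None.
    """
    if ecn_field == "00":
        return None
    return ("11", re_flag)


# A Neutral packet that gets marked becomes Negative;
# a Positive packet that gets marked becomes Cancelled.
print(classify(*congestion_mark("01", 1)))
print(classify(*congestion_mark("01", 0)))
```

Because the RE flag is untouched by marking, the sender's declaration survives even when the packet is congestion marked.
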
| 3.4. Re-ECN Protocol Operation | 4.3. Re-ECN Protocol Operation | |||
| In this section we will give an overview of the operation of the re- | In this section we will give an overview of the operation of the re- | |||
| ECN protocol for TCP/IP, leaving a detailed specification to the | ECN protocol for TCP/IP, leaving a detailed specification to the | |||
| following sections. Other transports will be discussed later. | following sections. Other transports will be discussed later. | |||
| In summary, the protocol adds a third `re-echo' stage to the existing | In summary, the protocol adds a third `re-echo' stage to the existing | |||
| TCP/IP ECN protocol. Whenever the network adds CE congestion | TCP/IP ECN protocol. Whenever the network adds CE congestion | |||
| signalling to the IP header on the forward data path, the receiver | signalling to the IP header on the forward data path, the receiver | |||
| feeds it back to the ingress using TCP, then the sender re-echoes it | feeds it back to the ingress using TCP, then the sender re-echoes it | |||
| into the forward data path using the RE flag in the next packet. | into the forward data path using the RE flag in the next packet. | |||
| Prior to receiving any feedback a sender will not know which setting | Prior to receiving any feedback a sender will not know which setting | |||
| of the RE flag to use, so it sets the feedback not established (FNE) | of the RE flag to use, so it sends Cautious packets by setting the | |||
| codepoint. The network reads the FNE codepoint conservatively as | FNE codepoint. The network reads the FNE codepoint conservatively as | |||
| equivalent to re-echoed congestion. | equivalent to re-echoed congestion. | |||
| Specifically, once feedback from a flow is established, a re-ECN | Specifically, once feedback from an ECN or re-ECN capable flow is | |||
| sender always initialises the ECN field to ECT(1). And it usually | established, a re-ECN sender always initialises the ECN field to | |||
| sets the RE flag to "1". Whenever a queue marks a packet to CE, the | ECT(1). And it usually sets the RE flag to "1" indicating a Neutral | |||
| receiver feeds back this event to the sender. On receiving this | packet. Whenever a queue marks a packet to CE, the receiver feeds | |||
| feedback, the re-ECN sender will clear the RE flag to "0" in the next | back this event to the sender. On receiving this feedback, the re- | |||
| packet it sends. | ECN sender will clear the RE flag to "0" in the next packet it sends | |||
| (indicating a Positive packet). | ||||
| We chose to set and clear the RE flag this way round to ease | We chose to set and clear the RE flag this way round to ease | |||
| incremental deployment (see Section 7.1). To avoid confusion we will | incremental deployment (see Section 7). To avoid confusion we will | |||
| use the term `blanking' (rather than marking) when the RE flag is | use the term `blanking' (rather than marking) when the RE flag is | |||
| cleared to "0". So, over a stream of packets, we will talk of the | cleared to "0". So, over a stream of packets, we will talk of the | |||
| `RE blanking fraction' as the fraction of octets in packets with the | `RE blanking fraction' as the fraction of octets in packets with the | |||
| RE flag cleared to "0". | RE flag cleared to "0". | |||
| +---+ +----+ +----+ +---+ | +---+ +----+ +----+ +---+ | |||
| | S |--| Q1 |----------------| Q2 |--| R | | | S |--| Q1 |----------------| Q2 |--| R | | |||
| +---+ +----+ +----+ +---+ | +---+ +----+ +----+ +---+ | |||
| . . . . | . . . . | |||
| ^ . . . . | ^ . . . . | |||
| skipping to change at page 14, line 5 | skipping to change at page 12, line 5 | |||
| horizontal line at 3% in the figure. The CE marked fraction is shown | horizontal line at 3% in the figure. The CE marked fraction is shown | |||
| by the stepped line which rises to meet the RE blanking fraction line | by the stepped line which rises to meet the RE blanking fraction line | |||
| with steps at each queue where packets are marked. Two queues are | with steps at each queue where packets are marked. Two queues are | |||
| shown (Q1 and Q2) that are currently congested. Each time packets | shown (Q1 and Q2) that are currently congested. Each time packets | |||
| pass through, a fraction is marked (1% at Q1 and 2% at Q2). The | pass through, a fraction is marked (1% at Q1 and 2% at Q2). The | |||
| approximate downstream congestion can be measured at the observation | approximate downstream congestion can be measured at the observation | |||
| points shown along the path by subtracting the CE marking fraction | points shown along the path by subtracting the CE marking fraction | |||
| from the RE blanking fraction, as shown in the table below | from the RE blanking fraction, as shown in the table below | |||
| (Appendix A derives these approximations from a precise analysis). | (Appendix A derives these approximations from a precise analysis). | |||
| +-------------------+------------------------------+ | NB due to the unary nature of ECN marking and the equivalent unary | |||
| | Observation point | Approx downstream congestion | | nature of re-ECN blanking, the precise fraction of marked bytes must | |||
| +-------------------+------------------------------+ | be calculated by maintaining a moving average of the number of | |||
| | L | 3% - 0% = 3% | | packets that have been marked as a proportion of the total number of | |||
| | M | 3% - 1% = 2% | | packets. | |||
| | N | 3% - 3% = 0% | | ||||
| +-------------------+------------------------------+ | ||||
| Table 2: Downstream Congestion Measured at Example Observation Points | ||||
| All along the path, whole-path congestion remains unchanged so it can | Along the path the fraction of packets that had their RE field | |||
| be used as a reference against which to compare upstream congestion. | cleared remains unchanged so it can be used as a reference against | |||
| The difference predicts downstream congestion for the rest of the | which to compare upstream congestion. The difference predicts | |||
| path. Therefore, measuring the fractions of each codepoint at any | downstream congestion for the rest of the path. Therefore, measuring | |||
| point in the Internet will reveal upstream, downstream and whole path | the fractions of each codepoint at any point in the Internet will | |||
| congestion. | reveal upstream, downstream and whole path congestion. | |||
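
The subtraction described above can be sketched in a few lines. This is an illustration only, under our own assumptions: the octet counts are invented to reproduce the 3% blanking fraction and the CE fractions of 0%, 1% and 3% at the example observation points L, M and N, and the function name is hypothetical. A real measurement would use a moving average rather than fixed totals.

```python
from fractions import Fraction


def downstream_congestion(re_blanked_octets, ce_marked_octets, total_octets):
    """RE blanking fraction minus CE marking fraction at one observation point."""
    blanking = Fraction(re_blanked_octets, total_octets)
    marking = Fraction(ce_marked_octets, total_octets)
    return blanking - marking


TOTAL_OCTETS = 100_000      # octets observed (illustrative)
BLANKED_OCTETS = 3_000      # 3% of octets have RE cleared, unchanged along the path
ce_octets_seen = {"L": 0, "M": 1_000, "N": 3_000}  # CE-marked octets seen so far

estimates = {
    point: downstream_congestion(BLANKED_OCTETS, ce, TOTAL_OCTETS)
    for point, ce in ce_octets_seen.items()
}
for point, value in estimates.items():
    print(point, f"{float(value):.0%}")
```

Exact fractions are used so that the differences come out as 3%, 2% and 0% without floating-point noise.
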
| Note that we have introduced discussion of marking and blanking | Note that we have introduced discussion of marking and blanking | |||
| fractions solely for illustration. To be absolutely clear, for TCP | fractions solely for illustration. We are not saying any protocol | |||
| these fractions are averages that would result from the behaviour of | handler will work with these average fractions directly. In fact the | |||
| the protocol handler mechanically blanking outgoing packets in direct | protocol actually requires the number of marked and blanked bytes to | |||
| response to incoming feedback---we are not saying any protocol | balance by the time the packet reaches the receiver. | |||
| handler has to work with these average fractions directly. | ||||
| 3.5. Informal Terminology | ||||
| In the rest of this memo we will loosely talk of positive or negative | 4.4. Positive and Negative Flows | |||
| flows, meaning flows where the moving average of the downstream | ||||
| congestion metric is persistently positive or negative. A negative | ||||
| flow is one where more CE marked packets than re-ECN blanked packets | ||||
| arrive. Likewise in positive flows more re-ECN blanked packets | ||||
| arrive than CE marked packets. The notion of a negative metric | ||||
| arises because it is derived by subtracting one metric from another. | ||||
| Of course actual downstream congestion cannot be negative, only the | ||||
| metric can (whether due to time lags or deliberate malice). | ||||
| Just as we will loosely talk of positive and negative flows, we will | In Section 3 we introduced the terms Positive, Neutral, Negative, | |||
| also talk of positive or negative packets, meaning packets that | Cautious and Cancelled. This terminology is based on the requirement | |||
| contribute positively or negatively to the downstream congestion | to balance the proportion of bytes marked as CE with the proportion | |||
| metric. | of bytes that are re-echo marked. In the rest of this memo we will | |||
| loosely talk of positive or negative flows, meaning flows where the | ||||
| moving average of the downstream congestion metric is persistently | ||||
| positive or negative. A negative flow is one where more CE marked | ||||
| packets than re-ECN blanked packets arrive. Likewise in positive | ||||
| flows more re-ECN blanked packets arrive than CE marked packets. The | ||||
| notion of a negative metric arises because it is derived by | ||||
| subtracting one metric from another. Of course actual downstream | ||||
| congestion cannot be negative, only the metric can (whether due to | ||||
| time lags or deliberate malice). | ||||
| Therefore we will talk of packets having `worth' of +1, 0 or -1, | Therefore we will talk of packets having `worth' of +1, 0 or -1, | |||
| which, when multiplied by their size, indicates their contribution to | which, when multiplied by their size, indicates their contribution to | |||
| the downstream congestion metric. | the downstream congestion metric. The worth of each type of packet | |||
| is given below in Table 2. The idea is that most flows start with | ||||
| The idea is that most packets start with zero worth. Every time the | zero worth. Every time the network decrements the worth of a packet, | |||
| network decrements the worth of a packet, the sender increments the | the sender increments the worth of a later packet. Then, over time, | |||
| worth of a later packet. Then, over time, as many positive octets | as many positive octets should arrive at the receiver as negative. | |||
| should arrive at the receiver as negative. Note we have said octets | Note we have said octets not packets, so if packets are of different | |||
| not packets, so if packets are of different sizes, the worth should | sizes, the worth should be incremented on enough octets to balance | |||
| be incremented on enough octets to balance the octets in negative | the octets in negative packets arriving at the receiver. It is this | |||
| packets arriving at the receiver. It is this balance that will allow | balance that will allow the network to hold the sender accountable | |||
| the network to hold the sender accountable for the congestion it | for the congestion it causes. | |||
| causes. | ||||
| If a packet carrying re-echoed congestion happens to also be | If a packet carrying re-echoed congestion happens to also be | |||
| congestion marked, the +1 worth added by the sender will be cancelled | congestion marked, the +1 worth added by the sender will be cancelled | |||
| out by the -1 network congestion marking. Although the two worth | out by the -1 network congestion marking. Although the two worth | |||
| values correctly cancel out, neither the congestion marking nor the | values correctly cancel out, neither the congestion marking nor the | |||
| re-echoed congestion are lost, because the RE bit and the ECN field | re-echoed congestion are lost, because the RE bit and the ECN field | |||
| are orthogonal. So, whenever this happens, the receiver will | are orthogonal. So, whenever this happens, the receiver will | |||
| correctly detect and re-echo the new congestion event as well. | correctly detect and re-echo the new congestion event as well. | |||
| The table below specifies unambiguously the worth of each extended | The table below specifies unambiguously the worth of each extended | |||
| ECN codepoint. Note the order is different from the previous table | ECN codepoint. Note the order is different from the previous table | |||
| to better show how the worth increments and decrements. The FNE | to better show how the worth increments and decrements. | |||
| codepoint is used in the flow bootstrap process (explained later) and | ||||
| has the same positive (+1) worth as a packet with the Re-Echo | ||||
| codepoint. | ||||
| +--------+------+----------------+-------+--------------------------+ | +---------+-------+---------------+-------+-------------------------+ | |||
| | ECN | RE | Extended ECN | Worth | Re-ECN meaning | | | ECN | RE | Extended ECN | Worth | Re-ECN Term | | |||
| | field | bit | codepoint | | | | | field | bit | codepoint | | | | |||
| +--------+------+----------------+-------+--------------------------+ | +---------+-------+---------------+-------+-------------------------+ | |||
| | 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | | 00 | 0 | Not-RECT | ... | --- | | |||
| | | | | | transport | | | 00 | 1 | FNE | +1 | Cautious | | |||
| | 00 | 1 | FNE | +1 | Feedback not established | | | 01 | 0 | Re-Echo | +1 | Positive | | |||
| | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | | 10 | 0 | Legacy | ... | RFC3168 ECN use only | | |||
| | | | | | RECT | | | | | | | | | |||
| | 10 | 0 | --- | ... | RFC3168 ECN use only | | | 11 | 0 | CE(0) | 0 | Cancelled | |||
| | 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | | 01 | 1 | RECT | 0 | Neutral | | |||
| | | | | | congestion experienced | | ||||
| | 01 | 1 | RECT | 0 | Re-ECN capable transport | | ||||
| | 10 | 1 | --CU-- | ... | Currently unused | | | 10 | 1 | --CU-- | ... | Currently unused | | |||
| | | | | | | | | | | | | | | |||
| | 11 | 1 | CE(-1) | -1 | Congestion experienced | | | 11 | 1 | CE(-1) | -1 | Negative | | |||
| +--------+------+----------------+-------+--------------------------+ | +---------+-------+---------------+-------+-------------------------+ | |||
| Table 3: 'Worth' of Extended ECN Codepoints | Table 2: 'Worth' of Extended ECN Codepoints | |||
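
To make the octet-based balance concrete, here is a minimal sketch of how the worth values in the table might be summed over a flow. The names `WORTH` and `flow_balance` and the packet sizes are hypothetical; codepoints whose worth is shown as "..." in the table are deliberately left out, so looking them up raises a `KeyError`.

```python
# Worth per extended ECN codepoint, from the 'Worth' table.
WORTH = {
    "FNE": +1,      # Cautious
    "Re-Echo": +1,  # Positive
    "CE(0)": 0,     # Cancelled
    "RECT": 0,      # Neutral
    "CE(-1)": -1,   # Negative
}


def flow_balance(packets):
    """Sum worth x size over (codepoint, size_in_octets) pairs."""
    return sum(WORTH[codepoint] * size for codepoint, size in packets)


# One 1500-octet packet is congestion marked; the sender later re-echoes
# on 1000 + 500 octets, so positive and negative octets balance.
arrivals = [
    ("RECT", 1500),
    ("CE(-1)", 1500),
    ("RECT", 500),
    ("Re-Echo", 1000),
    ("Re-Echo", 500),
]
balance = flow_balance(arrivals)

# A flow that re-echoes too few octets is persistently negative.
under_declared = flow_balance([("CE(-1)", 1500), ("Re-Echo", 500)])
print(balance, under_declared)
```

Counting octets rather than packets is what lets two smaller Positive packets offset one larger Negative packet.
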
| 4. Transport Layers | 5. Network Layer | |||
| 4.1. TCP | 5.1. Re-ECN IPv4 Wire Protocol | |||
| The wire protocol of the ECN field in the IP header remains largely | ||||
| unchanged from [RFC3168]. However, an extension to the ECN field we | ||||
| call the RE (Re-ECN extension) flag (Section 4.2) is defined in this | ||||
| document. It doubles the extended ECN codepoint space, giving 8 | ||||
| potential codepoints. The semantics of the extra codepoints are | ||||
| backward compatible with the semantics of the 4 original codepoints | ||||
| [RFC3168] (Section 7 collects together and summarises all the changes | ||||
| defined in this document). | ||||
| For IPv4, this document proposes that the new RE control flag will be | ||||
| positioned where the `reserved' control flag was at bit 48 of the | ||||
| IPv4 header (counting from 0). Alternatively, some would call this | ||||
| bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4 | ||||
| header (Figure 3). | ||||
| 0 1 2 | ||||
| +---+---+---+ | ||||
| | R | D | M | | ||||
| | E | F | F | | ||||
| +---+---+---+ | ||||
| Figure 3: New Definition of the Re-ECN Extension (RE) Control Flag at | ||||
| the Start of Byte 7 of the IPv4 Header | ||||
| The semantics of the RE flag are described in outline in Section 4 | ||||
| and specified fully in Section 6. The RE flag is always considered | ||||
| in conjunction with the 2-bit ECN field, as if they were concatenated | ||||
| together to form a 3-bit extended ECN field. If the ECN field is set | ||||
| to either the ECT(1) or CE codepoint, when the RE flag is blanked | ||||
| (cleared to "0") it represents a re-echo of congestion experienced by | ||||
| an earlier packet. If the ECN field is set to the Not-ECT codepoint, | ||||
| when the RE flag is set to "1" it represents the feedback not | ||||
| established (FNE) codepoint, which signals that the packet was sent | ||||
| without the benefit of congestion feedback. | ||||
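The 3-bit extended ECN field described above can be read directly from the first bytes of an IPv4 header. The following is a minimal sketch, not part of the specification: the function and table names are hypothetical, and it assumes the RE flag sits at bit 48 as proposed and that the ECN field occupies the two low-order bits of the second header byte as in [RFC3168].

```python
# Illustrative sketch only: helper names are hypothetical, not from the draft.
# Keys are (ECN field, RE flag); values are (codepoint name, worth).
# A worth of None means the table gives no worth for that codepoint.
EECN_CODEPOINTS = {
    (0b00, 0): ("Not-RECT", None),
    (0b00, 1): ("FNE", +1),
    (0b01, 0): ("Re-Echo", +1),
    (0b01, 1): ("RECT", 0),
    (0b10, 0): ("Legacy", None),   # RFC3168 ECN use only
    (0b10, 1): ("--CU--", None),   # currently unused
    (0b11, 0): ("CE(0)", 0),
    (0b11, 1): ("CE(-1)", -1),
}


def extended_ecn(ipv4_header: bytes):
    """Return (ecn_field, re_flag) from a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes")
    ecn_field = ipv4_header[1] & 0b11   # two low-order bits of byte 2
    re_flag = ipv4_header[6] >> 7       # bit 48: first bit of byte 7
    return ecn_field, re_flag


def classify(ipv4_header: bytes):
    """Return (codepoint name, worth) for the header's extended ECN field."""
    return EECN_CODEPOINTS[extended_ecn(ipv4_header)]
```

For example, a header with the ECN field set to ECT(1) and the RE flag cleared classifies as Re-Echo with worth +1, whatever the DF and MF flags contain.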
| It is believed that the FNE codepoint can simultaneously serve other | ||||
| purposes, particularly where the start of a flow needs distinguishing | ||||
| from packets later in the flow. For instance it would have been | ||||
| useful to identify new flows for tag switching and might enable | ||||
| similar developments in the future if it were adopted. It is similar | ||||
| to the state set-up bit idea designed to protect against memory | ||||
| exhaustion attacks. This idea was proposed informally by David Clark | ||||
| and documented by Handley and Greenhalgh [Steps_DoS]. The FNE | ||||
| codepoint can be thought of as a `soft-state set-up flag', because it | ||||
| is idempotent (i.e. one occurrence of the flag is sufficient but | ||||
| further occurrences achieve the same effect if previous ones were | ||||
| lost). | ||||
| We expect there will be other claims pending on the use of bit 48. | ||||
| We know of at least two, [ARI05] and [RFC3514], but neither has been | ||||
| pursued in the IETF so far, although the present proposal would meet | ||||
| the needs of the latter. | ||||
| The security flag proposal (commonly known as the evil bit) was | ||||
| published on 1 April 2003 as Informational RFC 3514, but it was not | ||||
| adopted due to confusion over whether evil-doers might set it | ||||
| inappropriately. The present proposal is backward compatible with | ||||
| RFC3514 because if re-ECN compliant senders were benign they would | ||||
| correctly clear the evil bit to honestly declare that they had just | ||||
| received congestion feedback. Whereas evil-doers would hide | ||||
| congestion feedback by setting the evil bit continuously, or at least | ||||
| more often than they should. So, evil senders can be identified, | ||||
| because they declare that they are good less often than they should. | ||||
| 5.2. Re-ECN IPv6 Wire Protocol | ||||
| For IPv6, this document proposes that the new RE control flag will be | ||||
| positioned as the first bit of the option field of a new Congestion | ||||
| hop by hop option header (Figure 4). | ||||
| 0 1 2 3 | ||||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Next Header | Hdr ext Len | Option Type | Opt Length =4 | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| |R| Reserved for future use | | ||||
| |E| | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| Figure 4: Definition of a New IPv6 Congestion Hop by Hop Option | ||||
| Header containing the re-ECN Extension (RE) Control Flag | ||||
| 0 1 2 3 4 5 6 7 8 | ||||
| +-+-+-+-+-+-+-+-+- | ||||
| |AIU|C|Option ID| | ||||
| +-+-+-+-+-+-+-+-+- | ||||
| Figure 5: Congestion Hop by Hop Option Type Encoding | ||||
| The Hop-by-Hop Options header enables packets to carry information to | ||||
| be examined and processed by routers or nodes along the packet's | ||||
| delivery path, including the source and destination nodes. For re- | ||||
| ECN, the two bits of the Action If Unrecognized (AIU) flag of the | ||||
| Congestion extension header MUST be set to "00" meaning if | ||||
| unrecognized `skip over option and continue processing the header'. | ||||
| Then, any routers or a receiver not upgraded with the optional re-ECN | ||||
| features described in this memo will simply ignore this header. But | ||||
| routers with these optional re-ECN features or a re-ECN policing | ||||
| function, will process this Congestion extension header. | ||||
| The `C' flag MUST be set to "1" to specify that the Option Data | ||||
| (currently only the RE control flag) can change en-route to the | ||||
| packet's final destination. This ensures that, when an | ||||
| Authentication header (AH [RFC4302]) is present in the packet, for | ||||
| any option whose data may change en-route, its entire Option Data | ||||
| field will be treated as zero-valued octets when computing or | ||||
| verifying the packet's authenticating value. | ||||
| Although the RE control flag should not be changed along the path, we | ||||
| expect that the rest of this option field that is currently `Reserved | ||||
| for future use' could be used for a multi-bit congestion notification | ||||
| field which we would expect to change en route. As the RE flag does | ||||
| not need end-to-end authentication, we set the C flag to '1'. | ||||
| {ToDo: A Congestion Hop by Hop Option ID will need to be registered | ||||
| with IANA.} | ||||
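As an illustration of the encoding in Figures 4 and 5, the sketch below assembles an 8-octet Hop-by-Hop Options header carrying only the Congestion option. It is a sketch under stated assumptions: the Option ID value is a placeholder because none has been registered, the function names are hypothetical, and the Hdr Ext Len of 0 follows the usual IPv6 convention of counting 8-octet units beyond the first.

```python
import struct

# Placeholder only: no Congestion Option ID has been assigned by IANA.
HYPOTHETICAL_OPTION_ID = 0x1E


def congestion_option_type(option_id: int) -> int:
    """Build the Option Type octet: AIU (2 bits), C (1 bit), Option ID (5 bits)."""
    if not 0 <= option_id < 32:
        raise ValueError("Option ID must fit in 5 bits")
    aiu = 0b00   # skip over the option if unrecognized
    c = 1        # Option Data may change en route
    return (aiu << 6) | (c << 5) | option_id


def build_congestion_hbh(next_header: int, re_flag: bool,
                         option_id: int = HYPOTHETICAL_OPTION_ID) -> bytes:
    """Return a Hop-by-Hop Options header holding one Congestion option."""
    # RE is the first bit of the 4-octet Option Data; the rest is reserved.
    option_data = 0x80000000 if re_flag else 0
    return struct.pack("!BBBBI",
                       next_header,
                       0,                                  # Hdr Ext Len
                       congestion_option_type(option_id),
                       4,                                  # Opt Length
                       option_data)
```

With the placeholder ID, build_congestion_hbh(6, True) yields the octets 06 00 3E 04 80 00 00 00.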
| 5.3. Router Forwarding Behaviour | ||||
| Re-ECN works well without modifying the forwarding behaviour of any | ||||
| routers. However, below, two OPTIONAL changes to forwarding | ||||
| behaviour are defined which respectively enhance performance and | ||||
| improve a router's discrimination against flooding attacks. They are | ||||
| both OPTIONAL additions that we propose MAY apply by default to all | ||||
| Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | ||||
| marking behaviours [RFC3168]. Specifications for PHBs MAY define | ||||
| different forwarding behaviours from this default, but this is not | ||||
| required. [Re-PCN] is one example. | ||||
| FNE indicates ECT: | ||||
| The FNE codepoint tells a router to assume that the packet was | ||||
| sent by an ECN-capable transport (see Section 5.4). Therefore an | ||||
| FNE packet MAY be marked rather than dropped. Note that the FNE | ||||
| codepoint has been intentionally chosen so that, to RFC3168 | ||||
| compliant routers (which do not inspect the RE flag) an FNE packet | ||||
| appears to be Not-ECT so it will be dropped by legacy AQM | ||||
| algorithms. | ||||
| A network operator MUST NOT configure a queue to ECN mark rather | ||||
| than drop FNE packets unless it can guarantee that FNE packets | ||||
| will be rate limited, either locally or upstream. The ingress | ||||
| policers discussed in [re-ecn-motive] would count as rate limiters | ||||
| for this purpose. | ||||
| Preferential Drop: If a re-ECN capable router queue experiences very | ||||
| high load so that it has to drop arriving packets (e.g. a DoS | ||||
| attack), it MAY preferentially drop packets within the same | ||||
| Diffserv PHB using the preference order for extended ECN | ||||
| codepoints given in Table 3. Preferential dropping can be | ||||
| difficult to implement on some hardware, but if feasible it would | ||||
| discriminate against attack traffic if done as part of the overall | ||||
| policing framework of [re-ecn-motive]. If nowhere else, routers | ||||
| at the egress of a network SHOULD implement preferential drop | ||||
| (stronger than the MAY above). For simplicity, preferences 4 & 5 | ||||
| MAY be merged into one preference level. | ||||
| +-------+-----+------------+-------+------------+-------------------+ | ||||
| | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | ||||
| | field | bit | ECN | | (1 = drop | | | ||||
| | | | codepoint | | 1st) | | | ||||
| +-------+-----+------------+-------+------------+-------------------+ | ||||
| | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | ||||
| | | | | | | congestion and | | ||||
| | | | | | | RECT | | ||||
| | 00 | 1 | FNE | +1 | 4 | Feedback not | | ||||
| | | | | | | established | | ||||
| | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | ||||
| | | | | | | by congestion | | ||||
| | | | | | | experienced | | ||||
| | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | ||||
| | | | | | | transport | | ||||
| | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | ||||
| | | | | | | experienced | | ||||
| | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | ||||
| | 10 | 0 | --- | n/a | 2 | RFC3168 ECN use | | ||||
| | | | | | | only | | ||||
| | 00 | 0 | Not-RECT | n/a | 1 | Not | | ||||
| | | | | | | Re-ECN-capable | | ||||
| | | | | | | transport | | ||||
| +-------+-----+------------+-------+------------+-------------------+ | ||||
| Table 3: Drop Preference of EECN Codepoints (Sorted by `Worth') | ||||
| The above drop preferences are arranged to preserve packets with | ||||
| more positive worth (Section 4.4), given senders of positive | ||||
| packets must have honestly declared downstream congestion. A full | ||||
| treatment of this is provided in the companion document describing | ||||
| the motivation and architecture for re-ECN [re-ecn-motive] | ||||
| particularly when the application of re-ECN to protect against | ||||
| DDoS attacks is described. | ||||
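The drop preference ordering of Table 3 could be applied to an overloaded queue roughly as follows. This is a sketch rather than a recommended implementation: the names are hypothetical, the 10/0 codepoint is labelled "Legacy" here, the "5/4" entry for Re-Echo is modelled as 5 with an optional merge into 4, and the tie-break (drop the most recent arrival among equals) is an assumption the text does not make.

```python
# Drop preference per extended ECN codepoint (1 = drop first).
DROP_PREFERENCE = {
    "Re-Echo": 5,
    "FNE": 4,
    "CE(0)": 3,
    "RECT": 3,
    "CE(-1)": 3,
    "--CU--": 2,
    "Legacy": 2,
    "Not-RECT": 1,
}


def drop_preference(codepoint: str, merge_4_and_5: bool = False) -> int:
    """Look up a codepoint's preference, optionally merging levels 4 and 5."""
    pref = DROP_PREFERENCE[codepoint]
    if merge_4_and_5 and pref == 5:
        pref = 4
    return pref


def pick_victim(queue, merge_4_and_5: bool = False) -> int:
    """Return the index of the queued packet to drop first.

    `queue` lists codepoint names in arrival order. Ties are broken by
    dropping the latest arrival, which is an assumption of this sketch.
    """
    if not queue:
        raise ValueError("queue is empty")
    return min(range(len(queue)),
               key=lambda i: (drop_preference(queue[i], merge_4_and_5), -i))
```

Under this ordering Not-RECT packets are discarded before any re-ECN capable traffic, and positive (Re-Echo and FNE) packets are preserved longest.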
| 5.4. Justification for Setting the First SYN to FNE | ||||
| The initial SYN MUST be set to FNE by Re-ECT client A (Section 6.1.4) | ||||
| and Section 5.3 says a queue MAY optionally treat an FNE packet as | ||||
| ECN capable, so an initial SYN may be marked CE(-1) rather than | ||||
| dropped. This seems dangerous, because the sender has not yet | ||||
| established whether the receiver is an RFC3168 one that does not | ||||
| understand congestion marking. It also seems to allow malicious | ||||
| senders to take advantage of ECN marking to avoid so much drop when | ||||
| launching SYN flooding attacks. Below we explain the features of the | ||||
| protocol design that remove both these dangers. | ||||
| ECN-capable initial SYN with a Not-ECT server: If the TCP server B | ||||
| is re-ECN capable, provision is made for it to feed back a possible | ||||
| congestion marked SYN in the SYN ACK (Section 6.1.4). But if the | ||||
| TCP client A finds out from the SYN ACK that the server was not | ||||
| ECN-capable, the TCP client MUST conservatively consider the first | ||||
| SYN as congestion marked before setting itself into Not-ECT mode. | ||||
| Section 6.1.4 mandates that such a TCP client MUST also set its | ||||
| initial window to 1 segment. In this way we remove the need to | ||||
| cautiously avoid setting the first SYN to Not-RECT. This will | ||||
| give worse performance while deployment is patchy, but better | ||||
| performance once deployment is widespread. | ||||
| SYN flooding attacks can't exploit ECN-capability: Malicious hosts | ||||
| may think they can use the advantage that ECN-marking gives over | ||||
| drop in launching classic SYN-flood attacks. But Section 5.3 | ||||
| mandates that a router MUST only be configured to treat packets | ||||
| with the FNE codepoint as ECN-capable if FNE packets are rate | ||||
| limited somewhere. Introduction of the FNE codepoint was a | ||||
| deliberate move to enable transport-neutral handling of flow-start | ||||
| and flow state set-up in the IP layer where it belongs. It then | ||||
| becomes possible to protect against flooding attacks of all forms | ||||
| (not just SYN flooding) without transport-specific inspection for | ||||
| things like the SYN flag in TCP headers. Then, for instance, SYN | ||||
| flooding attacks using IPSec ESP encryption can also be rate | ||||
| limited at the IP layer. | ||||
| It might seem pedantic going to all this trouble to enable ECN on the | ||||
| initial packet of a flow, but it is motivated by a much wider concern | ||||
| to ensure safe congestion control will still be possible even if the | ||||
| application mix evolves to the point where the majority of flows | ||||
| consist of a single window or even a single packet. It also allows | ||||
| denial of service attacks to be more easily isolated and prevented. | ||||
| 5.5. Control and Management | ||||
| 5.5.1. Negative Balance Warning | ||||
| A new ICMP message type is being considered so that a dropper can | ||||
| warn the apparent sender of a flow that it has started to sanction | ||||
| the flow. The message would have similar semantics to the `Time | ||||
| exceeded' ICMP message type. To ensure the sender has to invest some | ||||
| work before the network will generate such a message, a dropper | ||||
| SHOULD only send such a message for flows that have demonstrated that | ||||
| they have started correctly by establishing a positive record, but | ||||
| have later gone negative. The threshold is up to the implementation. | ||||
| The purpose of the message is to distinguish such drops from those with | ||||
| other causes, such as congestion or transmission losses. The dropper | ||||
| would send the message to the sender of the flow, not the receiver. | ||||
| If we did define this message type, it would be REQUIRED for all re- | ||||
| ECT senders to parse and understand it. Note that a sender MUST only | ||||
| use this message to explain why losses are occurring. A sender MUST | ||||
| NOT take this message to mean that losses have occurred that it was | ||||
| not aware of. Otherwise, spoof messages could be sent by malicious | ||||
| sources to slow down a sender (cf. ICMP source quench). | ||||
| However, the need for this message type is not yet confirmed, as we | ||||
| are considering how to prevent it being used by malicious senders to | ||||
| scan for droppers and to test their threshold settings. {ToDo: | ||||
| Complete this section.} | ||||
| 5.5.2. Rate Response Control | ||||
| As discussed in [re-ecn-motive] the sender's access operator will be | ||||
| expected to use bulk per-user policing, but they might choose to | ||||
| introduce a per-flow policer. In cases where operators do introduce | ||||
| per-flow policing, there may be a need for a sender to send a request | ||||
| to the ingress policer asking for permission to apply a non-default | ||||
| response to congestion (where TCP-friendly is assumed to be the | ||||
| default). This would require the sender to know what message | ||||
| format(s) to use and to be able to discover how to address the | ||||
| policer. The required control protocol(s) are outside the scope of | ||||
| this document, but will require definition elsewhere. | ||||
| The policer is likely to be local to the sender and inline, probably | ||||
| at the ingress interface to the internetwork. So, discovery should | ||||
| not be hard. A variety of control protocols already exist for some | ||||
| widely used rate-responses to congestion. For instance DCCP | ||||
| congestion control identifiers (CCIDs [RFC4340]) fulfil this role and | ||||
| so does QoS signalling (e.g. an RSVP request for controlled load | ||||
| service is equivalent to a request for no rate response to | ||||
| congestion, but with admission control). | ||||
| 5.6. IP in IP Tunnels | ||||
| For re-ECN to work correctly through IP in IP tunnels, it needs | ||||
| slightly different tunnel handling to regular ECN [RFC3168]. | ||||
| Currently there is some inconsistency between how the handling of IP | ||||
| in IP tunnels is defined in [RFC3168] and how it is defined in | ||||
| [RFC4301], but re-ECN would work fine with the IPsec behaviour. This | ||||
| inconsistency is addressed in a new Internet Draft [ECN-tunnel] that | ||||
| proposes to update RFC3168 tunnel behaviour to bring it into line | ||||
| with IPsec. Ideally, for re-ECN to work through a tunnel, the tunnel | ||||
| entry should copy both the RE flag and the ECN field from the inner | ||||
| to the outer IP header. Then at the tunnel exit, any congestion | ||||
| marking of the outer ECN field should overwrite the inner ECN field | ||||
| (unless the inner field is Not-ECT in which case an alarm should be | ||||
| raised). The RE flag shouldn't change along a path, so the outer RE | ||||
| flag should be the same as the inner. If it isn't, a management alarm | ||||
| should be raised. This behaviour is the same as the full- | ||||
| functionality variant of [RFC3168] at tunnel exit, but different at | ||||
| tunnel entry. | ||||
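The ideal tunnel behaviour described above can be sketched as a pair of functions. This is only an illustration with hypothetical names. It follows the text literally: copy both fields at entry, let an outer CE mark overwrite the inner ECN field at exit, and raise alarms in the two cases mentioned. Forwarding the inner RE flag when the two disagree is an assumption, since the text only says an alarm should be raised.

```python
# ECN field values as defined in RFC3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner_ecn: int, inner_re: int):
    """Tunnel entry: copy both the ECN field and the RE flag to the outer header."""
    return inner_ecn, inner_re


def decapsulate(inner_ecn: int, inner_re: int, outer_ecn: int, outer_re: int):
    """Tunnel exit: return (ecn, re, alarms) for the forwarded inner header."""
    alarms = []
    ecn = inner_ecn
    if outer_ecn == CE:
        if inner_ecn == NOT_ECT:
            # Congestion marking of a packet whose inner header is Not-ECT.
            alarms.append("CE on outer header but inner ECN field is Not-ECT")
        else:
            ecn = CE
    if outer_re != inner_re:
        # The RE flag should not change along a path.
        alarms.append("outer RE flag differs from inner RE flag")
    # Assumption of this sketch: the inner RE flag is the one forwarded.
    return ecn, inner_re, alarms
```

A RECT packet that is congestion marked inside the tunnel therefore leaves it as CE(-1), with no alarm.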
| If tunnels are left as they are specified in [RFC3168], whether the | ||||
| limited or full-functionality variants are used, a problem arises | ||||
| with re-ECN if a tunnel crosses an inter-domain boundary, because the | ||||
| difference between positive and negative markings will not be | ||||
| correctly accounted for. In a limited functionality ECN tunnel, the | ||||
| flow will appear to be RFC3168 compliant traffic, and therefore may | ||||
| be wrongly rate limited. In a full-functionality ECN tunnel, the | ||||
| result will depend on whether the tunnel entry copies the inner RE flag | ||||
| to the outer header or the RE flag in the outer header is always | ||||
| cleared. If the former, the flow will tend to be too positive when | ||||
| accounted for at borders. If the latter, it will be too negative. | ||||
| If the rules set out in [ECN-tunnel] are followed then this will not | ||||
| be an issue. | ||||
| 5.7. Non-Issues | ||||
| The following issues might seem to cause unfavourable interactions | ||||
| with re-ECN, but we will explain why they don't: | ||||
| o Various link layers support explicit congestion notification, such | ||||
| as Frame Relay and ATM. Explicit congestion notification is | ||||
| proposed to be added to other link layers, such as Ethernet | ||||
| (802.3ar Ethernet congestion management) and MPLS [RFC5129]; | ||||
| o Encryption and IPSec. | ||||
| In the case of congestion notification at the link layer, each | ||||
| particular link layer scheme either manages congestion on the link | ||||
| with its own link-level feedback (the usual arrangement in the cases | ||||
| of ATM and Frame Relay), or congestion notification from the link | ||||
| layer is merged into congestion notification at the IP level when the | ||||
| frame headers are decapsulated at the end of the link (the | ||||
| recommended arrangement in the Ethernet and MPLS cases). Given the | ||||
| RE flag is not intended to change along the path, this means that | ||||
| downstream congestion will still be measurable at any point where IP | ||||
| is processed on the path by subtracting negative from positive | ||||
| markings. | ||||
| In the case of encryption, as long as the tunnel issues described in | ||||
| Section 5.6 are dealt with, payload encryption itself will not be a | ||||
| problem. The design goal of re-ECN is to include downstream | ||||
| congestion in the IP header so that it is not necessary to dig into | ||||
| inner headers. Obfuscation of flow identifiers is not a problem for | ||||
| re-ECN policing elements. Re-ECN doesn't ever require flow | ||||
| identifiers to be valid, it only requires them to be unique. So if | ||||
| an IPSec encapsulating security payload (ESP [RFC4305]) or an | ||||
| authentication header (AH [RFC4302]) is used, the security parameters | ||||
| index (SPI) will be a sufficient flow identifier, as it is intended | ||||
| to be unique to a flow without revealing actual port numbers. | ||||
| In general, even if endpoints use some locally agreed scheme to hide | ||||
| port numbers, re-ECN policing elements can just consider the pair of | ||||
| source and destination IP addresses as the flow identifier. Re-ECN | ||||
| encourages endpoints to at least tell the network layer that a | ||||
| sequence of packets are all part of the same flow, if indeed they | ||||
| are. The alternative would be for the sender to make each packet | ||||
| appear to be a new flow, which would require them all to be marked | ||||
| FNE in order to avoid being treated with the bulk of malicious flows | ||||
| at the egress dropper. Given the FNE marking is worth +1 and | ||||
| networks are likely to rate limit FNE packets, endpoints are given an | ||||
| incentive not to set FNE on each packet. But if the sender really | ||||
| does want to hide the flow relationship between packets it can choose | ||||
| to pay the cost of multiple FNE packets, which in the long run will | ||||
| compensate for the extra memory required on network policing elements | ||||
| to process each flow. | ||||
| 6. Transport Layers | ||||
| 6.1. TCP | ||||
| Re-ECN capability at the sender is essential. At the receiver it is | Re-ECN capability at the sender is essential. At the receiver it is | |||
| optional, as long as the receiver has a basic RFC3168-compliant ECN- | optional, as long as the receiver has a basic RFC3168-compliant ECN- | |||
| capable transport (ECT) [RFC3168]. Given re-ECN is not the first | capable transport (ECT) [RFC3168]. Given re-ECN is not the first | |||
| attempt to define the semantics of the ECN field, we give a table | attempt to define the semantics of the ECN field, we give a table | |||
| below summarising what happens for various combinations of | below summarising what happens for various combinations of | |||
| capabilities of the sender S and receiver R, as indicated in the | capabilities of the sender S and receiver R, as indicated in the | |||
| first four columns below. The last column gives the mode a half- | first four columns below. The last column gives the mode a half- | |||
| connection should be in after the first two of the three TCP | connection should be in after the first two of the three TCP | |||
| handshakes. | handshakes. | |||
| skipping to change at page 17, line 5 | skipping to change at page 22, line 40 | |||
| at least one of the transports does not understand even basic ECN | at least one of the transports does not understand even basic ECN | |||
| marking. | marking. | |||
| Note that we use the term Re-ECT for a host transport that is re-ECN- | Note that we use the term Re-ECT for a host transport that is re-ECN- | |||
| capable but RECN for the modes of the half connections between hosts | capable but RECN for the modes of the half connections between hosts | |||
| when they are both Re-ECT. If a host transport is Re-ECT, this fact | when they are both Re-ECT. If a host transport is Re-ECT, this fact | |||
| alone does NOT imply either of its half connections will necessarily | alone does NOT imply either of its half connections will necessarily | |||
| be in RECN mode, at least not until it has confirmed that the other | be in RECN mode, at least not until it has confirmed that the other | |||
| host is Re-ECT. | host is Re-ECT. | |||
| 4.1.1. RECN mode: Full Re-ECN capable transport | 6.1.1. RECN mode: Full Re-ECN capable transport | |||
| In full RECN mode, for each half connection, both the sender and the | In full RECN mode, for each half connection, both the sender and the | |||
| receiver each maintain an unsigned integer counter we will call ECC | receiver each maintain an unsigned integer counter we will call ECC | |||
| (echo congestion counter). The receiver maintains a count of how | (echo congestion counter). The receiver maintains a count of how | |||
| many times a CE marked packet has arrived during the half-connection. | many times a CE marked packet has arrived during the half-connection. | |||
| Once a RECN connection is established, the three TCP option flags | Once a RECN connection is established, the three TCP option flags | |||
| (ECE, CWR & NS) used for ECN-related functions in other versions of | (ECE, CWR & NS) used for ECN-related functions in other versions of | |||
| ECN are used as a 3-bit field for the receiver to repeatedly tell the | ECN are used as a 3-bit field for the receiver to repeatedly tell the | |||
| sender the current value of ECC, modulo 8, whenever it sends a TCP | sender the current value of ECC, modulo 8, whenever it sends a TCP | |||
| ACK. We will call this the echo congestion increment (ECI) field. | ACK. We will call this the echo congestion increment (ECI) field. | |||
| This overloaded use of these 3 option flags as one 3-bit ECI field is | This overloaded use of these 3 option flags as one 3-bit ECI field is | |||
| shown in Figure 4. The actual definition of the TCP header, | shown in Figure 7. The actual definition of the TCP header, | |||
| including the addition of support for the ECN nonce, is shown for | including the addition of support for the ECN nonce, is shown for | |||
| comparison in Figure 3. This specification does not redefine the | comparison in Figure 6. This specification does not redefine the | |||
| names of these three TCP option flags, it merely overloads them with | names of these three TCP option flags, it merely overloads them with | |||
| another definition once a flow is established. | another definition once a flow is established. | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | N | C | E | U | A | P | R | S | F | | | | | N | C | E | U | A | P | R | S | F | | |||
| | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | |||
| | | | | R | E | G | K | H | T | N | N | | | | | | R | E | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 3: The (post-ECN Nonce) definition of bytes 13 and 14 of the | Figure 6: The (post-ECN Nonce) definition of bytes 13 and 14 of the | |||
| TCP Header | TCP Header | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | | U | A | P | R | S | F | | | | | | U | A | P | R | S | F | | |||
| | Header Length | Reserved | ECI | R | C | S | S | Y | I | | | Header Length | Reserved | ECI | R | C | S | S | Y | I | | |||
| | | | | G | K | H | T | N | N | | | | | | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 4: Definition of the ECI field within bytes 13 and 14 of the | Figure 7: Definition of the ECI field within bytes 13 and 14 of the | |||
| TCP Header, overloading the current definitions above for established | TCP Header, overloading the current definitions above for established | |||
| RECN flows. | RECN flows. | |||
| Receiver Action in RECN Mode | Receiver Action in RECN Mode | |||
| Every time a CE marked packet arrives at a receiver in RECN mode, | Every time a CE marked packet arrives at a receiver in RECN mode, | |||
| the receiver transport increments its local value of ECC and MUST | the receiver transport increments its local value of ECC and MUST | |||
| echo its value, modulo 8, to the sender in the ECI field of the | echo its value, modulo 8, to the sender in the ECI field of the | |||
| next ACK. It MUST repeat the same value of ECI in every | next ACK. It MUST repeat the same value of ECI in every | |||
| subsequent ACK until the next CE event, when it increments ECI | subsequent ACK until the next CE event, when it increments ECI | |||
| skipping to change at page 18, line 30 | skipping to change at page 24, line 22 | |||
| below for the sender's safety strategy). Whenever the ECI field | below for the sender's safety strategy). Whenever the ECI field | |||
| increments by D (and/or d drops are detected), the sender MUST | increments by D (and/or d drops are detected), the sender MUST | |||
| clear the RE flag to "0" in the IP header of the next D' data | clear the RE flag to "0" in the IP header of the next D' data | |||
| packets it sends (where D' = D + d), effectively re-echoing each | packets it sends (where D' = D + d), effectively re-echoing each | |||
| single increment of ECI. Otherwise the data sender MUST send all | single increment of ECI. Otherwise the data sender MUST send all | |||
| data packets with RE set to "1". | data packets with RE set to "1". | |||
| As a general rule, once a flow is established, as well as setting | As a general rule, once a flow is established, as well as setting | |||
| or clearing the RE flag as above, a data sender in RECN mode MUST | or clearing the RE flag as above, a data sender in RECN mode MUST | |||
| always set the ECN field to ECT(1). However, the settings of the | always set the ECN field to ECT(1). However, the settings of the | |||
| extended ECN field during flow start are defined in Section 4.1.4. | extended ECN field during flow start are defined in Section 6.1.4. | |||
| As we have already emphasised, the re-ECN protocol makes no | As we have already emphasised, the re-ECN protocol makes no | |||
| changes and has no effect on the TCP congestion control algorithm. | changes and has no effect on the TCP congestion control algorithm. | |||
| So, the first increment of ECI (or detection of a drop) in a RTT | So, the first increment of ECI (or detection of a drop) in a RTT | |||
| triggers the standard TCP congestion response, no more than one | triggers the standard TCP congestion response, no more than one | |||
| congestion response per round trip, as usual. However, the sender | congestion response per round trip, as usual. However, the sender | |||
| re-echoes every increment of ECI irrespective of RTTs. | re-echoes every increment of ECI irrespective of RTTs. | |||
| A TCP sender also acts as the receiver for the other half- | A TCP sender also acts as the receiver for the other half- | |||
| connection. The host will maintain two ECC values S.ECC and R.ECC | connection. The host will maintain two ECC values S.ECC and R.ECC | |||
| as sender and receiver respectively. Every TCP header sent by a | as sender and receiver respectively. Every TCP header sent by a | |||
| host in RECN mode will also repeat the prevailing value of R.ECC | host in RECN mode will also repeat the prevailing value of R.ECC | |||
| in its ECI field. If a sender in RECN mode has to retransmit a | in its ECI field. If a sender in RECN mode has to retransmit a | |||
| packet due to a suspected loss, the re-transmitted packet MUST | packet due to a suspected loss, the re-transmitted packet MUST | |||
| carry the latest prevailing value of R.ECC when it is re- | carry the latest prevailing value of R.ECC when it is re- | |||
| transmitted, which will not necessarily be the one it carried | transmitted, which will not necessarily be the one it carried | |||
| originally. | originally. | |||
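The receiver and sender actions in RECN mode can be sketched as two small state machines. This is a minimal illustration with hypothetical class and method names, not a TCP implementation. It assumes both counters start at zero, whereas the actual flow-start values are defined in Section 6.1.4, and it ignores the safety strategy for long runs of lost ACKs.

```python
class RecnReceiver:
    """Counts CE marks for one half-connection and echoes the count modulo 8."""

    def __init__(self):
        self.ecc = 0  # echo congestion counter (assumed to start at 0)

    def on_packet(self, ce_marked: bool) -> None:
        if ce_marked:
            self.ecc += 1

    def eci(self) -> int:
        """Value carried in the 3-bit ECI field of every ACK."""
        return self.ecc % 8


class RecnSender:
    """Re-echoes every ECI increment (and every detected drop) with RE = 0."""

    def __init__(self):
        self.last_eci = 0          # last ECI value seen (assumed to start at 0)
        self.pending_re_echo = 0   # D' packets still to be sent with RE cleared

    def on_ack(self, eci: int, drops: int = 0) -> None:
        increment = (eci - self.last_eci) % 8   # D, allowing for wrap-around
        self.last_eci = eci
        self.pending_re_echo += increment + drops   # D' = D + d

    def next_re_flag(self) -> int:
        """RE flag for the next data packet: 0 re-echoes one congestion event."""
        if self.pending_re_echo > 0:
            self.pending_re_echo -= 1
            return 0
        return 1
```

After three CE marks are echoed in one ACK, the sender's next three data packets carry RE = 0 and later ones revert to RE = 1.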
| 4.1.1.1. Drops and Marks | 6.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN | |||
| Re-ECN is based on the ECN protocol [RFC3168]. In turn the | ||||
| congestion markings ECN uses are typically based on the RED | ||||
| algorithm [RFC2309]. This algorithm marks packets as CE with a | ||||
| probability that increases as the size of the router queue increases. | ||||
| However, if the queue becomes too full then it will revert to | ||||
| dropping packets. Because of this it is important that a re-ECN | ||||
| sender treats each packet drop it detects as if it were actually a CE | ||||
| mark. This ensures that it can continue to correctly echo congestion | ||||
| even through a highly congested path. | ||||
| In order to ensure that drops are correctly echoed the sender needs | ||||
| to add the number of drops detected per RTT to the difference in ECI | ||||
| value waiting to be echoed. Drop detection is defined as set out in | ||||
| [RFC2581] -- if the connection is in slow start then a single | ||||
| duplicate acknowledgement will be treated as an indication of a drop. | ||||
| When the system is in the congestion avoidance stage then 3 duplicate | ||||
| acknowledgements will be treated as a sign of a drop. In all cases, | ||||
| if a re-transmission time-out occurs then that will be treated as a | ||||
| drop. | ||||
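
The drop-detection rules above can be sketched as follows. This is only an illustration of the thresholds as this section states them; the function and parameter names are hypothetical and do not come from the draft.

```python
def drop_detected(in_slow_start: bool, dup_acks: int, rto_expired: bool) -> bool:
    """Return True if a re-ECN sender should treat the event as a drop.

    Thresholds follow the text of this section: one duplicate ACK in slow
    start, three in congestion avoidance, and any retransmission time-out.
    """
    if rto_expired:
        return True
    threshold = 1 if in_slow_start else 3
    return dup_acks >= threshold


def pending_echo(eci_difference: int, drops_this_rtt: int) -> int:
    """Add the drops detected in this RTT to the ECI difference still to be echoed."""
    return eci_difference + drops_this_rtt
```

Treating each detected drop as one extra CE mark keeps the sender's echo count from falling behind when a queue overflows and reverts to dropping.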
| 4.1.1.2. Safety against Long Pure ACK Loss Sequences | ||||
| The ECI method was chosen for echoing congestion marking because a | ||||
| re-ECN sender needs to know about every CE mark arriving at the | ||||
| receiver, not just whether at least one arrives within a round trip | ||||
| time (which is all the ECE/CWR mechanism supported). And, as pure | ||||
| ACKs are not protected by TCP reliable delivery, we repeat the same | ||||
| ECI value in every ACK until it changes. Even if many ACKs in a row | ||||
| are lost, as soon as one gets through, the ECI field it repeats from | ||||
| previous ACKs that didn't get through will update the sender on how | ||||
| many CE marks arrived since the last ACK got through. | ||||
| The sender will only lose a record of the arrival of a CE mark if all | ||||
| the ACKs are lost (and all of them were pure ACKs) for a stream of | ||||
| data long enough to contain 8 or more CE marks. So, if the marking | ||||
| fraction was p, at least 8/p pure ACKs would have to be lost. For | ||||
| example, if p was 5%, a sequence of 160 pure ACKs would all have to | ||||
| be lost. To protect against such extremely unlikely events, if a re- | ||||
| ECN sender detects a sequence of pure ACKs has been lost it SHOULD | ||||
| assume the ECI field wrapped as many times as possible within the | ||||
| sequence. | ||||
| Specifically, if a re-ECN sender receives an ACK with an | ||||
| acknowledgement number that acknowledges L segments since the | ||||
| previous ACK but with a sequence number unchanged from the previously | ||||
| received ACK, it SHOULD conservatively assume that the ECI field | ||||
| incremented by D' = L - ((L-D) mod 8), where D is the apparent | ||||
| increase in the ECI field. For example if the ACK arriving after 9 | ||||
| pure ACK losses apparently increased ECI by 2, the assumed increment | ||||
| of ECI would still be 2. But if ECI apparently increased by 2 after | ||||
| 11 pure ACK losses, ECI should be assumed to have increased by 10. | ||||
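
As a rough check of the formula D' = L - ((L-D) mod 8), the sketch below reproduces the two worked examples from this paragraph, taking L as 9 and 11 respectively. The function name and the guard for L < D are assumptions added for illustration; the draft does not say what to do in that case.

```python
ECI_MODULUS = 8  # the ECI field is 3 bits wide, so it wraps every 8 marks


def assumed_eci_increment(acked_segments: int, apparent_increase: int) -> int:
    """Conservative ECI increment D' = L - ((L - D) mod 8).

    acked_segments    -- L, segments acknowledged since the previous ACK
    apparent_increase -- D, apparent increase of the ECI field (0..7)
    """
    L, D = acked_segments, apparent_increase
    if not 0 <= D < ECI_MODULUS:
        raise ValueError("apparent increase must fit in the 3-bit ECI field")
    if L < D:
        # Assumption, not from the draft: never assume less than was observed.
        return D
    return L - ((L - D) % ECI_MODULUS)
```

The result is the largest value that does not exceed L and is congruent to D modulo 8, which is what "wrapped as many times as possible" means here.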
| A re-ECN sender MAY implement a heuristic algorithm to predict beyond | ||||
| reasonable doubt that the ECI field probably did not wrap within a | ||||
| sequence of lost pure ACKs. But such an algorithm is OPTIONAL. Such | ||||
| an algorithm MUST NOT be used unless it is proven to work even in the | ||||
| presence of correlation between high ACK loss rate on the back | ||||
| channel and high CE marking rate on the forward channel. | ||||
| Whatever assumption a re-ECN sender makes about potentially lost CE | ||||
| marks, both its congestion control and its re-echoing behaviour | ||||
| SHOULD be consistent with the assumption it makes. | ||||
| 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN | ||||
| Receiver | Receiver | |||
| If the half-connection is in RECN-Co mode, ECN feedback proceeds no | If the half-connection is in RECN-Co mode, ECN feedback proceeds no | |||
| differently to that of RFC3168 compliant ECN. In other words, the | differently to that of RFC3168 compliant ECN. In other words, the | |||
| receiver sets the ECE flag repeatedly in the TCP header and the | receiver sets the ECE flag repeatedly in the TCP header and the | |||
| sender responds by setting the CWR flag. Although RECN-Co mode is | sender responds by setting the CWR flag. Although RECN-Co mode is | |||
| used when the receiver has not implemented the re-ECN protocol, the | used when the receiver has not implemented the re-ECN protocol, the | |||
| sender can infer enough from its RFC3168 compliant ECN feedback to | sender can infer enough from its RFC3168 compliant ECN feedback to | |||
| set or clear the RE flag reasonably well. Specifically, every time | set or clear the RE flag reasonably well. Specifically, every time | |||
| the receiver toggles the ECE field from "0" to "1" (or a loss is | the receiver toggles the ECE field from "0" to "1" (or a loss is | |||
| skipping to change at page 20, line 45 | skipping to change at page 25, line 19 | |||
| packets with RE set to "1". Once a flow is established, a re-ECN | packets with RE set to "1". Once a flow is established, a re-ECN | |||
| data sender in RECN-Co mode MUST always set the ECN field to ECT(1). | data sender in RECN-Co mode MUST always set the ECN field to ECT(1). | |||
| If a CE marked packet arrives at the receiver within a round trip | If a CE marked packet arrives at the receiver within a round trip | |||
| time of a previous mark, the receiver will still be echoing ECE for | time of a previous mark, the receiver will still be echoing ECE for | |||
| the last CE mark. Therefore, such a mark will be missed by the | the last CE mark. Therefore, such a mark will be missed by the | |||
| sender. Of course, this isn't of concern for congestion control, but | sender. Of course, this isn't of concern for congestion control, but | |||
| it does mean that very occasionally the RE blanking fraction will be | it does mean that very occasionally the RE blanking fraction will be | |||
| understated. Therefore flows in RECN-Co mode may occasionally be | understated. Therefore flows in RECN-Co mode may occasionally be | |||
| mistaken for very lightly cheating flows and consequently might | mistaken for very lightly cheating flows and consequently might | |||
| suffer a small number of packet drops through an egress dropper | suffer a small number of packet drops through an egress dropper. We | |||
| (Section 6.1.4). We expect re-ECN would be deployed for some time | expect re-ECN would be deployed for some time before policers and | |||
| before policers and droppers start to enforce it. So, given there is | droppers start to enforce it. So, given there is not much ECN | |||
| not much ECN deployment yet anyway, this minor problem may affect | deployment yet anyway, this minor problem may affect only a very | |||
| only a very small proportion of flows, reducing to nothing over the | small proportion of flows, reducing to nothing over the years as | |||
| years as RFC3168 compliant ECN hosts upgrade. The use of RECN-Co | RFC3168 compliant ECN hosts upgrade. The use of RECN-Co mode would | |||
| mode would need to be reviewed in the light of experience at the time | need to be reviewed in the light of experience at the time of re-ECN | |||
| of re-ECN deployment. | deployment. | |||
| RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their | RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their | |||
| code simple, MAY choose not to implement this mode. If they do not, | code simple, MAY choose not to implement this mode. If they do not, | |||
| a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the | a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the | |||
| presence of an ECN-capable receiver. It MAY choose to fall back to | presence of an ECN-capable receiver. It MAY choose to fall back to | |||
| the ECT-Nonce mode, but if re-ECN implementers don't want to be | the ECT-Nonce mode, but if re-ECN implementers don't want to be | |||
| bothered with RECN-Co mode, they probably won't want to add an ECT- | bothered with RECN-Co mode, they probably won't want to add an ECT- | |||
| Nonce mode either. | Nonce mode either. | |||
| 4.1.2.1. Re-ECN support for the ECN Nonce | 6.1.2.1. Re-ECN support for the ECN Nonce | |||
| A TCP half-connection in RECN-Co mode MUST NOT support the ECN | A TCP half-connection in RECN-Co mode MUST NOT support the ECN | |||
| Nonce [RFC3540]. This means that the sending code of a re-ECN | Nonce [RFC3540]. This means that the sending code of a re-ECN | |||
| implementation will never need to include ECN Nonce support. Re-ECN | implementation will never need to include ECN Nonce support. Re-ECN | |||
| is intended to provide wider protection than the ECN nonce against | is intended to provide wider protection than the ECN nonce against | |||
| congestion control misbehaviour, and re-ECN only requires support | congestion control misbehaviour, and re-ECN only requires support | |||
| from the sender, therefore it is preferable to specifically rule out | from the sender, therefore it is preferable to specifically rule out | |||
| the need for dual sender implementations. As a consequence, a re-ECN | the need for dual sender implementations. As a consequence, a re-ECN | |||
| capable sender will never set ECT(0), so it will be easier for | capable sender will never set ECT(0), so it will be easier for | |||
| network elements to discriminate re-ECN traffic flows from other ECN | network elements to discriminate re-ECN traffic flows from other ECN | |||
| skipping to change at page 21, line 41 | skipping to change at page 26, line 15 | |||
| RFC3540 allows an ECN nonce sender to choose whether to sanction a | RFC3540 allows an ECN nonce sender to choose whether to sanction a | |||
| receiver that does not ever set the nonce sum. Given re-ECN is | receiver that does not ever set the nonce sum. Given re-ECN is | |||
| intended to provide wider protection than the ECN nonce against | intended to provide wider protection than the ECN nonce against | |||
| congestion control misbehaviour, implementers of re-ECN receivers MAY | congestion control misbehaviour, implementers of re-ECN receivers MAY | |||
| choose not to implement backwards compatibility with the ECN nonce | choose not to implement backwards compatibility with the ECN nonce | |||
| capability. This may be because they deem that the risk of sanctions | capability. This may be because they deem that the risk of sanctions | |||
| is low, perhaps because significant deployment of the ECN nonce seems | is low, perhaps because significant deployment of the ECN nonce seems | |||
| unlikely at implementation time. | unlikely at implementation time. | |||
| 4.1.3. Capability Negotiation | 6.1.3. Capability Negotiation | |||
| During the TCP hand-shake at the start of a connection, an originator | During the TCP hand-shake at the start of a connection, an originator | |||
| of the connection (host A) with a re-ECN-capable transport MUST | of the connection (host A) with a re-ECN-capable transport MUST | |||
| indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1 | indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1 | |||
| in the initial SYN. | in the initial SYN. | |||
| A responding Re-ECT host (host B) MUST return a SYN ACK with flags | A responding Re-ECT host (host B) MUST return a SYN ACK with flags | |||
| CWR=1 and ECE=0. The responding host MUST NOT set this combination | CWR=1 and ECE=0. The responding host MUST NOT set this combination | |||
| of flags unless the preceding SYN has already indicated Re-ECT | of flags unless the preceding SYN has already indicated Re-ECT | |||
| support as above. Normally a Re-ECT server (B) will reply to a Re- | support as above. Normally a Re-ECT server (B) will reply to a Re- | |||
| skipping to change at page 23, line 19 | skipping to change at page 27, line 42 | |||
| preceding SYN (because there is a broken RFC3168 compliant | preceding SYN (because there is a broken RFC3168 compliant | |||
| implementation that behaves this way), RFC3168 specifies that the | implementation that behaves this way), RFC3168 specifies that the | |||
| whole connection MUST revert to Not-ECT. | whole connection MUST revert to Not-ECT. | |||
| Also note that, whenever the SYN flag of a TCP segment is set | Also note that, whenever the SYN flag of a TCP segment is set | |||
| (including when the ACK flag is also set), the NS, CWR and ECE flags | (including when the ACK flag is also set), the NS, CWR and ECE flags | |||
| (i.e. the ECI field of the SYN ACK) MUST NOT be interpreted as the | (i.e. the ECI field of the SYN ACK) MUST NOT be interpreted as the | |||
| 3-bit ECI value, which is only set as a copy of the local ECC value | 3-bit ECI value, which is only set as a copy of the local ECC value | |||
| in non-SYN packets. | in non-SYN packets. | |||
| 4.1.4. Extended ECN (EECN) Field Settings during Flow Start or after | 6.1.4. Extended ECN (EECN) Field Settings during Flow Start or after | |||
| Idle Periods | Idle Periods | |||
| If the originator (A) of a TCP connection supports re-ECN it MUST set | If the originator (A) of a TCP connection supports re-ECN it MUST set | |||
| the extended ECN (EECN) field in the IP header of the initial SYN | the extended ECN (EECN) field in the IP header of the initial SYN | |||
| packet to the feedback not established (FNE) codepoint. | packet to the feedback not established (FNE) codepoint. | |||
| FNE is a new extended ECN codepoint defined by this specification | FNE is a new extended ECN codepoint defined by this specification | |||
| (Section 3.3). The feedback not established (FNE) codepoint is used | (Section 4.2). The feedback not established (FNE) codepoint is used | |||
| when the transport does not have the benefit of ECN feedback so it | when the transport does not have the benefit of ECN feedback so it | |||
| cannot decide whether to set or clear the RE flag. | cannot decide whether to set or clear the RE flag. | |||
| If after receiving a SYN the server B has set its sending half- | If after receiving a SYN the server B has set its sending half- | |||
| connection into RECN mode or RECN-Co mode, it MUST set the extended | connection into RECN mode or RECN-Co mode, it MUST set the extended | |||
| ECN field in the IP header of its SYN ACK to the feedback not | ECN field in the IP header of its SYN ACK to the feedback not | |||
| established (FNE) codepoint. Note the careful wording here, which | established (FNE) codepoint. Note the careful wording here, which | |||
| means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | |||
| responding to a SYN from a Re-ECT client or from a client that is | responding to a SYN from a Re-ECT client or from a client that is | |||
| merely ECN-capable. This is because FNE indicates the transport is | merely ECN-capable. This is because FNE indicates the transport is | |||
| skipping to change at page 27, line 5 | skipping to change at page 31, line 9 | |||
| trip time. We use the lower bound of the retransmission timeout | trip time. We use the lower bound of the retransmission timeout | |||
| (RTO) [RFC2988], which is commonly used as the idle period before TCP | (RTO) [RFC2988], which is commonly used as the idle period before TCP | |||
| must reduce to the restart window [RFC2581]. Note our specification | must reduce to the restart window [RFC2581]. Note our specification | |||
| of re-ECN's idle period is NOT intended to change the idle period for | of re-ECN's idle period is NOT intended to change the idle period for | |||
| TCP's restart, nor indeed for any other purposes. | TCP's restart, nor indeed for any other purposes. | |||
| {ToDo: Describe how the sender falls back to RFC3168 modes if packets | {ToDo: Describe how the sender falls back to RFC3168 modes if packets | |||
| don't appear to be getting through (to work round firewalls | don't appear to be getting through (to work round firewalls | |||
| discarding packets they consider unusual).} | discarding packets they consider unusual).} | |||
| 4.1.5. Pure ACKS, Retransmissions, Window Probes and Partial ACKs | 6.1.5. Pure ACKS, Retransmissions, Window Probes and Partial ACKs | |||
| A re-ECN sender MUST clear the RE flag to "0" and set the ECN field | A re-ECN sender MUST clear the RE flag to "0" and set the ECN field | |||
| to Not-ECT in pure ACKs, retransmissions and window probes, as | to Not-ECT in pure ACKs, retransmissions and window probes, as | |||
| specified in [RFC3168]. Our eventual goal is for all packets to be | specified in [RFC3168]. Our eventual goal is for all packets to be | |||
| sent with re-ECN enabled, and we believe the semantics of the ECI | sent with re-ECN enabled, and we believe the semantics of the ECI | |||
| field go a long way towards being able to achieve this. However, we | field go a long way towards being able to achieve this. However, we | |||
| have not completed a full security analysis for these cases, | have not completed a full security analysis for these cases, | |||
| therefore, currently we merely re-state current practice. | therefore, currently we merely re-state current practice. | |||
| We must also reconcile the facts that congestion marking is applied | We must also reconcile the facts that congestion marking is applied | |||
| skipping to change at page 27, line 47 | skipping to change at page 32, line 5 | |||
| through the variable R. | through the variable R. | |||
| This does not ensure precisely the same number of octets have RE | This does not ensure precisely the same number of octets have RE | |||
| blanked as were CE marked. But we believe positive errors will | blanked as were CE marked. But we believe positive errors will | |||
| cancel negative over a long enough period. {ToDo: However, more | cancel negative over a long enough period. {ToDo: However, more | |||
| research is needed to prove whether this is so. If it is not, it may | research is needed to prove whether this is so. If it is not, it may | |||
| be necessary to increment and decrement R in octets rather than | be necessary to increment and decrement R in octets rather than | |||
| packets, by incrementing R as the product of D and the size in octets | packets, by incrementing R as the product of D and the size in octets | |||
| of packets being sent (typically the MSS).} | of packets being sent (typically the MSS).} | |||
| 4.2. Other Transports | 6.2. Other Transports | |||
| 4.2.1. General Guidelines for Adding Re-ECN to Other Transports | 6.2.1. General Guidelines for Adding Re-ECN to Other Transports | |||
| As a general rule, Re-ECT sender transports that have established the | As a general rule, Re-ECT sender transports that have established the | |||
| receiver transport is at least ECN-capable (not necessarily re-ECN | receiver transport is at least ECN-capable (not necessarily re-ECN | |||
| capable) MUST blank the RE codepoint for at least as many octets as | capable) MUST blank the RE codepoint for at least as many octets as | |||
| arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | |||
| transports should always initialise the ECN field to the ECT(1) | transports should always initialise the ECN field to the ECT(1) | |||
| codepoint once a flow is established. | codepoint once a flow is established. | |||
| If the sender transport does not have sufficient feedback to even | If the sender transport does not have sufficient feedback to even | |||
| estimate the path's CE rate, it SHOULD set FNE continuously. If the | estimate the path's CE rate, it SHOULD set FNE continuously. If the | |||
| skipping to change at page 28, line 32 | skipping to change at page 32, line 39 | |||
| following: | following: | |||
| o UDP fire and forget (e.g. DNS) | o UDP fire and forget (e.g. DNS) | |||
| o UDP streaming with no feedback | o UDP streaming with no feedback | |||
| o UDP streaming with feedback | o UDP streaming with feedback | |||
| } | } | |||
| 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | 6.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | |||
| A separate I-D has been submitted [Re-PCN] describing how re-ECN can | A separate I-D has been submitted [Re-PCN] describing how re-ECN can | |||
| be used in an edge-to-edge rather than end-to-end scenario. It can | be used in an edge-to-edge rather than end-to-end scenario. It can | |||
| then be used by downstream networks to police whether upstream | then be used by downstream networks to police whether upstream | |||
| networks are blocking new flow reservations when downstream | networks are blocking new flow reservations when downstream | |||
| congestion is too high, even though the congestion is in other | congestion is too high, even though the congestion is in other | |||
| operators' downstream networks. This relates to current IETF work on | operators' downstream networks. This relates to current IETF work on | |||
| Admission Control over Diffserv using Pre-Congestion Notification | Admission Control over Diffserv using Pre-Congestion Notification | |||
| (PCN) [PCN-arch]. | (PCN) [PCN-arch]. | |||
| 4.2.3. Guidelines for adding Re-ECN to DCCP | 6.2.3. Guidelines for adding Re-ECN to DCCP | |||
| Besides adjusting the initial features negotiation sequence, operating | Besides adjusting the initial features negotiation sequence, operating | |||
| re-ECN in DCCP [RFC4340] could be achieved by defining a new option | re-ECN in DCCP [RFC4340] could be achieved by defining a new option | |||
| to be added to acknowledgments, that would include a multibit field | to be added to acknowledgments, that would include a multibit field | |||
| where the destination could copy its ECC. | where the destination could copy its ECC. | |||
| 4.2.4. Guidelines for adding Re-ECN to SCTP | 6.2.4. Guidelines for adding Re-ECN to SCTP | |||
| Appendix A in [RFC4960] gives the specifications for SCTP to support | Appendix A in [RFC4960] gives the specifications for SCTP to support | |||
| ECN. Similar steps should be taken to support re-ECN. Besides | ECN. Similar steps should be taken to support re-ECN. Besides | |||
| adjusting the initial features negotiation sequence, operating re-ECN | adjusting the initial features negotiation sequence, operating re-ECN | |||
| in SCTP could be achieved by defining a new control chunk, that would | in SCTP could be achieved by defining a new control chunk, that would | |||
| include a multibit field where the destination could copy its ECC. | include a multibit field where the destination could copy its ECC. | |||
| 5. Network Layer | ||||
| 5.1. Re-ECN IPv4 Wire Protocol | ||||
| The wire protocol of the ECN field in the IP header remains largely | ||||
| unchanged from [RFC3168]. However, an extension to the ECN field we | ||||
| call the RE (Re-ECN extension) flag (Section 3.3) is defined in this | ||||
| document. It doubles the extended ECN codepoint space, giving 8 | ||||
| potential codepoints. The semantics of the extra codepoints are | ||||
| backward compatible with the semantics of the 4 original codepoints | ||||
| [RFC3168] (Section 7.1 collects together and summarises all the | ||||
| changes defined in this document). | ||||
| For IPv4, this document proposes that the new RE control flag will be | ||||
| positioned where the `reserved' control flag was at bit 48 of the | ||||
| IPv4 header (counting from 0). Alternatively, some would call this | ||||
| bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4 | ||||
| header (Figure 5). | ||||
| 0 1 2 | ||||
| +---+---+---+ | ||||
| | R | D | M | | ||||
| | E | F | F | | ||||
| +---+---+---+ | ||||
| Figure 5: New Definition of the Re-ECN Extension (RE) Control Flag at | ||||
| the Start of Byte 7 of the IPv4 Header | ||||
| The semantics of the RE flag are described in outline in Section 3 | ||||
| and specified fully in Section 4. The RE flag is always considered | ||||
| in conjunction with the 2-bit ECN field, as if they were concatenated | ||||
| together to form a 3-bit extended ECN field. If the ECN field is set | ||||
| to either the ECT(1) or CE codepoint, when the RE flag is blanked | ||||
| (cleared to "0") it represents a re-echo of congestion experienced by | ||||
| an early packet. If the ECN field is set to the Not-ECT codepoint, | ||||
| when the RE flag is set to "1" it represents the feedback not | ||||
| established (FNE) codepoint, which signals that the packet was sent | ||||
| without the benefit of congestion feedback. | ||||
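
To make the proposed bit positions concrete, the following sketch reads the 3-bit extended ECN field out of a raw IPv4 header. It assumes the RE flag sits at bit 48 as proposed in this section (the top bit of the seventh octet) and that the ECN field occupies the two low-order bits of the second octet as in [RFC3168]. The function names are hypothetical.

```python
def extended_ecn(ipv4_header: bytes) -> tuple[int, int]:
    """Return (ecn_field, re_flag) from a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("an IPv4 header is at least 20 octets long")
    ecn_field = ipv4_header[1] & 0b11      # low 2 bits of the former TOS octet
    re_flag = (ipv4_header[6] >> 7) & 1    # bit 48: previously the reserved flag
    return ecn_field, re_flag


def is_fne(ecn_field: int, re_flag: int) -> bool:
    """Not-ECT (00) with RE set signals feedback not established."""
    return ecn_field == 0b00 and re_flag == 1


def is_re_echo(ecn_field: int, re_flag: int) -> bool:
    """ECT(1) (01) or CE (11) with RE blanked is a re-echo of earlier congestion."""
    return ecn_field in (0b01, 0b11) and re_flag == 0
```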
| It is believed that the FNE codepoint can simultaneously serve other | ||||
| purposes, particularly where the start of a flow needs distinguishing | ||||
| from packets later in the flow. For instance it would have been | ||||
| useful to identify new flows for tag switching and might enable | ||||
| similar developments in the future if it were adopted. It is similar | ||||
| to the state set-up bit idea designed to protect against memory | ||||
| exhaustion attacks. This idea was proposed informally by David Clark | ||||
| and documented by Handley and Greenhalgh [Steps_DoS]. The FNE | ||||
| codepoint can be thought of as a `soft-state set-up flag', because it | ||||
| is idempotent (i.e. one occurrence of the flag is sufficient but | ||||
| further occurrences achieve the same effect if previous ones were | ||||
| lost). | ||||
| We expect there will be other claims pending on the use of | ||||
| bit 48. We know of at least two [ARI05], [RFC3514], but neither has | ||||
| been pursued in the IETF so far, although the present proposal would | ||||
| meet the needs of the former. | ||||
| The security flag proposal (commonly known as the evil bit) was | ||||
| published on 1 April 2003 as Informational RFC 3514, but it was not | ||||
| adopted due to confusion over whether evil-doers might set it | ||||
| inappropriately. The present proposal is backward compatible with | ||||
| RFC3514 because if re-ECN compliant senders were benign they would | ||||
| correctly clear the evil bit to honestly declare that they had just | ||||
| received congestion feedback. Whereas evil-doers would hide | ||||
| congestion feedback by setting the evil bit continuously, or at least | ||||
| more often than they should. So, evil senders can be identified, | ||||
| because they declare that they are good less often than they should. | ||||
| 5.2. Re-ECN IPv6 Wire Protocol | ||||
| For IPv6, this document proposes that the new RE control flag will be | ||||
| positioned as the first bit of the option field of a new Congestion | ||||
| hop by hop option header (Figure 6). | ||||
| 0 1 2 3 | ||||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| | Next Header | Hdr ext Len | Option Type | Opt Length =4 | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| |R| Reserved for future use | | ||||
| |E| | | ||||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | ||||
| Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option | ||||
| Header containing the re-ECN Extension (RE) Control Flag | ||||
| 0 1 2 3 4 5 6 7 8 | ||||
| +-+-+-+-+-+-+-+-+- | ||||
| |AIU|C|Option ID| | ||||
| +-+-+-+-+-+-+-+-+- | ||||
| Figure 7: Congestion Hop by Hop Option Type Encoding | ||||
| The Hop-by-Hop Options header enables packets to carry information to | ||||
| be examined and processed by routers or nodes along the packet's | ||||
| delivery path, including the source and destination nodes. For re- | ||||
| ECN, the two bits of the Action If Unrecognized (AIU) flag of the | ||||
| Congestion extension header MUST be set to "00" meaning if | ||||
| unrecognized `skip over option and continue processing the header'. | ||||
| Then, any routers or a receiver not upgraded with the optional re-ECN | ||||
| features described in this memo will simply ignore this header. But | ||||
| routers with these optional re-ECN features or a re-ECN policing | ||||
| function, will process this Congestion extension header. | ||||
| The `C' flag MUST be set to "1" to specify that the Option Data | ||||
| (currently only the RE control flag) can change en-route to the | ||||
| packet's final destination. This ensures that, when an | ||||
| Authentication header (AH [RFC4302]) is present in the packet, for | ||||
| any option whose data may change en-route, its entire Option Data | ||||
| field will be treated as zero-valued octets when computing or | ||||
| verifying the packet's authenticating value. | ||||
| Although the RE control flag should not be changed along the path, we | ||||
| expect that the rest of this option field that is currently `Reserved | ||||
| for future use' could be used for a multi-bit congestion notification | ||||
| field which we would expect to change en route. As the RE flag does | ||||
| not need end-to-end authentication, we set the C flag to '1'. | ||||
| {ToDo: A Congestion Hop by Hop Option ID will need to be registered | ||||
| with IANA.} | ||||
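
The layout in Figures 6 and 7 could be assembled as sketched below. The Option ID value is a placeholder chosen purely for illustration, since no value has been registered. The Hdr Ext Len of 0 follows the usual IPv6 rule that the length is counted in 8-octet units beyond the first 8 octets. All names here are hypothetical.

```python
CONGESTION_OPTION_ID = 0b11110  # placeholder only: no Option ID has been assigned


def congestion_option_type(option_id: int = CONGESTION_OPTION_ID) -> int:
    """Encode the Option Type octet: 2-bit AIU, 1-bit C flag, 5-bit Option ID."""
    aiu = 0b00   # skip over the option if it is not recognised
    c_flag = 1   # option data may change en route
    return (aiu << 6) | (c_flag << 5) | (option_id & 0b11111)


def congestion_hbh_header(next_header: int, re_flag: int) -> bytes:
    """Build the 8-octet Congestion hop-by-hop header of Figure 6."""
    hdr_ext_len = 0                    # no 8-octet units beyond the first
    opt_len = 4                        # four octets of option data
    option_data = (re_flag & 1) << 31  # RE is the first bit; the rest is reserved
    return bytes([next_header, hdr_ext_len, congestion_option_type(), opt_len]) \
        + option_data.to_bytes(4, "big")
```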
| 5.3. Router Forwarding Behaviour | ||||
| Re-ECN works well without modifying the forwarding behaviour of any | ||||
| routers. However, below, two OPTIONAL changes to forwarding | ||||
| behaviour are defined which respectively enhance performance and | ||||
| improve a router's discrimination against flooding attacks. They are | ||||
| both OPTIONAL additions that we propose MAY apply by default to all | ||||
| Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | ||||
| marking behaviours [RFC3168]. Specifications for PHBs MAY define | ||||
| different forwarding behaviours from this default, but this is not | ||||
| required. [Re-PCN] is one example. | ||||
| FNE indicates ECT: | ||||
| The FNE codepoint tells a router to assume that the packet was | ||||
| sent by an ECN-capable transport (see Section 5.4). Therefore an | ||||
| FNE packet MAY be marked rather than dropped. Note that the FNE | ||||
| codepoint has been intentionally chosen so that, to RFC3168 | ||||
| compliant routers (which do not inspect the RE flag) an FNE packet | ||||
| appears to be Not-ECT so it will be dropped by legacy AQM | ||||
| algorithms. | ||||
| A network operator MUST NOT configure a queue to ECN mark rather | ||||
| than drop FNE packets unless it can guarantee that FNE packets | ||||
| will be rate limited, either locally or upstream. The ingress | ||||
| policers discussed in Section 6.1.5 would count as rate limiters | ||||
| for this purpose. | ||||
| Preferential Drop: If a re-ECN capable router queue experiences very | ||||
| high load so that it has to drop arriving packets (e.g. a DoS | ||||
| attack), it MAY preferentially drop packets within the same | ||||
| Diffserv PHB using the preference order for extended ECN | ||||
| codepoints given in Table 7. Preferential dropping can be | ||||
| difficult to implement on some hardware, but if feasible it would | ||||
| discriminate against attack traffic if done as part of the overall | ||||
| policing framework of Section 6.1.3. If nowhere else, routers at | ||||
| the egress of a network SHOULD implement preferential drop | ||||
| (stronger than the MAY above). For simplicity, preferences 4 & 5 | ||||
| MAY be merged into one preference level. | ||||
| +-------+-----+------------+-------+------------+-------------------+ | ||||
| | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | ||||
| | field | bit | ECN | | (1 = drop | | | ||||
| | | | codepoint | | 1st) | | | ||||
| +-------+-----+------------+-------+------------+-------------------+ | ||||
| | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | ||||
| | | | | | | congestion and | | ||||
| | | | | | | RECT | | ||||
| | 00 | 1 | FNE | +1 | 4 | Feedback not | | ||||
| | | | | | | established | | ||||
| | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | ||||
| | | | | | | by congestion | | ||||
| | | | | | | experienced | | ||||
| | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | ||||
| | | | | | | transport | | ||||
| | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | ||||
| | | | | | | experienced | | ||||
| | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | ||||
| | 10 | 0 | --- | n/a | 2 | RFC3168 ECN use | | ||||
| | | | | | | only | | ||||
| | 00 | 0 | Not-RECT | n/a | 1 | Not | | ||||
| | | | | | | Re-ECN-capable | | ||||
| | | | | | | transport | | ||||
| +-------+-----+------------+-------+------------+-------------------+ | ||||
| Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | ||||
| The above drop preferences are arranged to preserve packets with | ||||
| more positive worth (Section 3.5), given senders of positive | ||||
| packets must have honestly declared downstream congestion. This | ||||
| is explained fully in Section 6 on applications, particularly when | ||||
| the application of re-ECN to protect against DDoS attacks is | ||||
| described. | ||||
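As an illustration only, the lookup implied by Table 7 might be sketched as follows. The names `Codepoint`, `TABLE_7`, `classify` and `pick_victim` are hypothetical and not part of the protocol. The sketch assumes preferences 4 and 5 are kept separate (Re-Echo is given 5) and that a lower preference number means the packet is dropped earlier.

```python
from typing import NamedTuple, Optional


class Codepoint(NamedTuple):
    name: str
    worth: Optional[int]  # None where Table 7 says n/a
    drop_pref: int        # 1 = drop first


# Keyed by (ECN field, RE bit), transcribed from Table 7.
# Assumption: Re-Echo, listed as "5/4", is given preference 5 here.
TABLE_7 = {
    (0b01, 0): Codepoint("Re-Echo", +1, 5),
    (0b00, 1): Codepoint("FNE", +1, 4),
    (0b11, 0): Codepoint("CE(0)", 0, 3),
    (0b01, 1): Codepoint("RECT", 0, 3),
    (0b11, 1): Codepoint("CE(-1)", -1, 3),
    (0b10, 1): Codepoint("--CU--", None, 2),
    (0b10, 0): Codepoint("---", None, 2),      # RFC3168 ECN use only
    (0b00, 0): Codepoint("Not-RECT", None, 1),
}


def classify(ecn: int, re_bit: int) -> Codepoint:
    """Map the 2-bit ECN field and the RE bit to an extended ECN codepoint."""
    return TABLE_7[(ecn, re_bit)]


def pick_victim(packets):
    """Return the index of the packet to drop first among (ecn, re_bit) pairs.

    The packet with the lowest drop preference number is chosen; ties go
    to the earliest packet in the list.
    """
    return min(range(len(packets)),
               key=lambda i: classify(*packets[i]).drop_pref)
```

A queue forced to drop one of a Re-Echo, a Not-RECT and a CE(-1) packet would, under these assumptions, pick the Not-RECT packet.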
| 5.4. Justification for Setting the First SYN to FNE | ||||
| Section 4.1.4 says the initial SYN MUST be set to FNE by Re-ECT client | ||||
| A, and Section 5.3 says a queue MAY optionally treat an FNE packet as | ||||
| ECN capable, so an initial SYN may be marked CE(-1) rather than | ||||
| dropped. This seems dangerous, because the sender has not yet | ||||
| established whether the receiver is an RFC3168 one that does not | ||||
| understand congestion marking. It also seems to allow malicious | ||||
| senders to take advantage of ECN marking to avoid so much drop when | ||||
| launching SYN flooding attacks. Below we explain the features of the | ||||
| protocol design that remove both these dangers. | ||||
| ECN-capable initial SYN with a Not-ECT server: If the TCP server B | ||||
| is re-ECN capable, provision is made for it to feed back a possible | ||||
| congestion marked SYN in the SYN ACK (Section 4.1.4). But if the | ||||
| TCP client A finds out from the SYN ACK that the server was not | ||||
| ECN-capable, the TCP client MUST conservatively consider the first | ||||
| SYN as congestion marked before setting itself into Not-ECT mode. | ||||
| Section 4.1.4 mandates that such a TCP client MUST also set its | ||||
| initial window to 1 segment. In this way we remove the need to | ||||
| cautiously avoid setting the first SYN to Not-RECT. This will | ||||
| give worse performance while deployment is patchy, but better | ||||
| performance once deployment is widespread. | ||||
| SYN flooding attacks can't exploit ECN-capability: Malicious hosts | ||||
| may think they can use the advantage that ECN-marking gives over | ||||
| drop in launching classic SYN-flood attacks. But Section 5.3 | ||||
| mandates that a router MUST only be configured to treat packets | ||||
| with the FNE codepoint as ECN-capable if FNE packets are rate | ||||
| limited somewhere. Introduction of the FNE codepoint was a | ||||
| deliberate move to enable transport-neutral handling of flow-start | ||||
| and flow state set-up in the IP layer where it belongs. It then | ||||
| becomes possible to protect against flooding attacks of all forms | ||||
| (not just SYN flooding) without transport-specific inspection for | ||||
| things like the SYN flag in TCP headers. Then, for instance, SYN | ||||
| flooding attacks using IPSec ESP encryption can also be rate | ||||
| limited at the IP layer. | ||||
| It might seem pedantic going to all this trouble to enable ECN on the | ||||
| initial packet of a flow, but it is motivated by a much wider concern | ||||
| to ensure safe congestion control will still be possible even if the | ||||
| application mix evolves to the point where the majority of flows | ||||
| consist of a single window or even a single packet. It also allows | ||||
| denial of service attacks to be more easily isolated and prevented. | ||||
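The client-side fallback described above for a Not-ECT server could be sketched as below. This is a minimal illustration, not an implementation of Section 4.1.4: `ClientState`, its field names and the default initial window of 3 segments are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ClientState:
    mode: str = "RECT"                         # re-ECN capable until told otherwise
    initial_window_segments: int = 3           # hypothetical default, for illustration
    first_syn_treated_as_marked: bool = False


def on_syn_ack(state: ClientState, server_ecn_capable: bool) -> ClientState:
    """Apply the conservative fallback when the SYN ACK shows a non-ECN server."""
    if not server_ecn_capable:
        # The FNE SYN might have been CE-marked with no way for the server
        # to report it, so assume it was marked.
        state.first_syn_treated_as_marked = True
        # The text says such a client MUST set its initial window to 1 segment.
        state.initial_window_segments = 1
        # Only then fall back to Not-ECT mode.
        state.mode = "Not-ECT"
    return state
```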
| 5.5. Control and Management | ||||
| 5.5.1. Negative Balance Warning | ||||
| A new ICMP message type is being considered so that a dropper can | ||||
| warn the apparent sender of a flow that it has started to sanction | ||||
| the flow. The message would have similar semantics to the `Time | ||||
| exceeded' ICMP message type. To ensure the sender has to invest some | ||||
| work before the network will generate such a message, a dropper | ||||
| SHOULD only send such a message for flows that have demonstrated that | ||||
| they have started correctly by establishing a positive record, but | ||||
| have later gone negative. The threshold is up to the implementation. | ||||
| The purpose of the message is to distinguish sanction drops from drops | ||||
| with other causes, such as congestion or transmission losses. The dropper | ||||
| would send the message to the sender of the flow, not the receiver. | ||||
| If we did define this message type, it would be REQUIRED for all re- | ||||
| ECT senders to parse and understand it. Note that a sender MUST only | ||||
| use this message to explain why losses are occurring. A sender MUST | ||||
| NOT take this message to mean that losses have occurred that it was | ||||
| not aware of. Otherwise, spoof messages could be sent by malicious | ||||
| sources to slow down a sender (cf. ICMP source quench). | ||||
| However, the need for this message type is not yet confirmed, as we | ||||
| are considering how to prevent it being used by malicious senders to | ||||
| scan for droppers and to test their threshold settings. {ToDo: | ||||
| Complete this section.} | ||||
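Since this message type is still only under consideration, the following is purely a sketch of the condition described above: a warning is generated only for a flow that first established a positive record and later went negative. The per-flow balance, the threshold of -3 and the rule of warning at most once per flow are all assumptions for illustration. The text leaves the threshold to the implementation.

```python
class FlowRecord:
    """Hypothetical per-flow state held by a dropper."""

    def __init__(self):
        self.balance = 0           # running sum of packet worth (+1, 0, -1)
        self.was_positive = False  # has the flow ever held a positive balance?
        self.warned = False        # assumption: warn at most once per flow


def update_and_check_warning(flow: FlowRecord, worth: int, threshold: int = -3) -> bool:
    """Account for one packet and return True if a warning should be sent now."""
    flow.balance += worth
    if flow.balance > 0:
        flow.was_positive = True
    if flow.was_positive and flow.balance < threshold and not flow.warned:
        flow.warned = True
        return True
    return False
```

A flow that was negative from its first packet never triggers a warning in this sketch, so a sender has to invest some work before the network will generate a message for it.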
| 5.5.2. Rate Response Control | ||||
| As discussed in Section 6.1.5 the sender's access operator will be | ||||
| expected to use bulk per-user policing, but they might choose to | ||||
| introduce a per-flow policer. In cases where operators do introduce | ||||
| per-flow policing, there may be a need for a sender to send a request | ||||
| to the ingress policer asking for permission to apply a non-default | ||||
| response to congestion (where TCP-friendly is assumed to be the | ||||
| default). This would require the sender to know what message | ||||
| format(s) to use and to be able to discover how to address the | ||||
| policer. The required control protocol(s) are outside the scope of | ||||
| this document, but will require definition elsewhere. | ||||
| The policer is likely to be local to the sender and inline, probably | ||||
| at the ingress interface to the internetwork. So, discovery should | ||||
| not be hard. A variety of control protocols already exist for some | ||||
| widely used rate-responses to congestion. For instance, DCCP | ||||
| congestion control identifiers (CCIDs [RFC4340]) fulfil this role and | ||||
| so does QoS signalling (e.g. an RSVP request for controlled load | ||||
| service is equivalent to a request for no rate response to | ||||
| congestion, but with admission control). | ||||
| 5.6. IP in IP Tunnels | ||||
| For re-ECN to work correctly through IP in IP tunnels, it needs | ||||
| slightly different tunnel handling to regular ECN [RFC3168]. | ||||
| Currently there is some inconsistency between how the handling of IP | ||||
| in IP tunnels is defined in [RFC3168] and how it is defined in | ||||
| [RFC4301], but re-ECN would work fine with the IPsec behaviour. This | ||||
| inconsistency is addressed in a new Internet Draft [ECN-tunnel] that | ||||
| proposes to update RFC3168 tunnel behaviour to bring it into line | ||||
| with IPsec. Ideally, for re-ECN to work through a tunnel, the tunnel | ||||
| entry should copy both the RE flag and the ECN field from the inner | ||||
| to the outer IP header. Then at the tunnel exit, any congestion | ||||
| marking of the outer ECN field should overwrite the inner ECN field | ||||
| (unless the inner field is Not-ECT in which case an alarm should be | ||||
| raised). The RE flag shouldn't change along a path, so the outer RE | ||||
| flag should be the same as the inner. If it isn't, a management alarm | ||||
| should be raised. This behaviour is the same as the full- | ||||
| functionality variant of [RFC3168] at tunnel exit, but different at | ||||
| tunnel entry. | ||||
| If tunnels are left as they are specified in [RFC3168], whether the | ||||
| limited or full-functionality variants are used, a problem arises | ||||
| with re-ECN if a tunnel crosses an inter-domain boundary, because the | ||||
| difference between positive and negative markings will not be | ||||
| correctly accounted for. In a limited functionality ECN tunnel, the | ||||
| flow will appear to be RFC3168 compliant traffic, and therefore may | ||||
| be wrongly rate limited. In a full-functionality ECN tunnel, the | ||||
| result will depend on whether the tunnel entry copies the inner RE flag | ||||
| to the outer header or the RE flag in the outer header is always | ||||
| cleared. If the former, the flow will tend to be too positive when | ||||
| accounted for at borders. If the latter, it will be too negative. | ||||
| If the rules set out in [ECN-tunnel] are followed then this will not | ||||
| be an issue. | ||||
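The ideal tunnel behaviour described at the start of this section can be sketched as follows. The function names and the alarm strings are hypothetical. The sketch reads "the inner field is Not-ECT" literally as an inner ECN field of 00, and it assumes the inner RE flag is the one forwarded at the tunnel exit. Neither point is specified in this section.

```python
NOT_ECT = 0b00
CE = 0b11


def encapsulate(inner_ecn: int, inner_re: int):
    """Tunnel entry: copy both the ECN field and the RE flag to the outer header."""
    return inner_ecn, inner_re


def decapsulate(inner_ecn: int, inner_re: int, outer_ecn: int, outer_re: int):
    """Tunnel exit: return (ecn, re, alarms) for the forwarded packet."""
    alarms = []
    # The RE flag should not change along a path.
    if outer_re != inner_re:
        alarms.append("RE flag changed in tunnel")
    ecn = inner_ecn
    if outer_ecn == CE:
        if inner_ecn == NOT_ECT:
            # Congestion marking on a packet whose inner field is Not-ECT.
            alarms.append("CE on outer header but inner is Not-ECT")
        else:
            # Congestion marking of the outer overwrites the inner field.
            ecn = CE
    return ecn, inner_re, alarms
```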
| 5.7. Non-Issues | ||||
| The following issues might seem to cause unfavourable interactions | ||||
| with re-ECN, but we will explain why they don't: | ||||
| o Various link layers support explicit congestion notification, such | ||||
| as Frame Relay and ATM. Explicit congestion notification is | ||||
| proposed to be added to other link layers, such as Ethernet | ||||
| (802.3ar Ethernet congestion management) and MPLS [RFC5129]; | ||||
| o Encryption and IPSec. | ||||
| In the case of congestion notification at the link layer, each | ||||
| particular link layer scheme either manages congestion on the link | ||||
| with its own link-level feedback (the usual arrangement in the cases | ||||
| of ATM and Frame Relay), or congestion notification from the link | ||||
| layer is merged into congestion notification at the IP level when the | ||||
| frame headers are decapsulated at the end of the link (the | ||||
| recommended arrangement in the Ethernet and MPLS cases). Given the | ||||
| RE flag is not intended to change along the path, this means that | ||||
| downstream congestion will still be measurable at any point where IP | ||||
| is processed on the path by subtracting negative from positive | ||||
| markings. | ||||
| In the case of encryption, as long as the tunnel issues described in | ||||
| Section 5.6 are dealt with, payload encryption itself will not be a | ||||
| problem. The design goal of re-ECN is to include downstream | ||||
| congestion in the IP header so that it is not necessary to burrow into | ||||
| inner headers. Obfuscation of flow identifiers is not a problem for | ||||
| re-ECN policing elements. Re-ECN doesn't ever require flow | ||||
| identifiers to be valid, it only requires them to be unique. So if | ||||
| an IPSec encapsulating security payload (ESP [RFC4305]) or an | ||||
| authentication header (AH [RFC4302]) is used, the security parameters | ||||
| index (SPI) will be a sufficient flow identifier, as it is intended | ||||
| to be unique to a flow without revealing actual port numbers. | ||||
| In general, even if endpoints use some locally agreed scheme to hide | ||||
| port numbers, re-ECN policing elements can just consider the pair of | ||||
| source and destination IP addresses as the flow identifier. Re-ECN | ||||
| encourages endpoints to at least tell the network layer that a | ||||
| sequence of packets are all part of the same flow, if indeed they | ||||
| are. The alternative would be for the sender to make each packet | ||||
| appear to be a new flow, which would require them all to be marked | ||||
| FNE in order to avoid being treated with the bulk of malicious flows | ||||
| at the egress dropper. Given the FNE marking is worth +1 and | ||||
| networks are likely to rate limit FNE packets, endpoints are given an | ||||
| incentive not to set FNE on each packet. But if the sender really | ||||
| does want to hide the flow relationship between packets it can choose | ||||
| to pay the cost of multiple FNE packets, which in the long run will | ||||
| compensate for the extra memory required on network policing elements | ||||
| to process each flow. | ||||
| 6. Applications | ||||
| 6.1. Policing Congestion Response | ||||
| 6.1.1. The Policing Problem | ||||
| The current Internet architecture trusts hosts to respond voluntarily | ||||
| to congestion. Limited evidence shows that the large majority of | ||||
| end-points on the Internet comply with a TCP-friendly response to | ||||
| congestion. But telephony (and increasingly video) services over the | ||||
| best effort Internet are attracting the interest of major commercial | ||||
| operations. Most of these applications do not respond to congestion | ||||
| at all. Those that can switch to lower rate codecs still have a | ||||
| lower bound below which they must become unresponsive to congestion. | ||||
| Of course, the Internet is intended to support many different | ||||
| application behaviours. But the problem is that this freedom can be | ||||
| exercised irresponsibly. The greater problem is that we will never | ||||
| be able to agree on where the boundary is between responsible and | ||||
| irresponsible. Therefore re-ECN is designed to allow different | ||||
| networks to set their own view of the limit to irresponsibility, and | ||||
| to allow networks that choose a more conservative limit to push back | ||||
| against congestion caused in more liberal networks. | ||||
| As an example of the impossibility of setting a standard for | ||||
| fairness, mandating TCP-friendliness would set the bar too high for | ||||
| unresponsive streaming media, but still some would say the bar was | ||||
| too low. Even though all known peer-to-peer filesharing applications | ||||
| are TCP-compatible, they can cause a disproportionate amount of | ||||
| congestion, simply by using multiple flows and by transferring data | ||||
| continuously relative to other short-lived sessions. On the other | ||||
| hand, if we swung the other way and set the bar low enough to allow | ||||
| streaming media to be unresponsive, we would also allow denial of | ||||
| service attacks, which are typically unresponsive to congestion and | ||||
| consist of multiple continuous flows. | ||||
| Applications that need (or choose) to be unresponsive to congestion | ||||
| can effectively take (some would say steal) whatever share of | ||||
| bottleneck resources they want from responsive flows. Whether or not | ||||
| such free-riding is common, inability to prevent it increases the | ||||
| risk of poor returns for investors in network infrastructure, leading | ||||
| to under-investment. An increasing proportion of unresponsive or | ||||
| free-riding demand coupled with persistent under-supply is a broken | ||||
| economic cycle. Therefore, if the current, largely co-operative | ||||
| consensus continues to erode, congestion collapse could become more | ||||
| common in more areas of the Internet [RFC3714]. | ||||
| While we have designed re-ECN so that networks can choose to deploy | ||||
| stringent policing, this does not imply we advocate that every | ||||
| network should introduce tight controls on those that cause | ||||
| congestion. Re-ECN has been specifically designed to allow different | ||||
| networks to choose how conservative or liberal they wish to be with | ||||
| respect to policing congestion. But those that choose to be | ||||
| conservative can protect themselves from the excesses that liberal | ||||
| networks allow their users. | ||||
| 6.1.2. The Case Against Bottleneck Policing | ||||
| The state of the art in rate policing is the bottleneck policer, | ||||
| which is intended to be deployed at any forwarding resource that may | ||||
| become congested. Its aim is to detect flows that cause | ||||
| significantly more local congestion than others. Although operators | ||||
| might solve their immediate problems by deploying bottleneck | ||||
| policers, we are concerned that widespread deployment would make it | ||||
| extremely hard to evolve new application behaviours. We believe the | ||||
| IETF should offer re-ECN as the preferred protocol on which to base | ||||
| solutions to the policing problems of operators, because it would not | ||||
| harm evolvability and, frankly, it would be far more effective (see | ||||
| later for why). | ||||
| Schemes like [XCHOKe] & [pBox] are nice approaches for rate | ||||
| policing traffic without the benefit of whole path information (such | ||||
| as could be provided by re-ECN). But they must be deployed at | ||||
| bottlenecks in order to work. Unfortunately, a large proportion of | ||||
| traffic traverses at least two bottlenecks (in two access networks), | ||||
| particularly with the current traffic mix where peer-to-peer file- | ||||
| sharing is prevalent. If ECN were deployed, we believe it would be | ||||
| likely that these bottleneck policers would be adapted to combine ECN | ||||
| congestion marking from the upstream path with local congestion | ||||
| knowledge. But then the only useful placement for such policers | ||||
| would be close to the egress of the internetwork. | ||||
| But then, if these bottleneck policers were widely deployed (which | ||||
| would require them to be more effective than they are now), the | ||||
| Internet would find itself with one universal rate adaptation policy | ||||
| (probably TCP-friendliness) embedded throughout the network. Given | ||||
| TCP's congestion control algorithm is already known to be hitting its | ||||
| scalability limits and new algorithms are being developed for high- | ||||
| speed congestion control, embedding TCP policing into the Internet | ||||
| would make evolution to new algorithms extremely painful. If a | ||||
| source wanted to use a different algorithm, it would have to first | ||||
| discover then negotiate with all the policers on its path, | ||||
| particularly those in the far access network. The IETF has already | ||||
| traveled that path with the Intserv architecture and found it | ||||
| constrains scalability [RFC2208]. | ||||
| Anyway, if bottleneck policers were ever widely deployed, they would | ||||
| be likely to be bypassed by determined attackers. They inherently | ||||
| have to police fairness per flow or per source-destination pair. | ||||
| Therefore they can easily be circumvented either by opening multiple | ||||
| flows (by varying the end-point port number); or by spoofing the | ||||
| source address but arranging with the receiver to hide the true | ||||
| return address at a higher layer. | ||||
| 6.1.3. Re-ECN Incentive Framework | ||||
| The aim is to create an incentive environment that ensures optimal | ||||
| sharing of capacity despite everyone acting selfishly (including | ||||
| lying and cheating). Of course, the mechanisms put in place for this | ||||
| can lie dormant wherever co-operation is the norm. | ||||
| Throughout this document we focus on path congestion. But some forms | ||||
| of fairness, particularly TCP's, also depend on round trip time. If | ||||
| TCP-fairness is required, we also propose to measure downstream path | ||||
| delay using re-feedback. We give a simple outline of how this could | ||||
| work in Appendix F. However, we do not expect this to be necessary, | ||||
| as researchers tend to agree that only congestion control dynamics | ||||
| need to depend on RTT, not the rate that the algorithm would converge | ||||
| on after a period of stability. | ||||
| Figure 8 sketches the incentive framework that we will describe piece | ||||
| by piece throughout this section. We will do a first pass in | ||||
| overview, then return to each piece in detail. We re-use the earlier | ||||
| example of how downstream congestion is derived by subtracting | ||||
| upstream congestion from path congestion (Figure 2) but depict | ||||
| multiple trust boundaries to turn it into an internetwork. For | ||||
| clarity, only downstream congestion is shown (the difference between | ||||
| the two earlier plots). The graph displays downstream path | ||||
| congestion seen in a typical flow as it traverses an example path | ||||
| from sender S to receiver R, across networks N1, N2 & N3. Everyone | ||||
| is shown using re-ECN correctly, but we intend to show why everyone | ||||
| would /choose/ to use it correctly, and honestly. | ||||
| Three main types of self-interest can be identified: | ||||
| o Users want to transmit data across the network as fast as | ||||
| possible, paying as little as possible for the privilege. In this | ||||
| respect, there is no distinction between senders and receivers, | ||||
| but we must be wary of potential malice by one on the other; | ||||
| o Network operators want to maximise revenues from the resources | ||||
| they invest in. They compete amongst themselves for the custom of | ||||
| users. | ||||
| o Attackers (whether users or networks) want to use any opportunity | ||||
| to subvert the new re-ECN system for their own gain or to damage | ||||
| the service of their victims, whether targeted or random. | ||||
| policer dropper | ||||
| | | | ||||
| | | | ||||
| S <-----N1----> <---N2---> <---N3--> R domain | ||||
| | | ||||
| 3% |---------+ | ||||
| | | | ||||
| 2% | +-----------------------+ | ||||
| | downstream congestion | | ||||
| 1% | | | ||||
| | | | ||||
| 0% +---------------------------------+====== | ||||
| 0 i | ||||
| Figure 8: Incentive Framework, showing creation of opposing pressures | ||||
| to under-declare and over-declare downstream congestion, using a | ||||
| policer and a dropper | ||||
| Source congestion control: We want to ensure that the sender will | ||||
| throttle its rate as downstream congestion increases. Whatever | ||||
| the agreed congestion response (whether TCP-compatible or some | ||||
| enhanced QoS), to some extent it will always be against the | ||||
| sender's interest to comply. | ||||
| Ingress policing: But it is in all the network operators' interests | ||||
| to encourage fair congestion response, so that their investments | ||||
| are employed to satisfy the most valuable demand. The re-ECN | ||||
| protocol ensures packets carry the necessary information about | ||||
| their own expected downstream congestion so that N1 can deploy a | ||||
| policer at its ingress to check that S is complying with whatever | ||||
| congestion control it should be using (Section 6.1.5). If N1 is | ||||
| extremely conservative it could police each flow, but it is likely | ||||
| to just police the bulk amount of congestion each customer causes | ||||
| without regard to flows, or if it is extremely liberal it need not | ||||
| police congestion control at all. In any case, it is always | ||||
| preferable to police traffic at the very first ingress into an | ||||
| internetwork, before non-compliant traffic can cause any damage. | ||||
| Edge egress dropper: If the policer ensures the source has less | ||||
| right to a high rate the higher it declares downstream congestion, | ||||
| the source has a clear incentive to understate downstream | ||||
| congestion. But, if flows of packets are understated when they | ||||
| enter the internetwork, they will have become negative by the time | ||||
| they leave. So, we introduce a dropper at the last network | ||||
| egress, which drops packets in flows that persistently declare | ||||
| negative downstream congestion (see Section 6.1.4 for details). | ||||
| Inter-domain traffic policing: But next we must ask, if congestion | ||||
| arises downstream (say in N3), what is the ingress network's | ||||
| (N1's) incentive to police its customers' response? If N1 turns a | ||||
| blind eye, its own customers benefit while other networks suffer. | ||||
| This is why all inter-domain QoS architectures (e.g. Intserv, | ||||
| Diffserv) police traffic each time it crosses a trust boundary. | ||||
| We have already shown that re-ECN gives a trustworthy measure of | ||||
| the expected downstream congestion that a flow will cause by | ||||
| subtracting negative volume from positive at any intermediate | ||||
| point on a path. N3 (say) can use this measure to police all the | ||||
| responses to congestion of all the sources beyond its upstream | ||||
| neighbour (N2), but in bulk with one very simple passive | ||||
| mechanism, rather than per flow, as we will now explain. | ||||
| Emulating policing with inter-domain congestion penalties: Between | ||||
| high-speed networks, we would rather avoid per-flow policing, and | ||||
| we would rather avoid holding back traffic while it is policed. | ||||
| Instead, once re-ECN has arranged headers to carry downstream | ||||
| congestion honestly, N2 can contract to pay N3 penalties in | ||||
| proportion to a single bulk count of the congestion metrics | ||||
| crossing their mutual trust boundary (Section 6.1.6). In this | ||||
| way, N3 puts pressure on N2 to suppress downstream congestion, for | ||||
| every flow passing through the border interface, even though they | ||||
| will all start and end in different places, and even though they | ||||
| may all be allowed different responses to congestion. The figure | ||||
| depicts this downward pressure on N2 by the solid downward arrow | ||||
| at the egress of N2. Then N2 has an incentive either to police | ||||
| the congestion response of its own ingress traffic (from N1) or to | ||||
| emulate policing by applying penalties to N1 in turn on the basis | ||||
| of congestion counted at their mutual boundary. In this recursive | ||||
| way, the incentives for each flow to respond correctly to | ||||
| congestion trace back with each flow precisely to each source, | ||||
| despite the mechanism not recognising flows (see Section 6.2.2). | ||||
| Inter-domain congestion charging diversity: Any two networks are | ||||
| free to agree any of a range of penalty regimes between themselves | ||||
| but they would only provide the right incentives if they were | ||||
| within the following reasonable constraints. N2 should expect to | ||||
| have to pay penalties to N3 where penalties monotonically increase | ||||
| with the volume of congestion and negative penalties are not | ||||
| allowed. For instance, they may agree an SLA with tiered | ||||
| congestion thresholds, where higher penalties apply the higher the | ||||
| threshold that is broken. But the most obvious (and useful) form | ||||
| of penalty is where N3 levies a charge on N2 proportional to the | ||||
| volume of downstream congestion N2 dumps into N3. In the | ||||
| explanation that follows, we assume this specific variant of | ||||
| volume charging between networks - charging proportionate to the | ||||
| volume of congestion. | ||||
| We must make clear that we are not advocating that everyone should | ||||
| use this form of contract. We are well aware that the IETF tries | ||||
| to avoid standardising technology that depends on a particular | ||||
| business model. And we strongly share this desire to encourage | ||||
| diversity. But our aim is merely to show that border policing can | ||||
| at least work with this one model, then we can assume that | ||||
| operators might experiment with the metric in other models (see | ||||
| Section 6.1.6 for examples). Of course, operators are free to | ||||
| complement this usage element of their charges with traditional | ||||
| capacity charging, and we expect they will as predicted by | ||||
| economics. | ||||
| No congestion charging to users: Bulk congestion penalties at trust | ||||
| boundaries are passive and extremely simple, and lose none of | ||||
| their per-packet precision from one boundary to the next (unlike | ||||
| Diffserv all-address traffic conditioning agreements, which | ||||
| dissipate their effectiveness across long topologies). But at any | ||||
| trust boundary, there is no imperative to use congestion charging. | ||||
| Traditional traffic policing can be used, if the complexity and | ||||
| cost is preferred. In particular, at the boundary with end | ||||
| customers (e.g. between S and N1), traffic policing will most | ||||
| likely be more appropriate. Policer complexity is less of a | ||||
| concern at the edge of the network. And end-customers are known | ||||
| to be highly averse to the unpredictability of congestion | ||||
| charging. | ||||
| NOTE WELL: This document neither advocates nor requires congestion | ||||
| charging for end customers and advocates but does not require | ||||
| inter-domain congestion charging. | ||||
| Competitive discipline of inter-domain traffic engineering: With | ||||
| inter-domain congestion charging, a domain seems to have a | ||||
| perverse incentive to fake congestion; N2's profit depends on the | ||||
| difference between congestion at its ingress (its revenue) and at | ||||
| its egress (its cost). So, overstating internal congestion seems | ||||
| to increase profit. However, smart border routing [Smart_rtg] by | ||||
| N1 will bias its routing towards the least cost routes. So, N2 | ||||
| risks losing all its revenue to competitive routes if it | ||||
| overstates congestion (see Section 6.2.3). In other words, if N2 | ||||
| is the least congested route, its ability to raise excess profits | ||||
| is limited by the congestion on the next least congested route. | ||||
| Closing the loop: All the above elements conspire to trap everyone | ||||
| between two opposing pressures, ensuring the downstream congestion | ||||
| metric arrives at the destination neither above nor below zero. | ||||
| So, we have arrived back where we started in our argument. The | ||||
| ingress edge network can rely on downstream congestion declared in | ||||
| the packet headers presented by the sender. So it can police the | ||||
| sender's congestion response accordingly. | ||||
| Evolvability of congestion control: We have seen that re-ECN enables | ||||
| policing at the very first ingress. We have also seen that, as | ||||
| flows continue on their path through further networks downstream, | ||||
| re-ECN removes the need for further per-domain ingress policing of | ||||
| all the different congestion responses allowed to each different | ||||
| flow. This is why the evolvability of re-ECN policing is so | ||||
| superior to bottleneck policing or to any policing of different | ||||
| QoS for different flows. Even if all access networks choose to | ||||
| conservatively police congestion per flow, each will want to | ||||
| compete with the others to allow new responses to congestion for | ||||
| new types of application. With re-ECN, each can introduce new | ||||
| controls independently, without coordinating with other networks | ||||
| and without having to standardise anything. But, as we have just | ||||
| seen, by making inter-domain penalties proportionate to bulk | ||||
| downstream congestion, downstream networks can be agnostic to the | ||||
| specific congestion response for each flow, but they can still | ||||
| apply more penalty the more liberal the ingress access network has | ||||
| been in the response to congestion it allowed for each flow. | ||||
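The bulk border metering and proportional penalty assumed in the explanation above can be illustrated with a short sketch. It counts downstream congestion volume at a border by subtracting negative volume from positive, using the worth values of Table 7, and then applies a penalty that increases monotonically with volume and is never negative. The function names, the choice of bytes as the unit, the treatment of codepoints with no defined worth as zero, and the price are assumptions made for the example.

```python
# Worth per extended ECN codepoint, from Table 7.
WORTH = {"Re-Echo": 1, "FNE": 1, "CE(0)": 0, "RECT": 0, "CE(-1)": -1}


def downstream_congestion_volume(packets) -> int:
    """Sum worth * size over (codepoint_name, size_bytes) pairs.

    Assumption: codepoints whose worth is n/a in Table 7 contribute zero.
    No flow state is kept, so this is a single bulk counter per border.
    """
    return sum(WORTH.get(name, 0) * size for name, size in packets)


def penalty(volume_bytes: int, price_per_byte: int) -> int:
    """Charge in proportion to congestion volume; negative penalties are not allowed."""
    return price_per_byte * max(0, volume_bytes)
```

For example, 100 packets of 1500 bytes of which 3 are Re-Echo and 1 is CE(-1) carry a downstream congestion volume of 3000 bytes under these assumptions.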
| 6.1.3.1. The Case against Classic Feedback | ||||
| A system that produces an optimal outcome as a result of everyone's | ||||
| selfish actions is extremely powerful, especially one that enables | ||||
| evolvability of congestion control. But why do we have to change to | ||||
| re-ECN to achieve it? Can't classic congestion feedback (as used | ||||
| already by standard ECN) be arranged to provide similar incentives | ||||
| and similar evolvability? Superficially it can. Kelly's seminal | ||||
| work showed how we can allow everyone the freedom to evolve whatever | ||||
| congestion control behaviour is in their application's best interest | ||||
| but still optimise the whole system of networks and users by placing | ||||
| a price on congestion to ensure responsible use of this | ||||
| freedom [Evol_cc]. Kelly used ECN with its classic congestion | ||||
| feedback model as the mechanism to convey congestion price | ||||
| information. The mechanism could be thought of as volume charging; | ||||
| except only the volume of packets marked with congestion experienced | ||||
| (CE) was counted. | ||||
| However, below we explain why relying on classic feedback /required/ | ||||
| congestion charging to be used, while re-ECN achieves the same | ||||
| powerful outcome (given it is built on Kelly's foundations), but does | ||||
| not /require/ congestion charging. In brief, the problem with | ||||
| classic feedback is that the incentives have to trace the indirect | ||||
| path back to the sender---the long way round the feedback loop. For | ||||
| example, if classic feedback were used in Figure 8, N2 would have had | ||||
| to influence N1 via all of N3, R & S rather than directly. | ||||
| Inability to agree what is happening downstream: In order to police | ||||
| its upstream neighbour's congestion response, the neighbours | ||||
| should be able to agree on the congestion to be responded to. | ||||
| Whatever the feedback regime, as packets change hands at each | ||||
| trust boundary, any path metrics they carry are verifiable by both | ||||
| neighbours. But, with a classic path metric, they can only agree | ||||
| on the /upstream/ path congestion. | ||||
| Inaccessible back-channel: The network needs a whole-path congestion | ||||
| metric if it wants to control the source. Classically, whole path | ||||
| congestion emerges at the destination, to be fed back from | ||||
| receiver to sender in a back-channel. But, in any data network, | ||||
| back-channels need not be visible to relays, as they are | ||||
| essentially communications between the end-points. They may be | ||||
| encrypted, asymmetrically routed or simply omitted, so no network | ||||
| element can reliably intercept them. The congestion charging | ||||
| literature solves this problem by charging the receiver and | ||||
| assuming this will cause the receiver to refer the charges to the | ||||
| sender. But, of course, this creates unintended side-effects... | ||||
| `Receiver pays' unacceptable: In connectionless datagram networks, | ||||
| receivers and receiving networks cannot prevent reception from | ||||
| malicious senders, so `receiver pays' opens them to `denial of | ||||
| funds' attacks. | ||||
| End-user congestion charging unacceptable: Even if `denial of funds' | ||||
| were not a problem, we know that end-users are highly averse to | ||||
| the unpredictability of congestion charging and anyway, we want to | ||||
| avoid restricting network operators to just one retail tariff. | ||||
| But with classic feedback only an upstream metric is available, so | ||||
| we cannot avoid having to wrap the `receiver pays' money flow | ||||
| around the feedback loop, necessarily forcing end-users to be | ||||
| subjected to congestion charging. | ||||
| To summarise so far, with classic feedback, policing congestion | ||||
| response without losing evolvability /requires/ congestion charging | ||||
| of end-users and a `receiver pays' model, whereas, with re-ECN, it is | ||||
| still possible to influence incentives using congestion charging but | ||||
| using the safer `sender pays' model. However, congestion charging is | ||||
| only likely to be appropriate between domains. So, without losing | ||||
| evolvability, re-ECN enables technical policing mechanisms that are | ||||
| more appropriate for end users than congestion pricing. | ||||
| We now take a second pass over the incentive framework, filling in | ||||
| the detail. | ||||
| 6.1.4. Egress Dropper | ||||
| As traffic leaves the last network before the receiver (domain N3 in | ||||
| Figure 8), the fraction of positive octets in a flow should match the | ||||
| fraction of negative octets introduced by congestion marking, leaving | ||||
| a balance of zero. If it is less (a negative flow), it implies that | ||||
| the source is understating path congestion (which will reduce the | ||||
| penalties that N2 owes N3). | ||||
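The balance test described above can be illustrated with a minimal Python sketch. It is not the dropper algorithm of Appendix E. The worth values follow the text (Re-Echo and FNE positive, CE(-1) negative, neutral and cancelled packets zero); the function name, the packet representation as (codepoint, octets) pairs and the choice to treat any other codepoint as worth zero are assumptions made for illustration.

```python
# Worth of each extended ECN codepoint named in the text.
# Treating any other codepoint as worth zero is an assumption.
WORTH = {"Re-Echo": +1, "FNE": +1, "RECT": 0, "CE(0)": 0, "CE(-1)": -1}


def flow_balance(packets):
    """Return positive octets minus negative octets for one flow.

    packets: iterable of (codepoint, octets) pairs (hypothetical format).
    A result below zero means the flow is negative at this point.
    """
    return sum(WORTH.get(codepoint, 0) * octets for codepoint, octets in packets)


# An honest sender re-echoes the congestion mark it was told about.
honest = [("FNE", 40), ("RECT", 1500), ("CE(-1)", 1500), ("Re-Echo", 1500)]
# A dishonest sender keeps sending neutral packets instead.
understating = [("FNE", 40), ("RECT", 1500), ("CE(-1)", 1500), ("RECT", 1500)]

print(flow_balance(honest))        # small positive balance
print(flow_balance(understating))  # negative balance
```

A real dropper would have to judge whether the balance is persistently negative over time, not rely on a single snapshot like this one.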
| If flows are positive, N3 need take no action---this simply means its | ||||
| upstream neighbour is paying more penalties than it needs to, and the | ||||
| source is going slower than it needs to. But, to protect itself | ||||
| against persistently negative flows, N3 will need to install a | ||||
| dropper at its egress. Appendix E gives a suggested algorithm for | ||||
| this dropper. There is no intention that the dropper algorithm needs | ||||
| to be standardised; it is merely provided to show that an efficient, | ||||
| robust algorithm is possible. But whatever algorithm is used must | ||||
| meet the criteria below: | ||||
| o It SHOULD introduce minimal false positives for honest flows; | ||||
| o It SHOULD quickly detect and sanction dishonest flows (minimal | ||||
| false negatives); | ||||
| o It MUST be invulnerable to state exhaustion attacks from malicious | ||||
| sources. For instance, if the dropper uses flow-state, it should | ||||
| not be possible for a source to send numerous packets, each with a | ||||
| different flow ID, to force the dropper to exhaust its memory | ||||
| capacity; | ||||
| o It MUST introduce sufficient loss in goodput so that malicious | ||||
| sources cannot play off losses in the egress dropper against | ||||
| higher allowed throughput. Salvatori [CLoop_pol] describes this | ||||
| attack, which involves the source understating path congestion | ||||
| then inserting forward error correction (FEC) packets to | ||||
| compensate expected losses. | ||||
| Note that the dropper operates on flows but we would like it not to | ||||
| require per-flow state. This is why we have been careful to ensure | ||||
| that all flows MUST start with a packet marked with the FNE | ||||
| codepoint. If a flow does not start with the FNE codepoint, a | ||||
| dropper is likely to treat it unfavourably. This risk makes it worth | ||||
| setting the FNE codepoint at the start of a flow, even though there | ||||
| is a cost to the sender of setting FNE (positive `worth'). Indeed, | ||||
| with the FNE codepoint, the rate at which a sender can generate new | ||||
| flows can be limited (Appendix G). In this respect, the FNE | ||||
| codepoint works like Handley's state set-up bit [Steps_DoS]. | ||||
| Appendix E also gives an example dropper implementation that | ||||
| aggregates flow state. Dropper algorithms will often maintain a | ||||
| moving average across flows of the fraction of RE blanked packets. | ||||
| When maintaining an average across flows, a dropper SHOULD only allow | ||||
| flows into the average if they start with FNE, but it SHOULD NOT | ||||
| include packets with the FNE codepoint set in the average. A sender | ||||
| sets the FNE codepoint when it does not have the benefit of feedback | ||||
| from the receiver. So, counting packets with FNE set would be | ||||
| likely to make the average unnecessarily positive, providing headroom | ||||
| (or should we say footroom?) for dishonest (negative) traffic. | ||||
| If the dropper detects a persistently negative flow, it SHOULD drop | ||||
| sufficient negative and neutral packets to force the flow to not be | ||||
| negative. Drops SHOULD be focused on just sufficient packets in | ||||
| misbehaving flows to remove the negative bias while doing minimal | ||||
| extra harm. | ||||
| 6.1.5. Policing | ||||
| Access operators who wish to limit the congestion that a sender is | ||||
| able to cause can deploy policers at the very first ingress to the | ||||
| internetwork. Re-ECN has been designed to avoid the need for | ||||
| bottleneck policing so that we can avoid a future where a single rate | ||||
| adaptation policy is embedded throughout the network. Instead, re- | ||||
| ECN allows the particular rate adaptation policy to be solely agreed | ||||
| bilaterally between the sender and its ingress access provider | ||||
| (Section 5.5.2 discusses possible ways to signal between them), which | ||||
| allows congestion control to be policed, but maintains its | ||||
| evolvability, requiring only a single, local box to be updated. | ||||
| Appendix G gives examples of per-user policing algorithms. But there | ||||
| is no implication that these algorithms are to be standardised, or | ||||
| that they are ideal. The ingress rate policer is the part of the re- | ||||
| ECN incentive framework that is intended to be the most flexible. | ||||
| Once endpoint protocol handlers for re-ECN and egress droppers are in | ||||
| place, operators can choose exactly which congestion response they | ||||
| want to police, and whether they want to do it per user, per flow or | ||||
| not at all. | ||||
| The re-ECN protocol allows these ingress policers to easily perform | ||||
| bulk per-user policing (Appendix G.1). This is likely to provide | ||||
| sufficient incentive to the user to correctly respond to congestion | ||||
| without needing the policing function to be overly complex. If an | ||||
| access operator chose, it could use per-flow policing according to | ||||
| the widely adopted TCP rate adaptation (Appendix G.2) or other | ||||
| alternatives, but this would introduce extra complexity to the | ||||
| system. | ||||
| If a per-flow rate policer is used, it should use path (not | ||||
| downstream) congestion as the relevant metric, which is represented | ||||
| by the fraction of octets in packets with positive (Re-Echo and FNE) | ||||
| and canceled (CE(0)) markings. Of course, re-ECN provides all the | ||||
| information a policer needs directly in the packets being policed. | ||||
| So, even policing TCP's AIMD algorithm is relatively straightforward | ||||
| (Appendix G.2). | ||||
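As a rough illustration of the path congestion metric just described, the following Python sketch computes the fraction of octets carried in positive (Re-Echo and FNE) and cancelled (CE(0)) packets. It is not the policer of Appendix G.2. The function name, the (codepoint, octets) packet representation and the use of all observed octets as the denominator are assumptions.

```python
# Codepoints that the text counts towards path congestion at a policer.
PATH_CONGESTION_CODEPOINTS = {"Re-Echo", "FNE", "CE(0)"}


def path_congestion_fraction(packets):
    """Fraction of octets in positive or cancelled packets.

    packets: sequence of (codepoint, octets) pairs (hypothetical format).
    The denominator is every octet observed, which is an assumption.
    """
    total = sum(octets for _, octets in packets)
    if total == 0:
        return 0.0
    marked = sum(
        octets
        for codepoint, octets in packets
        if codepoint in PATH_CONGESTION_CODEPOINTS
    )
    return marked / total


# Eight neutral packets, one Re-Echo and one cancelled packet.
sample = [("RECT", 1000)] * 8 + [("Re-Echo", 1000), ("CE(0)", 1000)]
print(path_congestion_fraction(sample))
```

Leaving CE(0) out of the set would let a sender hide path congestion by emitting cancelled packets in place of neutral ones.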
| Note that we have included canceled packets in the measure of path | ||||
| congestion. Canceled packets arise when the sender re-echoes earlier | ||||
| congestion, but then this Re-Echo packet just happens to be | ||||
| congestion marked itself. One would not normally expect many | ||||
| canceled packets at the first ingress because one would not normally | ||||
| expect much congestion marking to have been necessary that soon in | ||||
| the path. However, a home network or campus network may well sit | ||||
| between the sending endpoint and the ingress policer, so some | ||||
| congestion may occur upstream of the policer. And if congestion does | ||||
| occur upstream, some canceled packets should be visible, and should | ||||
| be taken into account in the measure of path congestion. | ||||
| But a much more important reason for including canceled packets in | ||||
| the measure of path congestion at an ingress policer is that a sender | ||||
| might otherwise subvert the protocol by sending canceled packets | ||||
| instead of neutral (RECT) packets. Like neutral, canceled packets | ||||
| are worth zero, so the sender knows they won't be counted against any | ||||
| quota it might have been allowed. But unlike neutral packets, | ||||
| canceled packets are immune to congestion marking, because they have | ||||
| already been congestion marked. So, it is both correct and useful | ||||
| that canceled packets should be included in a policer's measure of | ||||
| path congestion, as this removes the incentive the sender would | ||||
| otherwise have to mark more packets as canceled than it should. | ||||
| An ingress policer should also ensure that flows are not already | ||||
| negative when they enter the access network. As with canceled | ||||
| packets, the presence of negative packets will typically be unusual. | ||||
| Therefore it will be easy to detect negative flows at the ingress by | ||||
| just detecting negative packets then monitoring the flow they belong | ||||
| to. | ||||
| Of course, even if the sender does operate its own network, it may | ||||
| arrange not to congestion mark traffic. Whether the sender does this | ||||
| or not is of no concern to anyone else except the sender. Such a | ||||
| sender will not be policed against its own network's contribution to | ||||
| congestion, but the only resulting problem would be overload in the | ||||
| sender's own network. | ||||
| Finally, we must not forget that an easy way to circumvent re-ECN's | ||||
| defences is for the source to turn off re-ECN support, by setting the | ||||
| Not-RECT codepoint, implying RFC3168 compliant traffic. Therefore an | ||||
| ingress policer should put a general rate-limit on Not-RECT traffic, | ||||
| which SHOULD be lax during early, patchy deployment, but will have to | ||||
| become stricter as deployment widens. Similarly, flows starting | ||||
| without an FNE packet can be confined by a strict rate-limit used for | ||||
| the remainder of flows that haven't proved they are well-behaved by | ||||
| starting correctly (therefore they need not consume any flow state--- | ||||
| they are just confined to the `misbehaving' bin if they carry an | ||||
| unrecognised flow ID). | ||||
| 6.1.6. Inter-domain Policing | ||||
| One of the main design goals of re-ECN is for border security | ||||
| mechanisms to be as simple as possible, otherwise they will become | ||||
| the pinch-points that limit scalability of the whole internetwork. | ||||
| We want to avoid per-flow processing at borders and to keep to | ||||
| passive mechanisms that can monitor traffic in parallel to | ||||
| forwarding, rather than having to filter traffic inline---in series | ||||
| with forwarding. Such passive, off-line mechanisms are essential for | ||||
| future high-speed all-optical border interconnection where packets | ||||
| cannot be buffered while they are checked for policy compliance. | ||||
| So far, we have been able to keep the border mechanisms simple, | ||||
| despite having had to harden them against some subtle attacks on the | ||||
| re-ECN design. The mechanisms are still passive and avoid per-flow | ||||
| processing. | ||||
| The basic accounting mechanism at each border interface simply | ||||
| involves accumulating the volume of packets with positive worth (Re- | ||||
| Echo and FNE), and subtracting the volume of those with negative | ||||
| worth: CE(-1). Even though this mechanism takes no regard of flows, | ||||
| over an accounting period (say a month) this subtraction will account | ||||
| for the downstream congestion caused by all the flows traversing the | ||||
| interface, wherever they come from, and wherever they go to. The two | ||||
| networks can agree to use this metric however they wish to determine | ||||
| some congestion-related penalty against the upstream network. | ||||
| Although the algorithm could hardly be simpler, it is spelled out | ||||
| using pseudo-code in Appendix H.1. | ||||
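For readers without Appendix H.1 to hand, the following minimal Python sketch shows the bulk accounting described above. It is not a reproduction of the pseudo-code in Appendix H.1. The class and method names and the per-packet interface are assumptions.

```python
class BorderMeter:
    """Passive, flow-agnostic congestion meter for one border interface."""

    def __init__(self):
        self.positive_octets = 0
        self.negative_octets = 0

    def count(self, codepoint, octets):
        """Account for one packet; no flow identifiers are consulted."""
        if codepoint in ("Re-Echo", "FNE"):   # positive worth
            self.positive_octets += octets
        elif codepoint == "CE(-1)":           # negative worth
            self.negative_octets += octets
        # Neutral, cancelled and other packets have no worth.

    def downstream_congestion(self):
        """Bulk downstream congestion volume over the accounting period."""
        return self.positive_octets - self.negative_octets


meter = BorderMeter()
for codepoint, octets in [("Re-Echo", 3000), ("CE(-1)", 1000),
                          ("RECT", 5000), ("CE(0)", 1500)]:
    meter.count(codepoint, octets)
print(meter.downstream_congestion())
```

Because only two counters are kept per interface, the meter can run in parallel with forwarding and needs no per-flow state.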
| Various attempts to subvert the re-ECN design have been made. In all | ||||
| cases their root cause is persistently negative flows. But, after | ||||
| describing these attacks we will show that we don't actually have to | ||||
| get rid of all persistently negative flows in order to thwart the | ||||
| attacks. | ||||
| In honest flows, downstream congestion is measured as positive minus | ||||
| negative volume. So if all flows are honest (i.e. not persistently | ||||
| negative), adding all positive volume and all negative volume without | ||||
| regard to flows will give an aggregate measure of downstream | ||||
| congestion. But such simple aggregation is only possible if no flows | ||||
| are persistently negative. Unless persistently negative flows are | ||||
| completely removed, they will reduce the aggregate measure of | ||||
| congestion. The aggregate may still be positive overall, but not as | ||||
| positive as it would have been had the negative flows been removed. | ||||
| In Section 6.1.4 we discussed how to sanction traffic to remove, or | ||||
| at least to identify, persistently negative flows. But, even if the | ||||
| sanction for negative traffic is to discard it, unless it is | ||||
| discarded at the exact point it goes negative, it will wrongly | ||||
| subtract from aggregate downstream congestion, at least at any | ||||
| borders it crosses after it has gone negative but before it is | ||||
| discarded. | ||||
| We rely on sanctions to deter dishonest understatement of congestion. | ||||
| But even the ultimate sanction of discard can only be effective if | ||||
| the sender is bothered about the data getting through to its | ||||
| destination. A number of attacks have been identified where a sender | ||||
| gains from sending dummy traffic or it can attack someone or | ||||
| something using dummy traffic even though it isn't communicating any | ||||
| information to anyone: | ||||
| o A host can send traffic with no positive markings towards its | ||||
| intended destination, aiming to transmit as much traffic as any | ||||
| dropper will allow [Bauer06]. It may add forward error correction | ||||
| (FEC) to repair as much drop as it experiences. | ||||
| o A host can send dummy traffic into the network with no positive | ||||
| markings and with no intention of communicating with anyone, but | ||||
| merely to cause higher levels of congestion for others who do want | ||||
| to communicate (DoS). So, to ride over the extra congestion, | ||||
| everyone else has to spend more of whatever rights to cause | ||||
| congestion they have been allowed. | ||||
| o A network can simply create its own dummy traffic to congest | ||||
| another network, perhaps causing it to lose business at no cost to | ||||
| the attacking network. This is a form of denial of service | ||||
| perpetrated by one network on another. The preferential drop | ||||
| measures in Section 5.3 provide crude protection against such | ||||
| attacks, but we are not overly worried about more accurate | ||||
| prevention measures, because it is already possible for networks | ||||
| to DoS other networks on the general Internet, but they generally | ||||
| don't because of the grave consequences of being found out. We | ||||
| are only concerned if re-ECN increases the motivation for such an | ||||
| attack, as in the next example. | ||||
| o A network can just generate negative traffic and send it over its | ||||
| border with a neighbour to reduce the overall penalties that it | ||||
| should pay to that neighbour. It could even initialise the TTL so | ||||
| it expired shortly after entering the neighbouring network, | ||||
| reducing the chance of detection further downstream. This attack | ||||
| need not be motivated by a desire to deny service and indeed need | ||||
| not cause denial of service. A network's main motivator would | ||||
| most likely be to reduce the penalties it pays to a neighbour. | ||||
| But, the prospect of financial gain might tempt the network into | ||||
| mounting a DoS attack on the other network as well, given the gain | ||||
| would offset some of the risk of being detected. | ||||
| The first step towards a solution to all these problems with negative | ||||
| flows is to be able to estimate the contribution they make to | ||||
| downstream congestion at a border and to correct the measure | ||||
| accordingly. Although ideally we want to remove negative flows | ||||
| themselves, perhaps surprisingly, the most effective first step is to | ||||
| cancel out the polluting effect negative flows have on the measure of | ||||
| downstream congestion at a border. It is more important to get an | ||||
| unbiased estimate of their effect, than to try to remove them all. A | ||||
| suggested algorithm to give an unbiased estimate of the contribution | ||||
| from negative flows to the downstream congestion measure is given in | ||||
| Appendix H.2. | ||||
| Although making an accurate assessment of the contribution from | ||||
| negative flows may not be easy, just the single step of neutralising | ||||
| their polluting effect on congestion metrics removes all the gains | ||||
| networks could otherwise make from mounting dummy traffic attacks on | ||||
| each other. This puts all networks on the same side (only with | ||||
| respect to negative flows of course), rather than being pitched | ||||
| against each other. The network where this flow goes negative as | ||||
| well as all the networks downstream lose out from not being | ||||
| reimbursed for any congestion this flow causes. So they all have an | ||||
| interest in getting rid of these negative flows. Networks forwarding | ||||
| a flow before it goes negative aren't strictly on the same side, but | ||||
| they are disinterested bystanders---they don't care that the flow | ||||
| goes negative downstream, but at least they can't actively gain from | ||||
| making it go negative. The problem becomes localised so that once a | ||||
| flow goes negative, all the networks from where it happens and beyond | ||||
| downstream each have a small problem, each can detect it has a | ||||
| problem and each can get rid of the problem if it chooses to. But | ||||
| negative flows can no longer be used for any new attacks. | ||||
| Once an unbiased estimate of the effect of negative flows can be | ||||
| made, the problem reduces to detecting and preferably removing flows | ||||
| that have gone negative as soon as possible. But importantly, | ||||
| complete eradication of negative flows is no longer critical---best | ||||
| endeavours will be sufficient. | ||||
| For instance, let us consider the case where a source sends traffic | ||||
| with no positive markings at all, hoping to at least get as much | ||||
| traffic delivered as network-based droppers will allow. The flow is | ||||
| likely to go at least slightly negative in the first network on the | ||||
| path (N1 if we use the example network layout in Figure 8). If all | ||||
| networks use the algorithm in Appendix H.2 to inflate penalties at | ||||
| their border with an upstream network, they will remove the effect of | ||||
| negative flows. So, for instance, N2 will not be paying a penalty to | ||||
| N1 for this flow. Further, because the flow contributes no positive | ||||
| markings at all, a dropper at the egress will completely remove it. | ||||
| The remaining problem is that every network is carrying a flow that | ||||
| is causing congestion to others but not being held to account for the | ||||
| congestion it is causing. Whenever the fail-safe border algorithm | ||||
| (Section 6.1.7) or the border algorithm to compensate for negative | ||||
| flows (Appendix H.2) detects a negative flow, it can instantiate a | ||||
| focused dropper for that flow locally. It may be some time before | ||||
| the flow is detected, but the more strongly negative the flow is, the | ||||
| more quickly it will be detected by the fail-safe algorithm. But, in | ||||
| the meantime, it will not be distorting border incentives. Until it | ||||
| is detected, if it contributes to drop anywhere, its packets will | ||||
| tend to be dropped before others if queues use the preferential drop | ||||
| rules in Section 5.3, which discriminate against non-positive | ||||
| packets. All networks below the point where a flow goes negative | ||||
| (N1, N2 and N3 in this case) have an incentive to remove this flow, | ||||
| but the queue where it first goes negative (in N1) can of course | ||||
| remove the problem for everyone downstream. | ||||
| In the case of DDoS attacks, Section 6.2.1 describes how re-ECN | ||||
| mitigates their force. | ||||
| 6.1.7. Inter-domain Fail-safes | ||||
| The mechanisms described so far create incentives for rational | ||||
| network operators to behave. That is, one operator aims to make | ||||
| another behave responsibly by applying penalties and expects a | ||||
| rational response (i.e. one that trades off costs against benefits). | ||||
| It is usually reasonable to assume that other network operators will | ||||
| behave rationally (policy routing can avoid those that might not). | ||||
| But this approach does not protect against the misconfigurations and | ||||
| accidents of other operators. | ||||
| Therefore, we propose the following two mechanisms at a network's | ||||
| borders to provide "defence in depth". Both are similar: | ||||
| Highly positive flows: A small sample of positive packets should be | ||||
| picked randomly as they cross a border interface. Then subsequent | ||||
| packets matching the same source and destination address and DSCP | ||||
| should be monitored. If the fraction of positive marking is well | ||||
| above a threshold (to be determined by operational practice), a | ||||
| management alarm SHOULD be raised, and the flow MAY be | ||||
| automatically subject to focused drop. | ||||
| Persistently negative flows: A small sample of congestion marked | ||||
| (negative) packets should be picked randomly as they cross a | ||||
| border interface. Then subsequent packets matching the same | ||||
| source and destination address and DSCP should be monitored. If | ||||
| the balance of positive minus negative markings is persistently | ||||
| negative, a management alarm SHOULD be raised, and the flow MAY be | ||||
| automatically subject to focused drop. | ||||
| Both these mechanisms rely on the fact that highly positive (or | ||||
| negative) flows will appear more quickly in the sample by selecting | ||||
| randomly solely from positive (or negative) packets. | ||||
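The second fail-safe (persistently negative flows) might be sketched in Python as below. The sampling probability, the alarm threshold and all names are assumptions. "Persistently negative" is simplified here to the monitored balance reaching a fixed negative threshold, and the sampled packet itself is counted in the balance.

```python
import random

WORTH = {"Re-Echo": +1, "FNE": +1, "CE(-1)": -1}  # all others worth zero


class NegativeFlowMonitor:
    """Sample negative packets at a border, then watch the matching flow."""

    def __init__(self, sample_prob=0.01, alarm_threshold=-10000, rng=None):
        self.sample_prob = sample_prob          # assumed value
        self.alarm_threshold = alarm_threshold  # octets, assumed value
        self.rng = rng or random.Random()
        self.balance = {}  # (src, dst, dscp) -> positive minus negative octets

    def observe(self, src, dst, dscp, codepoint, octets):
        """Return True if an alarm should be raised for this packet's flow."""
        key = (src, dst, dscp)
        worth = WORTH.get(codepoint, 0)
        if key not in self.balance:
            # Only negative packets are candidates for the random sample.
            if worth < 0 and self.rng.random() < self.sample_prob:
                self.balance[key] = 0
            else:
                return False
        self.balance[key] += worth * octets
        return self.balance[key] <= self.alarm_threshold


# sample_prob=1.0 makes the demonstration deterministic.
monitor = NegativeFlowMonitor(sample_prob=1.0, alarm_threshold=-3000)
first = monitor.observe("10.0.0.1", "10.0.0.2", 0, "CE(-1)", 1500)
second = monitor.observe("10.0.0.1", "10.0.0.2", 0, "CE(-1)", 1500)
print(first, second)
```

State is only created for sampled flows, so a flow that sends many negative packets is correspondingly more likely to be picked up, which is the property the text relies on.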
| 6.1.8. Simulations | ||||
| Simulations of policer and dropper performance done for the multi-bit | ||||
| version of re-feedback have been included in section 5 "Dropper | ||||
| Performance" of [Re-fb]. Simulations of policer and dropper for the | ||||
| re-ECN version described in this document are work in progress. | ||||
| 6.2. Other Applications | ||||
| 6.2.1. DDoS Mitigation | ||||
| A flooding attack is inherently about congestion of a resource. | ||||
| Because re-ECN ensures the sources causing network congestion | ||||
| experience the cost of their own actions, it acts as a first line of | ||||
| defence against DDoS. As load focuses on a victim, upstream queues | ||||
| grow, requiring honest sources to pre-load packets with a higher | ||||
| fraction of positive packets. Once downstream queues are so | ||||
| congested that they are dropping traffic, they will be CE marking the | ||||
| traffic they do forward 100%. Honest sources will therefore be | ||||
| sending Re-Echo 100% (and therefore being severely rate-limited at | ||||
| the ingress). | ||||
| Senders under malicious control can either do the same as honest | ||||
| sources, and be rate-limited at ingress, or they can understate | ||||
| congestion by sending more neutral RECT packets than they should. If | ||||
| sources understate congestion (i.e. do not re-echo sufficient | ||||
| positive packets) and the preferential drop ranking is implemented on | ||||
| queues (Section 5.3), these queues will preserve positive traffic | ||||
| until last. So, the neutral traffic from malicious sources will all | ||||
| be automatically dropped first. Either way, the malicious sources | ||||
| cannot send more than honest sources. | ||||
| Further, hosts under malicious control will tend to be re-used for | ||||
| many different attacks. They will therefore build up a long term | ||||
| history of causing congestion. Therefore, as long as the population | ||||
| of potentially compromisable hosts around the Internet is limited, | ||||
| the per-user policing algorithms in Appendix G.1 will gradually | ||||
| throttle down zombies and other launchpads for attacks. Therefore, | ||||
| widespread deployment of re-ECN could considerably dampen the force | ||||
| of DDoS. Certainly, zombie armies could hold their fire for long | ||||
| enough to be able to build up enough credit in the per-user policers | ||||
| to launch an attack. But they would then still be limited to no more | ||||
| throughput than other, honest users. | ||||
| Inter-domain traffic policing (see Section 6.1.6) ensures that any | ||||
| network that harbours compromised `zombie' hosts will have to bear | ||||
| the cost of the congestion caused by traffic from zombies in | ||||
| downstream networks. Such networks will be incentivised to deploy | ||||
| per-user policers that rate-limit hosts that are unresponsive to | ||||
| congestion so they can only send very slowly into congested paths. | ||||
| As well as protecting other networks, the extremely poor performance | ||||
| at any sign of congestion will incentivise the zombie's owner to | ||||
| clean it up. However, the host should behave normally when using | ||||
| uncongested paths. | ||||
| Uniquely, re-ECN handles DDoS traffic without relying on the validity | ||||
| of identifiers in packets. Certainly the egress dropper relies on | ||||
| uniqueness of flow identifiers, but not their validity. So if a | ||||
| source spoofs another address, re-ECN works just as well, as long as | ||||
| the attacker cannot imitate all the flow identifiers of another | ||||
| active flow passing through the same dropper (see Section 6.3). | ||||
| Similarly, the ingress policer relies on uniqueness of flow IDs, not | ||||
| their validity. This is because a new flow will only be allowed any rate at | ||||
| all if it starts with FNE, and the more FNE packets there are | ||||
| starting new flows, the more they will be limited. Essentially a re- | ||||
| ECN policer limits the bulk of all congestion entering the network | ||||
| through a physical interface; limiting the congestion caused by each | ||||
| flow is merely an optional extra. | ||||
| 6.2.2. End-to-end QoS | ||||
| {ToDo: (Section 3.3.2 of [Re-fb] entitled `Edge QoS' gives an outline | ||||
| of the text that will be added here).} | ||||
| 6.2.3. Traffic Engineering | ||||
| {ToDo: } | ||||
| 6.2.4. Inter-Provider Service Monitoring | ||||
| {ToDo: } | ||||
| 6.3. Limitations | ||||
| The known limitations of the re-ECN approach are: | ||||
| o We still cannot defend against the attack described in Section 10 | ||||
| where a malicious source sends negative traffic through the same | ||||
| egress dropper as another flow and imitates its flow identifiers, | ||||
| allowing a malicious source to cause an innocent flow to | ||||
| experience heavy drop. | ||||
| o Re-feedback for TTL (re-TTL) would also be desirable at the same | ||||
| time as re-ECN. Unfortunately this requires a further standards | ||||
| action for the mechanisms briefly described in Appendix F. | ||||
| o Traffic must be ECN-capable for re-ECN to be effective. The only | ||||
| defence against malicious users who turn off ECN capability is that | ||||
| networks are expected to rate limit Not-ECT traffic and to apply | ||||
| higher drop preference to it during congestion. Although these | ||||
| are blunt instruments, they at least represent a feasible scenario | ||||
| for the future Internet where Not-ECT traffic co-exists with re- | ||||
| ECN traffic, but as a severely hobbled under-class. We recommend | ||||
| (Section 7.1) that while accommodating a smooth initial transition | ||||
| to re-ECN, policing policies should gradually be tightened to rate | ||||
| limit Not-ECT traffic more strictly in the longer term. | ||||
| o When checking whether a flow is balancing positive markings with | ||||
| congestion marking, re-ECN can only account for congestion | ||||
| marking, not drops. So, whenever a sender experiences drop, it | ||||
| does not have to re-echo the congestion event. Nonetheless, it is | ||||
| hardly any advantage to be able to send faster than other flows | ||||
| only if your traffic is dropped and the other traffic isn't. | ||||
| o We are considering the issue of whether it would be useful to | ||||
| truncate rather than drop packets that appear to be malicious, so | ||||
| that the feedback loop is not broken but useful data can be | ||||
| removed. | ||||
| 7. Incremental Deployment | 7. Incremental Deployment | |||
| 7.1. Incremental Deployment Features | ||||
| The design of the re-ECN protocol started from the fact that the | The design of the re-ECN protocol started from the fact that the | |||
| current ECN marking behaviour of queues was sufficient and that re- | current ECN marking behaviour of queues was sufficient and that re- | |||
| feedback could be introduced around these queues by changing the | feedback could be introduced around these queues by changing the | |||
| sender behaviour but not the routers. Otherwise, if we had required | sender behaviour but not the routers. Otherwise, if we had required | |||
| routers to be changed, the chance of encountering a path that had | routers to be changed, the chance of encountering a path that had | |||
| every router upgraded would be vanishingly small during early | every router upgraded would be vanishingly small during early | |||
| deployment, giving no incentive to start deployment. Also, as there | deployment, giving no incentive to start deployment. Also, as there | |||
| is no new forwarding behaviour, routers and hosts do not have to | is no new forwarding behaviour, routers and hosts do not have to | |||
| signal or negotiate anything. | signal or negotiate anything. | |||
| skipping to change at page 57, line 6 | skipping to change at page 34, line 18 | |||
| sources will gain by upgrading to re-ECN. Thus, towards the end of | sources will gain by upgrading to re-ECN. Thus, towards the end of | |||
| the voluntary incremental deployment period, RFC3168 compliant | the voluntary incremental deployment period, RFC3168 compliant | |||
| transports can be given progressively stronger encouragement to | transports can be given progressively stronger encouragement to | |||
| upgrade. | upgrade. | |||
| The following list of minor changes brings together all the points | The following list of minor changes brings together all the points | |||
| where re-ECN semantics for use of the two-bit ECN field are different | where re-ECN semantics for use of the two-bit ECN field are different | |||
| compared to RFC3168: | compared to RFC3168: | |||
| o A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender | o A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender | |||
| sets ECT(0) by default (Section 3.4); | sets ECT(0) by default (Section 4.3); | |||
| o No provision is necessary for a re-ECN capable source transport to | o No provision is necessary for a re-ECN capable source transport to | |||
| use the ECN nonce (Section 4.1.2.1); | use the ECN nonce (Section 6.1.2.1); | |||
| o Routers MAY preferentially drop different extended ECN codepoints | o Routers MAY preferentially drop different extended ECN codepoints | |||
| (Section 5.3); | (Section 5.3); | |||
| o Packets carrying the feedback not established (FNE) codepoint MAY | o Packets carrying the feedback not established (FNE) codepoint MAY | |||
| optionally be marked rather than dropped by routers, even though | optionally be marked rather than dropped by routers, even though | |||
| their ECN field is Not-ECT (with the important caveat in | their ECN field is Not-ECT (with the important caveat in | |||
| Section 5.3); | Section 5.3); | |||
| o Packets may be dropped by policing nodes because of apparent | o Packets may be dropped by policing nodes because of apparent | |||
| misbehaviour, not just because of congestion (Section 6); | misbehaviour, not just because of congestion; | |||
| o Tunnel entry behaviour is still to be defined, but may have to be | o Tunnel entry behaviour is still to be defined, but may have to be | |||
| different from RFC3168 (Section 5.6). | different from RFC3168 (Section 5.6). | |||
| None of these changes REQUIRE any modifications to routers. Also | None of these changes REQUIRE any modifications to routers. Also | |||
| none of these changes affect anything about end to end congestion | none of these changes affect anything about end to end congestion | |||
| control; they are all to do with allowing networks to police that end | control; they are all to do with allowing networks to police that end | |||
| to end congestion control is well-behaved. | to end congestion control is well-behaved. | |||
| 7.2. Incremental Deployment Incentives | 8. Related Work | |||
| It would only be worth standardising the re-ECN protocol if there | ||||
| existed a coherent story for how it might be incrementally deployed. | ||||
| In order for it to have a chance of deployment, everyone who needs to | ||||
| act must have a strong incentive to act, and the incentives must | ||||
| arise in the order that deployment would have to happen. Re-ECN | ||||
| works around unmodified ECN routers, but we can't just discuss why | ||||
| and how re-ECN deployment might build on ECN deployment, because | ||||
| there is precious little to build on in the first place. Instead, we | ||||
| aim to show that re-ECN deployment could carry ECN with it. We focus | ||||
| on commercial deployment incentives, although some of the arguments | ||||
| apply equally to academic or government sectors. | ||||
| ECN deployment: | ||||
| ECN is largely implemented in commercial routers, but generally | ||||
| not as a supported feature, and it has largely not been deployed | ||||
| by commercial network operators. It has been released in many | ||||
| Unix-based operating systems, but not in proprietary OSs like | ||||
| Windows or those in many mobile devices. For detailed deployment | ||||
| status, see [ECN-Deploy]. We believe the reason ECN deployment | ||||
| has not happened is twofold: | ||||
| * ECN requires changes to both routers and hosts. If someone | ||||
| wanted to sell the improvement that ECN offers, they would have | ||||
| to co-ordinate deployment of their product with others. An ECN | ||||
| server only gives any improvement on an ECN network. An ECN | ||||
| network only gives any improvement if used by ECN devices. | ||||
| Deployment that requires co-ordination adds cost and delay and | ||||
| tends to dilute any competitive advantage that might be gained. | ||||
| * ECN `only' gives a performance improvement. Making a product a | ||||
| bit faster (whether the product is a device or a network), | ||||
| isn't usually a sufficient selling point to be worth the cost | ||||
| of co-ordinating across the industry to deploy it. Network | ||||
| operators tend to avoid re-configuring a working network unless | ||||
| launching a new product. | ||||
| ECN and Re-ECN for Edge-to-edge Assured QoS: | ||||
| We believe the proposal to provide assured QoS sessions using a | ||||
| form of ECN called pre-congestion notification (PCN) [PCN-arch] is | ||||
| most likely to break the deadlock in ECN deployment first. It | ||||
| only requires edge-to-edge deployment so it does not require | ||||
| endpoint support. It can be deployed in a single network, then | ||||
| grow incrementally to interconnected networks. And it provides a | ||||
| different `product' (internetworked assured QoS), rather than | ||||
| merely making an existing product a bit faster. | ||||
| Not only could this assured QoS application kick-start ECN | ||||
| deployment, it could also carry re-ECN deployment with it; because | ||||
| re-ECN can enable the assured QoS region to expand to a large | ||||
| internetwork where neighbouring networks do not trust each other. | ||||
| [Re-PCN] argues that re-ECN security should be built in to the QoS | ||||
| system from the start, explaining why and how. | ||||
| If ECN and re-ECN were deployed edge-to-edge for assured QoS, | ||||
| operators would gain valuable experience. They would also clear | ||||
| away many technical obstacles such as firewall configurations that | ||||
| block all but the RFC3168 settings of the ECN field and the RE | ||||
| flag. | ||||
| ECN in Access Networks: | ||||
| The next obstacle to ECN deployment would be extension to access | ||||
| and backhaul networks, where considerable link layer differences | ||||
| make implementation non-trivial, particularly on congested | ||||
| wireless links. ECN and re-ECN work fine during partial | ||||
| deployment, but they will not be very useful if the most congested | ||||
| elements in networks are the last to support them. Access network | ||||
| support is one of the weakest parts of this deployment story. All | ||||
| we can hope is that, once the benefits of ECN are better | ||||
| understood by operators, they will push for the necessary link | ||||
| layer implementations as deployment proceeds. | ||||
| Policing Unresponsive Flows: | ||||
| Re-ECN allows a network to offer differentiated quality of service | ||||
| as explained in Section 6.2.2. But we do not believe this will | ||||
| motivate initial deployment of re-ECN, because the industry is | ||||
| already set on alternative ways of doing QoS. Despite being much | ||||
| more complicated and expensive, the alternative approaches are | ||||
| here and now. | ||||
| But re-ECN is critical to QoS deployment in another respect. It | ||||
| can be used to prevent applications from taking whatever bandwidth | ||||
| they choose without asking. | ||||
| Currently, applications that remain resolute in their lack of | ||||
| response to congestion are rewarded by other TCP applications. In | ||||
| other words, TCP is naively friendly, in that it reduces its rate | ||||
| in response to congestion whether it is competing with friends | ||||
| (other TCPs) or with enemies (unresponsive applications). | ||||
| Therefore, those network owners that want to sell QoS will be keen | ||||
| to ensure that their users can't help themselves to QoS for free. | ||||
| Given the very large revenues at stake, we believe effective | ||||
| policing of congestion response will become highly sought after by | ||||
| network owners. | ||||
| But this does not necessarily argue for re-ECN deployment. | ||||
| Network owners might choose to deploy bottleneck policers rather | ||||
| than re-ECN-based policing. However, under Related Work | ||||
| (Section 9) we argue that bottleneck policers are inherently | ||||
| vulnerable to circumvention. | ||||
| Therefore we believe there will be a strong demand from network | ||||
| owners for re-ECN deployment so they can police flows that do not | ||||
| ask to be unresponsive to congestion, in order to protect their | ||||
| revenues from flows that do ask (QoS). In particular, we suspect | ||||
| that the operators of cellular networks will want to prevent VoIP | ||||
| and video applications being used freely on their networks as a | ||||
| more open market develops in GPRS and 3G devices. | ||||
| Initial deployments are likely to be isolated to single cellular | ||||
| networks. Cellular operators would first place requirements on | ||||
| device manufacturers to include re-ECN in the standards for mobile | ||||
| devices. In parallel, they would put out tenders for ingress and | ||||
| egress policers. Then, after a while they would start to tighten | ||||
| rate limits on Not-ECT traffic from non-standard devices and they | ||||
| would start policing whatever non-accredited applications people | ||||
| might install on mobile devices with re-ECN support in the | ||||
| operating system. This would force even independent mobile device | ||||
| manufacturers to provide re-ECN support. Early standardisation | ||||
| across the cellular operators is likely, including interconnection | ||||
| agreements with penalties for excess downstream congestion. | ||||
| We suspect some fixed broadband networks (whether cable or DSL) | ||||
| would follow a similar path. However, we also believe that larger | ||||
| parts of the fixed Internet would not choose to police on a per- | ||||
| flow basis. Some might choose to police congestion on a per-user | ||||
| basis in order to manage heavy peer-to-peer file-sharing, but it | ||||
| seems likely that a sizeable majority would not deploy any form of | ||||
| policing. | ||||
| This hybrid situation raises the question, "How does re-ECN work for | ||||
| networks that choose to use policing if they connect with others | ||||
| that don't?" Traffic from non-ECN capable sources will arrive | ||||
| from other networks and cause congestion within the policed, ECN- | ||||
| capable networks. So networks that chose to police congestion | ||||
| would rate-limit Not-ECT traffic throughout their network, | ||||
| particularly at their borders. They would probably also set | ||||
| higher usage prices in their interconnection contracts for | ||||
| incoming Not-ECT and Not-RECT traffic. We assume that | ||||
| interconnection contracts between networks in the same tier will | ||||
| include congestion penalties before contracts with provider | ||||
| backbones do. | ||||
| A hybrid situation could remain for all time. As was explained in | ||||
| the introduction, we believe in healthy competition between | ||||
| policing and not policing, with no imperative to convert the whole | ||||
| world to the religion of policing. Networks that chose not to | ||||
| deploy egress droppers would leave themselves open to being | ||||
| congested by senders in other networks. But that would be their | ||||
| choice. | ||||
| The important aspect of the egress dropper though is that it most | ||||
| protects the network that deploys it. If a network does not | ||||
| deploy an egress dropper, sources sending into it from other | ||||
| networks will be able to understate the congestion they are | ||||
| causing. Whereas, if a network deploys an egress dropper, it can | ||||
| know how much congestion other networks are dumping into it, and | ||||
| apply penalties or charges accordingly. So, whether or not a | ||||
| network polices its own sources at ingress, it is in its interests | ||||
| to deploy an egress dropper. | ||||
| Host support: | ||||
| In the above deployment scenario, host operating system support | ||||
| for re-ECN came about through the cellular operators demanding it | ||||
| in device standards (i.e. 3GPP). Of course, increasingly, mobile | ||||
| devices are being built to support multiple wireless technologies. | ||||
| So, if re-ECN were stipulated for cellular devices, it would | ||||
| automatically appear in those devices connected to the wireless | ||||
| fringes of fixed networks if they coupled cellular with WiFi or | ||||
| Bluetooth technology, for instance. Also, once implemented in the | ||||
| operating system of one mobile device, it would tend to be found | ||||
| in other devices using the same family of operating system. | ||||
| Therefore, whether or not a fixed network deployed ECN, or | ||||
| deployed re-ECN policers and droppers, many of its hosts might | ||||
| well be using re-ECN over it. Indeed, they would be at an | ||||
| advantage when communicating with hosts across re-ECN policed | ||||
| networks that rate limited Not-RECT traffic. | ||||
| Other possible scenarios: | ||||
| The above is thankfully not the only plausible scenario we can | ||||
| think of. One of the many clubs of operators that meet regularly | ||||
| around the world might decide to act together to persuade a major | ||||
| operating system manufacturer to implement re-ECN. And they may | ||||
| agree between them on an interconnection model that includes | ||||
| congestion penalties. | ||||
| Re-ECN provides an interesting opportunity for device | ||||
| manufacturers as well as network operators. Policers can be | ||||
| configured loosely when first deployed. Then as re-ECN take-up | ||||
| increases, they can be tightened up, so that a network with re-ECN | ||||
| deployed can gradually squeeze down the service provided to | ||||
| RFC3168 compliant devices that have not upgraded to re-ECN. Many | ||||
| device vendors rely on replacement sales. And operating system | ||||
| companies rely heavily on new release sales. Also support | ||||
| services would like to be able to force stragglers to upgrade. | ||||
| So, the ability to throttle service to RFC3168 compliant operating | ||||
| systems is quite valuable. | ||||
| Also, policing unresponsive sources may not be the only or even | ||||
| the first application that drives deployment. It may be policing | ||||
| causes of heavy congestion (e.g. peer-to-peer file-sharing). Or | ||||
| it may be mitigation of denial of service. Or we may be wrong in | ||||
| thinking simpler QoS will not be the initial motivation for re-ECN | ||||
| deployment. Indeed, the combined pressure for all these may be | ||||
| the motivator, but it seems optimistic to expect such a level of | ||||
| joined-up thinking from today's communications industry. We | ||||
| believe a single application alone must be a sufficient motivator. | ||||
| In short, everyone gains from adding accountability to TCP/IP, | ||||
| except the selfish or malicious. So, deployment incentives tend | ||||
| to be strong. | ||||
| 8. Architectural Rationale | ||||
| In the Internet's technical community, the danger of not responding | ||||
| to congestion is well-understood, as well as its attendant risk of | ||||
| congestion collapse [RFC3714]. However, one side of the Internet's | ||||
| commercial community considers that the very essence of IP is to | ||||
| provide open access to the internetwork for all applications. They | ||||
| see congestion as a symptom of over-conservative investment, and rely | ||||
| on revising application designs to find novel ways to keep | ||||
| applications working despite congestion. They argue that the | ||||
| Internet was never intended to be solely for TCP-friendly | ||||
| applications. Meanwhile, another side of the Internet's commercial | ||||
| community believes that it is worthwhile providing a network for | ||||
| novel applications only if it has sufficient capacity, which can | ||||
| happen only if a greater share of application revenues can be | ||||
| /assured/ for the infrastructure provider. Otherwise the major | ||||
| investments required would carry too much risk and wouldn't happen. | ||||
| The lesson articulated in [Tussle] is that we shouldn't embed our | ||||
| view on these arguments into the Internet at design time. Instead we | ||||
| should design the Internet so that the outcome of these arguments can | ||||
| get decided at run-time. Re-ECN is designed in that spirit. Once | ||||
| the protocol is available, different network operators can choose how | ||||
| liberal they want to be in holding people accountable for the | ||||
| congestion they cause. Some might boldly invest in capacity and not | ||||
| police its use at all, hoping that novel applications will result. | ||||
| Others might use re-ECN for fine-grained flow policing, expecting to | ||||
| make money selling vertically integrated services. Yet others might | ||||
| sit somewhere half-way, perhaps doing coarse, per-user policing. All | ||||
| might change their minds later. But re-ECN always allows them to | ||||
| interconnect so that the careful ones can protect themselves from the | ||||
| liberal ones. | ||||
| The incentive-based approach used for re-ECN is based on Gibbens and | ||||
| Kelly's arguments [Evol_cc] on allowing endpoints the freedom to | ||||
| evolve new congestion control algorithms for new applications. They | ||||
| ensured responsible behaviour despite everyone's self-interest by | ||||
| applying pricing to ECN marking, and Kelly had proved stability and | ||||
| optimality in an earlier paper. | ||||
| Re-ECN keeps all the underlying economic incentives, but rearranges | ||||
| the feedback. The idea is to allow a network operator (if it | ||||
| chooses) to deploy engineering mechanisms like policers at the front | ||||
| of the network which can be designed to behave /as if/ they are | ||||
| responding to congestion prices. Rather than having to subject users | ||||
| to congestion pricing, networks can then use more traditional | ||||
| charging regimes (or novel ones). But the engineering can constrain | ||||
| the overall amount of congestion a user can cause. This provides a | ||||
| buffer against completely outrageous congestion control, but still | ||||
| makes it easy for novel applications to evolve if they need different | ||||
| congestion control to the norms. It also allows novel charging | ||||
| regimes to evolve. | ||||
| Despite being achieved with a relatively minor protocol change, re- | ||||
| ECN is an architectural change. Previously, Internet congestion | ||||
| could only be controlled by the data sender, because it was the only | ||||
| one both in a position to control the load and in a position to see | ||||
| information on congestion. Re-ECN levels the playing field. It | ||||
| recognises that the network also has a role to play in moderating | ||||
| (policing) congestion control. But policing is only truly effective | ||||
| at the first ingress into an internetwork, whereas path congestion | ||||
| was previously only visible at the last egress. So, re-ECN | ||||
| democratises congestion information. Then the choice over who | ||||
| actually controls congestion can be made at run-time, not design | ||||
| time---a bit like an aircraft with dual controls. And different | ||||
| operators can make different choices. We believe non-architectural | ||||
| approaches to this problem are unlikely to offer more than partial | ||||
| solutions (see Section 9). | ||||
| Importantly, re-ECN does not require assumptions about specific | ||||
| congestion responses to be embedded in any network elements, except | ||||
| at the first ingress to the internetwork if that level of control is | ||||
| desired by the ingress operator. But such tight policing will be a | ||||
| matter of agreement between the source and its access network | ||||
| operator. The ingress operator need not police congestion response | ||||
| at flow granularity; it can simply hold a source responsible for the | ||||
| aggregate congestion it causes, perhaps keeping it within a monthly | ||||
| congestion quota. Or if the ingress network trusts the source, it | ||||
| can do nothing. | ||||
| Therefore, the aim of the re-ECN protocol is NOT solely to police | ||||
| TCP-friendliness. Re-ECN preserves IP as a generic network layer for | ||||
| all sorts of responses to congestion, for all sorts of transports. | ||||
| Re-ECN merely ensures truthful downstream congestion information is | ||||
| available in the network layer for all sorts of accountability | ||||
| applications. | ||||
| The end to end design principle does not say that all functions | ||||
| should be moved out of the lower layers---only those functions that | ||||
| are not generic to all higher layers. Re-ECN adds a function to the | ||||
| network layer that is generic, but was omitted: accountability for | ||||
| causing congestion. Accountability is not something that an end-user | ||||
| can provide to themselves. We believe re-ECN adds no more than is | ||||
| sufficient to hold each flow accountable, even if it consists of a | ||||
| single datagram. | ||||
| "Accountability" implies being able to identify who is responsible | ||||
| for causing congestion. However, at the network layer it would NOT | ||||
| be useful to identify the cause of congestion by adding individual or | ||||
| organisational identity information, NOR by using source IP | ||||
| addresses. Rather than bringing identity information to the point of | ||||
| congestion, we bring downstream congestion information to the point | ||||
| where the cause can be most easily identified and dealt with. That | ||||
| is, at any trust boundary congestion can be associated with the | ||||
| physically connected upstream neighbour that is directly responsible | ||||
| for causing it (whether intentionally or not). A trust boundary | ||||
| interface is exactly the place to police or throttle in order to | ||||
| directly mitigate congestion, rather than having to trace the | ||||
| (ir)responsible party in order to shut them down. | ||||
| Some considered that ECN itself was a layering violation. The | ||||
| reasoning went that the interface to a layer should provide a service | ||||
| to the higher layer and hide how the lower layer does it. However, | ||||
| ECN reveals the state of the network layer and below to the transport | ||||
| layer. A more positive way to describe ECN is that it is like the | ||||
| return value of a function call to the network layer. It explicitly | ||||
| returns the status of the request to deliver a packet, by returning a | ||||
| value representing the current risk that a packet will not be served. | ||||
| Re-ECN has similar semantics, except the transport layer must try to | ||||
| guess the return value, then it can use the actual return value from | ||||
| the network layer to modify the next guess. | ||||
| The guiding principle behind all the discussion in Section 6.1.6 on | ||||
| Policing is that any gain from subverting the protocol should be | ||||
| precisely neutralised, rather than punished. If a gain is punished | ||||
| to a greater extent than is sufficient to neutralise it, it will most | ||||
| likely open up a new vulnerability, where the amplifying effect of | ||||
| the punishment mechanism can be turned on others. | ||||
| For instance, if possible, flows should be removed as soon as they go | ||||
| negative, but we do NOT RECOMMEND any attempts to discard such flows | ||||
| further upstream while they are still positive. Such over-zealous | ||||
| push-back is unnecessary and potentially dangerous. These flows have | ||||
| paid their `fare' up to the point they go negative, so there is no | ||||
| harm in delivering them that far. If someone downstream asks for a | ||||
| flow to be dropped as near to the source as possible, because they | ||||
| say it is going to become negative later, an upstream node cannot | ||||
| test the truth of this assertion. Rather than have to authenticate | ||||
| such messages, re-ECN has been designed so that flows can be dropped | ||||
| solely based on locally measurable evidence. A message hinting that | ||||
| a flow should be watched closely to test for negativity is fine. But | ||||
| not a message that claims that a positive flow will go negative | ||||
| later, so it should be dropped. | ||||
| 9. Related Work | ||||
| {Due to lack of time, this section is incomplete. The reader is | ||||
| referred to the Related Work section of [Re-fb] for a brief selection | ||||
| of related ideas.} | ||||
| 9.1. Policing Rate Response to Congestion | ||||
| ATM network elements send congestion back-pressure | ||||
| messages [ITU-T.I.371] along each connection, duplicating any end to | ||||
| end feedback because they don't trust it. On the other hand, re-ECN | ||||
| ensures information in forwarded packets can be used for congestion | ||||
| management without requiring a connection-oriented architecture and | ||||
| re-using the overhead of fields that are already set aside for end to | ||||
| end congestion control (and routing loop detection in the case of re- | ||||
| TTL in Appendix F). | ||||
| We borrowed ideas from policers in the literature [pBox], [XCHOKe], | ||||
| AFD, etc. for our rate equation policer. However, without the benefit | ||||
| of re-ECN they don't police the correct rate for the condition of | ||||
| their path. They detect unusually high /absolute/ rates, but only | ||||
| while the policer itself is congested, because they work by detecting | ||||
| prevalent flows in the discards from the local RED queue. These | ||||
| policers must sit at every potential bottleneck, whereas our policer | ||||
| need only be located at each ingress to the internetwork. As Floyd & | ||||
| Fall explain [pBox], the limitation of their approach is that a high | ||||
| sending rate might be perfectly legitimate, if the rest of the path | ||||
| is uncongested or the round trip time is short. Commercially | ||||
| available rate policers cap the rate of any one flow. Or they | ||||
| enforce monthly volume caps in an attempt to control high volume | ||||
| file-sharing. They limit the value a customer derives. They might | ||||
| also limit the congestion customers can cause, but only as an | ||||
| accidental side-effect. They actually punish traffic that fills | ||||
| troughs as much as traffic that causes peaks in utilisation. In | ||||
| practice network operators need to be able to allocate service by | ||||
| cost during congestion, and by value at other times. | ||||
| 9.2. Congestion Notification Integrity | 8.1. Congestion Notification Integrity | |||
| The choice of two ECT code-points in the ECN field [RFC3168] | The choice of two ECT code-points in the ECN field [RFC3168] | |||
| permitted future flexibility, optionally allowing the sender to | permitted future flexibility, optionally allowing the sender to | |||
| encode the experimental ECN nonce [RFC3540] in the packet stream. | encode the experimental ECN nonce [RFC3540] in the packet stream. | |||
| This mechanism has since been included in the specifications of DCCP | This mechanism has since been included in the specifications of DCCP | |||
| [RFC4340]. | [RFC4340]. | |||
| The ECN nonce is an elegant scheme that allows the sender to detect | The ECN nonce is an elegant scheme that allows the sender to detect | |||
| if someone in the feedback loop - the receiver especially - tries to | if someone in the feedback loop - the receiver especially - tries to | |||
| claim no congestion was experienced when in fact congestion led to | claim no congestion was experienced when in fact congestion led to | |||
| skipping to change at page 67, line 5 | skipping to change at page 35, line 44 | |||
| can police their upstream neighbours, to encourage them to police | can police their upstream neighbours, to encourage them to police | |||
| their users in turn. But most importantly, it requires the sender to | their users in turn. But most importantly, it requires the sender to | |||
| declare path congestion to the network and it can remove traffic at | declare path congestion to the network and it can remove traffic at | |||
| the egress if this declaration is dishonest. So it can police | the egress if this declaration is dishonest. So it can police | |||
| correctly, irrespective of whether the receiver tries to suppress | correctly, irrespective of whether the receiver tries to suppress | |||
| congestion feedback or whether the sender ignores genuine congestion | congestion feedback or whether the sender ignores genuine congestion | |||
| feedback. Therefore the re-ECN protocol addresses a much wider range | feedback. Therefore the re-ECN protocol addresses a much wider range | |||
| of cheating problems, which includes the one addressed by the ECN | of cheating problems, which includes the one addressed by the ECN | |||
| nonce. | nonce. | |||
| 9.3. Identifying Upstream and Downstream Congestion | 9. Security Considerations | |||
| Purple [Purple] proposes that queues should use the CWR flag in the | ||||
| TCP header of ECN-capable flows to work out path congestion and | ||||
| therefore downstream congestion in a similar way to re-ECN. However, | ||||
| because CWR is in the transport layer, it is not always visible to | ||||
| network layer routers and policers. Purple's motivation was to | ||||
| improve AQM, not policing. But, of course, nodes trying to avoid a | ||||
| policer would not be expected to allow CWR to be visible. | ||||
| 10. Security Considerations | ||||
| This whole memo concerns the deployment of a secure congestion | This whole memo concerns the deployment of a secure congestion | |||
| control framework. However, below we list some specific security | control framework. However, below we list some specific security | |||
| issues that we are still working on: | issues that we are still working on: | |||
| o Malicious users have the ability to launch dynamically changing | o Malicious users have the ability to launch dynamically changing | |||
| attacks, exploiting the time it takes to detect an attack, given | attacks, exploiting the time it takes to detect an attack, given | |||
| ECN marking is binary. We are concentrating on subtle | ECN marking is binary. We are concentrating on subtle | |||
| interactions between the ingress policer and the egress dropper in | interactions between the ingress policer and the egress dropper in | |||
| an effort to make it impossible to game the system. | an effort to make it impossible to game the system. | |||
| skipping to change at page 67, line 37 | skipping to change at page 36, line 17 | |||
| o There is an inherent need for at least some flow state at the | o There is an inherent need for at least some flow state at the | |||
| egress dropper given the binary marking environment, which leads | egress dropper given the binary marking environment, which leads | |||
| to an apparent vulnerability to state exhaustion attacks. An | to an apparent vulnerability to state exhaustion attacks. An | |||
| egress dropper design with bounded flow state is in write-up. | egress dropper design with bounded flow state is in write-up. | |||
| o A malicious source can spoof another user's address and send | o A malicious source can spoof another user's address and send | |||
| negative traffic to the same destination in order to fool the | negative traffic to the same destination in order to fool the | |||
| dropper into sanctioning the other user's flow. To prevent or | dropper into sanctioning the other user's flow. To prevent or | |||
| mitigate these two different kinds of DoS attack, against the | mitigate these two different kinds of DoS attack, against the | |||
| dropper and against given flows, we are considering various | dropper and against given flows, we are considering various | |||
| protection mechanisms. Section 5.5.1 discusses one of these. | protection mechanisms. | |||
| o A malicious client can send requests using a spoofed source | o A malicious client can send requests using a spoofed source | |||
| address to a server (such as a DNS server) that tends to respond | address to a server (such as a DNS server) that tends to respond | |||
| with single packet responses. This server will then be tricked | with single packet responses. This server will then be tricked | |||
| into having to set FNE on the first (and only) packet of all these | into having to set FNE on the first (and only) packet of all these | |||
| wasted responses. Given packets marked FNE are worth +1, this | wasted responses. Given packets marked FNE are worth +1, this | |||
| will cause such servers to consume more of their allowance to | will cause such servers to consume more of their allowance to | |||
| cause congestion than they would wish to. In general, re-ECN is | cause congestion than they would wish to. In general, re-ECN is | |||
| deliberately designed so that single packet flows have to bear the | deliberately designed so that single packet flows have to bear the | |||
| cost of not discovering the congestion state of their path. One | cost of not discovering the congestion state of their path. One | |||
| skipping to change at page 68, line 46 | skipping to change at page 37, line 26 | |||
| was defined). But it would be sufficient for a pair of endpoints to | was defined). But it would be sufficient for a pair of endpoints to | |||
| make random checks on whether the RE flag was the same when it | make random checks on whether the RE flag was the same when it | |||
| reached the egress as when it left the ingress. Indeed, if IPSec AH | reached the egress as when it left the ingress. Indeed, if IPSec AH | |||
| had covered the RE flag, any network intending to alter sufficient RE | had covered the RE flag, any network intending to alter sufficient RE | |||
| flags to make a gain would have focused its alterations on packets | flags to make a gain would have focused its alterations on packets | |||
| without authenticating headers (AHs). | without authenticating headers (AHs). | |||
| The security of re-ECN has been deliberately designed to not rely on | The security of re-ECN has been deliberately designed to not rely on | |||
| cryptography. | cryptography. | |||
| 11. IANA Considerations | 10. IANA Considerations | |||
| This memo includes no request to IANA (yet). | This memo includes no request to IANA (yet). | |||
| If this memo were to progress to standards track, it would list: | If this memo were to progress to standards track, it would list: | |||
| o The new RE flag in IPv4 (Section 5.1) and its extension with the | o The new RE flag in IPv4 (Section 5.1) and its extension with the | |||
| ECN field to create a new set of extended ECN (EECN) codepoints; | ECN field to create a new set of extended ECN (EECN) codepoints; | |||
| o The definition of the EECN codepoints for default Diffserv PHBs | o The definition of the EECN codepoints for default Diffserv PHBs | |||
| (Section 3.3) | (Section 4.2) | |||
| o The new extension header for IPv6 (Section 5.2); | o The new extension header for IPv6 (Section 5.2); | |||
| o The new combinations of flags in the TCP header for capability | o The new combinations of flags in the TCP header for capability | |||
| negotiation (Section 4.1.3); | negotiation (Section 6.1.3); | |||
| o The new ICMP message type (Section 5.5.1). | ||||
| 12. Conclusions | 11. Conclusions | |||
| {ToDo:} | {ToDo:} | |||
| 13. Acknowledgements | 12. Acknowledgements | |||
| Sebastien Cazalet and Andrea Soppera contributed to the idea of re- | Sebastien Cazalet and Andrea Soppera contributed to the idea of re- | |||
| feedback. All the following have given helpful comments: Andrea | feedback. All the following have given helpful comments: Andrea | |||
| Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley, | Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley, | |||
| Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright, | Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright, | |||
| John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru | John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru | |||
| Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd | Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd | |||
| (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark | (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark | |||
| Handley (who developed the attack with canceled packets), Adam | Handley (who developed the attack with canceled packets), Adam | |||
| Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft | Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft | |||
| (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who | (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who | |||
| complemented our own dummy traffic attacks with others), Liz Maida | complemented our own dummy traffic attacks with others), Liz Maida | |||
| (MIT), and comments from participants in the CRN/CFP Broadband and | (MIT), and comments from participants in the CRN/CFP Broadband and | |||
| DoS-resistant Internet working groups. A special thank you to | DoS-resistant Internet working groups. A special thank you to | |||
| Alessandro Salvatori for coming up with fiendish attacks on re-ECN. | Alessandro Salvatori for coming up with fiendish attacks on re-ECN. | |||
| 14. Comments Solicited | 13. Comments Solicited | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| addressed to the IETF Transport Area working group's mailing list | addressed to the IETF Transport Area working group's mailing list | |||
| <tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
| 15. References | 14. References | |||
| 15.1. Normative References | 14.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | RFCs to Indicate Requirement Levels", | |||
| BCP 14, RFC 2119, March 1997. | ||||
| [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | [RFC2581] Allman, M., Paxson, V., and W. | |||
| Control", RFC 2581, April 1999. | Stevens, "TCP Congestion Control", | |||
| RFC 2581, April 1999. | ||||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. | |||
| of Explicit Congestion Notification (ECN) to IP", | Black, "The Addition of Explicit | |||
| Congestion Notification (ECN) to IP", | ||||
| RFC 3168, September 2001. | RFC 3168, September 2001. | |||
| [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's | [RFC3390] Allman, M., Floyd, S., and C. | |||
| Initial Window", RFC 3390, October 2002. | Partridge, "Increasing TCP's Initial | |||
| Window", RFC 3390, October 2002. | ||||
| [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram | [RFC4340] Kohler, E., Handley, M., and S. | |||
| Congestion Control Protocol (DCCP)", RFC 4340, March 2006. | Floyd, "Datagram Congestion Control | |||
| Protocol (DCCP)", RFC 4340, | ||||
| March 2006. | ||||
| [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | [RFC4341] Floyd, S. and E. Kohler, "Profile for | |||
| Control Protocol (DCCP) Congestion Control ID 2: TCP-like | Datagram Congestion Control Protocol | |||
| Congestion Control", RFC 4341, March 2006. | (DCCP) Congestion Control ID 2: TCP- | |||
| like Congestion Control", RFC 4341, | ||||
| March 2006. | ||||
| [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | [RFC4342] Floyd, S., Kohler, E., and J. Padhye, | |||
| Datagram Congestion Control Protocol (DCCP) Congestion | "Profile for Datagram Congestion | |||
| Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, | Control Protocol (DCCP) Congestion | |||
| Control ID 3: TCP-Friendly Rate | ||||
| Control (TFRC)", RFC 4342, | ||||
| March 2006. | March 2006. | |||
| [RFC4960] Stewart, R., "Stream Control Transmission Protocol", | [RFC4960] Stewart, R., "Stream Control | |||
| RFC 4960, September 2007. | Transmission Protocol", RFC 4960, | |||
| September 2007. | ||||
| 15.2. Informative References | 14.2. Informative References | |||
| [ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the | [ARI05] Adams, J., Roberts, L., and A. | |||
| Internet to Support Real-Time Content Supply from a Large | IJsselmuiden, "Changing the Internet | |||
| Fraction of Broadband Residential Users", BT Technology | to Support Real-Time Content Supply | |||
| from a Large Fraction of Broadband | ||||
| Residential Users", BT Technology | ||||
| Journal (BTTJ) 23(2), April 2005. | Journal (BTTJ) 23(2), April 2005. | |||
| [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the | [ECN-tunnel] Briscoe, B., "Layered Encapsulation | |||
| assumptions underlying mechanism design for the Internet", | of Congestion Notification", | |||
| Proc. Workshop on the Economics of Networked Systems | draft-briscoe-tsvwg-ecn-tunnel-00 | |||
| (NetEcon06) , June 2006, <http://www.cs.duke.edu/nicl/ | (work in progress), June 2007. | |||
| netecon06/papers/ne06-assessing.pdf>. | ||||
| [CLoop_pol] | ||||
| Salvatori, A., "Closed Loop Traffic Policing", Politecnico | ||||
| Torino and Institut Eurecom Masters Thesis , | ||||
| September 2005. | ||||
| [ECN-Deploy] | ||||
| Floyd, S., "ECN (Explicit Congestion Notification) in | ||||
| TCP/IP; Implementation and Deployment of ECN", Web-page , | ||||
| May 2004, | ||||
| <http://www.icir.org/floyd/ecn.html#implementations>. | ||||
| [ECN-tunnel] | ||||
| Briscoe, B., "Layered Encapsulation of Congestion | ||||
| Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in | ||||
| progress), June 2007. | ||||
| [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | ||||
| evolution of congestion control", Automatica 35(12)1969-- | ||||
| 1985, December 1999, | ||||
| <http://www.statslab.cam.ac.uk/~frank/evol.html>. | ||||
| [I-D.ietf-tcpm-ecnsyn] | ||||
| Kuzmanovic, A., "Adding Explicit Congestion Notification | ||||
| (ECN) Capability to TCP's SYN/ACK Packets", | ||||
| draft-ietf-tcpm-ecnsyn-05 (work in progress), | ||||
| February 2008. | ||||
| [I-D.moncaster-tcpm-rcv-cheat] | ||||
| Moncaster, T., "A TCP Test to Allow Senders to Identify | ||||
| Receiver Non-Compliance", | ||||
| draft-moncaster-tcpm-rcv-cheat-02 (work in progress), | ||||
| November 2007. | ||||
| [ITU-T.I.371] | ||||
| ITU-T, "Traffic Control and Congestion Control in | ||||
| {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004. | ||||
| [Jiang02] Jiang, H. and D. Dovrolis, "The Macroscopic Behavior of | ||||
| the TCP Congestion Avoidance Algorithm", ACM SIGCOMM | ||||
| CCR 32(3)75-88, July 2002, | ||||
| <http://doi.acm.org/10.1145/571697.571725>. | ||||
| [Mathis97] | ||||
| Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The | ||||
| Macroscopic Behavior of the TCP Congestion Avoidance | ||||
| Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997, | ||||
| <http://doi.acm.org/10.1145/263932.264023>. | ||||
| [PCN-arch] | [I-D.ietf-tcpm-ecnsyn] Kuzmanovic, A., "Adding Explicit | |||
| Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R., | Congestion Notification (ECN) | |||
| Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion | Capability to TCP's SYN/ACK | |||
| Notification Architecture", draft-ietf-pcn-architecture-03 | Packets", draft-ietf-tcpm-ecnsyn-05 | |||
| (work in progress), February 2008. | (work in progress), February 2008. | |||
| [Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE: | [I-D.moncaster-tcpm-rcv-cheat] Moncaster, T., "A TCP Test to Allow | |||
| Predictive Active Queue Management Utilizing Congestion | Senders to Identify Receiver Non- | |||
| Information", Proc. Local Computer Networks (LCN 2003) , | Compliance", | |||
| October 2003. | draft-moncaster-tcpm-rcv-cheat-02 | |||
| (work in progress), November 2007. | ||||
| [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, | [PCN-arch] Eardley, P., Babiarz, J., Chan, K., | |||
| M., Romanow, A., Weinrib, A., and L. Zhang, "Resource | Charny, A., Geib, R., Karagiannis, | |||
| ReSerVation Protocol (RSVP) Version 1 Applicability | G., Menth, M., and T. Tsou, "Pre- | |||
| Statement Some Guidelines on Deployment", RFC 2208, | Congestion Notification | |||
| September 1997. | Architecture", | |||
| draft-ietf-pcn-architecture-03 (work | ||||
| in progress), February 2008. | ||||
| [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., | |||
| S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | Davie, B., Deering, S., Estrin, D., | |||
| Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Floyd, S., Jacobson, V., Minshall, | |||
| S., Wroclawski, J., and L. Zhang, "Recommendations on | G., Partridge, C., Peterson, L., | |||
| Queue Management and Congestion Avoidance in the | Ramakrishnan, K., Shenker, S., | |||
| Wroclawski, J., and L. Zhang, | ||||
| "Recommendations on Queue Management | ||||
| and Congestion Avoidance in the | ||||
| Internet", RFC 2309, April 1998. | Internet", RFC 2309, April 1998. | |||
| [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., | [RFC2475] Blake, S., Black, D., Carlson, M., | |||
| and W. Weiss, "An Architecture for Differentiated | Davies, E., Wang, Z., and W. Weiss, | |||
| "An Architecture for Differentiated | ||||
| Services", RFC 2475, December 1998. | Services", RFC 2475, December 1998. | |||
| [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission | [RFC2988] Paxson, V. and M. Allman, "Computing | |||
| Timer", RFC 2988, November 2000. | TCP's Retransmission Timer", | |||
| RFC 2988, November 2000. | ||||
| [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | ||||
| RFC 3124, June 2001. | ||||
| [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", | [RFC3124] Balakrishnan, H. and S. Seshan, "The | |||
| RFC 3514, April 2003. | Congestion Manager", RFC 3124, | |||
| June 2001. | ||||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3514] Bellovin, S., "The Security Flag in | |||
| Congestion Notification (ECN) Signaling with Nonces", | the IPv4 Header", RFC 3514, | |||
| RFC 3540, June 2003. | April 2003. | |||
| [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | [RFC3540] Spring, N., Wetherall, D., and D. | |||
| Control for Voice Traffic in the Internet", RFC 3714, | Ely, "Robust Explicit Congestion | |||
| March 2004. | Notification (ECN) Signaling with | |||
| Nonces", RFC 3540, June 2003. | ||||
| [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | [RFC4301] Kent, S. and K. Seo, "Security | |||
| Internet Protocol", RFC 4301, December 2005. | Architecture for the Internet | |||
| Protocol", RFC 4301, December 2005. | ||||
| [RFC4302] Kent, S., "IP Authentication Header", RFC 4302, | [RFC4302] Kent, S., "IP Authentication Header", | |||
| December 2005. | RFC 4302, December 2005. | |||
| [RFC4305] Eastlake, D., "Cryptographic Algorithm Implementation | [RFC4305] Eastlake, D., "Cryptographic | |||
| Requirements for Encapsulating Security Payload (ESP) and | Algorithm Implementation Requirements | |||
| Authentication Header (AH)", RFC 4305, December 2005. | for Encapsulating Security Payload | |||
| (ESP) and Authentication Header | ||||
| (AH)", RFC 4305, December 2005. | ||||
| [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion | [RFC5129] Davie, B., Briscoe, B., and J. Tay, | |||
| Marking in MPLS", RFC 5129, January 2008. | "Explicit Congestion Marking in | |||
| MPLS", RFC 5129, January 2008. | ||||
| [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN | [Re-PCN] Briscoe, B., "Emulating Border Flow | |||
| on Bulk Data", draft-briscoe-re-pcn-border-cheat-01 (work | Policing using Re-ECN on Bulk Data", | |||
| in progress), February 2008. | draft-briscoe-re-pcn-border-cheat-01 | |||
| (work in progress), February 2008. | ||||
| [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | [Re-fb] Briscoe, B., Jacquet, A., Di Cairano- | |||
| Salvatori, A., Soppera, A., and M. Koyabe, "Policing | Gilfedder, C., Salvatori, A., | |||
| Congestion Response in an Internetwork Using Re-Feedback", | Soppera, A., and M. Koyabe, "Policing | |||
| ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | Congestion Response in an | |||
| www.acm.org/sigs/sigcomm/sigcomm2005/ | Internetwork Using Re-Feedback", ACM | |||
| SIGCOMM CCR 35(4)277--288, | ||||
| August 2005, <http://www.acm.org/ | ||||
| sigs/sigcomm/sigcomm2005/ | ||||
| techprog.html#session8>. | techprog.html#session8>. | |||
| [Savage99] | [Savage99] Savage, S., Cardwell, N., Wetherall, | |||
| Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | D., and T. Anderson, "TCP congestion | |||
| "TCP congestion control with a misbehaving receiver", ACM | control with a misbehaving receiver", | |||
| SIGCOMM CCR 29(5), October 1999, | ACM SIGCOMM CCR 29(5), October 1999, | |||
| <http://citeseer.ist.psu.edu/savage99tcp.html>. | <http://citeseer.ist.psu.edu/ | |||
| savage99tcp.html>. | ||||
| [Smart_rtg] | ||||
| Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, | ||||
| "Optimizing Cost and Performance for Multihoming", ACM | ||||
| SIGCOMM CCR 34(4)79--92, October 2004, | ||||
| <http://citeseer.ist.psu.edu/698472.html>. | ||||
| [Steps_DoS] | ||||
| Handley, M. and A. Greenhalgh, "Steps towards a DoS- | ||||
| resistant Internet Architecture", Proc. ACM SIGCOMM | ||||
| workshop on Future directions in network architecture | ||||
| (FDNA'04) pp 49--56, August 2004. | ||||
| [Tussle] Clark, D., Sollins, K., Wroclawski, J., and R. Braden, | ||||
| "Tussle in Cyberspace: Defining Tomorrow's Internet", ACM | ||||
| SIGCOMM CCR 32(4)347--356, October 2002, | ||||
| <http://www.acm.org/sigcomm/sigcomm2002/papers/ | ||||
| tussle.pdf>. | ||||
| [XCHOKe] Chhabra, P., Chuig, S., Goel, A., John, A., Kumar, A., | [Steps_DoS] Handley, M. and A. Greenhalgh, "Steps | |||
| Saran, H., and R. Shorey, "XCHOKe: Malicious Source | towards a DoS-resistant Internet | |||
| Control for Congestion Avoidance at Internet Gateways", | Architecture", Proc. ACM SIGCOMM | |||
| Proceedings of IEEE International Conference on Network | workshop on Future directions in | |||
| Protocols (ICNP-02) , November 2002, | network architecture (FDNA'04) pp | |||
| <http://www.cc.gatech.edu/~akumar/xchoke.pdf>. | 49--56, August 2004. | |||
| [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | [re-ecn-motive] Briscoe, B., "Re-ECN: The Motivation | |||
| Congestion Control in the Internet", IEEE/ACM Transactions | for Adding Congestion Accountability | |||
| on Networking 7(4) 458--472, August 1999, | to TCP/IP", draft-briscoe-tsvwg-re- | |||
| <http://www.aciri.org/floyd/end2end-paper.html>. | ecn-tcp-motivation-00 (work in | |||
| progress), February 2009. | ||||
| Appendix A. Precise Re-ECN Protocol Operation | Appendix A. Precise Re-ECN Protocol Operation | |||
| {ToDo: fix this} | {ToDo: fix this} | |||
| The protocol operation in the middle described in Section 3.4 was an | The protocol operation in the middle described in Section 4.3 was an | |||
| approximation. In fact, standard ECN router marking combines 1% and | approximation. In fact, standard ECN router marking combines 1% and | |||
| 2% marking into slightly less than 3% whole-path marking, because | 2% marking into slightly less than 3% whole-path marking, because | |||
| routers deliberately mark CE whether or not it has already been | routers deliberately mark CE whether or not it has already been | |||
| marked by another router upstream. So the combined marking fraction | marked by another router upstream. So the combined marking fraction | |||
| would actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. | would actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. | |||
| To generalise this we will need some notation. | To generalise this we will need some notation. | |||
| o j represents the index of each resource (typically queues) along a | o j represents the index of each resource (typically queues) along a | |||
| path, ranging from 0 at the first router to n-1 at the last. | path, ranging from 0 at the first router to n-1 at the last. | |||
| skipping to change at page 75, line 12 | skipping to change at page 42, line 44 | |||
| p_0 = u_n | p_0 = u_n | |||
| = 1 - (1 - m_1)(1 - m_2)... | = 1 - (1 - m_1)(1 - m_2)... | |||
| Similarly, at some point j in the middle of the network, if p = 1 - | Similarly, at some point j in the middle of the network, if p = 1 - | |||
| (1 - u_j)(1 - v_j), then | (1 - u_j)(1 - v_j), then | |||
| v_j = 1 - (1 - p)/(1 - u_j) | v_j = 1 - (1 - p)/(1 - u_j) | |||
| ~= p - u_j; if u_j << 100% | ~= p - u_j; if u_j << 100% | |||
| So, between the two routers in the example in Section 3.4, congestion | So, between the two routers in the example in Section 4.3, congestion | |||
| downstream is | downstream is | |||
| v_1 = 100.00% - (100% - 2.98%) / (100% - 1.00%) | v_1 = 100.00% - (100% - 2.98%) / (100% - 1.00%) | |||
| = 2.00%, | = 2.00%, | |||
| or a useful approximation of downstream congestion is | or a useful approximation of downstream congestion is | |||
| v_1 ~= 2.98% - 1.00% | v_1 ~= 2.98% - 1.00% | |||
| ~= 1.98%. | ~= 1.98%. | |||
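The arithmetic of this example can be checked with a minimal Python sketch. It assumes each resource marks independently, as the text describes; the function names `combine_marking` and `downstream_congestion` are hypothetical and are not part of the re-ECN specification.

```python
def combine_marking(fractions):
    """Whole-path marking fraction when every resource marks packets
    regardless of whether they were already marked upstream."""
    unmarked = 1.0
    for m in fractions:
        unmarked *= (1.0 - m)  # probability a packet survives unmarked
    return 1.0 - unmarked


def downstream_congestion(p, u_j):
    """Exact downstream congestion v_j, given whole-path congestion p
    and upstream congestion u_j:  v_j = 1 - (1 - p)/(1 - u_j)."""
    return 1.0 - (1.0 - p) / (1.0 - u_j)


# The two-router example: 1% marking at the first router, 2% at the second.
p = combine_marking([0.01, 0.02])         # whole-path marking
v_exact = downstream_congestion(p, 0.01)  # exact downstream congestion
v_approx = p - 0.01                       # approximation valid for u_j << 100%

print(f"p = {p:.4%}, v_1 = {v_exact:.4%}, v_1 ~= {v_approx:.4%}")
```

The approximation differs from the exact value by 0.02 percentage points here, and the gap grows as upstream congestion rises.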
| Appendix B. Justification for Two Codepoints Signifying Zero Worth | Appendix B. Justification for Two Codepoints Signifying Zero Worth | |||
| Packets | Packets | |||
| It may seem a waste of a codepoint to set aside two codepoints of the | It may seem a waste of a codepoint to set aside two codepoints of the | |||
| Extended ECN field to signify zero worth (RECT and CE(0) are both | Extended ECN field to signify zero worth (RECT and CE(0) are both | |||
| worth zero). The justification is subtle, but worth recording. | worth zero). The justification is subtle, but worth recording. | |||
| skipping to change at page 76, line 49 | skipping to change at page 44, line 33 | |||
| the same as the proportion of RECT packets changed to CE(-1) and the | the same as the proportion of RECT packets changed to CE(-1) and the | |||
| proportion of Re-Echo packets changed to CE(0). Double checking | proportion of Re-Echo packets changed to CE(0). Double checking | |||
| using such redundant relationships can improve the security of a | using such redundant relationships can improve the security of a | |||
| scheme (cf. double-entry book-keeping or the ECN Nonce). | scheme (cf. double-entry book-keeping or the ECN Nonce). | |||
| Alternatively, it might be necessary to exploit the redundancy in the | Alternatively, it might be necessary to exploit the redundancy in the | |||
| future to encode an extra information channel. | future to encode an extra information channel. | |||
| Appendix C. ECN Compatibility | Appendix C. ECN Compatibility | |||
| The rationale for choosing the particular combinations of SYN and SYN | The rationale for choosing the particular combinations of SYN and SYN | |||
| ACK flags in Section 4.1.3 is as follows. | ACK flags in Section 6.1.3 is as follows. | |||
| Choice of SYN flags: A Re-ECN sender can work with RFC3168 compliant | Choice of SYN flags: A Re-ECN sender can work with RFC3168 compliant | |||
| ECN receivers so we wanted to use the same flags as would be used | ECN receivers so we wanted to use the same flags as would be used | |||
| in an ECN-setup SYN [RFC3168] (CWR=1, ECE=1). But at the same | in an ECN-setup SYN [RFC3168] (CWR=1, ECE=1). But at the same | |||
| time, we wanted a server (host B) that is Re-ECT to be able to | time, we wanted a server (host B) that is Re-ECT to be able to | |||
| recognise that the client (A) is also Re-ECT. We believe also | recognise that the client (A) is also Re-ECT. We believe also | |||
| setting NS=1 in the initial SYN achieves both these objectives, as | setting NS=1 in the initial SYN achieves both these objectives, as | |||
| it should be ignored by RFC3168 compliant ECT receivers and by | it should be ignored by RFC3168 compliant ECT receivers and by | |||
| ECT-Nonce receivers. But senders that are not Re-ECT should not | ECT-Nonce receivers. But senders that are not Re-ECT should not | |||
| set NS=1. At the time ECN was defined, the NS flag was not | set NS=1. At the time ECN was defined, the NS flag was not | |||
| skipping to change at page 80, line 5 | skipping to change at page 47, line 30 | |||
| This behaviour happens to match TCP's congestion window control in | This behaviour happens to match TCP's congestion window control in | |||
| slow start, which is why for TCP sources, only the first and third | slow start, which is why for TCP sources, only the first and third | |||
| packet need be FNE packets. | packet need be FNE packets. | |||
| A source that would open the congestion window any quicker would have | A source that would open the congestion window any quicker would have | |||
| to insert more FNE packets. As another example, a UDP source sending | to insert more FNE packets. As another example, a UDP source sending | |||
| VBR traffic might need to send several FNE packets ahead of the | VBR traffic might need to send several FNE packets ahead of the | |||
| traffic peaks it generates. | traffic peaks it generates. | |||
| Appendix E. Example Egress Dropper Algorithm | Appendix E. Argument for holding back the ECN nonce | |||
| {ToDo: Write up the basic algorithm with flow state, then the | ||||
| aggregated one.} | ||||
| Appendix F. Re-TTL | ||||
| This Appendix gives an overview of a proposal to be able to overload | ||||
| the TTL field in the IP header to monitor downstream propagation | ||||
| delay. This is included to show that it would be possible to take | ||||
| account of RTT if it was deemed desirable. | ||||
| Delay re-feedback can be achieved by overloading the TTL field, | ||||
| without changing IP or router TTL processing. A target value for TTL | ||||
| at the destination would need standardising, say 16. If the path hop | ||||
| count increased by more than 16 during a routing change, it would | ||||
| temporarily be mistaken for a routing loop, so this target would need | ||||
| to be chosen to exceed typical hop count increases. The TCP wire | ||||
| protocol and handlers would need modifying to feed back the | ||||
| destination TTL and initialise it. It would be necessary to | ||||
| standardise the unit of TTL in terms of real time (as was the | ||||
| original intent in the early days of the Internet). | ||||
| In the longer term, precision could be improved if routers | ||||
| decremented TTL to represent exact propagation delay to the next | ||||
| router. That is, for a router to decrement TTL by, say, 1.8 time | ||||
| units it would alternate the decrement of every packet between 1 & 2 | ||||
| at a ratio of 1:4. Although this might sometimes require a seemingly | ||||
| dangerous null decrement, a packet in a loop would still decrement to | ||||
| zero after 255 time units on average. As more routers were upgraded | ||||
| to this more accurate TTL decrement, path delay estimates would | ||||
| become increasingly accurate despite the presence of some RFC3168 | ||||
| compliant routers that continued to always decrement the TTL by 1. | ||||
| Appendix G. Policer Designs to ensure Congestion Responsiveness | ||||
| G.1. Per-user Policing | ||||
| User policing requires a policer on the ingress interface of the | ||||
| access router associated with the user. At that point, the traffic | ||||
| of the user hasn't diverged on different routes yet; nor has it mixed | ||||
| with traffic from other sources. | ||||
| In order to ensure that a user doesn't generate more congestion in | ||||
| the network than her due share, a modified bulk token-bucket is | ||||
| maintained with the following parameters: | ||||
| o b_0 the initial token level | ||||
| o r the filling rate | ||||
| o b_max the bucket depth | ||||
| The same token bucket algorithm is used as in many areas of | ||||
| networking, but how it is used is very different: | ||||
| o all traffic from a user over the lifetime of their subscription is | ||||
| policed in the same token bucket. | ||||
| o only positive and canceled packets (Re-Echo, FNE and CE(0)) | ||||
| consume tokens | ||||
| Such a policer will allow network operators to throttle the | ||||
| contribution of their users to network congestion. This will require | ||||
| the appropriate contractual terms to be in place between operators | ||||
| and users. For instance: a condition for a user to subscribe to a | ||||
| given network service may be that she should not cause more than a | ||||
| volume C_user of congestion over a reference period T_user, although | ||||
| she may carry forward up to N_user times her allowance at the end of | ||||
| each period. These terms directly set the parameters of the user | ||||
| policer: | ||||
| o b_0 = C_user | ||||
| o r = C_user/T_user | ||||
| o b_max = b_0 * (N_user +1) | ||||
| Besides the congestion budget policer above, another user policer may | ||||
| be necessary to further rate-limit FNE packets, if they are to be | ||||
| marked rather than dropped (see discussion in Section 5.3). Rate- | ||||
| limiting FNE packets will prevent high bursts of new flow arrivals, | ||||
| which is a very useful feature in DoS prevention. A condition to | ||||
| subscribe to a given network service would have to be that a user | ||||
| should not generate more than C_FNE FNE packets, over a reference | ||||
| period T_FNE, with no option to carry forward any of the allowance at | ||||
| the end of each period. These terms directly set the parameters of | ||||
| the FNE policer: | ||||
| o b_0 = C_FNE | ||||
| o r = C_FNE/T_FNE | ||||
| o b_max = b_0 | ||||
| T_FNE should be a much shorter period than T_user: for instance T_FNE | ||||
| could be in the order of minutes while T_user could be in the order of | ||||
| weeks. | ||||
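The per-user congestion policer described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the draft's design: the class name `CongestionPolicer`, the use of packet size as the token cost, the caller-supplied clock, and the choice to simply report when the bucket is exhausted are all hypothetical. The text above does not say what sanction the policer applies once tokens run out.

```python
# Extended ECN markings that consume tokens (positive and canceled packets).
CONSUMING = {"Re-Echo", "FNE", "CE(0)"}


class CongestionPolicer:
    """Bulk token bucket covering all of one user's traffic."""

    def __init__(self, c_user, t_user, n_user):
        self.rate = c_user / t_user         # r     = C_user / T_user
        self.depth = c_user * (n_user + 1)  # b_max = b_0 * (N_user + 1)
        self.tokens = c_user                # b_0   = C_user
        self.last = 0.0                     # time of the previous packet

    def on_packet(self, now, marking, cost):
        """Account for one packet; return False once the bucket is overdrawn.

        `now` is in seconds; `cost` is an assumed token cost (for example
        the packet size in bytes).
        """
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + self.rate * elapsed)
        self.last = now
        # Only positive and canceled packets consume tokens.
        if marking in CONSUMING:
            self.tokens -= cost
        return self.tokens >= 0


policer = CongestionPolicer(c_user=1000, t_user=100, n_user=1)
results = [
    policer.on_packet(0, "RECT", 1500),    # neutral packet: no tokens used
    policer.on_packet(0, "Re-Echo", 600),  # positive packet: 400 tokens left
    policer.on_packet(10, "CE(0)", 600),   # refill 100, then overdrawn
]
```

Under the same assumptions, the FNE policer is the same bucket with `n_user=0` (so that b_max = b_0), a cost of one token per FNE packet, and only FNE packets passed to it.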
| G.2. Per-flow Rate Policing | ||||
| Whilst we believe that simple per-user policing would be sufficient | ||||
| to ensure senders comply with congestion control, some operators may | ||||
| wish to police the rate response of each flow to congestion as well. | ||||
| Although we do not believe this will be necessary, we include this | ||||
| section to show how one could perform per-flow policing using | ||||
| enforcement of TCP-fairness as an example. Per-flow policing aims to | ||||
| enforce congestion responsiveness on the shortest information | ||||
| timescale on a network path: packet roundtrips. | ||||
| This again requires that the appropriate terms be agreed between a | ||||
| network operator and its users, where a congestion responsiveness | ||||
| policy might be required for the use of a given network service | ||||
| (perhaps unless the user specifically requests otherwise). | ||||
| As an example, we describe below how a rate adaptation policer can be | ||||
| designed when the applicable rate adaptation policy is TCP- | ||||
| compliance. In that context, the average throughput of a flow will | ||||
| be expected to be bounded by the value of the TCP throughput during | ||||
| congestion avoidance, given in Mathis' formula [Mathis97] | ||||
| x_TCP = k * s / ( T * sqrt(m) ) | ||||
| where: | ||||
| o x_TCP is the throughput of the TCP flow in units of s per second, so that x_TCP/s is its rate in packets per second, | ||||
| o k is a constant upper-bounded by sqrt(3/2), | ||||
| o s is the average packet size of the flow, | ||||
| o T is the roundtrip time of the flow, | ||||
| o m is the congestion level experienced by the flow. | ||||
| We define the marking period N=1/m which represents the average | ||||
| number of packets between two positive or canceled packets. Mathis' | ||||
| formula can be re-written as: | ||||
| x_TCP = k*s*sqrt(N)/T | ||||
| We can then get the average inter-mark time in a compliant TCP flow, | ||||
| dt_TCP, by solving (x_TCP/s)*dt_TCP = N which gives | ||||
| dt_TCP = sqrt(N)*T/k | ||||
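As a numerical check of the two expressions above, the following sketch evaluates x_TCP and dt_TCP for one set of values and confirms that a flow sending at x_TCP emits N packets per inter-mark interval. The function names and the example values (1500 byte packets, 100 ms roundtrip, 1% congestion) are assumptions chosen for illustration, and k is set to its upper bound sqrt(3/2).

```python
import math

K = math.sqrt(3 / 2)  # upper bound on the constant k


def tcp_rate(s, rtt, m, k=K):
    """x_TCP = k * s / (T * sqrt(m)), in units of s per second."""
    return k * s / (rtt * math.sqrt(m))


def inter_mark_time(rtt, m, k=K):
    """dt_TCP = sqrt(N) * T / k, with marking period N = 1/m."""
    n = 1 / m
    return math.sqrt(n) * rtt / k


x = tcp_rate(s=1500, rtt=0.1, m=0.01)  # bytes per second
dt = inter_mark_time(rtt=0.1, m=0.01)  # seconds between marks
packets_per_mark = (x / 1500) * dt     # should equal N = 1/m
```

For these values x_TCP is roughly 184 kB/s and dt_TCP roughly 0.82 s, and (x_TCP/s)*dt_TCP recovers N = 100 up to rounding.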
| We rely on this equation for the design of a rate-adaptation policer | ||||
| as a variation of a token bucket. In that case a policer has to be | ||||
| set up for each policed flow. This may be triggered by FNE packets, | ||||
| with the remainder of flows being all rate limited together if they | ||||
| do not start with an FNE packet. | ||||
| Where maintaining per flow state is not a problem, for instance on | ||||
| some access routers, systematic per-flow policing may be considered. | ||||
| Should per-flow state be more constrained, rate adaptation policing | ||||
| could be limited to a random sample of flows exhibiting positive or | ||||
| canceled packets. | ||||
| As in the case of user policing, only positive or canceled packets | ||||
| will consume tokens, however the amount of tokens consumed will | ||||
| depend on the congestion signal. | ||||
| When a new rate adaptation policer is set up for flow j, the | ||||
| following state is created: | ||||
| o a token bucket b_j of depth b_max starting at level b_0 | ||||
| o a timestamp t_j = timenow() | ||||
| o a counter N_j = 0 | ||||
| o a roundtrip estimate T_j | ||||
| o a filling rate r | ||||
| When the policing node forwards a packet of flow j with no Re-Echo: | ||||
| o the counter is incremented: N_j += 1 | ||||
| When the policing node forwards a packet of flow j carrying a | ||||
| congestion mark (CE): | ||||
| o the counter is incremented: N_j += 1 | ||||
| o the token level is adjusted: b_j += r*(timenow()-t_j) - sqrt(N_j)* | ||||
| T_j/k | ||||
| o the counter is reset: N_j = 0 | ||||
| o the timer is reset: t_j = timenow() | ||||
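The per-flow state and the two update rules above can be sketched as follows. This is an illustrative sketch under stated assumptions: the class and method names are hypothetical, time is passed in explicitly rather than read from timenow(), and capping the level at b_max is an interpretation of "a token bucket of depth b_max" rather than something the update rule states.

```python
import math
from dataclasses import dataclass


@dataclass
class FlowPolicer:
    """Hypothetical rate adaptation policer for one flow j."""
    rtt: float                 # roundtrip estimate T_j, in seconds
    b_0: float                 # initial token level
    b_max: float               # bucket depth
    r: float = 1.0             # filling rate, in tokens per second
    k: float = math.sqrt(3 / 2)
    t_last: float = 0.0        # timestamp t_j

    def __post_init__(self):
        self.level = self.b_0  # token level b_j
        self.n = 0             # counter N_j

    def on_unmarked(self):
        """A packet of the flow was forwarded without a mark."""
        self.n += 1

    def on_marked(self, now):
        """A marked packet was forwarded; return False once the level is negative."""
        self.n += 1
        self.level += self.r * (now - self.t_last) - math.sqrt(self.n) * self.rtt / self.k
        self.level = min(self.level, self.b_max)  # assumption: cap at bucket depth
        self.n = 0
        self.t_last = now
        return self.level >= 0
```

A flow whose marks arrive every dt_TCP seconds leaves the level unchanged, while a flow sending twice as fast for the same marking period loses about dt_TCP/2 tokens per mark and eventually drives the level negative.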
| An implementation example will be given in a later draft that avoids | ||||
| having to extract the square root. | ||||
| Analysis: For a TCP flow, for r= 1 token/sec, on average, | ||||
| r*(timenow()-t_j)-sqrt(N_j)* T_j/k = dt_TCP - sqrt(N)*T/k = 0 | ||||
| This means that the token level will fluctuate around its initial | ||||
| level. The depth b_max of the bucket sets the timescale on which the | ||||
| rate adaptation policy is performed while the filling rate r sets the | ||||
| trade-off between responsiveness and robustness: | ||||
| o the higher b_max, the longer it will take to catch greedy flows | ||||
| o the higher r, the fewer false positives (greedy verdict on | ||||
| compliant flows) but the more false negatives (compliant verdict | ||||
| on greedy flows) | ||||
| This rate adaptation policer requires the availability of a roundtrip | ||||
| estimate which may be obtained for instance from the application of | ||||
| re-feedback to the downstream delay (Appendix F) or passive estimation | ||||
| [Jiang02]. | ||||
| When the bucket of a policer located at the access router (whether it | ||||
| is a per-user policer or a per-flow policer) becomes empty, the | ||||
| access router SHOULD drop at least all packets causing the token | ||||
| level to become negative. The network operator MAY take further | ||||
| sanctions if the token level of the per-flow policers associated with | ||||
| a user becomes negative. | ||||
| Appendix H. Downstream Congestion Metering Algorithms | ||||
| H.1. Bulk Downstream Congestion Metering Algorithm | ||||
| To meter the bulk amount of downstream congestion in traffic crossing | ||||
| an inter-domain border an algorithm is needed that accumulates the | ||||
| size of positive packets and subtracts the size of negative packets. | ||||
| We maintain two counters: | ||||
| V_b: accumulated congestion volume | ||||
| B: total data volume (in case it is needed) | ||||
| A suitable pseudo-code algorithm for a border router is as follows: | ||||
| ==================================================================== | ||||
| V_b = 0 | ||||
| B = 0 | ||||
| for each Re-ECN-capable packet { | ||||
| b = readLength(packet) /* set b to packet size */ | ||||
| B += b /* accumulate total volume */ | ||||
| if readEECN(packet) == Re-Echo || readEECN(packet) == FNE { | ||||
| V_b += b /* increment... */ | ||||
| } else if readEECN(packet) == CE(-1) { | ||||
| V_b -= b /* ...or decrement V_b... */ | ||||
| } /*...depending on EECN field */ | ||||
| } | ||||
| ==================================================================== | ||||
| At the end of an accounting period this counter V_b represents the | ||||
| congestion volume that penalties could be applied to, as described in | ||||
| Section 6.1.6. | ||||
| For instance, the accumulated volume of congestion through a border | ||||
| interface over a month might be V_b = 5PB (petabyte = 10^15 bytes). | ||||
| This might have resulted from an average downstream congestion level | ||||
| of 1% on an accumulated total data volume of B = 500PB. | ||||
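For readers who prefer runnable code, the bulk metering pseudo-code can be transcribed into Python as below. This is a sketch only: representing each packet as a (size, codepoint) pair and the codepoints as strings are assumptions made for the example, and the function name is hypothetical.

```python
def meter_bulk(packets):
    """Return (V_b, B) for an iterable of (size_in_bytes, eecn_codepoint) pairs.

    Codepoints are given as strings such as "Re-Echo", "FNE", "CE(-1)",
    "CE(0)" or "RECT"; only Re-ECN-capable packets should be passed in.
    """
    v_b = 0      # accumulated congestion volume
    b_total = 0  # total data volume
    for size, eecn in packets:
        b_total += size
        if eecn in ("Re-Echo", "FNE"):
            v_b += size  # positive packets increment V_b
        elif eecn == "CE(-1)":
            v_b -= size  # negative packets decrement V_b
        # neutral and cancelled packets leave V_b unchanged
    return v_b, b_total


sample = [(1500, "Re-Echo"), (40, "FNE"), (1500, "CE(-1)"),
          (1500, "RECT"), (1500, "CE(0)")]
v_b, b_total = meter_bulk(sample)
```

In this small sample the Re-Echo and CE(-1) packets cancel, leaving V_b = 40 bytes out of B = 6040 bytes.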
| H.2. Inflation Factor for Persistently Negative Flows | ||||
| The following process is suggested to complement the simple algorithm | ||||
| above in order to protect against the various attacks from | ||||
| persistently negative flows described in Section 6.1.6. As explained | ||||
| in that section, the most important and first step is to estimate the | ||||
| contribution of persistently negative flows to the bulk volume of | ||||
| downstream pre-congestion and to inflate this bulk volume as if these | ||||
| flows weren't there. The process below has been designed to give an | ||||
| unbiased estimate, but it may be possible to define other processes | ||||
| that achieve similar ends. | ||||
| While the above simple metering algorithm is counting the bulk of | ||||
| traffic over an accounting period, the meter should also select a | ||||
| subset of the whole flow ID space that is small enough to be able to | ||||
| realistically measure but large enough to give a realistic sample. | ||||
| Many different samples of different subsets of the ID space should be | ||||
| taken at different times during the accounting period, preferably | ||||
| covering the whole ID space. During each sample, the meter should | ||||
| count the volume of positive packets and subtract the volume of | ||||
| negative, maintaining a separate account for each flow in the sample. | ||||
| Each sample should run much longer than the large majority of flows, to avoid | ||||
| a bias from missing the starts and ends of flows, which tend to be | ||||
| positive and negative respectively. | ||||
| Once the accounting period finishes, the meter should calculate the | ||||
| total of the accounts V_{bI} for the subset of flows I in the sample, | ||||
| and the total of the accounts V_{fI} excluding flows with a negative | ||||
| account from the subset I. Then the weighted mean of all these | ||||
| samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | ||||
| V_{bI}. | ||||
| If V_b is the result of the bulk accounting algorithm over the | ||||
| accounting period (Appendix H.1) it can be inflated by this factor | ||||
| a_S to get a good unbiased estimate of the volume of downstream | ||||
| congestion over the accounting period, a_S * V_b, without being polluted | ||||
| by the effect of persistently negative flows. | ||||
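The calculation of the inflation factor can be sketched as follows, assuming each sample I is available as a mapping from flow ID to that flow's account (volume of positive packets minus volume of negative packets). The data layout and function name are assumptions for illustration; flows with a zero account are treated as non-negative.

```python
def inflation_factor(samples):
    """Return a_S = sum over I of V_fI / sum over I of V_bI.

    `samples` is a list of dicts, one per sample I, mapping a flow ID to
    the per-flow account measured during that sample.
    """
    # V_bI: total of all accounts in sample I.
    total_all = sum(sum(s.values()) for s in samples)
    # V_fI: total excluding flows with a negative account.
    total_non_negative = sum(
        sum(v for v in s.values() if v >= 0) for s in samples
    )
    if total_all == 0:
        raise ValueError("sampled accounts sum to zero; a_S is undefined")
    return total_non_negative / total_all


samples = [{"a": 100, "b": -20}, {"c": 50, "d": -30}]
a_s = inflation_factor(samples)
inflated = a_s * 5  # applied to a bulk result of V_b = 5 (arbitrary units)
```

Here the sampled accounts total 100 with negative flows included and 150 without them, so a_S = 1.5 and a bulk V_b of 5 is inflated to 7.5.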
| Appendix I. Argument for holding back the ECN nonce | ||||
| The ECN nonce is a mechanism that allows a /sending/ transport to | The ECN nonce is a mechanism that allows a /sending/ transport to | |||
| detect if drop or ECN marking at a congested router has been | detect if drop or ECN marking at a congested router has been | |||
| suppressed by a node somewhere in the feedback loop---another router | suppressed by a node somewhere in the feedback loop---another router | |||
| or the receiver. | or the receiver. | |||
| Space for the ECN nonce was set aside in [RFC3168] (currently | Space for the ECN nonce was set aside in [RFC3168] (currently | |||
| proposed standard) while the full nonce mechanism is specified in | proposed standard) while the full nonce mechanism is specified in | |||
| [RFC3540] (currently experimental). The specifications for [RFC4340] | [RFC3540] (currently experimental). The specifications for [RFC4340] | |||
| (currently proposed standard) requires that "Each DCCP sender SHOULD | (currently proposed standard) requires that "Each DCCP sender SHOULD | |||
| skipping to change at page 88, line 16 | skipping to change at page 49, line 29 | |||
| because Re-ECN marking fractions at inter-domain borders would be | because Re-ECN marking fractions at inter-domain borders would be | |||
| polluted by unknown levels of nonce traffic. | polluted by unknown levels of nonce traffic. | |||
| The authors are aware that Re-ECN must prove it has the potential it | The authors are aware that Re-ECN must prove it has the potential it | |||
| claims if it is to displace the nonce. Therefore, every effort has | claims if it is to displace the nonce. Therefore, every effort has | |||
| been made to complete a comprehensive specification of Re-ECN so that | been made to complete a comprehensive specification of Re-ECN so that | |||
| its potential can be assessed. We therefore seek the opinion of the | its potential can be assessed. We therefore seek the opinion of the | |||
| Internet community on whether the Re-ECN protocol is sufficiently | Internet community on whether the Re-ECN protocol is sufficiently | |||
| useful to warrant standards action. | useful to warrant standards action. | |||
| Appendix F. Alternative Terminology Used in Other Documents | ||||
| A number of alternative terms have been used in various documents | ||||
| describing re-feedback and re-ECN. These are set out in the | ||||
| following table: | ||||
| +-------------------+---------------+-------------------------------+ | ||||
| | Current | EECN | Colour | | ||||
| | Terminology | codepoint | | | ||||
| +-------------------+---------------+-------------------------------+ | ||||
| | Cautious | FNE | Green | | ||||
| | Positive | Re-Echo | Black | | ||||
| | Neutral | RECT | Grey | | ||||
| | Negative | CE(-1) | Red | | ||||
| | Cancelled | CE(0) | Red-Black | | ||||
| | Legacy ECN | ECT(0) | White | | ||||
| | Currently Unused | --CU-- | Currently unused | | ||||
| | | | | | ||||
| | Legacy | Not-ECT | White | | ||||
| +-------------------+---------------+-------------------------------+ | ||||
| Table 7: Alternative re-ECN Terminology | ||||
| Authors' Addresses | Authors' Addresses | |||
| Bob Briscoe | Bob Briscoe | |||
| BT & UCL | BT & UCL | |||
| B54/77, Adastral Park | B54/77, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 645196 | Phone: +44 1473 645196 | |||
| Email: bob.briscoe@bt.com | EMail: bob.briscoe@bt.com | |||
| URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ | URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ | |||
| Arnaud Jacquet | Arnaud Jacquet | |||
| BT | BT | |||
| B54/70, Adastral Park | B54/70, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 647284 | Phone: +44 1473 647284 | |||
| Email: arnaud.jacquet@bt.com | EMail: arnaud.jacquet@bt.com | |||
| URI: | URI: | |||
| Toby Moncaster | Toby Moncaster | |||
| BT | BT | |||
| B54/70, Adastral Park | B54/70, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 648734 | Phone: +44 1473 648734 | |||
| Email: toby.moncaster@bt.com | EMail: toby.moncaster@bt.com | |||
| Alan Smith | Alan Smith | |||
| BT | BT | |||
| B54/76, Adastral Park | B54/76, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 640404 | Phone: +44 1473 640404 | |||
| Email: alan.p.smith@bt.com | EMail: alan.p.smith@bt.com | |||
| Full Copyright Statement | Full Copyright Statement | |||
| Copyright (C) The IETF Trust (2008). | Copyright (C) The IETF Trust (2009). | |||
| This document is subject to the rights, licenses and restrictions | This document is subject to the rights, licenses and restrictions | |||
| contained in BCP 78, and except as set forth therein, the authors | contained in BCP 78, and except as set forth therein, the authors | |||
| retain all their rights. | retain all their rights. | |||
| This document and the information contained herein are provided on an | This document and the information contained herein are provided on an | |||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | |||
| THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | |||
| OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | |||
| skipping to change at page 90, line 45 | skipping to change at page 52, line 45 | |||
| such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
| specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
| http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
| The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
| copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
| rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
| this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
| ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
| Acknowledgments | Acknowledgements | |||
| Funding for the RFC Editor function is provided by the IETF | Funding for the RFC Editor function is provided by the IETF | |||
| Administrative Support Activity (IASA). This document was produced | Administrative Support Activity (IASA). This document was produced | |||
| using xml2rfc v1.32 (of http://xml.resource.org/) from a source in | using xml2rfc v1.32 (of http://xml.resource.org/) from a source in | |||
| RFC-2629 XML format. | RFC-2629 XML format. | |||
| End of changes. 137 change blocks. | ||||
| 2640 lines changed or deleted | 870 lines changed or added | |||