| draft-briscoe-tsvwg-re-ecn-tcp-05.txt | | draft-briscoe-tsvwg-re-ecn-tcp-06.txt | |
| | | | |
| Transport Area Working Group B. Briscoe | | Transport Area Working Group B. Briscoe | |
| Internet-Draft BT & UCL | | Internet-Draft BT & UCL | |
| Intended status: Standards Track A. Jacquet | | Intended status: Standards Track A. Jacquet | |
|
| Expires: July 13, 2008 T. Moncaster | | Expires: January 15, 2009 T. Moncaster | |
| A. Smith | | A. Smith | |
| BT | | BT | |
|
| January 10, 2008 | | July 14, 2008 | |
| | | | |
| Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | | Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | |
|
| draft-briscoe-tsvwg-re-ecn-tcp-05 | | draft-briscoe-tsvwg-re-ecn-tcp-06 | |
| | | | |
| Status of this Memo | | Status of this Memo | |
| | | | |
| By submitting this Internet-Draft, each author represents that any | | By submitting this Internet-Draft, each author represents that any | |
| applicable patent or other IPR claims of which he or she is aware | | applicable patent or other IPR claims of which he or she is aware | |
| have been or will be disclosed, and any of which he or she becomes | | have been or will be disclosed, and any of which he or she becomes | |
| aware will be disclosed, in accordance with Section 6 of BCP 79. | | aware will be disclosed, in accordance with Section 6 of BCP 79. | |
| | | | |
| Internet-Drafts are working documents of the Internet Engineering | | Internet-Drafts are working documents of the Internet Engineering | |
| Task Force (IETF), its areas, and its working groups. Note that | | Task Force (IETF), its areas, and its working groups. Note that | |
| | | | |
| skipping to change at page 1, line 37 | | skipping to change at page 1, line 37 | |
| and may be updated, replaced, or obsoleted by other documents at any | | and may be updated, replaced, or obsoleted by other documents at any | |
| time. It is inappropriate to use Internet-Drafts as reference | | time. It is inappropriate to use Internet-Drafts as reference | |
| material or to cite them other than as "work in progress." | | material or to cite them other than as "work in progress." | |
| | | | |
| The list of current Internet-Drafts can be accessed at | | The list of current Internet-Drafts can be accessed at | |
| http://www.ietf.org/ietf/1id-abstracts.txt. | | http://www.ietf.org/ietf/1id-abstracts.txt. | |
| | | | |
| The list of Internet-Draft Shadow Directories can be accessed at | | The list of Internet-Draft Shadow Directories can be accessed at | |
| http://www.ietf.org/shadow.html. | | http://www.ietf.org/shadow.html. | |
| | | | |
|
| This Internet-Draft will expire on July 13, 2008. | | This Internet-Draft will expire on January 15, 2009. | |
| | | | |
| Copyright Notice | | Copyright Notice | |
| | | | |
| Copyright (C) The IETF Trust (2008). | | Copyright (C) The IETF Trust (2008). | |
| | | | |
| Abstract | | Abstract | |
| | | | |
| This document introduces a new protocol for explicit congestion | | This document introduces a new protocol for explicit congestion | |
| notification (ECN), termed re-ECN, which can be deployed | | notification (ECN), termed re-ECN, which can be deployed | |
|
| incrementally around unmodified routers. The protocol arranges an | | incrementally around unmodified routers. It enables the upstream | |
| extended ECN field in each packet so that, as it crosses any | | party at any trust boundary in the internetwork to be held | |
| interface in an internetwork, it will carry a truthful prediction of | | responsible for the congestion they cause, or allow to be caused. | |
| congestion on the remainder of its path. Then the upstream party at | | | |
| any trust boundary in the internetwork can be held responsible for | | So, networks can introduce straightforward accountability for | |
| the congestion they cause, or allow to be caused. So, networks can | | congestion and policing mechanisms for incoming traffic from end- | |
| introduce straightforward accountability and policing mechanisms for | | customers or from neighbouring network domains. The protocol works | |
| incoming traffic from end-customers or from neighbouring network | | by arranging an extended ECN field in each packet so that, as it | |
| domains. The purpose of this document is to specify the re-ECN | | crosses any interface in an internetwork, it will carry a truthful | |
| protocol at the IP layer and to give guidelines on any consequent | | prediction of congestion on the remainder of its path. The purpose | |
| changes required to transport protocols. It includes the changes | | of this document is to specify the re-ECN protocol at the IP layer | |
| required to TCP both as an example and as a specification. It also | | and to give guidelines on any consequent changes required to | |
| gives examples of mechanisms that can use the protocol to ensure data | | transport protocols. It includes the changes required to TCP both as | |
| sources respond correctly to congestion. And it describes example | | an example and as a specification. It also gives examples of | |
| mechanisms that ensure the dominant selfish strategy of both network | | mechanisms that can use the protocol to ensure data sources respond | |
| domains and end-points will be to set the extended ECN field | | correctly to congestion. And it describes example mechanisms that | |
| honestly. | | ensure the dominant selfish strategy of both network domains and end- | |
| | | points will be to set the extended ECN field honestly. | |
| | | | |
| Authors' Statement: Status (to be removed by the RFC Editor) | | Authors' Statement: Status (to be removed by the RFC Editor) | |
| | | | |
| Although the re-ECN protocol is intended to make a simple but far- | | Although the re-ECN protocol is intended to make a simple but far- | |
| reaching change to the Internet architecture, the most immediate | | reaching change to the Internet architecture, the most immediate | |
| priority for the authors is to delay any move of the ECN nonce to | | priority for the authors is to delay any move of the ECN nonce to | |
| Proposed Standard status. The argument for this position is | | Proposed Standard status. The argument for this position is | |
| developed in Appendix I. | | developed in Appendix I. | |
| | | | |
| Changes from previous drafts (to be removed by the RFC Editor) | | Changes from previous drafts (to be removed by the RFC Editor) | |
| | | | |
| Full diffs created using the rfcdiff tool are available at | | Full diffs created using the rfcdiff tool are available at | |
| <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> | | <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> | |
| | | | |
|
| From -04 to -05 (current version): | | From -05 to -06 (current version): | |
| | | | |
| | | Clarifications made to Section 1 and Section 3. | |
| | | | |
| | | Minor editorial changes throughout. | |
| | | | |
| | | From -04 to -05: | |
| | | | |
| Completed justification for packet marking with FNE during slow- | | Completed justification for packet marking with FNE during slow- | |
| start (Appendix D). | | start (Appendix D). | |
| | | | |
| Minor editorial changes throughout. | | Minor editorial changes throughout. | |
| | | | |
| From -03 to -04: | | From -03 to -04: | |
| | | | |
|
| Clarified reasons for holding back ECN nonce (Section 3.2 & | | Clarified reasons for holding back ECN nonce (Section 3.3 & | |
| Appendix I). | | Appendix I). | |
| | | | |
|
| Clarified Figure 1. | | Clarified Figure 2. | |
| | | | |
| Added Section 4.1.1.1 on equivalence of drops and ECN marks. | | Added Section 4.1.1.1 on equivalence of drops and ECN marks. | |
| | | | |
| Improved precision of Section 5.6 on IP in IP tunnels. | | Improved precision of Section 5.6 on IP in IP tunnels. | |
| | | | |
| Explained that RTT fairness is possible to enforce, but unlikely to | | Explained that RTT fairness is possible to enforce, but unlikely to | |
| be required (Section 6.1.3 & Appendix F). | | be required (Section 6.1.3 & Appendix F). | |
| | | | |
| Explained that bulk per-user policing should be adequate but per- | | Explained that bulk per-user policing should be adequate but per- | |
| flow policing is also possible if desired, though it is not likely | | flow policing is also possible if desired, though it is not likely | |
| | | | |
| skipping to change at page 3, line 27 | | skipping to change at page 3, line 33 | |
| From -02 to -03: | | From -02 to -03: | |
| | | | |
| Started guidelines for re-ECN support in DCCP and SCTP. | | Started guidelines for re-ECN support in DCCP and SCTP. | |
| | | | |
| Added annex on limitations of nonce mechanism. | | Added annex on limitations of nonce mechanism. | |
| | | | |
| Minor editorial changes throughout. | | Minor editorial changes throughout. | |
| | | | |
| From -01 to -02: | | From -01 to -02: | |
| | | | |
|
| Explanation on informal terminology in Section 3.4 clarified. | | Explanation on informal terminology in Section 3.5 clarified. | |
| | | | |
| IPv6 wire protocol encoding added (Section 5.2). | | IPv6 wire protocol encoding added (Section 5.2). | |
| | | | |
| Text on (non-)issues with tunnels, encryption and link layer | | Text on (non-)issues with tunnels, encryption and link layer | |
| congestion notification added (Section 5.6 & Section 5.7). | | congestion notification added (Section 5.6 & Section 5.7). | |
| | | | |
| Section added giving evolvability arguments against encouraging | | Section added giving evolvability arguments against encouraging | |
| bottleneck policing (Section 6.1.2). And text on re-ECN's | | bottleneck policing (Section 6.1.2). And text on re-ECN's | |
| evolvability by design added to Section 6.1.3. | | evolvability by design added to Section 6.1.3. | |
| | | | |
| | | | |
| skipping to change at page 4, line 8 | | skipping to change at page 4, line 11 | |
| | | | |
| Encoding of re-ECN wire protocol changed for reasons given in | | Encoding of re-ECN wire protocol changed for reasons given in | |
| Appendix B and consequently draft substantially re-written. | | Appendix B and consequently draft substantially re-written. | |
| | | | |
| Substantial text added in sections on applications, incremental | | Substantial text added in sections on applications, incremental | |
| deployment, architectural rationale and security considerations. | | deployment, architectural rationale and security considerations. | |
| | | | |
| Table of Contents | | Table of Contents | |
| | | | |
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |
|
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 | | 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 8 | |
| 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | | 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | |
| 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | | 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | |
|
| 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | | 3.2. Simplified Re-ECN Protocol . . . . . . . . . . . . . . . . 10 | |
| v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | | 3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | |
| 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 11 | | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | |
| 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 13 | | 3.4. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 12 | |
| | | 3.5. Informal Terminology . . . . . . . . . . . . . . . . . . . 14 | |
| 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | | 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | |
|
| 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | | 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 | |
| 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 | | 4.1.1. RECN mode: Full Re-ECN capable transport . . . . . . . 17 | |
| 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or | | 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 | |
| Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 20 | | compliant ECN Receiver . . . . . . . . . . . . . . . . 20 | |
| 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21 | | 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21 | |
| 4.1.4. Extended ECN (EECN) Field Settings during Flow | | 4.1.4. Extended ECN (EECN) Field Settings during Flow | |
| Start or after Idle Periods . . . . . . . . . . . . . 23 | | Start or after Idle Periods . . . . . . . . . . . . . 23 | |
| 4.1.5. Pure ACKS, Retransmissions, Window Probes and | | 4.1.5. Pure ACKS, Retransmissions, Window Probes and | |
|
| Partial ACKs . . . . . . . . . . . . . . . . . . . . . 26 | | Partial ACKs . . . . . . . . . . . . . . . . . . . . . 27 | |
| 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27 | | 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27 | |
| 4.2.1. General Guidelines for Adding Re-ECN to Other | | 4.2.1. General Guidelines for Adding Re-ECN to Other | |
| Transports . . . . . . . . . . . . . . . . . . . . . . 27 | | Transports . . . . . . . . . . . . . . . . . . . . . . 27 | |
| 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28 | | 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28 | |
| 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28 | | 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28 | |
|
| 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 28 | | 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 29 | |
| 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 28 | | 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 29 | |
| 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 28 | | 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 29 | |
| 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30 | | 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30 | |
| 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31 | | 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31 | |
|
| 5.4. Justification for Setting the First SYN to FNE . . . . . . 32 | | 5.4. Justification for Setting the First SYN to FNE . . . . . . 33 | |
| 5.5. Control and Management . . . . . . . . . . . . . . . . . . 33 | | 5.5. Control and Management . . . . . . . . . . . . . . . . . . 34 | |
| 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 33 | | 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 34 | |
| 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 34 | | 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 35 | |
| 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 34 | | 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 35 | |
| 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 35 | | 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 36 | |
| 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 36 | | 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 37 | |
| 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 36 | | 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 37 | |
| 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 36 | | 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 37 | |
| 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 37 | | 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 38 | |
| 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 38 | | 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 39 | |
| 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 45 | | 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 46 | |
| 6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47 | | 6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47 | |
|
| 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 48 | | 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 49 | |
| 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52 | | 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52 | |
| 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53 | | 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53 | |
| 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53 | | 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53 | |
| 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53 | | 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53 | |
| 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54 | | 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54 | |
|
| 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 54 | | 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 55 | |
| 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 54 | | 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 55 | |
| 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 54 | | 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 55 | |
| 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 55 | | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 56 | |
| 7.1. Incremental Deployment Features . . . . . . . . . . . . . 55 | | 7.1. Incremental Deployment Features . . . . . . . . . . . . . 56 | |
| 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57 | | 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57 | |
|
| 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 61 | | 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 62 | |
| 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 64 | | 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 65 | |
| 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 64 | | 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 65 | |
| 9.2. Congestion Notification Integrity . . . . . . . . . . . . 65 | | 9.2. Congestion Notification Integrity . . . . . . . . . . . . 66 | |
| 9.3. Identifying Upstream and Downstream Congestion . . . . . . 66 | | 9.3. Identifying Upstream and Downstream Congestion . . . . . . 67 | |
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 66 | | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 67 | |
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68 | | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68 | |
|
| 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 68 | | 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 69 | |
| 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 68 | | 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 69 | |
| 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69 | | 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69 | |
|
| 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 69 | | 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 70 | |
| 15.1. Normative References . . . . . . . . . . . . . . . . . . . 69 | | 15.1. Normative References . . . . . . . . . . . . . . . . . . . 70 | |
| 15.2. Informative References . . . . . . . . . . . . . . . . . . 70 | | 15.2. Informative References . . . . . . . . . . . . . . . . . . 70 | |
|
| Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 73 | | Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 74 | |
| Appendix B. Justification for Two Codepoints Signifying Zero | | Appendix B. Justification for Two Codepoints Signifying Zero | |
|
| Worth Packets . . . . . . . . . . . . . . . . . . . . 74 | | Worth Packets . . . . . . . . . . . . . . . . . . . . 75 | |
| Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76 | | Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76 | |
|
| Appendix D. Packet Marking with FNE During Flow Start . . . . . . 77 | | Appendix D. Packet Marking with FNE During Flow Start . . . . . . 78 | |
| Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 79 | | Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 80 | |
| Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 79 | | Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 80 | |
| Appendix G. Policer Designs to ensure Congestion | | Appendix G. Policer Designs to ensure Congestion | |
| Responsiveness . . . . . . . . . . . . . . . . . . . 80 | | Responsiveness . . . . . . . . . . . . . . . . . . . 80 | |
| G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 80 | | G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 80 | |
|
| G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 81 | | G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 82 | |
| Appendix H. Downstream Congestion Metering Algorithms . . . . . . 84 | | Appendix H. Downstream Congestion Metering Algorithms . . . . . . 84 | |
| H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 84 | | H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 84 | |
| H.2. Inflation Factor for Persistently Negative Flows . . . . . 85 | | H.2. Inflation Factor for Persistently Negative Flows . . . . . 85 | |
|
| Appendix I. Argument for holding back the ECN nonce . . . . . . . 85 | | Appendix I. Argument for holding back the ECN nonce . . . . . . . 86 | |
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 87 | | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 88 | |
| Intellectual Property and Copyright Statements . . . . . . . . . . 89 | | Intellectual Property and Copyright Statements . . . . . . . . . . 90 | |
| | | | |
| 1. Introduction | | 1. Introduction | |
| | | | |
| This document aims: | | This document aims: | |
| | | | |
| o To provide a complete specification of the addition of the re-ECN | | o To provide a complete specification of the addition of the re-ECN | |
| protocol to IP and guidelines on how to add it to transport layer | | protocol to IP and guidelines on how to add it to transport layer | |
| protocols, including a complete specification of re-ECN in TCP as | | protocols, including a complete specification of re-ECN in TCP as | |
| an example; | | an example; | |
| | | | |
| o To show how a number of hard problems become much easier to solve | | o To show how a number of hard problems become much easier to solve | |
| once re-ECN is available in IP. | | once re-ECN is available in IP. | |
| | | | |
|
| | | In ECN [RFC3168] congested queues probabilistically mark packets as | |
| | | they approach a congested state. The receiver informs the sender | |
| | | that they have seen one or more marks. In re-ECN the sender must | |
| | | predict the level of congestion on the path by re-inserting feedback | |
| | | according to the marking scheme described later in this draft. This | |
| | | results in packets that carry a prediction of downstream congestion. | |
| | | | |
| | | If a sender understates expected congestion compared to actual | |
| | | congestion then the network could discard packets or enact some other | |
| | | sanction. A policer can also be introduced at the ingress of | |
| | | networks that can limit the congestion caused (or base penalties on | |
| | | it). | |
| | | | |
| | | It is important to add a few key points. | |
| | | | |
| | | o It can be seen that it takes one round trip before any feedback is | |
| | | received. For this reason a sender must make a conservative | |
| | | prediction by transmitting IP packets with a special Feedback Not | |
| | | Established (FNE) marking. | |
| | | | |
| | | o It should be noted that the prediction is carried in-band in | |
| | | normal data packets and for many transports feedback can be | |
| | | carried in the normal acknowledgements or control packets. | |
| | | | |
| | | o The re-ECN protocol is independent of the transport. In TCP, | |
| | | acknowledgments are used to convey the feedback from receiver to | |
| | | sender. This memo concentrates on TCP as an example transport | |
| | | protocol, however the re-ECN protocol is compatible with any | |
| | | transport where feedback can be sent from receiver to sender. | |
| | | | |
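The idea in the -06 introduction text above, that packets carry a prediction of downstream congestion and that a sender who understates congestion can be sanctioned, can be illustrated with a minimal sketch. This is not the draft's algorithm. The marking names, the DownstreamMeter class and the rule "balance = positive fraction minus negative fraction" are simplifying assumptions made here for illustration; the draft defines the real extended ECN codepoints and the egress dropper in later sections, and the FNE marking is not modelled.

```python
from dataclasses import dataclass

# Hypothetical per-packet markings, for illustration only. The draft's
# actual extended ECN codepoints are defined in its later sections.
POSITIVE = "positive"  # sender re-inserted congestion feedback
NEGATIVE = "negative"  # a congested queue marked the packet
NEUTRAL = "neutral"    # neither of the above


@dataclass
class DownstreamMeter:
    """Tallies markings seen at one point on the path (assumed design)."""

    positive: int = 0
    negative: int = 0
    total: int = 0

    def observe(self, marking: str) -> None:
        """Count one packet and its marking."""
        self.total += 1
        if marking == POSITIVE:
            self.positive += 1
        elif marking == NEGATIVE:
            self.negative += 1

    def balance(self) -> float:
        """Fraction of positive packets minus fraction of negative ones."""
        if self.total == 0:
            return 0.0
        return (self.positive - self.negative) / self.total

    def understated(self) -> bool:
        """True if the sender declared less congestion than was marked."""
        return self.balance() < 0
```

In this sketch a meter at the egress that sees two positive and three negative packets out of ten reports a negative balance, which is the condition under which the text says the network could discard packets or apply some other sanction. A meter placed earlier on the path would see fewer negative marks, so its balance approximates the congestion still to come downstream.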
| A general statement of the problem solved by re-ECN is to provide | | A general statement of the problem solved by re-ECN is to provide | |
| sufficient information in each IP datagram to be able to hold senders | | sufficient information in each IP datagram to be able to hold senders | |
| and whole networks accountable for the congestion they cause | | and whole networks accountable for the congestion they cause | |
| downstream, before they cause it. But the every-day problems that | | downstream, before they cause it. But the every-day problems that | |
| re-ECN can solve are much more recognisable than this rather generic | | re-ECN can solve are much more recognisable than this rather generic | |
| statement: mitigating distributed denial of service (DDoS); | | statement: mitigating distributed denial of service (DDoS); | |
| simplifying differentiation of quality of service (QoS); policing | | simplifying differentiation of quality of service (QoS); policing | |
| compliance to congestion control; and so on. | | compliance to congestion control; and so on. | |
| | | | |
| Uniquely, re-ECN manages to enable solutions to these problems | | Uniquely, re-ECN manages to enable solutions to these problems | |
| | | | |
| skipping to change at page 6, line 45 | | skipping to change at page 7, line 26 | |
| | | | |
| For instance, some network owners want to block applications like | | For instance, some network owners want to block applications like | |
| voice and video unless their network is compensated for the extra | | voice and video unless their network is compensated for the extra | |
| share of bottleneck bandwidth taken. These real-time applications | | share of bottleneck bandwidth taken. These real-time applications | |
| tend to be unresponsive when congestion arises. Whereas elastic TCP- | | tend to be unresponsive when congestion arises. Whereas elastic TCP- | |
| based applications back away quickly, ending up taking a much smaller | | based applications back away quickly, ending up taking a much smaller | |
| share of congested capacity for themselves. Other network owners | | share of congested capacity for themselves. Other network owners | |
| want to invest in large amounts of capacity and make their gains from | | want to invest in large amounts of capacity and make their gains from | |
| simplicity of operation and economies of scale. | | simplicity of operation and economies of scale. | |
| | | | |
|
| Re-ECN allows the more conservative networks to police out flows that | | re-ECN allows the more conservative networks to police out flows that | |
| have not asked to be unresponsive to congestion---not because they | | have not asked to be unresponsive to congestion---not because they | |
| are voice or video---just because they don't respond to congestion. | | are voice or video---just because they don't respond to congestion. | |
| But it also allows other networks to choose not to police. | | But it also allows other networks to choose not to police. | |
| Crucially, when flows from liberal networks cross into a conservative | | Crucially, when flows from liberal networks cross into a conservative | |
| network, re-ECN enables the conservative network to apply penalties | | network, re-ECN enables the conservative network to apply penalties | |
| to its neighbouring networks for the congestion they allow to be | | to its neighbouring networks for the congestion they allow to be | |
| caused. And these penalties can be applied to bulk data, without | | caused. And these penalties can be applied to bulk data, without | |
| regard to flows. | | regard to flows. | |
| | | | |
| Then, if unresponsive applications become so dominant that some of | | Then, if unresponsive applications become so dominant that some of | |
| the more liberal networks experience congestion collapse [RFC3714], | | the more liberal networks experience congestion collapse [RFC3714], | |
| they can change their minds and use re-ECN to apply tighter controls | | they can change their minds and use re-ECN to apply tighter controls | |
| in order to bring congestion back under control. | | in order to bring congestion back under control. | |
| | | | |
|
| Re-ECN works by arranging that each packet arrives at each network | | re-ECN works by arranging that each packet arrives at each network | |
| element carrying a view of expected congestion on its own downstream | | element carrying a view of expected congestion on its own downstream | |
| path, albeit averaged over multiple packets. Most usefully, | | path, albeit averaged over multiple packets. Most usefully, | |
| congestion on the remainder of the path becomes visible in the IP | | congestion on the remainder of the path becomes visible in the IP | |
| header at the first ingress. Many of the applications of re-ECN | | header at the first ingress. Many of the applications of re-ECN | |
| involve a policer at this ingress using the view of downstream | | involve a policer at this ingress using the view of downstream | |
| congestion arriving in packets to police or control the packet rate. | | congestion arriving in packets to police or control the packet rate. | |
| | | | |
| Importantly, the scheme is recursive: a whole network harbouring | | Importantly, the scheme is recursive: a whole network harbouring | |
| users causing congestion in downstream networks can be held | | users causing congestion in downstream networks can be held | |
| responsible or policed by its downstream neighbour. | | responsible or policed by its downstream neighbour. | |
| | | | |
| This document is structured as follows. First an overview of the re- | | This document is structured as follows. First an overview of the re- | |
| ECN protocol is given (Section 3), outlining its attributes and | | ECN protocol is given (Section 3), outlining its attributes and | |
| explaining conceptually how it works as a whole. The two main parts | | explaining conceptually how it works as a whole. The two main parts | |
|
| of the document follow, as described above. That is, the protocol | | of the document follow. That is, the protocol specification divided | |
| specification divided into transport (Section 4) and network | | into transport (Section 4) and network (Section 5) layers which | |
| (Section 5) layers, then the applications it can be put to, such as | | contain most of the standards compliance terminology, then the | |
| policing DDoS, QoS and congestion control (Section 6). Although | | applications re-ECN can be put to, such as policing DDoS, QoS and | |
| these applications do not require standardisation themselves, they | | congestion control (Section 6). Although these applications do not | |
| are described in a fair degree of detail in order to explain how re- | | require standardisation themselves, they are described in a fair | |
| ECN can be used. Given re-ECN proposes to use the last undefined bit | | degree of detail in order to explain how re-ECN can be used. Given | |
| in the IPv4 header, we felt it necessary to outline the potential | | re-ECN proposes to use the last undefined bit in the IPv4 header, we | |
| that re-ECN could release in return for being given that bit. | | felt it necessary to outline the potential that re-ECN could release | |
| | | in return for being given that bit. | |
| | | | |
| Deployment issues discussed throughout the document are brought | | Deployment issues discussed throughout the document are brought | |
| together in Section 7, which is followed by a brief section | | together in Section 7, which is followed by a brief section | |
| explaining the somewhat subtle rationale for the design from an | | explaining the somewhat subtle rationale for the design from an | |
| architectural perspective (Section 8). We end by describing related | | architectural perspective (Section 8). We end by describing related | |
| work (Section 9), listing security considerations (Section 10) and | | work (Section 9), listing security considerations (Section 10) and | |
| finally drawing conclusions (Section 12). | | finally drawing conclusions (Section 12). | |
| | | | |
| 2. Requirements notation | | 2. Requirements notation | |
| | | | |
| | | | |
| skipping to change at page 8, line 15 | | skipping to change at page 8, line 45 | |
| document considers many cases where malicious nodes may not comply | | document considers many cases where malicious nodes may not comply | |
| with the protocol. When such contingencies are described, if any of | | with the protocol. When such contingencies are described, if any of | |
| the above keywords are not capitalised, that is deliberate. So, for | | the above keywords are not capitalised, that is deliberate. So, for | |
| instance, the following two apparently contradictory sentences would | | instance, the following two apparently contradictory sentences would | |
| be perfectly consistent: i) x MUST do this; ii) x may not do this. | | be perfectly consistent: i) x MUST do this; ii) x may not do this. | |
| | | | |
| 3. Protocol Overview | | 3. Protocol Overview | |
| | | | |
| 3.1. Background and Applicability | | 3.1. Background and Applicability | |
| | | | |
|
| First we briefly recap the essentials of the ECN protocol [RFC3168]. | | The re-ECN protocol makes no changes to, and has no effect on, the TCP | |
| Two bits in the IP protocol (v4 or v6) are assigned to the ECN field. | | congestion control algorithm or on other rate responses to | |
| The sender clears the field to "00" (Not-ECT) if either end-point | | congestion. Re-ECN is not a new congestion control protocol; rather | |
| transport is not ECN-capable. Otherwise it indicates an ECN-capable | | it is orthogonal to congestion control itself. Re-ECN is concerned | |
| transport (ECT) using either of the two code-points "10" or "01" | | with revealing information about congestion so that users and | |
| (ECT(0) and ECT(1) resp.). | | networks can be held accountable for the congestion they cause, or | |
| | | allow to be caused. | |
| | | | |
|
| ECN-capable routers probabilistically set "11" if congestion is | | Re-ECN builds on ECN so we briefly recap the essentials of the ECN | |
| | | protocol [RFC3168]. Two bits in the IP protocol (v4 or v6) are | |
| | | assigned to the ECN field. The sender clears the field to "00" (Not- | |
| | | ECT) if either end-point transport is not ECN-capable. Otherwise it | |
| | | indicates an ECN-capable transport (ECT) using either of the two | |
| | | code-points "10" or "01" (ECT(0) and ECT(1) resp.). | |
| | | | |
| | | ECN-capable queues probabilistically set "11" if congestion is | |
| experienced (CE), the marking probability increasing with the length | | experienced (CE), the marking probability increasing with the length | |
| of the queue at its egress link (typically using the RED | | of the queue at its egress link (typically using the RED | |
| algorithm [RFC2309]). However, they still drop rather than mark Not- | | algorithm [RFC2309]). However, they still drop rather than mark Not- | |
|
| ECT packets. With multiple ECN-capable routers on a path, a flow of | | ECT packets. With multiple ECN-capable queues on a path, a flow of | |
| packets accumulates the fraction of CE marking that each router adds. | | packets accumulates the fraction of CE marking that each queue adds. | |
| The combined effect of the packet marking of all the routers along | | The combined effect of the packet marking of all the queues along the | |
| the path signals congestion of the whole path to the receiver. So, | | path signals congestion of the whole path to the receiver. So, for | |
| for example, if one router early in a path is marking 1% of packets | | example, if one queue early in a path is marking 1% of packets and | |
| and another later in a path is marking 2%, flows that pass through | | another later in a path is marking 2%, flows that pass through both | |
| both routers will experience approximately 3% marking (see Appendix A | | queues will experience approximately 3% marking (see Appendix A for a | |
| for a precise treatment). | | precise treatment). | |
| | | | |
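As a rough illustration of the 1% plus 2% example above, the sketch below compares the simple sum with the combined marking fraction under the assumption that each queue marks packets independently. That independence assumption, and the helper name combined_marking, are ours and are not taken from the draft; Appendix A remains the authoritative treatment.

```python
def combined_marking(fractions):
    """Fraction of packets marked by at least one queue on the path.

    Assumes (hypothetically) that each queue marks independently, so a
    packet stays unmarked only if every queue leaves it unmarked.
    """
    unmarked = 1.0
    for p in fractions:
        unmarked *= 1.0 - p
    return 1.0 - unmarked


path = [0.01, 0.02]              # marking fractions at the two queues
approx = sum(path)               # the "approximately 3%" quoted in the text
exact = combined_marking(path)   # slightly below the simple sum
print(f"approx={approx:.4%} combined={exact:.4%}")
```

The two figures differ only by the product of the individual fractions, which is why the additive approximation is adequate at low marking rates.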
| The choice of two ECT code-points in the ECN field [RFC3168] | | The choice of two ECT code-points in the ECN field [RFC3168] | |
| permitted future flexibility, optionally allowing the sender to | | permitted future flexibility, optionally allowing the sender to | |
| encode the experimental ECN nonce [RFC3540] in the packet stream. | | encode the experimental ECN nonce [RFC3540] in the packet stream. | |
| The nonce is designed to allow a sender to check the integrity of | | The nonce is designed to allow a sender to check the integrity of | |
| congestion feedback. But Section 9.2 explains that it still gives no | | congestion feedback. But Section 9.2 explains that it still gives no | |
| control over how fast the sender transmits as a result of the | | control over how fast the sender transmits as a result of the | |
| feedback. On the other hand, re-ECN is designed both to ensure that | | feedback. On the other hand, re-ECN is designed both to ensure that | |
| congestion is declared honestly and that the sender's rate responds | | congestion is declared honestly and that the sender's rate responds | |
| appropriately. | | appropriately. | |
| | | | |
| skipping to change at page 9, line 10 | | skipping to change at page 9, line 48 | |
| re-inserted or re-echoed feedback. But it actually works even when | | re-inserted or re-echoed feedback. But it actually works even when | |
| no feedback is available. In fact it has been carefully designed to | | no feedback is available. In fact it has been carefully designed to | |
| work for single datagram flows. It also encourages aggregation of | | work for single datagram flows. It also encourages aggregation of | |
| single packet flows by congestion control proxies. Then, even if the | | single packet flows by congestion control proxies. Then, even if the | |
| traffic mix of the Internet were to become dominated by short | | traffic mix of the Internet were to become dominated by short | |
| messages, it would still be possible to control congestion | | messages, it would still be possible to control congestion | |
| effectively and efficiently. | | effectively and efficiently. | |
| | | | |
| Changing the Internet's feedback architecture seems to imply | | Changing the Internet's feedback architecture seems to imply | |
| considerable upheaval. But re-ECN can be deployed incrementally at | | considerable upheaval. But re-ECN can be deployed incrementally at | |
|
| the transport layer around unmodified routers using existing fields | | the transport layer around unmodified queues using existing fields in | |
| in IP (v4 or v6). However it does also require the last undefined | | IP (v4 or v6). However it does also require the last undefined bit | |
| bit in the IPv4 header, which it uses in combination with the 2-bit | | in the IPv4 header, which it uses in combination with the 2-bit ECN | |
| ECN field to create four new codepoints. Nonetheless, changes to IP | | field to create four new codepoints. Nonetheless, it is RECOMMENDED to | |
| routers are RECOMMENDED in order to improve resilience against DoS | | add optional preferential drop to IP queues based on the re-ECN | |
| attacks. Similarly, re-ECN works best if both the sender and | | fields in order to improve resilience against DoS attacks. | |
| receiver transports are re-ECN-capable, but it can work with just | | Similarly, re-ECN works best if both the sender and receiver | |
| sender support. Section 7.1 summarises the incremental deployment | | transports are re-ECN-capable, but it can work with just sender | |
| strategy. | | support. Section 7.1 summarises the incremental deployment strategy. | |
| | | | |
| The re-ECN protocol makes no changes to, and has no effect on, the TCP | | | |
| congestion control algorithm or on other rate responses to | | | |
| congestion. Re-ECN is only concerned with enabling the ingress | | | |
| network to police that a source is complying with a congestion | | | |
| control algorithm, which is orthogonal to congestion control itself. | | | |
| | | | |
| Before re-ECN can be considered worthy of using up the last bit in | | Before re-ECN can be considered worthy of using up the last bit in | |
| the IP header, we must be sure that all our claims are robust. We | | the IP header, we must be sure that all our claims are robust. We | |
| have gradually been reducing the list of outstanding issues, but the | | have gradually been reducing the list of outstanding issues, but the | |
| few that still remain are listed in Section 6.3. We expect new | | few that still remain are listed in Section 6.3. We expect new | |
| attacks may still be found, but we offer the re-ECN protocol on the | | attacks may still be found, but we offer the re-ECN protocol on the | |
| basis that it is built on fairly solid theoretical foundations and, | | basis that it is built on fairly solid theoretical foundations and, | |
| so far, it has proved possible to keep it relatively robust. | | so far, it has proved possible to keep it relatively robust. | |
| | | | |
|
| 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | | 3.2. Simplified Re-ECN Protocol | |
| | | | |
| | | We describe here the simplified re-ECN protocol. In this first | |
| | | description we assume packets and segments are synonymous. | |
| | | | |
| | | Packets are sent from a sender to a receiver. In Figure 1 the queues | |
| | | (Q1 and Q2) are ECN enabled as per RFC 3168 [RFC3168]. If congestion | |
| | | occurs then packets are marked with the congestion experienced (CE) | |
| | | flag exactly as in the ECN protocol [RFC3168]; the routers do not | |
| | | need to be modified and do not need to know the re-ECN protocol. On | |
| | | reception of marked packets the receiver notifies the sender of the | |
| | | current count of marked packets. Note that this is the number of | |
| | | packets marked rather than the setting of the ECE flag in ECN. The | |
| | | sender uses this information to re-echo mark packets in exact | |
| | | correspondence to the number of CE marked bytes observed at the | |
| | | receiver. | |
| | | | |
| | | +--------- Feedback----------+ | |
| | | | | | |
| | | v | | |
| | | +---+ +----+ +----+ +---+ | |
| | | | | RE | | | | | | | |
| | | | S |--->| Q1 |--->| Q2 |--->| R | | |
| | | | | | | | | | | | |
| | | +---+ +----+ +----+ +---+ | |
| | | | |
| | | Figure 1: Simple Re-ECN | |
| | | | |
| | | 3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) | |
| | | | |
| The re-ECN wire protocol uses the two bit ECN field broadly as in | | The re-ECN wire protocol uses the two bit ECN field broadly as in | |
| RFC3168 [RFC3168] as described above, but with five differences of | | RFC3168 [RFC3168] as described above, but with five differences of | |
| detail (brought together in a list in Section 7.1). This | | detail (brought together in a list in Section 7.1). This | |
| specification defines a new re-ECN extension (RE) flag. We will | | specification defines a new re-ECN extension (RE) flag. We will | |
| defer the definition of the actual position of the RE flag in the | | defer the definition of the actual position of the RE flag in the | |
|
| IPv4 & v6 headers until Section 5. Until then it will suffice to use | | IPv4 & v6 headers until Section 5. When we don't need to choose | |
| an abstraction of the IPv4 and v6 wire protocols by just calling it | | between IPv4 and v6 wire protocols it will suffice to call it the RE | |
| the RE flag. | | flag. | |
| | | | |
| Unlike the ECN field, the RE flag is intended to be set by the sender | | Unlike the ECN field, the RE flag is intended to be set by the sender | |
| and remain unchanged along the path, although it can be read by | | and remain unchanged along the path, although it can be read by | |
| network elements that understand the re-ECN protocol. It is feasible | | network elements that understand the re-ECN protocol. It is feasible | |
| that a network element MAY change the setting of the RE flag, perhaps | | that a network element MAY change the setting of the RE flag, perhaps | |
| acting as a proxy for an end-point, but such a protocol would have to | | acting as a proxy for an end-point, but such a protocol would have to | |
| be defined in another specification (e.g. [Re-PCN]). | | be defined in another specification (e.g. [Re-PCN]). | |
| | | | |
| Although the RE flag is a separate, single bit field, it can be read | | Although the RE flag is a separate, single bit field, it can be read | |
| as an extension to the two-bit ECN field; the three concatenated bits | | as an extension to the two-bit ECN field; the three concatenated bits | |
|
| in what we will call the extended ECN field (EECN) making eight | | in what we will call the extended ECN field (EECN) giving eight | |
| codepoints. We will use the RFC3168 names of the ECN codepoints to | | codepoints. We will use the RFC3168 names of the ECN codepoints to | |
| describe settings of the ECN field when the RE flag setting is "don't | | describe settings of the ECN field when the RE flag setting is "don't | |
| care", but we also define the following six extended ECN codepoint | | care", but we also define the following six extended ECN codepoint | |
| names for when we need to be more specific. | | names for when we need to be more specific. | |
| | | | |
|
| RFC3168 ECN defines uses for all four codepoints of the two-bit ECN | | One of re-ECN's codepoints is an alternative use of the codepoint set | |
| field. This memo widens the codepoint space to eight, and uses six | | aside in RFC3168 for the ECN nonce (ECT(1)). Transports using re-ECN | |
| codepoints. One of re-ECN's codepoints is an alternative use of the | | do not need to use the ECN nonce as long as the sender is also | |
| codepoint set aside in RFC3168 for the ECN nonce (ECT(1)). | | checking for transport protocol compliance | |
| Transports not using re-ECN can still use the ECN nonce, while those | | [I-D.moncaster-tcpm-rcv-cheat]. The case for doing this is given in | |
| using re-ECN do not need to as long as the sender is also checking | | Appendix I. Two re-ECN codepoints are given compatible uses to those | |
| for transport protocol compliance [I-D.moncaster-tcpm-rcv-cheat]. | | defined in RFC3168 (Not-ECT and CE). The other codepoint used by | |
| The case for doing this is given in Appendix I. Two re-ECN | | RFC3168 (ECT(0)) isn't used for re-ECN. Altogether this leaves one | |
| codepoints are given compatible uses to those defined in RFC3168 | | codepoint of the eight unused by ECN or re-ECN and available for | |
| (Not-ECT and CE). The other codepoint used by RFC3168 (ECT(0)) isn't | | future use. | |
| used for re-ECN. Altogether this leaves one codepoint of the eight | | | |
| unused and available for future use. | | | |
| | | | |
| +-------+------------+------+--------------+------------------------+ | | +-------+------------+------+--------------+------------------------+ | |
| | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | | | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | |
| | field | codepoint | flag | codepoint | | | | | field | codepoint | flag | codepoint | | | |
| +-------+------------+------+--------------+------------------------+ | | +-------+------------+------+--------------+------------------------+ | |
|
| | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | | | 00 | Not-ECT | 0 | Not-ECT | Not re-ECN-capable | | |
| | | | | | transport | | | | | | | | transport | | |
|
| | 00 | Not-ECT | 1 | FNE | Feedback not | | | | 00 | --- | 1 | FNE | Feedback not | | |
| | | | | | established | | | | | | | | established | | |
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | |
| | | | | | and RECT | | | | | | | | and RECT | | |
|
| | 01 | ECT(1) | 1 | RECT | Re-ECN capable | | | | 01 | --- | 1 | RECT | Re-ECN capable | | |
| | | | | | transport | | | | | | | | transport | | |
|
| | 10 | ECT(0) | 0 | --- | Legacy ECN use only | | | | 10 | ECT(0) | 0 | ECT(0) | RFC3168 ECN use only | | |
| | | | | | | | | | | | | | | | |
|
| | 10 | ECT(0) | 1 | --CU-- | Currently unused | | | | 10 | --- | 1 | --CU-- | Currently unused | | |
| | | | | | | | | | | | | | | | |
| | 11 | CE | 0 | CE(0) | Re-Echo canceled by | | | | 11 | CE | 0 | CE(0) | Re-Echo canceled by | | |
| | | | | | congestion experienced | | | | | | | | congestion experienced | | |
|
| | 11 | CE | 1 | CE(-1) | Congestion experienced | | | | 11 | --- | 1 | CE(-1) | Congestion experienced | | |
| +-------+------------+------+--------------+------------------------+ | | +-------+------------+------+--------------+------------------------+ | |
| | | | |
| Table 1: Extended ECN Codepoints | | Table 1: Extended ECN Codepoints | |
| | | | |
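The -06 column of Table 1 can be read as a lookup from the two-bit ECN field and the RE flag to an extended ECN codepoint name. The sketch below encodes that reading. The dictionary and function names are illustrative only, and the actual position of the RE flag in the IPv4 and v6 headers is deliberately left out, since the draft defers it to Section 5.

```python
# Extended ECN codepoints keyed by (ECN field bits, RE flag), transcribed
# from the -06 column of Table 1. Names of the dict and function are
# hypothetical; only the codepoint names come from the draft.
EECN_CODEPOINTS = {
    ("00", 0): "Not-ECT",
    ("00", 1): "FNE",
    ("01", 0): "Re-Echo",
    ("01", 1): "RECT",
    ("10", 0): "ECT(0)",
    ("10", 1): "--CU--",   # currently unused
    ("11", 0): "CE(0)",
    ("11", 1): "CE(-1)",
}


def eecn_codepoint(ecn_bits, re_flag):
    """Return the extended ECN codepoint name for an ECN field and RE flag."""
    return EECN_CODEPOINTS[(ecn_bits, re_flag)]


print(eecn_codepoint("01", 1))  # a re-ECN-capable packet with RE set
```

Keying on the pair rather than on a single three-bit integer avoids committing to any bit ordering between the RE flag and the ECN field.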
|
| 3.3. Re-ECN Protocol Operation | | 3.4. Re-ECN Protocol Operation | |
| | | | |
| In this section we will give an overview of the operation of the re- | | In this section we will give an overview of the operation of the re- | |
| ECN protocol for TCP/IP, leaving a detailed specification to the | | ECN protocol for TCP/IP, leaving a detailed specification to the | |
| following sections. Other transports will be discussed later. | | following sections. Other transports will be discussed later. | |
| | | | |
| In summary, the protocol adds a third `re-echo' stage to the existing | | In summary, the protocol adds a third `re-echo' stage to the existing | |
| TCP/IP ECN protocol. Whenever the network adds CE congestion | | TCP/IP ECN protocol. Whenever the network adds CE congestion | |
| signalling to the IP header on the forward data path, the receiver | | signalling to the IP header on the forward data path, the receiver | |
| feeds it back to the ingress using TCP, then the sender re-echoes it | | feeds it back to the ingress using TCP, then the sender re-echoes it | |
| into the forward data path using the RE flag in the next packet. | | into the forward data path using the RE flag in the next packet. | |
| | | | |
| Prior to receiving any feedback a sender will not know which setting | | Prior to receiving any feedback a sender will not know which setting | |
| of the RE flag to use, so it sets the feedback not established (FNE) | | of the RE flag to use, so it sets the feedback not established (FNE) | |
| codepoint. The network reads the FNE codepoint conservatively as | | codepoint. The network reads the FNE codepoint conservatively as | |
| equivalent to re-echoed congestion. | | equivalent to re-echoed congestion. | |
| | | | |
|
| Specifically, once a flow is established, a re-ECN sender always | | Specifically, once feedback from a flow is established, a re-ECN | |
| initialises the ECN field to ECT(1). And it usually sets the RE flag | | sender always initialises the ECN field to ECT(1). And it usually | |
| to "1". Whenever a router re-marks a packet to CE, the receiver | | sets the RE flag to "1". Whenever a queue marks a packet to CE, the | |
| feeds back this event to the sender. On receiving this feedback, the | | receiver feeds back this event to the sender. On receiving this | |
| re-ECN sender will clear the RE flag to "0" in the next packet it | | feedback, the re-ECN sender will clear the RE flag to "0" in the next | |
| sends. | | packet it sends. | |
| | | | |
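The sender behaviour just described can be sketched as a small state holder: packets normally carry ECT(1) with the RE flag set, and each item of congestion feedback causes the RE flag to be cleared on one subsequent packet. This is a simplified, per-packet sketch under our own assumptions. The class and method names are hypothetical, the FNE bootstrap phase is omitted, and the octet-based balancing discussed later in the draft is not modelled.

```python
class ReEcnSenderSketch:
    """Per-packet illustration of setting and clearing the RE flag."""

    def __init__(self):
        # Congestion marks fed back by the receiver but not yet re-echoed.
        self.pending_re_echoes = 0

    def on_congestion_feedback(self, newly_marked=1):
        """Record feedback that `newly_marked` packets arrived marked CE."""
        self.pending_re_echoes += newly_marked

    def next_packet(self):
        """Return (ECN field, RE flag) for the next packet to send."""
        if self.pending_re_echoes > 0:
            self.pending_re_echoes -= 1
            return ("01", 0)   # ECT(1) with RE cleared: Re-Echo
        return ("01", 1)       # ECT(1) with RE set: RECT


sender = ReEcnSenderSketch()
print(sender.next_packet())        # no feedback yet: RE flag set
sender.on_congestion_feedback()
print(sender.next_packet())        # one mark fed back: RE flag cleared once
print(sender.next_packet())        # back to the RE flag being set
```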
| We chose to set and clear the RE flag this way round to ease | | We chose to set and clear the RE flag this way round to ease | |
| incremental deployment (see Section 7.1). To avoid confusion we will | | incremental deployment (see Section 7.1). To avoid confusion we will | |
| use the term `blanking' (rather than marking) when the RE flag is | | use the term `blanking' (rather than marking) when the RE flag is | |
| cleared to "0". So, over a stream of packets, we will talk of the | | cleared to "0". So, over a stream of packets, we will talk of the | |
| `RE blanking fraction' as the fraction of octets in packets with the | | `RE blanking fraction' as the fraction of octets in packets with the | |
| RE flag cleared to "0". | | RE flag cleared to "0". | |
| | | | |
|
| _ _ _ _ | | +---+ +----+ +----+ +---+ | |
| / \ / \ / \ / \ | | | S |--| Q1 |----------------| Q2 |--| R | | |
| | S |--| 0 | - - - - - - - - | i |--| D | | | +---+ +----+ +----+ +---+ | |
| \ _ / \ _ / \ _ / \ _ / | | | |
| . . . . | | . . . . | |
| ^ . . . . | | ^ . . . . | |
| | . . . . | | | . . . . | |
| | . RE blanking fraction . . | | | . RE blanking fraction . . | |
| 3% |-------------------------------+======= | | 3% |-------------------------------+======= | |
| | . . | . | | | . . | . | |
| 2% | . . | . | | 2% | . . | . | |
| | . . CE marking fraction | . | | | . . CE marking fraction | . | |
| 1% | . +----------------------+ . | | 1% | . +----------------------+ . | |
| | . | . . | | | . | . . | |
| 0% +---------------------------------------> | | 0% +---------------------------------------> | |
|
| ^ 0 ^ i ^ resource index | | ^ ^ ^ | |
| 0 ^ 1 ^ 2 observation points | | L M N Observation points | |
| | | | | | |
| 1.00% 2.00% marking fraction | | | |
| | | | |
|
| Figure 1: A 2-Router Example (Imprecise) | | Figure 2: A 2-Queue Example (Imprecise) | |
| | | | |
|
| Figure 1 uses a simple network to illustrate how re-ECN allows | | Figure 2 uses a simple network to illustrate how re-ECN allows queues | |
| routers to measure downstream congestion. The horizontal axis | | to measure downstream congestion. The receiver views a CE marking | |
| represents the index of each congestible resource (typically queues) | | fraction of 3% which is fed back to the sender. The sender sets an | |
| along a path through the Internet. There may be many routers on the | | RE blanking fraction of 3% to match this. This RE blanking fraction | |
| path, but we assume only two are currently congested (those with | | can be observed along the path as the RE flag is not changed by | |
| resource index 0 and i). The two superimposed plots show the | | network nodes once set by the sender. This is shown by the | |
| fraction of each extended ECN codepoint in a flow observed along this | | horizontal line at 3% in the figure. The CE marked fraction is shown | |
| path. Given about 3% of packets reaching the destination are marked | | by the stepped line which rises to meet the RE blanking fraction line | |
| CE, in response to feedback the sender will blank the RE flag in | | with steps at each queue where packets are marked. The two queues | |
| about 3% of packets it sends. Then approximate downstream congestion | | shown (Q1 and Q2) are currently congested. As packets pass through | |
| can be measured at the observation points shown along the path by | | each queue a fraction is marked (1% at Q1 and 2% at Q2). The | |
| subtracting the CE marking fraction from the RE blanking fraction, as | | approximate downstream congestion can be measured at the observation | |
| shown in the table below (Appendix A derives these approximations | | points shown along the path by subtracting the CE marking fraction | |
| from a precise analysis). | | from the RE blanking fraction, as shown in the table below | |
| | | (Appendix A derives these approximations from a precise analysis). | |
| | | | |
| +-------------------+------------------------------+ | | +-------------------+------------------------------+ | |
| | Observation point | Approx downstream congestion | | | | Observation point | Approx downstream congestion | | |
| +-------------------+------------------------------+ | | +-------------------+------------------------------+ | |
|
| | 0 | 3% - 0% = 3% | | | | L | 3% - 0% = 3% | | |
| | 1 | 3% - 1% = 2% | | | | M | 3% - 1% = 2% | | |
| | 2 | 3% - 3% = 0% | | | | N | 3% - 3% = 0% | | |
| +-------------------+------------------------------+ | | +-------------------+------------------------------+ | |
| | | | |
| Table 2: Downstream Congestion Measured at Example Observation Points | | Table 2: Downstream Congestion Measured at Example Observation Points | |
| | | | |
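The subtraction in Table 2 is simple enough to express directly. The sketch below recomputes the three rows from the fractions in the example, using the -06 observation point labels L, M and N. The function name and the use of percentages as plain floats are our own choices for illustration.

```python
def downstream_congestion(re_blanking_pct, ce_marking_pct):
    """Approximate downstream congestion at one observation point.

    Both arguments are percentages of the flow observed at that point;
    the result is the RE blanking fraction minus the CE marking fraction.
    """
    return re_blanking_pct - ce_marking_pct


RE_BLANKING_PCT = 3.0            # set by the sender, unchanged along the path
ce_seen_at = {"L": 0.0, "M": 1.0, "N": 3.0}   # CE fraction accumulated so far

for point, ce_pct in ce_seen_at.items():
    result = downstream_congestion(RE_BLANKING_PCT, ce_pct)
    print(f"{point}: {RE_BLANKING_PCT}% - {ce_pct}% = {result}%")
```

Because the RE blanking fraction is constant along the path, a network element needs only local measurements of the two fractions to estimate congestion on the remainder of the path.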
| All along the path, whole-path congestion remains unchanged so it can | | All along the path, whole-path congestion remains unchanged so it can | |
| be used as a reference against which to compare upstream congestion. | | be used as a reference against which to compare upstream congestion. | |
| The difference predicts downstream congestion for the rest of the | | The difference predicts downstream congestion for the rest of the | |
| path. Therefore, measuring the fractions of each codepoint at any | | path. Therefore, measuring the fractions of each codepoint at any | |
| point in the Internet will reveal upstream, downstream and whole path | | point in the Internet will reveal upstream, downstream and whole path | |
| congestion. | | congestion. | |
| | | | |
| Note that we have introduced discussion of marking and blanking | | Note that we have introduced discussion of marking and blanking | |
|
| fractions solely for illustration. To be absolutely clear, these | | fractions solely for illustration. To be absolutely clear, for TCP | |
| fractions are averages that would result from the behaviour of a TCP | | these fractions are averages that would result from the behaviour of | |
| protocol handler mechanically blanking outgoing packets in direct | | the protocol handler mechanically blanking outgoing packets in direct | |
| response to incoming feedback---we are not saying any protocol | | response to incoming feedback---we are not saying any protocol | |
|
| handler works with these average fractions directly. | | handler has to work with these average fractions directly. | |
| | | | |
|
| 3.4. Informal Terminology | | 3.5. Informal Terminology | |
| | | | |
| In the rest of this memo we will loosely talk of positive or negative | | In the rest of this memo we will loosely talk of positive or negative | |
| flows, meaning flows where the moving average of the downstream | | flows, meaning flows where the moving average of the downstream | |
|
| congestion metric is persistently positive or negative. The notion | | congestion metric is persistently positive or negative. A negative | |
| of a negative metric arises because it is derived by subtracting one | | flow is one where more CE marked packets than re-ECN blanked packets | |
| metric from another. Of course actual downstream congestion cannot | | arrive. Likewise in positive flows more re-ECN blanked packets | |
| be negative, only the metric can (whether due to time lags or | | arrive than CE marked packets. The notion of a negative metric | |
| deliberate malice). | | arises because it is derived by subtracting one metric from another. | |
| | | Of course actual downstream congestion cannot be negative, only the | |
| | | metric can (whether due to time lags or deliberate malice). | |
| | | | |
| Just as we will loosely talk of positive and negative flows, we will | | Just as we will loosely talk of positive and negative flows, we will | |
| also talk of positive or negative packets, meaning packets that | | also talk of positive or negative packets, meaning packets that | |
| contribute positively or negatively to the downstream congestion | | contribute positively or negatively to the downstream congestion | |
| metric. | | metric. | |
| | | | |
| Therefore we will talk of packets having `worth' of +1, 0 or -1, | | Therefore we will talk of packets having `worth' of +1, 0 or -1, | |
| which, when multiplied by their size, indicates their contribution to | | which, when multiplied by their size, indicates their contribution to | |
| the downstream congestion metric. | | the downstream congestion metric. | |
| | | | |
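To make the notion of worth concrete, the sketch below sums worth multiplied by packet size over a short sequence of packets. The worth values for Re-Echo, RECT, CE(0) and CE(-1) follow the +1, 0, 0 and -1 labels used in this section; other codepoints are left out rather than guessed. The function name and the example packet sizes are hypothetical.

```python
# Worth of the extended ECN codepoints discussed in this section.
WORTH = {
    "Re-Echo": +1,   # positive: congestion re-echoed by the sender
    "RECT": 0,       # neutral
    "CE(0)": 0,      # cancelled: re-echo that was also congestion marked
    "CE(-1)": -1,    # negative: congestion marked by the network
}


def congestion_balance(packets):
    """Sum worth * size in octets over (codepoint, size) pairs."""
    return sum(WORTH[codepoint] * size for codepoint, size in packets)


# One 1500-octet packet is congestion marked, so the sender later blanks
# the RE flag on 1500 octets to restore the balance.
arrivals = [("RECT", 1500), ("CE(-1)", 1500), ("RECT", 1500), ("Re-Echo", 1500)]
print(congestion_balance(arrivals))
```

A persistently negative total over a moving window would correspond to what the text calls a negative flow, and a persistently positive total to a positive flow.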
|
| Figure 2 shows the main state transitions of the system once a flow | | The idea is that most packets start with zero worth. Every time the | |
| is established, showing the worth of packets in each state. When the | | network decrements the worth of a packet, the sender increments the | |
| network congestion marks a packet it decrements its worth (moving | | worth of a later packet. Then, over time, as many positive octets | |
| from the left of the main square to the right). When the sender | | should arrive at the receiver as negative. Note we have said octets | |
| blanks the RE flag in order to re-echo congestion it increments the | | not packets, so if packets are of different sizes, the worth should | |
| worth of a packet (moving from the bottom of the main square to the | | be incremented on enough octets to balance the octets in negative | |
| top). | | packets arriving at the receiver. It is this balance that will allow | |
| | | the network to hold the sender accountable for the congestion it | |
| Sender state Sent Worth Received Worth | | causes. | |
| packet packet | | | |
| +----------------------------------------------------+ | | | |
| | ^ | | | |
| V | | | | |
| Congestion echoed -->Re-Echo +1 --+---> CE(0) 0 --+ | | | |
| (positive) | (canceled) | | | | |
| V network | | | | |
| | congestion | | | | |
| | | | | | |
| Flow established --> RECT 0 ----+-> CE(-1) -1 --+ | | | |
| ^ (neutral) | | (negative) | | | |
| | | | | | | |
| | no V V | | | |
| | congestion | | | | | |
| +-----------<--------------+-+ | | | |
| | | | |
| Figure 2: Re-ECN System State Diagram (bootstrap not shown) | | | |
| | | | |
| The idea is that every time the network decrements the worth of a | | | |
| packet, the sender increments the worth of a later packet. Then, | | | |
| over time, as many positive octets should arrive at the receiver as | | | |
| negative. Note we have said octets not packets, so if packets are of | | | |
| different sizes, the worth should be incremented on enough octets to | | | |
| balance the octets in negative packets arriving at the receiver. It | | | |
| is this balance that will allow the network to hold the sender | | | |
| accountable for the congestion it causes, as we shall see. The | | | |
| informal outline below uses TCP as an example transport, but the idea | | | |
| would be broadly similar for any transport that adapts its rate to | | | |
| congestion. | | | |
| | | | |
| We will start with the sender in `flow established' state. Normally, | | | |
| as acknowledgements of earlier packets arrive that don't feedback any | | | |
| congestion, the congestion window can be opened, so the sender goes | | | |
| round the smaller sub-loop, sending RECT packets (worth 0) and | | | |
| returning to the flow established state to send another one. If a | | | |
| router congestion marks one of the packets, it decrements the | | | |
| packet's worth. The sender will have been continuing to traverse | | | |
| round the smaller feedback loop every time acknowledgements arrive. | | | |
| But when congestion feedback returns from this packet that was marked | | | |
| with -1 worth (the largest loop in the figure) the sender jumps to | | | |
| the congestion echoed state in order to re-echo the congestion, | | | |
| incrementing the worth of the next packet to +1 by blanking its RE | | | |
| flag. The sender then returns to the flow established state and | | | |
| continues round the smaller loop, sending packets worth 0. Note that | | | |
| the size of the loops is just an artefact of the figure; it is not | | | |
| meant to imply that one loop is slower than the other - they are both | | | |
| the same end to end feedback loop. | | | |
| | | | |
| If a packet carrying re-echoed congestion happens to also be | | If a packet carrying re-echoed congestion happens to also be | |
| congestion marked, the +1 worth added by the sender will be cancelled | | congestion marked, the +1 worth added by the sender will be cancelled | |
| out by the -1 network congestion marking. Although the two worth | | out by the -1 network congestion marking. Although the two worth | |
| values correctly cancel out, neither the congestion marking nor the | | values correctly cancel out, neither the congestion marking nor the | |
| re-echoed congestion are lost, because the RE bit and the ECN field | | re-echoed congestion are lost, because the RE bit and the ECN field | |
| are orthogonal. So, whenever this happens, the receiver will | | are orthogonal. So, whenever this happens, the receiver will | |
|
| correctly detect and re-echo the new congestion event as well (the | | correctly detect and re-echo the new congestion event as well. | |
| top sub-loop). When we need to distinguish, we will sometimes call a | | | |
| packet marked RECT 'neutral' (0 worth), while we will call the CE(0) | | | |
| marking 'canceled' (also 0 worth). If a re-echoed packet isn't | | | |
| unlucky enough to be further congestion marked, the sender will | | | |
| return to the flow established state and continue to send RECT | | | |
| packets (worth 0). | | | |
| | | | |
| The table below specifies unambiguously the worth of each extended | | The table below specifies unambiguously the worth of each extended | |
| ECN codepoint. Note the order is different from the previous table | | ECN codepoint. Note the order is different from the previous table | |
| to better show how the worth increments and decrements. The FNE | | to better show how the worth increments and decrements. The FNE | |
|
| codepoint is an exception. It is used in the flow bootstrap process | | codepoint is used in the flow bootstrap process (explained later) and | |
| (explained later) and has the same positive (+1) worth as a packet | | has the same positive (+1) worth as a packet with the Re-Echo | |
| with the Re-Echo codepoint. | | codepoint. | |
| | | | |
| +--------+------+----------------+-------+--------------------------+ | | +--------+------+----------------+-------+--------------------------+ | |
| | ECN | RE | Extended ECN | Worth | Re-ECN meaning | | | | ECN | RE | Extended ECN | Worth | Re-ECN meaning | | |
| | field | bit | codepoint | | | | | | field | bit | codepoint | | | | |
| +--------+------+----------------+-------+--------------------------+ | | +--------+------+----------------+-------+--------------------------+ | |
| | 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | | | 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | |
| | | | | | transport | | | | | | | | transport | | |
|
| | | | 00 | 1 | FNE | +1 | Feedback not established | | |
| | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | | | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | |
| | | | | | RECT | | | | | | | | RECT | | |
|
| | 10 | 0 | --- | ... | Legacy ECN use only | | | | 10 | 0 | --- | ... | RFC3168 ECN use only | | |
| | 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | | | 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | |
| | | | | | congestion experienced | | | | | | | | congestion experienced | | |
|
| | 00 | 1 | FNE | +1 | Feedback not established | | | | |
| | 01 | 1 | RECT | 0 | Re-ECN capable transport | | | | 01 | 1 | RECT | 0 | Re-ECN capable transport | | |
| | 10 | 1 | --CU-- | ... | Currently unused | | | | 10 | 1 | --CU-- | ... | Currently unused | | |
| | | | | | | | | | | | | | | | |
| | 11 | 1 | CE(-1) | -1 | Congestion experienced | | | | 11 | 1 | CE(-1) | -1 | Congestion experienced | | |
| +--------+------+----------------+-------+--------------------------+ | | +--------+------+----------------+-------+--------------------------+ | |
| | | | |
| Table 3: 'Worth' of Extended ECN Codepoints | | Table 3: 'Worth' of Extended ECN Codepoints | |
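As a reading aid, Table 3 can be written as a lookup keyed on the ECN field and the RE bit. This is only a sketch: the names EXTENDED_ECN, worth and congestion_mark are hypothetical, and None stands in for the "..." entries that carry no re-ECN worth.

```python
# (ECN field bits, RE bit) -> (extended ECN codepoint, worth), as in Table 3.
EXTENDED_ECN = {
    ("00", 0): ("Not-RECT", None),
    ("00", 1): ("FNE", +1),
    ("01", 0): ("Re-Echo", +1),
    ("01", 1): ("RECT", 0),
    ("10", 0): ("---", None),
    ("10", 1): ("--CU--", None),
    ("11", 0): ("CE(0)", 0),
    ("11", 1): ("CE(-1)", -1),
}


def worth(ecn, re):
    """Return the worth of an extended ECN codepoint (None if it has none)."""
    return EXTENDED_ECN[(ecn, re)][1]


def congestion_mark(ecn, re):
    """Model a congestion marking: the ECN field becomes CE, RE is untouched."""
    return ("11", re)
```

Because the RE bit and the ECN field are orthogonal, marking a RECT packet gives CE(-1), while marking a Re-Echo packet gives CE(0): the +1 and -1 cancel but both signals survive.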
| | | | |
| 4. Transport Layers | | 4. Transport Layers | |
| | | | |
| 4.1. TCP | | 4.1. TCP | |
| | | | |
| Re-ECN capability at the sender is essential. At the receiver it is | | Re-ECN capability at the sender is essential. At the receiver it is | |
|
| optional, as long as the receiver has a basic (`vanilla flavour') | | optional, as long as the receiver has a basic RFC3168-compliant ECN- | |
| RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re- | | capable transport (ECT) [RFC3168]. Given re-ECN is not the first | |
| ECN is not the first attempt to define the semantics of the ECN | | attempt to define the semantics of the ECN field, we give a table | |
| field, we give a table below summarising what happens for various | | below summarising what happens for various combinations of | |
| combinations of capabilities of the sender S and receiver R, as | | capabilities of the sender S and receiver R, as indicated in the | |
| indicated in the first four columns below. The last column gives the | | first four columns below. The last column gives the mode a half- | |
| mode a half-connection should be in after the first two of the three | | connection should be in after the first two of the three TCP | |
| TCP handshakes. | | handshakes. | |
| | | | |
| +--------+--------------+------------+---------+--------------------+ | | +--------+--------------+------------+---------+--------------------+ | |
| | Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | | | Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | |
| | | (RFC3540) | (RFC3168) | | Half-connection | | | | | (RFC3540) | (RFC3168) | | Half-connection | | |
| | | | | | Mode | | | | | | | | Mode | | |
| +--------+--------------+------------+---------+--------------------+ | | +--------+--------------+------------+---------+--------------------+ | |
| | SR | | | | RECN | | | | SR | | | | RECN | | |
| | S | R | | | RECN-Co | | | | S | R | | | RECN-Co | | |
| | S | | R | | RECN-Co | | | | S | | R | | RECN-Co | | |
| | S | | | R | Not-ECT | | | | S | | | R | Not-ECT | | |
| | | | |
| skipping to change at page 16, line 32 | | skipping to change at page 16, line 37 | |
| | | | |
| Table 4: Modes of TCP Half-connection for Combinations of ECN | | Table 4: Modes of TCP Half-connection for Combinations of ECN | |
| Capabilities of Sender S and Receiver R | | Capabilities of Sender S and Receiver R | |
| | | | |
| We will describe what happens in each mode, then describe how they | | We will describe what happens in each mode, then describe how they | |
| are negotiated. The abbreviations for the modes in the above table | | are negotiated. The abbreviations for the modes in the above table | |
| mean: | | mean: | |
| | | | |
| RECN: Full re-ECN capable transport | | RECN: Full re-ECN capable transport | |
| | | | |
|
| RECN-Co: Re-ECN sender in compatibility mode with a | | RECN-Co: Re-ECN sender in compatibility mode with a RFC3168 | |
| vanilla [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable | | compliant [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable | |
| receiver. Implementation of this mode is OPTIONAL. | | receiver. Implementation of this mode is OPTIONAL. | |
| | | | |
| Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when | | Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when | |
| at least one of the transports does not understand even basic ECN | | at least one of the transports does not understand even basic ECN | |
| marking. | | marking. | |
| | | | |
| Note that we use the term Re-ECT for a host transport that is re-ECN- | | Note that we use the term Re-ECT for a host transport that is re-ECN- | |
| capable but RECN for the modes of the half connections between hosts | | capable but RECN for the modes of the half connections between hosts | |
| when they are both Re-ECT. If a host transport is Re-ECT, this fact | | when they are both Re-ECT. If a host transport is Re-ECT, this fact | |
| alone does NOT imply either of its half connections will necessarily | | alone does NOT imply either of its half connections will necessarily | |
| be in RECN mode, at least not until it has confirmed that the other | | be in RECN mode, at least not until it has confirmed that the other | |
| host is Re-ECT. | | host is Re-ECT. | |
| | | | |
|
| 4.1.1. RECN mode: Full re-ECN capable transport | | 4.1.1. RECN mode: Full Re-ECN capable transport | |
| | | | |
| In full RECN mode, for each half connection, both the sender and the | | In full RECN mode, for each half connection, both the sender and the | |
| receiver each maintain an unsigned integer counter we will call ECC | | receiver each maintain an unsigned integer counter we will call ECC | |
|
| (echo congestion counter). The receiver maintains a count, modulo 8, | | (echo congestion counter). The receiver maintains a count of how | |
| of how many times a CE marked packet has arrived during the half- | | many times a CE marked packet has arrived during the half-connection. | |
| connection. Once a RECN connection is established, the three TCP | | Once a RECN connection is established, the three TCP option flags | |
| option flags (ECE, CWR & NS) used for ECN-related functions in other | | (ECE, CWR & NS) used for ECN-related functions in other versions of | |
| versions of ECN are used as a 3-bit field for the receiver to | | ECN are used as a 3-bit field for the receiver to repeatedly tell the | |
| repeatedly tell the sender the current value of ECC whenever it sends | | sender the current value of ECC, modulo 8, whenever it sends a TCP | |
| a TCP ACK. We will call this the echo congestion increment (ECI) | | ACK. We will call this the echo congestion increment (ECI) field. | |
| field. This overloaded use of these 3 option flags as one 3-bit ECI | | This overloaded use of these 3 option flags as one 3-bit ECI field is | |
| field is shown in Figure 4. The actual definition of the TCP header, | | shown in Figure 4. The actual definition of the TCP header, | |
| including the addition of support for the ECN nonce, is shown for | | including the addition of support for the ECN nonce, is shown for | |
| comparison in Figure 3. This specification does not redefine the | | comparison in Figure 3. This specification does not redefine the | |
| names of these three TCP option flags, it merely overloads them with | | names of these three TCP option flags, it merely overloads them with | |
| another definition once a flow is established. | | another definition once a flow is established. | |
| | | | |
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |
| | | | N | C | E | U | A | P | R | S | F | | | | | | N | C | E | U | A | P | R | S | F | | |
| | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | |
| | | | | R | E | G | K | H | T | N | N | | | | | | | R | E | G | K | H | T | N | N | | |
| | | | |
| skipping to change at page 17, line 41 | | skipping to change at page 17, line 47 | |
| | | | | G | K | H | T | N | N | | | | | | | G | K | H | T | N | N | | |
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |
| | | | |
| Figure 4: Definition of the ECI field within bytes 13 and 14 of the | | Figure 4: Definition of the ECI field within bytes 13 and 14 of the | |
| TCP Header, overloading the current definitions above for established | | TCP Header, overloading the current definitions above for established | |
| RECN flows. | | RECN flows. | |
| | | | |
| Receiver Action in RECN Mode | | Receiver Action in RECN Mode | |
| | | | |
| Every time a CE marked packet arrives at a receiver in RECN mode, | | Every time a CE marked packet arrives at a receiver in RECN mode, | |
|
| the receiver transport increments its local value of ECC modulo 8 | | the receiver transport increments its local value of ECC and MUST | |
| and MUST echo its value to the sender in the ECI field of the next | | echo its value, modulo 8, to the sender in the ECI field of the | |
| ACK. It MUST repeat the same value of ECI in every subsequent ACK | | next ACK. It MUST repeat the same value of ECI in every | |
| until the next CE event, when it increments ECI again. | | subsequent ACK until the next CE event, when it increments ECI | |
| | | again. | |
| | | | |
| The increment of the local ECC values is modulo 8 so the field | | The increment of the local ECC values is modulo 8 so the field | |
| value simply wraps round back to zero when it overflows. The | | value simply wraps round back to zero when it overflows. The | |
| least significant bit is to the right (labelled bit 9). | | least significant bit is to the right (labelled bit 9). | |
| | | | |
| A receiver in RECN mode MAY delay the echo of a CE to the next | | A receiver in RECN mode MAY delay the echo of a CE to the next | |
| delayed-ACK, which would be necessary if ACK-withholding were | | delayed-ACK, which would be necessary if ACK-withholding were | |
| implemented. | | implemented. | |
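A minimal sketch of the receiver behaviour just described follows. The class and method names are hypothetical. The counter is kept as a plain integer and reduced modulo 8 only when echoed, which gives the same ECI value as incrementing modulo 8.

```python
class RecnReceiver:
    """Illustrative receiver state for one half-connection in RECN mode."""

    def __init__(self):
        self.ecc = 0  # echo congestion counter

    def on_packet(self, ce_marked):
        """Count the arrival of a CE marked packet."""
        if ce_marked:
            self.ecc += 1

    def eci_for_ack(self):
        """3-bit ECI value to place in every ACK until the next CE event."""
        return self.ecc % 8
```

After nine CE marks the echoed value has wrapped once, so ECI reads 1. Unmarked packets leave it unchanged, which is how the same value gets repeated in every subsequent ACK.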
| | | | |
| Sender Action in RECN Mode | | Sender Action in RECN Mode | |
| | | | |
| On the arrival of every ACK, the sender compares the ECI field | | On the arrival of every ACK, the sender compares the ECI field | |
| with its own ECC value, then replaces its local value with that | | with its own ECC value, then replaces its local value with that | |
|
| from the ACK. The difference D is assumed to be the number of CE | | from the ACK. The difference D (D = (ECI + 8 - ECC mod 8) mod 8) | |
| marked packets that arrived at the receiver since it sent the | | is assumed to be the number of CE marked packets that arrived at | |
| previously received ACK (but see below for the sender's safety | | the receiver since it sent the previously received ACK (but see | |
| strategy). Whenever the ECI field increments by D (and/or d drops | | below for the sender's safety strategy). Whenever the ECI field | |
| are detected), the sender MUST clear the RE flag to "0" in the IP | | increments by D (and/or d drops are detected), the sender MUST | |
| header of the next D' data packets it sends (where D' = D + d), | | clear the RE flag to "0" in the IP header of the next D' data | |
| effectively re-echoing each single increment of ECI. Otherwise | | packets it sends (where D' = D + d), effectively re-echoing each | |
| the data sender MUST send all data packets with RE set to "1". | | single increment of ECI. Otherwise the data sender MUST send all | |
| | | data packets with RE set to "1". | |
| | | | |
| As a general rule, once a flow is established, as well as setting | | As a general rule, once a flow is established, as well as setting | |
| or clearing the RE flag as above, a data sender in RECN mode MUST | | or clearing the RE flag as above, a data sender in RECN mode MUST | |
| always set the ECN field to ECT(1). However, the settings of the | | always set the ECN field to ECT(1). However, the settings of the | |
| extended ECN field during flow start are defined in Section 4.1.4. | | extended ECN field during flow start are defined in Section 4.1.4. | |
| | | | |
| As we have already emphasised, the re-ECN protocol makes no | | As we have already emphasised, the re-ECN protocol makes no | |
| changes and has no effect on the TCP congestion control algorithm. | | changes and has no effect on the TCP congestion control algorithm. | |
|
| So, each increment of ECI (or detection of a drop) also triggers | | So, the first increment of ECI (or detection of a drop) in a RTT | |
| the standard TCP congestion response, but with no more than one | | triggers the standard TCP congestion response, no more than one | |
| congestion response per round trip, as usual. | | congestion response per round trip, as usual. However, the sender | |
| | | re-echoes every increment of ECI irrespective of RTTs. | |
| | | | |
| A TCP sender also acts as the receiver for the other half- | | A TCP sender also acts as the receiver for the other half- | |
| connection. The host will maintain two ECC values S.ECC and R.ECC | | connection. The host will maintain two ECC values S.ECC and R.ECC | |
| as sender and receiver respectively. Every TCP header sent by a | | as sender and receiver respectively. Every TCP header sent by a | |
| host in RECN mode will also repeat the prevailing value of R.ECC | | host in RECN mode will also repeat the prevailing value of R.ECC | |
| in its ECI field. If a sender in RECN mode has to retransmit a | | in its ECI field. If a sender in RECN mode has to retransmit a | |
| packet due to a suspected loss, the re-transmitted packet MUST | | packet due to a suspected loss, the re-transmitted packet MUST | |
| carry the latest prevailing value of R.ECC when it is re- | | carry the latest prevailing value of R.ECC when it is re- | |
| transmitted, which will not necessarily be the one it carried | | transmitted, which will not necessarily be the one it carried | |
| originally. | | originally. | |
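The sender-side bookkeeping above can be sketched as follows, using the formula D = (ECI + 8 - ECC mod 8) mod 8 from the -06 text. The class, its attribute names and the drops parameter are assumptions for illustration. The safety strategy for lost pure ACKs and the congestion control response are deliberately left out.

```python
def eci_difference(eci, ecc):
    """Apparent number of new CE marks: D = (ECI + 8 - ECC mod 8) mod 8."""
    return (eci + 8 - ecc % 8) % 8


class RecnSender:
    """Illustrative sender state for one half-connection in RECN mode."""

    def __init__(self):
        self.ecc = 0                # local copy of the echo congestion counter
        self.pending_re_echoes = 0  # packets still to be sent with RE = 0

    def on_ack(self, eci, drops=0):
        """Process the ECI field of an ACK plus any drops detected (d)."""
        d_marks = eci_difference(eci, self.ecc)
        self.ecc = eci  # replace the local value with that from the ACK
        self.pending_re_echoes += d_marks + drops  # D' = D + d
        return d_marks

    def next_re_flag(self):
        """RE flag for the next data packet: 0 re-echoes, 1 otherwise."""
        if self.pending_re_echoes > 0:
            self.pending_re_echoes -= 1
            return 0
        return 1
```

For example, with a local ECC of 6 and an arriving ECI of 1, the field has wrapped and D is 3, so the next three data packets carry RE = 0 and later ones revert to RE = 1.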
| | | | |
| 4.1.1.1. Drops and Marks | | 4.1.1.1. Drops and Marks | |
| | | | |
|
| Re-ECN is based on the ECN protocol [RFC3168] which in turn is | | Re-ECN is based on the ECN protocol [RFC3168]. In turn the | |
| typically based on the RED algorithm [RFC2309]. This algorithm marks | | congestion markings ECN uses are typically based on the RED | |
| packets as CE with a probability that increases as the size of the | | algorithm [RFC2309]. This algorithm marks packets as CE with a | |
| router queue increases. Howeverif the queue becomes too full then it | | probability that increases as the size of the router queue increases. | |
| will revert to dropping packets. Because of this it is important | | However, if the queue becomes too full then it will revert to | |
| that re-ECN treats each packet drop it detects as if it were actually | | dropping packets. Because of this it is important that a re-ECN | |
| a CE mark. This ensures that it can continue to correctly echo | | sender treats each packet drop it detects as if it were actually a CE | |
| congestion even through a highly congested path. | | mark. This ensures that it can continue to correctly echo congestion | |
| | | even through a highly congested path. | |
| | | | |
| In order to ensure that drops are correctly echoed the sender needs | | In order to ensure that drops are correctly echoed the sender needs | |
| to add the number of drops detected per RTT to the difference in ECI | | to add the number of drops detected per RTT to the difference in ECI | |
|
| value waiting to be echoed. A drop is defined as set out in | | value waiting to be echoed. Drop detection is defined as set out in | |
| [RFC2581] -- if the connection is in slow start then a single | | [RFC2581] -- if the connection is in slow start then a single | |
| duplicate acknowledgement will be treated as an indication of a drop. | | duplicate acknowledgement will be treated as an indication of a drop. | |
| When the system is in the congestion avoidance stage then 3 duplicate | | When the system is in the congestion avoidance stage then 3 duplicate | |
| acknowledgements will be treated as a sign of a drop. In all cases, | | acknowledgements will be treated as a sign of a drop. In all cases, | |
| if a re-transmission time-out occurs then that will be treated as a | | if a re-transmission time-out occurs then that will be treated as a | |
| drop. | | drop. | |
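The drop detection rule can be sketched as a single predicate. It follows the thresholds exactly as this section words them and is not a restatement of [RFC2581] itself. The function and parameter names are hypothetical.

```python
def drop_detected(in_slow_start, duplicate_acks, rto_expired):
    """Decide whether the sender should treat the current event as a drop.

    Thresholds follow the wording of this section: one duplicate
    acknowledgement in slow start, three in congestion avoidance, and a
    retransmission time-out in all cases.
    """
    if rto_expired:
        return True
    threshold = 1 if in_slow_start else 3
    return duplicate_acks >= threshold
```

Each drop detected this way would be added to the ECI difference waiting to be re-echoed, just as if it had been a CE mark.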
| | | | |
| 4.1.1.2. Safety against Long Pure ACK Loss Sequences | | 4.1.1.2. Safety against Long Pure ACK Loss Sequences | |
| | | | |
| The ECI method was chosen for echoing congestion marking because a | | The ECI method was chosen for echoing congestion marking because a | |
| | | | |
| skipping to change at page 20, line 4 | | skipping to change at page 20, line 12 | |
| previous ACK but with a sequence number unchanged from the previously | | previous ACK but with a sequence number unchanged from the previously | |
| received ACK, it SHOULD conservatively assume that the ECI field | | received ACK, it SHOULD conservatively assume that the ECI field | |
| incremented by D' = L - ((L-D) mod 8), where D is the apparent | | incremented by D' = L - ((L-D) mod 8), where D is the apparent | |
| increase in the ECI field. For example if the ACK arriving after 9 | | increase in the ECI field. For example if the ACK arriving after 9 | |
| pure ACK losses apparently increased ECI by 2, the assumed increment | | pure ACK losses apparently increased ECI by 2, the assumed increment | |
| of ECI would still be 2. But if ECI apparently increased by 2 after | | of ECI would still be 2. But if ECI apparently increased by 2 after | |
| 11 pure ACK losses, ECI should be assumed to have increased by 10. | | 11 pure ACK losses, ECI should be assumed to have increased by 10. | |
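The conservative estimate D' = L - ((L-D) mod 8) and the two worked examples can be checked with a short sketch. The function name is hypothetical. The guard for L < D is an assumption of this sketch, since the text here does not say what to do in that case.

```python
def assumed_eci_increment(l, d):
    """Conservative increment D' = L - ((L - D) mod 8).

    l: the value L used in the text (9 and 11 in the worked examples)
    d: the apparent increase D in the ECI field, 0..7
    """
    if l < d:
        # Not covered by the formula as quoted; rejected here by assumption.
        raise ValueError("sketch assumes L >= D")
    return l - ((l - d) % 8)
```

With L = 9 and D = 2 the result is 2, because the field cannot have wrapped. With L = 11 and D = 2 there is room for one full wrap, so the sender assumes 10.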
| | | | |
| A re-ECN sender MAY implement a heuristic algorithm to predict beyond | | A re-ECN sender MAY implement a heuristic algorithm to predict beyond | |
| reasonable doubt that the ECI field probably did not wrap within a | | reasonable doubt that the ECI field probably did not wrap within a | |
|
| sequence of lost pure ACKs. But such an algorithm is NOT REQUIRED. | | sequence of lost pure ACKs. But such an algorithm is OPTIONAL. Such | |
| Such an algorithm MUST NOT be used unless it is proven to work even | | an algorithm MUST NOT be used unless it is proven to work even in the | |
| in the presence of correlation between high ACK loss rate on the back | | presence of correlation between high ACK loss rate on the back | |
| channel and high CE marking rate on the forward channel. | | channel and high CE marking rate on the forward channel. | |
| | | | |
| Whatever assumption a re-ECN sender makes about potentially lost CE | | Whatever assumption a re-ECN sender makes about potentially lost CE | |
| marks, both its congestion control and its re-echoing behaviour | | marks, both its congestion control and its re-echoing behaviour | |
| SHOULD be consistent with the assumption it makes. | | SHOULD be consistent with the assumption it makes. | |
| | | | |
|
| 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or Nonce ECT Receiver | | 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN | |
| | | Receiver | |
| | | | |
| If the half-connection is in RECN-Co mode, ECN feedback proceeds no | | If the half-connection is in RECN-Co mode, ECN feedback proceeds no | |
|
| differently to that of vanilla ECN. In other words, the receiver | | differently to that of RFC3168 compliant ECN. In other words, the | |
| sets the ECE flag repeatedly in the TCP header and the sender | | receiver sets the ECE flag repeatedly in the TCP header and the | |
| responds by setting the CWR flag. Although RECN-Co mode is used when | | sender responds by setting the CWR flag. Although RECN-Co mode is | |
| the receiver has not implemented the re-ECN protocol, the sender can | | used when the receiver has not implemented the re-ECN protocol, the | |
| infer enough from its vanilla ECN feedback to set or clear the RE | | sender can infer enough from its RFC3168 compliant ECN feedback to | |
| flag reasonably well. Specifically, every time the receiver toggles | | set or clear the RE flag reasonably well. Specifically, every time | |
| the ECE field from "0" to "1" (or a loss is detected), as well as | | the receiver toggles the ECE field from "0" to "1" (or a loss is | |
| setting CWR in the TCP flags, the re-ECN sender MUST blank the RE | | detected), as well as setting CWR in the TCP flags, the re-ECN sender | |
| flag of the next packet to "0" as it would do in full RECN mode. | | MUST blank the RE flag of the next packet to "0" as it would do in | |
| Otherwise, the data sender SHOULD send all other packets with RE set | | full RECN mode. Otherwise, the data sender SHOULD send all other | |
| to "1". Once a flow is established, a re-ECN data sender in RECN-Co | | packets with RE set to "1". Once a flow is established, a re-ECN | |
| mode MUST always set the ECN field to ECT(1). | | data sender in RECN-Co mode MUST always set the ECN field to ECT(1). | |
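A sketch of the RECN-Co sender rule follows: blank RE on the next packet whenever ECE goes from 0 to 1 or a loss is detected. The class, method and dictionary key names are hypothetical, and the handling of CWR and of congestion control is omitted.

```python
class RecnCoSender:
    """Illustrative sender state for a half-connection in RECN-Co mode."""

    def __init__(self):
        self.last_ece = 0
        self.blank_next = False

    def on_ack(self, ece, loss_detected=False):
        """Note a 0 -> 1 transition of ECE, or a detected loss."""
        if (self.last_ece == 0 and ece == 1) or loss_detected:
            self.blank_next = True
        self.last_ece = ece

    def next_packet_fields(self):
        """Extended ECN settings for the next data packet."""
        re = 0 if self.blank_next else 1
        self.blank_next = False
        return {"ecn": "ECT(1)", "re": re}
```

Repeated ACKs with ECE still set to 1 do not trigger another blanked packet. This is the reason a second CE mark within the same round trip goes unnoticed, as the next paragraph of the draft discusses.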
| | | | |
| If a CE marked packet arrives at the receiver within a round trip | | If a CE marked packet arrives at the receiver within a round trip | |
| time of a previous mark, the receiver will still be echoing ECE for | | time of a previous mark, the receiver will still be echoing ECE for | |
| the last CE mark. Therefore, such a mark will be missed by the | | the last CE mark. Therefore, such a mark will be missed by the | |
| sender. Of course, this isn't of concern for congestion control, but | | sender. Of course, this isn't of concern for congestion control, but | |
| it does mean that very occasionally the RE blanking fraction will be | | it does mean that very occasionally the RE blanking fraction will be | |
| understated. Therefore flows in RECN-Co mode may occasionally be | | understated. Therefore flows in RECN-Co mode may occasionally be | |
| mistaken for very lightly cheating flows and consequently might | | mistaken for very lightly cheating flows and consequently might | |
| suffer a small number of packet drops through an egress dropper | | suffer a small number of packet drops through an egress dropper | |
| (Section 6.1.4). We expect re-ECN would be deployed for some time | | (Section 6.1.4). We expect re-ECN would be deployed for some time | |
| before policers and droppers start to enforce it. So, given there is | | before policers and droppers start to enforce it. So, given there is | |
| not much ECN deployment yet anyway, this minor problem may affect | | not much ECN deployment yet anyway, this minor problem may affect | |
| only a very small proportion of flows, reducing to nothing over the | | only a very small proportion of flows, reducing to nothing over the | |
|
| years as vanilla ECN hosts upgrade. The use of RECN-Co mode would | | years as RFC3168 compliant ECN hosts upgrade. The use of RECN-Co | |
| need to be reviewed in the light of experience at the time of re-ECN | | mode would need to be reviewed in the light of experience at the time | |
| deployment. | | of re-ECN deployment. | |
| | | | |
| RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their | | RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their | |
| code simple, MAY choose not to implement this mode. If they do not, | | code simple, MAY choose not to implement this mode. If they do not, | |
|
| a re-ECN sender SHOULD fall back to vanilla ECT mode in the presence | | a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the | |
| of an ECN-capable receiver. It MAY choose to fall back to the ECT- | | presence of an ECN-capable receiver. It MAY choose to fall back to | |
| Nonce mode, but if re-ECN implementers don't want to be bothered with | | the ECT-Nonce mode, but if re-ECN implementers don't want to be | |
| RECN-Co mode, they probably won't want to add an ECT-Nonce mode | | bothered with RECN-Co mode, they probably won't want to add an ECT- | |
| either. | | Nonce mode either. | |
| | | | |
| 4.1.2.1. Re-ECN support for the ECN Nonce | | 4.1.2.1. Re-ECN support for the ECN Nonce | |
| | | | |
| A TCP half-connection in RECN-Co mode MUST NOT support the ECN | | A TCP half-connection in RECN-Co mode MUST NOT support the ECN | |
| Nonce [RFC3540]. This means that the sending code of a re-ECN | | Nonce [RFC3540]. This means that the sending code of a re-ECN | |
| implementation will never need to include ECN Nonce support. Re-ECN | | implementation will never need to include ECN Nonce support. Re-ECN | |
| is intended to provide wider protection than the ECN nonce against | | is intended to provide wider protection than the ECN nonce against | |
| congestion control misbehaviour, and re-ECN only requires support | | congestion control misbehaviour, and re-ECN only requires support | |
| from the sender, therefore it is preferable to specifically rule out | | from the sender, therefore it is preferable to specifically rule out | |
| the need for dual sender implementations. As a consequence, a re-ECN | | the need for dual sender implementations. As a consequence, a re-ECN | |
| capable sender will never set ECT(0), so it will be easier for | | capable sender will never set ECT(0), so it will be easier for | |
| network elements to discriminate re-ECN traffic flows from other ECN | | network elements to discriminate re-ECN traffic flows from other ECN | |
| traffic, which will always contain some ECT(0) packets. | | traffic, which will always contain some ECT(0) packets. | |
| | | | |
| However, a re-ECN implementation MAY OPTIONALLY include receiving | | However, a re-ECN implementation MAY OPTIONALLY include receiving | |
| code that complies with the ECN Nonce protocol when interacting with | | code that complies with the ECN Nonce protocol when interacting with | |
| a sender that supports the ECN nonce (rather than re-ECN), but this | | a sender that supports the ECN nonce (rather than re-ECN), but this | |
|
| support is NOT REQUIRED. | | support is not required. | |
| | | | |
| RFC3540 allows an ECN nonce sender to choose whether to sanction a | | RFC3540 allows an ECN nonce sender to choose whether to sanction a | |
| receiver that does not ever set the nonce sum. Given re-ECN is | | receiver that does not ever set the nonce sum. Given re-ECN is | |
| intended to provide wider protection than the ECN nonce against | | intended to provide wider protection than the ECN nonce against | |
| congestion control misbehaviour, implementers of re-ECN receivers MAY | | congestion control misbehaviour, implementers of re-ECN receivers MAY | |
| choose not to implement backwards compatibility with the ECN nonce | | choose not to implement backwards compatibility with the ECN nonce | |
| capability. This may be because they deem that the risk of sanctions | | capability. This may be because they deem that the risk of sanctions | |
| is low, perhaps because significant deployment of the ECN nonce seems | | is low, perhaps because significant deployment of the ECN nonce seems | |
| unlikely at implementation time. | | unlikely at implementation time. | |
| | | | |
| 4.1.3. Capability Negotiation | | 4.1.3. Capability Negotiation | |
| | | | |
| During the TCP hand-shake at the start of a connection, an originator | | During the TCP hand-shake at the start of a connection, an originator | |
| of the connection (host A) with a re-ECN-capable transport MUST | | of the connection (host A) with a re-ECN-capable transport MUST | |
|
| indicate it is Re-ECT by setting the TCP options NS=1, CWR=1 and | | indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1 | |
| ECE=1 in the initial SYN. | | in the initial SYN. | |
| | | | |
| A responding Re-ECT host (host B) MUST return a SYN ACK with flags | | A responding Re-ECT host (host B) MUST return a SYN ACK with flags | |
| CWR=1 and ECE=0. The responding host MUST NOT set this combination | | CWR=1 and ECE=0. The responding host MUST NOT set this combination | |
| of flags unless the preceding SYN has already indicated Re-ECT | | of flags unless the preceding SYN has already indicated Re-ECT | |
|
| support as above. A Re-ECT server (B) can use either setting of the | | support as above. Normally a Re-ECT server (B) will reply to a Re- | |
| NS flag combined with this type of SYN ACK in response to a SYN from | | ECT client with NS=0, but if the initial SYN from Re-ECT client A is | |
| a Re-ECT client (A). Normally a Re-ECT server will reply to a Re-ECT | | marked CE(-1), a Re-ECT server B MUST increment its local value of | |
| client with NS=0, but in the special circumstance below it can return | | ECC. But B cannot reflect the value of ECC in the SYN ACK, because | |
| a SYN ACK with NS=1. | | it is still using the 3 bits to negotiate connection capabilities. | |
| | | So, server B MUST set the alternative TCP header flags in its SYN | |
| If the initial SYN from Re-ECT client A is marked CE(-1), a Re-ECT | | ACK: NS=1, CWR=1 and ECE=0. | |
| server B MUST increment its local value of ECC. But B cannot reflect | | | |
| the value of ECC in the SYN ACK, because it is still using the 3 bits | | | |
| to negotiate connection capabilities. So, server B MUST set the | | | |
| alternative TCP header flags in its SYN ACK: NS=1, CWR=1 and ECE=0. | | | |
| | | | |
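The negotiation rules above can be sketched in a few lines. This is a minimal illustration, not part of the specification: the names `EcnFlags`, `RE_ECT_SYN` and `re_ect_syn_ack` are hypothetical, and only the Re-ECT client to Re-ECT server case described in the text is modelled (the other rows of Table 5 are not).

```python
from typing import NamedTuple, Optional


class EcnFlags(NamedTuple):
    """The three TCP header flags used during the handshake (hypothetical type)."""
    ns: int
    cwr: int
    ece: int


# Flags a Re-ECT client (host A) sets on its initial SYN.
RE_ECT_SYN = EcnFlags(ns=1, cwr=1, ece=1)


def re_ect_syn_ack(syn: EcnFlags, syn_marked_ce: bool) -> Optional[EcnFlags]:
    """Return the SYN ACK flags a Re-ECT server (host B) sends to a Re-ECT client.

    Returns None when the SYN did not indicate Re-ECT support; those replies
    are governed by the other rows of Table 5 and are not modelled here.
    """
    if syn != RE_ECT_SYN:
        return None
    # NS=1 is the alternative reply used when the SYN itself arrived marked
    # CE(-1); the server has incremented its local ECC in that case but
    # cannot yet echo it, because the 3 bits are still negotiating capability.
    return EcnFlags(ns=1 if syn_marked_ce else 0, cwr=1, ece=0)
```

In both replies CWR=1 and ECE=0, so a client can recognise a Re-ECT server regardless of the NS setting.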
|
| These handshakes are summarised in Table 5 below, with X meaning | | These handshakes are summarised in Table 5 below, with X indicating | |
| `don't care'. The handshakes used for the other flavours of ECN are | | NS can be either 0 or 1 depending on whether congestion had been | |
| | | experienced. The handshakes used for the other flavours of ECN are | |
| also shown for comparison. To compress the width of the table, the | | also shown for comparison. To compress the width of the table, the | |
| headings of the first four columns have been severely abbreviated, as | | headings of the first four columns have been severely abbreviated, as | |
| follows: | | follows: | |
| | | | |
| R: *R*e-ECT | | R: *R*e-ECT | |
| | | | |
| N: ECT-*N*once (RFC3540) | | N: ECT-*N*once (RFC3540) | |
| | | | |
| E: *E*CT (RFC3168) | | E: *E*CT (RFC3168) | |
| | | | |
| | | | |
| skipping to change at page 22, line 47 | | skipping to change at page 23, line 6 | |
| Responder (B) | | Responder (B) | |
| | | | |
| As soon as a re-ECN capable TCP server receives a SYN, it MUST set | | As soon as a re-ECN capable TCP server receives a SYN, it MUST set | |
| its two half-connections into the modes given in Table 5. As soon as | | its two half-connections into the modes given in Table 5. As soon as | |
| a re-ECN capable TCP client receives a SYN ACK, it MUST set its two | | a re-ECN capable TCP client receives a SYN ACK, it MUST set its two | |
| half-connections into the modes given in Table 5. The half- | | half-connections into the modes given in Table 5. The half- | |
| connections will remain in these modes for the rest of the | | connections will remain in these modes for the rest of the | |
| connection, including for the third segment of TCP's three-way hand- | | connection, including for the third segment of TCP's three-way hand- | |
| shake (the ACK). | | shake (the ACK). | |
| | | | |
|
| {ToDo: Consider SYNs within a connection.} | | {ToDo: Consider RSTs within a connection.} | |
| | | | |
| Recall that, if the SYN ACK reflects the same flag settings as the | | Recall that, if the SYN ACK reflects the same flag settings as the | |
|
| preceding SYN (because there is a broken legacy implementation that | | preceding SYN (because there is a broken RFC3168 compliant | |
| behaves this way), RFC3168 specifies that the whole connection MUST | | implementation that behaves this way), RFC3168 specifies that the | |
| revert to Not-ECT. | | whole connection MUST revert to Not-ECT. | |
| | | | |
| Also note that, whenever the SYN flag of a TCP segment is set | | Also note that, whenever the SYN flag of a TCP segment is set | |
| (including when the ACK flag is also set), the NS, CWR and ECE flags | | (including when the ACK flag is also set), the NS, CWR and ECE flags | |
|
| MUST NOT be interpreted as the 3-bit ECI value, which is only set as | | (i.e. the ECI field of the SYN ACK) MUST NOT be interpreted as the | |
| a copy of the local ECC value in non-SYN packets. | | 3-bit ECI value, which is only set as a copy of the local ECC value | |
| | | in non-SYN packets. | |
| | | | |
| 4.1.4. Extended ECN (EECN) Field Settings during Flow Start or after | | 4.1.4. Extended ECN (EECN) Field Settings during Flow Start or after | |
| Idle Periods | | Idle Periods | |
| | | | |
| If the originator (A) of a TCP connection supports re-ECN it MUST set | | If the originator (A) of a TCP connection supports re-ECN it MUST set | |
| the extended ECN (EECN) field in the IP header of the initial SYN | | the extended ECN (EECN) field in the IP header of the initial SYN | |
| packet to the feedback not established (FNE) codepoint. | | packet to the feedback not established (FNE) codepoint. | |
| | | | |
| FNE is a new extended ECN codepoint defined by this specification | | FNE is a new extended ECN codepoint defined by this specification | |
|
| (Section 3.2). The feedback not established (FNE) codepoint is used | | (Section 3.3). The feedback not established (FNE) codepoint is used | |
| when the transport does not have the benefit of ECN feedback so it | | when the transport does not have the benefit of ECN feedback so it | |
| cannot decide whether to set or clear the RE flag. | | cannot decide whether to set or clear the RE flag. | |
| | | | |
| If after receiving a SYN the server B has set its sending half- | | If after receiving a SYN the server B has set its sending half- | |
| connection into RECN mode or RECN-Co mode, it MUST set the extended | | connection into RECN mode or RECN-Co mode, it MUST set the extended | |
| ECN field in the IP header of its SYN ACK to the feedback not | | ECN field in the IP header of its SYN ACK to the feedback not | |
| established (FNE) codepoint. Note the careful wording here, which | | established (FNE) codepoint. Note the careful wording here, which | |
| means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | | means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | |
| responding to a SYN from a Re-ECT client or from a client that is | | responding to a SYN from a Re-ECT client or from a client that is | |
|
| merely ECN-capable. | | merely ECN-capable. This is because FNE indicates the transport is | |
| | | ECN capable. | |
| | | | |
| The original ECN specification [RFC3168] required SYNs and SYN ACKs | | The original ECN specification [RFC3168] required SYNs and SYN ACKs | |
| to use the Not-ECT codepoint of the ECN field. The aim was to | | to use the Not-ECT codepoint of the ECN field. The aim was to | |
| prevent well-known DoS attacks such as SYN flooding being able to | | prevent well-known DoS attacks such as SYN flooding being able to | |
| gain from the advantage that ECN capability afforded over drop at | | gain from the advantage that ECN capability afforded over drop at | |
| ECN-capable routers. | | ECN-capable routers. | |
| | | | |
| For a SYN ACK, Kuzmanovic [I-D.ietf-tcpm-ecnsyn] has shown that this | | For a SYN ACK, Kuzmanovic [I-D.ietf-tcpm-ecnsyn] has shown that this | |
| caution was unnecessary, and proposes to allow a SYN ACK to be ECN- | | caution was unnecessary, and proposes to allow a SYN ACK to be ECN- | |
|
| capable to improve performance. We have gone further by proposing to | | capable to improve performance. By stipulating the FNE codepoint for | |
| make the initial SYN ECN-capable too. By stipulating the FNE | | the initial SYN, we comply with RFC3168 in word but not in spirit, | |
| codepoint for the initial SYN, we comply with RFC3168 in word but not | | because we have indeed set the ECN field to Not-ECT, but we have | |
| in spirit, because we have indeed set the ECN field to Not-ECT, but | | extended the ECN field with another bit. And it will be seen | |
| we have extended the ECN field with another bit. And it will be seen | | | |
| (Section 5.3) that we have defined one setting of that bit to mean an | | (Section 5.3) that we have defined one setting of that bit to mean an | |
| ECN-capable transport. Therefore, by proposing that the FNE | | ECN-capable transport. Therefore, by proposing that the FNE | |
| codepoint MUST be used on the initial SYN of a connection, we have | | codepoint MUST be used on the initial SYN of a connection, we have | |
|
| (deliberately) made the initial SYN ECN-capable. Section 5.4 | | gone further by proposing to make the initial SYN ECN-capable too. | |
| justifies deciding to make the initial SYN ECN-capable. | | Section 5.4 justifies deciding to make the initial SYN ECN-capable. | |
| | | | |
| Once a TCP half connection is in RECN mode or RECN-Co mode, FNE will | | Once a TCP half connection is in RECN mode or RECN-Co mode, FNE will | |
| have already been set on the initial SYN and possibly the SYN ACK as | | have already been set on the initial SYN and possibly the SYN ACK as | |
| above. But each re-ECN sender will have to set FNE cautiously on a | | above. But each re-ECN sender will have to set FNE cautiously on a | |
| few data packets as well, given a number of packets will usually have | | few data packets as well, given a number of packets will usually have | |
| to be sent before sufficient congestion feedback is received. The | | to be sent before sufficient congestion feedback is received. The | |
| behaviour will be different depending on the mode of the half- | | behaviour will be different depending on the mode of the half- | |
| connection: | | connection: | |
| | | | |
| RECN mode: Given the constraints on TCP's initial window [RFC3390] | | RECN mode: Given the constraints on TCP's initial window [RFC3390] | |
| and its exponential window increase during slow start | | and its exponential window increase during slow start | |
| phase [RFC2581], it turns out that the sender SHOULD set FNE on | | phase [RFC2581], it turns out that the sender SHOULD set FNE on | |
|
| the first and third data packets in its flow, assuming equal sized | | the first and third data packets in its flow after the initial | |
| data packets once a flow is established. Appendix D presents the | | 3-way handshake, assuming equal sized data packets once a flow is | |
| calculation that led to this conclusion. Below, after running | | established. Appendix D presents the calculation that led to this | |
| through the start of an example TCP session, we give the intuition | | conclusion. Below, after running through the start of an example | |
| learned from that calculation. | | TCP session, we give the intuition learned from that calculation. | |
| | | | |
| RECN-Co mode: A re-ECT sender that switches into re-ECN | | RECN-Co mode: A re-ECT sender that switches into re-ECN | |
| compatibility mode or into Not-ECT mode (because it has detected | | compatibility mode or into Not-ECT mode (because it has detected | |
| the corresponding host is not re-ECN capable) MUST limit its | | the corresponding host is not re-ECN capable) MUST limit its | |
| initial window to 1 segment. The reasoning behind this constraint | | initial window to 1 segment. The reasoning behind this constraint | |
| is given in Section 5.4. Having set this initial window, a re-ECN | | is given in Section 5.4. Having set this initial window, a re-ECN | |
| sender in RECN-Co mode SHOULD set FNE on the first and third data | | sender in RECN-Co mode SHOULD set FNE on the first and third data | |
| packets in a flow, as for RECN mode. | | packets in a flow, as for RECN mode. | |
| | | | |
| +----+------+----------------+-------+-------+---------------+------+ | | +----+------+----------------+-------+-------+---------------+------+ | |
| | | | |
| skipping to change at page 25, line 21 | | skipping to change at page 25, line 49 | |
| (EECN) field. | | (EECN) field. | |
| | | | |
| Also shown on the receiving side of the table is the value of the | | Also shown on the receiving side of the table is the value of the | |
| receiver's echo congestion counter (R.ECC) after processing the | | receiver's echo congestion counter (R.ECC) after processing the | |
| incoming EECN header. Note that, once a host sets a half-connection | | incoming EECN header. Note that, once a host sets a half-connection | |
| into RECN mode, it MUST initialise its local value of ECC to zero. | | into RECN mode, it MUST initialise its local value of ECC to zero. | |
| | | | |
| The intuition that Appendix D gives for why a sender should set FNE | | The intuition that Appendix D gives for why a sender should set FNE | |
| on the first and third data packets is as follows. At line 13, a | | on the first and third data packets is as follows. At line 13, a | |
| packet sent by B is shown with an '*', which means it has been | | packet sent by B is shown with an '*', which means it has been | |
|
| congestion marked by an intermediate router from RECT to CE(-1). On | | congestion marked by an intermediate queue from RECT to CE(-1). On | |
| receiving this CE marked packet, client A increments its ECC counter | | receiving this CE marked packet, client A increments its ECC counter | |
| to 1 as shown. This was the 7th data packet B sent, but before | | to 1 as shown. This was the 7th data packet B sent, but before | |
| feedback about this event returns to B, it might well have sent many | | feedback about this event returns to B, it might well have sent many | |
| more packets. Indeed, during exponential slow start, about as many | | more packets. Indeed, during exponential slow start, about as many | |
| packets will be in flight (unacknowledged) as have been acknowledged. | | packets will be in flight (unacknowledged) as have been acknowledged. | |
| So, when the feedback from the congestion event on B's 7th segment | | So, when the feedback from the congestion event on B's 7th segment | |
| returns, B will have sent about 7 further packets that will still be | | returns, B will have sent about 7 further packets that will still be | |
| in flight. At that stage, B's best estimate of the network's packet | | in flight. At that stage, B's best estimate of the network's packet | |
| marking fraction will be 1/7. So, as B will have sent about 14 | | marking fraction will be 1/7. So, as B will have sent about 14 | |
| packets, it should have already marked 2 of them as FNE in order to | | packets, it should have already marked 2 of them as FNE in order to | |
| | | | |
| skipping to change at page 26, line 19 | | skipping to change at page 26, line 46 | |
| that the design of network policers can be deterministic, this | | that the design of network policers can be deterministic, this | |
| specification deliberately puts an absolute lower limit on how long a | | specification deliberately puts an absolute lower limit on how long a | |
| connection can be idle before the packet that resumes the connection | | connection can be idle before the packet that resumes the connection | |
| must be set to FNE, rather than relating it to the connection round | | must be set to FNE, rather than relating it to the connection round | |
| trip time. We use the lower bound of the retransmission timeout | | trip time. We use the lower bound of the retransmission timeout | |
| (RTO) [RFC2988], which is commonly used as the idle period before TCP | | (RTO) [RFC2988], which is commonly used as the idle period before TCP | |
| must reduce to the restart window [RFC2581]. Note our specification | | must reduce to the restart window [RFC2581]. Note our specification | |
| of re-ECN's idle period is NOT intended to change the idle period for | | of re-ECN's idle period is NOT intended to change the idle period for | |
| TCP's restart, nor indeed for any other purposes. | | TCP's restart, nor indeed for any other purposes. | |
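As a rough sketch of the idle rule, the check below compares the idle time against a fixed threshold. It assumes the threshold is the 1 second minimum RTO of RFC 2988 and that an idle period exactly equal to the threshold already counts; the names `MIN_RTO_SECONDS` and `resume_packet_needs_fne` are hypothetical.

```python
# Assumption: the "lower bound of the RTO" is the 1 second minimum of RFC 2988.
MIN_RTO_SECONDS = 1.0


def resume_packet_needs_fne(idle_seconds: float) -> bool:
    """Return True if the packet that resumes the connection should carry FNE.

    The threshold is an absolute constant, deliberately independent of the
    connection's round trip time, so that a policer can apply the same test.
    Treating idle == threshold as idle is an assumption of this sketch.
    """
    return idle_seconds >= MIN_RTO_SECONDS
```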
| | | | |
|
| {ToDo: Describe how the sender falls back to legacy modes if packets | | {ToDo: Describe how the sender falls back to RFC3168 modes if packets | |
| don't appear to be getting through (to work round firewalls | | don't appear to be getting through (to work round firewalls | |
| discarding packets they consider unusual).} | | discarding packets they consider unusual).} | |
| | | | |
| 4.1.5. Pure ACKs, Retransmissions, Window Probes and Partial ACKs | | 4.1.5. Pure ACKs, Retransmissions, Window Probes and Partial ACKs | |
| | | | |
| A re-ECN sender MUST clear the RE flag to "0" and set the ECN field | | A re-ECN sender MUST clear the RE flag to "0" and set the ECN field | |
| to Not-ECT in pure ACKs, retransmissions and window probes, as | | to Not-ECT in pure ACKs, retransmissions and window probes, as | |
| specified in [RFC3168]. Our eventual goal is for all packets to be | | specified in [RFC3168]. Our eventual goal is for all packets to be | |
| sent with re-ECN enabled, and we believe the semantics of the ECI | | sent with re-ECN enabled, and we believe the semantics of the ECI | |
| field go a long way towards being able to achieve this. However, we | | field go a long way towards being able to achieve this. However, we | |
| | | | |
| skipping to change at page 26, line 46 | | skipping to change at page 27, line 28 | |
| general principle we work to is to remain compatible with TCP's | | general principle we work to is to remain compatible with TCP's | |
| congestion control which is driven by congestion events at packet | | congestion control which is driven by congestion events at packet | |
| granularity while at the same time aiming to blank the RE flag on at | | granularity while at the same time aiming to blank the RE flag on at | |
| least as many octets in a flow as have been marked CE. | | least as many octets in a flow as have been marked CE. | |
| | | | |
| Therefore, a re-ECN TCP receiver MUST increment its ECC value as many | | Therefore, a re-ECN TCP receiver MUST increment its ECC value as many | |
| times as CE marked packets have been received. And that value MUST | | times as CE marked packets have been received. And that value MUST | |
| be echoed to the sender in the first available ACK using the ECI | | be echoed to the sender in the first available ACK using the ECI | |
| field. This ensures the TCP sender's congestion control receives | | field. This ensures the TCP sender's congestion control receives | |
| timely feedback on congestion events at the same packet granularity | | timely feedback on congestion events at the same packet granularity | |
|
| that they were generated on congested routers. | | that they were generated on congested queues. | |
| | | | |
| Then, a re-ECN sender stores the difference D between its own ECC | | Then, a re-ECN sender stores the difference D between its own ECC | |
| value and the incoming ECI field by incrementing a counter R. Then, R | | value and the incoming ECI field by incrementing a counter R. Then, R | |
| is decremented by 1 for each subsequent packet that is sent with the RE | | is decremented by 1 for each subsequent packet that is sent with the RE | |
| flag blanked, until R is no longer positive. Using this technique, | | flag blanked, until R is no longer positive. Using this technique, | |
|
| whenever a re-ECN transport sends a not re-ECN capable (NRECN) packet | | whenever a re-ECN transport sends a not re-ECN capable packet (e.g. a | |
| (e.g. a retransmission), the remaining packets required to have the | | retransmission), the remaining packets required to have the RE flag | |
| RE flag blanked will be automatically carried over to subsequent | | blanked will be automatically carried over to subsequent packets, | |
| packets, through the variable R. | | through the variable R. | |
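The bookkeeping described above might look like the following sketch. It is illustrative only: the class and method names are hypothetical, and computing the difference D modulo 8 is an assumption based on ECI being a 3-bit copy of the receiver's ECC.

```python
class ReEchoCounter:
    """Sender-side state for deciding when to blank the RE flag (hypothetical)."""

    def __init__(self) -> None:
        self.ecc = 0  # sender's record of the last echoed counter value (3 bits)
        self.r = 0    # packets that still need to be sent with RE blanked

    def on_ack(self, eci: int) -> int:
        """Process the ECI field of an incoming ACK and return the difference D."""
        eci &= 0x7
        # Assumption: the 3-bit counter wraps, so D is taken modulo 8.
        d = (eci - self.ecc) % 8
        self.ecc = eci
        self.r += d
        return d

    def next_packet_blanks_re(self, re_ecn_capable: bool = True) -> bool:
        """Decide whether the next packet is sent with the RE flag blanked.

        A packet that is not re-ECN capable (e.g. a retransmission) cannot
        carry the signal, so R is left untouched and carries over.
        """
        if re_ecn_capable and self.r > 0:
            self.r -= 1
            return True
        return False
```

Because R is only decremented when a re-ECN capable packet is actually sent, any outstanding debt is carried over automatically across retransmissions and pure ACKs.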
| | | | |
| This does not ensure precisely the same number of octets have RE | | This does not ensure precisely the same number of octets have RE | |
| blanked as were CE marked. But we believe positive errors will | | blanked as were CE marked. But we believe positive errors will | |
| cancel negative over a long enough period. {ToDo: However, more | | cancel negative over a long enough period. {ToDo: However, more | |
| research is needed to prove whether this is so. If it is not, it may | | research is needed to prove whether this is so. If it is not, it may | |
| be necessary to increment and decrement R in octets rather than | | be necessary to increment and decrement R in octets rather than | |
| packets, by incrementing R as the product of D and the size in octets | | packets, by incrementing R as the product of D and the size in octets | |
| of packets being sent (typically the MSS).} | | of packets being sent (typically the MSS).} | |
| | | | |
| 4.2. Other Transports | | 4.2. Other Transports | |
| | | | |
| 4.2.1. General Guidelines for Adding Re-ECN to Other Transports | | 4.2.1. General Guidelines for Adding Re-ECN to Other Transports | |
| | | | |
|
| Re-ECT sender transports that have established the receiver transport | | As a general rule, Re-ECT sender transports that have established the | |
| is at least ECN-capable (not necessarily re-ECN capable) MUST blank | | receiver transport is at least ECN-capable (not necessarily re-ECN | |
| the RE codepoint in packets carrying at least as many octets as | | capable) MUST blank the RE codepoint for at least as many octets as | |
| arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | | arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | |
| transports should always initialise the ECN field to the ECT(1) | | transports should always initialise the ECN field to the ECT(1) | |
| codepoint once a flow is established. | | codepoint once a flow is established. | |
| | | | |
| If the sender transport does not have sufficient feedback to even | | If the sender transport does not have sufficient feedback to even | |
| estimate the path's CE rate, it SHOULD set FNE continuously. If the | | estimate the path's CE rate, it SHOULD set FNE continuously. If the | |
| sender transport has some, perhaps stale, feedback to estimate that | | sender transport has some, perhaps stale, feedback to estimate that | |
| the path's CE rate is nearly definitely less than E%, the transport | | the path's CE rate is nearly definitely less than E%, the transport | |
| MAY blank RE in packets for E% of sent octets, and set the RECT | | MAY blank RE in packets for E% of sent octets, and set the RECT | |
| codepoint for the remainder. | | codepoint for the remainder. | |
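One way a transport might apply these guidelines is sketched below. The class name, the string labels and the octet-tracking scheme are all hypothetical; the text only requires that RE is blanked on about E% of sent octets when an estimate exists, and that FNE is used when it does not.

```python
from typing import Optional


class CodepointChooser:
    """Pick an extended ECN marking per packet from a CE rate estimate (sketch)."""

    def __init__(self) -> None:
        self.sent_octets = 0     # octets sent while an estimate was available
        self.blanked_octets = 0  # of those, octets sent with RE blanked

    def choose(self, size: int, ce_rate_percent: Optional[float]) -> str:
        """Return "FNE", "RE-blanked" or "RECT" for a packet of `size` octets."""
        if ce_rate_percent is None:
            # No feedback at all: not even an estimate of the path's CE rate.
            return "FNE"
        self.sent_octets += size
        target = self.sent_octets * ce_rate_percent / 100.0
        if self.blanked_octets < target:
            self.blanked_octets += size
            return "RE-blanked"
        return "RECT"
```

With equal sized packets and E = 10, this blanks RE on every tenth packet and sets RECT on the rest.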
| | | | |
| skipping to change at page 28, line 25 | | skipping to change at page 29, line 7 | |
| | | | |
| 4.2.3. Guidelines for adding Re-ECN to DCCP | | 4.2.3. Guidelines for adding Re-ECN to DCCP | |
| | | | |
| Besides adjusting the initial features negotiation sequence, operating | | Besides adjusting the initial features negotiation sequence, operating | |
| re-ECN in DCCP [RFC4340] could be achieved by defining a new option | | re-ECN in DCCP [RFC4340] could be achieved by defining a new option | |
| to be added to acknowledgments, that would include a multibit field | | to be added to acknowledgments, that would include a multibit field | |
| where the destination could copy its ECC. | | where the destination could copy its ECC. | |
| | | | |
| 4.2.4. Guidelines for adding Re-ECN to SCTP | | 4.2.4. Guidelines for adding Re-ECN to SCTP | |
| | | | |
|
| Annex 1 in [RFC2960] gives the specifications for SCTP to support | | Appendix A in [RFC4960] gives the specifications for SCTP to support | |
| ECN. Similar steps should be taken to support re-ECN. Besides | | ECN. Similar steps should be taken to support re-ECN. Besides | |
| adjusting the initial features negotiation sequence, operating re-ECN | | adjusting the initial features negotiation sequence, operating re-ECN | |
| in SCTP could be achieved by defining a new control chunk, that would | | in SCTP could be achieved by defining a new control chunk, that would | |
| include a multibit field where the destination could copy its ECC. | | include a multibit field where the destination could copy its ECC. | |
| | | | |
| 5. Network Layer | | 5. Network Layer | |
| | | | |
| 5.1. Re-ECN IPv4 Wire Protocol | | 5.1. Re-ECN IPv4 Wire Protocol | |
| | | | |
| The wire protocol of the ECN field in the IP header remains largely | | The wire protocol of the ECN field in the IP header remains largely | |
| unchanged from [RFC3168]. However, an extension to the ECN field we | | unchanged from [RFC3168]. However, an extension to the ECN field we | |
|
| call the RE (re-ECN extension) flag (Section 3.2) is defined in this | | call the RE (Re-ECN extension) flag (Section 3.3) is defined in this | |
| document. It doubles the extended ECN codepoint space, giving 8 | | document. It doubles the extended ECN codepoint space, giving 8 | |
| potential codepoints. The semantics of the extra codepoints are | | potential codepoints. The semantics of the extra codepoints are | |
| backward compatible with the semantics of the 4 original codepoints | | backward compatible with the semantics of the 4 original codepoints | |
| [RFC3168] (Section 7.1 collects together and summarises all the | | [RFC3168] (Section 7.1 collects together and summarises all the | |
| changes defined in this document). | | changes defined in this document). | |
| | | | |
| For IPv4, this document proposes that the new RE control flag will be | | For IPv4, this document proposes that the new RE control flag will be | |
| positioned where the `reserved' control flag was at bit 48 of the | | positioned where the `reserved' control flag was at bit 48 of the | |
| IPv4 header (counting from 0). Alternatively, some would call this | | IPv4 header (counting from 0). Alternatively, some would call this | |
| bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4 | | bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4 | |
| | | | |
| skipping to change at page 30, line 21 | | skipping to change at page 30, line 50 | |
| 0 1 2 3 | | 0 1 2 3 | |
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
| | Next Header | Hdr ext Len | Option Type | Opt Length =4 | | | | Next Header | Hdr ext Len | Option Type | Opt Length =4 | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
| |R| Reserved for future use | | | |R| Reserved for future use | | |
| |E| | | | |E| | | |
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |
| | | | |
| Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option | | Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option | |
|
| Header containing the Re-ECN Extension (RE) Control Flag | | Header containing the re-ECN Extension (RE) Control Flag | |
| | | | |
| 0 1 2 3 4 5 6 7 8 | | 0 1 2 3 4 5 6 7 8 | |
| +-+-+-+-+-+-+-+-+- | | +-+-+-+-+-+-+-+-+- | |
| |AIU|C|Option ID| | | |AIU|C|Option ID| | |
| +-+-+-+-+-+-+-+-+- | | +-+-+-+-+-+-+-+-+- | |
| | | | |
| Figure 7: Congestion Hop by Hop Option Type Encoding | | Figure 7: Congestion Hop by Hop Option Type Encoding | |
| | | | |
| The Hop-by-Hop Options header enables packets to carry information to | | The Hop-by-Hop Options header enables packets to carry information to | |
| be examined and processed by routers or nodes along the packet's | | be examined and processed by routers or nodes along the packet's | |
| delivery path, including the source and destination nodes. For re- | | delivery path, including the source and destination nodes. For re- | |
| | | | |
| skipping to change at page 30, line 44 | | skipping to change at page 31, line 25 | |
| Congestion extension header MUST be set to "00" meaning if | | Congestion extension header MUST be set to "00" meaning if | |
| unrecognized `skip over option and continue processing the header'. | | unrecognized `skip over option and continue processing the header'. | |
| Then, any routers or a receiver not upgraded with the optional re-ECN | | Then, any routers or a receiver not upgraded with the optional re-ECN | |
| features described in this memo will simply ignore this header. But | | features described in this memo will simply ignore this header. But | |
| routers with these optional re-ECN features or a re-ECN policing | | routers with these optional re-ECN features or a re-ECN policing | |
| function will process this Congestion extension header. | | function will process this Congestion extension header. | |
| | | | |
| The `C' flag MUST be set to "1" to specify that the Option Data | | The `C' flag MUST be set to "1" to specify that the Option Data | |
| (currently only the RE control flag) can change en-route to the | | (currently only the RE control flag) can change en-route to the | |
| packet's final destination. This ensures that, when an | | packet's final destination. This ensures that, when an | |
|
| Authentication header (AH [RFC2402]) is present in the packet, for | | Authentication header (AH [RFC4302]) is present in the packet, for | |
| any option whose data may change en-route, its entire Option Data | | any option whose data may change en-route, its entire Option Data | |
| field will be treated as zero-valued octets when computing or | | field will be treated as zero-valued octets when computing or | |
| verifying the packet's authenticating value. | | verifying the packet's authenticating value. | |
| | | | |
| Although the RE control flag should not be changed along the path, we | | Although the RE control flag should not be changed along the path, we | |
| expect that the rest of this option field that is currently `Reserved | | expect that the rest of this option field that is currently `Reserved | |
| for future use' could be used for a multi-bit congestion notification | | for future use' could be used for a multi-bit congestion notification | |
| field which we would expect to change en route. As the RE flag does | | field which we would expect to change en route. As the RE flag does | |
| not need end-to-end authentication, we set the C flag to '1'. | | not need end-to-end authentication, we set the C flag to '1'. | |
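The encoding in Figures 6 and 7 can be illustrated as follows. This is a sketch under stated assumptions: the function names are hypothetical, the 5-bit Option ID is left as a parameter because no value is assigned in the text above, and the header is assumed to carry this single option so that Hdr Ext Len is 0 (one 8-octet unit).

```python
import struct

ACT_SKIP = 0b00      # AIU bits: skip over the option if unrecognized
CHG_MAY_CHANGE = 1   # C flag: Option Data can change en-route


def congestion_option_type(option_id: int) -> int:
    """Assemble the Option Type octet: 2 AIU bits, the C flag, 5-bit Option ID."""
    if not 0 <= option_id <= 0x1F:
        raise ValueError("Option ID must fit in 5 bits")
    return (ACT_SKIP << 6) | (CHG_MAY_CHANGE << 5) | option_id


def build_congestion_header(next_header: int, re_flag: bool, option_id: int) -> bytes:
    """Build the 8-octet Hop-by-Hop header of Figure 6 (single option assumed)."""
    # RE is the first bit of the 4-octet Option Data; the remaining 31 bits
    # are reserved for future use and sent as zero.
    option_data = (1 << 31) if re_flag else 0
    return struct.pack(
        "!BBBBI",
        next_header,
        0,  # Hdr Ext Len: no 8-octet units beyond the first
        congestion_option_type(option_id),
        4,  # Opt Length = 4 octets of Option Data
        option_data,
    )
```

For example, `build_congestion_header(6, True, option_id=0x1F)` yields a header whose third octet is 0x3F and whose Option Data starts with 0x80; the Option ID used here is a placeholder, not an assigned value.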
| | | | |
| | | | |
| skipping to change at page 31, line 19 | | skipping to change at page 31, line 48 | |
| | | | |
| 5.3. Router Forwarding Behaviour | | 5.3. Router Forwarding Behaviour | |
| | | | |
| Re-ECN works well without modifying the forwarding behaviour of any | | Re-ECN works well without modifying the forwarding behaviour of any | |
| routers. However, below, two OPTIONAL changes to forwarding | | routers. However, below, two OPTIONAL changes to forwarding | |
| behaviour are defined which respectively enhance performance and | | behaviour are defined which respectively enhance performance and | |
| improve a router's discrimination against flooding attacks. They are | | improve a router's discrimination against flooding attacks. They are | |
| both OPTIONAL additions that we propose MAY apply by default to all | | both OPTIONAL additions that we propose MAY apply by default to all | |
| Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | | Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | |
| marking behaviours [RFC3168]. Specifications for PHBs MAY define | | marking behaviours [RFC3168]. Specifications for PHBs MAY define | |
|
| different forwarding behaviours from this default, but this is NOT | | different forwarding behaviours from this default, but this is not | |
| REQUIRED. [Re-PCN] is one example. | | required. [Re-PCN] is one example. | |
| | | | |
| FNE indicates ECT: | | FNE indicates ECT: | |
| | | | |
| The FNE codepoint tells a router to assume that the packet was | | The FNE codepoint tells a router to assume that the packet was | |
| sent by an ECN-capable transport (see Section 5.4). Therefore an | | sent by an ECN-capable transport (see Section 5.4). Therefore an | |
| FNE packet MAY be marked rather than dropped. Note that the FNE | | FNE packet MAY be marked rather than dropped. Note that the FNE | |
|
| codepoint has been intentionally chosen so that, to legacy routers | | codepoint has been intentionally chosen so that, to RFC3168 | |
| (which do not inspect the RE flag) an FNE packet appears to be | | compliant routers (which do not inspect the RE flag) an FNE packet | |
| Not-ECT so it will be dropped by legacy AQM algorithms. | | appears to be Not-ECT so it will be dropped by legacy AQM | |
| | | algorithms. | |
| | | | |
|
| A network operator MUST NOT configure a router to ECN mark rather | | A network operator MUST NOT configure a queue to ECN mark rather | |
| than drop FNE packets unless it can guarantee that FNE packets | | than drop FNE packets unless it can guarantee that FNE packets | |
| will be rate limited, either locally or upstream. The ingress | | will be rate limited, either locally or upstream. The ingress | |
| policers discussed in Section 6.1.5 would count as rate limiters | | policers discussed in Section 6.1.5 would count as rate limiters | |
| for this purpose. | | for this purpose. | |
| | | | |
|
| Preferential Drop: If a re-ECN capable router experiences very high | | Preferential Drop: If a re-ECN capable router queue experiences very | |
| load so that it has to drop arriving packets (e.g. a DoS attack), | | high load so that it has to drop arriving packets (e.g. a DoS | |
| it MAY preferentially drop packets within the same Diffserv PHB | | attack), it MAY preferentially drop packets within the same | |
| using the preference order for extended ECN codepoints given in | | Diffserv PHB using the preference order for extended ECN | |
| Table 7. Preferential dropping can be difficult to implement on | | codepoints given in Table 7. Preferential dropping can be | |
| some hardware, but if feasible it would discriminate against | | difficult to implement on some hardware, but if feasible it would | |
| attack traffic if done as part of the overall policing framework | | discriminate against attack traffic if done as part of the overall | |
| of Section 6.1.3. If nowhere else, routers at the egress of a | | policing framework of Section 6.1.3. If nowhere else, routers at | |
| network SHOULD implement preferential drop (stronger than the MAY | | the egress of a network SHOULD implement preferential drop | |
| above). For simplicity, preferences 4 & 5 MAY be merged into one | | (stronger than the MAY above). For simplicity, preferences 4 & 5 | |
| preference level. | | MAY be merged into one preference level. | |
| | | | |
| +-------+-----+------------+-------+------------+-------------------+ | | +-------+-----+------------+-------+------------+-------------------+ | |
| | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | | | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | |
| | field | bit | ECN | | (1 = drop | | | | | field | bit | ECN | | (1 = drop | | | |
| | | | codepoint | | 1st) | | | | | | | codepoint | | 1st) | | | |
| +-------+-----+------------+-------+------------+-------------------+ | | +-------+-----+------------+-------+------------+-------------------+ | |
| | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | | | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | |
| | | | | | | congestion and | | | | | | | | | congestion and | | |
| | | | | | | RECT | | | | | | | | | RECT | | |
| | 00 | 1 | FNE | +1 | 4 | Feedback not | | | | 00 | 1 | FNE | +1 | 4 | Feedback not | | |
| | | | | | | established | | | | | | | | | established | | |
| | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | | | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | |
| | | | | | | by congestion | | | | | | | | | by congestion | | |
| | | | | | | experienced | | | | | | | | | experienced | | |
| | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | | | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | |
| | | | | | | transport | | | | | | | | | transport | | |
| | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | | | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | |
| | | | | | | experienced | | | | | | | | | experienced | | |
| | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | | | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | |
|
| | 10 | 0 | --- | n/a | 2 | Legacy ECN use | | | | 10 | 0 | --- | n/a | 2 | RFC3168 ECN use | | |
| | | | | | | only | | | | | | | | | only | | |
| | 00 | 0 | Not-RECT | n/a | 1 | Not | | | | 00 | 0 | Not-RECT | n/a | 1 | Not | | |
|
| | | | | | | re-ECN-capable | | | | | | | | | Re-ECN-capable | | |
| | | | | | | transport | | | | | | | | | transport | | |
| +-------+-----+------------+-------+------------+-------------------+ | | +-------+-----+------------+-------+------------+-------------------+ | |
| | | | |
| Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | | Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | |
| | | | |
| The above drop preferences are arranged to preserve packets with | | The above drop preferences are arranged to preserve packets with | |
|
| more positive worth (Section 3.4), given senders of positive | | more positive worth (Section 3.5), given senders of positive | |
| packets must have honestly declared downstream congestion. This | | packets must have honestly declared downstream congestion. This | |
| is explained fully in Section 6 on applications, particularly when | | is explained fully in Section 6 on applications, particularly when | |
| the application of re-ECN to protect against DDoS attacks is | | the application of re-ECN to protect against DDoS attacks is | |
| described. | | described. | |
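
As an illustrative sketch only (it is not part of either draft revision), the lookup implied by Table 7 could be written in Python as follows. The names `Eecn`, `TABLE_7` and `pick_victim` are hypothetical. The `5/4` entry for Re-Echo is encoded as 5, and breaking ties by queue position is an assumption; the draft does not specify how a queue chooses among packets with equal drop preference.

```python
from typing import List, NamedTuple, Optional, Tuple


class Eecn(NamedTuple):
    name: str
    worth: Optional[int]  # None where Table 7 says "n/a"
    drop_pref: int        # 1 = drop first


# Keyed by (ECN field, RE bit), transcribed from Table 7.
TABLE_7 = {
    (0b01, 0): Eecn("Re-Echo", +1, 5),   # table lists 5/4; 4 and 5 MAY be merged
    (0b00, 1): Eecn("FNE", +1, 4),
    (0b11, 0): Eecn("CE(0)", 0, 3),
    (0b01, 1): Eecn("RECT", 0, 3),
    (0b11, 1): Eecn("CE(-1)", -1, 3),
    (0b10, 1): Eecn("--CU--", None, 2),
    (0b10, 0): Eecn("---", None, 2),
    (0b00, 0): Eecn("Not-RECT", None, 1),
}


def pick_victim(queue: List[Tuple[int, int]]) -> int:
    """Return the index of the packet to drop first.

    `queue` is a non-empty list of (ECN field, RE bit) pairs for packets
    in the same Diffserv PHB. The packet with the lowest drop preference
    number is chosen; ties go to the earliest position (an assumption).
    """
    return min(range(len(queue)), key=lambda i: TABLE_7[queue[i]].drop_pref)
```

For example, given a queue holding a Re-Echo, a Not-RECT and a CE(-1) packet, `pick_victim` selects the Not-RECT packet, which matches the intent of preserving packets with more positive worth.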
| | | | |
| 5.4. Justification for Setting the First SYN to FNE | | 5.4. Justification for Setting the First SYN to FNE | |
| | | | |
|
| Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and | | the initial SYN MUST be set to FNE by Re-ECT client A (Section 4.1.4) | |
| the initial SYN MUST be set to FNE by Re-ECT client A | | and (Section 5.3) says a queue MAY optionally treat an FNE packet as | |
| (Section 4.1.4). So an initial SYN may be marked CE(-1) rather than | | ECN capable, so an initial SYN may be marked CE(-1) rather than | |
| dropped. This seems dangerous, because the sender has not yet | | dropped. This seems dangerous, because the sender has not yet | |
|
| established whether the receiver is a legacy one that does not | | established whether the receiver is an RFC3168 one that does not | |
| understand congestion marking. It also seems to allow malicious | | understand congestion marking. It also seems to allow malicious | |
| senders to take advantage of ECN marking to avoid so much drop when | | senders to take advantage of ECN marking to avoid so much drop when | |
| launching SYN flooding attacks. Below we explain the features of the | | launching SYN flooding attacks. Below we explain the features of the | |
| protocol design that remove both these dangers. | | protocol design that remove both these dangers. | |
| | | | |
| ECN-capable initial SYN with a Not-ECT server: If the TCP server B | | ECN-capable initial SYN with a Not-ECT server: If the TCP server B | |
| is re-ECN capable, provision is made for it to feed back a possible | | is re-ECN capable, provision is made for it to feed back a possible | |
| congestion marked SYN in the SYN ACK (Section 4.1.4). But if the | | congestion marked SYN in the SYN ACK (Section 4.1.4). But if the | |
| TCP client A finds out from the SYN ACK that the server was not | | TCP client A finds out from the SYN ACK that the server was not | |
|
| ECN-capable, the TCP client MUST consider the first SYN as | | ECN-capable, the TCP client MUST conservatively consider the first | |
| congestion marked before setting itself into Not-ECT mode. | | SYN as congestion marked before setting itself into Not-ECT mode. | |
| Section 4.1.4 mandates that such a TCP client MUST also set its | | Section 4.1.4 mandates that such a TCP client MUST also set its | |
| initial window to 1 segment. In this way we remove the need to | | initial window to 1 segment. In this way we remove the need to | |
| cautiously avoid setting the first SYN to Not-RECT. This will | | cautiously avoid setting the first SYN to Not-RECT. This will | |
| give worse performance while deployment is patchy, but better | | give worse performance while deployment is patchy, but better | |
| performance once deployment is widespread. | | performance once deployment is widespread. | |
| | | | |
| SYN flooding attacks can't exploit ECN-capability: Malicious hosts | | SYN flooding attacks can't exploit ECN-capability: Malicious hosts | |
| may think they can use the advantage that ECN-marking gives over | | may think they can use the advantage that ECN-marking gives over | |
| drop in launching classic SYN-flood attacks. But Section 5.3 | | drop in launching classic SYN-flood attacks. But Section 5.3 | |
| mandates that a router MUST only be configured to treat packets | | mandates that a router MUST only be configured to treat packets | |
| with the FNE codepoint as ECN-capable if FNE packets are rate | | with the FNE codepoint as ECN-capable if FNE packets are rate | |
|
| limited. Introduction of the FNE codepoint was a deliberate move | | limited somewhere. Introduction of the FNE codepoint was a | |
| to enable transport-neutral handling of flow-start and flow state | | deliberate move to enable transport-neutral handling of flow-start | |
| set-up in the IP layer where it belongs. It then becomes possible | | and flow state set-up in the IP layer where it belongs. It then | |
| to protect against flooding attacks of all forms (not just SYN | | becomes possible to protect against flooding attacks of all forms | |
| flooding) without transport-specific inspection for things like | | (not just SYN flooding) without transport-specific inspection for | |
| the SYN flag in TCP headers. Then, for instance, SYN flooding | | things like the SYN flag in TCP headers. Then, for instance, SYN | |
| attacks using IPSec ESP encryption can also be rate limited at the | | flooding attacks using IPSec ESP encryption can also be rate | |
| IP layer. | | limited at the IP layer. | |
| | | | |
| It might seem pedantic going to all this trouble to enable ECN on the | | It might seem pedantic going to all this trouble to enable ECN on the | |
| initial packet of a flow, but it is motivated by a much wider concern | | initial packet of a flow, but it is motivated by a much wider concern | |
| to ensure safe congestion control will still be possible even if the | | to ensure safe congestion control will still be possible even if the | |
| application mix evolves to the point where the majority of flows | | application mix evolves to the point where the majority of flows | |
| consist of a single window or even a single packet. It also allows | | consist of a single window or even a single packet. It also allows | |
| denial of service attacks to be more easily isolated and prevented. | | denial of service attacks to be more easily isolated and prevented. | |
| | | | |
| 5.5. Control and Management | | 5.5. Control and Management | |
| | | | |
| | | | |
| skipping to change at page 35, line 15 | | skipping to change at page 36, line 15 | |
| flag should be the same as the inner. If it isn't, a management alarm | | flag should be the same as the inner. If it isn't, a management alarm | |
| should be raised. This behaviour is the same as the full- | | should be raised. This behaviour is the same as the full- | |
| functionality variant of [RFC3168] at tunnel exit, but different at | | functionality variant of [RFC3168] at tunnel exit, but different at | |
| tunnel entry. | | tunnel entry. | |
| | | | |
| If tunnels are left as they are specified in [RFC3168], whether the | | If tunnels are left as they are specified in [RFC3168], whether the | |
| limited or full-functionality variants are used, a problem arises | | limited or full-functionality variants are used, a problem arises | |
| with re-ECN if a tunnel crosses an inter-domain boundary, because the | | with re-ECN if a tunnel crosses an inter-domain boundary, because the | |
| difference between positive and negative markings will not be | | difference between positive and negative markings will not be | |
| correctly accounted for. In a limited functionality ECN tunnel, the | | correctly accounted for. In a limited functionality ECN tunnel, the | |
|
| flow will appear to be legacy traffic, and therefore may be wrongly | | flow will appear to be RFC3168 compliant traffic, and therefore may | |
| rate limited. In a full-functionality ECN tunnel, the result will | | be wrongly rate limited. In a full-functionality ECN tunnel, the | |
| depend on whether the tunnel entry copies the inner RE flag to the outer | | result will depend on whether the tunnel entry copies the inner RE flag | |
| header or the RE flag in the outer header is always cleared. If the | | to the outer header or the RE flag in the outer header is always | |
| former, the flow will tend to be too positive when accounted for at | | cleared. If the former, the flow will tend to be too positive when | |
| borders. If the latter, it will be too negative. If the rules set | | accounted for at borders. If the latter, it will be too negative. | |
| out in [ECN-tunnel] are followed then this will not be an issue. | | If the rules set out in [ECN-tunnel] are followed then this will not | |
| | | be an issue. | |
| | | | |
| 5.7. Non-Issues | | 5.7. Non-Issues | |
| | | | |
| The following issues might seem to cause unfavourable interactions | | The following issues might seem to cause unfavourable interactions | |
| with re-ECN, but we will explain why they don't: | | with re-ECN, but we will explain why they don't: | |
| | | | |
| o Various link layers support explicit congestion notification, such | | o Various link layers support explicit congestion notification, such | |
| as Frame Relay and ATM. Explicit congestion notification is | | as Frame Relay and ATM. Explicit congestion notification is | |
| proposed to be added to other link layers, such as Ethernet | | proposed to be added to other link layers, such as Ethernet | |
|
| (802.3ar Ethernet congestion management) and MPLS [ECN-MPLS]; | | (802.3ar Ethernet congestion management) and MPLS [RFC5129]; | |
| | | | |
| o Encryption and IPSec. | | o Encryption and IPSec. | |
| | | | |
| In the case of congestion notification at the link layer, each | | In the case of congestion notification at the link layer, each | |
| particular link layer scheme either manages congestion on the link | | particular link layer scheme either manages congestion on the link | |
| with its own link-level feedback (the usual arrangement in the cases | | with its own link-level feedback (the usual arrangement in the cases | |
| of ATM and Frame Relay), or congestion notification from the link | | of ATM and Frame Relay), or congestion notification from the link | |
| layer is merged into congestion notification at the IP level when the | | layer is merged into congestion notification at the IP level when the | |
| frame headers are decapsulated at the end of the link (the | | frame headers are decapsulated at the end of the link (the | |
| recommended arrangement in the Ethernet and MPLS cases). Given the | | recommended arrangement in the Ethernet and MPLS cases). Given the | |
| | | | |
| skipping to change at page 36, line 6 | | skipping to change at page 37, line 7 | |
| is processed on the path by subtracting positive from negative | | is processed on the path by subtracting positive from negative | |
| markings. | | markings. | |
| | | | |
| In the case of encryption, as long as the tunnel issues described in | | In the case of encryption, as long as the tunnel issues described in | |
| Section 5.6 are dealt with, payload encryption itself will not be a | | Section 5.6 are dealt with, payload encryption itself will not be a | |
| problem. The design goal of re-ECN is to include downstream | | problem. The design goal of re-ECN is to include downstream | |
| congestion in the IP header so that it is not necessary to bury into | | congestion in the IP header so that it is not necessary to bury into | |
| inner headers. Obfuscation of flow identifiers is not a problem for | | inner headers. Obfuscation of flow identifiers is not a problem for | |
| re-ECN policing elements. Re-ECN doesn't ever require flow | | re-ECN policing elements. Re-ECN doesn't ever require flow | |
| identifiers to be valid; it only requires them to be unique. So if | | identifiers to be valid; it only requires them to be unique. So if | |
|
| an IPSec encapsulating security payload (ESP [RFC2406]) or an | | an IPSec encapsulating security payload (ESP [RFC4305]) or an | |
| authentication header (AH [RFC2402]) is used, the security parameters | | authentication header (AH [RFC4302]) is used, the security parameters | |
| index (SPI) will be a sufficient flow identifier, as it is intended | | index (SPI) will be a sufficient flow identifier, as it is intended | |
| to be unique to a flow without revealing actual port numbers. | | to be unique to a flow without revealing actual port numbers. | |
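
A minimal sketch of the flow-identifier choice described above, assuming a policing element that has already parsed the outer IP addresses and, where present, the SPI. The function name `flow_id` and the tuple layout are hypothetical; the draft only requires that the identifier be unique, not that it take any particular form.

```python
from typing import Optional, Tuple


def flow_id(src: str, dst: str, spi: Optional[int] = None) -> Tuple:
    """Return a key that is unique per flow without needing port numbers.

    If an ESP or AH header is present, the SPI is combined with the
    address pair (an assumption; the text says the SPI alone suffices).
    Otherwise the source and destination address pair is used.
    """
    if spi is not None:
        return ("spi", src, dst, spi)
    return ("addr", src, dst)
```

The key is opaque to the policer: it is only ever compared for equality, so hidden or obfuscated port numbers do not matter.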
| | | | |
| In general, even if endpoints use some locally agreed scheme to hide | | In general, even if endpoints use some locally agreed scheme to hide | |
| port numbers, re-ECN policing elements can just consider the pair of | | port numbers, re-ECN policing elements can just consider the pair of | |
| source and destination IP addresses as the flow identifier. Re-ECN | | source and destination IP addresses as the flow identifier. Re-ECN | |
| encourages endpoints to at least tell the network layer that a | | encourages endpoints to at least tell the network layer that a | |
| sequence of packets are all part of the same flow, if indeed they | | sequence of packets are all part of the same flow, if indeed they | |
| are. The alternative would be for the sender to make each packet | | are. The alternative would be for the sender to make each packet | |
| appear to be a new flow, which would require them all to be marked | | appear to be a new flow, which would require them all to be marked | |
| | | | |
| skipping to change at page 39, line 9 | | skipping to change at page 40, line 9 | |
| delay using re-feedback. We give a simple outline of how this could | | delay using re-feedback. We give a simple outline of how this could | |
| work in Appendix F. However, we do not expect this to be necessary, | | work in Appendix F. However, we do not expect this to be necessary, | |
| as researchers tend to agree that only congestion control dynamics | | as researchers tend to agree that only congestion control dynamics | |
| need to depend on RTT, not the rate that the algorithm would converge | | need to depend on RTT, not the rate that the algorithm would converge | |
| on after a period of stability. | | on after a period of stability. | |
| | | | |
| Figure 8 sketches the incentive framework that we will describe piece | | Figure 8 sketches the incentive framework that we will describe piece | |
| by piece throughout this section. We will do a first pass in | | by piece throughout this section. We will do a first pass in | |
| overview, then return to each piece in detail. We re-use the earlier | | overview, then return to each piece in detail. We re-use the earlier | |
| example of how downstream congestion is derived by subtracting | | example of how downstream congestion is derived by subtracting | |
|
| upstream congestion from path congestion (Figure 1) but depict | | upstream congestion from path congestion (Figure 2) but depict | |
| multiple trust boundaries to turn it into an internetwork. For | | multiple trust boundaries to turn it into an internetwork. For | |
| clarity, only downstream congestion is shown (the difference between | | clarity, only downstream congestion is shown (the difference between | |
| the two earlier plots). The graph displays downstream path | | the two earlier plots). The graph displays downstream path | |
| congestion seen in a typical flow as it traverses an example path | | congestion seen in a typical flow as it traverses an example path | |
|
| from sender S to receiver R, across networks N1, N2 & N4. Everyone | | from sender S to receiver R, across networks N1, N2 & N3. Everyone | |
| is shown using re-ECN correctly, but we intend to show why everyone | | is shown using re-ECN correctly, but we intend to show why everyone | |
| would /choose/ to use it correctly, and honestly. | | would /choose/ to use it correctly, and honestly. | |
| | | | |
| Three main types of self-interest can be identified: | | Three main types of self-interest can be identified: | |
| | | | |
| o Users want to transmit data across the network as fast as | | o Users want to transmit data across the network as fast as | |
| possible, paying as little as possible for the privilege. In this | | possible, paying as little as possible for the privilege. In this | |
| respect, there is no distinction between senders and receivers, | | respect, there is no distinction between senders and receivers, | |
| but we must be wary of potential malice by one on the other; | | but we must be wary of potential malice by one on the other; | |
| | | | |
| o Network operators want to maximise revenues from the resources | | o Network operators want to maximise revenues from the resources | |
| they invest in. They compete amongst themselves for the custom of | | they invest in. They compete amongst themselves for the custom of | |
| users. | | users. | |
| | | | |
| o Attackers (whether users or networks) want to use any opportunity | | o Attackers (whether users or networks) want to use any opportunity | |
| to subvert the new re-ECN system for their own gain or to damage | | to subvert the new re-ECN system for their own gain or to damage | |
| the service of their victims, whether targeted or random. | | the service of their victims, whether targeted or random. | |
| | | | |
|
| policer | | policer dropper | |
| | | | | | | |
| | | | | | | |
| S <-----N1----> <---N2---> <---N4--> R domain | | S <-----N1----> <---N2---> <---N3--> R domain | |
| | : : | | | |
| A\|/: : | | | |
| | V : : | | | |
| 3% |---------+ : | | | |
| | : | : | | | |
| 2% | : +-----------------------+ : | | | |
| | : downstream congestion | : | | | |
| 1% | : | : | | | |
| | : | : | | | |
| 0% +---------------------------------+=====--> | | | |
| 0 i ^ resource index | | | |
| | | /|\ | | | |
| 1.00% 2.00% | marking fraction | | | |
| | | | | | |
|
| dropper | | 3% |---------+ | |
| | | | | | |
| | | 2% | +-----------------------+ | |
| | | | downstream congestion | | |
| | | 1% | | | |
| | | | | | |
| | | 0% +---------------------------------+====== | |
| | | 0 i | |
| | | | |
| Figure 8: Incentive Framework, showing creation of opposing pressures | | Figure 8: Incentive Framework, showing creation of opposing pressures | |
| to under-declare and over-declare downstream congestion, using a | | to under-declare and over-declare downstream congestion, using a | |
| policer and a dropper | | policer and a dropper | |
| | | | |
| Source congestion control: We want to ensure that the sender will | | Source congestion control: We want to ensure that the sender will | |
| throttle its rate as downstream congestion increases. Whatever | | throttle its rate as downstream congestion increases. Whatever | |
| the agreed congestion response (whether TCP-compatible or some | | the agreed congestion response (whether TCP-compatible or some | |
| enhanced QoS), to some extent it will always be against the | | enhanced QoS), to some extent it will always be against the | |
| sender's interest to comply. | | sender's interest to comply. | |
| | | | |
| skipping to change at page 41, line 9 | | skipping to change at page 42, line 6 | |
| | | | |
| Edge egress dropper: If the policer ensures the source has less | | Edge egress dropper: If the policer ensures the source has less | |
| right to a high rate the higher it declares downstream congestion, | | right to a high rate the higher it declares downstream congestion, | |
| the source has a clear incentive to understate downstream | | the source has a clear incentive to understate downstream | |
| congestion. But, if flows of packets are understated when they | | congestion. But, if flows of packets are understated when they | |
| enter the internetwork, they will have become negative by the time | | enter the internetwork, they will have become negative by the time | |
| they leave. So, we introduce a dropper at the last network | | they leave. So, we introduce a dropper at the last network | |
| egress, which drops packets in flows that persistently declare | | egress, which drops packets in flows that persistently declare | |
| negative downstream congestion (see Section 6.1.4 for details). | | negative downstream congestion (see Section 6.1.4 for details). | |
| | | | |
|
| ..competitive routing | | | |
| .' : '. | | | |
| .' p e n a l:t i e s '. | | | |
| : | : \ : | | | |
| A : | : | : | | | |
| |S <-----N1----> <---N2---> <---N4--> R domain | | | |
| | : | : | : | | | |
| | V | : | : | | | |
| 3% |--------+ | : | : | | | |
| | | V V V V | | | |
| 2% | +-----------------------+ | | | |
| | downstream congestion | | | | |
| 1% | : | | | | |
| | : | | | | |
| 0% +--------------------------------+=====--> | | | |
| 0 ^ i resource index | | | |
| | /|\ | | | | |
| 1.00% | 2.00% marking fraction | | | |
| | | | | |
| sanctions | | | |
| | | | |
| Figure 9: Incentives at Inter-domain Borders | | | |
| | | | |
| Inter-domain traffic policing: But next we must ask, if congestion | | Inter-domain traffic policing: But next we must ask, if congestion | |
|
| arises downstream (say in N4), what is the ingress network's | | arises downstream (say in N3), what is the ingress network's | |
| (N1's) incentive to police its customers' response? If N1 turns a | | (N1's) incentive to police its customers' response? If N1 turns a | |
| blind eye, its own customers benefit while other networks suffer. | | blind eye, its own customers benefit while other networks suffer. | |
| This is why all inter-domain QoS architectures (e.g. Intserv, | | This is why all inter-domain QoS architectures (e.g. Intserv, | |
| Diffserv) police traffic each time it crosses a trust boundary. | | Diffserv) police traffic each time it crosses a trust boundary. | |
| We have already shown that re-ECN gives a trustworthy measure of | | We have already shown that re-ECN gives a trustworthy measure of | |
| the expected downstream congestion that a flow will cause by | | the expected downstream congestion that a flow will cause by | |
| subtracting negative volume from positive at any intermediate | | subtracting negative volume from positive at any intermediate | |
|
| point on a path. N4 (say) can use this measure to police all the | | point on a path. N3 (say) can use this measure to police all the | |
| responses to congestion of all the sources beyond its upstream | | responses to congestion of all the sources beyond its upstream | |
| neighbour (N2), but in bulk with one very simple passive | | neighbour (N2), but in bulk with one very simple passive | |
|
| mechanism, rather than per flow, as we will now explain using | | mechanism, rather than per flow, as we will now explain. | |
| Figure 9. | | | |
| | | | |
| Emulating policing with inter-domain congestion penalties: Between | | Emulating policing with inter-domain congestion penalties: Between | |
| high-speed networks, we would rather avoid per-flow policing, and | | high-speed networks, we would rather avoid per-flow policing, and | |
| we would rather avoid holding back traffic while it is policed. | | we would rather avoid holding back traffic while it is policed. | |
| Instead, once re-ECN has arranged headers to carry downstream | | Instead, once re-ECN has arranged headers to carry downstream | |
|
| congestion honestly, N2 can contract to pay N4 penalties in | | congestion honestly, N2 can contract to pay N3 penalties in | |
| proportion to a single bulk count of the congestion metrics | | proportion to a single bulk count of the congestion metrics | |
| crossing their mutual trust boundary (Section 6.1.6). In this | | crossing their mutual trust boundary (Section 6.1.6). In this | |
|
| way, N4 puts pressure on N2 to suppress downstream congestion, for | | way, N3 puts pressure on N2 to suppress downstream congestion, for | |
| every flow passing through the border interface, even though they | | every flow passing through the border interface, even though they | |
| will all start and end in different places, and even though they | | will all start and end in different places, and even though they | |
| may all be allowed different responses to congestion. The figure | | may all be allowed different responses to congestion. The figure | |
| depicts this downward pressure on N2 by the solid downward arrow | | depicts this downward pressure on N2 by the solid downward arrow | |
| at the egress of N2. Then N2 has an incentive either to police | | at the egress of N2. Then N2 has an incentive either to police | |
| the congestion response of its own ingress traffic (from N1) or to | | the congestion response of its own ingress traffic (from N1) or to | |
| emulate policing by applying penalties to N1 in turn on the basis | | emulate policing by applying penalties to N1 in turn on the basis | |
| of congestion counted at their mutual boundary. In this recursive | | of congestion counted at their mutual boundary. In this recursive | |
| way, the incentives for each flow to respond correctly to | | way, the incentives for each flow to respond correctly to | |
| congestion trace back with each flow precisely to each source, | | congestion trace back with each flow precisely to each source, | |
| despite the mechanism not recognising flows (see Section 6.2.2). | | despite the mechanism not recognising flows (see Section 6.2.2). | |
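
The bulk count at a border can be sketched as follows, purely as an illustration. It assumes the per-codepoint worth values from Table 7, ignores codepoints whose worth is "n/a", and uses a hypothetical flat `price_per_byte`; the names `WORTH`, `downstream_congestion_volume` and `penalty` are not from the draft. The clamp at zero reflects the constraint, stated in the text on penalty regimes, that negative penalties are not allowed.

```python
from typing import Iterable, Tuple

# Worth of each extended ECN codepoint, as listed in Table 7.
WORTH = {"Re-Echo": +1, "FNE": +1, "CE(0)": 0, "RECT": 0, "CE(-1)": -1}


def downstream_congestion_volume(packets: Iterable[Tuple[str, int]]) -> int:
    """Positive volume minus negative volume, in bytes.

    `packets` yields (codepoint name, size in bytes) for everything
    crossing the border in one accounting period. No per-flow state is
    kept; codepoints without a worth contribute nothing (an assumption).
    """
    return sum(WORTH.get(codepoint, 0) * size for codepoint, size in packets)


def penalty(volume_bytes: int, price_per_byte: float) -> float:
    """Charge proportional to congestion volume, never negative."""
    return price_per_byte * max(0, volume_bytes)
```

With two 1500-byte Re-Echo packets, one RECT packet and one CE(-1) packet of the same size, the count is 1500 bytes of downstream congestion, regardless of which flows the packets belong to.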
| | | | |
| Inter-domain congestion charging diversity: Any two networks are | | Inter-domain congestion charging diversity: Any two networks are | |
| free to agree any of a range of penalty regimes between themselves | | free to agree any of a range of penalty regimes between themselves | |
| but they would only provide the right incentives if they were | | but they would only provide the right incentives if they were | |
| within the following reasonable constraints. N2 should expect to | | within the following reasonable constraints. N2 should expect to | |
|
| have to pay penalties to N4 where penalties monotonically increase | | have to pay penalties to N3 where penalties monotonically increase | |
| with the volume of congestion and negative penalties are not | | with the volume of congestion and negative penalties are not | |
| allowed. For instance, they may agree an SLA with tiered | | allowed. For instance, they may agree an SLA with tiered | |
| congestion thresholds, where higher penalties apply the higher the | | congestion thresholds, where higher penalties apply the higher the | |
| threshold that is broken. But the most obvious (and useful) form | | threshold that is broken. But the most obvious (and useful) form | |
|
| of penalty is where N4 levies a charge on N2 proportional to the | | of penalty is where N3 levies a charge on N2 proportional to the | |
| volume of downstream congestion N2 dumps into N4. In the | | volume of downstream congestion N2 dumps into N3. In the | |
| explanation that follows, we assume this specific variant of | | explanation that follows, we assume this specific variant of | |
| volume charging between networks - charging proportionate to the | | volume charging between networks - charging proportionate to the | |
| volume of congestion. | | volume of congestion. | |
| | | | |
| We must make clear that we are not advocating that everyone should | | We must make clear that we are not advocating that everyone should | |
| use this form of contract. We are well aware that the IETF tries | | use this form of contract. We are well aware that the IETF tries | |
| to avoid standardising technology that depends on a particular | | to avoid standardising technology that depends on a particular | |
| business model. And we strongly share this desire to encourage | | business model. And we strongly share this desire to encourage | |
| diversity. But our aim is merely to show that border policing can | | diversity. But our aim is merely to show that border policing can | |
| at least work with this one model, then we can assume that | | at least work with this one model, then we can assume that | |
| | | | |
| skipping to change at page 43, line 28 | | skipping to change at page 44, line 4 | |
| inter-domain congestion charging, a domain seems to have a | | inter-domain congestion charging, a domain seems to have a | |
| perverse incentive to fake congestion; N2's profit depends on the | | perverse incentive to fake congestion; N2's profit depends on the | |
| difference between congestion at its ingress (its revenue) and at | | difference between congestion at its ingress (its revenue) and at | |
| its egress (its cost). So, overstating internal congestion seems | | its egress (its cost). So, overstating internal congestion seems | |
| to increase profit. However, smart border routing [Smart_rtg] by | | to increase profit. However, smart border routing [Smart_rtg] by | |
| N1 will bias its routing towards the least cost routes. So, N2 | | N1 will bias its routing towards the least cost routes. So, N2 | |
| risks losing all its revenue to competitive routes if it | | risks losing all its revenue to competitive routes if it | |
| overstates congestion (see Section 6.2.3). In other words, if N2 | | overstates congestion (see Section 6.2.3). In other words, if N2 | |
| is the least congested route, its ability to raise excess profits | | is the least congested route, its ability to raise excess profits | |
| is limited by the congestion on the next least congested route. | | is limited by the congestion on the next least congested route. | |
|
| This pressure on N2 to remain competitive is represented by the | | | |
| dotted downward arrow at the ingress to N2 in Figure 9. | | | |
| | | | |
| Closing the loop: All the above elements conspire to trap everyone | | Closing the loop: All the above elements conspire to trap everyone | |
|
| between two opposing pressures (the downward and upward arrows in | | between two opposing pressures, ensuring the downstream congestion | |
| Figure 8 & Figure 9), ensuring the downstream congestion metric | | metric arrives at the destination neither above nor below zero. | |
| arrives at the destination neither above nor below zero. So, we | | So, we have arrived back where we started in our argument. The | |
| have arrived back where we started in our argument. The ingress | | ingress edge network can rely on downstream congestion declared in | |
| edge network can rely on downstream congestion declared in the | | the packet headers presented by the sender. So it can police the | |
| packet headers presented by the sender. So it can police the | | | |
| sender's congestion response accordingly. | | sender's congestion response accordingly. | |
| | | | |
| Evolvability of congestion control: We have seen that re-ECN enables | | Evolvability of congestion control: We have seen that re-ECN enables | |
| policing at the very first ingress. We have also seen that, as | | policing at the very first ingress. We have also seen that, as | |
| flows continue on their path through further networks downstream, | | flows continue on their path through further networks downstream, | |
| re-ECN removes the need for further per-domain ingress policing of | | re-ECN removes the need for further per-domain ingress policing of | |
| all the different congestion responses allowed to each different | | all the different congestion responses allowed to each different | |
| flow. This is why the evolvability of re-ECN policing is so | | flow. This is why the evolvability of re-ECN policing is so | |
| superior to bottleneck policing or to any policing of different | | superior to bottleneck policing or to any policing of different | |
| QoS for different flows. Even if all access networks choose to | | QoS for different flows. Even if all access networks choose to | |
| | | | |
| skipping to change at page 44, line 35 | | skipping to change at page 45, line 8 | |
| except only the volume of packets marked with congestion experienced | | except only the volume of packets marked with congestion experienced | |
| (CE) was counted. | | (CE) was counted. | |
| | | | |
| However, below we explain why relying on classic feedback /required/ | | However, below we explain why relying on classic feedback /required/ | |
| congestion charging to be used, while re-ECN achieves the same | | congestion charging to be used, while re-ECN achieves the same | |
| powerful outcome (given it is built on Kelly's foundations), but does | | powerful outcome (given it is built on Kelly's foundations), but does | |
| not /require/ congestion charging. In brief, the problem with | | not /require/ congestion charging. In brief, the problem with | |
| classic feedback is that the incentives have to trace the indirect | | classic feedback is that the incentives have to trace the indirect | |
| path back to the sender---the long way round the feedback loop. For | | path back to the sender---the long way round the feedback loop. For | |
| example, if classic feedback were used in Figure 8, N2 would have had | | example, if classic feedback were used in Figure 8, N2 would have had | |
|
| to influence N1 via all of N4, R & S rather than directly. | | to influence N1 via all of N3, R & S rather than directly. | |
| | | | |
| Inability to agree what is happening downstream: In order to police | | Inability to agree what is happening downstream: In order to police | |
| its upstream neighbour's congestion response, the neighbours | | its upstream neighbour's congestion response, the neighbours | |
| should be able to agree on the congestion to be responded to. | | should be able to agree on the congestion to be responded to. | |
| Whatever the feedback regime, as packets change hands at each | | Whatever the feedback regime, as packets change hands at each | |
| trust boundary, any path metrics they carry are verifiable by both | | trust boundary, any path metrics they carry are verifiable by both | |
| neighbours. But, with a classic path metric, they can only agree | | neighbours. But, with a classic path metric, they can only agree | |
| on the /upstream/ path congestion. | | on the /upstream/ path congestion. | |
| | | | |
| Inaccessible back-channel: The network needs a whole-path congestion | | Inaccessible back-channel: The network needs a whole-path congestion | |
| | | | |
| skipping to change at page 45, line 37 | | skipping to change at page 46, line 10 | |
| using the safer `sender pays' model. However, congestion charging is | | using the safer `sender pays' model. However, congestion charging is | |
| only likely to be appropriate between domains. So, without losing | | only likely to be appropriate between domains. So, without losing | |
| evolvability, re-ECN enables technical policing mechanisms that are | | evolvability, re-ECN enables technical policing mechanisms that are | |
| more appropriate for end users than congestion pricing. | | more appropriate for end users than congestion pricing. | |
| | | | |
| We now take a second pass over the incentive framework, filling in | | We now take a second pass over the incentive framework, filling in | |
| the detail. | | the detail. | |
| | | | |
| 6.1.4. Egress Dropper | | 6.1.4. Egress Dropper | |
| | | | |
|
| As traffic leaves the last network before the receiver (domain N4 in | | As traffic leaves the last network before the receiver (domain N3 in | |
| Figure 8), the fraction of positive octets in a flow should match the | | Figure 8), the fraction of positive octets in a flow should match the | |
| fraction of negative octets introduced by congestion marking, leaving | | fraction of negative octets introduced by congestion marking, leaving | |
| a balance of zero. If it is less (a negative flow), it implies that | | a balance of zero. If it is less (a negative flow), it implies that | |
| the source is understating path congestion (which will reduce the | | the source is understating path congestion (which will reduce the | |
|
| penalties that N2 owes N4). | | penalties that N2 owes N3). | |
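The balance described above can be illustrated with a small per-flow counter. This is a minimal sketch, not the dropper algorithm of Appendix E: the `FlowBalance` class, its method names and the `worth` encoding (+1 positive, 0 neutral, -1 negative) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class FlowBalance:
    """Running count of positive and negative octets seen for one flow."""
    positive_octets: int = 0
    negative_octets: int = 0

    def record(self, size_octets: int, worth: int) -> None:
        # worth is assumed to be +1 (positive), 0 (neutral) or -1 (negative).
        if worth > 0:
            self.positive_octets += size_octets
        elif worth < 0:
            self.negative_octets += size_octets

    def balance(self) -> int:
        # Zero means the sender declared exactly the congestion that was marked.
        return self.positive_octets - self.negative_octets

    def state(self) -> str:
        b = self.balance()
        if b < 0:
            return "negative"  # path congestion is being understated
        return "positive" if b > 0 else "balanced"
```

A real egress function would need to smooth this balance over time before sanctioning a flow, since an honest flow can be briefly negative while feedback is in transit.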
| | | | |
|
| If flows are positive, N4 need take no action---this simply means its | | If flows are positive, N3 need take no action---this simply means its | |
| upstream neighbour is paying more penalties than it needs to, and the | | upstream neighbour is paying more penalties than it needs to, and the | |
| source is going slower than it needs to. But, to protect itself | | source is going slower than it needs to. But, to protect itself | |
|
| against persistently negative flows, N4 will need to install a | | against persistently negative flows, N3 will need to install a | |
| dropper at its egress. Appendix E gives a suggested algorithm for | | dropper at its egress. Appendix E gives a suggested algorithm for | |
| this dropper. There is no intention that the dropper algorithm needs | | this dropper. There is no intention that the dropper algorithm needs | |
| to be standardised; it is merely provided to show that an efficient, | | to be standardised; it is merely provided to show that an efficient, | |
| robust algorithm is possible. But whatever algorithm is used must | | robust algorithm is possible. But whatever algorithm is used must | |
| meet the criteria below: | | meet the criteria below: | |
| | | | |
| o It SHOULD introduce minimal false positives for honest flows; | | o It SHOULD introduce minimal false positives for honest flows; | |
| | | | |
| o It SHOULD quickly detect and sanction dishonest flows (minimal | | o It SHOULD quickly detect and sanction dishonest flows (minimal | |
| false negatives); | | false negatives); | |
| | | | |
| skipping to change at page 48, line 35 | | skipping to change at page 49, line 7 | |
| | | | |
| Of course, even if the sender does operate its own network, it may | | Of course, even if the sender does operate its own network, it may | |
| arrange not to congestion mark traffic. Whether the sender does this | | arrange not to congestion mark traffic. Whether the sender does this | |
| or not is of no concern to anyone else except the sender. Such a | | or not is of no concern to anyone else except the sender. Such a | |
| sender will not be policed against its own network's contribution to | | sender will not be policed against its own network's contribution to | |
| congestion, but the only resulting problem would be overload in the | | congestion, but the only resulting problem would be overload in the | |
| sender's own network. | | sender's own network. | |
| | | | |
| Finally, we must not forget that an easy way to circumvent re-ECN's | | Finally, we must not forget that an easy way to circumvent re-ECN's | |
| defences is for the source to turn off re-ECN support, by setting the | | defences is for the source to turn off re-ECN support, by setting the | |
|
| Not-RECT codepoint, implying legacy traffic. Therefore an ingress | | Not-RECT codepoint, implying RFC3168 compliant traffic. Therefore an | |
| policer should put a general rate-limit on Not-RECT traffic, which | | ingress policer should put a general rate-limit on Not-RECT traffic, | |
| SHOULD be lax during early, patchy deployment, but will have to | | which SHOULD be lax during early, patchy deployment, but will have to | |
| become stricter as deployment widens. Similarly, flows starting | | become stricter as deployment widens. Similarly, flows starting | |
| without an FNE packet can be confined by a strict rate-limit used for | | without an FNE packet can be confined by a strict rate-limit used for | |
| the remainder of flows that haven't proved they are well-behaved by | | the remainder of flows that haven't proved they are well-behaved by | |
| starting correctly (therefore they need not consume any flow state--- | | starting correctly (therefore they need not consume any flow state--- | |
| they are just confined to the `misbehaving' bin if they carry an | | they are just confined to the `misbehaving' bin if they carry an | |
| unrecognised flow ID). | | unrecognised flow ID). | |
| | | | |
| 6.1.6. Inter-domain Policing | | 6.1.6. Inter-domain Policing | |
| | | | |
| One of the main design goals of re-ECN is for border security | | One of the main design goals of re-ECN is for border security | |
| | | | |
| skipping to change at page 51, line 39 | | skipping to change at page 52, line 9 | |
| Once an unbiased estimate of the effect of negative flows can be | | Once an unbiased estimate of the effect of negative flows can be | |
| made, the problem reduces to detecting and preferably removing flows | | made, the problem reduces to detecting and preferably removing flows | |
| that have gone negative as soon as possible. But importantly, | | that have gone negative as soon as possible. But importantly, | |
| complete eradication of negative flows is no longer critical---best | | complete eradication of negative flows is no longer critical---best | |
| endeavours will be sufficient. | | endeavours will be sufficient. | |
| | | | |
| For instance, let us consider the case where a source sends traffic | | For instance, let us consider the case where a source sends traffic | |
| with no positive markings at all, hoping to at least get as much | | with no positive markings at all, hoping to at least get as much | |
| traffic delivered as network-based droppers will allow. The flow is | | traffic delivered as network-based droppers will allow. The flow is | |
| likely to go at least slightly negative in the first network on the | | likely to go at least slightly negative in the first network on the | |
|
| path (N1 if we use the example network layout in Figure 9). If all | | path (N1 if we use the example network layout in Figure 8). If all | |
| networks use the algorithm in Appendix H.2 to inflate penalties at | | networks use the algorithm in Appendix H.2 to inflate penalties at | |
| their border with an upstream network, they will remove the effect of | | their border with an upstream network, they will remove the effect of | |
| negative flows. So, for instance, N2 will not be paying a penalty to | | negative flows. So, for instance, N2 will not be paying a penalty to | |
| N1 for this flow. Further, because the flow contributes no positive | | N1 for this flow. Further, because the flow contributes no positive | |
| markings at all, a dropper at the egress will completely remove it. | | markings at all, a dropper at the egress will completely remove it. | |
| | | | |
| The remaining problem is that every network is carrying a flow that | | The remaining problem is that every network is carrying a flow that | |
| is causing congestion to others but not being held to account for the | | is causing congestion to others but not being held to account for the | |
| congestion it is causing. Whenever the fail-safe border algorithm | | congestion it is causing. Whenever the fail-safe border algorithm | |
| (Section 6.1.7) or the border algorithm to compensate for negative | | (Section 6.1.7) or the border algorithm to compensate for negative | |
| flows (Appendix H.2) detects a negative flow, it can instantiate a | | flows (Appendix H.2) detects a negative flow, it can instantiate a | |
| focused dropper for that flow locally. It may be some time before | | focused dropper for that flow locally. It may be some time before | |
| the flow is detected, but the more strongly negative the flow is, the | | the flow is detected, but the more strongly negative the flow is, the | |
| more quickly it will be detected by the fail-safe algorithm. But, in | | more quickly it will be detected by the fail-safe algorithm. But, in | |
| the meantime, it will not be distorting border incentives. Until it | | the meantime, it will not be distorting border incentives. Until it | |
| is detected, if it contributes to drop anywhere, its packets will | | is detected, if it contributes to drop anywhere, its packets will | |
|
| tend to be dropped before others if routers use the preferential drop | | tend to be dropped before others if queues use the preferential drop | |
| rules in Section 5.3, which discriminate against non-positive | | rules in Section 5.3, which discriminate against non-positive | |
| packets. All networks below the point where a flow goes negative | | packets. All networks below the point where a flow goes negative | |
|
| (N1, N2 and N4 in this case) have an incentive to remove this flow, | | (N1, N2 and N3 in this case) have an incentive to remove this flow, | |
| but the router where it first goes negative (in N1) can of course | | but the queue where it first goes negative (in N1) can of course | |
| remove the problem for everyone downstream. | | remove the problem for everyone downstream. | |
| | | | |
| In the case of DDoS attacks, Section 6.2.1 describes how re-ECN | | In the case of DDoS attacks, Section 6.2.1 describes how re-ECN | |
| mitigates their force. | | mitigates their force. | |
| | | | |
| 6.1.7. Inter-domain Fail-safes | | 6.1.7. Inter-domain Fail-safes | |
| | | | |
| The mechanisms described so far create incentives for rational | | The mechanisms described so far create incentives for rational | |
| network operators to behave. That is, one operator aims to make | | network operators to behave. That is, one operator aims to make | |
| another behave responsibly by applying penalties and expects a | | another behave responsibly by applying penalties and expects a | |
| | | | |
| skipping to change at page 53, line 21 | | skipping to change at page 53, line 41 | |
| | | | |
| 6.2. Other Applications | | 6.2. Other Applications | |
| | | | |
| 6.2.1. DDoS Mitigation | | 6.2.1. DDoS Mitigation | |
| | | | |
| A flooding attack is inherently about congestion of a resource. | | A flooding attack is inherently about congestion of a resource. | |
| Because re-ECN ensures the sources causing network congestion | | Because re-ECN ensures the sources causing network congestion | |
| experience the cost of their own actions, it acts as a first line of | | experience the cost of their own actions, it acts as a first line of | |
| defence against DDoS. As load focuses on a victim, upstream queues | | defence against DDoS. As load focuses on a victim, upstream queues | |
| grow, requiring honest sources to pre-load packets with a higher | | grow, requiring honest sources to pre-load packets with a higher | |
|
| fraction of positive packets. Once downstream routers are so | | fraction of positive packets. Once downstream queues are so | |
| congested that they are dropping traffic, they will be CE marking the | | congested that they are dropping traffic, they will be CE marking the | |
| traffic they do forward 100%. Honest sources will therefore be | | traffic they do forward 100%. Honest sources will therefore be | |
| sending Re-Echo 100% (and therefore being severely rate-limited at | | sending Re-Echo 100% (and therefore being severely rate-limited at | |
| the ingress). | | the ingress). | |
| | | | |
| Senders under malicious control can either do the same as honest | | Senders under malicious control can either do the same as honest | |
| sources, and be rate-limited at ingress, or they can understate | | sources, and be rate-limited at ingress, or they can understate | |
| congestion by sending more neutral RECT packets than they should. If | | congestion by sending more neutral RECT packets than they should. If | |
| sources understate congestion (i.e. do not re-echo sufficient | | sources understate congestion (i.e. do not re-echo sufficient | |
| positive packets) and the preferential drop ranking is implemented on | | positive packets) and the preferential drop ranking is implemented on | |
|
| routers (Section 5.3), these routers will preserve positive traffic | | queues (Section 5.3), these queues will preserve positive traffic | |
| until last. So, the neutral traffic from malicious sources will all | | until last. So, the neutral traffic from malicious sources will all | |
| be automatically dropped first. Either way, the malicious sources | | be automatically dropped first. Either way, the malicious sources | |
| cannot send more than honest sources. | | cannot send more than honest sources. | |
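The effect of preferential drop on an overloaded queue can be sketched as below. This is an illustration only, under stated assumptions: it is not the rule set of Section 5.3, it uses just two ranks (positive and non-positive), and the `select_survivors` function and the `(packet_id, worth)` tuple layout are hypothetical. Packet identifiers are assumed to be unique.

```python
from typing import List, Tuple

Packet = Tuple[str, int]  # (packet_id, worth): worth > 0 means a positive packet


def select_survivors(packets: List[Packet], capacity: int) -> List[Packet]:
    """Keep at most `capacity` packets, dropping non-positive packets first."""
    capacity = max(0, capacity)
    # Stable sort: positive packets first, arrival order kept within each rank.
    ranked = sorted(packets, key=lambda p: 0 if p[1] > 0 else 1)
    kept = {packet_id for packet_id, _ in ranked[:capacity]}
    # Return the survivors in their original arrival order.
    return [p for p in packets if p[0] in kept]
```

With four arrivals and room for two, the two positive packets survive and both neutral packets are dropped, which is why a source that understates congestion gains nothing once the queue is overloaded.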
| | | | |
| Further, hosts under malicious control will tend to be re-used for | | Further, hosts under malicious control will tend to be re-used for | |
| many different attacks. They will therefore build up a long term | | many different attacks. They will therefore build up a long term | |
| history of causing congestion. Therefore, as long as the population | | history of causing congestion. Therefore, as long as the population | |
| of potentially compromisable hosts around the Internet is limited, | | of potentially compromisable hosts around the Internet is limited, | |
| the per-user policing algorithms in Appendix G.1 will gradually | | the per-user policing algorithms in Appendix G.1 will gradually | |
| throttle down zombies and other launchpads for attacks. Therefore, | | throttle down zombies and other launchpads for attacks. Therefore, | |
| | | | |
| skipping to change at page 55, line 32 | | skipping to change at page 56, line 10 | |
| o We are considering the issue of whether it would be useful to | | o We are considering the issue of whether it would be useful to | |
| truncate rather than drop packets that appear to be malicious, so | | truncate rather than drop packets that appear to be malicious, so | |
| that the feedback loop is not broken but useful data can be | | that the feedback loop is not broken but useful data can be | |
| removed. | | removed. | |
| | | | |
| 7. Incremental Deployment | | 7. Incremental Deployment | |
| | | | |
| 7.1. Incremental Deployment Features | | 7.1. Incremental Deployment Features | |
| | | | |
| The design of the re-ECN protocol started from the fact that the | | The design of the re-ECN protocol started from the fact that the | |
|
| current ECN marking behaviour of routers was sufficient and that re- | | current ECN marking behaviour of queues was sufficient and that re- | |
| feedback could be introduced around these routers by changing the | | feedback could be introduced around these queues by changing the | |
| sender behaviour but not the routers. Otherwise, if we had required | | sender behaviour but not the routers. Otherwise, if we had required | |
| routers to be changed, the chance of encountering a path that had | | routers to be changed, the chance of encountering a path that had | |
| every router upgraded would be vanishingly small during early | | every router upgraded would be vanishingly small during early | |
| deployment, giving no incentive to start deployment. Also, as there | | deployment, giving no incentive to start deployment. Also, as there | |
| is no new forwarding behaviour, routers and hosts do not have to | | is no new forwarding behaviour, routers and hosts do not have to | |
| signal or negotiate anything. | | signal or negotiate anything. | |
| | | | |
| However, networks that choose to protect themselves using re-ECN do | | However, networks that choose to protect themselves using re-ECN do | |
| have to add new security functions at their trust boundaries with | | have to add new security functions at their trust boundaries with | |
| others. They distinguish legacy traffic by its ECN field. Traffic | | others. They distinguish legacy traffic by its ECN field. Traffic | |
|
| from Not-ECT transports is distinguishable by its Not-RECT marking. | | from Not-ECT transports is distinguishable by its Not-ECT marking. | |
| Traffic from legacy ECN transports is distinguished from re-ECN by | | Traffic from RFC3168 compliant ECN transports is distinguished from | |
| which of ECT(0) or ECT(1) is used. We chose to use ECT(1) for re-ECN | | re-ECN by which of ECT(0) or ECT(1) is used. We chose to use ECT(1) | |
| traffic deliberately. Existing ECN sources set ECT(0) on either 50% | | for re-ECN traffic deliberately. Existing ECN sources set ECT(0) on | |
| (the nonce) or 100% (the default) of packets, whereas re-ECN does not | | either 50% (the nonce) or 100% (the default) of packets, whereas re- | |
| use ECT(0) at all. We can use this distinguishing feature of legacy | | ECN does not use ECT(0) at all. We can use this distinguishing | |
| ECN traffic to separate it out for different treatment at the various | | feature of RFC3168 compliant ECN traffic to separate it out for | |
| border security functions: egress dropping, ingress policing and | | different treatment at the various border security functions: egress | |
| border policing. | | dropping, ingress policing and border policing. | |
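The separation described above can be sketched as a lookup on the two-bit ECN field, using the RFC3168 codepoint values (Not-ECT = 00, ECT(1) = 01, ECT(0) = 10, CE = 11). This is a simplified illustration: the `classify_ecn_field` function and its labels are hypothetical, and it deliberately ignores the RE flag, so it cannot recognise FNE packets or tell which kind of transport sent a CE-marked packet.

```python
NOT_ECT = 0b00
ECT_1 = 0b01
ECT_0 = 0b10
CE = 0b11


def classify_ecn_field(ecn_bits: int) -> str:
    """Classify a packet from its two-bit ECN field alone (RE flag ignored)."""
    if ecn_bits == NOT_ECT:
        return "not-ect"       # transport not using ECN
    if ecn_bits == ECT_0:
        return "rfc3168-ecn"   # re-ECN does not use ECT(0) at all
    if ecn_bits == ECT_1:
        return "re-ecn"        # re-ECN senders set ECT(1) by default
    if ecn_bits == CE:
        return "ce"            # congestion experienced; origin not decided here
    raise ValueError("ECN field must be a two-bit value")
```

A border function could use the first two labels to select traffic for the bulk rate-limit, while a complete classifier would also have to inspect the RE flag.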
| | | | |
| The general principle we adopt is that an egress dropper will not | | The general principle we adopt is that an egress dropper will not | |
| drop any legacy traffic, but ingress and border policers will limit | | drop any legacy traffic, but ingress and border policers will limit | |
|
| the bulk rate of legacy traffic that can enter each network. Then, | | the bulk rate of legacy traffic (Not-ECT, ECT(0) and those marked | |
| during early re-ECN deployment, operators can set very permissive (or | | with the unused codepoint) that can enter each network. Then, during | |
| non-existent) rate-limits on legacy traffic, but once re-ECN | | early re-ECN deployment, operators can set very permissive (or non- | |
| | | existent) rate-limits on legacy traffic, but once re-ECN | |
| implementations are generally available, legacy traffic can be rate- | | implementations are generally available, legacy traffic can be rate- | |
| limited increasingly harshly. Ultimately, an operator might choose | | limited increasingly harshly. Ultimately, an operator might choose | |
| to block all legacy traffic entering its network, or at least only | | to block all legacy traffic entering its network, or at least only | |
| allow through a trickle. | | allow through a trickle. | |
| | | | |
|
| Then, as the limits are set more strictly, the more legacy ECN | | Then, as the limits are set more strictly, the more RFC3168 ECN | |
| sources will gain by upgrading to re-ECN. Thus, towards the end of | | sources will gain by upgrading to re-ECN. Thus, towards the end of | |
|
| the voluntary incremental deployment period, legacy transports can be | | the voluntary incremental deployment period, RFC3168 compliant | |
| given progressively stronger encouragement to upgrade. | | transports can be given progressively stronger encouragement to | |
| | | upgrade. | |
| | | | |
| The following list of minor changes brings together all the points | | The following list of minor changes brings together all the points | |
|
| where Re-ECN semantics for use of the two-bit ECN field are different | | where re-ECN semantics for use of the two-bit ECN field are different | |
| compared to RFC3168: | | compared to RFC3168: | |
| | | | |
| o A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender | | o A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender | |
|
| sets ECT(0) by default (Section 3.3); | | sets ECT(0) by default (Section 3.4); | |
| | | | |
| o No provision is necessary for a re-ECN capable source transport to | | o No provision is necessary for a re-ECN capable source transport to | |
| use the ECN nonce (Section 4.1.2.1); | | use the ECN nonce (Section 4.1.2.1); | |
| | | | |
| o Routers MAY preferentially drop different extended ECN codepoints | | o Routers MAY preferentially drop different extended ECN codepoints | |
| (Section 5.3); | | (Section 5.3); | |
| | | | |
| o Packets carrying the feedback not established (FNE) codepoint MAY | | o Packets carrying the feedback not established (FNE) codepoint MAY | |
| optionally be marked rather than dropped by routers, even though | | optionally be marked rather than dropped by routers, even though | |
| their ECN field is Not-ECT (with the important caveat in | | their ECN field is Not-ECT (with the important caveat in | |
| | | | |
| skipping to change at page 57, line 44 | | skipping to change at page 58, line 20 | |
| Deployment that requires co-ordination adds cost and delay and | | Deployment that requires co-ordination adds cost and delay and | |
| tends to dilute any competitive advantage that might be gained. | | tends to dilute any competitive advantage that might be gained. | |
| | | | |
| * ECN `only' gives a performance improvement. Making a product a | | * ECN `only' gives a performance improvement. Making a product a | |
| bit faster (whether the product is a device or a network), | | bit faster (whether the product is a device or a network), | |
| isn't usually a sufficient selling point to be worth the cost | | isn't usually a sufficient selling point to be worth the cost | |
| of co-ordinating across the industry to deploy it. Network | | of co-ordinating across the industry to deploy it. Network | |
| operators tend to avoid re-configuring a working network unless | | operators tend to avoid re-configuring a working network unless | |
| launching a new product. | | launching a new product. | |
| | | | |
|
| ECN and re-ECN for Edge-to-edge Assured QoS: | | ECN and Re-ECN for Edge-to-edge Assured QoS: | |
| | | | |
| We believe the proposal to provide assured QoS sessions using a | | We believe the proposal to provide assured QoS sessions using a | |
| form of ECN called pre-congestion notification (PCN) [PCN-arch] is | | form of ECN called pre-congestion notification (PCN) [PCN-arch] is | |
| most likely to break the deadlock in ECN deployment first. It | | most likely to break the deadlock in ECN deployment first. It | |
| only requires edge-to-edge deployment so it does not require | | only requires edge-to-edge deployment so it does not require | |
| endpoint support. It can be deployed in a single network, then | | endpoint support. It can be deployed in a single network, then | |
| grow incrementally to interconnected networks. And it provides a | | grow incrementally to interconnected networks. And it provides a | |
| different `product' (internetworked assured QoS), rather than | | different `product' (internetworked assured QoS), rather than | |
| merely making an existing product a bit faster. | | merely making an existing product a bit faster. | |
| | | | |
| Not only could this assured QoS application kick-start ECN | | Not only could this assured QoS application kick-start ECN | |
| deployment, it could also carry re-ECN deployment with it; because | | deployment, it could also carry re-ECN deployment with it; because | |
| re-ECN can enable the assured QoS region to expand to a large | | re-ECN can enable the assured QoS region to expand to a large | |
| internetwork where neighbouring networks do not trust each other. | | internetwork where neighbouring networks do not trust each other. | |
| [Re-PCN] argues that re-ECN security should be built in to the QoS | | [Re-PCN] argues that re-ECN security should be built in to the QoS | |
| system from the start, explaining why and how. | | system from the start, explaining why and how. | |
| | | | |
| If ECN and re-ECN were deployed edge-to-edge for assured QoS, | | If ECN and re-ECN were deployed edge-to-edge for assured QoS, | |
| operators would gain valuable experience. They would also clear | | operators would gain valuable experience. They would also clear | |
| away many technical obstacles such as firewall configurations that | | away many technical obstacles such as firewall configurations that | |
|
| block all but the legacy settings of the ECN field and the RE | | block all but the RFC3168 settings of the ECN field and the RE | |
| flag. | | flag. | |
| | | | |
| ECN in Access Networks: | | ECN in Access Networks: | |
| | | | |
| The next obstacle to ECN deployment would be extension to access | | The next obstacle to ECN deployment would be extension to access | |
| and backhaul networks, where considerable link layer differences | | and backhaul networks, where considerable link layer differences | |
| make implementation non-trivial, particularly on congested | | make implementation non-trivial, particularly on congested | |
| wireless links. ECN and re-ECN work fine during partial | | wireless links. ECN and re-ECN work fine during partial | |
| deployment, but they will not be very useful if the most congested | | deployment, but they will not be very useful if the most congested | |
| elements in networks are the last to support them. Access network | | elements in networks are the last to support them. Access network | |
| | | | |
| skipping to change at page 60, line 44 | | skipping to change at page 61, line 21 | |
| So, if re-ECN were stipulated for cellular devices, it would | | So, if re-ECN were stipulated for cellular devices, it would | |
| automatically appear in those devices connected to the wireless | | automatically appear in those devices connected to the wireless | |
| fringes of fixed networks if they coupled cellular with WiFi or | | fringes of fixed networks if they coupled cellular with WiFi or | |
| Bluetooth technology, for instance. Also, once implemented in the | | Bluetooth technology, for instance. Also, once implemented in the | |
| operating system of one mobile device, it would tend to be found | | operating system of one mobile device, it would tend to be found | |
| in other devices using the same family of operating system. | | in other devices using the same family of operating system. | |
| | | | |
| Therefore, whether or not a fixed network deployed ECN, or | | Therefore, whether or not a fixed network deployed ECN, or | |
| deployed re-ECN policers and droppers, many of its hosts might | | deployed re-ECN policers and droppers, many of its hosts might | |
| well be using re-ECN over it. Indeed, they would be at an | | well be using re-ECN over it. Indeed, they would be at an | |
|
| advantage when communicating with hosts across Re-ECN policed | | advantage when communicating with hosts across re-ECN policed | |
| networks that rate-limited Not-RECT traffic. | | networks that rate-limited Not-RECT traffic. | |
| | | | |
| Other possible scenarios: | | Other possible scenarios: | |
| | | | |
| The above is thankfully not the only plausible scenario we can | | The above is thankfully not the only plausible scenario we can | |
| think of. One of the many clubs of operators that meet regularly | | think of. One of the many clubs of operators that meet regularly | |
| around the world might decide to act together to persuade a major | | around the world might decide to act together to persuade a major | |
| operating system manufacturer to implement re-ECN. And they may | | operating system manufacturer to implement re-ECN. And they may | |
| agree between them on an interconnection model that includes | | agree between them on an interconnection model that includes | |
| congestion penalties. | | congestion penalties. | |
| | | | |
| Re-ECN provides an interesting opportunity for device | | Re-ECN provides an interesting opportunity for device | |
| manufacturers as well as network operators. Policers can be | | manufacturers as well as network operators. Policers can be | |
| configured loosely when first deployed. Then as re-ECN take-up | | configured loosely when first deployed. Then as re-ECN take-up | |
| increases, they can be tightened up, so that a network with re-ECN | | increases, they can be tightened up, so that a network with re-ECN | |
|
| deployed can gradually squeeze down the service provided to legacy | | deployed can gradually squeeze down the service provided to | |
| devices that have not upgraded to re-ECN. Many device vendors | | RFC3168 compliant devices that have not upgraded to re-ECN. Many | |
| rely on replacement sales. And operating system companies rely | | device vendors rely on replacement sales. And operating system | |
| heavily on new release sales. Also support services would like to | | companies rely heavily on new release sales. Also support | |
| be able to force stragglers to upgrade. So, the ability to | | services would like to be able to force stragglers to upgrade. | |
| throttle service to legacy operating systems is quite valuable. | | So, the ability to throttle service to RFC3168 compliant operating | |
| | | systems is quite valuable. | |
| | | | |
| Also, policing unresponsive sources may not be the only or even | | Also, policing unresponsive sources may not be the only or even | |
| the first application that drives deployment. It may be policing | | the first application that drives deployment. It may be policing | |
| causes of heavy congestion (e.g. peer-to-peer file-sharing). Or | | causes of heavy congestion (e.g. peer-to-peer file-sharing). Or | |
| it may be mitigation of denial of service. Or we may be wrong in | | it may be mitigation of denial of service. Or we may be wrong in | |
| thinking simpler QoS will not be the initial motivation for re-ECN | | thinking simpler QoS will not be the initial motivation for re-ECN | |
| deployment. Indeed, the combined pressure for all these may be | | deployment. Indeed, the combined pressure for all these may be | |
| the motivator, but it seems optimistic to expect such a level of | | the motivator, but it seems optimistic to expect such a level of | |
| joined-up thinking from today's communications industry. We | | joined-up thinking from today's communications industry. We | |
| believe a single application alone must be a sufficient motivator. | | believe a single application alone must be a sufficient motivator. | |
| | | | |
| skipping to change at page 63, line 10 | | skipping to change at page 63, line 32 | |
| (policing) congestion control. But policing is only truly effective | | (policing) congestion control. But policing is only truly effective | |
| at the first ingress into an internetwork, whereas path congestion | | at the first ingress into an internetwork, whereas path congestion | |
| was previously only visible at the last egress. So, re-ECN | | was previously only visible at the last egress. So, re-ECN | |
| democratises congestion information. Then the choice over who | | democratises congestion information. Then the choice over who | |
| actually controls congestion can be made at run-time, not design | | actually controls congestion can be made at run-time, not design | |
| time---a bit like an aircraft with dual controls. And different | | time---a bit like an aircraft with dual controls. And different | |
| operators can make different choices. We believe non-architectural | | operators can make different choices. We believe non-architectural | |
| approaches to this problem are unlikely to offer more than partial | | approaches to this problem are unlikely to offer more than partial | |
| solutions (see Section 9). | | solutions (see Section 9). | |
| | | | |
|
| Importantly, re-ECN does NOT REQUIRE assumptions about specific | | Importantly, re-ECN does not require assumptions about specific | |
| congestion responses to be embedded in any network elements, except | | congestion responses to be embedded in any network elements, except | |
| at the first ingress to the internetwork if that level of control is | | at the first ingress to the internetwork if that level of control is | |
| desired by the ingress operator. But such tight policing will be a | | desired by the ingress operator. But such tight policing will be a | |
| matter of agreement between the source and its access network | | matter of agreement between the source and its access network | |
| operator. The ingress operator need not police congestion response | | operator. The ingress operator need not police congestion response | |
| at flow granularity; it can simply hold a source responsible for the | | at flow granularity; it can simply hold a source responsible for the | |
| aggregate congestion it causes, perhaps keeping it within a monthly | | aggregate congestion it causes, perhaps keeping it within a monthly | |
| congestion quota. Or if the ingress network trusts the source, it | | congestion quota. Or if the ingress network trusts the source, it | |
| can do nothing. | | can do nothing. | |
| | | | |
| | | | |
| skipping to change at page 66, line 28 | | skipping to change at page 67, line 7 | |
| declare path congestion to the network and it can remove traffic at | | declare path congestion to the network and it can remove traffic at | |
| the egress if this declaration is dishonest. So it can police | | the egress if this declaration is dishonest. So it can police | |
| correctly, irrespective of whether the receiver tries to suppress | | correctly, irrespective of whether the receiver tries to suppress | |
| congestion feedback or whether the sender ignores genuine congestion | | congestion feedback or whether the sender ignores genuine congestion | |
| feedback. Therefore the re-ECN protocol addresses a much wider range | | feedback. Therefore the re-ECN protocol addresses a much wider range | |
| of cheating problems, which includes the one addressed by the ECN | | of cheating problems, which includes the one addressed by the ECN | |
| nonce. | | nonce. | |
| | | | |
| 9.3. Identifying Upstream and Downstream Congestion | | 9.3. Identifying Upstream and Downstream Congestion | |
| | | | |
|
|