| draft-briscoe-tsvwg-re-ecn-tcp-03.txt | draft-briscoe-tsvwg-re-ecn-tcp-04.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
| Intended status: Informational A. Jacquet | Intended status: Standards Track A. Jacquet | |||
| Expires: April 26, 2007 A. Salvatori | Expires: January 10, 2008 A. Salvatori | |||
| M. Koyabe | M. Koyabe | |||
| T. Moncaster | ||||
| BT | BT | |||
| October 23, 2006 | July 09, 2007 | |||
| Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | |||
| draft-briscoe-tsvwg-re-ecn-tcp-03 | draft-briscoe-tsvwg-re-ecn-tcp-04 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 37 | skipping to change at page 1, line 38 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on April 26, 2007. | This Internet-Draft will expire on January 10, 2008. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2006). | Copyright (C) The IETF Trust (2007). | |||
| Abstract | Abstract | |||
| This document introduces a new protocol for explicit congestion | This document introduces a new protocol for explicit congestion | |||
| notification (ECN), termed re-ECN, which can be deployed | notification (ECN), termed re-ECN, which can be deployed | |||
| incrementally around unmodified routers. The protocol arranges an | incrementally around unmodified routers. The protocol arranges an | |||
| extended ECN field in each packet so that, as it crosses any | extended ECN field in each packet so that, as it crosses any | |||
| interface in an internetwork, it will carry a truthful prediction of | interface in an internetwork, it will carry a truthful prediction of | |||
| congestion on the remainder of its path. Then the upstream party at | congestion on the remainder of its path. Then the upstream party at | |||
| any trust boundary in the internetwork can be held responsible for | any trust boundary in the internetwork can be held responsible for | |||
| skipping to change at page 2, line 21 | skipping to change at page 2, line 22 | |||
| changes required to transport protocols. It includes the changes | changes required to transport protocols. It includes the changes | |||
| required to TCP both as an example and as a specification. It also | required to TCP both as an example and as a specification. It also | |||
| gives examples of mechanisms that can use the protocol to ensure data | gives examples of mechanisms that can use the protocol to ensure data | |||
| sources respond correctly to congestion. And it describes example | sources respond correctly to congestion. And it describes example | |||
| mechanisms that ensure the dominant selfish strategy of both network | mechanisms that ensure the dominant selfish strategy of both network | |||
| domains and end-points will be to set the extended ECN field | domains and end-points will be to set the extended ECN field | |||
| honestly. | honestly. | |||
| Authors' Statement: Status (to be removed by the RFC Editor) | Authors' Statement: Status (to be removed by the RFC Editor) | |||
| This document is posted as an Internet-Draft with the intent (at | ||||
| least that of the authors) to eventually progress to standards track. | ||||
| Although the re-ECN protocol is intended to make a simple but far- | Although the re-ECN protocol is intended to make a simple but far- | |||
| reaching change to the Internet architecture, the most immediate | reaching change to the Internet architecture, the most immediate | |||
| priority for the authors is to delay any move of the ECN nonce to | priority for the authors is to delay any move of the ECN nonce to | |||
| Proposed Standard status. The argument for this position is | Proposed Standard status. The argument for this position is | |||
| developed in Appendix I. | developed in Appendix I. | |||
| Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
| From -00 to -01: | Full diffs created using the rfcdiff tool are available at | |||
| <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> | ||||
| Encoding of re-ECN wire protocol changed for reasons given in | From -03 to -04 (current version): | |||
| Appendix B and consequently draft substantially re-written. | ||||
| Substantial text added in sections on applications, incremental | Clarified reasons for holding back ECN nonce (Section 3.2 & | |||
| deployment, architectural rationale and security considerations. | Appendix I). | |||
| Clarified Figure 1. | ||||
| Added Section 4.1.1.1 on equivalence of drops and ECN marks. | ||||
| Improved precision of Section 5.6 on IP in IP tunnels. | ||||
| Explained that RTT fairness is possible to enforce, but unlikely to | ||||
| be required (Section 6.1.3 & Appendix F). | ||||
| Explained that bulk per-user policing should be adequate but per- | ||||
| flow policing is also possible if desired, though it is not likely | ||||
| to be necessary (Section 6.1.5 & Appendix G). | ||||
| Reinforced need for passive policing at inter-domain borders to | ||||
| enable all-optical networking (Section 6.1.6). | ||||
| Minor editorial changes throughout. | ||||
| From -02 to -03: | ||||
| Started guidelines for re-ECN support in DCCP and SCTP. | ||||
| Added annex on limitations of nonce mechanism. | ||||
| Minor editorial changes throughout. | ||||
| From -01 to -02: | From -01 to -02: | |||
| Explanation on informal terminology in Section 3.4 clarified. | Explanation on informal terminology in Section 3.4 clarified. | |||
| IPv6 wire protocol encoding added (Section 5.2). | IPv6 wire protocol encoding added (Section 5.2). | |||
| Text on (non-)issues with tunnels, encryption and link layer | Text on (non-)issues with tunnels, encryption and link layer | |||
| congestion notification added (Section 5.6 & Section 5.7). | congestion notification added (Section 5.6 & Section 5.7). | |||
| Section added giving evolvability arguments against encouraging | Section added giving evolvability arguments against encouraging | |||
| bottleneck policing (Section 6.1.2). And text on re-ECN's | bottleneck policing (Section 6.1.2). And text on re-ECN's | |||
| evolvability by design added to Section 6.1.3 | evolvability by design added to Section 6.1.3 | |||
| Text on inter-domain policing (Section 6.1.6) and inter-domain | Text on inter-domain policing (Section 6.1.6) and inter-domain | |||
| fail-safes (Section 6.1.7) added. | fail-safes (Section 6.1.7) added. | |||
| From -02 to -03: | From -00 to -01: | |||
| Started guidelines for re-ECN support in DCCP and SCTP. | ||||
| Added annex on limitations of nonce mechanism. | Encoding of re-ECN wire protocol changed for reasons given in | |||
| Appendix B and consequently draft substantially re-written. | ||||
| Minor editorial changes throughout. | Substantial text added in sections on applications, incremental | |||
| deployment, architectural rationale and security considerations. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 | 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 | |||
| 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | |||
| 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | |||
| v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 | 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 11 | |||
| 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12 | 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 13 | |||
| 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
| 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 | 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 | |||
| 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or | 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or | |||
| Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18 | Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 20 | |||
| 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20 | 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21 | |||
| 4.1.4. Extended ECN (EECN) Field Settings during Flow | 4.1.4. Extended ECN (EECN) Field Settings during Flow | |||
| Start or after Idle Periods . . . . . . . . . . . . . 21 | Start or after Idle Periods . . . . . . . . . . . . . 23 | |||
| 4.1.5. Pure ACKS, Retransmissions, Window Probes and | 4.1.5. Pure ACKS, Retransmissions, Window Probes and | |||
| Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25 | Partial ACKs . . . . . . . . . . . . . . . . . . . . . 26 | |||
| 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26 | 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27 | |||
| 4.2.1. General Guidelines for Adding Re-ECN to Other | 4.2.1. General Guidelines for Adding Re-ECN to Other | |||
| Transports . . . . . . . . . . . . . . . . . . . . . . 26 | Transports . . . . . . . . . . . . . . . . . . . . . . 27 | |||
| 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 26 | 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28 | |||
| 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 27 | 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28 | |||
| 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 27 | 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 28 | |||
| 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 27 | 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 28 | |||
| 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 27 | 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 28 | |||
| 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28 | 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30 | |||
| 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 30 | 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31 | |||
| 5.4. Justification for Setting the First SYN to FNE . . . . . . 31 | 5.4. Justification for Setting the First SYN to FNE . . . . . . 32 | |||
| 5.5. Control and Management . . . . . . . . . . . . . . . . . . 32 | 5.5. Control and Management . . . . . . . . . . . . . . . . . . 33 | |||
| 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 32 | 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 33 | |||
| 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 33 | 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 34 | |||
| 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 33 | 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 34 | |||
| 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 34 | 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
| 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 35 | 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 36 | |||
| 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 35 | 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 36 | |||
| 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 35 | 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 36 | |||
| 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 36 | 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 37 | |||
| 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 37 | 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 38 | |||
| 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 44 | 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 45 | |||
| 6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 45 | 6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47 | |||
| 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 47 | 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 48 | |||
| 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 51 | 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52 | |||
| 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51 | 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53 | |||
| 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51 | 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53 | |||
| 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 52 | 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53 | |||
| 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 53 | 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54 | |||
| 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 53 | 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 54 | |||
| 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53 | 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 54 | |||
| 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53 | 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 54 | |||
| 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 54 | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 55 | |||
| 7.1. Incremental Deployment Features . . . . . . . . . . . . . 54 | 7.1. Incremental Deployment Features . . . . . . . . . . . . . 55 | |||
| 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55 | 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57 | |||
| 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60 | 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 61 | |||
| 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 63 | 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 64 | |||
| 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 63 | 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 64 | |||
| 9.2. Congestion Notification Integrity . . . . . . . . . . . . 63 | 9.2. Congestion Notification Integrity . . . . . . . . . . . . 65 | |||
| 9.3. Identifying Upstream and Downstream Congestion . . . . . . 64 | 9.3. Identifying Upstream and Downstream Congestion . . . . . . 66 | |||
| 10. Security Considerations . . . . . . . . . . . . . . . . . . . 65 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 66 | |||
| 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66 | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68 | |||
| 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 67 | 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 68 | |||
| 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 67 | 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 68 | |||
| 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 67 | 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69 | |||
| 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67 | 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 69 | |||
| 15.1. Normative References . . . . . . . . . . . . . . . . . . . 67 | 15.1. Normative References . . . . . . . . . . . . . . . . . . . 69 | |||
| 15.2. Informative References . . . . . . . . . . . . . . . . . . 68 | 15.2. Informative References . . . . . . . . . . . . . . . . . . 70 | |||
| Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 71 | Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 73 | |||
| Appendix B. Justification for Two Codepoints Signifying Zero | Appendix B. Justification for Two Codepoints Signifying Zero | |||
| Worth Packets . . . . . . . . . . . . . . . . . . . . 72 | Worth Packets . . . . . . . . . . . . . . . . . . . . 74 | |||
| Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 74 | Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76 | |||
| Appendix D. Packet Marking During Flow Start . . . . . . . . . . 75 | Appendix D. Packet Marking During Flow Start . . . . . . . . . . 77 | |||
| Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 75 | Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 77 | |||
| Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 75 | Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 77 | |||
| Appendix G. Policer Designs to ensure Congestion | Appendix G. Policer Designs to ensure Congestion | |||
| Responsiveness . . . . . . . . . . . . . . . . . . . 76 | Responsiveness . . . . . . . . . . . . . . . . . . . 78 | |||
| G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 76 | G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 78 | |||
| G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 77 | G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 79 | |||
| Appendix H. Downstream Congestion Metering Algorithms . . . . . . 80 | Appendix H. Downstream Congestion Metering Algorithms . . . . . . 82 | |||
| H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 80 | H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 82 | |||
| H.2. Inflation Factor for Persistently Negative Flows . . . . . 80 | H.2. Inflation Factor for Persistently Negative Flows . . . . . 83 | |||
| Appendix I. Argument for holding back the ECN nonce . . . . . . . 81 | Appendix I. Argument for holding back the ECN nonce . . . . . . . 84 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 83 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 85 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 85 | Intellectual Property and Copyright Statements . . . . . . . . . . 88 | |||
| 1. Introduction | 1. Introduction | |||
| This document aims: | This document aims: | |||
| o To provide a complete specification of the addition of the re-ECN | o To provide a complete specification of the addition of the re-ECN | |||
| protocol to IP and guidelines on how to add it to transport layer | protocol to IP and guidelines on how to add it to transport layer | |||
| protocols, including a complete specification of re-ECN in TCP as | protocols, including a complete specification of re-ECN in TCP as | |||
| an example; | an example; | |||
| skipping to change at page 7, line 32 | skipping to change at page 7, line 32 | |||
| This document is structured as follows. First an overview of the re- | This document is structured as follows. First an overview of the re- | |||
| ECN protocol is given (Section 3), outlining its attributes and | ECN protocol is given (Section 3), outlining its attributes and | |||
| explaining conceptually how it works as a whole. The two main parts | explaining conceptually how it works as a whole. The two main parts | |||
| of the document follow, as described above. That is, the protocol | of the document follow, as described above. That is, the protocol | |||
| specification divided into transport (Section 4) and network | specification divided into transport (Section 4) and network | |||
| (Section 5) layers, then the applications it can be put to, such as | (Section 5) layers, then the applications it can be put to, such as | |||
| policing DDoS, QoS and congestion control (Section 6). Although | policing DDoS, QoS and congestion control (Section 6). Although | |||
| these applications do not require standardisation themselves, they | these applications do not require standardisation themselves, they | |||
| are described in a fair degree of detail in order to explain how re- | are described in a fair degree of detail in order to explain how re- | |||
| ECN can be used. Given, re-ECN proposes to use the last undefined | ECN can be used. Given re-ECN proposes to use the last undefined bit | |||
| bit in the IPv4 header, we felt it necessary to outline the potential | in the IPv4 header, we felt it necessary to outline the potential | |||
| that re-ECN could release in return for being given that bit. | that re-ECN could release in return for being given that bit. | |||
| Deployment issues discussed throughout the document are brought | Deployment issues discussed throughout the document are brought | |||
| together in Section 7, which is followed by a brief section | together in Section 7, which is followed by a brief section | |||
| explaining the somewhat subtle rationale for the design from an | explaining the somewhat subtle rationale for the design from an | |||
| architectural perspective (Section 8). We end by describing related | architectural perspective (Section 8). We end by describing related | |||
| work (Section 9), listing security considerations (Section 10) and | work (Section 9), listing security considerations (Section 10) and | |||
| finally drawing conclusions (Section 12). | finally drawing conclusions (Section 12). | |||
| 2. Requirements notation | 2. Requirements notation | |||
| skipping to change at page 8, line 49 | skipping to change at page 8, line 49 | |||
| congestion feedback. But Section 9.2 explains that it still gives no | congestion feedback. But Section 9.2 explains that it still gives no | |||
| control over how fast the sender transmits as a result of the | control over how fast the sender transmits as a result of the | |||
| feedback. On the other hand, re-ECN is designed both to ensure that | feedback. On the other hand, re-ECN is designed both to ensure that | |||
| congestion is declared honestly and that the sender's rate responds | congestion is declared honestly and that the sender's rate responds | |||
| appropriately. | appropriately. | |||
| Re-ECN is based on a feedback arrangement called `re- | Re-ECN is based on a feedback arrangement called `re- | |||
| feedback' [Re-fb]. The word is short for either receiver-aligned, | feedback' [Re-fb]. The word is short for either receiver-aligned, | |||
| re-inserted or re-echoed feedback. But it actually works even when | re-inserted or re-echoed feedback. But it actually works even when | |||
| no feedback is available. In fact it has been carefully designed to | no feedback is available. In fact it has been carefully designed to | |||
| work for single datagram flows. Indeed, it even encourages | work for single datagram flows. It also encourages aggregation of | |||
| aggregation of single packet flows by congestion control proxies. | single packet flows by congestion control proxies. Then, even if the | |||
| traffic mix of the Internet were to become dominated by short | ||||
| Then, even if the traffic mix of the Internet were to become | messages, it would still be possible to control congestion | |||
| dominated by short messages, it would still be possible to control | effectively and efficiently. | |||
| congestion effectively and efficiently. | ||||
| Changing the Internet's feedback architecture seems to imply | Changing the Internet's feedback architecture seems to imply | |||
| considerable upheaval. But re-ECN can be deployed incrementally at | considerable upheaval. But re-ECN can be deployed incrementally at | |||
| the transport layer around unmodified routers using existing fields | the transport layer around unmodified routers using existing fields | |||
| in IP (v4 or v6). However it does also require the last undefined | in IP (v4 or v6). However it does also require the last undefined | |||
| bit in the IPv4 header, which it uses in combination with the 2-bit | bit in the IPv4 header, which it uses in combination with the 2-bit | |||
| ECN field to create four new codepoints. Nonetheless, changes to IP | ECN field to create four new codepoints. Nonetheless, changes to IP | |||
| routers are RECOMMENDED in order to improve resilience against DoS | routers are RECOMMENDED in order to improve resilience against DoS | |||
| attacks. Similarly, re-ECN works best if both the sender and | attacks. Similarly, re-ECN works best if both the sender and | |||
| receiver transports are re-ECN-capable, but it can work with just | receiver transports are re-ECN-capable, but it can work with just | |||
| skipping to change at page 10, line 13 | skipping to change at page 10, line 13 | |||
| be defined in another specification (e.g. [Re-PCN]). | be defined in another specification (e.g. [Re-PCN]). | |||
| Although the RE flag is a separate, single bit field, it can be read | Although the RE flag is a separate, single bit field, it can be read | |||
| as an extension to the two-bit ECN field; the three concatenated bits | as an extension to the two-bit ECN field; the three concatenated bits | |||
| in what we will call the extended ECN field (EECN) making eight | in what we will call the extended ECN field (EECN) making eight | |||
| codepoints. We will use the RFC3168 names of the ECN codepoints to | codepoints. We will use the RFC3168 names of the ECN codepoints to | |||
| describe settings of the ECN field when the RE flag setting is "don't | describe settings of the ECN field when the RE flag setting is "don't | |||
| care", but we also define the following six extended ECN codepoint | care", but we also define the following six extended ECN codepoint | |||
| names for when we need to be more specific. | names for when we need to be more specific. | |||
| RFC3168 ECN defines uses for all four codepoints of the two-bit ECN | ||||
| field. This memo widens the codepoint space to eight, and uses six | ||||
| codepoints. One of re-ECN's codepoints is an alternative use of the | ||||
| codepoint set aside in RFC3168 for the ECN nonce (ECT(1)). | ||||
| Transports not using re-ECN can still use the ECN nonce, while those | ||||
| using re-ECN do not need to as long as the sender is also checking | ||||
| for transport protocol compliance [I-D.moncaster-tcpm-rcv-cheat]. | ||||
| The case for doing this is given in Appendix I. Two re-ECN | ||||
| codepoints are given uses compatible with those defined in RFC3168 | ||||
| (Not-ECT and CE). The other codepoint used by RFC3168 (ECT(0)) isn't | ||||
| used for re-ECN. Altogether this leaves one codepoint of the eight | ||||
| unused and available for future use. | ||||
| +-------+------------+------+--------------+------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
| | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | |||
| | field | codepoint | flag | codepoint | | | | field | codepoint | flag | codepoint | | | |||
| +-------+------------+------+--------------+------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
| | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | |||
| | | | | | transport | | | | | | | transport | | |||
| | 00 | Not-ECT | 1 | FNE | Feedback not | | | 00 | Not-ECT | 1 | FNE | Feedback not | | |||
| | | | | | established | | | | | | | established | | |||
| | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | |||
| | | | | | and RECT | | | | | | | and RECT | | |||
| skipping to change at page 11, line 19 | skipping to change at page 12, line 5 | |||
| re-ECN sender will clear the RE flag to "0" in the next packet it | re-ECN sender will clear the RE flag to "0" in the next packet it | |||
| sends. | sends. | |||
| We chose to set and clear the RE flag this way round to ease | We chose to set and clear the RE flag this way round to ease | |||
| incremental deployment (see Section 7.1). To avoid confusion we will | incremental deployment (see Section 7.1). To avoid confusion we will | |||
| use the term `blanking' (rather than marking) when the RE flag is | use the term `blanking' (rather than marking) when the RE flag is | |||
| cleared to "0". So, over a stream of packets, we will talk of the | cleared to "0". So, over a stream of packets, we will talk of the | |||
| `RE blanking fraction' as the fraction of octets in packets with the | `RE blanking fraction' as the fraction of octets in packets with the | |||
| RE flag cleared to "0". | RE flag cleared to "0". | |||
| [Figure 1 plot: RE blanking fraction (level at 3%) and CE marking fraction (stepping up at resources 0 and i) against resource index; observation points 0, 1, 2; marking fractions 1.00% and 2.00%] | [Figure 1 plot, redrawn: nodes S, 0, i and D drawn above the same plot of RE blanking fraction and CE marking fraction against resource index; observation points 0, 1, 2; marking fractions 1.00% and 2.00%] | |||
| Figure 1: A 2-Router Example (Imprecise) | Figure 1: A 2-Router Example (Imprecise) | |||
| Figure 1 uses the two router example introduced earlier to illustrate | Figure 1 uses a simple network to illustrate how re-ECN allows | |||
| why re-ECN allows routers to measure downstream congestion. The | routers to measure downstream congestion. The horizontal axis | |||
| horizontal axis represents the index of each congestible resource | represents the index of each congestible resource (typically queues) | |||
| (typically queues) along a path through the Internet. There may be | along a path through the Internet. There may be many routers on the | |||
| many routers on the path, but we assume only two are currently | path, but we assume only two are currently congested (those with | |||
| congested (those with resource index 0 and i). The two superimposed | resource index 0 and i). The two superimposed plots show the | |||
| plots show the fraction of each extended ECN codepoint in a flow | fraction of each extended ECN codepoint in a flow observed along this | |||
| observed along this path. Given about 3% of packets reaching the | path. Given about 3% of packets reaching the destination are marked | |||
| destination are marked CE, in response to feedback the sender will | CE, in response to feedback the sender will blank the RE flag in | |||
| blank the RE flag in about 3% of packets it sends. Then approximate | about 3% of packets it sends. Then approximate downstream congestion | |||
| downstream congestion can be measured at the observation points shown | can be measured at the observation points shown along the path by | |||
| along the path by subtracting the CE marking fraction from the RE | subtracting the CE marking fraction from the RE blanking fraction, as | |||
| blanking fraction, as shown in the table below (Appendix A derives | shown in the table below (Appendix A derives these approximations | |||
| these approximations from a precise analysis). | from a precise analysis). | |||
| +-------------------+------------------------------+ | +-------------------+------------------------------+ | |||
| | Observation point | Approx downstream congestion | | | Observation point | Approx downstream congestion | | |||
| +-------------------+------------------------------+ | +-------------------+------------------------------+ | |||
| | 0 | 3% - 0% = 3% | | | 0 | 3% - 0% = 3% | | |||
| | 1 | 3% - 1% = 2% | | | 1 | 3% - 1% = 2% | | |||
| | 2 | 3% - 3% = 0% | | | 2 | 3% - 3% = 0% | | |||
| +-------------------+------------------------------+ | +-------------------+------------------------------+ | |||
| Table 2: Downstream Congestion Measured at Example Observation Points | Table 2: Downstream Congestion Measured at Example Observation Points | |||
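
The subtraction behind Table 2 can be sketched in a few lines of Python. This is only an illustration of the approximate rule (RE blanking fraction minus CE marking fraction, both measured in octets); the function name and the `(octets, re_blanked, ce_marked)` tuple layout are assumptions made for this sketch and do not come from the draft.

```python
from fractions import Fraction


def downstream_congestion(packets):
    """Approximate downstream congestion seen at one observation point.

    packets: iterable of (octets, re_blanked, ce_marked) tuples
    (a hypothetical representation chosen for this sketch).
    Returns the RE blanking fraction minus the CE marking fraction,
    both weighted by octets.
    """
    total = blanked = marked = 0
    for octets, re_blanked, ce_marked in packets:
        total += octets
        if re_blanked:
            blanked += octets
        if ce_marked:
            marked += octets
    if total == 0:
        return Fraction(0)
    return Fraction(blanked, total) - Fraction(marked, total)


# Observation point 1 in the example: 3% of octets have RE blanked,
# 1% have been CE-marked so far.
at_point_1 = [(1000, i < 3, i == 50) for i in range(100)]
print(downstream_congestion(at_point_1))  # 1/50, i.e. 2%
```

Exact fractions are used here only to avoid floating-point noise in the example; a real meter would more likely keep running octet counters.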
| skipping to change at page 16, line 12 | skipping to change at page 17, line 6 | |||
| be in RECN mode, at least not until it has confirmed that the other | be in RECN mode, at least not until it has confirmed that the other | |||
| host is Re-ECT. | host is Re-ECT. | |||
| 4.1.1. RECN mode: Full re-ECN capable transport | 4.1.1. RECN mode: Full re-ECN capable transport | |||
| In full RECN mode, for each half connection, both the sender and the | In full RECN mode, for each half connection, both the sender and the | |||
| receiver each maintain an unsigned integer counter we will call ECC | receiver each maintain an unsigned integer counter we will call ECC | |||
| (echo congestion counter). The receiver maintains a count, modulo 8, | (echo congestion counter). The receiver maintains a count, modulo 8, | |||
| of how many times a CE marked packet has arrived during the half- | of how many times a CE marked packet has arrived during the half- | |||
| connection. Once a RECN connection is established, the three TCP | connection. Once a RECN connection is established, the three TCP | |||
| option flags (ECE, CWR & NS) used for ECN-related functions in | option flags (ECE, CWR & NS) used for ECN-related functions in other | |||
| previous versions of ECN are used as a 3-bit field for the receiver | versions of ECN are used as a 3-bit field for the receiver to | |||
| to repeatedly tell the sender the current value of ECC whenever it | repeatedly tell the sender the current value of ECC whenever it sends | |||
| sends a TCP ACK. We will call this the echo congestion increment | a TCP ACK. We will call this the echo congestion increment (ECI) | |||
| (ECI) field. This overloaded use of these 3 option flags as one | field. This overloaded use of these 3 option flags as one 3-bit ECI | |||
| 3-bit ECI field is shown in Figure 4. The actual definition of the | field is shown in Figure 4. The actual definition of the TCP header, | |||
| TCP header, including the addition of support for the ECN nonce, is | including the addition of support for the ECN nonce, is shown for | |||
| shown for comparison in Figure 3. This specification does not | comparison in Figure 3. This specification does not redefine the | |||
| redefine the names of these three TCP option flags; it merely | names of these three TCP option flags; it merely overloads them with | |||
| overloads them with another definition once a flow is established. | another definition once a flow is established. | |||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | N | C | E | U | A | P | R | S | F | | | | | N | C | E | U | A | P | R | S | F | | |||
| | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | |||
| | | | | R | E | G | K | H | T | N | N | | | | | | R | E | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 3: The (post-ECN Nonce) definition of bytes 13 and 14 of the | Figure 3: The (post-ECN Nonce) definition of bytes 13 and 14 of the | |||
| TCP Header | TCP Header | |||
| skipping to change at page 17, line 20 | skipping to change at page 18, line 16 | |||
| delayed-ACK, which would be necessary if ACK-withholding were | delayed-ACK, which would be necessary if ACK-withholding were | |||
| implemented. | implemented. | |||
| Sender Action in RECN Mode | Sender Action in RECN Mode | |||
| On the arrival of every ACK, the sender compares the ECI field | On the arrival of every ACK, the sender compares the ECI field | |||
| with its own ECC value, then replaces its local value with that | with its own ECC value, then replaces its local value with that | |||
| from the ACK. The difference D is assumed to be the number of CE | from the ACK. The difference D is assumed to be the number of CE | |||
| marked packets that arrived at the receiver since it sent the | marked packets that arrived at the receiver since it sent the | |||
| previously received ACK (but see below for the sender's safety | previously received ACK (but see below for the sender's safety | |||
| strategy). Whenever the ECI field increments by D (or D drops are | strategy). Whenever the ECI field increments by D (and/or d drops | |||
| detected), the sender MUST clear the RE flag to "0" in the IP | are detected), the sender MUST clear the RE flag to "0" in the IP | |||
| header of the next D data packets it sends, effectively re-echoing | header of the next D' data packets it sends (where D' = D + d), | |||
| each single increment of ECI. Otherwise the data sender MUST send | effectively re-echoing each single increment of ECI. Otherwise | |||
| all data packets with RE set to "1". | the data sender MUST send all data packets with RE set to "1". | |||
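
As a rough illustration of the sender rule above, the following Python sketch tracks the local ECC value, computes the modulo-8 difference D on each ACK, and blanks RE on the next D' = D + d data packets. The class and method names are hypothetical, and the sketch deliberately omits the sender's safety strategy against long ACK loss sequences and the flow-start codepoints.

```python
class ReEcnSender:
    """Minimal model of RE flag handling in RECN mode (illustrative only)."""

    def __init__(self):
        self.ecc = 0            # sender's copy of the echo congestion counter
        self.pending_blank = 0  # data packets still to be sent with RE = 0

    def on_ack(self, eci, drops=0):
        """Process the 3-bit ECI field of an ACK plus any drops detected.

        Returns D, the assumed number of new CE marks at the receiver.
        """
        d_marks = (eci - self.ecc) % 8   # ECI is a modulo-8 counter
        self.ecc = eci                   # replace local value with the ACK's
        self.pending_blank += d_marks + drops   # D' = D + d
        return d_marks

    def next_re_flag(self):
        """RE flag for the next data packet: 0 (blanked) or 1 (set)."""
        if self.pending_blank > 0:
            self.pending_blank -= 1
            return 0
        return 1


sender = ReEcnSender()
sender.on_ack(eci=2)                              # two new CE marks echoed
print([sender.next_re_flag() for _ in range(4)])  # [0, 0, 1, 1]
```

The modulo arithmetic means a wrap from ECC = 7 to ECI = 1 is read as an increment of 2, which is why the text treats D as an assumption rather than a certainty.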
| As a general rule, once a flow is established, as well as setting | As a general rule, once a flow is established, as well as setting | |||
| or clearing the RE flag as above, a data sender in RECN mode MUST | or clearing the RE flag as above, a data sender in RECN mode MUST | |||
| always set the ECN field to ECT(1). However, the settings of the | always set the ECN field to ECT(1). However, the settings of the | |||
| extended ECN field during flow start are defined in Section 4.1.4. | extended ECN field during flow start are defined in Section 4.1.4. | |||
| As we have already emphasised, the re-ECN protocol makes no | As we have already emphasised, the re-ECN protocol makes no | |||
| changes to and has no effect on the TCP congestion control algorithm. | changes to and has no effect on the TCP congestion control algorithm. | |||
| So, each increment of ECI (or detection of a drop) also triggers | So, each increment of ECI (or detection of a drop) also triggers | |||
| the standard TCP congestion response, but with no more than one | the standard TCP congestion response, but with no more than one | |||
| skipping to change at page 18, line 5 | skipping to change at page 18, line 43 | |||
| A TCP sender also acts as the receiver for the other half- | A TCP sender also acts as the receiver for the other half- | |||
| connection. The host will maintain two ECC values S.ECC and R.ECC | connection. The host will maintain two ECC values S.ECC and R.ECC | |||
| as sender and receiver respectively. Every TCP header sent by a | as sender and receiver respectively. Every TCP header sent by a | |||
| host in RECN mode will also repeat the prevailing value of R.ECC | host in RECN mode will also repeat the prevailing value of R.ECC | |||
| in its ECI field. If a sender in RECN mode has to retransmit a | in its ECI field. If a sender in RECN mode has to retransmit a | |||
| packet due to a suspected loss, the re-transmitted packet MUST | packet due to a suspected loss, the re-transmitted packet MUST | |||
| carry the latest prevailing value of R.ECC when it is re- | carry the latest prevailing value of R.ECC when it is re- | |||
| transmitted, which will not necessarily be the one it carried | transmitted, which will not necessarily be the one it carried | |||
| originally. | originally. | |||
| 4.1.1.1. Safety against Long Pure ACK Loss Sequences | 4.1.1.1. Drops and Marks | |||
| Re-ECN is based on the ECN protocol [RFC3168] which in turn is | ||||
| typically based on the RED algorithm [RFC2309]. This algorithm marks | ||||
| packets as CE with a probability that increases as the size of the | ||||
| router queue increases. However, if the queue becomes too full then it | ||||
| will revert to dropping packets. Because of this it is important | ||||
| that re-ECN treats each packet drop it detects as if it were actually | ||||
| a CE mark. This ensures that it can continue to correctly echo | ||||
| congestion even through a highly congested path. | ||||
| In order to ensure that drops are correctly echoed the sender needs | ||||
| to add the number of drops detected per RTT to the difference in ECI | ||||
| value waiting to be echoed. A drop is defined as set out in | ||||
| [RFC2581] -- if the connection is in slow start then a single | ||||
| duplicate acknowledgement will be treated as an indication of a drop. | ||||
| When the system is in the congestion avoidance stage then 3 duplicate | ||||
| acknowledgements will be treated as a sign of a drop. In all cases, | ||||
| if a re-transmission time-out occurs then that will be treated as a | ||||
| drop. | ||||
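
The drop definition given in this subsection can be written as a small predicate. The sketch below simply restates the three conditions in the text (one duplicate acknowledgement in slow start, three in congestion avoidance, or a retransmission time-out); the function and parameter names are assumptions for illustration.

```python
def is_drop(in_slow_start, duplicate_acks, rto_expired):
    """Return True if the sender should treat the event as a packet drop.

    Follows the definition in the text above:
    - a retransmission time-out always counts as a drop;
    - in slow start, a single duplicate acknowledgement counts;
    - in congestion avoidance, three duplicate acknowledgements count.
    """
    if rto_expired:
        return True
    threshold = 1 if in_slow_start else 3
    return duplicate_acks >= threshold


print(is_drop(in_slow_start=True, duplicate_acks=1, rto_expired=False))   # True
print(is_drop(in_slow_start=False, duplicate_acks=2, rto_expired=False))  # False
```

Each event for which this returns True would add one to the count of RE flags waiting to be blanked, exactly as a CE mark would.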
| 4.1.1.2. Safety against Long Pure ACK Loss Sequences | ||||
| The ECI method was chosen for echoing congestion marking because a | The ECI method was chosen for echoing congestion marking because a | |||
| re-ECN sender needs to know about every CE mark arriving at the | re-ECN sender needs to know about every CE mark arriving at the | |||
| receiver, not just whether at least one arrives within a round trip | receiver, not just whether at least one arrives within a round trip | |||
| time (which is all the ECE/CWR mechanism supported). And, as pure | time (which is all the ECE/CWR mechanism supported). And, as pure | |||
| ACKs are not protected by TCP reliable delivery, we repeat the same | ACKs are not protected by TCP reliable delivery, we repeat the same | |||
| ECI value in every ACK until it changes. Even if many ACKs in a row | ECI value in every ACK until it changes. Even if many ACKs in a row | |||
| are lost, as soon as one gets through, the ECI field it repeats from | are lost, as soon as one gets through, the ECI field it repeats from | |||
| previous ACKs that didn't get through will update the sender on how | previous ACKs that didn't get through will update the sender on how | |||
| many CE marks arrived since the last ACK got through. | many CE marks arrived since the last ACK got through. | |||
| skipping to change at page 22, line 24 | skipping to change at page 23, line 36 | |||
| means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | means that Re-ECT server B MUST set FNE on a SYN ACK whether it is | |||
| responding to a SYN from a Re-ECT client or from a client that is | responding to a SYN from a Re-ECT client or from a client that is | |||
| merely ECN-capable. | merely ECN-capable. | |||
| The original ECN specification [RFC3168] required SYNs and SYN ACKs | The original ECN specification [RFC3168] required SYNs and SYN ACKs | |||
| to use the Not-ECT codepoint of the ECN field. The aim was to | to use the Not-ECT codepoint of the ECN field. The aim was to | |||
| prevent well-known DoS attacks such as SYN flooding being able to | prevent well-known DoS attacks such as SYN flooding being able to | |||
| gain from the advantage that ECN capability afforded over drop at | gain from the advantage that ECN capability afforded over drop at | |||
| ECN-capable routers. | ECN-capable routers. | |||
| For a SYN ACK, Kuzmanovic [I-D.ietf-tsvwg-ecnsyn] has shown that this | For a SYN ACK, Kuzmanovic [I-D.ietf-tcpm-ecnsyn] has shown that this | |||
| caution was unnecessary, and proposes to allow a SYN ACK to be ECN- | caution was unnecessary, and proposes to allow a SYN ACK to be ECN- | |||
| capable to improve performance. We have gone further by proposing to | capable to improve performance. We have gone further by proposing to | |||
| make the initial SYN ECN-capable too. By stipulating the FNE | make the initial SYN ECN-capable too. By stipulating the FNE | |||
| codepoint for the initial SYN, we comply with RFC3168 in word but not | codepoint for the initial SYN, we comply with RFC3168 in word but not | |||
| in spirit, because we have indeed set the ECN field to Not-ECT, but | in spirit, because we have indeed set the ECN field to Not-ECT, but | |||
| we have extended the ECN field with another bit. And it will be seen | we have extended the ECN field with another bit. And it will be seen | |||
| (Section 5.3) that we have defined one setting of that bit to mean an | (Section 5.3) that we have defined one setting of that bit to mean an | |||
| ECN-capable transport. Therefore, by proposing that the FNE | ECN-capable transport. Therefore, by proposing that the FNE | |||
| codepoint MUST be used on the initial SYN of a connection, we have | codepoint MUST be used on the initial SYN of a connection, we have | |||
| (deliberately) made the initial SYN ECN-capable. Section 5.4 | (deliberately) made the initial SYN ECN-capable. Section 5.4 | |||
| skipping to change at page 26, line 26 | skipping to change at page 27, line 37 | |||
| If the sender transport does not have sufficient feedback to even | If the sender transport does not have sufficient feedback to even | |||
| estimate the path's CE rate, it SHOULD set FNE continuously. If the | estimate the path's CE rate, it SHOULD set FNE continuously. If the | |||
| sender transport has some, perhaps stale, feedback to estimate that | sender transport has some, perhaps stale, feedback to estimate that | |||
| the path's CE rate is nearly definitely less than E%, the transport | the path's CE rate is nearly definitely less than E%, the transport | |||
| MAY blank RE in packets for E% of sent octets, and set the RECT | MAY blank RE in packets for E% of sent octets, and set the RECT | |||
| codepoint for the remainder. | codepoint for the remainder. | |||
| The following sections give guidelines on how re-ECN support could be | The following sections give guidelines on how re-ECN support could be | |||
| added to RSVP or NSIS, to DCCP, and to SCTP - although separate | added to RSVP or NSIS, to DCCP, and to SCTP - although separate | |||
| Internet drafts will be necessary to document the exact mechanics of | Internet drafts will be necessary to document the exact mechanics of | |||
| re-ECN if each of these protocols. | re-ECN in each of these protocols. | |||
| {ToDo: Give a brief outline of what would be expected for each of the | {ToDo: Give a brief outline of what would be expected for each of the | |||
| following: | following: | |||
| o UDP fire and forget (e.g. DNS) | o UDP fire and forget (e.g. DNS) | |||
| o UDP streaming with no feedback | o UDP streaming with no feedback | |||
| o UDP streaming with feedback | o UDP streaming with feedback | |||
| } | } | |||
| 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | |||
| A separate I-D has been submitted [Re-PCN] describing how re-ECN can | A separate I-D has been submitted [Re-PCN] describing how re-ECN can | |||
| be used in an edge-to-edge rather than end-to-end scenario. It can | be used in an edge-to-edge rather than end-to-end scenario. It can | |||
| then be used by downstream networks to police whether upstream | then be used by downstream networks to police whether upstream | |||
| networks are blocking new flow reservations when downstream | networks are blocking new flow reservations when downstream | |||
| congestion is too high, even though the congestion is in other | congestion is too high, even though the congestion is in other | |||
| operators' downstream networks. This relates to current work in | operators' downstream networks. This relates to current IETF work on | |||
| progress on Admission Control over Diffserv using Pre-Congestion | Admission Control over Diffserv using Pre-Congestion Notification | |||
| Notification, being reported to the IETF TSVWG [CL-deploy]. | (PCN) [PCN-arch]. | |||
| 4.2.3. Guidelines for adding Re-ECN to DCCP | 4.2.3. Guidelines for adding Re-ECN to DCCP | |||
| Besides adjusting the initial features negotiation sequence, operating | Besides adjusting the initial features negotiation sequence, operating | |||
| re-ECN in DCCP could be achieved by defining a new option to be added | re-ECN in DCCP [RFC4340] could be achieved by defining a new option | |||
| to acknowledgments, that would include a multibit field where the | to be added to acknowledgments, that would include a multibit field | |||
| destination could copy its ECC. | where the destination could copy its ECC. | |||
| 4.2.4. Guidelines for adding Re-ECN to SCTP | 4.2.4. Guidelines for adding Re-ECN to SCTP | |||
| Annex 1 in RFC4340 gives the specifications for SCTP to support ECN. | Annex 1 in [RFC2960] gives the specifications for SCTP to support | |||
| Similar steps should be taken to support re-ECN. Besides adjusting | ECN. Similar steps should be taken to support re-ECN. Besides | |||
| the initial features negotiation sequence, operating re-ECN in SCTP | adjusting the initial features negotiation sequence, operating re-ECN | |||
| could be achieved by defining a new control chunk, that would include | in SCTP could be achieved by defining a new control chunk, that would | |||
| a multibit field where the destination could copy its ECC. | include a multibit field where the destination could copy its ECC. | |||
| 5. Network Layer | 5. Network Layer | |||
| 5.1. Re-ECN IPv4 Wire Protocol | 5.1. Re-ECN IPv4 Wire Protocol | |||
| The wire protocol of the ECN field in the IP header remains largely | The wire protocol of the ECN field in the IP header remains largely | |||
| unchanged from [RFC3168]. However, an extension to the ECN field we | unchanged from [RFC3168]. However, an extension to the ECN field we | |||
| call the RE (re-ECN extension) flag (Section 3.2) is defined in this | call the RE (re-ECN extension) flag (Section 3.2) is defined in this | |||
| document. It doubles the extended ECN codepoint space, giving 8 | document. It doubles the extended ECN codepoint space, giving 8 | |||
| potential codepoints. The semantics of the extra codepoints are | potential codepoints. The semantics of the extra codepoints are | |||
| skipping to change at page 29, line 8 | skipping to change at page 30, line 14 | |||
| 5.2. Re-ECN IPv6 Wire Protocol | 5.2. Re-ECN IPv6 Wire Protocol | |||
| For IPv6, this document proposes that the new RE control flag will be | For IPv6, this document proposes that the new RE control flag will be | |||
| positioned as the first bit of the option field of a new Congestion | positioned as the first bit of the option field of a new Congestion | |||
| hop by hop option header (Figure 6). | hop by hop option header (Figure 6). | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| | Next Header | Hdr ext Len | Option Type | Option Len | | | Next Header | Hdr ext Len | Option Type | Opt Length =4 | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |R| Reserved for future use | | |R| Reserved for future use | | |||
| |E| | | |E| | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option | Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option | |||
| Header containing the Re-ECN Extension (RE) Control Flag | Header containing the Re-ECN Extension (RE) Control Flag | |||
| 0 1 2 3 4 5 6 7 8 | 0 1 2 3 4 5 6 7 8 | |||
| +-+-+-+-+-+-+-+-+- | +-+-+-+-+-+-+-+-+- | |||
| skipping to change at page 33, line 19 | skipping to change at page 34, line 19 | |||
| not aware of. Otherwise, spoof messages could be sent by malicious | not aware of. Otherwise, spoof messages could be sent by malicious | |||
| sources to slow down a sender (c.f. ICMP source quench). | sources to slow down a sender (c.f. ICMP source quench). | |||
| However, the need for this message type is not yet confirmed, as we | However, the need for this message type is not yet confirmed, as we | |||
| are considering how to prevent it being used by malicious senders to | are considering how to prevent it being used by malicious senders to | |||
| scan for droppers and to test their threshold settings. {ToDo: | scan for droppers and to test their threshold settings. {ToDo: | |||
| Complete this section.} | Complete this section.} | |||
| 5.5.2. Rate Response Control | 5.5.2. Rate Response Control | |||
| The incentive framework of Section 6.1.3 implies there may be a need | As discussed in Section 6.1.5 the sender's access operator will be | |||
| for a sender to send a request to an ingress policer asking that it | expected to use bulk per-user policing, but they might choose to | |||
| be allowed to apply a non-default response to congestion (where TCP- | introduce a per-flow policer. In cases where operators do introduce | |||
| friendly is assumed to be the default). This would require the | per-flow policing, there may be a need for a sender to send a request | |||
| sender to know what message format(s) to use and to be able to | to the ingress policer asking for permission to apply a non-default | |||
| discover how to address the policer. The required control | response to congestion (where TCP-friendly is assumed to be the | |||
| protocol(s) are outside the scope of this document, but will require | default). This would require the sender to know what message | |||
| definition elsewhere. | format(s) to use and to be able to discover how to address the | |||
| policer. The required control protocol(s) are outside the scope of | ||||
| this document, but will require definition elsewhere. | ||||
| The policer is likely to be local to the sender and inline, probably | The policer is likely to be local to the sender and inline, probably | |||
| at the ingress interface to the internetwork. So, discovery should | at the ingress interface to the internetwork. So, discovery should | |||
| not be hard. A variety of control protocols already exist for some | not be hard. A variety of control protocols already exist for some | |||
| widely used rate-responses to congestion. For instance DCCP | widely used rate-responses to congestion. For instance DCCP | |||
| congestion control identifiers (CCIDs [RFC4340]) fulfil this role and | congestion control identifiers (CCIDs [RFC4340]) fulfil this role and | |||
| so does QoS signalling (e.g. an RSVP request for controlled load | so does QoS signalling (e.g. an RSVP request for controlled load | |||
| service is equivalent to a request for no rate response to | service is equivalent to a request for no rate response to | |||
| congestion, but with admission control). | congestion, but with admission control). | |||
| 5.6. IP in IP Tunnels | 5.6. IP in IP Tunnels | |||
| For re-ECN to work correctly through IP in IP tunnels, it needs | For re-ECN to work correctly through IP in IP tunnels, it needs | |||
| slightly different tunnel handling to regular ECN [RFC3168]. | slightly different tunnel handling to regular ECN [RFC3168]. | |||
| Ideally, for re-ECN to work through a tunnel, the tunnel entry should | Currently there is some inconsistency between how the handling of IP | |||
| copy both the RE flag and the ECN field from the inner to the outer | in IP tunnels is defined in [RFC3168] and how it is defined in | |||
| IP header. Then at the tunnel exit, any congestion marking of the | [RFC4301], but re-ECN would work fine with the IPsec behaviour. This | |||
| outer ECN field should overwrite the inner ECN field (unless the | inconsistency is addressed in a new Internet Draft [ECN-tunnel] that | |||
| inner field is Not-ECT in which case an alarm should be raised). The | proposes to update RFC3168 tunnel behaviour to bring it into line | |||
| RE flag shouldn't change along a path, so the outer RE flag should be | with IPsec. Ideally, for re-ECN to work through a tunnel, the tunnel | |||
| the same as the inner. If it isn't, a management alarm should be | entry should copy both the RE flag and the ECN field from the inner | |||
| raised. This behaviour is the same as the full-functionality variant | to the outer IP header. Then at the tunnel exit, any congestion | |||
| of [RFC3168] at tunnel exit, but different at tunnel entry. | marking of the outer ECN field should overwrite the inner ECN field | |||
| (unless the inner field is Not-ECT in which case an alarm should be | ||||
| raised). The RE flag shouldn't change along a path, so the outer RE | ||||
| flag should be the same as the inner. If it isn't, a management alarm | ||||
| should be raised. This behaviour is the same as the full- | ||||
| functionality variant of [RFC3168] at tunnel exit, but different at | ||||
| tunnel entry. | ||||
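
The ideal tunnel behaviour described above can be sketched as a pair of functions. This is a simplified model under stated assumptions: headers are represented as plain dictionaries with hypothetical `ecn` and `re` keys, alarms are returned as strings, and only the rules stated in the paragraph are implemented (copy both fields at entry; at exit, propagate CE unless the inner header is Not-ECT, and check that RE is unchanged). The ECN codepoint values are those of [RFC3168].

```python
# ECN field codepoints as defined in RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encapsulate(inner):
    """Tunnel entry: copy both the ECN field and the RE flag to the outer header."""
    return {"ecn": inner["ecn"], "re": inner["re"]}


def decapsulate(inner, outer):
    """Tunnel exit: return the forwarded inner header and any alarms raised."""
    alarms = []
    result = dict(inner)
    if outer["re"] != inner["re"]:
        # The RE flag should not change along a path.
        alarms.append("RE flag differs between outer and inner header")
    if outer["ecn"] == CE:
        if inner["ecn"] == NOT_ECT:
            alarms.append("CE on outer header but inner header is Not-ECT")
        else:
            result["ecn"] = CE  # congestion marking overwrites inner ECN field
    return result, alarms


inner = {"ecn": ECT1, "re": 1}
outer = encapsulate(inner)
outer["ecn"] = CE                  # a router inside the tunnel marks congestion
print(decapsulate(inner, outer))   # ({'ecn': 3, 're': 1}, [])
```

What a real decapsulator should do after raising an alarm (forward, drop, or log only) is not specified in the text, so the sketch leaves the inner header untouched in those cases.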
| If tunnels are left as they are specified in [RFC3168], whether the | If tunnels are left as they are specified in [RFC3168], whether the | |||
| limited or full-functionality variants are used, a problem arises | limited or full-functionality variants are used, a problem arises | |||
| with re-ECN if a tunnel crosses an inter-domain boundary, because the | with re-ECN if a tunnel crosses an inter-domain boundary, because the | |||
| difference between positive and negative markings will not be | difference between positive and negative markings will not be | |||
| correctly accounted for. In a limited functionality ECN tunnel, the | correctly accounted for. In a limited functionality ECN tunnel, the | |||
| flow will appear to be legacy traffic, and therefore may be wrongly | flow will appear to be legacy traffic, and therefore may be wrongly | |||
| rate limited. In a full-functionality ECN tunnel, the result will | rate limited. In a full-functionality ECN tunnel, the result will | |||
| depend on whether the tunnel entry copies the inner RE flag to the outer | depend on whether the tunnel entry copies the inner RE flag to the outer | |||
| header or the RE flag in the outer header is always cleared. If the | header or the RE flag in the outer header is always cleared. If the | |||
| former, the flow will tend to be too positive when accounted for at | former, the flow will tend to be too positive when accounted for at | |||
| borders. If the latter, it will be too negative. | borders. If the latter, it will be too negative. If the rules set | |||
| out in [ECN-tunnel] are followed then this will not be an issue. | ||||
| {ToDo: A future version of this draft will discuss the necessary | ||||
| changes to IP in IP tunnels in more depth.} | ||||
| 5.7. Non-Issues | 5.7. Non-Issues | |||
| The following issues might seem to cause unfavourable interactions | The following issues might seem to cause unfavourable interactions | |||
| with re-ECN, but we will explain why they don't: | with re-ECN, but we will explain why they don't: | |||
| o Various link layers support explicit congestion notification, such | o Various link layers support explicit congestion notification, such | |||
| as Frame Relay and ATM. Explicit congestion notification is | as Frame Relay and ATM. Explicit congestion notification is | |||
| proposed to be added to other link layers, such as Ethernet | proposed to be added to other link layers, such as Ethernet | |||
| (802.3ar Ethernet congestion management) and MPLS [ECN-MPLS]; | (802.3ar Ethernet congestion management) and MPLS [ECN-MPLS]; | |||
| skipping to change at page 35, line 31 | skipping to change at page 36, line 37 | |||
| 6. Applications | 6. Applications | |||
| 6.1. Policing Congestion Response | 6.1. Policing Congestion Response | |||
| 6.1.1. The Policing Problem | 6.1.1. The Policing Problem | |||
| The current Internet architecture trusts hosts to respond voluntarily | The current Internet architecture trusts hosts to respond voluntarily | |||
| to congestion. Limited evidence shows that the large majority of | to congestion. Limited evidence shows that the large majority of | |||
| end-points on the Internet comply with a TCP-friendly response to | end-points on the Internet comply with a TCP-friendly response to | |||
| congestion. But telephony (and increasingly video) services over the | congestion. But telephony (and increasingly video) services over the | |||
| best efforts Internet are attracting the interest of major commercial | best effort Internet are attracting the interest of major commercial | |||
| operations. Most of these applications do not respond to congestion | operations. Most of these applications do not respond to congestion | |||
| at all. Those that can switch to lower rate codecs, still have a | at all. Those that can switch to lower rate codecs, still have a | |||
| lower bound below which they must become unresponsive to congestion. | lower bound below which they must become unresponsive to congestion. | |||
| Of course, the Internet is intended to support many different | Of course, the Internet is intended to support many different | |||
| application behaviours. But the problem is that this freedom can be | application behaviours. But the problem is that this freedom can be | |||
| exercised irresponsibly. The greater problem is that we will never | exercised irresponsibly. The greater problem is that we will never | |||
| be able to agree on where the boundary is between responsible and | be able to agree on where the boundary is between responsible and | |||
| irresponsible. Therefore re-ECN is designed to allow different | irresponsible. Therefore re-ECN is designed to allow different | |||
| networks to set their own view of the limit to irresponsibility, and | networks to set their own view of the limit to irresponsibility, and | |||
| skipping to change at page 37, line 37 | skipping to change at page 38, line 44 | |||
| return address at a higher layer. | return address at a higher layer. | |||
| 6.1.3. Re-ECN Incentive Framework | 6.1.3. Re-ECN Incentive Framework | |||
| The aim is to create an incentive environment that ensures optimal | The aim is to create an incentive environment that ensures optimal | |||
| sharing of capacity despite everyone acting selfishly (including | sharing of capacity despite everyone acting selfishly (including | |||
| lying and cheating). Of course, the mechanisms put in place for this | lying and cheating). Of course, the mechanisms put in place for this | |||
| can lie dormant wherever co-operation is the norm. | can lie dormant wherever co-operation is the norm. | |||
| Throughout this document we focus on path congestion. But some forms | Throughout this document we focus on path congestion. But some forms | |||
| of fairness, particularly TCP's, also depend on round trip time. So, | of fairness, particularly TCP's, also depend on round trip time. If | |||
| we also propose to measure downstream path delay using re-feedback. | TCP-fairness is required, we also propose to measure downstream path | |||
| This proposal will be published in a very simple future draft, but | delay using re-feedback. We give a simple outline of how this could | |||
| for now we give an outline in Appendix F. | work in Appendix F. However, we do not expect this to be necessary, | |||
| as researchers tend to agree that only congestion control dynamics | ||||
| need to depend on RTT, not the rate that the algorithm would converge | ||||
| on after a period of stability. | ||||
| Figure 8 sketches the incentive framework that we will describe piece | Figure 8 sketches the incentive framework that we will describe piece | |||
| by piece throughout this section. We will do a first pass in | by piece throughout this section. We will do a first pass in | |||
| overview, then return to each piece in detail. We re-use the earlier | overview, then return to each piece in detail. We re-use the earlier | |||
| example of how downstream congestion is derived by subtracting | example of how downstream congestion is derived by subtracting | |||
| upstream congestion from path congestion (Figure 1) but depict | upstream congestion from path congestion (Figure 1) but depict | |||
| multiple trust boundaries to turn it into an internetwork. For | multiple trust boundaries to turn it into an internetwork. For | |||
| clarity, only downstream congestion is shown (the difference between | clarity, only downstream congestion is shown (the difference between | |||
| the two earlier plots). The graph displays downstream path | the two earlier plots). The graph displays downstream path | |||
| congestion seen in a typical flow as it traverses an example path | congestion seen in a typical flow as it traverses an example path | |||
| skipping to change at page 39, line 12 | skipping to change at page 40, line 42 | |||
| enhanced QoS), to some extent it will always be against the | enhanced QoS), to some extent it will always be against the | |||
| sender's interest to comply. | sender's interest to comply. | |||
| Ingress policing: But it is in all the network operators' interests | Ingress policing: But it is in all the network operators' interests | |||
| to encourage fair congestion response, so that their investments | to encourage fair congestion response, so that their investments | |||
| are employed to satisfy the most valuable demand. The re-ECN | are employed to satisfy the most valuable demand. The re-ECN | |||
| protocol ensures packets carry the necessary information about | protocol ensures packets carry the necessary information about | |||
| their own expected downstream congestion so that N1 can deploy a | their own expected downstream congestion so that N1 can deploy a | |||
| policer at its ingress to check that S1 is complying with whatever | policer at its ingress to check that S1 is complying with whatever | |||
| congestion control it should be using (Section 6.1.5). If N1 is | congestion control it should be using (Section 6.1.5). If N1 is | |||
| extremely conservative it may police each flow, but it can choose | extremely conservative it could police each flow, but it is likely | |||
| to just police the bulk amount of congestion each customer causes | to just police the bulk amount of congestion each customer causes | |||
| without regard to flows, or if it is extremely liberal it need not | without regard to flows, or if it is extremely liberal it need not | |||
| police congestion control at all. In any case, it is always | police congestion control at all. In any case, it is always | |||
| preferable to police traffic at the very first ingress into an | preferable to police traffic at the very first ingress into an | |||
| internetwork, before non-compliant traffic can cause any damage. | internetwork, before non-compliant traffic can cause any damage. | |||
| Edge egress dropper: If the policer ensures the source has less | Edge egress dropper: If the policer ensures the source has less | |||
| right to a high rate the higher it declares downstream congestion, | right to a high rate the higher it declares downstream congestion, | |||
| the source has a clear incentive to understate downstream | the source has a clear incentive to understate downstream | |||
| congestion. But, if flows of packets are understated when they | congestion. But, if flows of packets are understated when they | |||
| skipping to change at page 40, line 41 | skipping to change at page 42, line 21 | |||
| at the egress of N2. Then N2 has an incentive either to police | at the egress of N2. Then N2 has an incentive either to police | |||
| the congestion response of its own ingress traffic (from N1) or to | the congestion response of its own ingress traffic (from N1) or to | |||
| emulate policing by applying penalties to N1 in turn on the basis | emulate policing by applying penalties to N1 in turn on the basis | |||
| of congestion counted at their mutual boundary. In this recursive | of congestion counted at their mutual boundary. In this recursive | |||
| way, the incentives for each flow to respond correctly to | way, the incentives for each flow to respond correctly to | |||
| congestion trace back with each flow precisely to each source, | congestion trace back with each flow precisely to each source, | |||
| despite the mechanism not recognising flows (see Section 6.2.2). | despite the mechanism not recognising flows (see Section 6.2.2). | |||
| Inter-domain congestion charging diversity: Any two networks are | Inter-domain congestion charging diversity: Any two networks are | |||
| free to agree any of a range of penalty regimes between themselves | free to agree any of a range of penalty regimes between themselves | |||
| but they would only provide the right incentives if they were | ||||
| within the following reasonable constraints. N2 should expect to | within the following reasonable constraints. N2 should expect to | |||
| have to pay penalties to N4 where penalties monotonically increase | have to pay penalties to N4 where penalties monotonically increase | |||
| with the volume of congestion and negative penalties are not | with the volume of congestion and negative penalties are not | |||
| allowed. For instance, they may agree an SLA with tiered | allowed. For instance, they may agree an SLA with tiered | |||
| congestion thresholds, where higher penalties apply the higher the | congestion thresholds, where higher penalties apply the higher the | |||
| threshold that is broken. But the most obvious (and useful) form | threshold that is broken. But the most obvious (and useful) form | |||
| of penalty is where N4 levies a charge on N2 proportional to the | of penalty is where N4 levies a charge on N2 proportional to the | |||
| volume of downstream congestion N2 dumps into N4. In the | volume of downstream congestion N2 dumps into N4. In the | |||
| explanation that follows, we assume this specific variant of | explanation that follows, we assume this specific variant of | |||
| volume charging between networks - charging proportionate to the | volume charging between networks - charging proportionate to the | |||
| skipping to change at page 41, line 14 | skipping to change at page 42, line 43 | |||
| We must make clear that we are not advocating that everyone should | We must make clear that we are not advocating that everyone should | |||
| use this form of contract. We are well aware that the IETF tries | use this form of contract. We are well aware that the IETF tries | |||
| to avoid standardising technology that depends on a particular | to avoid standardising technology that depends on a particular | |||
| business model. And we strongly share this desire to encourage | business model. And we strongly share this desire to encourage | |||
| diversity. But our aim is merely to show that border policing can | diversity. But our aim is merely to show that border policing can | |||
| at least work with this one model; then we can assume that | at least work with this one model; then we can assume that | |||
| operators might experiment with the metric in other models (see | operators might experiment with the metric in other models (see | |||
| Section 6.1.6 for examples). Of course, operators are free to | Section 6.1.6 for examples). Of course, operators are free to | |||
| complement this usage element of their charges with traditional | complement this usage element of their charges with traditional | |||
| capacity charging, and we expect they will. | capacity charging, and we expect they will, as predicted by | |||
| economics. | ||||
| No congestion charging to users: Bulk congestion penalties at trust | No congestion charging to users: Bulk congestion penalties at trust | |||
| boundaries are passive and extremely simple, and lose none of | boundaries are passive and extremely simple, and lose none of | |||
| their per-packet precision from one boundary to the next (unlike | their per-packet precision from one boundary to the next (unlike | |||
| Diffserv all-address traffic conditioning agreements, which | Diffserv all-address traffic conditioning agreements, which | |||
| dissipate their effectiveness across long topologies). But at any | dissipate their effectiveness across long topologies). But at any | |||
| trust boundary, there is no imperative to use congestion charging. | trust boundary, there is no imperative to use congestion charging. | |||
| Traditional traffic policing can be used, if the complexity and | Traditional traffic policing can be used, if the complexity and | |||
| cost is preferred. In particular, at the boundary with end | cost is preferred. In particular, at the boundary with end | |||
| customers (e.g. between S and N1), traffic policing will most | customers (e.g. between S and N1), traffic policing will most | |||
| likely be more appropriate. Policer complexity is less of a | likely be more appropriate. Policer complexity is less of a | |||
| concern at the edge of the network. And end-customers are known | concern at the edge of the network. And end-customers are known | |||
| to be highly averse to the unpredictability of congestion | to be highly averse to the unpredictability of congestion | |||
| charging. | charging. | |||
| NOTE WELL: This document neither advocates nor requires congestion | NOTE WELL: This document neither advocates nor requires congestion | |||
| charging for end customers and advocates but does not require | charging for end customers and advocates but does not require | |||
| inter-domain congestion charging. | inter-domain congestion charging. | |||
| Competitive discipline of inter-domain traffic engineering: With | Competitive discipline of inter-domain traffic engineering: With | |||
| inter-domain congestion charging, a domain seems to have a | inter-domain congestion charging, a domain seems to have a | |||
| perverse incentive to fake congestion; N2's profit depends on the | perverse incentive to fake congestion; N2's profit depends on the | |||
| difference between congestion at its ingress (its revenue) and at | difference between congestion at its ingress (its revenue) and at | |||
| its egress (its cost). So, overstating internal congestion seems | its egress (its cost). So, overstating internal congestion seems | |||
| to increase profit. However, smart border routing [Smart_rtg] by | to increase profit. However, smart border routing [Smart_rtg] by | |||
| N1 will bias its multipath routing towards the least cost routes. | N1 will bias its routing towards the least cost routes. So, N2 | |||
| So, N2 risks losing all its revenue to competitive routes if it | risks losing all its revenue to competitive routes if it | |||
| overstates congestion (see Section 6.2.3). In other words, if N2 | overstates congestion (see Section 6.2.3). In other words, if N2 | |||
| is the least congested route, its ability to raise excess profits | is the least congested route, its ability to raise excess profits | |||
| is limited by the congestion on the next least congested route. | is limited by the congestion on the next least congested route. | |||
| This pressure on N2 to remain competitive is represented by the | This pressure on N2 to remain competitive is represented by the | |||
| dotted downward arrow at the ingress to N2 in Figure 9. | dotted downward arrow at the ingress to N2 in Figure 9. | |||
| Closing the loop: All the above elements conspire to trap everyone | Closing the loop: All the above elements conspire to trap everyone | |||
| between two opposing pressures (the downward and upward arrows in | between two opposing pressures (the downward and upward arrows in | |||
| Figure 8 & Figure 9), ensuring the downstream congestion metric | Figure 8 & Figure 9), ensuring the downstream congestion metric | |||
| arrives at the destination neither above nor below zero. So, we | arrives at the destination neither above nor below zero. So, we | |||
| skipping to change at page 42, line 24 | skipping to change at page 44, line 7 | |||
| superior to bottleneck policing or to any policing of different | superior to bottleneck policing or to any policing of different | |||
| QoS for different flows. Even if all access networks choose to | QoS for different flows. Even if all access networks choose to | |||
| conservatively police congestion per flow, each will want to | conservatively police congestion per flow, each will want to | |||
| compete with the others to allow new responses to congestion for | compete with the others to allow new responses to congestion for | |||
| new types of application. With re-ECN, each can introduce new | new types of application. With re-ECN, each can introduce new | |||
| controls independently, without coordinating with other networks | controls independently, without coordinating with other networks | |||
| and without having to standardise anything. But, as we have just | and without having to standardise anything. But, as we have just | |||
| seen, by making inter-domain penalties proportionate to bulk | seen, by making inter-domain penalties proportionate to bulk | |||
| downstream congestion, downstream networks can be agnostic to the | downstream congestion, downstream networks can be agnostic to the | |||
| specific congestion response for each flow, but they can still | specific congestion response for each flow, but they can still | |||
| apply more back-pressure the more liberal the ingress access | apply more penalty the more liberal the ingress access network has | |||
| network has been in the response to congestion it allowed for each | been in the response to congestion it allowed for each flow. | |||
| flow. | ||||
| 6.1.3.1. The Case against Classic Feedback | 6.1.3.1. The Case against Classic Feedback | |||
| A system that produces an optimal outcome as a result of everyone's | A system that produces an optimal outcome as a result of everyone's | |||
| selfish actions is extremely powerful. Especially one that enables | selfish actions is extremely powerful. Especially one that enables | |||
| evolvability of congestion control. But why do we have to change to | evolvability of congestion control. But why do we have to change to | |||
| re-ECN to achieve it? Can't classic congestion feedback (as used | re-ECN to achieve it? Can't classic congestion feedback (as used | |||
| already by standard ECN) be arranged to provide similar incentives | already by standard ECN) be arranged to provide similar incentives | |||
| and similar evolvability? Superficially it can. Kelly's seminal | and similar evolvability? Superficially it can. Kelly's seminal | |||
| work showed how we can allow everyone the freedom to evolve whatever | work showed how we can allow everyone the freedom to evolve whatever | |||
| congestion control behaviour is in their application's best interest | congestion control behaviour is in their application's best interest | |||
| but still optimise the whole system of networks and users by placing | but still optimise the whole system of networks and users by placing | |||
| a price on congestion to ensure responsible use of this | a price on congestion to ensure responsible use of this | |||
| freedom [Evol_cc]. Kelly used ECN with its classic congestion | freedom [Evol_cc]. Kelly used ECN with its classic congestion | |||
| feedback model as the mechanism to convey congestion price | feedback model as the mechanism to convey congestion price | |||
| information. The mechanism was nearly identical to volume charging; | information. The mechanism could be thought of as volume charging; | |||
| except only the volume of packets marked with congestion experienced | except only the volume of packets marked with congestion experienced | |||
| (CE) was counted. | (CE) was counted. | |||
| However, below we explain why relying on classic feedback /required/ | However, below we explain why relying on classic feedback /required/ | |||
| congestion charging to be used, while re-ECN achieves the same | congestion charging to be used, while re-ECN achieves the same | |||
| powerful outcome (given it is built on Kelly's foundations), but does | powerful outcome (given it is built on Kelly's foundations), but does | |||
| not /require/ congestion charging. In brief, the problem with | not /require/ congestion charging. In brief, the problem with | |||
| classic feedback is that the incentives have to trace the indirect | classic feedback is that the incentives have to trace the indirect | |||
| path back to the sender---the long way round the feedback loop. For | path back to the sender---the long way round the feedback loop. For | |||
| example, if classic feedback were used in Figure 8, N2 would have had | example, if classic feedback were used in Figure 8, N2 would have had | |||
| skipping to change at page 45, line 22 | skipping to change at page 47, line 5 | |||
| from the receiver. So, counting packets with FNE cleared would be | from the receiver. So, counting packets with FNE cleared would be | |||
| likely to make the average unnecessarily positive, providing headroom | likely to make the average unnecessarily positive, providing headroom | |||
| (or should we say footroom?) for dishonest (negative) traffic. | (or should we say footroom?) for dishonest (negative) traffic. | |||
| If the dropper detects a persistently negative flow, it SHOULD drop | If the dropper detects a persistently negative flow, it SHOULD drop | |||
| sufficient negative and neutral packets to force the flow to not be | sufficient negative and neutral packets to force the flow to not be | |||
| negative. Drops SHOULD be focused on just sufficient packets in | negative. Drops SHOULD be focused on just sufficient packets in | |||
| misbehaving flows to remove the negative bias while doing minimal | misbehaving flows to remove the negative bias while doing minimal | |||
| extra harm. | extra harm. | |||
| 6.1.5. Rate Policing | 6.1.5. Policing | |||
| Access operators who wish to check that a sender is complying with a | Access operators who wish to limit the congestion that a sender is | |||
| particular rate response to congestion can deploy rate policers at | able to cause can deploy policers at the very first ingress to the | |||
| the very first ingress to the internetwork. Re-ECN has been designed | internetwork. Re-ECN has been designed to avoid the need for | |||
| to avoid the need for bottleneck policing so that we can avoid a | bottleneck policing so that we can avoid a future where a single rate | |||
| future where a single rate adaptation policy is embedded throughout | adaptation policy is embedded throughout the network. Instead, re- | |||
| the network. Instead, re-ECN allows the particular rate adaptation | ECN allows the particular rate adaptation policy to be solely agreed | |||
| policy to be solely agreed bilaterally between the sender and its | bilaterally between the sender and its ingress access provider | |||
| ingress access provider (Section 5.5.2 discusses possible ways to | (Section 5.5.2 discusses possible ways to signal between them), which | |||
| signal between them), which allows congestion control to be policed, | allows congestion control to be policed, but maintains its | |||
| but maintains its evolvability, requiring only a single, local box to | evolvability, requiring only a single, local box to be updated. | |||
| be updated. | ||||
| If desired, the re-ECN protocol allows these ingress policers to | Appendix G gives examples of per-user policing algorithms. But there | |||
| perform per-flow policing according to the widely adopted TCP rate | is no implication that these algorithms are to be standardised, or | |||
| adaptation, perhaps as a default. But it also allows new rate | that they are ideal. The ingress rate policer is the part of the re- | |||
| adaptation policies beyond TCP to be enforced. Perhaps more | ECN incentive framework that is intended to be the most flexible. | |||
| usefully, it also allows the flexibility for networks to choose to | Once endpoint protocol handlers for re-ECN and egress droppers are in | |||
| police users as a whole, rather than flows. | place, operators can choose exactly which congestion response they | |||
| want to police, and whether they want to do it per user, per flow or | ||||
| not at all. | ||||
| Appendix G gives examples of per-user and per-flow policing | The re-ECN protocol allows these ingress policers to easily perform | |||
| algorithms. But there is no implication that these algorithms are to | bulk per-user policing (Appendix G.1). This is likely to provide | |||
| be standardised, or that they are ideal. The ingress rate policer is | sufficient incentive to the user to correctly respond to congestion | |||
| the part of the re-ECN incentive framework that is intended to be the | without needing the policing function to be overly complex. If an | |||
| most flexible. Once endpoint protocol handlers for re-ECN and egress | access operator chose, they could use per-flow policing according to | |||
| droppers are in place, operators can choose exactly which congestion | the widely adopted TCP rate adaptation (Appendix G.2) or other | |||
| response they want to police, and whether they want to do it per | alternatives; however, this would introduce extra complexity to the | |||
| user, per flow or not at all. | system. | |||
| However, if a rate policer is used, it should use path (not | If a per-flow rate policer is used, it should use path (not | |||
| downstream) congestion as the relevant metric, which is represented | downstream) congestion as the relevant metric, which is represented | |||
| by the fraction of octets in packets with positive (Re-Echo and FNE) | by the fraction of octets in packets with positive (Re-Echo and FNE) | |||
| and canceled (CE(0)) markings. Of course, re-ECN provides all the | and canceled (CE(0)) markings. Of course, re-ECN provides all the | |||
| information a policer needs directly in the packets being policed. | information a policer needs directly in the packets being policed. | |||
| So, even policing TCP's AIMD algorithm is relatively straightforward. | So, even policing TCP's AIMD algorithm is relatively straightforward | |||
| Appendix G presents an example design, but the choice of preferred | (Appendix G.2). | |||
| mechanism is up to the implementer. | ||||
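The path congestion metric described above can be illustrated with a minimal sketch. It is not taken from the draft or from Appendix G: the function name, the (marking, octets) tuple representation and the use of "RECT" as a neutral marking in the usage example are assumptions made purely for illustration. It only shows the octet-weighted fraction of positive (Re-Echo and FNE) and canceled (CE(0)) packets that a per-flow policer might compute.

```python
from typing import Iterable, Tuple

# Markings the text counts towards path congestion:
# positive (Re-Echo, FNE) and canceled (CE(0)).
PATH_CONGESTION_MARKINGS = {"Re-Echo", "FNE", "CE(0)"}


def path_congestion_fraction(packets: Iterable[Tuple[str, int]]) -> float:
    """Return the fraction of octets carried in positive or canceled packets.

    Each packet is a hypothetical (marking, size_in_octets) pair.
    An empty input yields 0.0 rather than dividing by zero.
    """
    total_octets = 0
    counted_octets = 0
    for marking, octets in packets:
        total_octets += octets
        if marking in PATH_CONGESTION_MARKINGS:
            counted_octets += octets
    return counted_octets / total_octets if total_octets else 0.0


# Illustrative usage: 200 of 1000 octets carry positive or canceled markings.
sample = [("Re-Echo", 100), ("RECT", 700), ("CE(0)", 100), ("CE(-1)", 100)]
fraction = path_congestion_fraction(sample)
```

Note that CE(-1) packets are deliberately left out of the count here, since the text asks a per-flow policer to use path congestion rather than downstream congestion.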
| Note that we have included canceled packets in the measure of path | Note that we have included canceled packets in the measure of path | |||
| congestion. Canceled packets arise when the sender re-echoes earlier | congestion. Canceled packets arise when the sender re-echoes earlier | |||
| congestion, but then this Re-Echo packet just happens to be | congestion, but then this Re-Echo packet just happens to be | |||
| congestion marked itself. One would not normally expect many | congestion marked itself. One would not normally expect many | |||
| canceled packets at the first ingress because one would not normally | canceled packets at the first ingress because one would not normally | |||
| expect much congestion marking to have been necessary that soon in | expect much congestion marking to have been necessary that soon in | |||
| the path. However, a home network or campus network may well sit | the path. However, a home network or campus network may well sit | |||
| between the sending endpoint and the ingress policer, so some | between the sending endpoint and the ingress policer, so some | |||
| congestion may occur upstream of the policer. And if congestion does | congestion may occur upstream of the policer. And if congestion does | |||
| skipping to change at page 47, line 5 | skipping to change at page 48, line 36 | |||
| Of course, even if the sender does operate its own network, it may | Of course, even if the sender does operate its own network, it may | |||
| arrange not to congestion mark traffic. Whether the sender does this | arrange not to congestion mark traffic. Whether the sender does this | |||
| or not is of no concern to anyone else except the sender. Such a | or not is of no concern to anyone else except the sender. Such a | |||
| sender will not be policed against its own network's contribution to | sender will not be policed against its own network's contribution to | |||
| congestion, but the only resulting problem would be overload in the | congestion, but the only resulting problem would be overload in the | |||
| sender's own network. | sender's own network. | |||
| Finally, we must not forget that an easy way to circumvent re-ECN's | Finally, we must not forget that an easy way to circumvent re-ECN's | |||
| defences is for the source to turn off re-ECN support, by setting the | defences is for the source to turn off re-ECN support, by setting the | |||
| Not-RECT codepoint, implying legacy traffic. Therefore an ingress | Not-RECT codepoint, implying legacy traffic. Therefore an ingress | |||
| policer must put a general rate-limit on Not-RECT traffic, which | policer should put a general rate-limit on Not-RECT traffic, which | |||
| SHOULD be lax during early, patchy deployment, but will have to | SHOULD be lax during early, patchy deployment, but will have to | |||
| become stricter as deployment widens. Similarly, flows starting | become stricter as deployment widens. Similarly, flows starting | |||
| without an FNE packet can be confined by a strict rate-limit used for | without an FNE packet can be confined by a strict rate-limit used for | |||
| the remainder of flows that haven't proved they are well-behaved by | the remainder of flows that haven't proved they are well-behaved by | |||
| starting correctly (therefore they need not consume any flow state--- | starting correctly (therefore they need not consume any flow state--- | |||
| they are just confined to the `misbehaving' bin if they carry an | they are just confined to the `misbehaving' bin if they carry an | |||
| unrecognised flow ID). | unrecognised flow ID). | |||
| 6.1.6. Inter-domain Policing | 6.1.6. Inter-domain Policing | |||
| One of the main design goals of re-ECN is for border security | One of the main design goals of re-ECN is for border security | |||
| mechanisms to be as simple as possible, otherwise they will become | mechanisms to be as simple as possible, otherwise they will become | |||
| the pinch-points that limit scalability of the whole internetwork. | the pinch-points that limit scalability of the whole internetwork. | |||
| We want to avoid per-flow processing at borders and to keep to | We want to avoid per-flow processing at borders and to keep to | |||
| passive mechanisms that can monitor traffic in parallel to | passive mechanisms that can monitor traffic in parallel to | |||
| forwarding, rather than having to filter traffic inline---in series | forwarding, rather than having to filter traffic inline---in series | |||
| with forwarding. | with forwarding. Such passive, off-line mechanisms are essential for | |||
| future high-speed all-optical border interconnection where packets | ||||
| cannot be buffered while they are checked for policy compliance. | ||||
| So far, we have been able to keep the border mechanisms simple, | So far, we have been able to keep the border mechanisms simple, | |||
| despite having had to harden them against some subtle attacks on the | despite having had to harden them against some subtle attacks on the | |||
| re-ECN design. The mechanisms are still passive and avoid per-flow | re-ECN design. The mechanisms are still passive and avoid per-flow | |||
| processing. | processing. | |||
| The basic accounting mechanism at each border interface simply | The basic accounting mechanism at each border interface simply | |||
| involves accumulating the volume of packets with positive worth (Re- | involves accumulating the volume of packets with positive worth (Re- | |||
| Echo and FNE), and subtracting the volume of those with negative | Echo and FNE), and subtracting the volume of those with negative | |||
| worth: CE(-1). Even though this mechanism takes no regard of flows, | worth: CE(-1). Even though this mechanism takes no regard of flows, | |||
| skipping to change at page 50, line 33 | skipping to change at page 52, line 18 | |||
| tend to be dropped before others if routers use the preferential drop | tend to be dropped before others if routers use the preferential drop | |||
| rules in Section 5.3, which discriminate against non-positive | rules in Section 5.3, which discriminate against non-positive | |||
| packets. All networks below the point where a flow goes negative | packets. All networks below the point where a flow goes negative | |||
| (N1, N2 and N4 in this case) have an incentive to remove this flow, | (N1, N2 and N4 in this case) have an incentive to remove this flow, | |||
| but the router where it first goes negative (in N1) can of course | but the router where it first goes negative (in N1) can of course | |||
| remove the problem for everyone downstream. | remove the problem for everyone downstream. | |||
| In the case of DDoS attacks, Section 6.2.1 describes how re-ECN | In the case of DDoS attacks, Section 6.2.1 describes how re-ECN | |||
| mitigates their force. | mitigates their force. | |||
| Note that the guiding principle behind all the above discussion is | ||||
| that any gain from subverting the protocol should be precisely | ||||
| neutralised, rather than punished. If a gain is punished to a | ||||
| greater extent than is sufficient to neutralise it, it will most | ||||
| likely open up a new vulnerability, where the amplifying effect of | ||||
| the punishment mechanism can be turned on others. | ||||
| For instance, if possible, flows should be removed as soon as they go | ||||
| negative, but we do NOT RECOMMEND any attempts to discard such flows | ||||
| further upstream while they are still positive. Such over-zealous | ||||
| push-back is unnecessary and potentially dangerous. These flows have | ||||
| paid their `fare' up to the point they go negative, so there is no | ||||
| harm in delivering them that far. If someone downstream asks for a | ||||
| flow to be dropped as near to the source as possible, because they | ||||
| say it is going to become negative later, an upstream node cannot | ||||
| test the truth of this assertion. Rather than have to authenticate | ||||
| such messages, re-ECN has been designed so that flows can be dropped | ||||
| solely based on locally measurable evidence. A message hinting that | ||||
| a flow should be watched closely to test for negativity is fine, but | ||||
| a message claiming that a positive flow will go negative later, and | ||||
| should therefore be dropped, is not. | ||||
| 6.1.7. Inter-domain Fail-safes | 6.1.7. Inter-domain Fail-safes | |||
| The mechanisms described so far create incentives for rational | The mechanisms described so far create incentives for rational | |||
| network operators to behave. That is, one operator aims to make | network operators to behave. That is, one operator aims to make | |||
| another behave responsibly by applying penalties and expects a | another behave responsibly by applying penalties and expects a | |||
| rational response (i.e. one that trades off costs against benefits). | rational response (i.e. one that trades off costs against benefits). | |||
| It is usually reasonable to assume that other network operators will | It is usually reasonable to assume that other network operators will | |||
| behave rationally (policy routing can avoid those that might not). | behave rationally (policy routing can avoid those that might not). | |||
| But this approach does not protect against the misconfigurations and | But this approach does not protect against the misconfigurations and | |||
| accidents of other operators. | accidents of other operators. | |||
| skipping to change at page 56, line 36 | skipping to change at page 57, line 47 | |||
| * ECN `only' gives a performance improvement. Making a product a | * ECN `only' gives a performance improvement. Making a product a | |||
| bit faster (whether the product is a device or a network) | bit faster (whether the product is a device or a network) | |||
| isn't usually a sufficient selling point to be worth the cost | isn't usually a sufficient selling point to be worth the cost | |||
| of co-ordinating across the industry to deploy it. Network | of co-ordinating across the industry to deploy it. Network | |||
| operators tend to avoid re-configuring a working network unless | operators tend to avoid re-configuring a working network unless | |||
| launching a new product. | launching a new product. | |||
| ECN and re-ECN for Edge-to-edge Assured QoS: | ECN and re-ECN for Edge-to-edge Assured QoS: | |||
| We believe the proposal to provide assured QoS sessions using a | We believe the proposal to provide assured QoS sessions using a | |||
| form of ECN called pre-congestion notification (PCN) [CL-deploy] | form of ECN called pre-congestion notification (PCN) [PCN-arch] is | |||
| is most likely to break the deadlock in ECN deployment first. It | most likely to break the deadlock in ECN deployment first. It | |||
| only requires edge-to-edge deployment so it does not require | only requires edge-to-edge deployment so it does not require | |||
| endpoint support. It can be deployed in a single network, then | endpoint support. It can be deployed in a single network, then | |||
| grow incrementally to interconnected networks. And it provides a | grow incrementally to interconnected networks. And it provides a | |||
| different `product' (internetworked assured QoS), rather than | different `product' (internetworked assured QoS), rather than | |||
| merely making an existing product a bit faster. | merely making an existing product a bit faster. | |||
| Not only could this assured QoS application kick-start ECN | Not only could this assured QoS application kick-start ECN | |||
| deployment, it could also carry re-ECN deployment with it; because | deployment, it could also carry re-ECN deployment with it; because | |||
| re-ECN can enable the assured QoS region to expand to a large | re-ECN can enable the assured QoS region to expand to a large | |||
| internetwork where neighbouring networks do not trust each other. | internetwork where neighbouring networks do not trust each other. | |||
| skipping to change at page 63, line 5 | skipping to change at page 64, line 15 | |||
| to the higher layer and hide how the lower layer does it. However, | to the higher layer and hide how the lower layer does it. However, | |||
| ECN reveals the state of the network layer and below to the transport | ECN reveals the state of the network layer and below to the transport | |||
| layer. A more positive way to describe ECN is that it is like the | layer. A more positive way to describe ECN is that it is like the | |||
| return value of a function call to the network layer. It explicitly | return value of a function call to the network layer. It explicitly | |||
| returns the status of the request to deliver a packet, by returning a | returns the status of the request to deliver a packet, by returning a | |||
| value representing the current risk that a packet will not be served. | value representing the current risk that a packet will not be served. | |||
| Re-ECN has similar semantics, except that the transport layer must | Re-ECN has similar semantics, except that the transport layer must | |||
| first guess the return value; it can then use the actual return value | first guess the return value; it can then use the actual return value | |||
| from the network layer to modify the next guess. | from the network layer to modify the next guess. | |||
| The guiding principle behind all the discussion in Section 6.1.6 on | ||||
| Policing is that any gain from subverting the protocol should be | ||||
| precisely neutralised, rather than punished. If a gain is punished | ||||
| to a greater extent than is sufficient to neutralise it, it will most | ||||
| likely open up a new vulnerability, where the amplifying effect of | ||||
| the punishment mechanism can be turned on others. | ||||
| For instance, if possible, flows should be removed as soon as they go | ||||
| negative, but we do NOT RECOMMEND any attempts to discard such flows | ||||
| further upstream while they are still positive. Such over-zealous | ||||
| push-back is unnecessary and potentially dangerous. These flows have | ||||
| paid their `fare' up to the point they go negative, so there is no | ||||
| harm in delivering them that far. If someone downstream asks for a | ||||
| flow to be dropped as near to the source as possible, because they | ||||
| say it is going to become negative later, an upstream node cannot | ||||
| test the truth of this assertion. Rather than have to authenticate | ||||
| such messages, re-ECN has been designed so that flows can be dropped | ||||
| solely based on locally measurable evidence. A message hinting that | ||||
| a flow should be watched closely to test for negativity is fine, but | ||||
| a message claiming that a positive flow will go negative later, and | ||||
| should therefore be dropped, is not. | ||||
| 9. Related Work | 9. Related Work | |||
| {Due to lack of time, this section is incomplete. The reader is | {Due to lack of time, this section is incomplete. The reader is | |||
| referred to the Related Work section of [Re-fb] for a brief selection | referred to the Related Work section of [Re-fb] for a brief selection | |||
| of related ideas.} | of related ideas.} | |||
| 9.1. Policing Rate Response to Congestion | 9.1. Policing Rate Response to Congestion | |||
| ATM network elements send congestion back-pressure | ATM network elements send congestion back-pressure | |||
| messages [ITU-T.I.371] along each connection, duplicating any end to | messages [ITU-T.I.371] along each connection, duplicating any end to | |||
| skipping to change at page 63, line 52 | skipping to change at page 65, line 37 | |||
| 9.2. Congestion Notification Integrity | 9.2. Congestion Notification Integrity | |||
| The choice of two ECT code-points in the ECN field [RFC3168] | The choice of two ECT code-points in the ECN field [RFC3168] | |||
| permitted future flexibility, optionally allowing the sender to | permitted future flexibility, optionally allowing the sender to | |||
| encode the experimental ECN nonce [RFC3540] in the packet stream. | encode the experimental ECN nonce [RFC3540] in the packet stream. | |||
| This mechanism has since been included in the specifications of DCCP | This mechanism has since been included in the specifications of DCCP | |||
| [RFC4340]. | [RFC4340]. | |||
| The ECN nonce is an elegant scheme that allows the sender to detect | The ECN nonce is an elegant scheme that allows the sender to detect | |||
| if someone in the feedback loop - the receiver especially - tries to | if someone in the feedback loop - the receiver especially - tries to | |||
| claim no congestion was experienced when in fact congestion lead to | claim no congestion was experienced when in fact congestion led to | |||
| packet drops or ECN marks. For each packet it sends, the sender | packet drops or ECN marks. For each packet it sends, the sender | |||
| chooses between the two ECT codepoints in a pseudo-random sequence. | chooses between the two ECT codepoints in a pseudo-random sequence. | |||
| Then, whenever the network marks a packet with CE, if the receiver | Then, whenever the network marks a packet with CE, if the receiver | |||
| wants to deny congestion happened, she has to guess which ECT | wants to deny congestion happened, she has to guess which ECT | |||
| codepoint was overwritten. She has only a 50:50 chance of being | codepoint was overwritten. She has only a 50:50 chance of being | |||
| correct each time she denies a congestion mark or a drop, which | correct each time she denies a congestion mark or a drop, which | |||
| ultimately will give her away. | ultimately will give her away. | |||
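The 50:50 argument above can be made concrete with a short calculation. The following is a minimal sketch in Python, assuming each guess is independent and ignoring the nonce-sum bookkeeping of [RFC3540]; the function names are illustrative and are not taken from any specification.

```python
import random


def evasion_probability(n_denials: int) -> float:
    """Chance that a receiver guesses the overwritten ECT codepoint
    correctly on every one of n_denials concealed marks or drops,
    assuming independent 50:50 guesses (a simplifying assumption)."""
    return 0.5 ** n_denials


def simulate_cheater(n_denials: int, rng: random.Random) -> bool:
    """Return True if a cheating receiver escapes detection throughout."""
    for _ in range(n_denials):
        nonce = rng.randint(0, 1)  # sender's hidden choice: ECT(0) or ECT(1)
        guess = rng.randint(0, 1)  # receiver's guess once CE has overwritten it
        if guess != nonce:
            return False           # wrong guess: the sender detects the denial
    return True


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, evasion_probability(n))
```

Under these assumptions the chance of remaining undetected halves with every denied mark or drop, which is why persistent suppression of feedback ultimately gives the receiver away.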
| The purpose of a network-layer nonce has to be the protection of the | The purpose of a network-layer nonce should primarily be protection | |||
| network in the first place, while a transport-layer nonce had better | of the network, while a transport-layer nonce would be better used to | |||
| be used to protect the sender from cheating receivers. Now, the | protect the sender from cheating receivers. Now, the assumption | |||
| assumption behind the ECN nonce is that a sender will want to detect | behind the ECN nonce is that a sender will want to detect whether a | |||
| whether a receiver is suppressing congestion feedback. This is only | receiver is suppressing congestion feedback. This is only true if | |||
| true if the sender's interests are aligned with the network's, or | the sender's interests are aligned with the network's, or with the | |||
| with the community of users as a whole. This may be true for certain | community of users as a whole. This may be true for certain large | |||
| large senders, who are under close scrutiny and have a reputation to | senders, who are under close scrutiny and have a reputation to | |||
| maintain. But we have to deal with a more hostile world, where | maintain. But we have to deal with a more hostile world, where | |||
| traffic may be dominated by peer-to-peer transfers, rather than | traffic may be dominated by peer-to-peer transfers, rather than | |||
| downloads from a few popular sites. Often the `natural' self- | downloads from a few popular sites. Often the `natural' self- | |||
| interest of a sender is not aligned with the interests of other | interest of a sender is not aligned with the interests of other | |||
| users. The sender often wants to transfer data quickly just as much | users. The sender often wants to transfer data quickly just as much | |||
| as the receiver wants to receive it quickly. | as the receiver wants to receive it quickly. | |||
| In contrast, the re-ECN protocol enables policing of an agreed rate- | In contrast, the re-ECN protocol enables policing of an agreed rate- | |||
| response to congestion (e.g. TCP-friendliness) at the sender's | response to congestion (e.g. TCP-friendliness) at the sender's | |||
| interface with the internetwork. It also ensures downstream networks | interface with the internetwork. It also ensures downstream networks | |||
| skipping to change at page 66, line 16 | skipping to change at page 67, line 49 | |||
| rather wastefully to encode just five states. In effect the RE flag | rather wastefully to encode just five states. In effect the RE flag | |||
| has been used as an orthogonal single bit, using up four codepoints | has been used as an orthogonal single bit, using up four codepoints | |||
| to encode the three states of positive, neutral and negative worth. | to encode the three states of positive, neutral and negative worth. | |||
| The mapping of the codepoints in an earlier version of this proposal | The mapping of the codepoints in an earlier version of this proposal | |||
| used the codepoint space more efficiently, but the scheme became | used the codepoint space more efficiently, but the scheme became | |||
| vulnerable to network operators bypassing congestion penalties by | vulnerable to network operators bypassing congestion penalties by | |||
| focusing congestion marking on positive packets. Appendix B explains | focusing congestion marking on positive packets. Appendix B explains | |||
| why fixing that problem while allowing for incremental deployment | why fixing that problem while allowing for incremental deployment | |||
| would have used another codepoint anyway. So it was better to use | would have used another codepoint anyway. So it was better to use | |||
| this orthogonal encoding scheme, which greatly simplified the whole | this orthogonal encoding scheme, which greatly simplified the whole | |||
| protocol and brought with it some subtle security benefits. | protocol and brought with it some subtle security benefits (see the | |||
| last paragraph of Appendix B). | ||||
| With the scheme as now proposed, once the RE flag is set or cleared | With the scheme as now proposed, once the RE flag is set or cleared | |||
| by the sender or its proxy, it should not be written by the network, | by the sender or its proxy, it should not be written by the network, | |||
| only read. So the gateways can detect if any network maliciously | only read. So the endpoints can detect if any network maliciously | |||
| alters the RE flag. IPSec AH integrity checking does not cover the | alters the RE flag. IPSec AH integrity checking does not cover the | |||
| IPv4 option flags (they were considered mutable, even the one we | IPv4 option flags (they were considered mutable, even the one we | |||
| propose using for the RE flag that was `currently unused' when IPSec | propose using for the RE flag that was `currently unused' when IPSec | |||
| was defined). But it would be sufficient for a pair of gateways to | was defined). But it would be sufficient for a pair of endpoints to | |||
| make random checks on whether the RE flag was the same when it | make random checks on whether the RE flag was the same when it | |||
| reached the egress gateway as when it left the ingress. Indeed, if | reached the egress as when it left the ingress. Indeed, if IPSec AH | |||
| IPSec AH had covered the RE flag, any network intending to alter | had covered the RE flag, any network intending to alter sufficient RE | |||
| sufficient RE flags to make a gain would have focused its alterations | flags to make a gain would have focused its alterations on packets | |||
| on packets without authenticating headers (AHs). | without authenticating headers (AHs). | |||
| The security of re-ECN has been deliberately designed not to rely on | The security of re-ECN has been deliberately designed not to rely on | |||
| cryptography. | cryptography. | |||
| 11. IANA Considerations | 11. IANA Considerations | |||
| This memo includes no request to IANA (yet). | This memo includes no request to IANA (yet). | |||
| If this memo were to progress to standards track, it would list: | If this memo were to progress to standards track, it would list: | |||
| skipping to change at page 68, line 42 | skipping to change at page 70, line 28 | |||
| Internet to Support Real-Time Content Supply from a Large | Internet to Support Real-Time Content Supply from a Large | |||
| Fraction of Broadband Residential Users", BT Technology | Fraction of Broadband Residential Users", BT Technology | |||
| Journal (BTTJ) 23(2), April 2005. | Journal (BTTJ) 23(2), April 2005. | |||
| [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the | [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the | |||
| assumptions underlying mechanism design for the Internet", | assumptions underlying mechanism design for the Internet", | |||
| Proc. Workshop on the Economics of Networked Systems | Proc. Workshop on the Economics of Networked Systems | |||
| (NetEcon06), June 2006, <http://www.cs.duke.edu/nicl/ | (NetEcon06), June 2006, <http://www.cs.duke.edu/nicl/ | |||
| netecon06/papers/ne06-assessing.pdf>. | netecon06/papers/ne06-assessing.pdf>. | |||
| [CL-deploy] | ||||
| Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F., | ||||
| Charny, A., Babiarz, J., Chan, K., Westberg, L., Bader, | ||||
| A., and G. Karagiannis, "A Deployment Model for Admission | ||||
| Control over DiffServ using Pre-Congestion Notification", | ||||
| draft-briscoe-tsvwg-cl-architecture-03 (work in progress), | ||||
| June 2006. | ||||
| [CLoop_pol] | [CLoop_pol] | |||
| Salvatori, A., "Closed Loop Traffic Policing", Politecnico | Salvatori, A., "Closed Loop Traffic Policing", Politecnico | |||
| Torino and Institut Eurecom Masters Thesis, | Torino and Institut Eurecom Masters Thesis, | |||
| September 2005. | September 2005. | |||
| [ECN-Deploy] | [ECN-Deploy] | |||
| Floyd, S., "ECN (Explicit Congestion Notification) in | Floyd, S., "ECN (Explicit Congestion Notification) in | |||
| TCP/IP; Implementation and Deployment of ECN", Web-page, | TCP/IP; Implementation and Deployment of ECN", Web-page, | |||
| May 2004, | May 2004, | |||
| <http://www.icir.org/floyd/ecn.html#implementations>. | <http://www.icir.org/floyd/ecn.html#implementations>. | |||
| [ECN-MPLS] | [ECN-MPLS] | |||
| Bruce, B., Briscoe, B., and J. Tay, "Explicit Congestion | Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion | |||
| Marking in MPLS", draft-davie-ecn-mpls-00 (work in | Marking in MPLS", draft-ietf-tsvwg-ecn-mpls-01 (work in | |||
| progress), June 2006. | progress), June 2007. | |||
| [ECN-tunnel] | ||||
| Briscoe, B., "Layered Encapsulation of Congestion | ||||
| Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in | ||||
| progress), July 2007. | ||||
| [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | |||
| evolution of congestion control", Automatica 35(12)1969-- | evolution of congestion control", Automatica 35(12)1969-- | |||
| 1985, December 1999, | 1985, December 1999, | |||
| <http://www.statslab.cam.ac.uk/~frank/evol.html>. | <http://www.statslab.cam.ac.uk/~frank/evol.html>. | |||
| [I-D.ietf-tsvwg-ecnsyn] | [I-D.ietf-tcpm-ecnsyn] | |||
| Kuzmanovic, A., "Adding Explicit Congestion Notification | Kuzmanovic, A., "Adding Explicit Congestion Notification | |||
| (ECN) Capability to TCP's SYN/ACK Packets", | (ECN) Capability to TCP's SYN/ACK Packets", | |||
| draft-ietf-tsvwg-ecnsyn-00 (work in progress), | draft-ietf-tcpm-ecnsyn-01 (work in progress), | |||
| November 2005. | October 2006. | |||
| [I-D.moncaster-tcpm-rcv-cheat] | ||||
| Moncaster, T., "A TCP Test to Allow Senders to Identify | ||||
| Receiver Non-Compliance", | ||||
| draft-moncaster-tcpm-rcv-cheat-01 (work in progress), | ||||
| June 2007. | ||||
| [ITU-T.I.371] | [ITU-T.I.371] | |||
| ITU-T, "Traffic Control and Congestion Control in | ITU-T, "Traffic Control and Congestion Control in | |||
| {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004. | {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004. | |||
| [Jiang02] Jiang, H. and C. Dovrolis, "Passive Estimation of TCP | [Jiang02] Jiang, H. and C. Dovrolis, "Passive Estimation of TCP | |||
| Round-Trip Times", ACM SIGCOMM | Round-Trip Times", ACM SIGCOMM | |||
| CCR 32(3)75-88, July 2002, | CCR 32(3)75-88, July 2002, | |||
| <http://doi.acm.org/10.1145/571697.571725>. | <http://doi.acm.org/10.1145/571697.571725>. | |||
| [Mathis97] | [Mathis97] | |||
| Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The | Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The | |||
| Macroscopic Behavior of the TCP Congestion Avoidance | Macroscopic Behavior of the TCP Congestion Avoidance | |||
| Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997, | Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997, | |||
| <http://doi.acm.org/10.1145/263932.264023>. | <http://doi.acm.org/10.1145/263932.264023>. | |||
| [PCN-arch] | ||||
| Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R., | ||||
| Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion | ||||
| Notification Architecture", | ||||
| draft-eardley-pcn-architecture-00 (work in progress), | ||||
| June 2007. | ||||
| [Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE: | [Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE: | |||
| Predictive Active Queue Management Utilizing Congestion | Predictive Active Queue Management Utilizing Congestion | |||
| Information", Proc. Local Computer Networks (LCN 2003), | Information", Proc. Local Computer Networks (LCN 2003), | |||
| October 2003. | October 2003. | |||
| [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, | [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, | |||
| M., Romanow, A., Weinrib, A., and L. Zhang, "Resource | M., Romanow, A., Weinrib, A., and L. Zhang, "Resource | |||
| ReSerVation Protocol (RSVP) Version 1 Applicability | ReSerVation Protocol (RSVP) Version 1 Applicability | |||
| Statement Some Guidelines on Deployment", RFC 2208, | Statement Some Guidelines on Deployment", RFC 2208, | |||
| September 1997. | September 1997. | |||
| skipping to change at page 70, line 33 | skipping to change at page 72, line 30 | |||
| RFC 3514, April 2003. | RFC 3514, April 2003. | |||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
| Congestion Notification (ECN) Signaling with Nonces", | Congestion Notification (ECN) Signaling with Nonces", | |||
| RFC 3540, June 2003. | RFC 3540, June 2003. | |||
| [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | |||
| Control for Voice Traffic in the Internet", RFC 3714, | Control for Voice Traffic in the Internet", RFC 3714, | |||
| March 2004. | March 2004. | |||
| [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | ||||
| Internet Protocol", RFC 4301, December 2005. | ||||
| [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN | [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN | |||
| on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 | on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 | |||
| (work in progress), March 2006. | (work in progress), March 2006. | |||
| [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | |||
| Salvatori, A., Soppera, A., and M. Koyabe, "Policing | Salvatori, A., Soppera, A., and M. Koyabe, "Policing | |||
| Congestion Response in an Internetwork Using Re-Feedback", | Congestion Response in an Internetwork Using Re-Feedback", | |||
| ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | |||
| www.acm.org/sigs/sigcomm/sigcomm2005/ | www.acm.org/sigs/sigcomm/sigcomm2005/ | |||
| techprog.html#session8>. | techprog.html#session8>. | |||
| [Savage99] | ||||
| Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, | ||||
| "TCP congestion control with a misbehaving receiver", ACM | ||||
| SIGCOMM CCR 29(5), October 1999, | ||||
| <http://citeseer.ist.psu.edu/savage99tcp.html>. | ||||
| [Smart_rtg] | [Smart_rtg] | |||
| Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, | Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, | |||
| "Optimizing Cost and Performance for Multihoming", ACM | "Optimizing Cost and Performance for Multihoming", ACM | |||
| SIGCOMM CCR 34(4)79--92, October 2004, | SIGCOMM CCR 34(4)79--92, October 2004, | |||
| <http://citeseer.ist.psu.edu/698472.html>. | <http://citeseer.ist.psu.edu/698472.html>. | |||
| [Steps_DoS] | [Steps_DoS] | |||
| Handley, M. and A. Greenhalgh, "Steps towards a DoS- | Handley, M. and A. Greenhalgh, "Steps towards a DoS- | |||
| resistant Internet Architecture", Proc. ACM SIGCOMM | resistant Internet Architecture", Proc. ACM SIGCOMM | |||
| workshop on Future directions in network architecture | workshop on Future directions in network architecture | |||
| skipping to change at page 75, line 38 | skipping to change at page 77, line 43 | |||
| Appendix E. Example Egress Dropper Algorithm | Appendix E. Example Egress Dropper Algorithm | |||
| {ToDo: Write up the basic algorithm with flow state, then the | {ToDo: Write up the basic algorithm with flow state, then the | |||
| aggregated one.} | aggregated one.} | |||
| Appendix F. Re-TTL | Appendix F. Re-TTL | |||
| This Appendix gives an overview of a proposal to be able to overload | This Appendix gives an overview of a proposal to be able to overload | |||
| the TTL field in the IP header to monitor downstream propagation | the TTL field in the IP header to monitor downstream propagation | |||
| delay. It is planned to fully write up this proposal in a future | delay. This is included to show that it would be possible to take | |||
| Internet Draft. | account of RTT if it was deemed desirable. | |||
| Delay re-feedback can be achieved by overloading the TTL field, | Delay re-feedback can be achieved by overloading the TTL field, | |||
| without changing IP or router TTL processing. A target value for TTL | without changing IP or router TTL processing. A target value for TTL | |||
| at the destination would need standardising, say 16. If the path hop | at the destination would need standardising, say 16. If the path hop | |||
| count increased by more than 16 during a routing change, it would | count increased by more than 16 during a routing change, it would | |||
| temporarily be mistaken for a routing loop, so this target would need | temporarily be mistaken for a routing loop, so this target would need | |||
| to be chosen to exceed typical hop count increases. The TCP wire | to be chosen to exceed typical hop count increases. The TCP wire | |||
| protocol and handlers would need modifying to feed back the | protocol and handlers would need modifying to feed back the | |||
| destination TTL and initialise it. It would be necessary to | destination TTL and initialise it. It would be necessary to | |||
| standardise the unit of TTL in terms of real time (as was the | standardise the unit of TTL in terms of real time (as was the | |||
| skipping to change at page 77, line 38 | skipping to change at page 79, line 43 | |||
| o r = C_FNE/T_FNE | o r = C_FNE/T_FNE | |||
| o b_max = b_0 | o b_max = b_0 | |||
| T_FNE should be a much shorter period than T_user: for instance T_FNE | T_FNE should be a much shorter period than T_user: for instance T_FNE | |||
| could be in the order of minutes while T_user could be in the order of | could be in the order of minutes while T_user could be in the order of | |||
| weeks. | weeks. | |||
| G.2. Per-flow Rate Policing | G.2. Per-flow Rate Policing | |||
| Per-flow policing aims to enforce congestion responsiveness on the | Whilst we believe that simple per-user policing would be sufficient | |||
| shortest information timescale on a network path: packet roundtrips. | to ensure senders comply with congestion control, some operators may | |||
| wish to police the rate response of each flow to congestion as well. | ||||
| Although we do not believe this will be necessary, we include this | ||||
| section to show how one could perform per-flow policing using | ||||
| enforcement of TCP-fairness as an example. Per-flow policing aims to | ||||
| enforce congestion responsiveness on the shortest information | ||||
| timescale on a network path: packet roundtrips. | ||||
| This again requires that the appropriate terms be agreed between a | This again requires that the appropriate terms be agreed between a | |||
| network operator and its users, where a congestion responsiveness | network operator and its users, where a congestion responsiveness | |||
| policy might be required for the use of a given network service | policy might be required for the use of a given network service | |||
| (perhaps unless the user specifically requests otherwise). | (perhaps unless the user specifically requests otherwise). | |||
| As an example, we describe below how a rate adaptation policer can be | As an example, we describe below how a rate adaptation policer can be | |||
| designed when the applicable rate adaptation policy is TCP- | designed when the applicable rate adaptation policy is TCP- | |||
| compliance. In that context, the average throughput of a flow will | compliance. In that context, the average throughput of a flow will | |||
| be expected to be bounded by the value of the TCP throughput during | be expected to be bounded by the value of the TCP throughput during | |||
| congestion avoidance, given n Mathis' formula [Mathis97] | congestion avoidance, given in Mathis' formula [Mathis97] | |||
| x_TCP = k * s / ( T * sqrt(m) ) | x_TCP = k * s / ( T * sqrt(m) ) | |||
| where: | where: | |||
| o x_TCP is the throughput of the TCP flow in packets per second, | o x_TCP is the throughput of the TCP flow in packets per second, | |||
| o k is a constant upper-bounded by sqrt(3/2), | o k is a constant upper-bounded by sqrt(3/2), | |||
| o s is the average packet size of the flow, | o s is the average packet size of the flow, | |||
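As a rough illustration of the bound x_TCP = k * s / ( T * sqrt(m) ), the sketch below evaluates it in Python. The definitions of T and m are not visible in this excerpt; the code assumes T is the flow's round-trip time in seconds and m is the fraction of congestion (marked or dropped packets) experienced by the flow. The function and parameter names are hypothetical.

```python
import math

# Upper bound on the constant k, as stated in the text: sqrt(3/2).
K_MAX = math.sqrt(3.0 / 2.0)


def tcp_throughput_bound(s: float, rtt: float, m: float, k: float = K_MAX) -> float:
    """Evaluate x_TCP = k * s / (T * sqrt(m)).

    s   -- average packet size (use s = 1 to get packets per second)
    rtt -- T, assumed here to be the round-trip time in seconds
    m   -- assumed here to be the congestion fraction, 0 < m <= 1
    """
    if rtt <= 0:
        raise ValueError("rtt must be positive")
    if not 0 < m <= 1:
        raise ValueError("m must lie in (0, 1]")
    return k * s / (rtt * math.sqrt(m))


if __name__ == "__main__":
    # Example: 1500-byte packets, 100 ms round trip, 1% congestion.
    print(tcp_throughput_bound(s=1500, rtt=0.1, m=0.01))
```

The result carries the units of s per second, so a policer comparing it with a measured rate would need to use the same units for both. Note that quadrupling m halves the bound, reflecting the inverse square-root dependence on congestion.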
| skipping to change at page 81, line 8 | skipping to change at page 83, line 20 | |||
| H.2. Inflation Factor for Persistently Negative Flows | H.2. Inflation Factor for Persistently Negative Flows | |||
| The following process is suggested to complement the simple algorithm | The following process is suggested to complement the simple algorithm | |||
| above in order to protect against the various attacks from | above in order to protect against the various attacks from | |||
| persistently negative flows described in Section 6.1.6. As explained | persistently negative flows described in Section 6.1.6. As explained | |||
| in that section, the most important and first step is to estimate the | in that section, the most important and first step is to estimate the | |||
| contribution of persistently negative flows to the bulk volume of | contribution of persistently negative flows to the bulk volume of | |||
| downstream pre-congestion and to inflate this bulk volume as if these | downstream pre-congestion and to inflate this bulk volume as if these | |||
| flows weren't there. The process below has been designed to give an | flows weren't there. The process below has been designed to give an | |||
| unboased estimate, but it may be possible to define other processes | unbiased estimate, but it may be possible to define other processes | |||
| that achieve similar ends. | that achieve similar ends. | |||
| While the above simple metering algorithm is counting the bulk of | While the above simple metering algorithm is counting the bulk of | |||
| traffic over an accounting period, the meter should also select a | traffic over an accounting period, the meter should also select a | |||
| subset of the whole flow ID space that is small enough to be able to | subset of the whole flow ID space that is small enough to be able to | |||
| realistically measure but large enough to give a realistic sample. | realistically measure but large enough to give a realistic sample. | |||
| Many different samples of different subsets of the ID space should be | Many different samples of different subsets of the ID space should be | |||
| taken at different times during the accounting period, preferably | taken at different times during the accounting period, preferably | |||
| covering the whole ID space. During each sample, the meter should | covering the whole ID space. During each sample, the meter should | |||
| count the volume of positive packets and subtract the volume of | count the volume of positive packets and subtract the volume of | |||
| skipping to change at page 81, line 45 | skipping to change at page 84, line 13 | |||
| by the effect of persistently negative flows. | by the effect of persistently negative flows. | |||
| Appendix I. Argument for holding back the ECN nonce | Appendix I. Argument for holding back the ECN nonce | |||
| The ECN nonce is a mechanism that allows a /sending/ transport to | The ECN nonce is a mechanism that allows a /sending/ transport to | |||
| detect if drop or ECN marking at a congested router has been | detect if drop or ECN marking at a congested router has been | |||
| suppressed by a node somewhere in the feedback loop---another router | suppressed by a node somewhere in the feedback loop---another router | |||
| or the receiver. | or the receiver. | |||
| Space for the ECN nonce was set aside in [RFC3168] (currently | Space for the ECN nonce was set aside in [RFC3168] (currently | |||
| proposed standard) while the full nonce mechanism is specified in RFC | proposed standard) while the full nonce mechanism is specified in | |||
| 3540 (currently experimental). The specifications for [RFC4340] | [RFC3540] (currently experimental). The specifications for [RFC4340] | |||
| (currently proposed standard) requires that "Each DCCP sender SHOULD | (currently proposed standard) requires that "Each DCCP sender SHOULD | |||
| set ECN Nonces on its packets...". It also mandates as a requirement | set ECN Nonces on its packets...". It also mandates as a requirement | |||
| for all CCID profiles that "Any newly defined acknowledgement | for all CCID profiles that "Any newly defined acknowledgement | |||
| mechanism MUST include a way to transmit ECN Nonce Echoes back to the | mechanism MUST include a way to transmit ECN Nonce Echoes back to the | |||
| sender.", therefore: | sender.", therefore: | |||
| o The CCID profile for TCP-like Congestion Control [RFC4341] | o The CCID profile for TCP-like Congestion Control [RFC4341] | |||
| (currently proposed standard) says "The sender will use the ECN | (currently proposed standard) says "The sender will use the ECN | |||
| Nonce for data packets, and the receiver will echo those nonces in | Nonce for data packets, and the receiver will echo those nonces in | |||
| its Ack Vectors." | its Ack Vectors." | |||
| o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342] | o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342] | |||
| recommends that "The sender [use] Loss Intervals options' ECN | recommends that "The sender [use] Loss Intervals options' ECN | |||
| Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to | Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to | |||
| probabilistically verify that the receiver is correctly reporting | probabilistically verify that the receiver is correctly reporting | |||
| all dropped or marked packets." | all dropped or marked packets." | |||
| The ECN nonce is used for three types of functions: | The primary function of the ECN nonce is to protect the integrity of | |||
| the information about congestion: ECN marks and packet drops. | ||||
| o if the sender wants to ensure the integrity of the information | ||||
| about packet drops, | ||||
| o if the sending transport chooses to act in the interests of a | ||||
| congested router, | ||||
| o if the sending transport wants to allocate its own resources in | ||||
| proportion to the rates that each network path can sustain, based | ||||
| on congestion control. | ||||
| However, when the nonce is used to protect the integrity of | However, when the nonce is used to protect the integrity of | |||
| information about packet drops, rather than ECN marks, a transport | information about packet drops, rather than ECN marks, a transport | |||
| layer nonce will always be sufficient (because a drop loses the | layer nonce will always be sufficient (because a drop loses the | |||
| transport header as well as the ECN field in the network header), | transport header as well as the ECN field in the network header), | |||
| which would avoid using scarce IP header codepoint space. Similarly, | which would avoid using scarce IP header codepoint space. Similarly, | |||
| a transport layer nonce would protect against a receiver sending | a transport layer nonce would protect against a receiver sending | |||
| early acknowledgements. | early acknowledgements [Savage99]. | |||
| The other two functions need the ECN nonce to be in the network | If the ECN nonce reveals integrity problems with the information | |||
| layer, but both require rather optimistic trust assumptions in order | about congestion, the sending transport can use that knowledge for | |||
| to be useful. If the sending transport chooses to act in the | two functions: | |||
| interests of a congested router, it can reduce its rate if it detects | ||||
| some malicious party in the feedback loop may be suppressing ECN | o to protect its own resources, by allocating them in proportion to | |||
| feedback. But it would only be useful to a router when /all/ senders | the rates that each network path can sustain, based on congestion | |||
| using the router are trusted to act in the router's interest. | control, | |||
| o and to protect congested routers in the network, by drastically | ||||
| slowing down its connection to the destination with corrupt | ||||
| congestion information. | ||||
| If the sending transport chooses to act in the interests of congested | ||||
| routers, it can reduce its rate if it detects some malicious party in | ||||
| the feedback loop may be suppressing ECN feedback. But it would only | ||||
| be useful to congested routers when /all/ senders using them are | ||||
| | trusted to act in the interest of the congested routers. | |||
| In the end, the only essential use of a network layer nonce is when | In the end, the only essential use of a network layer nonce is when | |||
| sending transports (e.g. large servers) want to allocate their /own/ | sending transports (e.g. large servers) want to allocate their /own/ | |||
| resources in proportion to the rates that each network path can | resources in proportion to the rates that each network path can | |||
| sustain, based on congestion control. In that case, the nonce allows | sustain, based on congestion control. In that case, the nonce allows | |||
| senders to be assured that they aren't being duped into giving more | senders to be assured that they aren't being duped into giving more | |||
| of their own resources to a particular flow. And if congestion | of their own resources to a particular flow. And if congestion | |||
| suppression is detected, the sending transport can rate limit the | suppression is detected, the sending transport can rate limit the | |||
| offending connection to protect its own resources. Certainly, this | offending connection to protect its own resources. Certainly, this | |||
| is a useful function, but the IETF should carefully decide whether | is a useful function, but the IETF should carefully decide whether | |||
| skipping to change at page 83, line 17 | skipping to change at page 85, line 31 | |||
| In contrast, re-ECN allows all routers to fully protect themselves | In contrast, re-ECN allows all routers to fully protect themselves | |||
| from such attacks, without having to trust anyone - senders, | from such attacks, without having to trust anyone - senders, | |||
| receivers, neighbouring networks. Re-ECN is therefore proposed in | receivers, neighbouring networks. Re-ECN is therefore proposed in | |||
| preference to the ECN nonce on the basis that it addresses the | preference to the ECN nonce on the basis that it addresses the | |||
| generic problem of accountability for congestion of a network's | generic problem of accountability for congestion of a network's | |||
| resources at the IP layer. | resources at the IP layer. | |||
| Delaying the ECN nonce is justified because the applicability of the | Delaying the ECN nonce is justified because the applicability of the | |||
| ECN nonce seems too limited for it to consume a two-bit codepoint in | ECN nonce seems too limited for it to consume a two-bit codepoint in | |||
| the IP header. | the IP header. It therefore seems prudent to give time for an | |||
| alternative way to be found to do the one function the nonce is | ||||
| essential for. | ||||
| Moreover, while we have re-designed the re-ECN codepoints so that | Moreover, while we have re-designed the re-ECN codepoints so that | |||
| they do not prevent the ECN nonce progressing, the same is not true | they do not prevent the ECN nonce progressing, the same is not true | |||
| the other way round. If the ECN nonce started to see some deployment | the other way round. If the ECN nonce started to see some deployment | |||
| (perhaps because it was blessed with proposed standard status), | (perhaps because it was blessed with proposed standard status), | |||
| incremental deployment of re-ECN would effectively be impossible, | incremental deployment of re-ECN would effectively be impossible, | |||
| because re-ECN marking fractions at inter-domain borders would be | because re-ECN marking fractions at inter-domain borders would be | |||
| polluted by unknown levels of nonce traffic. | polluted by unknown levels of nonce traffic. | |||
| The authors are aware that re-ECN must prove it has the potential it | The authors are aware that re-ECN must prove it has the potential it | |||
| skipping to change at page 84, line 22 | skipping to change at page 86, line 36 | |||
| Email: arnaud.jacquet@bt.com | Email: arnaud.jacquet@bt.com | |||
| URI: | URI: | |||
| Alessandro Salvatori | Alessandro Salvatori | |||
| BT | BT | |||
| B54/77, Adastral Park | B54/77, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Email: sandr8@gmail.com | Email: alessandro.salvatori@gmail.com | |||
| Martin Koyabe | Martin Koyabe | |||
| BT | BT | |||
| B54/69, Adastral Park | PP2a Rigel House, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 646923 | Phone: +44 1473 646923 | |||
| Email: martin.koyabe@bt.com | Email: martin.koyabe@bt.com | |||
| URI: | URI: | |||
| Toby Moncaster | ||||
| BT | ||||
| B54/70, Adastral Park | ||||
| Martlesham Heath | ||||
| Ipswich IP5 3RE | ||||
| UK | ||||
| Phone: +44 1473 648734 | ||||
| Email: toby.moncaster@bt.com | ||||
| Full Copyright Statement | Full Copyright Statement | |||
| Copyright (C) The Internet Society (2006). | Copyright (C) The IETF Trust (2007). | |||
| This document is subject to the rights, licenses and restrictions | This document is subject to the rights, licenses and restrictions | |||
| contained in BCP 78, and except as set forth therein, the authors | contained in BCP 78, and except as set forth therein, the authors | |||
| retain all their rights. | retain all their rights. | |||
| This document and the information contained herein are provided on an | This document and the information contained herein are provided on an | |||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | |||
| ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | |||
| INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | |||
| INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | |||
| WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | |||
| Intellectual Property | Intellectual Property | |||
| The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
| Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
| pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
| this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
| might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
| made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
| skipping to change at page 85, line 45 | skipping to change at page 88, line 45 | |||
| such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
| specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
| http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
| The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
| copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
| rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
| this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
| ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
| Acknowledgment | Acknowledgments | |||
| Funding for the RFC Editor function is provided by the IETF | Funding for the RFC Editor function is provided by the IETF | |||
| Administrative Support Activity (IASA). | Administrative Support Activity (IASA). This document was produced | |||
| using xml2rfc v1.32 (of http://xml.resource.org/) from a source in | ||||
| RFC-2629 XML format. | ||||
| End of changes. 87 change blocks. | ||||
| 308 lines changed or deleted | 422 lines changed or added | |||
This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
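
For readers comparing the two revisions, the Mathis formula quoted in the rate adaptation policer text, x_TCP = k * s / ( T * sqrt(m) ), can be sketched in Python as below. This is a minimal illustration and not code from either draft. The excerpt shown in this diff does not define T and m, so the sketch assumes T is the round-trip time in seconds and m is the fraction of packets marked or dropped, as in [Mathis97]. The names mathis_throughput and exceeds_tcp_bound are hypothetical.

```python
import math

# Upper bound on the constant k stated in the draft: sqrt(3/2).
K_MAX = math.sqrt(3.0 / 2.0)


def mathis_throughput(s, rtt, m, k=K_MAX):
    """Evaluate x_TCP = k * s / (T * sqrt(m)).

    s   -- average packet size of the flow (unit chosen by the caller)
    rtt -- T in the formula; ASSUMED to be the round-trip time in seconds
    m   -- ASSUMED to be the fraction of packets marked or dropped, 0 < m <= 1
    k   -- constant, upper-bounded by sqrt(3/2)

    The result is in units of s per second: with s = 1 it is packets per
    second, with s in bytes it is bytes per second.
    """
    if rtt <= 0:
        raise ValueError("rtt must be positive")
    if not 0 < m <= 1:
        raise ValueError("m must lie in (0, 1]")
    if not 0 < k <= K_MAX:
        raise ValueError("k must lie in (0, sqrt(3/2)]")
    return k * s / (rtt * math.sqrt(m))


def exceeds_tcp_bound(measured_rate, s, rtt, m):
    """Hypothetical policer check: is the measured average rate above the
    TCP-compliant bound computed with the largest allowed k?"""
    return measured_rate > mathis_throughput(s, rtt, m)
```

For example, a flow with 1500-byte packets, a 100 ms round-trip time and 1% marking would be compared against mathis_throughput(1500, 0.1, 0.01). A real policer would also need to decide how to average the measured rate and over what period, which this sketch leaves out.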