draft-briscoe-tsvwg-re-ecn-tcp-02.txt | draft-briscoe-tsvwg-re-ecn-tcp-03.txt | |||
---|---|---|---|---|
Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
Expires: December 28, 2006 A. Jacquet | Intended status: Informational A. Jacquet | |||
A. Salvatori | Expires: April 26, 2007 A. Salvatori | |||
M. Koyabe | M. Koyabe | |||
BT | BT | |||
June 26, 2006 | October 23, 2006 | |||
Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | Re-ECN: Adding Accountability for Causing Congestion to TCP/IP | |||
draft-briscoe-tsvwg-re-ecn-tcp-02 | draft-briscoe-tsvwg-re-ecn-tcp-03 | |||
Status of this Memo | Status of this Memo | |||
By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
This Internet-Draft will expire on December 28, 2006. | This Internet-Draft will expire on April 26, 2007. | |||
Copyright Notice | Copyright Notice | |||
Copyright (C) The Internet Society (2006). | Copyright (C) The Internet Society (2006). | |||
Abstract | Abstract | |||
This document introduces a new protocol for explicit congestion | This document introduces a new protocol for explicit congestion | |||
notification (ECN), termed re-ECN, which can be deployed | notification (ECN), termed re-ECN, which can be deployed | |||
incrementally around unmodified routers. The protocol arranges an | incrementally around unmodified routers. The protocol arranges an | |||
skipping to change at page 2, line 27 | skipping to change at page 2, line 27 | |||
honestly. | honestly. | |||
Authors' Statement: Status (to be removed by the RFC Editor) | Authors' Statement: Status (to be removed by the RFC Editor) | |||
This document is posted as an Internet-Draft with the intent (at | This document is posted as an Internet-Draft with the intent (at | |||
least that of the authors) to eventually progress to standards track. | least that of the authors) to eventually progress to standards track. | |||
Although the re-ECN protocol is intended to make a simple but far- | Although the re-ECN protocol is intended to make a simple but far- | |||
reaching change to the Internet architecture, the most immediate | reaching change to the Internet architecture, the most immediate | |||
priority for the authors is to delay any move of the ECN nonce to | priority for the authors is to delay any move of the ECN nonce to | |||
Proposed Standard status. | Proposed Standard status. The argument for this position is | |||
developed in Appendix I. | ||||
The ECN nonce is an experimental RFC that allows /senders/ to check | ||||
the integrity of congestion feedback from /networks/. Therefore the | ||||
nonce only helps in scenarios where the sender is trusted to control | ||||
network congestion. On the other hand, the re-ECN protocol aims to | ||||
allow networks themselves to be able to police cheating senders and | ||||
receivers and to police neighbouring networks. Re-ECN is therefore | ||||
proposed in preference to the ECN nonce on the basis that it | ||||
addresses the generic problem of accountability for congestion of a | ||||
network's resources at the IP layer. | ||||
Delaying the ECN nonce is justified by two factors: | ||||
o The ECN nonce would permanently consume a two-bit codepoint in | ||||
the IP header for a purpose specific to a limited trust model. | ||||
Although the nonce is a neat idea, its applicability seems too | ||||
limited to warrant space in the IP header; | ||||
o Although we have re-designed the re-ECN codepoints so that they do | ||||
not prevent the ECN nonce progressing, the same is not true the | ||||
other way round. If the ECN nonce started to see some deployment | ||||
(perhaps because it was blessed with proposed standard status), | ||||
incremental deployment of re-ECN would effectively be impossible, | ||||
because re-ECN marking fractions at inter-domain borders would be | ||||
polluted by unknown levels of nonce traffic. | ||||
The authors are aware that re-ECN must prove it has the potential it | ||||
claims if it is to displace the nonce. Therefore, every effort has | ||||
been made to complete a comprehensive specification of re-ECN so that | ||||
its potential can be assessed. We therefore seek the opinion of the | ||||
Internet community on whether the re-ECN protocol is sufficiently | ||||
useful to warrant standards action. | ||||
Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
From -00 to -01: | From -00 to -01: | |||
Encoding of re-ECN wire protocol changed for reasons given in | Encoding of re-ECN wire protocol changed for reasons given in | |||
Appendix B and consequently draft substantially re-written. | Appendix B and consequently draft substantially re-written. | |||
Substantial text added in sections on applications, incremental | Substantial text added in sections on applications, incremental | |||
deployment, architectural rationale and security considerations. | deployment, architectural rationale and security considerations. | |||
skipping to change at page 3, line 39 | skipping to change at page 3, line 12 | |||
Text on (non-)issues with tunnels, encryption and link layer | Text on (non-)issues with tunnels, encryption and link layer | |||
congestion notification added (Section 5.6 & Section 5.7). | congestion notification added (Section 5.6 & Section 5.7). | |||
Section added giving evolvability arguments against encouraging | Section added giving evolvability arguments against encouraging | |||
bottleneck policing (Section 6.1.2). And text on re-ECN's | bottleneck policing (Section 6.1.2). And text on re-ECN's | |||
evolvability by design added to Section 6.1.3 | evolvability by design added to Section 6.1.3 | |||
Text on inter-domain policing (Section 6.1.6) and inter-domain | Text on inter-domain policing (Section 6.1.6) and inter-domain | |||
fail-safes (Section 6.1.7) added. | fail-safes (Section 6.1.7) added. | |||
From -02 to -03: | ||||
Started guidelines for re-ECN support in DCCP and SCTP. | ||||
Added annex on limitations of nonce mechanism. | ||||
Minor editorial changes throughout. | Minor editorial changes throughout. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 | 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 | |||
3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 | |||
3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | 3.1. Background and Applicability . . . . . . . . . . . . . . . 8 | |||
3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or | |||
v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 | |||
3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 | 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 | |||
3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12 | 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12 | |||
4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 14 | 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | |||
4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 | 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 | |||
4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or | 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or | |||
Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18 | Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18 | |||
4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20 | 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20 | |||
4.1.4. Extended ECN (EECN) Field Settings during Flow | 4.1.4. Extended ECN (EECN) Field Settings during Flow | |||
Start or after Idle Periods . . . . . . . . . . . . . 21 | Start or after Idle Periods . . . . . . . . . . . . . 21 | |||
4.1.5. Pure ACKS, Retransmissions, Window Probes and | 4.1.5. Pure ACKS, Retransmissions, Window Probes and | |||
Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25 | Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25 | |||
4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26 | 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26 | |||
4.2.1. Guidelines for Adding Re-ECN to Other Transports . . . 26 | 4.2.1. General Guidelines for Adding Re-ECN to Other | |||
5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 26 | Transports . . . . . . . . . . . . . . . . . . . . . . 26 | |||
5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 26 | 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 26 | |||
4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 27 | ||||
4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 27 | ||||
5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 27 | ||||
5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 27 | ||||
5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28 | 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28 | |||
5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 29 | 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 30 | |||
5.4. Justification for Setting the First SYN to FNE . . . . . . 30 | 5.4. Justification for Setting the First SYN to FNE . . . . . . 31 | |||
5.5. Control and Management . . . . . . . . . . . . . . . . . . 31 | 5.5. Control and Management . . . . . . . . . . . . . . . . . . 32 | |||
5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 31 | 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 32 | |||
5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 32 | 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 33 | |||
5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 32 | 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 33 | |||
5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 33 | 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 34 | |||
6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 34 | 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 35 | |||
6.1. Policing Congestion Response . . . . . . . . . . . . . . . 34 | 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 35 | |||
6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 34 | 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 35 | |||
6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 35 | 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 36 | |||
6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 36 | 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 37 | |||
6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 43 | 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 44 | |||
6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 44 | 6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 45 | |||
6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 46 | 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 47 | |||
6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 50 | 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 51 | |||
6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51 | 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51 | |||
6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51 | 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51 | |||
6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 51 | 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 52 | |||
6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 52 | 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 53 | |||
6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 52 | 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 53 | |||
6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53 | 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53 | |||
6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53 | 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53 | |||
7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 53 | 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 54 | |||
7.1. Incremental Deployment Features . . . . . . . . . . . . . 53 | 7.1. Incremental Deployment Features . . . . . . . . . . . . . 54 | |||
7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55 | 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55 | |||
8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60 | 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60 | |||
9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 62 | 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 63 | |||
9.1. Policing Rate Response to Congestion . . . . . . . . . . . 62 | 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 63 | |||
9.2. Congestion Notification Integrity . . . . . . . . . . . . 63 | 9.2. Congestion Notification Integrity . . . . . . . . . . . . 63 | |||
9.3. Identifying Upstream and Downstream Congestion . . . . . . 64 | 9.3. Identifying Upstream and Downstream Congestion . . . . . . 64 | |||
10. Security Considerations . . . . . . . . . . . . . . . . . . . 64 | 10. Security Considerations . . . . . . . . . . . . . . . . . . . 65 | |||
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66 | 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66 | |||
12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 66 | 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 67 | |||
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 66 | 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 67 | |||
14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 66 | 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 67 | |||
15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67 | 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67 | |||
15.1. Normative References . . . . . . . . . . . . . . . . . . . 67 | 15.1. Normative References . . . . . . . . . . . . . . . . . . . 67 | |||
15.2. Informative References . . . . . . . . . . . . . . . . . . 67 | 15.2. Informative References . . . . . . . . . . . . . . . . . . 68 | |||
Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 70 | Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 71 | |||
Appendix B. Justification for Two Codepoints Signifying Zero | Appendix B. Justification for Two Codepoints Signifying Zero | |||
Worth Packets . . . . . . . . . . . . . . . . . . . . 71 | Worth Packets . . . . . . . . . . . . . . . . . . . . 72 | |||
Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 73 | Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 74 | |||
Appendix D. Packet Marking During Flow Start . . . . . . . . . . 74 | Appendix D. Packet Marking During Flow Start . . . . . . . . . . 75 | |||
Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 74 | Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 75 | |||
Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 74 | Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 75 | |||
Appendix G. Policer Designs to ensure Congestion | Appendix G. Policer Designs to ensure Congestion | |||
Responsiveness . . . . . . . . . . . . . . . . . . . 75 | Responsiveness . . . . . . . . . . . . . . . . . . . 76 | |||
G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 75 | G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 76 | |||
G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 76 | G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 77 | |||
Appendix H. Downstream Congestion Metering Algorithms . . . . . . 79 | Appendix H. Downstream Congestion Metering Algorithms . . . . . . 80 | |||
H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 79 | H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 80 | |||
H.2. Inflation Factor for Persistently Negative Flows . . . . . 79 | H.2. Inflation Factor for Persistently Negative Flows . . . . . 80 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 81 | Appendix I. Argument for holding back the ECN nonce . . . . . . . 81 | |||
Intellectual Property and Copyright Statements . . . . . . . . . . 82 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 83 | |||
Intellectual Property and Copyright Statements . . . . . . . . . . 85 | ||||
1. Introduction | 1. Introduction | |||
This document aims: | This document aims: | |||
o To provide a complete specification of the addition of the re-ECN | o To provide a complete specification of the addition of the re-ECN | |||
protocol to IP and guidelines on how to add it to transport layer | protocol to IP and guidelines on how to add it to transport layer | |||
protocols, including a complete specification of re-ECN in TCP as | protocols, including a complete specification of re-ECN in TCP as | |||
an example; | an example; | |||
skipping to change at page 7, line 38 | skipping to change at page 7, line 38 | |||
(Section 5) layers, then the applications it can be put to, such as | (Section 5) layers, then the applications it can be put to, such as | |||
policing DDoS, QoS and congestion control (Section 6). Although | policing DDoS, QoS and congestion control (Section 6). Although | |||
these applications do not require standardisation themselves, they | these applications do not require standardisation themselves, they | |||
are described in a fair degree of detail in order to explain how re- | are described in a fair degree of detail in order to explain how re- | |||
ECN can be used. Given that re-ECN proposes to use the last undefined | ECN can be used. Given that re-ECN proposes to use the last undefined | |||
bit in the IPv4 header, we felt it necessary to outline the potential | bit in the IPv4 header, we felt it necessary to outline the potential | |||
that re-ECN could release in return for being given that bit. | that re-ECN could release in return for being given that bit. | |||
Deployment issues discussed throughout the document are brought | Deployment issues discussed throughout the document are brought | |||
together in Section 7, which is followed by a brief section | together in Section 7, which is followed by a brief section | |||
explaining the somewhat subtle rationale for the design, from an | explaining the somewhat subtle rationale for the design from an | |||
architectural perspective (Section 8). We end by describing related | architectural perspective (Section 8). We end by describing related | |||
work (Section 9), listing security considerations (Section 10) and | work (Section 9), listing security considerations (Section 10) and | |||
finally drawing conclusions (Section 12). | finally drawing conclusions (Section 12). | |||
2. Requirements notation | 2. Requirements notation | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
skipping to change at page 8, line 45 | skipping to change at page 8, line 45 | |||
The choice of two ECT code-points in the ECN field [RFC3168] | The choice of two ECT code-points in the ECN field [RFC3168] | |||
permitted future flexibility, optionally allowing the sender to | permitted future flexibility, optionally allowing the sender to | |||
encode the experimental ECN nonce [RFC3540] in the packet stream. | encode the experimental ECN nonce [RFC3540] in the packet stream. | |||
The nonce is designed to allow a sender to check the integrity of | The nonce is designed to allow a sender to check the integrity of | |||
congestion feedback. But Section 9.2 explains that it still gives no | congestion feedback. But Section 9.2 explains that it still gives no | |||
control over how fast the sender transmits as a result of the | control over how fast the sender transmits as a result of the | |||
feedback. On the other hand, re-ECN is designed both to ensure that | feedback. On the other hand, re-ECN is designed both to ensure that | |||
congestion is declared honestly and that the sender's rate responds | congestion is declared honestly and that the sender's rate responds | |||
appropriately. | appropriately. | |||
Re-ECN is based on a feedback arrangement called | Re-ECN is based on a feedback arrangement called `re- | |||
`re-feedback' [Re-fb]. The word is short for either receiver- | feedback' [Re-fb]. The word is short for either receiver-aligned, | |||
aligned, re-inserted or re-echoed feedback. But it actually works | re-inserted or re-echoed feedback. But it actually works even when | |||
even when no feedback is available. In fact it has been carefully | no feedback is available. In fact it has been carefully designed to | |||
designed to work for single datagram flows. Indeed, it even | work for single datagram flows. Indeed, it even encourages | |||
encourages aggregation of single packet flows by congestion control | aggregation of single packet flows by congestion control proxies. | |||
proxies. Then, even if the traffic mix of the Internet were to | ||||
become dominated by short messages, it would still be possible to | Then, even if the traffic mix of the Internet were to become | |||
control congestion effectively and efficiently. | dominated by short messages, it would still be possible to control | |||
congestion effectively and efficiently. | ||||
Changing the Internet's feedback architecture seems to imply | Changing the Internet's feedback architecture seems to imply | |||
considerable upheaval. But re-ECN can be deployed incrementally at | considerable upheaval. But re-ECN can be deployed incrementally at | |||
the transport layer around unmodified routers using existing fields | the transport layer around unmodified routers using existing fields | |||
in IP (v4 or v6). However it does also require the last undefined | in IP (v4 or v6). However it does also require the last undefined | |||
bit in the IPv4 header, which it uses in combination with the 2-bit | bit in the IPv4 header, which it uses in combination with the 2-bit | |||
ECN field to create four new codepoints. Nonetheless, changes to IP | ECN field to create four new codepoints. Nonetheless, changes to IP | |||
routers are RECOMMENDED in order to improve resilience against DoS | routers are RECOMMENDED in order to improve resilience against DoS | |||
attacks. Similarly, re-ECN works best if both the sender and | attacks. Similarly, re-ECN works best if both the sender and | |||
receiver transports are re-ECN-capable, but it can work with just | receiver transports are re-ECN-capable, but it can work with just | |||
skipping to change at page 10, line 13 | skipping to change at page 10, line 13 | |||
be defined in another specification (e.g. [Re-PCN]). | be defined in another specification (e.g. [Re-PCN]). | |||
Although the RE flag is a separate, single bit field, it can be read | Although the RE flag is a separate, single bit field, it can be read | |||
as an extension to the two-bit ECN field; the three concatenated bits | as an extension to the two-bit ECN field; the three concatenated bits | |||
in what we will call the extended ECN field (EECN) making eight | in what we will call the extended ECN field (EECN) making eight | |||
codepoints. We will use the RFC3168 names of the ECN codepoints to | codepoints. We will use the RFC3168 names of the ECN codepoints to | |||
describe settings of the ECN field when the RE flag setting is "don't | describe settings of the ECN field when the RE flag setting is "don't | |||
care", but we also define the following six extended ECN codepoint | care", but we also define the following six extended ECN codepoint | |||
names for when we need to be more specific. | names for when we need to be more specific. | |||
+-------+-----------+------+--------------+-------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
| ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | |||
| field | codepoint | flag | codepoint | | | | field | codepoint | flag | codepoint | | | |||
+-------+-----------+------+--------------+-------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
| 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | |||
| | | | | transport | | | | | | | transport | | |||
| 00 | Not-ECT | 1 | FNE | Feedback not | | | 00 | Not-ECT | 1 | FNE | Feedback not | | |||
| | | | | established | | | | | | | established | | |||
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | |||
| | | | | and RECT | | | | | | | and RECT | | |||
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | | 01 | ECT(1) | 1 | RECT | Re-ECN capable | | |||
| | | | | transport | | | | | | | transport | | |||
| 10 | ECT(0) | 0 | --- | Legacy ECN use only | | | 10 | ECT(0) | 0 | --- | Legacy ECN use only | | |||
| | | | | | | ||||
| 10 | ECT(0) | 1 | --CU-- | Currently unused | | | 10 | ECT(0) | 1 | --CU-- | Currently unused | | |||
| | | | | | | | | | | | | | |||
| 11 | CE | 0 | CE(0) | Re-Echo canceled by | | | 11 | CE | 0 | CE(0) | Re-Echo canceled by | | |||
| | | | | congestion experienced | | | | | | | congestion experienced | | |||
| 11 | CE | 1 | CE(-1) | Congestion experienced | | | 11 | CE | 1 | CE(-1) | Congestion experienced | | |||
+-------+-----------+------+--------------+-------------------------+ | +-------+------------+------+--------------+------------------------+ | |||
Table 1: Extended ECN Codepoints | Table 1: Extended ECN Codepoints | |||
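As an illustration only, Table 1 can be read as a lookup keyed on the two-bit ECN field and the RE flag. The sketch below is not part of either draft version; the names `EECN_NAMES` and `eecn_name` are hypothetical and the bit values are simply those listed in the table.

```python
# Hypothetical lookup built directly from Table 1.
# Key: (two-bit ECN field, RE flag) -> extended ECN codepoint name.
EECN_NAMES = {
    (0b00, 0): "Not-RECT",  # not re-ECN-capable transport
    (0b00, 1): "FNE",       # feedback not established
    (0b01, 0): "Re-Echo",   # re-echoed congestion and RECT
    (0b01, 1): "RECT",      # re-ECN capable transport
    (0b10, 0): "---",       # legacy ECN use only
    (0b10, 1): "--CU--",    # currently unused
    (0b11, 0): "CE(0)",     # Re-Echo canceled by congestion experienced
    (0b11, 1): "CE(-1)",    # congestion experienced
}


def eecn_name(ecn: int, re_flag: int) -> str:
    """Return the Table 1 name for an ECN field value and RE flag."""
    if ecn not in (0, 1, 2, 3) or re_flag not in (0, 1):
        raise ValueError("ECN field is two bits and the RE flag is one bit")
    return EECN_NAMES[(ecn, re_flag)]
```

For example, `eecn_name(0b01, 1)` gives the name of the codepoint a sender would use for an ordinary re-ECN-capable packet.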
3.3. Re-ECN Protocol Operation | 3.3. Re-ECN Protocol Operation | |||
In this section we will give an overview of the operation of the re- | In this section we will give an overview of the operation of the re- | |||
ECN protocol for TCP/IP, leaving a detailed specification to the | ECN protocol for TCP/IP, leaving a detailed specification to the | |||
following sections. Other transports will be discussed later. | following sections. Other transports will be discussed later. | |||
In summary, the protocol adds a third `re-echo' stage to the existing | In summary, the protocol adds a third `re-echo' stage to the existing | |||
skipping to change at page 12, line 44 | skipping to change at page 12, line 44 | |||
of a negative metric arises because it is derived by subtracting one | of a negative metric arises because it is derived by subtracting one | |||
metric from another. Of course actual downstream congestion cannot | metric from another. Of course actual downstream congestion cannot | |||
be negative, only the metric can (whether due to time lags or | be negative, only the metric can (whether due to time lags or | |||
deliberate malice). | deliberate malice). | |||
Just as we will loosely talk of positive and negative flows, we will | Just as we will loosely talk of positive and negative flows, we will | |||
also talk of positive or negative packets, meaning packets that | also talk of positive or negative packets, meaning packets that | |||
contribute positively or negatively to the downstream congestion | contribute positively or negatively to the downstream congestion | |||
metric. | metric. | |||
Therefore packets we will talk of packets having `worth' of +1, 0 or | Therefore we will talk of packets having `worth' of +1, 0 or -1, | |||
-1, which, when multiplied by their size, indicates their | which, when multiplied by their size, indicates their contribution to | |||
contribution to the downstream congestion metric. | the downstream congestion metric. | |||
Figure 2 shows the main state transitions of the system once a flow | Figure 2 shows the main state transitions of the system once a flow | |||
is established, showing the worth of packets in each state. When the | is established, showing the worth of packets in each state. When the | |||
network congestion marks a packet it decrements its worth (moving | network congestion marks a packet it decrements its worth (moving | |||
from the left of the main square to the right). When the sender | from the left of the main square to the right). When the sender | |||
blanks the RE flag in order to re-echo congestion it increments the | blanks the RE flag in order to re-echo congestion it increments the | |||
worth of a packet (moving from the bottom of the main square to the | worth of a packet (moving from the bottom of the main square to the | |||
top). | top). | |||
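The two transitions just described can be sketched as follows. This is an informal reading of the text and of Table 1, not the drafts' normative state machine: it assumes that congestion marking changes only the ECN field to CE and leaves the RE flag alone, and it covers only the four codepoints used once a flow is established. The function names are hypothetical.

```python
# Worth of each codepoint once a flow is established (informal reading).
WORTH = {"RECT": 0, "Re-Echo": +1, "CE(-1)": -1, "CE(0)": 0}


def congestion_mark(codepoint: str) -> str:
    """Network marks a packet: ECN field becomes CE, RE flag is untouched.

    This decrements the packet's worth by one. Packets that are already
    marked are returned unchanged.
    """
    return {"RECT": "CE(-1)", "Re-Echo": "CE(0)"}.get(codepoint, codepoint)


def re_echo(codepoint: str) -> str:
    """Sender blanks the RE flag on a packet it would have sent as RECT.

    This increments the packet's worth by one.
    """
    return {"RECT": "Re-Echo"}.get(codepoint, codepoint)
```

Under these assumptions a Re-Echo packet that is later congestion marked arrives as CE(0), so the increment and the decrement cancel.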
Sender state Sent Worth Received Worth | Sender state Sent Worth Received Worth | |||
skipping to change at page 13, line 33 | skipping to change at page 13, line 33 | |||
Figure 2: Re-ECN System State Diagram (bootstrap not shown) | Figure 2: Re-ECN System State Diagram (bootstrap not shown) | |||
The idea is that every time the network decrements the worth of a | The idea is that every time the network decrements the worth of a | |||
packet, the sender increments the worth of a later packet. Then, | packet, the sender increments the worth of a later packet. Then, | |||
over time, as many positive octets should arrive at the receiver as | over time, as many positive octets should arrive at the receiver as | |||
negative. Note we have said octets not packets, so if packets are of | negative. Note we have said octets not packets, so if packets are of | |||
different sizes, the worth should be incremented on enough octets to | different sizes, the worth should be incremented on enough octets to | |||
balance the octets in negative packets arriving at the receiver. It | balance the octets in negative packets arriving at the receiver. It | |||
is this balance that will allow the network to hold the sender | is this balance that will allow the network to hold the sender | |||
accountable for the congestion it causes, as we shall see. the | accountable for the congestion it causes, as we shall see. The | |||
informal outline below uses TCP as an example transport, but the idea | informal outline below uses TCP as an example transport, but the idea | |||
would be broadly similar for any transport that adapts its rate to | would be broadly similar for any transport that adapts its rate to | |||
congestion. | congestion. | |||
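The octet balance described above can be sketched as follows. This is a minimal illustration, not the draft's TCP algorithm: the class name, the method names and the idea of a running count of owed octets are all hypothetical, introduced here only to show that the balance is kept in octets rather than packets.

```python
class ReEchoBalancer:
    """Hypothetical sender-side bookkeeping for re-echoing congestion.

    Assumption: feedback tells the sender how many octets arrived at the
    receiver in congestion-marked (negative) packets.
    """

    def __init__(self):
        # Octets of negative worth not yet balanced by positive packets.
        self.owed_octets = 0

    def on_feedback(self, marked_octets):
        # The network decremented the worth of this many octets.
        self.owed_octets += marked_octets

    def next_codepoint(self, packet_size):
        # Increment the worth of later packets until the octets balance.
        if self.owed_octets > 0:
            self.owed_octets -= packet_size
            return "Re-Echo"  # worth +1
        return "RECT"         # worth 0


sender = ReEchoBalancer()
sender.on_feedback(1500)  # one 1500-octet packet was congestion marked
sent = [sender.next_codepoint(500) for _ in range(4)]
print(sent)
```

With 500-octet packets, three positive packets are needed to balance one marked 1500-octet packet, which is why counting packets alone would not be enough.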
We will start with the sender in `flow established' state, Normally | We will start with the sender in `flow established' state. Normally, | |||
as acknowledgements of earlier packets arrive that don't feed back any | as acknowledgements of earlier packets arrive that don't feed back any | |||
congestion, the congestion window can be opened, so the sender goes | congestion, the congestion window can be opened, so the sender goes | |||
round the smaller sub-loop, sending RECT packets (worth 0) and | round the smaller sub-loop, sending RECT packets (worth 0) and | |||
returning to the flow established state to send another one. If a | returning to the flow established state to send another one. If a | |||
router congestion marks one of the packets, it decrements the | router congestion marks one of the packets, it decrements the | |||
packet's worth. The sender will have been continuing to traverse | packet's worth. The sender will have been continuing to traverse | |||
round the smaller feedback loop every time acknowledgements arrive. | round the smaller feedback loop every time acknowledgements arrive. | |||
But when congestion feedback returns from this packet that was marked | But when congestion feedback returns from this packet that was marked | |||
with -1 worth (the largest loop in the figure) the sender jumps to | with -1 worth (the largest loop in the figure) the sender jumps to | |||
the congestion echoed state in order to re-echo the congestion, | the congestion echoed state in order to re-echo the congestion, | |||
skipping to change at page 14, line 16 | skipping to change at page 14, line 16 | |||
the same end to end feedback loop. | the same end to end feedback loop. | |||
If a packet carrying re-echoed congestion happens to also be | If a packet carrying re-echoed congestion happens to also be | |||
congestion marked, the +1 worth added by the sender will be cancelled | congestion marked, the +1 worth added by the sender will be cancelled | |||
out by the -1 network congestion marking. Although the two worth | out by the -1 network congestion marking. Although the two worth | |||
values correctly cancel out, neither the congestion marking nor the | values correctly cancel out, neither the congestion marking nor the | |||
re-echoed congestion are lost, because the RE bit and the ECN field | re-echoed congestion are lost, because the RE bit and the ECN field | |||
are orthogonal. So, whenever this happens, the receiver will | are orthogonal. So, whenever this happens, the receiver will | |||
correctly detect and re-echo the new congestion event as well (the | correctly detect and re-echo the new congestion event as well (the | |||
top sub-loop). When we need to distinguish, we will sometimes call a | top sub-loop). When we need to distinguish, we will sometimes call a | |||
packet marked RECT neutral (0 worth), while we will call the CE(0) | packet marked RECT 'neutral' (0 worth), while we will call the CE(0) | |||
marking canceled (also 0 worth). If a re-echoed packet isn't unlucky | marking 'canceled' (also 0 worth). If a re-echoed packet isn't | |||
enough to be further congestion marked, the sender will return to the | unlucky enough to be further congestion marked, the sender will | |||
flow established state and continue to send RECT packets (worth 0). | return to the flow established state and continue to send RECT | |||
packets (worth 0). | ||||
The table below specifies unambiguously the worth of each extended | The table below specifies unambiguously the worth of each extended | |||
ECN codepoint. Note the order is different from the previous table | ECN codepoint. Note the order is different from the previous table | |||
to better show how the worth increments and decrements. The FNE | to better show how the worth increments and decrements. The FNE | |||
codepoint is an exception. It is used in the flow bootstrap process | codepoint is an exception. It is used in the flow bootstrap process | |||
(explained later) and has the same positive (+1) worth as a packet | (explained later) and has the same positive (+1) worth as a packet | |||
with the Re-Echo codepoint. | with the Re-Echo codepoint. | |||
+-------+-----+----------------+-------+----------------------------+ | +--------+------+----------------+-------+--------------------------+ | |||
| ECN | RE | Extended ECN | Worth | Re-ECN meaning | | | ECN | RE | Extended ECN | Worth | Re-ECN meaning | | |||
| field | bit | codepoint | | | | | field | bit | codepoint | | | | |||
+-------+-----+----------------+-------+----------------------------+ | +--------+------+----------------+-------+--------------------------+ | |||
| 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | | 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | |||
| | | | | transport | | | | | | | transport | | |||
| 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | |||
| | | | | RECT | | | | | | | RECT | | |||
| 10 | 0 | --- | ... | Legacy ECN use only | | | 10 | 0 | --- | ... | Legacy ECN use only | | |||
| 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | | 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | |||
| | | | | congestion experienced | | | | | | | congestion experienced | | |||
| 00 | 1 | FNE | +1 | Feedback not established | | | 00 | 1 | FNE | +1 | Feedback not established | | |||
| 01 | 1 | RECT | 0 | Re-ECN capable transport | | | 01 | 1 | RECT | 0 | Re-ECN capable transport | | |||
| 10 | 1 | --CU-- | ... | Currently unused | | | 10 | 1 | --CU-- | ... | Currently unused | | |||
| | | | | | | | | | | | | | |||
| 11 | 1 | CE(-1) | -1 | Congestion experienced | | | 11 | 1 | CE(-1) | -1 | Congestion experienced | | |||
+-------+-----+----------------+-------+----------------------------+ | +--------+------+----------------+-------+--------------------------+ | |||
Table 3: 'Worth' of Extended ECN Codepoints | Table 3: 'Worth' of Extended ECN Codepoints | |||
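Table 3 can be transcribed into a small lookup, shown here as an illustrative sketch. The function names are hypothetical. Rows whose worth is given as "..." are represented by None, and the marking function only models RECT and Re-Echo packets (ECN field 01); marking of FNE packets is not modelled.

```python
# (ECN field, RE bit) -> (extended ECN codepoint, worth), from Table 3.
# None stands for the rows whose worth is shown as "..." in the table.
EECN_TABLE = {
    (0b00, 0): ("Not-RECT", None),
    (0b01, 0): ("Re-Echo", +1),
    (0b10, 0): ("---", None),
    (0b11, 0): ("CE(0)", 0),
    (0b00, 1): ("FNE", +1),
    (0b01, 1): ("RECT", 0),
    (0b10, 1): ("--CU--", None),
    (0b11, 1): ("CE(-1)", -1),
}


def codepoint(ecn, re_bit):
    return EECN_TABLE[(ecn, re_bit)][0]


def worth(ecn, re_bit):
    return EECN_TABLE[(ecn, re_bit)][1]


def congestion_mark(ecn, re_bit):
    """Model a router setting the ECN field to 11; the RE bit is untouched."""
    if ecn != 0b01:
        raise ValueError("sketch only models marking of RECT and Re-Echo")
    return (0b11, re_bit)


def contribution(ecn, re_bit, size_octets):
    """Worth multiplied by packet size, or None where worth is undefined."""
    w = worth(ecn, re_bit)
    return None if w is None else w * size_octets


# Marking decrements worth by one in both cases modelled here.
for re_bit in (0, 1):
    before = (0b01, re_bit)
    after = congestion_mark(*before)
    print(codepoint(*before), worth(*before), "->",
          codepoint(*after), worth(*after))
```

Because the RE bit and the ECN field are independent keys in the lookup, a marked Re-Echo packet becomes CE(0) and a marked RECT packet becomes CE(-1), matching the text above.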
4. Transport Layers | 4. Transport Layers | |||
4.1. TCP | 4.1. TCP | |||
Re-ECN capability at the sender is essential. At the receiver it is | Re-ECN capability at the sender is essential. At the receiver it is | |||
optional, as long as the receiver has a basic (`vanilla flavour') | optional, as long as the receiver has a basic (`vanilla flavour') | |||
RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re- | RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re- | |||
ECN is not the first attempt to define the semantics of the ECN | ECN is not the first attempt to define the semantics of the ECN | |||
field, we give a table below summarising what happens for various | field, we give a table below summarising what happens for various | |||
combinations of capabilities of the sender S and receiver R, as | combinations of capabilities of the sender S and receiver R, as | |||
indicated in the first four columns below. The last column gives the | indicated in the first four columns below. The last column gives the | |||
mode a half-connection should be in after the first two of the three | mode a half-connection should be in after the first two of the three | |||
TCP handshakes. | TCP handshakes. | |||
+--------+---------------+-----------+---------+--------------------+ | +--------+--------------+------------+---------+--------------------+ | |||
| Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | | Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | |||
| | (RFC3540) | (RFC3168) | | Half-connection | | | | (RFC3540) | (RFC3168) | | Half-connection | | |||
| | | | | Mode | | | | | | | Mode | | |||
+--------+---------------+-----------+---------+--------------------+ | +--------+--------------+------------+---------+--------------------+ | |||
| SR | | | | RECN | | | SR | | | | RECN | | |||
| S | R | | | RECN-Co | | | S | R | | | RECN-Co | | |||
| S | | R | | RECN-Co | | | S | | R | | RECN-Co | | |||
| S | | | R | Not-ECT | | | S | | | R | Not-ECT | | |||
+--------+---------------+-----------+---------+--------------------+ | +--------+--------------+------------+---------+--------------------+ | |||
Table 4: Modes of TCP Half-connection for Combinations of ECN | Table 4: Modes of TCP Half-connection for Combinations of ECN | |||
Capabilities of Sender S and Receiver R | Capabilities of Sender S and Receiver R | |||
We will describe what happens in each mode, then describe how they | We will describe what happens in each mode, then describe how they | |||
are negotiated. The abbreviations for the modes in the above table | are negotiated. The abbreviations for the modes in the above table | |||
mean: | mean: | |||
RECN: Full re-ECN capable transport | RECN: Full re-ECN capable transport | |||
RECN-Co: Re-ECN sender in compatibility mode with a vanilla [RFC3168] | RECN-Co: Re-ECN sender in compatibility mode with a | |||
ECN receiver or an [RFC3540] ECN nonce-capable receiver. | vanilla [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable | |||
Implementation of this mode is OPTIONAL. | receiver. Implementation of this mode is OPTIONAL. | |||
Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when | Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when | |||
at least one of the transports does not understand even basic ECN | at least one of the transports does not understand even basic ECN | |||
marking. | marking. | |||
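As an illustration, Table 4 reduces to a lookup from the receiver's capability to the mode of the S-R half-connection, given a Re-ECT sender. The function name and the capability strings are hypothetical labels taken from the table's column headings; the sketch does not model the negotiation itself, nor the case where the OPTIONAL RECN-Co mode is not implemented.

```python
def half_connection_mode(receiver_capability):
    """Mode of the S-R half-connection for a Re-ECT sender S (Table 4)."""
    modes = {
        "Re-ECT": "RECN",        # both ends re-ECN capable
        "ECT-Nonce": "RECN-Co",  # RFC3540 receiver: compatibility mode
        "ECT": "RECN-Co",        # RFC3168 receiver: compatibility mode
        "Not-ECT": "Not-ECT",    # receiver understands no ECN at all
    }
    return modes[receiver_capability]


for capability in ("Re-ECT", "ECT-Nonce", "ECT", "Not-ECT"):
    print(capability, "->", half_connection_mode(capability))
```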
Note that we use the term Re-ECT for a host transport that is re-ECN- | Note that we use the term Re-ECT for a host transport that is re-ECN- | |||
capable but RECN for the modes of the half connections between hosts | capable but RECN for the modes of the half connections between hosts | |||
when they are both Re-ECT. If a host transport is Re-ECT, this fact | when they are both Re-ECT. If a host transport is Re-ECT, this fact | |||
alone does NOT imply either of its half connections will necessarily | alone does NOT imply either of its half connections will necessarily | |||
be in RECN mode, at least not until it has confirmed that the other | be in RECN mode, at least not until it has confirmed that the other | |||
skipping to change at page 23, line 5 | skipping to change at page 23, line 5 | |||
RECN mode: Given the constraints on TCP's initial window [RFC3390] | RECN mode: Given the constraints on TCP's initial window [RFC3390] | |||
and its exponential window increase during slow start | and its exponential window increase during slow start | |||
phase [RFC2581], it turns out that the sender SHOULD set FNE on | phase [RFC2581], it turns out that the sender SHOULD set FNE on | |||
the first and third data packets in its flow, assuming equal sized | the first and third data packets in its flow, assuming equal sized | |||
data packets once a flow is established. Appendix D presents the | data packets once a flow is established. Appendix D presents the | |||
calculation that led to this conclusion. Below, after running | calculation that led to this conclusion. Below, after running | |||
through the start of an example TCP session, we give the intuition | through the start of an example TCP session, we give the intuition | |||
learned from that calculation. | learned from that calculation. | |||
RECN-Co mode: A re-ECT sender that switches into re-ECN compatibility | RECN-Co mode: A re-ECT sender that switches into re-ECN | |||
mode or into Not-ECT mode (because it has detected the | compatibility mode or into Not-ECT mode (because it has detected | |||
corresponding host is not re-ECN capable) MUST limit its initial | the corresponding host is not re-ECN capable) MUST limit its | |||
window to 1 segment. The reasoning behind this constraint is | initial window to 1 segment. The reasoning behind this constraint | |||
given in Section 5.4. Having set this initial window, a re-ECN | is given in Section 5.4. Having set this initial window, a re-ECN | |||
sender in RECN-Co mode SHOULD set FNE on the first and third data | sender in RECN-Co mode SHOULD set FNE on the first and third data | |||
packets in a flow, as for RECN mode. | packets in a flow, as for RECN mode. | |||
+----+------+----------------+-------+-------+---------------+------+ | +----+------+----------------+-------+-------+---------------+------+ | |||
| | Data | TCP A(Re-ECT) | IP A | IP B | TCP B(Re-ECT) | Data | | | | Data | TCP A(Re-ECT) | IP A | IP B | TCP B(Re-ECT) | Data | | |||
+----+------+----------------+-------+-------+---------------+------+ | +----+------+----------------+-------+-------+---------------+------+ | |||
| | Byte | SEQ ACK CTL | EECN | EECN | SEQ ACK CTL | Byte | | | | Byte | SEQ ACK CTL | EECN | EECN | SEQ ACK CTL | Byte | | |||
| -- | ---- | ------------- | ----- | ----- | ------------- | ---- | | | -- | ---- | ------------- | ----- | ----- | ------------- | ---- | | |||
| 1 | | 0100 SYN | FNE | --> | R.ECC=0 | | | | 1 | | 0100 SYN | FNE | --> | R.ECC=0 | | | |||
| | | CWR,ECE,NS | | | | | | | | | CWR,ECE,NS | | | | | | |||
skipping to change at page 26, line 7 | skipping to change at page 26, line 7 | |||
This does not ensure precisely the same number of octets have RE | This does not ensure precisely the same number of octets have RE | |||
blanked as were CE marked. But we believe positive errors will | blanked as were CE marked. But we believe positive errors will | |||
cancel negative over a long enough period. {ToDo: However, more | cancel negative over a long enough period. {ToDo: However, more | |||
research is needed to prove whether this is so. If it is not, it may | research is needed to prove whether this is so. If it is not, it may | |||
be necessary to increment and decrement R in octets rather than | be necessary to increment and decrement R in octets rather than | |||
packets, by incrementing R as the product of D and the size in octets | packets, by incrementing R as the product of D and the size in octets | |||
of packets being sent (typically the MSS).} | of packets being sent (typically the MSS).} | |||
4.2. Other Transports | 4.2. Other Transports | |||
4.2.1. Guidelines for Adding Re-ECN to Other Transports | 4.2.1. General Guidelines for Adding Re-ECN to Other Transports | |||
Re-ECT sender transports that have established the receiver transport | Re-ECT sender transports that have established the receiver transport | |||
is at least ECN-capable (not necessarily re-ECN capable) MUST blank | is at least ECN-capable (not necessarily re-ECN capable) MUST blank | |||
the RE codepoint in packets carrying at least as many octets as | the RE codepoint in packets carrying at least as many octets as | |||
arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | arrive at the receiver with the CE codepoint set. Re-ECN-capable sender | |||
transports should always initialise the ECN field to the ECT(1) | transports should always initialise the ECN field to the ECT(1) | |||
codepoint once a flow is established. | codepoint once a flow is established. | |||
If the sender transport does not have sufficient feedback to even | If the sender transport does not have sufficient feedback to even | |||
estimate the path's CE rate, it SHOULD set FNE continuously. If the | estimate the path's CE rate, it SHOULD set FNE continuously. If the | |||
sender transport has some, perhaps stale, feedback to estimate that | sender transport has some, perhaps stale, feedback to estimate that | |||
the path's CE rate is almost certainly less than E%, the transport | the path's CE rate is almost certainly less than E%, the transport | |||
MAY blank RE in packets for E% of sent octets, and set the RECT | MAY blank RE in packets for E% of sent octets, and set the RECT | |||
codepoint for the remainder. | codepoint for the remainder. | |||
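One way a sender might follow this guidance is sketched below. It is an assumption-laden illustration: the class and parameter names are hypothetical, E is taken as an integer percentage, and the deterministic rule for choosing which packets to blank is one choice among many. "Blanking RE" is represented by emitting the Re-Echo codepoint.

```python
class StaleFeedbackMarker:
    """Hypothetical codepoint chooser for a sender with little feedback."""

    def __init__(self, ce_rate_bound_percent=None):
        # None means there is not enough feedback even to estimate the rate.
        self.e = ce_rate_bound_percent
        self.sent_octets = 0
        self.blanked_octets = 0

    def next_codepoint(self, size_octets):
        self.sent_octets += size_octets
        if self.e is None:
            return "FNE"  # set FNE continuously
        # Blank RE whenever blanked octets fall below E% of sent octets.
        if self.blanked_octets * 100 < self.e * self.sent_octets:
            self.blanked_octets += size_octets
            return "Re-Echo"
        return "RECT"


marker = StaleFeedbackMarker(ce_rate_bound_percent=25)
print([marker.next_codepoint(1000) for _ in range(8)])
```

The comparison is done on octets, not packets, so the blanked share stays at or above E% of sent octets even when packet sizes vary.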
The following sections give guidelines on how re-ECN support could be | ||||
added to RSVP or NSIS, to DCCP, and to SCTP - although separate | ||||
Internet drafts will be necessary to document the exact mechanics of | ||||
re-ECN in each of these protocols. | ||||
{ToDo: Give a brief outline of what would be expected for each of the | {ToDo: Give a brief outline of what would be expected for each of the | |||
following: | following: | |||
o UDP fire and forget (e.g. DNS) | o UDP fire and forget (e.g. DNS) | |||
o UDP streaming with no feedback | o UDP streaming with no feedback | |||
o UDP streaming with feedback | o UDP streaming with feedback | |||
o DCCP [RFC4340] } | } | |||
o RSVP and/or NSIS: A separate I-D has been submitted [Re-PCN] | 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS | |||
describing how re-ECN can be used in an edge-to-edge rather than | ||||
end-to-end scenario. It can then be used by downstream networks | A separate I-D has been submitted [Re-PCN] describing how re-ECN can | |||
to police whether upstream networks are blocking new flow | be used in an edge-to-edge rather than end-to-end scenario. It can | |||
reservations when downstream congestion is too high, even though | then be used by downstream networks to police whether upstream | |||
the congestion is in other operators' downstream networks. This | networks are blocking new flow reservations when downstream | |||
relates to current work in progress on Admission Control over | congestion is too high, even though the congestion is in other | |||
Diffserv using Pre-Congestion Notification, being reported to the | operators' downstream networks. This relates to current work in | |||
IETF TSVWG [CL-deploy]. | progress on Admission Control over Diffserv using Pre-Congestion | |||
Notification, being reported to the IETF TSVWG [CL-deploy]. | ||||
4.2.3. Guidelines for adding Re-ECN to DCCP | ||||
Besides adjusting the initial feature negotiation sequence, operating | ||||
re-ECN in DCCP could be achieved by defining a new option to be added | ||||
to acknowledgements that would include a multibit field into which the | ||||
destination could copy its ECC. | ||||
4.2.4. Guidelines for adding Re-ECN to SCTP | ||||
Appendix A of RFC2960 gives the specifications for SCTP to support ECN. | ||||
Similar steps should be taken to support re-ECN. Besides adjusting | ||||
the initial feature negotiation sequence, operating re-ECN in SCTP | ||||
could be achieved by defining a new control chunk that would include | ||||
a multibit field into which the destination could copy its ECC. | ||||
5. Network Layer | 5. Network Layer | |||
5.1. Re-ECN IPv4 Wire Protocol | 5.1. Re-ECN IPv4 Wire Protocol | |||
The wire protocol of the ECN field in the IP header remains largely | The wire protocol of the ECN field in the IP header remains largely | |||
unchanged from [RFC3168]. However, an extension to the ECN field we | unchanged from [RFC3168]. However, an extension to the ECN field we | |||
call the RE (re-ECN extension) flag (Section 3.2) is defined in this | call the RE (re-ECN extension) flag (Section 3.2) is defined in this | |||
document. It doubles the extended ECN codepoint space, giving 8 | document. It doubles the extended ECN codepoint space, giving 8 | |||
potential codepoints. The semantics of the extra codepoints are | potential codepoints. The semantics of the extra codepoints are | |||
skipping to change at page 29, line 26 | skipping to change at page 30, line 9 | |||
field which we would expect to change en route. As the RE flag does | field which we would expect to change en route. As the RE flag does | |||
not need end-to-end authentication, we set the C flag to '1'. | not need end-to-end authentication, we set the C flag to '1'. | |||
{ToDo: A Congestion Hop by Hop Option ID will need to be registered | {ToDo: A Congestion Hop by Hop Option ID will need to be registered | |||
with IANA.} | with IANA.} | |||
5.3. Router Forwarding Behaviour | 5.3. Router Forwarding Behaviour | |||
Re-ECN works well without modifying the forwarding behaviour of any | Re-ECN works well without modifying the forwarding behaviour of any | |||
routers. However, below, two OPTIONAL changes to forwarding | routers. However, below, two OPTIONAL changes to forwarding | |||
behaviour are defined, which respectively enhance performance and | behaviour are defined which respectively enhance performance and | |||
improve a router's discrimination against flooding attacks. They are | improve a router's discrimination against flooding attacks. They are | |||
both OPTIONAL additions that we propose MAY apply by default to all | both OPTIONAL additions that we propose MAY apply by default to all | |||
Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN | |||
marking behaviours [RFC3168]. Specifications for PHBs MAY define | marking behaviours [RFC3168]. Specifications for PHBs MAY define | |||
different forwarding behaviours from this default, but this is NOT | different forwarding behaviours from this default, but this is NOT | |||
REQUIRED. [Re-PCN] is one example. | REQUIRED. [Re-PCN] is one example. | |||
FNE indicates ECT: | FNE indicates ECT: | |||
The FNE codepoint tells a router to assume that the packet was | The FNE codepoint tells a router to assume that the packet was | |||
skipping to change at page 30, line 12 | skipping to change at page 31, line 5 | |||
it MAY preferentially drop packets within the same Diffserv PHB | it MAY preferentially drop packets within the same Diffserv PHB | |||
using the preference order for extended ECN codepoints given in | using the preference order for extended ECN codepoints given in | |||
Table 7. Preferential dropping can be difficult to implement on | Table 7. Preferential dropping can be difficult to implement on | |||
some hardware, but if feasible it would discriminate against | some hardware, but if feasible it would discriminate against | |||
attack traffic if done as part of the overall policing framework | attack traffic if done as part of the overall policing framework | |||
of Section 6.1.3. If nowhere else, routers at the egress of a | of Section 6.1.3. If nowhere else, routers at the egress of a | |||
network SHOULD implement preferential drop (stronger than the MAY | network SHOULD implement preferential drop (stronger than the MAY | |||
above). For simplicity, preferences 4 & 5 MAY be merged into one | above). For simplicity, preferences 4 & 5 MAY be merged into one | |||
preference level. | preference level. | |||
+-------+-----+-----------+-------+------------+--------------------+ | +-------+-----+------------+-------+------------+-------------------+ | |||
| ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | |||
| field | bit | ECN | | (1 = drop | | | | field | bit | ECN | | (1 = drop | | | |||
| | | codepoint | | 1st) | | | | | | codepoint | | 1st) | | | |||
+-------+-----+-----------+-------+------------+--------------------+ | +-------+-----+------------+-------+------------+-------------------+ | |||
| 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | |||
| | | | | | congestion and | | | | | | | | congestion and | | |||
| | | | | | RECT | | | | | | | | RECT | | |||
| 00 | 1 | FNE | +1 | 4 | Feedback not | | | 00 | 1 | FNE | +1 | 4 | Feedback not | | |||
| | | | | | established | | | | | | | | established | | |||
| 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | |||
| | | | | | by congestion | | | | | | | | by congestion | | |||
| | | | | | experienced | | | | | | | | experienced | | |||
| 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | | 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | |||
| | | | | | transport | | | | | | | | transport | | |||
| 11 | 1 | CE(-1) | -1 | 3 | Congestion | | | 11 | 1 | CE(-1) | -1 | 3 | Congestion | | |||
| | | | | | experienced | | | | | | | | experienced | | |||
| 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | | 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | |||
| 10 | 0 | --- | n/a | 2 | Legacy ECN use | | | 10 | 0 | --- | n/a | 2 | Legacy ECN use | | |||
| | | | | | only | | | | | | | | only | | |||
| 00 | 0 | Not-RECT | n/a | 1 | Not re-ECN-capable | | | 00 | 0 | Not-RECT | n/a | 1 | Not | | |||
| | | | | | re-ECN-capable | | ||||
| | | | | | transport | | | | | | | | transport | | |||
+-------+-----+-----------+-------+------------+--------------------+ | +-------+-----+------------+-------+------------+-------------------+ | |||
Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') | |||
The above drop preferences are arranged to preserve packets with | The above drop preferences are arranged to preserve packets with | |||
more positive worth (Section 3.4), given senders of positive | more positive worth (Section 3.4), given senders of positive | |||
packets must have honestly declared downstream congestion. This | packets must have honestly declared downstream congestion. This | |||
is explained fully in Section 6 on applications, particularly when | is explained fully in Section 6 on applications, particularly when | |||
the application of re-ECN to protect against DDoS attacks is | the application of re-ECN to protect against DDoS attacks is | |||
described. | described. | |||
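The preference order of Table 7 can be sketched as a victim-selection function. This is only an illustration: the function name and the list-of-codepoints queue model are hypothetical, and the table's "5/4" entry for Re-Echo is read here as preference 5, falling to 4 when preferences 4 and 5 are merged.

```python
# Drop preference per extended ECN codepoint, from Table 7 (1 = drop first).
DROP_PREF = {
    "Re-Echo": 5,   # listed as "5/4" in Table 7
    "FNE": 4,
    "CE(0)": 3,
    "RECT": 3,
    "CE(-1)": 3,
    "--CU--": 2,
    "---": 2,
    "Not-RECT": 1,
}


def pick_victim(queue, merge_4_and_5=False):
    """Return the index of the queued packet to drop first.

    queue is a list of codepoint names; ties go to the earliest packet.
    """
    def pref(cp):
        p = DROP_PREF[cp]
        return 4 if (merge_4_and_5 and p == 5) else p

    return min(range(len(queue)), key=lambda i: pref(queue[i]))


queue = ["RECT", "Re-Echo", "Not-RECT", "FNE"]
print(queue[pick_victim(queue)])
```

Packets with more positive worth carry higher preference numbers, so they are the last to be chosen.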
skipping to change at page 31, line 9 | skipping to change at page 32, line 5 | |||
Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and | Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and | |||
the initial SYN MUST be set to FNE by Re-ECT client A | the initial SYN MUST be set to FNE by Re-ECT client A | |||
(Section 4.1.4). So an initial SYN may be marked CE(-1) rather than | (Section 4.1.4). So an initial SYN may be marked CE(-1) rather than | |||
dropped. This seems dangerous, because the sender has not yet | dropped. This seems dangerous, because the sender has not yet | |||
established whether the receiver is a legacy one that does not | established whether the receiver is a legacy one that does not | |||
understand congestion marking. It also seems to allow malicious | understand congestion marking. It also seems to allow malicious | |||
senders to take advantage of ECN marking to avoid so much drop when | senders to take advantage of ECN marking to avoid so much drop when | |||
launching SYN flooding attacks. Below we explain the features of the | launching SYN flooding attacks. Below we explain the features of the | |||
protocol design that remove both these dangers. | protocol design that remove both these dangers. | |||
ECN-capable initial SYN with a Not-ECT server: If the TCP server B is | ECN-capable initial SYN with a Not-ECT server: If the TCP server B | |||
re-ECN capable, provision is made for it to feed back a possible | is re-ECN capable, provision is made for it to feed back a possible | |||
congestion marked SYN in the SYN ACK (Section 4.1.4). But if the | congestion marked SYN in the SYN ACK (Section 4.1.4). But if the | |||
TCP client A finds out from the SYN ACK that the server was not | TCP client A finds out from the SYN ACK that the server was not | |||
ECN-capable, the TCP client MUST consider the first SYN as | ECN-capable, the TCP client MUST consider the first SYN as | |||
congestion marked before setting itself into Not-ECT mode. | congestion marked before setting itself into Not-ECT mode. | |||
Section 4.1.4 mandates that such a TCP client MUST also set its | Section 4.1.4 mandates that such a TCP client MUST also set its | |||
initial window to 1 segment. In this way we remove the need to | initial window to 1 segment. In this way we remove the need to | |||
cautiously avoid setting the first SYN to Not-RECT. This will | cautiously avoid setting the first SYN to Not-RECT. This will | |||
give worse performance while deployment is patchy, but better | give worse performance while deployment is patchy, but better | |||
performance once deployment is widespread. | performance once deployment is widespread. | |||
skipping to change at page 38, line 24 | skipping to change at page 39, line 19 | |||
their own expected downstream congestion so that N1 can deploy a | their own expected downstream congestion so that N1 can deploy a | |||
policer at its ingress to check that S1 is complying with whatever | policer at its ingress to check that S1 is complying with whatever | |||
congestion control it should be using (Section 6.1.5). If N1 is | congestion control it should be using (Section 6.1.5). If N1 is | |||
extremely conservative it may police each flow, but it can choose | extremely conservative it may police each flow, but it can choose | |||
to just police the bulk amount of congestion each customer causes | to just police the bulk amount of congestion each customer causes | |||
without regard to flows, or if it is extremely liberal it need not | without regard to flows, or if it is extremely liberal it need not | |||
police congestion control at all. In any case, it is always | police congestion control at all. In any case, it is always | |||
preferable to police traffic at the very first ingress into an | preferable to police traffic at the very first ingress into an | |||
internetwork, before non-compliant traffic can cause any damage. | internetwork, before non-compliant traffic can cause any damage. | |||
Edge egress dropper: If the policer ensures the source has less right | Edge egress dropper: If the policer ensures the source has less | |||
to a high rate the higher it declares downstream congestion, the | right to a high rate the higher it declares downstream congestion, | |||
source has a clear incentive to understate downstream congestion. | the source has a clear incentive to understate downstream | |||
But, if flows of packets are understated when they enter the | congestion. But, if flows of packets are understated when they | |||
internetwork, they will have become negative by the time they | enter the internetwork, they will have become negative by the time | |||
leave. So, we introduce a dropper at the last network egress, | they leave. So, we introduce a dropper at the last network | |||
which drops packets in flows that persistently declare negative | egress, which drops packets in flows that persistently declare | |||
downstream congestion (see Section 6.1.4 for details). | negative downstream congestion (see Section 6.1.4 for details). | |||
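The intuition behind the egress dropper can be sketched as a per-flow running balance of worth times packet size. This is not the algorithm of Section 6.1.4: the class name, the tolerance parameter and the simple threshold test are hypothetical simplifications that stand in for "persistently negative".

```python
class EgressDropper:
    """Hypothetical dropper at the last network egress."""

    def __init__(self, tolerance_octets=3000):
        # How far below zero a flow's balance may go before packets drop.
        self.tolerance = tolerance_octets
        self.balance = {}  # flow id -> running sum of worth * size

    def admit(self, flow_id, worth, size_octets):
        """Update the flow's balance; return False if the packet is dropped."""
        b = self.balance.get(flow_id, 0) + worth * size_octets
        self.balance[flow_id] = b
        return b >= -self.tolerance


dropper = EgressDropper(tolerance_octets=2000)
print(dropper.admit("flow-1", -1, 1500))  # balance -1500, within tolerance
print(dropper.admit("flow-1", -1, 1500))  # balance -3000, too negative
print(dropper.admit("flow-1", +1, 1500))  # balance -1500, recovering
```

A source that understates downstream congestion sends too few positive packets, so its balance drifts negative at the egress and its packets start to be dropped.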
..competitive routing | ..competitive routing | |||
.' : '. | .' : '. | |||
.' p e n a l:t i e s '. | .' p e n a l:t i e s '. | |||
: | : \ : | : | : \ : | |||
A : | : | : | A : | : | : | |||
|S <-----N1----> <---N2---> <---N4--> R domain | |S <-----N1----> <---N2---> <---N4--> R domain | |||
| : | : | : | | : | : | : | |||
| V | : | : | | V | : | : | |||
3% |--------+ | : | : | 3% |--------+ | : | : | |||
skipping to change at page 40, line 14 | skipping to change at page 40, line 39 | |||
may all be allowed different responses to congestion. The figure | may all be allowed different responses to congestion. The figure | |||
depicts this downward pressure on N2 by the solid downward arrow | depicts this downward pressure on N2 by the solid downward arrow | |||
at the egress of N2. Then N2 has an incentive either to police | at the egress of N2. Then N2 has an incentive either to police | |||
the congestion response of its own ingress traffic (from N1) or to | the congestion response of its own ingress traffic (from N1) or to | |||
emulate policing by applying penalties to N1 in turn on the basis | emulate policing by applying penalties to N1 in turn on the basis | |||
of congestion counted at their mutual boundary. In this recursive | of congestion counted at their mutual boundary. In this recursive | |||
way, the incentives for each flow to respond correctly to | way, the incentives for each flow to respond correctly to | |||
congestion trace back with each flow precisely to each source, | congestion trace back with each flow precisely to each source, | |||
despite the mechanism not recognising flows (see Section 6.2.2). | despite the mechanism not recognising flows (see Section 6.2.2). | |||
Inter-domain congestion charging diversity: Any two networks are free | Inter-domain congestion charging diversity: Any two networks are | |||
to agree any of a range of penalty regimes between themselves | free to agree any of a range of penalty regimes between themselves | |||
within the following reasonable constraints. N2 should expect to | within the following reasonable constraints. N2 should expect to | |||
have to pay penalties to N4 where penalties monotonically increase | have to pay penalties to N4 where penalties monotonically increase | |||
with the volume of congestion and negative penalties are not | with the volume of congestion and negative penalties are not | |||
allowed. For instance, they may agree an SLA with tiered | allowed. For instance, they may agree an SLA with tiered | |||
congestion thresholds, where higher penalties apply the higher the | congestion thresholds, where higher penalties apply the higher the | |||
threshold that is broken. But the most obvious (and useful) form | threshold that is broken. But the most obvious (and useful) form | |||
of penalty is where N4 levies a charge on N2 proportional to the | of penalty is where N4 levies a charge on N2 proportional to the | |||
volume of downstream congestion N2 dumps into N4. In the | volume of downstream congestion N2 dumps into N4. In the | |||
explanation that follows, we assume this specific variant of | explanation that follows, we assume this specific variant of | |||
volume charging between networks - charging proportionate to the | volume charging between networks - charging proportionate to the | |||
skipping to change at page 43, line 45 | skipping to change at page 44, line 19 | |||
fraction of negative octets introduced by congestion marking, leaving | fraction of negative octets introduced by congestion marking, leaving | |||
a balance of zero. If it is less (a negative flow), it implies that | a balance of zero. If it is less (a negative flow), it implies that | |||
the source is understating path congestion (which will reduce the | the source is understating path congestion (which will reduce the | |||
penalties that N2 owes N4). | penalties that N2 owes N4). | |||
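The balance described above can be illustrated with a minimal Python sketch. The function names, the (size, worth) pair layout and the +1 / 0 / -1 worth values are assumptions made purely for illustration; they are not taken from Appendix E or from any normative part of the protocol.

```python
def flow_balance(packets):
    """Return positive octets minus negative octets for one flow.

    packets: iterable of (size_in_octets, worth) pairs, where worth is
    +1 (positive), 0 (neutral) or -1 (negative). This encoding is a
    hypothetical simplification used only for this sketch.
    """
    return sum(size * worth for size, worth in packets)


def classify(balance):
    """Label a flow by the sign of its balance."""
    if balance > 0:
        return "positive"
    if balance < 0:
        return "negative"
    return "zero"


# An honest flow: 3 negative (congestion marked) packets are matched
# by 3 positive packets, so the balance is zero.
honest = [(1500, 1)] * 3 + [(1500, -1)] * 3 + [(1500, 0)] * 94

# An understating flow: only 1 positive packet against 3 negative ones.
understating = [(1500, 1)] * 1 + [(1500, -1)] * 3 + [(1500, 0)] * 96
```

Under these assumptions the honest flow balances to zero, while the understating flow ends up 3000 octets negative, which is the condition the egress dropper is meant to catch.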
If flows are positive, N4 need take no action---this simply means its | If flows are positive, N4 need take no action---this simply means its | |||
upstream neighbour is paying more penalties than it needs to, and the | upstream neighbour is paying more penalties than it needs to, and the | |||
source is going slower than it needs to. But, to protect itself | source is going slower than it needs to. But, to protect itself | |||
against persistently negative flows, N4 will need to install a | against persistently negative flows, N4 will need to install a | |||
dropper at its egress. Appendix E gives a suggested algorithm for | dropper at its egress. Appendix E gives a suggested algorithm for | |||
this dropper. There is not intention that the dropper algorithm | this dropper. There is no intention that the dropper algorithm needs | |||
needs to be standardised, it is merely provided to show that an | to be standardised, it is merely provided to show that an efficient, | |||
efficient, robust algorithm is possible. But whatever algorithm is | robust algorithm is possible. But whatever algorithm is used must | |||
used must meet the criteria below: | meet the criteria below: | |||
o It SHOULD introduce minimal false positives for honest flows; | o It SHOULD introduce minimal false positives for honest flows; | |||
o It SHOULD quickly detect and sanction dishonest flows (minimal | o It SHOULD quickly detect and sanction dishonest flows (minimal | |||
false negatives); | false negatives); | |||
o It MUST be invulnerable to state exhaustion attacks from malicious | o It MUST be invulnerable to state exhaustion attacks from malicious | |||
sources. For instance, if the dropper uses flow-state, it should | sources. For instance, if the dropper uses flow-state, it should | |||
not be possible for a source to send numerous packets, each with a | not be possible for a source to send numerous packets, each with a | |||
different flow ID, to force the dropper to exhaust its memory | different flow ID, to force the dropper to exhaust its memory | |||
capacity; | capacity; | |||
o It MUST introduce sufficient loss in goodput so that malicious | o It MUST introduce sufficient loss in goodput so that malicious | |||
skipping to change at page 44, line 35 | skipping to change at page 45, line 9 | |||
setting the FNE codepoint at the start of a flow, even though there | setting the FNE codepoint at the start of a flow, even though there | |||
is a cost to the sender of setting FNE (positive `worth'). Indeed, | is a cost to the sender of setting FNE (positive `worth'). Indeed, | |||
with the FNE codepoint, the rate at which a sender can generate new | with the FNE codepoint, the rate at which a sender can generate new | |||
flows can be limited (Appendix G). In this respect, the FNE | flows can be limited (Appendix G). In this respect, the FNE | |||
codepoint works like Handley's state set-up bit [Steps_DoS]. | codepoint works like Handley's state set-up bit [Steps_DoS]. | |||
Appendix E also gives an example dropper implementation that | Appendix E also gives an example dropper implementation that | |||
aggregates flow state. Dropper algorithms will often maintain a | aggregates flow state. Dropper algorithms will often maintain a | |||
moving average across flows of the fraction of RE blanked packets. | moving average across flows of the fraction of RE blanked packets. | |||
When maintaining an average across flows, a dropper SHOULD only allow | When maintaining an average across flows, a dropper SHOULD only allow | |||
flows into the average if they start with FNE, but it SHOULD not | flows into the average if they start with FNE, but it SHOULD NOT | |||
include packets with the FNE codepoint set in the average. A sender | include packets with the FNE codepoint set in the average. A sender | |||
sets the FNE codepoint when it does not have the benefit of feedback | sets the FNE codepoint when it does not have the benefit of feedback | |||
from the receiver. So, counting packets with FNE cleared would be | from the receiver. So, counting packets with FNE cleared would be | |||
likely to make the average unnecessarily positive, providing headroom | likely to make the average unnecessarily positive, providing headroom | |||
(or should we say footroom?) for dishonest (negative) traffic. | (or should we say footroom?) for dishonest (negative) traffic. | |||
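The two averaging rules above can be sketched as follows. This is only an illustrative Python fragment, not the Appendix E algorithm: the class name, the smoothing weight, the use of an exponentially weighted moving average and the per-flow dictionary are all assumptions. In particular, an unbounded per-flow dictionary would itself violate the state exhaustion criterion, so a real dropper would have to bound or aggregate this state.

```python
class AggregateAverage:
    """Moving average, across flows, of the fraction of RE blanked packets.

    Hypothetical sketch: an exponentially weighted moving average with a
    made-up smoothing weight. Not the algorithm of Appendix E.
    """

    def __init__(self, weight=0.1):
        self.weight = weight   # hypothetical smoothing constant
        self.average = 0.0
        self.flows = {}        # flow_id -> True if the flow started with FNE
                               # (unbounded here for brevity only)

    def observe(self, flow_id, re_blanked, fne):
        # The first packet seen for a flow decides whether the flow is
        # allowed into the average at all.
        if flow_id not in self.flows:
            self.flows[flow_id] = fne
        # Packets with FNE set are never counted, and neither are packets
        # of flows that did not start with FNE.
        if fne or not self.flows[flow_id]:
            return self.average
        sample = 1.0 if re_blanked else 0.0
        self.average += self.weight * (sample - self.average)
        return self.average
```

For example, with `weight=0.5`, an FNE packet from flow "a" leaves the average at 0.0, a following RE blanked packet from "a" moves it to 0.5, and packets from a flow "b" that never sent FNE leave it unchanged.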
If the dropper detects a persistently negative flow, it SHOULD drop | If the dropper detects a persistently negative flow, it SHOULD drop | |||
sufficient negative and neutral packets to force the flow to not be | sufficient negative and neutral packets to force the flow to not be | |||
negative. Drops SHOULD be focused on just sufficient packets in | negative. Drops SHOULD be focused on just sufficient packets in | |||
misbehaving flows to remove the negative bias while doing minimal | misbehaving flows to remove the negative bias while doing minimal | |||
skipping to change at page 54, line 5 | skipping to change at page 54, line 25 | |||
that the feedback loop is not broken but useful data can be | that the feedback loop is not broken but useful data can be | |||
removed. | removed. | |||
7. Incremental Deployment | 7. Incremental Deployment | |||
7.1. Incremental Deployment Features | 7.1. Incremental Deployment Features | |||
The design of the re-ECN protocol started from the fact that the | The design of the re-ECN protocol started from the fact that the | |||
current ECN marking behaviour of routers was sufficient and that re- | current ECN marking behaviour of routers was sufficient and that re- | |||
feedback could be introduced around these routers by changing the | feedback could be introduced around these routers by changing the | |||
sender behaviour but not the routers. Otherwise, if had required | sender behaviour but not the routers. Otherwise, if we had required | |||
routers to be changed, the chance of encountering a path that had | routers to be changed, the chance of encountering a path that had | |||
every router upgraded would be vanishingly small during early | every router upgraded would be vanishingly small during early | |||
deployment, giving no incentive to start deployment. Also, as there | deployment, giving no incentive to start deployment. Also, as there | |||
is no new forwarding behaviour, routers and hosts do not have to | is no new forwarding behaviour, routers and hosts do not have to | |||
signal or negotiate anything. | signal or negotiate anything. | |||
However, networks that choose to protect themselves using re-ECN do | However, networks that choose to protect themselves using re-ECN do | |||
have to add new security functions at their trust boundaries with | have to add new security functions at their trust boundaries with | |||
others. They distinguish legacy traffic by its ECN field. Traffic | others. They distinguish legacy traffic by its ECN field. Traffic | |||
from Not-ECT transports is distinguishable by its Not-RECT marking. | from Not-ECT transports is distinguishable by its Not-RECT marking. | |||
skipping to change at page 55, line 25 | skipping to change at page 55, line 47 | |||
None of these changes REQUIRE any modifications to routers. Also | None of these changes REQUIRE any modifications to routers. Also | |||
none of these changes affect anything about end to end congestion | none of these changes affect anything about end to end congestion | |||
control; they are all to do with allowing networks to police that end | control; they are all to do with allowing networks to police that end | |||
to end congestion control is well-behaved. | to end congestion control is well-behaved. | |||
7.2. Incremental Deployment Incentives | 7.2. Incremental Deployment Incentives | |||
It would only be worth standardising the re-ECN protocol if there | It would only be worth standardising the re-ECN protocol if there | |||
existed a coherent story for how it might be incrementally deployed. | existed a coherent story for how it might be incrementally deployed. | |||
In order for it to have a chance of deployment, everyone who needs to | In order for it to have a chance of deployment, everyone who needs to | |||
act, must have a strong incentive to act, and the incentives must | act must have a strong incentive to act, and the incentives must | |||
arise in the order that deployment would have to happen. Re-ECN | arise in the order that deployment would have to happen. Re-ECN | |||
works around unmodified ECN routers, but we can't just discuss why | works around unmodified ECN routers, but we can't just discuss why | |||
and how re-ECN deployment might build on ECN deployment, because | and how re-ECN deployment might build on ECN deployment, because | |||
there is precious little to build on in the first place. Instead, we | there is precious little to build on in the first place. Instead, we | |||
aim to show that re-ECN deployment could carry ECN with it. We focus | aim to show that re-ECN deployment could carry ECN with it. We focus | |||
on commercial deployment incentives, although some of the arguments | on commercial deployment incentives, although some of the arguments | |||
apply equally to academic or government sectors. | apply equally to academic or government sectors. | |||
ECN deployment: | ECN deployment: | |||
skipping to change at page 58, line 40 | skipping to change at page 59, line 13 | |||
world to the religion of policing. Networks that chose not to | world to the religion of policing. Networks that chose not to | |||
deploy egress droppers would leave themselves open to being | deploy egress droppers would leave themselves open to being | |||
congested by senders in other networks. But that would be their | congested by senders in other networks. But that would be their | |||
choice. | choice. | |||
The important aspect of the egress dropper though is that it most | The important aspect of the egress dropper though is that it most | |||
protects the network that deploys it. If a network does not | protects the network that deploys it. If a network does not | |||
deploy an egress dropper, sources sending into it from other | deploy an egress dropper, sources sending into it from other | |||
networks will be able to understate the congestion they are | networks will be able to understate the congestion they are | |||
causing. Whereas, if a network deploys an egress dropper, it can | causing. Whereas, if a network deploys an egress dropper, it can | |||
know how much congestion other networks are dumping into it. And | know how much congestion other networks are dumping into it, and | |||
apply penalties or charges accordingly. So, whether or not a | apply penalties or charges accordingly. So, whether or not a | |||
network polices its own sources at ingress, it is in its interests | network polices its own sources at ingress, it is in its interests | |||
to deploy an egress dropper. | to deploy an egress dropper. | |||
Host support: | Host support: | |||
In the above deployment scenario, host operating system support | In the above deployment scenario, host operating system support | |||
for re-ECN came about through the cellular operators demanding it | for re-ECN came about through the cellular operators demanding it | |||
in device standards (i.e. 3GPP). Of course, increasingly, mobile | in device standards (i.e. 3GPP). Of course, increasingly, mobile | |||
devices are being built to support multiple wireless technologies. | devices are being built to support multiple wireless technologies. | |||
skipping to change at page 60, line 7 | skipping to change at page 60, line 25 | |||
the motivator, but it seems optimistic to expect such a level of | the motivator, but it seems optimistic to expect such a level of | |||
joined-up thinking from today's communications industry. We | joined-up thinking from today's communications industry. We | |||
believe a single application alone must be a sufficient motivator. | believe a single application alone must be a sufficient motivator. | |||
In short, everyone gains from adding accountability to TCP/IP, | In short, everyone gains from adding accountability to TCP/IP, | |||
except the selfish or malicious. So, deployment incentives tend | except the selfish or malicious. So, deployment incentives tend | |||
to be strong. | to be strong. | |||
8. Architectural Rationale | 8. Architectural Rationale | |||
In the Internet's technical community the danger of not responding to | In the Internet's technical community, the danger of not responding | |||
congestion is well-understood, with its attendant risk of congestion | to congestion is well-understood, as well as its attendant risk of | |||
collapse [RFC3714]. However, many of the Internet's commercial | congestion collapse [RFC3714]. However, one side of the Internet's | |||
community consider that the very essence of IP is to provide open | commercial community considers that the very essence of IP is to | |||
access to the internetwork for all applications. Congestion is seen | provide open access to the internetwork for all applications. They | |||
as a symptom of over-conservative investment. And the goal of | see congestion as a symptom of over-conservative investment, and rely | |||
application design is to find novel ways to continue working despite | on revising application designs to find novel ways to keep | |||
congestion. They argue that the Internet was never intended to be | applications working despite congestion. They argue that the | |||
solely for TCP-friendly applications. Another side of the Internet's | Internet was never intended to be solely for TCP-friendly | |||
commercial community believe that it is no use providing a network | applications. Meanwhile, another side of the Internet's commercial | |||
for novel applications if it has insufficient capacity. And it will | community believes that it is worthwhile providing a network for | |||
always have insufficient capacity unless a greater share of | novel applications only if it has sufficient capacity, which can | |||
application revenues can be /assured/ for the infrastructure | happen only if a greater share of application revenues can be | |||
provider. Otherwise the major investments required will carry too | /assured/ for the infrastructure provider. Otherwise the major | |||
much risk and won't happen. | investments required would carry too much risk and wouldn't happen. | |||
The lesson articulated in [Tussle] is that we shouldn't embed our | The lesson articulated in [Tussle] is that we shouldn't embed our | |||
view on these arguments into the Internet at design time. Instead we | view on these arguments into the Internet at design time. Instead we | |||
should design the Internet so that the outcome of these arguments can | should design the Internet so that the outcome of these arguments can | |||
get decided at run-time. Re-ECN is designed in that spirit. Once | get decided at run-time. Re-ECN is designed in that spirit. Once | |||
the protocol is available, different network operators can choose how | the protocol is available, different network operators can choose how | |||
liberal they want to be in holding people accountable for the | liberal they want to be in holding people accountable for the | |||
congestion they cause. Some might boldly invest in capacity and not | congestion they cause. Some might boldly invest in capacity and not | |||
police its use at all, hoping that novel applications will result. | police its use at all, hoping that novel applications will result. | |||
Others might use re-ECN for fine-grained flow policing, expecting to | Others might use re-ECN for fine-grained flow policing, expecting to | |||
skipping to change at page 62, line 39 | skipping to change at page 63, line 13 | |||
the network layer to modify the next guess. | the network layer to modify the next guess. | |||
9. Related Work | 9. Related Work | |||
{Due to lack of time, this section is incomplete. The reader is | {Due to lack of time, this section is incomplete. The reader is | |||
referred to the Related Work section of [Re-fb] for a brief selection | referred to the Related Work section of [Re-fb] for a brief selection | |||
of related ideas.} | of related ideas.} | |||
9.1. Policing Rate Response to Congestion | 9.1. Policing Rate Response to Congestion | |||
ATM network elements send congestion back-pressure messages [ITU- | ATM network elements send congestion back-pressure | |||
T.I.371] along each connection, duplicating any end to end feedback | messages [ITU-T.I.371] along each connection, duplicating any end to | |||
because they don't trust it. On the other hand, re-ECN ensures | end feedback because they don't trust it. On the other hand, re-ECN | |||
information in forwarded packets can be used for congestion | ensures information in forwarded packets can be used for congestion | |||
management without requiring a connection-oriented architecture and | management without requiring a connection-oriented architecture and | |||
re-using the overhead of fields that are already set aside for end to | re-using the overhead of fields that are already set aside for end to | |||
end congestion control (and routing loop detection in the case of re- | end congestion control (and routing loop detection in the case of re- | |||
TTL in Appendix F). | TTL in Appendix F). | |||
We borrowed ideas from policers in the literature [pBox],[XCHOKe], | We borrowed ideas from policers in the literature [pBox],[XCHOKe], | |||
AFD etc. for our rate equation policer. However, without the benefit | AFD etc. for our rate equation policer. However, without the benefit | |||
of re-ECN they don't police the correct rate for the condition of | of re-ECN they don't police the correct rate for the condition of | |||
their path. They detect unusually high /absolute/ rates, but only | their path. They detect unusually high /absolute/ rates, but only | |||
while the policer itself is congested, because they work by detecting | while the policer itself is congested, because they work by detecting | |||
skipping to change at page 63, line 25 | skipping to change at page 63, line 47 | |||
accidental side-effect. They actually punish traffic that fills | accidental side-effect. They actually punish traffic that fills | |||
troughs as much as traffic that causes peaks in utilisation. In | troughs as much as traffic that causes peaks in utilisation. In | |||
practice network operators need to be able to allocate service by | practice network operators need to be able to allocate service by | |||
cost during congestion, and by value at other times. | cost during congestion, and by value at other times. | |||
9.2. Congestion Notification Integrity | 9.2. Congestion Notification Integrity | |||
The choice of two ECT code-points in the ECN field [RFC3168] | The choice of two ECT code-points in the ECN field [RFC3168] | |||
permitted future flexibility, optionally allowing the sender to | permitted future flexibility, optionally allowing the sender to | |||
encode the experimental ECN nonce [RFC3540] in the packet stream. | encode the experimental ECN nonce [RFC3540] in the packet stream. | |||
This mechanism has since been included in the specifications of DCCP | ||||
[RFC4340]. | ||||
The ECN nonce is an elegant scheme that allows the sender to detect | The ECN nonce is an elegant scheme that allows the sender to detect | |||
if someone in the feedback loop tries to claim no congestion was | if someone in the feedback loop - the receiver especially - tries to | |||
experienced when it fact it was (whether drop or ECN marking). The | claim no congestion was experienced when in fact congestion led to | |||
sender chooses between the two ECT codepoints in a pseudo-random | packet drops or ECN marks. For each packet it sends, the sender | |||
sequence. Then, whenever the network marks a packet with CE, to deny | chooses between the two ECT codepoints in a pseudo-random sequence. | |||
the congestion happened, the cheater would have to guess which ECT | Then, whenever the network marks a packet with CE, if the receiver | |||
codepoint was overwritten, with only a 50:50 chance of being correct | wants to deny congestion happened, she has to guess which ECT | |||
each time. | codepoint was overwritten. She has only a 50:50 chance of being | |||
correct each time she denies a congestion mark or a drop, which | ||||
ultimately will give her away. | ||||
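To see why repeated guessing gives a cheater away, the short Python sketch below computes the chance of concealing several congestion events without a single wrong guess. It assumes each guess is independent and that the sender picks the two ECT codepoints with equal probability; the function name is hypothetical.

```python
def evasion_probability(concealed_events):
    """Probability of guessing the overwritten ECT codepoint correctly
    for every one of `concealed_events` denied marks or drops, assuming
    independent 50:50 guesses (an illustrative assumption)."""
    return 0.5 ** concealed_events


# Concealing 10 congestion events undetected succeeds about once in a
# thousand attempts under these assumptions.
p10 = evasion_probability(10)
```

Under these assumptions the chance of remaining undetected halves with every denied congestion event, so it falls to 1/1024 after ten of them.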
The assumption behind the ECN nonce is that a sender will want to | The purpose of a network-layer nonce has to be the protection of the | |||
detect whether a receiver is suppressing congestion feedback. This | network in the first place, while a transport-layer nonce had better | |||
is only true if the sender's interests are aligned with the | be used to protect the sender from cheating receivers. Now, the | |||
network's, or with the community of users as a whole. This may be | assumption behind the ECN nonce is that a sender will want to detect | |||
true for certain large senders, who are under close scrutiny and have | whether a receiver is suppressing congestion feedback. This is only | |||
a reputation to maintain. But we have to deal with a more hostile | true if the sender's interests are aligned with the network's, or | |||
world, where traffic may be dominated by peer-to-peer transfers, | with the community of users as a whole. This may be true for certain | |||
rather than downloads from a few popular sites. Often the `natural' | large senders, who are under close scrutiny and have a reputation to | |||
self-interest of a sender is not aligned with the interests of other | maintain. But we have to deal with a more hostile world, where | |||
traffic may be dominated by peer-to-peer transfers, rather than | ||||
downloads from a few popular sites. Often the `natural' self- | ||||
interest of a sender is not aligned with the interests of other | ||||
users. It often wishes to transfer data quickly to the receiver as | users. It often wishes to transfer data quickly to the receiver as | |||
much as the receiver wants the data quickly. | much as the receiver wants the data quickly. | |||
In contrast, the re-ECN protocol enables policing of an agreed rate- | In contrast, the re-ECN protocol enables policing of an agreed rate- | |||
response to congestion (e.g. TCP-friendliness) at the sender's | response to congestion (e.g. TCP-friendliness) at the sender's | |||
interface with the internetwork. It also ensures downstream networks | interface with the internetwork. It also ensures downstream networks | |||
can police their upstream neighbours, to encourage them to police | can police their upstream neighbours, to encourage them to police | |||
their users in turn. But most importantly, it requires the sender to | their users in turn. But most importantly, it requires the sender to | |||
declare path congestion to the network and it can remove traffic at | declare path congestion to the network and it can remove traffic at | |||
the egress if this declaration is dishonest. So it can police | the egress if this declaration is dishonest. So it can police | |||
skipping to change at page 67, line 22 | skipping to change at page 68, line 5 | |||
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | |||
Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
S., Wroclawski, J., and L. Zhang, "Recommendations on | S., Wroclawski, J., and L. Zhang, "Recommendations on | |||
Queue Management and Congestion Avoidance in the | Queue Management and Congestion Avoidance in the | |||
Internet", RFC 2309, April 1998. | Internet", RFC 2309, April 1998. | |||
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion | |||
Control", RFC 2581, April 1999. | Control", RFC 2581, April 1999. | |||
[RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C., | ||||
Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M., | ||||
Zhang, L., and V. Paxson, "Stream Control Transmission | ||||
Protocol", RFC 2960, October 2000. | ||||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
RFC 3168, September 2001. | RFC 3168, September 2001. | |||
[RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's | [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's | |||
Initial Window", RFC 3390, October 2002. | Initial Window", RFC 3390, October 2002. | |||
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram | |||
Congestion Notification (ECN) Signaling with Nonces", | Congestion Control Protocol (DCCP)", RFC 4340, March 2006. | |||
RFC 3540, June 2003. | ||||
[RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion | ||||
Control Protocol (DCCP) Congestion Control ID 2: TCP-like | ||||
Congestion Control", RFC 4341, March 2006. | ||||
[RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for | ||||
Datagram Congestion Control Protocol (DCCP) Congestion | ||||
Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, | ||||
March 2006. | ||||
15.2. Informative References | 15.2. Informative References | |||
[ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the | [ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the | |||
Internet to Support Real-Time Content Supply from a Large | Internet to Support Real-Time Content Supply from a Large | |||
Fraction of Broadband Residential Users", BT Technology | Fraction of Broadband Residential Users", BT Technology | |||
Journal (BTTJ) 23(2), April 2005. | Journal (BTTJ) 23(2), April 2005. | |||
[Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the | [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the | |||
assumptions underlying mechanism design for the Internet", | assumptions underlying mechanism design for the Internet", | |||
skipping to change at page 69, line 28 | skipping to change at page 70, line 25 | |||
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission | [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission | |||
Timer", RFC 2988, November 2000. | Timer", RFC 2988, November 2000. | |||
[RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", | |||
RFC 3124, June 2001. | RFC 3124, June 2001. | |||
[RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", | [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", | |||
RFC 3514, April 2003. | RFC 3514, April 2003. | |||
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | ||||
Congestion Notification (ECN) Signaling with Nonces", | ||||
RFC 3540, June 2003. | ||||
[RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | |||
Control for Voice Traffic in the Internet", RFC 3714, | Control for Voice Traffic in the Internet", RFC 3714, | |||
March 2004. | March 2004. | |||
[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram | ||||
Congestion Control Protocol (DCCP)", RFC 4340, March 2006. | ||||
[Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN | [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN | |||
on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 | on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 | |||
(work in progress), March 2006. | (work in progress), March 2006. | |||
[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., | |||
Salvatori, A., Soppera, A., and M. Koyabe, "Policing | Salvatori, A., Soppera, A., and M. Koyabe, "Policing | |||
Congestion Response in an Internetwork Using Re-Feedback", | Congestion Response in an Internetwork Using Re-Feedback", | |||
ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// | |||
www.acm.org/sigs/sigcomm/sigcomm2005/ | www.acm.org/sigs/sigcomm/sigcomm2005/ | |||
techprog.html#session8>. | techprog.html#session8>. | |||
skipping to change at page 70, line 31 | skipping to change at page 71, line 28 | |||
Protocols (ICNP-02) , November 2002, | Protocols (ICNP-02) , November 2002, | |||
<http://www.cc.gatech.edu/~akumar/xchoke.pdf>. | <http://www.cc.gatech.edu/~akumar/xchoke.pdf>. | |||
[pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | |||
Congestion Control in the Internet", IEEE/ACM Transactions | Congestion Control in the Internet", IEEE/ACM Transactions | |||
on Networking 7(4) 458--472, August 1999, | on Networking 7(4) 458--472, August 1999, | |||
<http://www.aciri.org/floyd/end2end-paper.html>. | <http://www.aciri.org/floyd/end2end-paper.html>. | |||
Appendix A. Precise Re-ECN Protocol Operation | Appendix A. Precise Re-ECN Protocol Operation | |||
{ToDo: fix this} | ||||
The protocol operation described in Section 3.3 was an approximation. | The protocol operation described in Section 3.3 was an approximation. | |||
In fact, standard ECN router marking combines 1% and 2% marking into | In fact, standard ECN router marking combines 1% and 2% marking into | |||
slightly less than 3% whole-path marking, because routers | slightly less than 3% whole-path marking, because routers | |||
deliberately mark CE whether or not it has already been marked by | deliberately mark CE whether or not it has already been marked by | |||
another router upstream. So the combined marking fraction would | another router upstream. So the combined marking fraction would | |||
actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. | actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. | |||
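The arithmetic above can be checked with a short sketch. It assumes, as the paragraph implies, that each resource marks independently of any marking already applied upstream. The function name combined_marking is hypothetical and is not part of the draft.

```python
def combined_marking(fractions):
    """Whole-path marking fraction for resources that mark independently.

    `fractions` holds the marking fraction of each resource on the path
    (e.g. 0.01 for 1%). A packet arrives unmarked only if every resource
    leaves it unmarked, so the unmarked fractions multiply.
    """
    unmarked = 1.0
    for p in fractions:
        unmarked *= (1.0 - p)
    return 1.0 - unmarked


# The two-router example from the text: 1% and 2% marking.
print(round(combined_marking([0.01, 0.02]) * 100, 2))  # 2.98
```

The result is slightly below the simple sum of 3% because a packet marked by both routers is counted only once.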
To generalise this we will need some notation. | To generalise this we will need some notation. | |||
o j represents the index of each resource (typically queues) along a | o j represents the index of each resource (typically queues) along a | |||
skipping to change at page 74, line 12 | skipping to change at page 75, line 12 | |||
defines this combination as a non-ECN-setup SYN ACK, which remains | defines this combination as a non-ECN-setup SYN ACK, which remains | |||
true for vanilla and Nonce ECTs. But for re-ECN we define it as a | true for vanilla and Nonce ECTs. But for re-ECN we define it as a | |||
Re-ECN-setup SYN ACK. We didn't use a SYN ACK with both CWR and | Re-ECN-setup SYN ACK. We didn't use a SYN ACK with both CWR and | |||
ECE cleared to 0 because that would be the likely response from | ECE cleared to 0 because that would be the likely response from | |||
most Not-ECT receivers. And we didn't use a SYN ACK with both CWR | most Not-ECT receivers. And we didn't use a SYN ACK with both CWR | |||
and ECE set to 1 either, as at least one broken receiver | and ECE set to 1 either, as at least one broken receiver | |||
implementation echoes whatever flags were in the SYN into its SYN | implementation echoes whatever flags were in the SYN into its SYN | |||
ACK. Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1 | ACK. Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1 | |||
& ECE=0. | & ECE=0. | |||
Choice of two alternative SYN ACKs: the NS flag may take either value | Choice of two alternative SYN ACKs: the NS flag may take either | |||
in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re-ECT | value in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re- | |||
server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to echo | ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to | |||
congestion experienced (CE) on the initial SYN. Otherwise a Re- | echo congestion experienced (CE) on the initial SYN. Otherwise a | |||
ECN-setup SYN ACK MUST be returned with NS=0. The only current | Re-ECN-setup SYN ACK MUST be returned with NS=0. The only current | |||
known use of the NS flag in a SYN ACK is to indicate support for | known use of the NS flag in a SYN ACK is to indicate support for | |||
the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1. | the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1. | |||
Given the ECN nonce MUST NOT be used for a RECN mode connection, a | Given the ECN nonce MUST NOT be used for a RECN mode connection, a | |||
Re-ECN-setup SYN ACK can use either setting of the NS flag without | Re-ECN-setup SYN ACK can use either setting of the NS flag without | |||
any risk of confusion, because the CWR & ECE flags will be | any risk of confusion, because the CWR & ECE flags will be | |||
reversed relative to those used by an ECN nonce SYN ACK. | reversed relative to those used by an ECN nonce SYN ACK. | |||
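The flag combinations discussed in this appendix can be summarised as a small classifier. This is only an illustrative sketch: the function name classify_syn_ack and the returned strings are hypothetical, and the CWR=0 & ECE=1 case is labelled from RFC 3168 and the nonce description above rather than from any normative text in this section.

```python
def classify_syn_ack(cwr, ece, ns):
    """Describe a SYN ACK from its CWR, ECE and NS flags (each 0 or 1)."""
    if cwr == 1 and ece == 0:
        # Re-ECN-setup SYN ACK; NS=1 echoes CE seen on the initial SYN.
        if ns == 1:
            return "Re-ECN-setup SYN ACK, CE echoed on initial SYN"
        return "Re-ECN-setup SYN ACK"
    if cwr == 0 and ece == 1:
        # ECN-setup SYN ACK per RFC 3168; NS signals ECN nonce support.
        if ns == 1:
            return "ECN-setup SYN ACK with ECN nonce support"
        return "ECN-setup SYN ACK"
    if cwr == 0 and ece == 0:
        # The likely response from most Not-ECT receivers.
        return "non-ECN-setup SYN ACK"
    # CWR=1 & ECE=1: avoided because at least one broken receiver
    # echoes the flags of the SYN into its SYN ACK.
    return "not a Re-ECN-setup SYN ACK (possibly echoed SYN flags)"


print(classify_syn_ack(1, 0, 1))
```

Because the CWR and ECE values of the Re-ECN and nonce cases are reversed, the NS flag never has to disambiguate the two.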
Appendix D. Packet Marking During Flow Start | Appendix D. Packet Marking During Flow Start | |||
{ToDo: Write up proof that sender should mark FNE on first and third | {ToDo: Write up proof that sender should mark FNE on first and third | |||
skipping to change at page 81, line 5 | skipping to change at page 81, line 37 | |||
account from the subset I. Then the weighted mean of all these | account from the subset I. Then the weighted mean of all these | |||
samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} | |||
V_{bI}. | V_{bI}. | |||
If V_b is the result of the bulk accounting algorithm over the | If V_b is the result of the bulk accounting algorithm over the | |||
accounting period (Appendix H.1) it can be inflated by this factor | accounting period (Appendix H.1) it can be inflated by this factor | |||
a_S to get a good unbiased estimate of the volume of downstream | a_S to get a good unbiased estimate of the volume of downstream | |||
congestion over the accounting period a_S.V_b, without being polluted | congestion over the accounting period a_S.V_b, without being polluted | |||
by the effect of persistently negative flows. | by the effect of persistently negative flows. | |||
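The weighted mean and the inflation step can be sketched as follows. The sample values are invented purely for illustration, the function names are hypothetical, and V_{fI} and V_{bI} are treated as opaque per-sample quantities as defined earlier in this appendix.

```python
def inflation_factor(samples):
    """Weighted mean a_S = sum(V_fI) / sum(V_bI) over all sampled subsets I.

    `samples` is a list of (V_fI, V_bI) pairs, one pair per subset I.
    """
    total_f = sum(v_f for v_f, _ in samples)
    total_b = sum(v_b for _, v_b in samples)
    if total_b == 0:
        raise ValueError("sum of V_bI is zero; a_S is undefined")
    return total_f / total_b


def inflated_volume(v_b, samples):
    """Scale the bulk accounting result V_b by a_S."""
    return inflation_factor(samples) * v_b


# Invented sample values, for illustration only.
samples = [(12.0, 8.0), (6.0, 4.0), (6.0, 4.0)]
print(inflation_factor(samples))         # 1.5
print(inflated_volume(1000.0, samples))  # 1500.0
```

Dividing the sums, rather than averaging the per-sample ratios, weights each sample by its V_{bI}, which matches the formula given above.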
Appendix I. Argument for holding back the ECN nonce | ||||
The ECN nonce is a mechanism that allows a /sending/ transport to | ||||
detect if drop or ECN marking at a congested router has been | ||||
suppressed by a node somewhere in the feedback loop---another router | ||||
or the receiver. | ||||
Space for the ECN nonce was set aside in [RFC3168] (currently | ||||
proposed standard) while the full nonce mechanism is specified in | ||||
[RFC3540] (currently experimental). The DCCP specification [RFC4340] | ||||
(currently proposed standard) requires that "Each DCCP sender SHOULD | ||||
set ECN Nonces on its packets...". It also mandates as a requirement | ||||
for all CCID profiles that "Any newly defined acknowledgement | ||||
mechanism MUST include a way to transmit ECN Nonce Echoes back to the | ||||
sender.", therefore: | ||||
o The CCID profile for TCP-like Congestion Control [RFC4341] | ||||
(currently proposed standard) says "The sender will use the ECN | ||||
Nonce for data packets, and the receiver will echo those nonces in | ||||
its Ack Vectors." | ||||
o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342] | ||||
recommends that "The sender [use] Loss Intervals options' ECN | ||||
Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to | ||||
probabilistically verify that the receiver is correctly reporting | ||||
all dropped or marked packets." | ||||
The ECN nonce is used for three types of functions: | ||||
o if the sender wants to ensure the integrity of the information | ||||
about packet drops, | ||||
o if the sending transport chooses to act in the interests of a | ||||
congested router, | ||||
o if the sending transport wants to allocate its own resources in | ||||
proportion to the rates that each network path can sustain, based | ||||
on congestion control. | ||||
However, when the nonce is used to protect the integrity of | ||||
information about packet drops, rather than ECN marks, a transport | ||||
layer nonce will always be sufficient (because a drop loses the | ||||
transport header as well as the ECN field in the network header), | ||||
which would avoid using scarce IP header codepoint space. Similarly, | ||||
a transport layer nonce would protect against a receiver sending | ||||
early acknowledgements. | ||||
The other two functions need the ECN nonce to be in the network | ||||
layer, but both require rather optimistic trust assumptions in order | ||||
to be useful. If the sending transport chooses to act in the | ||||
interests of a congested router, it can reduce its rate if it detects | ||||
some malicious party in the feedback loop may be suppressing ECN | ||||
feedback. But it would only be useful to a router when /all/ senders | ||||
using the router are trusted to act in the router's interest. | ||||
In the end, the only essential use of a network layer nonce is when | ||||
sending transports (e.g. large servers) want to allocate their /own/ | ||||
resources in proportion to the rates that each network path can | ||||
sustain, based on congestion control. In that case, the nonce allows | ||||
senders to be assured that they aren't being duped into giving more | ||||
of their own resources to a particular flow. And if congestion | ||||
suppression is detected, the sending transport can rate limit the | ||||
offending connection to protect its own resources. Certainly, this | ||||
is a useful function, but the IETF should carefully decide whether | ||||
such a single, very specific case warrants IP header space. | ||||
In contrast, re-ECN allows all routers to fully protect themselves | ||||
from such attacks, without having to trust anyone - senders, | ||||
receivers, neighbouring networks. Re-ECN is therefore proposed in | ||||
preference to the ECN nonce on the basis that it addresses the | ||||
generic problem of accountability for congestion of a network's | ||||
resources at the IP layer. | ||||
Delaying the ECN nonce is justified because the applicability of the | ||||
ECN nonce seems too limited for it to consume a two-bit codepoint in | ||||
the IP header. | ||||
Moreover, while we have re-designed the re-ECN codepoints so that | ||||
they do not prevent the ECN nonce progressing, the same is not true | ||||
the other way round. If the ECN nonce started to see some deployment | ||||
(perhaps because it was blessed with proposed standard status), | ||||
incremental deployment of re-ECN would effectively be impossible, | ||||
because re-ECN marking fractions at inter-domain borders would be | ||||
polluted by unknown levels of nonce traffic. | ||||
The authors are aware that re-ECN must prove it has the potential it | ||||
claims if it is to displace the nonce. Therefore, every effort has | ||||
been made to complete a comprehensive specification of re-ECN so that | ||||
its potential can be assessed. We therefore seek the opinion of the | ||||
Internet community on whether the re-ECN protocol is sufficiently | ||||
useful to warrant standards action. | ||||
Authors' Addresses | Authors' Addresses | |||
Bob Briscoe | Bob Briscoe | |||
BT & UCL | BT & UCL | |||
B54/77, Adastral Park | B54/77, Adastral Park | |||
Martlesham Heath | Martlesham Heath | |||
Ipswich IP5 3RE | Ipswich IP5 3RE | |||
UK | UK | |||
Phone: +44 1473 645196 | Phone: +44 1473 645196 | |||
skipping to change at page 82, line 5 | skipping to change at page 85, line 5 | |||
BT | BT | |||
B54/69, Adastral Park | B54/69, Adastral Park | |||
Martlesham Heath | Martlesham Heath | |||
Ipswich IP5 3RE | Ipswich IP5 3RE | |||
UK | UK | |||
Phone: +44 1473 646923 | Phone: +44 1473 646923 | |||
Email: martin.koyabe@bt.com | Email: martin.koyabe@bt.com | |||
URI: | URI: | |||
Intellectual Property Statement | Full Copyright Statement | |||
Copyright (C) The Internet Society (2006). | ||||
This document is subject to the rights, licenses and restrictions | ||||
contained in BCP 78, and except as set forth therein, the authors | ||||
retain all their rights. | ||||
This document and the information contained herein are provided on an | ||||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Intellectual Property | ||||
The IETF takes no position regarding the validity or scope of any | The IETF takes no position regarding the validity or scope of any | |||
Intellectual Property Rights or other rights that might be claimed to | Intellectual Property Rights or other rights that might be claimed to | |||
pertain to the implementation or use of the technology described in | pertain to the implementation or use of the technology described in | |||
this document or the extent to which any license under such rights | this document or the extent to which any license under such rights | |||
might or might not be available; nor does it represent that it has | might or might not be available; nor does it represent that it has | |||
made any independent effort to identify any such rights. Information | made any independent effort to identify any such rights. Information | |||
on the procedures with respect to rights in RFC documents can be | on the procedures with respect to rights in RFC documents can be | |||
found in BCP 78 and BCP 79. | found in BCP 78 and BCP 79. | |||
skipping to change at page 82, line 29 | skipping to change at page 85, line 45 | |||
such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
Disclaimer of Validity | ||||
This document and the information contained herein are provided on an | ||||
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | ||||
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET | ||||
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, | ||||
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE | ||||
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED | ||||
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. | ||||
Copyright Statement | ||||
Copyright (C) The Internet Society (2006). This document is subject | ||||
to the rights, licenses and restrictions contained in BCP 78, and | ||||
except as set forth therein, the authors retain all their rights. | ||||
Acknowledgment | Acknowledgment | |||
Funding for the RFC Editor function is currently provided by the | Funding for the RFC Editor function is provided by the IETF | |||
Internet Society. | Administrative Support Activity (IASA). | |||
End of changes. 68 change blocks. | ||||
219 lines changed or deleted | 340 lines changed or added | |||