draft-briscoe-tsvwg-re-ecn-tcp-03.txt   draft-briscoe-tsvwg-re-ecn-tcp-04.txt 
Transport Area Working Group B. Briscoe Transport Area Working Group B. Briscoe
Internet-Draft BT & UCL Internet-Draft BT & UCL
Intended status: Informational A. Jacquet Intended status: Standards Track A. Jacquet
Expires: April 26, 2007 A. Salvatori Expires: January 10, 2008 A. Salvatori
M. Koyabe M. Koyabe
T. Moncaster
BT BT
October 23, 2006 July 09, 2007
Re-ECN: Adding Accountability for Causing Congestion to TCP/IP Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
draft-briscoe-tsvwg-re-ecn-tcp-03 draft-briscoe-tsvwg-re-ecn-tcp-04
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 37 skipping to change at page 1, line 38
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on April 26, 2007. This Internet-Draft will expire on January 10, 2008.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2006). Copyright (C) The IETF Trust (2007).
Abstract Abstract
This document introduces a new protocol for explicit congestion This document introduces a new protocol for explicit congestion
notification (ECN), termed re-ECN, which can be deployed notification (ECN), termed re-ECN, which can be deployed
incrementally around unmodified routers. The protocol arranges an incrementally around unmodified routers. The protocol arranges an
extended ECN field in each packet so that, as it crosses any extended ECN field in each packet so that, as it crosses any
interface in an internetwork, it will carry a truthful prediction of interface in an internetwork, it will carry a truthful prediction of
congestion on the remainder of its path. Then the upstream party at congestion on the remainder of its path. Then the upstream party at
any trust boundary in the internetwork can be held responsible for any trust boundary in the internetwork can be held responsible for
skipping to change at page 2, line 21 skipping to change at page 2, line 22
changes required to transport protocols. It includes the changes changes required to transport protocols. It includes the changes
required to TCP both as an example and as a specification. It also required to TCP both as an example and as a specification. It also
gives examples of mechanisms that can use the protocol to ensure data gives examples of mechanisms that can use the protocol to ensure data
sources respond correctly to congestion. And it describes example sources respond correctly to congestion. And it describes example
mechanisms that ensure the dominant selfish strategy of both network mechanisms that ensure the dominant selfish strategy of both network
domains and end-points will be to set the extended ECN field domains and end-points will be to set the extended ECN field
honestly. honestly.
Authors' Statement: Status (to be removed by the RFC Editor) Authors' Statement: Status (to be removed by the RFC Editor)
This document is posted as an Internet-Draft with the intent (at
least that of the authors) to eventually progress to standards track.
Although the re-ECN protocol is intended to make a simple but far- Although the re-ECN protocol is intended to make a simple but far-
reaching change to the Internet architecture, the most immediate reaching change to the Internet architecture, the most immediate
priority for the authors is to delay any move of the ECN nonce to priority for the authors is to delay any move of the ECN nonce to
Proposed Standard status. The argument for this position is Proposed Standard status. The argument for this position is
developed in Appendix I. developed in Appendix I.
Changes from previous drafts (to be removed by the RFC Editor) Changes from previous drafts (to be removed by the RFC Editor)
From -00 to -01: Full diffs created using the rfcdiff tool are available at
<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp>
Encoding of re-ECN wire protocol changed for reasons given in From -03 to -04 (current version):
Appendix B and consequently draft substantially re-written.
Substantial text added in sections on applications, incremental Clarified reasons for holding back ECN nonce (Section 3.2 &
deployment, architectural rationale and security considerations. Appendix I).
Clarified Figure 1.
Added Section 4.1.1.1 on equivalence of drops and ECN marks.
Improved precision of Section 5.6 on IP in IP tunnels.
Explained that RTT fairness is possible to enforce, but unlikely to
be required (Section 6.1.3 & Appendix F).
Explained that bulk per-user policing should be adequate but per-
flow policing is also possible if desired, though it is not likely
to be necessary (Section 6.1.5 & Appendix G).
Reinforced need for passive policing at inter-domain borders to
enable all-optical networking (Section 6.1.6).
Minor editorial changes throughout.
From -02 to -03:
Started guidelines for re-ECN support in DCCP and SCTP.
Added annex on limitations of nonce mechanism.
Minor editorial changes throughout.
From -01 to -02: From -01 to -02:
Explanation on informal terminology in Section 3.4 clarified. Explanation on informal terminology in Section 3.4 clarified.
IPv6 wire protocol encoding added (Section 5.2). IPv6 wire protocol encoding added (Section 5.2).
Text on (non-)issues with tunnels, encryption and link layer Text on (non-)issues with tunnels, encryption and link layer
congestion notification added (Section 5.6 & Section 5.7). congestion notification added (Section 5.6 & Section 5.7).
Section added giving evolvability arguments against encouraging Section added giving evolvability arguments against encouraging
bottleneck policing (Section 6.1.2). And text on re-ECN's bottleneck policing (Section 6.1.2). And text on re-ECN's
evolvability by design added to Section 6.1.3 evolvability by design added to Section 6.1.3
Text on inter-domain policing (Section 6.1.6) and inter-domain Text on inter-domain policing (Section 6.1.6) and inter-domain
fail-safes (Section 6.1.7) added. fail-safes (Section 6.1.7) added.
From -02 to -03: From -00 to -01:
Started guidelines for re-ECN support in DCCP and SCTP.
Added annex on limitations of nonce mechanism. Encoding of re-ECN wire protocol changed for reasons given in
Appendix B and consequently draft substantially re-written.
Minor editorial changes throughout. Substantial text added in sections on applications, incremental
deployment, architectural rationale and security considerations.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6
2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7
3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Background and Applicability . . . . . . . . . . . . . . . 8 3.1. Background and Applicability . . . . . . . . . . . . . . . 8
3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 11
3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 13
4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15
4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16
4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or
Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18 Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 20
4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21
4.1.4. Extended ECN (EECN) Field Settings during Flow 4.1.4. Extended ECN (EECN) Field Settings during Flow
Start or after Idle Periods . . . . . . . . . . . . . 21 Start or after Idle Periods . . . . . . . . . . . . . 23
4.1.5. Pure ACKS, Retransmissions, Window Probes and 4.1.5. Pure ACKS, Retransmissions, Window Probes and
Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25 Partial ACKs . . . . . . . . . . . . . . . . . . . . . 26
4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27
4.2.1. General Guidelines for Adding Re-ECN to Other 4.2.1. General Guidelines for Adding Re-ECN to Other
Transports . . . . . . . . . . . . . . . . . . . . . . 26 Transports . . . . . . . . . . . . . . . . . . . . . . 27
4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 26 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28
4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 27 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28
4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 27 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 28
5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 27 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 28
5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 27 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 28
5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30
5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 30 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31
5.4. Justification for Setting the First SYN to FNE . . . . . . 31 5.4. Justification for Setting the First SYN to FNE . . . . . . 32
5.5. Control and Management . . . . . . . . . . . . . . . . . . 32 5.5. Control and Management . . . . . . . . . . . . . . . . . . 33
5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 32 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 33
5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 33 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 34
5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 33 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 34
5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 34 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 35
6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 35 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 36
6.1. Policing Congestion Response . . . . . . . . . . . . . . . 35 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 36
6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 35 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 36
6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 36 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 37
6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 37 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 38
6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 44 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 45
6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 45 6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47
6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 47 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 48
6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 51 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52
6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53
6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53
6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 52 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53
6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 53 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54
6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 53 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 54
6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 54
6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 54
7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 54 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 55
7.1. Incremental Deployment Features . . . . . . . . . . . . . 54 7.1. Incremental Deployment Features . . . . . . . . . . . . . 55
7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57
8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 61
9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 63 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 64
9.1. Policing Rate Response to Congestion . . . . . . . . . . . 63 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 64
9.2. Congestion Notification Integrity . . . . . . . . . . . . 63 9.2. Congestion Notification Integrity . . . . . . . . . . . . 65
9.3. Identifying Upstream and Downstream Congestion . . . . . . 64 9.3. Identifying Upstream and Downstream Congestion . . . . . . 66
10. Security Considerations . . . . . . . . . . . . . . . . . . . 65 10. Security Considerations . . . . . . . . . . . . . . . . . . . 66
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68
12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 67 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 68
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 67 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 68
14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 67 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69
15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 69
15.1. Normative References . . . . . . . . . . . . . . . . . . . 67 15.1. Normative References . . . . . . . . . . . . . . . . . . . 69
15.2. Informative References . . . . . . . . . . . . . . . . . . 68 15.2. Informative References . . . . . . . . . . . . . . . . . . 70
Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 71 Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 73
Appendix B. Justification for Two Codepoints Signifying Zero Appendix B. Justification for Two Codepoints Signifying Zero
Worth Packets . . . . . . . . . . . . . . . . . . . . 72 Worth Packets . . . . . . . . . . . . . . . . . . . . 74
Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 74 Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76
Appendix D. Packet Marking During Flow Start . . . . . . . . . . 75 Appendix D. Packet Marking During Flow Start . . . . . . . . . . 77
Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 75 Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 77
Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 75 Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 77
Appendix G. Policer Designs to ensure Congestion Appendix G. Policer Designs to ensure Congestion
Responsiveness . . . . . . . . . . . . . . . . . . . 76 Responsiveness . . . . . . . . . . . . . . . . . . . 78
G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 76 G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 78
G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 77 G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 79
Appendix H. Downstream Congestion Metering Algorithms . . . . . . 80 Appendix H. Downstream Congestion Metering Algorithms . . . . . . 82
H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 80 H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 82
H.2. Inflation Factor for Persistently Negative Flows . . . . . 80 H.2. Inflation Factor for Persistently Negative Flows . . . . . 83
Appendix I. Argument for holding back the ECN nonce . . . . . . . 81 Appendix I. Argument for holding back the ECN nonce . . . . . . . 84
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 83 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 85
Intellectual Property and Copyright Statements . . . . . . . . . . 85 Intellectual Property and Copyright Statements . . . . . . . . . . 88
1. Introduction 1. Introduction
This document aims: This document aims:
o To provide a complete specification of the addition of the re-ECN o To provide a complete specification of the addition of the re-ECN
protocol to IP and guidelines on how to add it to transport layer protocol to IP and guidelines on how to add it to transport layer
protocols, including a complete specification of re-ECN in TCP as protocols, including a complete specification of re-ECN in TCP as
an example; an example;
skipping to change at page 7, line 32 skipping to change at page 7, line 32
This document is structured as follows. First an overview of the re- This document is structured as follows. First an overview of the re-
ECN protocol is given (Section 3), outlining its attributes and ECN protocol is given (Section 3), outlining its attributes and
explaining conceptually how it works as a whole. The two main parts explaining conceptually how it works as a whole. The two main parts
of the document follow, as described above. That is, the protocol of the document follow, as described above. That is, the protocol
specification divided into transport (Section 4) and network specification divided into transport (Section 4) and network
(Section 5) layers, then the applications it can be put to, such as (Section 5) layers, then the applications it can be put to, such as
policing DDoS, QoS and congestion control (Section 6). Although policing DDoS, QoS and congestion control (Section 6). Although
these applications do not require standardisation themselves, they these applications do not require standardisation themselves, they
are described in a fair degree of detail in order to explain how re- are described in a fair degree of detail in order to explain how re-
ECN can be used. Given, re-ECN proposes to use the last undefined ECN can be used. Given re-ECN proposes to use the last undefined bit
bit in the IPv4 header, we felt it necessary to outline the potential in the IPv4 header, we felt it necessary to outline the potential
that re-ECN could release in return for being given that bit. that re-ECN could release in return for being given that bit.
Deployment issues discussed throughout the document are brought Deployment issues discussed throughout the document are brought
together in Section 7, which is followed by a brief section together in Section 7, which is followed by a brief section
explaining the somewhat subtle rationale for the design from an explaining the somewhat subtle rationale for the design from an
architectural perspective (Section 8). We end by describing related architectural perspective (Section 8). We end by describing related
work (Section 9), listing security considerations (Section 10) and work (Section 9), listing security considerations (Section 10) and
finally drawing conclusions (Section 12). finally drawing conclusions (Section 12).
2. Requirements notation 2. Requirements notation
skipping to change at page 8, line 49 skipping to change at page 8, line 49
congestion feedback. But Section 9.2 explains that it still gives no congestion feedback. But Section 9.2 explains that it still gives no
control over how fast the sender transmits as a result of the control over how fast the sender transmits as a result of the
feedback. On the other hand, re-ECN is designed both to ensure that feedback. On the other hand, re-ECN is designed both to ensure that
congestion is declared honestly and that the sender's rate responds congestion is declared honestly and that the sender's rate responds
appropriately. appropriately.
Re-ECN is based on a feedback arrangement called `re- Re-ECN is based on a feedback arrangement called `re-
feedback' [Re-fb]. The word is short for either receiver-aligned, feedback' [Re-fb]. The word is short for either receiver-aligned,
re-inserted or re-echoed feedback. But it actually works even when re-inserted or re-echoed feedback. But it actually works even when
no feedback is available. In fact it has been carefully designed to no feedback is available. In fact it has been carefully designed to
work for single datagram flows. Indeed, it even encourages work for single datagram flows. It also encourages aggregation of
aggregation of single packet flows by congestion control proxies. single packet flows by congestion control proxies. Then, even if the
traffic mix of the Internet were to become dominated by short
Then, even if the traffic mix of the Internet were to become messages, it would still be possible to control congestion
dominated by short messages, it would still be possible to control effectively and efficiently.
congestion effectively and efficiently.
Changing the Internet's feedback architecture seems to imply Changing the Internet's feedback architecture seems to imply
considerable upheaval. But re-ECN can be deployed incrementally at considerable upheaval. But re-ECN can be deployed incrementally at
the transport layer around unmodified routers using existing fields the transport layer around unmodified routers using existing fields
in IP (v4 or v6). However it does also require the last undefined in IP (v4 or v6). However it does also require the last undefined
bit in the IPv4 header, which it uses in combination with the 2-bit bit in the IPv4 header, which it uses in combination with the 2-bit
ECN field to create four new codepoints. Nonetheless, changes to IP ECN field to create four new codepoints. Nonetheless, changes to IP
routers are RECOMMENDED in order to improve resilience against DoS routers are RECOMMENDED in order to improve resilience against DoS
attacks. Similarly, re-ECN works best if both the sender and attacks. Similarly, re-ECN works best if both the sender and
receiver transports are re-ECN-capable, but it can work with just receiver transports are re-ECN-capable, but it can work with just
skipping to change at page 10, line 13 skipping to change at page 10, line 13
be defined in another specification (e.g. [Re-PCN]). be defined in another specification (e.g. [Re-PCN]).
Although the RE flag is a separate, single bit field, it can be read Although the RE flag is a separate, single bit field, it can be read
as an extension to the two-bit ECN field; the three concatenated bits as an extension to the two-bit ECN field; the three concatenated bits
in what we will call the extended ECN field (EECN) making eight in what we will call the extended ECN field (EECN) making eight
codepoints. We will use the RFC3168 names of the ECN codepoints to codepoints. We will use the RFC3168 names of the ECN codepoints to
describe settings of the ECN field when the RE flag setting is "don't describe settings of the ECN field when the RE flag setting is "don't
care", but we also define the following six extended ECN codepoint care", but we also define the following six extended ECN codepoint
names for when we need to be more specific. names for when we need to be more specific.
RFC3168 ECN defines uses for all four codepoints of the two-bit ECN
field. This memo widens the codepoint space to eight, and uses six
codepoints. One of re-ECN's codepoints is an alternative use of the
codepoint set aside in RFC3168 for the ECN nonce (ECT(1)).
Transports not using re-ECN can still use the ECN nonce, while those
using re-ECN do not need to as long as the sender is also checking
for transport protocol compliance [I-D.moncaster-tcpm-rcv-cheat].
The case for doing this is given in Appendix I. Two re-ECN
codepoints are given compatible uses to those defined in RFC3168
(Not-ECT and CE). The other codepoint used by RFC3168 (ECT(0)) isn't
used for re-ECN. Altogether this leaves one codepoint of the eight
unused and available for future use.
+-------+------------+------+--------------+------------------------+ +-------+------------+------+--------------+------------------------+
| ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning |
| field | codepoint | flag | codepoint | | | field | codepoint | flag | codepoint | |
+-------+------------+------+--------------+------------------------+ +-------+------------+------+--------------+------------------------+
| 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable |
| | | | | transport | | | | | | transport |
| 00 | Not-ECT | 1 | FNE | Feedback not | | 00 | Not-ECT | 1 | FNE | Feedback not |
| | | | | established | | | | | | established |
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion |
| | | | | and RECT | | | | | | and RECT |
skipping to change at page 11, line 19 skipping to change at page 12, line 5
re-ECN sender will clear the RE flag to "0" in the next packet it re-ECN sender will clear the RE flag to "0" in the next packet it
sends. sends.
We chose to set and clear the RE flag this way round to ease We chose to set and clear the RE flag this way round to ease
incremental deployment (see Section 7.1). To avoid confusion we will incremental deployment (see Section 7.1). To avoid confusion we will
use the term `blanking' (rather than marking) when the RE flag is use the term `blanking' (rather than marking) when the RE flag is
cleared to "0". So, over a stream of packets, we will talk of the cleared to "0". So, over a stream of packets, we will talk of the
`RE blanking fraction' as the fraction of octets in packets with the `RE blanking fraction' as the fraction of octets in packets with the
RE flag cleared to "0". RE flag cleared to "0".
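The definition of the RE blanking fraction above can be illustrated with a minimal sketch. It assumes every packet in the stream belongs to a re-ECN-capable flow and is represented as a hypothetical (size in octets, RE flag) pair; the function name and the sample stream are illustrative and do not come from the draft.

```python
from typing import Iterable, Tuple


def re_blanking_fraction(packets: Iterable[Tuple[int, int]]) -> float:
    """Fraction of octets carried in packets whose RE flag is cleared to 0.

    Each packet is a hypothetical (size_in_octets, re_flag) pair.
    """
    total_octets = 0
    blanked_octets = 0
    for size, re_flag in packets:
        total_octets += size
        if re_flag == 0:  # RE flag cleared, i.e. the packet is 'blanked'
            blanked_octets += size
    return blanked_octets / total_octets if total_octets else 0.0


# Illustrative stream: 100 equal-sized packets, 3 of them blanked.
stream = [(1500, 1)] * 97 + [(1500, 0)] * 3
print(f"{re_blanking_fraction(stream):.2%}")
```

Because the fraction is weighted by octets rather than by packet count, a single large blanked packet counts for more than a small one.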
[Figure 1 plot; the -03 and -04 renderings are interleaved by the
side-by-side diff and cannot be shown legibly here. Both plot the
fraction of each marking (vertical axis, 0% to 3%) against resource
index (horizontal axis), with congested resources 0 and i and
observation points 0, 1 and 2. The RE blanking fraction is level at
3% along the path. The CE marking fraction steps up by 1.00% at
resource 0 and by a further 2.00% at resource i, reaching 3%. The
-04 rendering adds a row above the plot showing the path
S -- 0 - - - i -- D.]
Figure 1: A 2-Router Example (Imprecise) Figure 1: A 2-Router Example (Imprecise)
Figure 1 uses the two router example introduced earlier to illustrate Figure 1 uses a simple network to illustrate how re-ECN allows
why re-ECN allows routers to measure downstream congestion. The routers to measure downstream congestion. The horizontal axis
horizontal axis represents the index of each congestible resource represents the index of each congestible resource (typically queues)
(typically queues) along a path through the Internet. There may be along a path through the Internet. There may be many routers on the
many routers on the path, but we assume only two are currently path, but we assume only two are currently congested (those with
congested (those with resource index 0 and i). The two superimposed resource index 0 and i). The two superimposed plots show the
plots show the fraction of each extended ECN codepoint in a flow fraction of each extended ECN codepoint in a flow observed along this
observed along this path. Given about 3% of packets reaching the path. Given about 3% of packets reaching the destination are marked
destination are marked CE, in response to feedback the sender will CE, in response to feedback the sender will blank the RE flag in
blank the RE flag in about 3% of packets it sends. Then approximate about 3% of packets it sends. Then approximate downstream congestion
downstream congestion can be measured at the observation points shown can be measured at the observation points shown along the path by
along the path by subtracting the CE marking fraction from the RE subtracting the CE marking fraction from the RE blanking fraction, as
blanking fraction, as shown in the table below (Appendix A derives shown in the table below (Appendix A derives these approximations
these approximations from a precise analysis). from a precise analysis).
+-------------------+------------------------------+ +-------------------+------------------------------+
| Observation point | Approx downstream congestion | | Observation point | Approx downstream congestion |
+-------------------+------------------------------+ +-------------------+------------------------------+
| 0 | 3% - 0% = 3% | | 0 | 3% - 0% = 3% |
| 1 | 3% - 1% = 2% | | 1 | 3% - 1% = 2% |
| 2 | 3% - 3% = 0% | | 2 | 3% - 3% = 0% |
+-------------------+------------------------------+ +-------------------+------------------------------+
Table 2: Downstream Congestion Measured at Example Observation Points Table 2: Downstream Congestion Measured at Example Observation Points
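The arithmetic behind Table 2 can be reproduced with a short sketch. It uses only the approximation stated above (RE blanking fraction minus CE marking fraction) and the example figures of 3%, 0%, 1% and 3%; the function and variable names are illustrative, and the precise analysis in Appendix A is not modelled.

```python
def downstream_congestion(re_blanking: float, ce_marking: float) -> float:
    """Approximate downstream congestion seen at one observation point."""
    return re_blanking - ce_marking


RE_BLANKING = 0.03  # sender blanks RE in about 3% of packets

# CE marking fraction accumulated by the time each observation point is reached
ce_at_point = {0: 0.00, 1: 0.01, 2: 0.03}

for point, ce in ce_at_point.items():
    approx = downstream_congestion(RE_BLANKING, ce)
    print(f"observation point {point}: {approx:.0%}")
```

The RE blanking fraction is set by the sender and stays constant along the path, while the CE marking fraction only grows, so the difference shrinks towards zero as the packet nears the destination.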
skipping to change at page 16, line 12 skipping to change at page 17, line 6
be in RECN mode, at least not until it has confirmed that the other be in RECN mode, at least not until it has confirmed that the other
host is Re-ECT. host is Re-ECT.
4.1.1. RECN mode: Full re-ECN capable transport 4.1.1. RECN mode: Full re-ECN capable transport
In full RECN mode, for each half connection, both the sender and the In full RECN mode, for each half connection, both the sender and the
receiver each maintain an unsigned integer counter we will call ECC receiver each maintain an unsigned integer counter we will call ECC
(echo congestion counter). The receiver maintains a count, modulo 8, (echo congestion counter). The receiver maintains a count, modulo 8,
of how many times a CE marked packet has arrived during the half- of how many times a CE marked packet has arrived during the half-
connection. Once a RECN connection is established, the three TCP connection. Once a RECN connection is established, the three TCP
option flags (ECE, CWR & NS) used for ECN-related functions in option flags (ECE, CWR & NS) used for ECN-related functions in other
previous versions of ECN are used as a 3-bit field for the receiver versions of ECN are used as a 3-bit field for the receiver to
to repeatedly tell the sender the current value of ECC whenever it repeatedly tell the sender the current value of ECC whenever it sends
sends a TCP ACK. We will call this the echo congestion increment a TCP ACK. We will call this the echo congestion increment (ECI)
(ECI) field. This overloaded use of these 3 option flags as one field. This overloaded use of these 3 option flags as one 3-bit ECI
3-bit ECI field is shown in Figure 4. The actual definition of the field is shown in Figure 4. The actual definition of the TCP header,
TCP header, including the addition of support for the ECN nonce, is including the addition of support for the ECN nonce, is shown for
shown for comparison in Figure 3. This specification does not comparison in Figure 3. This specification does not redefine the
redefine the names of these three TCP option flags, it merely names of these three TCP option flags, it merely overloads them with
overloads them with another definition once a flow is established. another definition once a flow is established.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | N | C | E | U | A | P | R | S | F | | | | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I |
| | | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 3: The (post-ECN Nonce) definition of bytes 13 and 14 of the Figure 3: The (post-ECN Nonce) definition of bytes 13 and 14 of the
TCP Header TCP Header
skipping to change at page 17, line 20 skipping to change at page 18, line 16
delayed-ACK, which would be necessary if ACK-withholding were delayed-ACK, which would be necessary if ACK-withholding were
implemented. implemented.
Sender Action in RECN Mode Sender Action in RECN Mode
On the arrival of every ACK, the sender compares the ECI field On the arrival of every ACK, the sender compares the ECI field
with its own ECC value, then replaces its local value with that with its own ECC value, then replaces its local value with that
from the ACK. The difference D is assumed to be the number of CE from the ACK. The difference D is assumed to be the number of CE
marked packets that arrived at the receiver since it sent the marked packets that arrived at the receiver since it sent the
previously received ACK (but see below for the sender's safety previously received ACK (but see below for the sender's safety
strategy). Whenever the ECI field increments by D (or D drops are strategy). Whenever the ECI field increments by D (and/or d drops
detected), the sender MUST clear the RE flag to "0" in the IP are detected), the sender MUST clear the RE flag to "0" in the IP
header of the next D data packets it sends, effectively re-echoing header of the next D' data packets it sends (where D' = D + d),
each single increment of ECI. Otherwise the data sender MUST send effectively re-echoing each single increment of ECI. Otherwise
all data packets with RE set to "1". the data sender MUST send all data packets with RE set to "1".
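The sender rule above might be sketched as follows. This is a minimal illustration under our own assumptions: the class and method names are hypothetical, the difference D is taken modulo 8 because ECI is a 3-bit field, and the sender's safety strategy mentioned above is deliberately left out.

```python
ECI_MODULUS = 8  # ECI is a 3-bit field, so ECC is echoed modulo 8


class ReEcnSender:
    """Hypothetical sender-side bookkeeping for one half-connection."""

    def __init__(self):
        self.ecc = 0            # local copy of the last ECI value seen
        self.pending_blank = 0  # data packets still to be sent with RE = 0

    def on_ack(self, eci, drops=0):
        """Process an ACK carrying `eci`; `drops` is d, the drops detected."""
        d = (eci - self.ecc) % ECI_MODULUS  # D, allowing for wrap-around
        self.ecc = eci                      # replace the local value
        self.pending_blank += d + drops     # D' = D + d
        return d + drops

    def next_re_flag(self):
        """Return the RE flag for the next data packet to be sent."""
        if self.pending_blank > 0:
            self.pending_blank -= 1
            return 0  # re-echo one congestion event
        return 1      # otherwise RE is set to "1"
```

For example, an ACK whose ECI has advanced by 2 causes the next two data packets to be sent with RE cleared, after which RE returns to "1".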
As a general rule, once a flow is established, as well as setting As a general rule, once a flow is established, as well as setting
or clearing the RE flag as above, a data sender in RECN mode MUST or clearing the RE flag as above, a data sender in RECN mode MUST
always set the ECN field to ECT(1). However, the settings of the always set the ECN field to ECT(1). However, the settings of the
extended ECN field during flow start are defined in Section 4.1.4. extended ECN field during flow start are defined in Section 4.1.4.
As we have already emphasised, the re-ECN protocol makes no As we have already emphasised, the re-ECN protocol makes no
changes and has no effect on the TCP congestion control algorithm. changes and has no effect on the TCP congestion control algorithm.
So, each increment of ECI (or detection of a drop) also triggers So, each increment of ECI (or detection of a drop) also triggers
the standard TCP congestion response, but with no more than one the standard TCP congestion response, but with no more than one
skipping to change at page 18, line 5 skipping to change at page 18, line 43
A TCP sender also acts as the receiver for the other half- A TCP sender also acts as the receiver for the other half-
connection. The host will maintain two ECC values S.ECC and R.ECC connection. The host will maintain two ECC values S.ECC and R.ECC
as sender and receiver respectively. Every TCP header sent by a as sender and receiver respectively. Every TCP header sent by a
host in RECN mode will also repeat the prevailing value of R.ECC host in RECN mode will also repeat the prevailing value of R.ECC
in its ECI field. If a sender in RECN mode has to retransmit a in its ECI field. If a sender in RECN mode has to retransmit a
packet due to a suspected loss, the re-transmitted packet MUST packet due to a suspected loss, the re-transmitted packet MUST
carry the latest prevailing value of R.ECC when it is re- carry the latest prevailing value of R.ECC when it is re-
transmitted, which will not necessarily be the one it carried transmitted, which will not necessarily be the one it carried
originally. originally.
4.1.1.1. Safety against Long Pure ACK Loss Sequences 4.1.1.1. Drops and Marks
Re-ECN is based on the ECN protocol [RFC3168] which in turn is
typically based on the RED algorithm [RFC2309]. This algorithm marks
packets as CE with a probability that increases as the size of the
router queue increases. However, if the queue becomes too full then it
will revert to dropping packets. Because of this it is important
that re-ECN treats each packet drop it detects as if it were actually
a CE mark. This ensures that it can continue to correctly echo
congestion even through a highly congested path.
In order to ensure that drops are correctly echoed the sender needs
to add the number of drops detected per RTT to the difference in ECI
value waiting to be echoed. A drop is defined as set out in
[RFC2581] -- if the connection is in slow start then a single
duplicate acknowledgement will be treated as an indication of a drop.
When the system is in the congestion avoidance stage then 3 duplicate
acknowledgements will be treated as a sign of a drop. In all cases,
if a re-transmission time-out occurs then that will be treated as a
drop.
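The drop rules just described could be expressed as below. This sketch simply restates the thresholds given in the text (one duplicate acknowledgement in slow start, three in congestion avoidance, and any re-transmission time-out); the function names and the string event labels are hypothetical.

```python
def is_drop(event, dup_acks=0, in_slow_start=False):
    """Decide whether an event counts as a detected drop.

    `event` is "rto" for a re-transmission time-out or "dup_ack" when
    `dup_acks` consecutive duplicate acknowledgements have been seen.
    """
    if event == "rto":
        return True  # a time-out is always treated as a drop
    if event == "dup_ack":
        threshold = 1 if in_slow_start else 3
        return dup_acks >= threshold
    return False


def packets_to_blank(eci_increment, drops):
    """Drops are treated as CE marks: add them to the ECI difference."""
    return eci_increment + drops
```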
4.1.1.2. Safety against Long Pure ACK Loss Sequences
The ECI method was chosen for echoing congestion marking because a The ECI method was chosen for echoing congestion marking because a
re-ECN sender needs to know about every CE mark arriving at the re-ECN sender needs to know about every CE mark arriving at the
receiver, not just whether at least one arrives within a round trip receiver, not just whether at least one arrives within a round trip
time (which is all the ECE/CWR mechanism supported). And, as pure time (which is all the ECE/CWR mechanism supported). And, as pure
ACKs are not protected by TCP reliable delivery, we repeat the same ACKs are not protected by TCP reliable delivery, we repeat the same
ECI value in every ACK until it changes. Even if many ACKs in a row ECI value in every ACK until it changes. Even if many ACKs in a row
are lost, as soon as one gets through, the ECI field it repeats from are lost, as soon as one gets through, the ECI field it repeats from
previous ACKs that didn't get through will update the sender on how previous ACKs that didn't get through will update the sender on how
many CE marks arrived since the last ACK got through. many CE marks arrived since the last ACK got through.
skipping to change at page 22, line 24 skipping to change at page 23, line 36
means that Re-ECT server B MUST set FNE on a SYN ACK whether it is means that Re-ECT server B MUST set FNE on a SYN ACK whether it is
responding to a SYN from a Re-ECT client or from a client that is responding to a SYN from a Re-ECT client or from a client that is
merely ECN-capable. merely ECN-capable.
The original ECN specification [RFC3168] required SYNs and SYN ACKs The original ECN specification [RFC3168] required SYNs and SYN ACKs
to use the Not-ECT codepoint of the ECN field. The aim was to to use the Not-ECT codepoint of the ECN field. The aim was to
prevent well-known DoS attacks such as SYN flooding being able to prevent well-known DoS attacks such as SYN flooding being able to
gain from the advantage that ECN capability afforded over drop at gain from the advantage that ECN capability afforded over drop at
ECN-capable routers. ECN-capable routers.
For a SYN ACK, Kuzmanovic [I-D.ietf-tsvwg-ecnsyn] has shown that this For a SYN ACK, Kuzmanovic [I-D.ietf-tcpm-ecnsyn] has shown that this
caution was unnecessary, and proposes to allow a SYN ACK to be ECN- caution was unnecessary, and proposes to allow a SYN ACK to be ECN-
capable to improve performance. We have gone further by proposing to capable to improve performance. We have gone further by proposing to
make the initial SYN ECN-capable too. By stipulating the FNE make the initial SYN ECN-capable too. By stipulating the FNE
codepoint for the initial SYN, we comply with RFC3168 in word but not codepoint for the initial SYN, we comply with RFC3168 in word but not
in spirit, because we have indeed set the ECN field to Not-ECT, but in spirit, because we have indeed set the ECN field to Not-ECT, but
we have extended the ECN field with another bit. And it will be seen we have extended the ECN field with another bit. And it will be seen
(Section 5.3) that we have defined one setting of that bit to mean an (Section 5.3) that we have defined one setting of that bit to mean an
ECN-capable transport. Therefore, by proposing that the FNE ECN-capable transport. Therefore, by proposing that the FNE
codepoint MUST be used on the initial SYN of a connection, we have codepoint MUST be used on the initial SYN of a connection, we have
(deliberately) made the initial SYN ECN-capable. Section 5.4 (deliberately) made the initial SYN ECN-capable. Section 5.4
skipping to change at page 26, line 26 skipping to change at page 27, line 37
If the sender transport does not have sufficient feedback to even If the sender transport does not have sufficient feedback to even
estimate the path's CE rate, it SHOULD set FNE continuously. If the estimate the path's CE rate, it SHOULD set FNE continuously. If the
sender transport has some, perhaps stale, feedback to estimate that sender transport has some, perhaps stale, feedback to estimate that
the path's CE rate is nearly definitely less than E%, the transport the path's CE rate is nearly definitely less than E%, the transport
MAY blank RE in packets for E% of sent octets, and set the RECT MAY blank RE in packets for E% of sent octets, and set the RECT
codepoint for the remainder. codepoint for the remainder.
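One possible way for a transport to blank RE for about E% of sent octets is a simple octet credit counter, sketched below. The scheduling approach, the function name and the string labels are our own assumptions; the text above only states the target proportion.

```python
def plan_markings(packet_sizes, e_percent):
    """Label each packet so about e_percent of octets are RE-blanked.

    packet_sizes is a list of packet lengths in octets.
    """
    markings = []
    blank_credit = 0.0  # octets that are "owed" an RE-blanked packet
    for size in packet_sizes:
        blank_credit += size * e_percent / 100.0
        if blank_credit >= size:
            markings.append("RE-blanked")
            blank_credit -= size
        else:
            markings.append("RECT")
    return markings


# 100 packets of 1000 octets with E = 3.
plan = plan_markings([1000] * 100, 3)
```

With equal-sized packets this blanks RE in 3 of the 100 packets, spread evenly through the sequence rather than sent in a burst.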
The following sections give guidelines on how re-ECN support could be The following sections give guidelines on how re-ECN support could be
added to RSVP or NSIS, to DCCP, and to SCTP - although separate added to RSVP or NSIS, to DCCP, and to SCTP - although separate
Internet drafts will be necessary to document the exact mechanics of Internet drafts will be necessary to document the exact mechanics of
re-ECN if each of these protocols. re-ECN in each of these protocols.
{ToDo: Give a brief outline of what would be expected for each of the {ToDo: Give a brief outline of what would be expected for each of the
following: following:
o UDP fire and forget (e.g. DNS) o UDP fire and forget (e.g. DNS)
o UDP streaming with no feedback o UDP streaming with no feedback
o UDP streaming with feedback o UDP streaming with feedback
} }
4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS
A separate I-D has been submitted [Re-PCN] describing how re-ECN can A separate I-D has been submitted [Re-PCN] describing how re-ECN can
be used in an edge-to-edge rather than end-to-end scenario. It can be used in an edge-to-edge rather than end-to-end scenario. It can
then be used by downstream networks to police whether upstream then be used by downstream networks to police whether upstream
networks are blocking new flow reservations when downstream networks are blocking new flow reservations when downstream
congestion is too high, even though the congestion is in other congestion is too high, even though the congestion is in other
operators' downstream networks. This relates to current work in operators' downstream networks. This relates to current IETF work on
progress on Admission Control over Diffserv using Pre-Congestion Admission Control over Diffserv using Pre-Congestion Notification
Notification, being reported to the IETF TSVWG [CL-deploy]. (PCN) [PCN-arch].
4.2.3. Guidelines for adding Re-ECN to DCCP 4.2.3. Guidelines for adding Re-ECN to DCCP
Beside adjusting the initial features negotiation sequence, operating Beside adjusting the initial features negotiation sequence, operating
re-ECN in DCCP could be achieved by defining a new option to be added re-ECN in DCCP [RFC4340] could be achieved by defining a new option
to acknowledgments, that would include a multibit field where the to be added to acknowledgments, that would include a multibit field
destination could copy its ECC. where the destination could copy its ECC.
4.2.4. Guidelines for adding Re-ECN to SCTP 4.2.4. Guidelines for adding Re-ECN to SCTP
Annex 1 in RFC4340 gives the specifications for SCTP to support ECN. Annex 1 in [RFC2960] gives the specifications for SCTP to support
Similar steps should be taken to support re-ECN. Beside adjusting ECN. Similar steps should be taken to support re-ECN. Beside
the initial features negotiation sequence, operating re-ECN in SCTP adjusting the initial features negotiation sequence, operating re-ECN
could be achieved by defining a new control chunk, that would include in SCTP could be achieved by defining a new control chunk, that would
a multibit field where the destination could copy its ECC. include a multibit field where the destination could copy its ECC.
5. Network Layer 5. Network Layer
5.1. Re-ECN IPv4 Wire Protocol 5.1. Re-ECN IPv4 Wire Protocol
The wire protocol of the ECN field in the IP header remains largely The wire protocol of the ECN field in the IP header remains largely
unchanged from [RFC3168]. However, an extension to the ECN field we unchanged from [RFC3168]. However, an extension to the ECN field we
call the RE (re-ECN extension) flag (Section 3.2) is defined in this call the RE (re-ECN extension) flag (Section 3.2) is defined in this
document. It doubles the extended ECN codepoint space, giving 8 document. It doubles the extended ECN codepoint space, giving 8
potential codepoints. The semantics of the extra codepoints are potential codepoints. The semantics of the extra codepoints are
skipping to change at page 29, line 8 skipping to change at page 30, line 14
5.2. Re-ECN IPv6 Wire Protocol 5.2. Re-ECN IPv6 Wire Protocol
For IPv6, this document proposes that the new RE control flag will be For IPv6, this document proposes that the new RE control flag will be
positioned as the first bit of the option field of a new Congestion positioned as the first bit of the option field of a new Congestion
hop by hop option header (Figure 6). hop by hop option header (Figure 6).
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Header | Hdr ext Len | Option Type | Option Len | | Next Header | Hdr ext Len | Option Type | Opt Length =4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R| Reserved for future use | |R| Reserved for future use |
|E| | |E| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option
Header containing the Re-ECN Extension (RE) Control Flag Header containing the Re-ECN Extension (RE) Control Flag
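The layout in Figure 6 could be serialised as sketched below. The Option Type value used here is only a placeholder chosen for illustration and is not assigned by this document; the function name is also hypothetical. Hdr Ext Len is 0 because the whole header occupies 8 octets, and the option data length is 4 as shown in the figure.

```python
import struct

PLACEHOLDER_OPTION_TYPE = 0x1E  # placeholder only, not assigned by this draft


def build_congestion_hbh(next_header, re_flag,
                         option_type=PLACEHOLDER_OPTION_TYPE):
    """Pack the 8-octet Congestion hop by hop option header of Figure 6."""
    hdr_ext_len = 0  # length in 8-octet units, not counting the first 8
    opt_len = 4      # option data is one 32-bit word
    # RE is the first (most significant) bit; the other 31 are reserved.
    option_data = (re_flag & 1) << 31
    return struct.pack("!BBBBI", next_header, hdr_ext_len,
                       option_type, opt_len, option_data)


header = build_congestion_hbh(next_header=6, re_flag=1)
```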
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
+-+-+-+-+-+-+-+-+- +-+-+-+-+-+-+-+-+-
skipping to change at page 33, line 19 skipping to change at page 34, line 19
not aware of. Otherwise, spoof messages could be sent by malicious not aware of. Otherwise, spoof messages could be sent by malicious
sources to slow down a sender (c.f. ICMP source quench). sources to slow down a sender (c.f. ICMP source quench).
However, the need for this message type is not yet confirmed, as we However, the need for this message type is not yet confirmed, as we
are considering how to prevent it being used by malicious senders to are considering how to prevent it being used by malicious senders to
scan for droppers and to test their threshold settings. {ToDo: scan for droppers and to test their threshold settings. {ToDo:
Complete this section.} Complete this section.}
5.5.2. Rate Response Control 5.5.2. Rate Response Control
The incentive framework of Section 6.1.3 implies there may be a need As discussed in Section 6.1.5 the sender's access operator will be
for a sender to send a request to an ingress policer asking that it expected to use bulk per-user policing, but they might choose to
be allowed to apply a non-default response to congestion (where TCP- introduce a per-flow policer. In cases where operators do introduce
friendly is assumed to be the default). This would require the per-flow policing, there may be a need for a sender to send a request
sender to know what message format(s) to use and to be able to to the ingress policer asking for permission to apply a non-default
discover how to address the policer. The required control response to congestion (where TCP-friendly is assumed to be the
protocol(s) are outside the scope of this document, but will require default). This would require the sender to know what message
definition elsewhere. format(s) to use and to be able to discover how to address the
policer. The required control protocol(s) are outside the scope of
this document, but will require definition elsewhere.
The policer is likely to be local to the sender and inline, probably The policer is likely to be local to the sender and inline, probably
at the ingress interface to the internetwork. So, discovery should at the ingress interface to the internetwork. So, discovery should
not be hard. A variety of control protocols already exist for some not be hard. A variety of control protocols already exist for some
widely used rate-responses to congestion. For instance DCCP widely used rate-responses to congestion. For instance DCCP
congestion control identifiers (CCIDs [RFC4340]) fulfil this role and congestion control identifiers (CCIDs [RFC4340]) fulfil this role and
so does QoS signalling (e.g. an RSVP request for controlled load so does QoS signalling (e.g. an RSVP request for controlled load
service is equivalent to a request for no rate response to service is equivalent to a request for no rate response to
congestion, but with admission control). congestion, but with admission control).
5.6. IP in IP Tunnels 5.6. IP in IP Tunnels
For re-ECN to work correctly through IP in IP tunnels, it needs For re-ECN to work correctly through IP in IP tunnels, it needs
slightly different tunnel handling to regular ECN [RFC3168]. slightly different tunnel handling to regular ECN [RFC3168].
Ideally, for re-ECN to work through a tunnel, the tunnel entry should Currently there is some inconsistency between how the handling of IP
copy both the RE flag and the ECN field from the inner to the outer in IP tunnels is defined in [RFC3168] and how it is defined in
IP header. Then at the tunnel exit, any congestion marking of the [RFC4301], but re-ECN would work fine with the IPsec behaviour. This
outer ECN field should overwrite the inner ECN field (unless the inconsistency is addressed in a new Internet Draft [ECN-tunnel] that
inner field is Not-ECT in which case an alarm should be raised). The proposes to update RFC3168 tunnel behaviour to bring it into line
RE flag shouldn't change along a path, so the outer RE flag should be with IPsec. Ideally, for re-ECN to work through a tunnel, the tunnel
the same as the inner. If it isn't a management alarm should be entry should copy both the RE flag and the ECN field from the inner
raised. This behaviour is the same as the full-functionality variant to the outer IP header. Then at the tunnel exit, any congestion
of [RFC3168] at tunnel exit, but different at tunnel entry. marking of the outer ECN field should overwrite the inner ECN field
(unless the inner field is Not-ECT in which case an alarm should be
raised). The RE flag shouldn't change along a path, so the outer RE
flag should be the same as the inner. If it isn't a management alarm
should be raised. This behaviour is the same as the full-
functionality variant of [RFC3168] at tunnel exit, but different at
tunnel entry.
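The ideal tunnel behaviour described above might look like the following sketch. The type and function names are hypothetical, and two details are our own assumptions because the text only says an alarm should be raised: the inner ECN field is left unchanged when it is Not-ECT, and the inner RE flag is the one forwarded when inner and outer differ.

```python
from typing import NamedTuple

# ECN field codepoints from RFC 3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


class ExtEcn(NamedTuple):
    ecn: int  # 2-bit ECN field
    re: int   # RE flag


def tunnel_entry(inner):
    """Copy both the ECN field and the RE flag to the outer header."""
    return ExtEcn(inner.ecn, inner.re)


def tunnel_exit(inner, outer):
    """Return the forwarded inner header and a list of alarms."""
    alarms = []
    ecn = inner.ecn
    if outer.ecn == CE:
        if inner.ecn == NOT_ECT:
            alarms.append("CE on outer but inner is Not-ECT")
        else:
            ecn = CE  # outer congestion marking overwrites inner
    if outer.re != inner.re:
        alarms.append("RE flag changed inside tunnel")
    return ExtEcn(ecn, inner.re), alarms
```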
If tunnels are left as they are specified in [RFC3168], whether the If tunnels are left as they are specified in [RFC3168], whether the
limited or full-functionality variants are used, a problem arises limited or full-functionality variants are used, a problem arises
with re-ECN if a tunnel crosses an inter-domain boundary, because the with re-ECN if a tunnel crosses an inter-domain boundary, because the
difference between positive and negative markings will not be difference between positive and negative markings will not be
correctly accounted for. In a limited functionality ECN tunnel, the correctly accounted for. In a limited functionality ECN tunnel, the
flow will appear to be legacy traffic, and therefore may be wrongly flow will appear to be legacy traffic, and therefore may be wrongly
rate limited. In a full-functionality ECN tunnel, the result will rate limited. In a full-functionality ECN tunnel, the result will
depend whether the tunnel entry copies the inner RE flag to the outer depend whether the tunnel entry copies the inner RE flag to the outer
header or the RE flag in the outer header is always cleared. If the header or the RE flag in the outer header is always cleared. If the
former, the flow will tend to be too positive when accounted for at former, the flow will tend to be too positive when accounted for at
borders. If the latter, it will be too negative. borders. If the latter, it will be too negative. If the rules set
out in [ECN-tunnel] are followed then this will not be an issue.
{ToDo: A future version of this draft will discuss the necessary
changes to IP in IP tunnels in more depth.}
5.7. Non-Issues 5.7. Non-Issues
The following issues might seem to cause unfavourable interactions The following issues might seem to cause unfavourable interactions
with re-ECN, but we will explain why they don't: with re-ECN, but we will explain why they don't:
o Various link layers support explicit congestion notification, such o Various link layers support explicit congestion notification, such
as Frame Relay and ATM. Explicit congestion notification is as Frame Relay and ATM. Explicit congestion notification is
proposed to be added to other link layers, such as Ethernet proposed to be added to other link layers, such as Ethernet
(802.3ar Ethernet congestion management) and MPLS [ECN-MPLS]; (802.3ar Ethernet congestion management) and MPLS [ECN-MPLS];
skipping to change at page 35, line 31 skipping to change at page 36, line 37
6. Applications 6. Applications
6.1. Policing Congestion Response 6.1. Policing Congestion Response
6.1.1. The Policing Problem 6.1.1. The Policing Problem
The current Internet architecture trusts hosts to respond voluntarily The current Internet architecture trusts hosts to respond voluntarily
to congestion. Limited evidence shows that the large majority of to congestion. Limited evidence shows that the large majority of
end-points on the Internet comply with a TCP-friendly response to end-points on the Internet comply with a TCP-friendly response to
congestion. But telephony (and increasingly video) services over the congestion. But telephony (and increasingly video) services over the
best efforts Internet are attracting the interest of major commercial best effort Internet are attracting the interest of major commercial
operations. Most of these applications do not respond to congestion operations. Most of these applications do not respond to congestion
at all. Those that can switch to lower rate codecs, still have a at all. Those that can switch to lower rate codecs, still have a
lower bound below which they must become unresponsive to congestion. lower bound below which they must become unresponsive to congestion.
Of course, the Internet is intended to support many different Of course, the Internet is intended to support many different
application behaviours. But the problem is that this freedom can be application behaviours. But the problem is that this freedom can be
exercised irresponsibly. The greater problem is that we will never exercised irresponsibly. The greater problem is that we will never
be able to agree on where the boundary is between responsible and be able to agree on where the boundary is between responsible and
irresponsible. Therefore re-ECN is designed to allow different irresponsible. Therefore re-ECN is designed to allow different
networks to set their own view of the limit to irresponsibility, and networks to set their own view of the limit to irresponsibility, and
skipping to change at page 37, line 37 skipping to change at page 38, line 44
return address at a higher layer. return address at a higher layer.
6.1.3. Re-ECN Incentive Framework 6.1.3. Re-ECN Incentive Framework
The aim is to create an incentive environment that ensures optimal The aim is to create an incentive environment that ensures optimal
sharing of capacity despite everyone acting selfishly (including sharing of capacity despite everyone acting selfishly (including
lying and cheating). Of course, the mechanisms put in place for this lying and cheating). Of course, the mechanisms put in place for this
can lie dormant wherever co-operation is the norm. can lie dormant wherever co-operation is the norm.
Throughout this document we focus on path congestion. But some forms Throughout this document we focus on path congestion. But some forms
of fairness, particularly TCP's, also depend on round trip time. So, of fairness, particularly TCP's, also depend on round trip time. If
we also propose to measure downstream path delay using re-feedback. TCP-fairness is required, we also propose to measure downstream path
This proposal will be published in a very simple future draft, but delay using re-feedback. We give a simple outline of how this could
for now we give an outline in Appendix F. work in Appendix F. However, we do not expect this to be necessary,
as researchers tend to agree that only congestion control dynamics
need to depend on RTT, not the rate that the algorithm would converge
on after a period of stability.
Figure 8 sketches the incentive framework that we will describe piece Figure 8 sketches the incentive framework that we will describe piece
by piece throughout this section. We will do a first pass in by piece throughout this section. We will do a first pass in
overview, then return to each piece in detail. We re-use the earlier overview, then return to each piece in detail. We re-use the earlier
example of how downstream congestion is derived by subtracting example of how downstream congestion is derived by subtracting
upstream congestion from path congestion (Figure 1) but depict upstream congestion from path congestion (Figure 1) but depict
multiple trust boundaries to turn it into an internetwork. For multiple trust boundaries to turn it into an internetwork. For
clarity, only downstream congestion is shown (the difference between clarity, only downstream congestion is shown (the difference between
the two earlier plots). The graph displays downstream path the two earlier plots). The graph displays downstream path
congestion seen in a typical flow as it traverses an example path congestion seen in a typical flow as it traverses an example path
skipping to change at page 39, line 12 skipping to change at page 40, line 42
enhanced QoS), to some extent it will always be against the enhanced QoS), to some extent it will always be against the
sender's interest to comply. sender's interest to comply.
Ingress policing: But it is in all the network operators' interests Ingress policing: But it is in all the network operators' interests
to encourage fair congestion response, so that their investments to encourage fair congestion response, so that their investments
are employed to satisfy the most valuable demand. The re-ECN are employed to satisfy the most valuable demand. The re-ECN
protocol ensures packets carry the necessary information about protocol ensures packets carry the necessary information about
their own expected downstream congestion so that N1 can deploy a their own expected downstream congestion so that N1 can deploy a
policer at its ingress to check that S1 is complying with whatever policer at its ingress to check that S1 is complying with whatever
congestion control it should be using (Section 6.1.5). If N1 is congestion control it should be using (Section 6.1.5). If N1 is
extremely conservative it may police each flow, but it can choose extremely conservative it could police each flow, but it is likely
to just police the bulk amount of congestion each customer causes to just police the bulk amount of congestion each customer causes
without regard to flows, or if it is extremely liberal it need not without regard to flows, or if it is extremely liberal it need not
police congestion control at all. Whatever, it is always police congestion control at all. Whatever, it is always
preferable to police traffic at the very first ingress into an preferable to police traffic at the very first ingress into an
internetwork, before non-compliant traffic can cause any damage. internetwork, before non-compliant traffic can cause any damage.
Edge egress dropper: If the policer ensures the source has less Edge egress dropper: If the policer ensures the source has less
right to a high rate the higher it declares downstream congestion, right to a high rate the higher it declares downstream congestion,
the source has a clear incentive to understate downstream the source has a clear incentive to understate downstream
congestion. But, if flows of packets are understated when they congestion. But, if flows of packets are understated when they
skipping to change at page 40, line 41 skipping to change at page 42, line 21
at the egress of N2. Then N2 has an incentive either to police at the egress of N2. Then N2 has an incentive either to police
the congestion response of its own ingress traffic (from N1) or to the congestion response of its own ingress traffic (from N1) or to
emulate policing by applying penalties to N1 in turn on the basis emulate policing by applying penalties to N1 in turn on the basis
of congestion counted at their mutual boundary. In this recursive of congestion counted at their mutual boundary. In this recursive
way, the incentives for each flow to respond correctly to way, the incentives for each flow to respond correctly to
congestion trace back with each flow precisely to each source, congestion trace back with each flow precisely to each source,
despite the mechanism not recognising flows (see Section 6.2.2). despite the mechanism not recognising flows (see Section 6.2.2).
Inter-domain congestion charging diversity: Any two networks are Inter-domain congestion charging diversity: Any two networks are
free to agree any of a range of penalty regimes between themselves free to agree any of a range of penalty regimes between themselves
but they would only provide the right incentives if they were
within the following reasonable constraints. N2 should expect to within the following reasonable constraints. N2 should expect to
have to pay penalties to N4 where penalties monotonically increase have to pay penalties to N4 where penalties monotonically increase
with the volume of congestion and negative penalties are not with the volume of congestion and negative penalties are not
allowed. For instance, they may agree an SLA with tiered allowed. For instance, they may agree an SLA with tiered
congestion thresholds, where higher penalties apply the higher the congestion thresholds, where higher penalties apply the higher the
threshold that is broken. But the most obvious (and useful) form threshold that is broken. But the most obvious (and useful) form
of penalty is where N4 levies a charge on N2 proportional to the of penalty is where N4 levies a charge on N2 proportional to the
volume of downstream congestion N2 dumps into N4. In the volume of downstream congestion N2 dumps into N4. In the
explanation that follows, we assume this specific variant of explanation that follows, we assume this specific variant of
volume charging between networks - charging proportionate to the volume charging between networks - charging proportionate to the
skipping to change at page 41, line 14 skipping to change at page 42, line 43
We must make clear that we are not advocating that everyone should We must make clear that we are not advocating that everyone should
use this form of contract. We are well aware that the IETF tries use this form of contract. We are well aware that the IETF tries
to avoid standardising technology that depends on a particular to avoid standardising technology that depends on a particular
business model. And we strongly share this desire to encourage business model. And we strongly share this desire to encourage
diversity. But our aim is merely to show that border policing can diversity. But our aim is merely to show that border policing can
at least work with this one model, then we can assume that at least work with this one model, then we can assume that
operators might experiment with the metric in other models (see operators might experiment with the metric in other models (see
Section 6.1.6 for examples). Of course, operators are free to Section 6.1.6 for examples). Of course, operators are free to
complement this usage element of their charges with traditional complement this usage element of their charges with traditional
capacity charging, and we expect they will. capacity charging, and we expect they will as predicted by
economics.
No congestion charging to users: Bulk congestion penalties at trust No congestion charging to users: Bulk congestion penalties at trust
boundaries are passive and extremely simple, and lose none of boundaries are passive and extremely simple, and lose none of
their per-packet precision from one boundary to the next (unlike their per-packet precision from one boundary to the next (unlike
Diffserv all-address traffic conditioning agreements, which Diffserv all-address traffic conditioning agreements, which
dissipate their effectiveness across long topologies). But at any dissipate their effectiveness across long topologies). But at any
trust boundary, there is no imperative to use congestion charging. trust boundary, there is no imperative to use congestion charging.
Traditional traffic policing can be used, if the complexity and Traditional traffic policing can be used, if the complexity and
cost is preferred. In particular, at the boundary with end cost is preferred. In particular, at the boundary with end
customers (e.g. between S and N1), traffic policing will most customers (e.g. between S and N1), traffic policing will most
likely be more appropriate. Policer complexity is less of a likely be more appropriate. Policer complexity is less of a
concern at the edge of the network. And end-customers are known concern at the edge of the network. And end-customers are known
to be highly averse to the unpredictability of congestion to be highly averse to the unpredictability of congestion
charging. charging.
NOTE WELL: This document neither advocates nor requires congestion NOTE WELL: This document neither advocates nor requires congestion
charging for end customers and advocates but does not require charging for end customers and advocates but does not require
inter-domain congestion charging. inter-domain congestion charging.
Competitive discipline of inter-domain traffic engineering: With Competitive discipline of inter-domain traffic engineering: With
inter-domain congestion charging, a domain seems to have a inter-domain congestion charging, a domain seems to have a
perverse incentive to fake congestion; N2's profit depends on the perverse incentive to fake congestion; N2's profit depends on the
difference between congestion at its ingress (its revenue) and at difference between congestion at its ingress (its revenue) and at
its egress (its cost). So, overstating internal congestion seems its egress (its cost). So, overstating internal congestion seems
to increase profit. However, smart border routing [Smart_rtg] by to increase profit. However, smart border routing [Smart_rtg] by
N1 will bias its multipath routing towards the least cost routes. N1 will bias its routing towards the least cost routes. So, N2
So, N2 risks losing all its revenue to competitive routes if it risks losing all its revenue to competitive routes if it
overstates congestion (see Section 6.2.3). In other words, if N2 overstates congestion (see Section 6.2.3). In other words, if N2
is the least congested route, its ability to raise excess profits is the least congested route, its ability to raise excess profits
is limited by the congestion on the next least congested route. is limited by the congestion on the next least congested route.
This pressure on N2 to remain competitive is represented by the This pressure on N2 to remain competitive is represented by the
dotted downward arrow at the ingress to N2 in Figure 9. dotted downward arrow at the ingress to N2 in Figure 9.
Closing the loop: All the above elements conspire to trap everyone Closing the loop: All the above elements conspire to trap everyone
between two opposing pressures (the downward and upward arrows in between two opposing pressures (the downward and upward arrows in
Figure 8 & Figure 9), ensuring the downstream congestion metric Figure 8 & Figure 9), ensuring the downstream congestion metric
arrives at the destination neither above nor below zero. So, we arrives at the destination neither above nor below zero. So, we
skipping to change at page 42, line 24 skipping to change at page 44, line 7
superior to bottleneck policing or to any policing of different superior to bottleneck policing or to any policing of different
QoS for different flows. Even if all access networks choose to QoS for different flows. Even if all access networks choose to
conservatively police congestion per flow, each will want to conservatively police congestion per flow, each will want to
compete with the others to allow new responses to congestion for compete with the others to allow new responses to congestion for
new types of application. With re-ECN, each can introduce new new types of application. With re-ECN, each can introduce new
controls independently, without coordinating with other networks controls independently, without coordinating with other networks
and without having to standardise anything. But, as we have just and without having to standardise anything. But, as we have just
seen, by making inter-domain penalties proportionate to bulk seen, by making inter-domain penalties proportionate to bulk
downstream congestion, downstream networks can be agnostic to the downstream congestion, downstream networks can be agnostic to the
specific congestion response for each flow, but they can still specific congestion response for each flow, but they can still
apply more back-pressure the more liberal the ingress access apply more penalty the more liberal the ingress access network has
network has been in the response to congestion it allowed for each been in the response to congestion it allowed for each flow.
flow.
6.1.3.1. The Case against Classic Feedback 6.1.3.1. The Case against Classic Feedback
A system that produces an optimal outcome as a result of everyone's A system that produces an optimal outcome as a result of everyone's
selfish actions is extremely powerful. Especially one that enables selfish actions is extremely powerful. Especially one that enables
evolvability of congestion control. But why do we have to change to evolvability of congestion control. But why do we have to change to
re-ECN to achieve it? Can't classic congestion feedback (as used re-ECN to achieve it? Can't classic congestion feedback (as used
already by standard ECN) be arranged to provide similar incentives already by standard ECN) be arranged to provide similar incentives
and similar evolvability? Superficially it can. Kelly's seminal and similar evolvability? Superficially it can. Kelly's seminal
work showed how we can allow everyone the freedom to evolve whatever work showed how we can allow everyone the freedom to evolve whatever
congestion control behaviour is in their application's best interest congestion control behaviour is in their application's best interest
but still optimise the whole system of networks and users by placing but still optimise the whole system of networks and users by placing
a price on congestion to ensure responsible use of this a price on congestion to ensure responsible use of this
freedom [Evol_cc]. Kelly used ECN with its classic congestion freedom [Evol_cc]. Kelly used ECN with its classic congestion
feedback model as the mechanism to convey congestion price feedback model as the mechanism to convey congestion price
information. The mechanism was nearly identical to volume charging; information. The mechanism could be thought of as volume charging;
except only the volume of packets marked with congestion experienced except only the volume of packets marked with congestion experienced
(CE) was counted. (CE) was counted.
However, below we explain why relying on classic feedback /required/ However, below we explain why relying on classic feedback /required/
congestion charging to be used, while re-ECN achieves the same congestion charging to be used, while re-ECN achieves the same
powerful outcome (given it is built on Kelly's foundations), but does powerful outcome (given it is built on Kelly's foundations), but does
not /require/ congestion charging. In brief, the problem with not /require/ congestion charging. In brief, the problem with
classic feedback is that the incentives have to trace the indirect classic feedback is that the incentives have to trace the indirect
path back to the sender---the long way round the feedback loop. For path back to the sender---the long way round the feedback loop. For
example, if classic feedback were used in Figure 8, N2 would have had example, if classic feedback were used in Figure 8, N2 would have had
skipping to change at page 45, line 22 skipping to change at page 47, line 5
from the receiver. So, counting packets with FNE cleared would be from the receiver. So, counting packets with FNE cleared would be
likely to make the average unnecessarily positive, providing headroom likely to make the average unnecessarily positive, providing headroom
(or should we say footroom?) for dishonest (negative) traffic. (or should we say footroom?) for dishonest (negative) traffic.
If the dropper detects a persistently negative flow, it SHOULD drop If the dropper detects a persistently negative flow, it SHOULD drop
sufficient negative and neutral packets to force the flow to not be sufficient negative and neutral packets to force the flow to not be
negative. Drops SHOULD be focused on just sufficient packets in negative. Drops SHOULD be focused on just sufficient packets in
misbehaving flows to remove the negative bias while doing minimal misbehaving flows to remove the negative bias while doing minimal
extra harm. extra harm.
6.1.5. Rate Policing 6.1.5. Policing
Access operators who wish to check that a sender is complying with a Access operators who wish to limit the congestion that a sender is
particular rate response to congestion can deploy rate policers at able to cause can deploy policers at the very first ingress to the
the very first ingress to the internetwork. Re-ECN has been designed internetwork. Re-ECN has been designed to avoid the need for
to avoid the need for bottleneck policing so that we can avoid a bottleneck policing so that we can avoid a future where a single rate
future where a single rate adaptation policy is embedded throughout adaptation policy is embedded throughout the network. Instead, re-
the network. Instead, re-ECN allows the particular rate adaptation ECN allows the particular rate adaptation policy to be solely agreed
policy to be solely agreed bilaterally between the sender and its bilaterally between the sender and its ingress access provider
ingress access provider (Section 5.5.2 discusses possible ways to (Section 5.5.2 discusses possible ways to signal between them), which
signal between them), which allows congestion control to be policed, allows congestion control to be policed, but maintains its
but maintains its evolvability, requiring only a single, local box to evolvability, requiring only a single, local box to be updated.
be updated.
If desired, the re-ECN protocol allows these ingress policers to Appendix G gives examples of per-user policing algorithms. But there
perform per-flow policing according to the widely adopted TCP rate is no implication that these algorithms are to be standardised, or
adaptation, perhaps as a default. But it also allows new rate that they are ideal. The ingress rate policer is the part of the re-
adaptation policies beyond TCP to be enforced. Perhaps more ECN incentive framework that is intended to be the most flexible.
usefully, it also allows the flexibility for networks to choose to Once endpoint protocol handlers for re-ECN and egress droppers are in
police users as a whole, rather than flows. place, operators can choose exactly which congestion response they
want to police, and whether they want to do it per user, per flow or
not at all.
Appendix G gives examples of per-user and per-flow policing The re-ECN protocol allows these ingress policers to easily perform
algorithms. But there is no implication that these algorithms are to bulk per-user policing (Appendix G.1). This is likely to provide
be standardised, or that they are ideal. The ingress rate policer is sufficient incentive to the user to correctly respond to congestion
the part of the re-ECN incentive framework that is intended to be the without needing the policing function to be overly complex. If an
most flexible. Once endpoint protocol handlers for re-ECN and egress access operator chose, they could use per-flow policing according to
droppers are in place, operators can choose exactly which congestion the widely adopted TCP rate adaptation (Appendix G.2) or other
response they want to police, and whether they want to do it per alternatives; however, this would introduce extra complexity to the
user, per flow or not at all. system.
However, if a rate policer is used, it should use path (not If a per-flow rate policer is used, it should use path (not
downstream) congestion as the relevant metric, which is represented downstream) congestion as the relevant metric, which is represented
by the fraction of octets in packets with positive (Re-Echo and FNE) by the fraction of octets in packets with positive (Re-Echo and FNE)
and canceled (CE(0)) markings. Of course, re-ECN provides all the and canceled (CE(0)) markings. Of course, re-ECN provides all the
information a policer needs directly in the packets being policed. information a policer needs directly in the packets being policed.
So, even policing TCP's AIMD algorithm is relatively straightforward. So, even policing TCP's AIMD algorithm is relatively straightforward
Appendix G presents an example design, but the choice of preferred (Appendix G.2).
mechanism is up to the implementer.
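
As an illustration of the path congestion metric described above, the following minimal Python sketch computes the fraction of octets carried in packets with positive (Re-Echo and FNE) or canceled (CE(0)) markings. It is not taken from the draft or from Appendix G. The Packet type, the string labels used for the markings, and the choice of all observed octets as the denominator are assumptions made only for this example.

```python
from collections import namedtuple

# Hypothetical packet record for illustration: a marking label and a size in octets.
Packet = namedtuple("Packet", ["marking", "octets"])

# Markings counted towards path congestion: positive (Re-Echo, FNE) and canceled (CE(0)).
PATH_CONGESTION_MARKINGS = {"Re-Echo", "FNE", "CE(0)"}


def path_congestion_fraction(packets):
    """Return the fraction of octets in packets with positive or canceled markings.

    Assumption: the denominator is the total octets of all packets observed.
    Returns 0.0 when no octets have been observed.
    """
    total_octets = 0
    marked_octets = 0
    for packet in packets:
        total_octets += packet.octets
        if packet.marking in PATH_CONGESTION_MARKINGS:
            marked_octets += packet.octets
    if total_octets == 0:
        return 0.0
    return marked_octets / total_octets


if __name__ == "__main__":
    sample = [
        Packet("RECT", 1000),     # neutral
        Packet("Re-Echo", 500),   # positive
        Packet("CE(0)", 500),     # canceled
        Packet("CE(-1)", 2000),   # negative, not counted towards path congestion
    ]
    print(path_congestion_fraction(sample))
```

A policer built along these lines would need only the markings and sizes of the packets it is policing, which is the point the paragraph above makes.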
Note that we have included canceled packets in the measure of path Note that we have included canceled packets in the measure of path
congestion. Canceled packets arise when the sender re-echoes earlier congestion. Canceled packets arise when the sender re-echoes earlier
congestion, but then this Re-Echo packet just happens to be congestion, but then this Re-Echo packet just happens to be
congestion marked itself. One would not normally expect many congestion marked itself. One would not normally expect many
canceled packets at the first ingress because one would not normally canceled packets at the first ingress because one would not normally
expect much congestion marking to have been necessary that soon in expect much congestion marking to have been necessary that soon in
the path. However, a home network or campus network may well sit the path. However, a home network or campus network may well sit
between the sending endpoint and the ingress policer, so some between the sending endpoint and the ingress policer, so some
congestion may occur upstream of the policer. And if congestion does congestion may occur upstream of the policer. And if congestion does
skipping to change at page 47, line 5 skipping to change at page 48, line 36
Of course, even if the sender does operate its own network, it may Of course, even if the sender does operate its own network, it may
arrange not to congestion mark traffic. Whether the sender does this arrange not to congestion mark traffic. Whether the sender does this
or not is of no concern to anyone else except the sender. Such a or not is of no concern to anyone else except the sender. Such a
sender will not be policed against its own network's contribution to sender will not be policed against its own network's contribution to
congestion, but the only resulting problem would be overload in the congestion, but the only resulting problem would be overload in the
sender's own network. sender's own network.
Finally, we must not forget that an easy way to circumvent re-ECN's Finally, we must not forget that an easy way to circumvent re-ECN's
defences is for the source to turn off re-ECN support, by setting the defences is for the source to turn off re-ECN support, by setting the
Not-RECT codepoint, implying legacy traffic. Therefore an ingress Not-RECT codepoint, implying legacy traffic. Therefore an ingress
policer must put a general rate-limit on Not-RECT traffic, which policer should put a general rate-limit on Not-RECT traffic, which
SHOULD be lax during early, patchy deployment, but will have to SHOULD be lax during early, patchy deployment, but will have to
become stricter as deployment widens. Similarly, flows starting become stricter as deployment widens. Similarly, flows starting
without an FNE packet can be confined by a strict rate-limit used for without an FNE packet can be confined by a strict rate-limit used for
the remainder of flows that haven't proved they are well-behaved by the remainder of flows that haven't proved they are well-behaved by
starting correctly (therefore they need not consume any flow state--- starting correctly (therefore they need not consume any flow state---
they are just confined to the `misbehaving' bin if they carry an they are just confined to the `misbehaving' bin if they carry an
unrecognised flow ID). unrecognised flow ID).
6.1.6. Inter-domain Policing 6.1.6. Inter-domain Policing
One of the main design goals of re-ECN is for border security One of the main design goals of re-ECN is for border security
mechanisms to be as simple as possible, otherwise they will become mechanisms to be as simple as possible, otherwise they will become
the pinch-points that limit scalability of the whole internetwork. the pinch-points that limit scalability of the whole internetwork.
We want to avoid per-flow processing at borders and to keep to We want to avoid per-flow processing at borders and to keep to
passive mechanisms that can monitor traffic in parallel to passive mechanisms that can monitor traffic in parallel to
forwarding, rather than having to filter traffic inline---in series forwarding, rather than having to filter traffic inline---in series
with forwarding. with forwarding. Such passive, off-line mechanisms are essential for
future high-speed all-optical border interconnection where packets
cannot be buffered while they are checked for policy compliance.
So far, we have been able to keep the border mechanisms simple, So far, we have been able to keep the border mechanisms simple,
despite having had to harden them against some subtle attacks on the despite having had to harden them against some subtle attacks on the
re-ECN design. The mechanisms are still passive and avoid per-flow re-ECN design. The mechanisms are still passive and avoid per-flow
processing. processing.
The basic accounting mechanism at each border interface simply The basic accounting mechanism at each border interface simply
involves accumulating the volume of packets with positive worth (Re- involves accumulating the volume of packets with positive worth (Re-
Echo and FNE), and subtracting the volume of those with negative Echo and FNE), and subtracting the volume of those with negative
worth: CE(-1). Even though this mechanism takes no regard of flows, worth: CE(-1). Even though this mechanism takes no regard of flows,
skipping to change at page 50, line 33 skipping to change at page 52, line 18
tend to be dropped before others if routers use the preferential drop tend to be dropped before others if routers use the preferential drop
rules in Section 5.3, which discriminate against non-positive rules in Section 5.3, which discriminate against non-positive
packets. All networks below the point where a flow goes negative packets. All networks below the point where a flow goes negative
(N1, N2 and N4 in this case) have an incentive to remove this flow, (N1, N2 and N4 in this case) have an incentive to remove this flow,
but the router where it first goes negative (in N1) can of course but the router where it first goes negative (in N1) can of course
remove the problem for everyone downstream. remove the problem for everyone downstream.
In the case of DDoS attacks, Section 6.2.1 describes how re-ECN In the case of DDoS attacks, Section 6.2.1 describes how re-ECN
mitigates their force. mitigates their force.
Note that the guiding principle behind all the above discussion is
that any gain from subverting the protocol should be precisely
neutralised, rather than punished. If a gain is punished to a
greater extent than is sufficient to neutralise it, it will most
likely open up a new vulnerability, where the amplifying effect of
the punishment mechanism can be turned on others.
For instance, if possible, flows should be removed as soon as they go
negative, but we do NOT RECOMMEND any attempts to discard such flows
further upstream while they are still positive. Such over-zealous
push-back is unnecessary and potentially dangerous. These flows have
paid their `fare' up to the point they go negative, so there is no
harm in delivering them that far. If someone downstream asks for a
flow to be dropped as near to the source as possible, because they
say it is going to become negative later, an upstream node cannot
test the truth of this assertion. Rather than have to authenticate
such messages, re-ECN has been designed so that flows can be dropped
solely based on locally measurable evidence. A message hinting that
a flow should be watched closely to test for negativity is fine. But
not a message that claims that a positive flow will go negative
later, so it should be dropped.
6.1.7. Inter-domain Fail-safes 6.1.7. Inter-domain Fail-safes
The mechanisms described so far create incentives for rational The mechanisms described so far create incentives for rational
network operators to behave. That is, one operator aims to make network operators to behave. That is, one operator aims to make
another behave responsibly by applying penalties and expects a another behave responsibly by applying penalties and expects a
rational response (i.e. one that trades off costs against benefits). rational response (i.e. one that trades off costs against benefits).
It is usually reasonable to assume that other network operators will It is usually reasonable to assume that other network operators will
behave rationally (policy routing can avoid those that might not). behave rationally (policy routing can avoid those that might not).
But this approach does not protect against the misconfigurations and But this approach does not protect against the misconfigurations and
accidents of other operators. accidents of other operators.
skipping to change at page 56, line 36 skipping to change at page 57, line 47
* ECN `only' gives a performance improvement. Making a product a * ECN `only' gives a performance improvement. Making a product a
bit faster (whether the product is a device or a network), bit faster (whether the product is a device or a network),
isn't usually a sufficient selling point to be worth the cost isn't usually a sufficient selling point to be worth the cost
of co-ordinating across the industry to deploy it. Network of co-ordinating across the industry to deploy it. Network
operators tend to avoid re-configuring a working network unless operators tend to avoid re-configuring a working network unless
launching a new product. launching a new product.
ECN and re-ECN for Edge-to-edge Assured QoS: ECN and re-ECN for Edge-to-edge Assured QoS:
We believe the proposal to provide assured QoS sessions using a We believe the proposal to provide assured QoS sessions using a
form of ECN called pre-congestion notification (PCN) [CL-deploy] form of ECN called pre-congestion notification (PCN) [PCN-arch] is
is most likely to break the deadlock in ECN deployment first. It most likely to break the deadlock in ECN deployment first. It
only requires edge-to-edge deployment so it does not require only requires edge-to-edge deployment so it does not require
endpoint support. It can be deployed in a single network, then endpoint support. It can be deployed in a single network, then
grow incrementally to interconnected networks. And it provides a grow incrementally to interconnected networks. And it provides a
different `product' (internetworked assured QoS), rather than different `product' (internetworked assured QoS), rather than
merely making an existing product a bit faster. merely making an existing product a bit faster.
Not only could this assured QoS application kick-start ECN Not only could this assured QoS application kick-start ECN
deployment, it could also carry re-ECN deployment with it; because deployment, it could also carry re-ECN deployment with it; because
re-ECN can enable the assured QoS region to expand to a large re-ECN can enable the assured QoS region to expand to a large
internetwork where neighbouring networks do not trust each other. internetwork where neighbouring networks do not trust each other.
skipping to change at page 63, line 5 skipping to change at page 64, line 15
to the higher layer and hide how the lower layer does it. However, to the higher layer and hide how the lower layer does it. However,
ECN reveals the state of the network layer and below to the transport ECN reveals the state of the network layer and below to the transport
layer. A more positive way to describe ECN is that it is like the layer. A more positive way to describe ECN is that it is like the
return value of a function call to the network layer. It explicitly return value of a function call to the network layer. It explicitly
returns the status of the request to deliver a packet, by returning a returns the status of the request to deliver a packet, by returning a
value representing the current risk that a packet will not be served. value representing the current risk that a packet will not be served.
Re-ECN has similar semantics, except the transport layer must try to Re-ECN has similar semantics, except the transport layer must try to
guess the return value, then it can use the actual return value from guess the return value, then it can use the actual return value from
the network layer to modify the next guess. the network layer to modify the next guess.
The guiding principle behind all the discussion in Section 6.1.6 on
Policing is that any gain from subverting the protocol should be
precisely neutralised, rather than punished. If a gain is punished
to a greater extent than is sufficient to neutralise it, it will most
likely open up a new vulnerability, where the amplifying effect of
the punishment mechanism can be turned on others.
For instance, if possible, flows should be removed as soon as they go
negative, but we do NOT RECOMMEND any attempts to discard such flows
further upstream while they are still positive. Such over-zealous
push-back is unnecessary and potentially dangerous. These flows have
paid their `fare' up to the point they go negative, so there is no
harm in delivering them that far. If someone downstream asks for a
flow to be dropped as near to the source as possible, because they
say it is going to become negative later, an upstream node cannot
test the truth of this assertion. Rather than have to authenticate
such messages, re-ECN has been designed so that flows can be dropped
solely based on locally measurable evidence. A message hinting that
a flow should be watched closely to test for negativity is fine. But
not a message that claims that a positive flow will go negative
later, so it should be dropped.
9. Related Work 9. Related Work
{Due to lack of time, this section is incomplete. The reader is {Due to lack of time, this section is incomplete. The reader is
referred to the Related Work section of [Re-fb] for a brief selection referred to the Related Work section of [Re-fb] for a brief selection
of related ideas.} of related ideas.}
9.1. Policing Rate Response to Congestion 9.1. Policing Rate Response to Congestion
ATM network elements send congestion back-pressure ATM network elements send congestion back-pressure
messages [ITU-T.I.371] along each connection, duplicating any end to messages [ITU-T.I.371] along each connection, duplicating any end to
skipping to change at page 63, line 52 skipping to change at page 65, line 37
9.2. Congestion Notification Integrity 9.2. Congestion Notification Integrity
The choice of two ECT code-points in the ECN field [RFC3168] The choice of two ECT code-points in the ECN field [RFC3168]
permitted future flexibility, optionally allowing the sender to permitted future flexibility, optionally allowing the sender to
encode the experimental ECN nonce [RFC3540] in the packet stream. encode the experimental ECN nonce [RFC3540] in the packet stream.
This mechanism has since been included in the specifications of DCCP This mechanism has since been included in the specifications of DCCP
[RFC4340]. [RFC4340].
The ECN nonce is an elegant scheme that allows the sender to detect The ECN nonce is an elegant scheme that allows the sender to detect
if someone in the feedback loop - the receiver especially - tries to if someone in the feedback loop - the receiver especially - tries to
claim no congestion was experienced when in fact congestion lead to claim no congestion was experienced when in fact congestion led to
packet drops or ECN marks. For each packet it sends, the sender packet drops or ECN marks. For each packet it sends, the sender
chooses between the two ECT codepoints in a pseudo-random sequence. chooses between the two ECT codepoints in a pseudo-random sequence.
Then, whenever the network marks a packet with CE, if the receiver Then, whenever the network marks a packet with CE, if the receiver
wants to deny congestion happened, she has to guess which ECT wants to deny congestion happened, she has to guess which ECT
codepoint was overwritten. She has only a 50:50 chance of being codepoint was overwritten. She has only a 50:50 chance of being
correct each time she denies a congestion mark or a drop, which correct each time she denies a congestion mark or a drop, which
ultimately will give her away. ultimately will give her away.
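The 50:50 argument above can be made concrete with a short
calculation. This is a sketch under the assumption that each guess is
independent and unbiased; the function names escape_probability and
simulate are illustrative only and appear in no specification.

```python
import random


def escape_probability(denials: int) -> float:
    """Chance that a receiver guesses the overwritten ECT codepoint
    correctly on every one of `denials` denied congestion events."""
    return 0.5 ** denials


def simulate(denials: int, trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    escaped = 0
    for _ in range(trials):
        # Each denial succeeds only if the guess equals the hidden nonce bit.
        if all(rng.getrandbits(1) == rng.getrandbits(1)
               for _ in range(denials)):
            escaped += 1
    return escaped / trials
```

Under these assumptions the chance of remaining undetected halves
with every denial, so it falls below one in a thousand after ten
denials.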
The purpose of a network-layer nonce has to be the protection of the The purpose of a network-layer nonce should primarily be protection
network in the first place, while a transport-layer nonce had better of the network, while a transport-layer nonce would be better used to
be used to protect the sender from cheating receivers. Now, the protect the sender from cheating receivers. Now, the assumption
assumption behind the ECN nonce is that a sender will want to detect behind the ECN nonce is that a sender will want to detect whether a
whether a receiver is suppressing congestion feedback. This is only receiver is suppressing congestion feedback. This is only true if
true if the sender's interests are aligned with the network's, or the sender's interests are aligned with the network's, or with the
with the community of users as a whole. This may be true for certain community of users as a whole. This may be true for certain large
large senders, who are under close scrutiny and have a reputation to senders, who are under close scrutiny and have a reputation to
maintain. But we have to deal with a more hostile world, where maintain. But we have to deal with a more hostile world, where
traffic may be dominated by peer-to-peer transfers, rather than traffic may be dominated by peer-to-peer transfers, rather than
downloads from a few popular sites. Often the `natural' self- downloads from a few popular sites. Often the `natural' self-
interest of a sender is not aligned with the interests of other interest of a sender is not aligned with the interests of other
users. It often wishes to transfer data quickly to the receiver as users. It often wishes to transfer data quickly to the receiver as
much as the receiver wants the data quickly. much as the receiver wants the data quickly.
In contrast, the re-ECN protocol enables policing of an agreed rate- In contrast, the re-ECN protocol enables policing of an agreed rate-
response to congestion (e.g. TCP-friendliness) at the sender's response to congestion (e.g. TCP-friendliness) at the sender's
interface with the internetwork. It also ensures downstream networks interface with the internetwork. It also ensures downstream networks
skipping to change at page 66, line 16 skipping to change at page 67, line 49
rather wastefully to encode just five states. In effect the RE flag rather wastefully to encode just five states. In effect the RE flag
has been used as an orthogonal single bit, using up four codepoints has been used as an orthogonal single bit, using up four codepoints
to encode the three states of positive, neutral and negative worth. to encode the three states of positive, neutral and negative worth.
The mapping of the codepoints in an earlier version of this proposal The mapping of the codepoints in an earlier version of this proposal
used the codepoint space more efficiently, but the scheme became used the codepoint space more efficiently, but the scheme became
vulnerable to network operators bypassing congestion penalties by vulnerable to network operators bypassing congestion penalties by
focusing congestion marking on positive packets. Appendix B explains focusing congestion marking on positive packets. Appendix B explains
why fixing that problem while allowing for incremental deployment, why fixing that problem while allowing for incremental deployment,
would have used another codepoint anyway. So it was better to use would have used another codepoint anyway. So it was better to use
this orthogonal encoding scheme, which greatly simplified the whole this orthogonal encoding scheme, which greatly simplified the whole
protocol and brought with it some subtle security benefits. protocol and brought with it some subtle security benefits (see the
last paragraph of Appendix B).
With the scheme as now proposed, once the RE flag is set or cleared With the scheme as now proposed, once the RE flag is set or cleared
by the sender or its proxy, it should not be written by the network, by the sender or its proxy, it should not be written by the network,
only read. So the gateways can detect if any network maliciously only read. So the endpoints can detect if any network maliciously
alters the RE flag. IPSec AH integrity checking does not cover the alters the RE flag. IPSec AH integrity checking does not cover the
IPv4 option flags (they were considered mutable---even the one we IPv4 option flags (they were considered mutable---even the one we
propose using for the RE flag that was `currently unused' when IPSec propose using for the RE flag that was `currently unused' when IPSec
was defined). But it would be sufficient for a pair of gateways to was defined). But it would be sufficient for a pair of endpoints to
make random checks on whether the RE flag was the same when it make random checks on whether the RE flag was the same when it
reached the egress gateway as when it left the ingress. Indeed, if reached the egress as when it left the ingress. Indeed, if IPSec AH
IPSec AH had covered the RE flag, any network intending to alter had covered the RE flag, any network intending to alter sufficient RE
sufficient RE flags to make a gain would have focused its alterations flags to make a gain would have focused its alterations on packets
on packets without authenticating headers (AHs). without authenticating headers (AHs).
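The random check described above might look like the following
sketch. It assumes that the two checking points can each record the
RE flag against some shared packet identifier and can compare their
records afterwards; how they would do so is not specified here, and
the name spot_check and its parameters are hypothetical.

```python
import random


def spot_check(ingress_flags, egress_flags, sample_rate, rng):
    """Return the ids of sampled packets whose RE flag changed in transit.

    ingress_flags and egress_flags map a packet id to the RE flag (bool)
    observed at that point. Only packets seen at both points are compared.
    """
    altered = []
    for pkt_id, flag_in in ingress_flags.items():
        if pkt_id in egress_flags and rng.random() < sample_rate:
            if egress_flags[pkt_id] != flag_in:
                altered.append(pkt_id)
    return altered
```

A low sample_rate keeps the cost of checking small. A network that
alters enough RE flags to make a gain would still be likely to hit
some sampled packet.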
The security of re-ECN has been deliberately designed to not rely on The security of re-ECN has been deliberately designed to not rely on
cryptography. cryptography.
11. IANA Considerations 11. IANA Considerations
This memo includes no request to IANA (yet). This memo includes no request to IANA (yet).
If this memo was to progress to standards track, it would list: If this memo was to progress to standards track, it would list:
skipping to change at page 68, line 42 skipping to change at page 70, line 28
Internet to Support Real-Time Content Supply from a Large Internet to Support Real-Time Content Supply from a Large
Fraction of Broadband Residential Users", BT Technology Fraction of Broadband Residential Users", BT Technology
Journal (BTTJ) 23(2), April 2005. Journal (BTTJ) 23(2), April 2005.
[Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the
assumptions underlying mechanism design for the Internet", assumptions underlying mechanism design for the Internet",
Proc. Workshop on the Economics of Networked Systems Proc. Workshop on the Economics of Networked Systems
(NetEcon06) , June 2006, <http://www.cs.duke.edu/nicl/ (NetEcon06) , June 2006, <http://www.cs.duke.edu/nicl/
netecon06/papers/ne06-assessing.pdf>. netecon06/papers/ne06-assessing.pdf>.
[CL-deploy]
Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,
Charny, A., Babiarz, J., Chan, K., Westberg, L., Bader,
A., and G. Karagiannis, "A Deployment Model for Admission
Control over DiffServ using Pre-Congestion Notification",
draft-briscoe-tsvwg-cl-architecture-03 (work in progress),
June 2006.
[CLoop_pol] [CLoop_pol]
Salvatori, A., "Closed Loop Traffic Policing", Politecnico Salvatori, A., "Closed Loop Traffic Policing", Politecnico
Torino and Institut Eurecom Masters Thesis , Torino and Institut Eurecom Masters Thesis ,
September 2005. September 2005.
[ECN-Deploy] [ECN-Deploy]
Floyd, S., "ECN (Explicit Congestion Notification) in Floyd, S., "ECN (Explicit Congestion Notification) in
TCP/IP; Implementation and Deployment of ECN", Web-page , TCP/IP; Implementation and Deployment of ECN", Web-page ,
May 2004, May 2004,
<http://www.icir.org/floyd/ecn.html#implementations>. <http://www.icir.org/floyd/ecn.html#implementations>.
[ECN-MPLS] [ECN-MPLS]
Bruce, B., Briscoe, B., and J. Tay, "Explicit Congestion Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
Marking in MPLS", draft-davie-ecn-mpls-00 (work in Marking in MPLS", draft-ietf-tsvwg-ecn-mpls-01 (work in
progress), June 2006. progress), June 2007.
[ECN-tunnel]
Briscoe, B., "Layered Encapsulation of Congestion
Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in
progress), July 2007.
[Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the
evolution of congestion control", Automatica 35(12)1969-- evolution of congestion control", Automatica 35(12)1969--
1985, December 1999, 1985, December 1999,
<http://www.statslab.cam.ac.uk/~frank/evol.html>. <http://www.statslab.cam.ac.uk/~frank/evol.html>.
[I-D.ietf-tsvwg-ecnsyn] [I-D.ietf-tcpm-ecnsyn]
Kuzmanovic, A., "Adding Explicit Congestion Notification Kuzmanovic, A., "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", (ECN) Capability to TCP's SYN/ACK Packets",
draft-ietf-tsvwg-ecnsyn-00 (work in progress), draft-ietf-tcpm-ecnsyn-01 (work in progress),
November 2005. October 2006.
[I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., "A TCP Test to Allow Senders to Identify
Receiver Non-Compliance",
draft-moncaster-tcpm-rcv-cheat-01 (work in progress),
June 2007.
[ITU-T.I.371] [ITU-T.I.371]
ITU-T, "Traffic Control and Congestion Control in ITU-T, "Traffic Control and Congestion Control in
{B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004. {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004.
[Jiang02] Jiang, H. and C. Dovrolis, "Passive Estimation of TCP [Jiang02] Jiang, H. and C. Dovrolis, "Passive Estimation of TCP
Round-Trip Times", ACM SIGCOMM Round-Trip Times", ACM SIGCOMM
CCR 32(3)75-88, July 2002, CCR 32(3)75-88, July 2002,
<http://doi.acm.org/10.1145/571697.571725>. <http://doi.acm.org/10.1145/571697.571725>.
[Mathis97] [Mathis97]
Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The
Macroscopic Behavior of the TCP Congestion Avoidance Macroscopic Behavior of the TCP Congestion Avoidance
Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997, Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997,
<http://doi.acm.org/10.1145/263932.264023>. <http://doi.acm.org/10.1145/263932.264023>.
[PCN-arch]
Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R.,
Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion
Notification Architecture",
draft-eardley-pcn-architecture-00 (work in progress),
June 2007.
[Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE: [Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE:
Predictive Active Queue Management Utilizing Congestion Predictive Active Queue Management Utilizing Congestion
Information", Proc. Local Computer Networks (LCN 2003) , Information", Proc. Local Computer Networks (LCN 2003) ,
October 2003. October 2003.
[RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell,
M., Romanow, A., Weinrib, A., and L. Zhang, "Resource M., Romanow, A., Weinrib, A., and L. Zhang, "Resource
ReSerVation Protocol (RSVP) Version 1 Applicability ReSerVation Protocol (RSVP) Version 1 Applicability
Statement Some Guidelines on Deployment", RFC 2208, Statement Some Guidelines on Deployment", RFC 2208,
September 1997. September 1997.
skipping to change at page 70, line 33 skipping to change at page 72, line 30
RFC 3514, April 2003. RFC 3514, April 2003.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces", Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, June 2003. RFC 3540, June 2003.
[RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
Control for Voice Traffic in the Internet", RFC 3714, Control for Voice Traffic in the Internet", RFC 3714,
March 2004. March 2004.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005.
[Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN
on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01
(work in progress), March 2006. (work in progress), March 2006.
[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
Salvatori, A., Soppera, A., and M. Koyabe, "Policing Salvatori, A., Soppera, A., and M. Koyabe, "Policing
Congestion Response in an Internetwork Using Re-Feedback", Congestion Response in an Internetwork Using Re-Feedback",
ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
www.acm.org/sigs/sigcomm/sigcomm2005/ www.acm.org/sigs/sigcomm/sigcomm2005/
techprog.html#session8>. techprog.html#session8>.
[Savage99]
Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
"TCP congestion control with a misbehaving receiver", ACM
SIGCOMM CCR 29(5), October 1999,
<http://citeseer.ist.psu.edu/savage99tcp.html>.
[Smart_rtg] [Smart_rtg]
Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang, Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang,
"Optimizing Cost and Performance for Multihoming", ACM "Optimizing Cost and Performance for Multihoming", ACM
SIGCOMM CCR 34(4)79--92, October 2004, SIGCOMM CCR 34(4)79--92, October 2004,
<http://citeseer.ist.psu.edu/698472.html>. <http://citeseer.ist.psu.edu/698472.html>.
[Steps_DoS] [Steps_DoS]
Handley, M. and A. Greenhalgh, "Steps towards a DoS- Handley, M. and A. Greenhalgh, "Steps towards a DoS-
resistant Internet Architecture", Proc. ACM SIGCOMM resistant Internet Architecture", Proc. ACM SIGCOMM
workshop on Future directions in network architecture workshop on Future directions in network architecture
skipping to change at page 75, line 38 skipping to change at page 77, line 43
Appendix E. Example Egress Dropper Algorithm Appendix E. Example Egress Dropper Algorithm
{ToDo: Write up the basic algorithm with flow state, then the {ToDo: Write up the basic algorithm with flow state, then the
aggregated one.} aggregated one.}
Appendix F. Re-TTL Appendix F. Re-TTL
This Appendix gives an overview of a proposal to be able to overload This Appendix gives an overview of a proposal to be able to overload
the TTL field in the IP header to monitor downstream propagation the TTL field in the IP header to monitor downstream propagation
delay. It is planned to fully write up this proposal in a future delay. This is included to show that it would be possible to take
Internet Draft. account of RTT if it was deemed desirable.
Delay re-feedback can be achieved by overloading the TTL field, Delay re-feedback can be achieved by overloading the TTL field,
without changing IP or router TTL processing. A target value for TTL without changing IP or router TTL processing. A target value for TTL
at the destination would need standardising, say 16. If the path hop at the destination would need standardising, say 16. If the path hop
count increased by more than 16 during a routing change, it would count increased by more than 16 during a routing change, it would
temporarily be mistaken for a routing loop, so this target would need temporarily be mistaken for a routing loop, so this target would need
to be chosen to exceed typical hop count increases. The TCP wire to be chosen to exceed typical hop count increases. The TCP wire
protocol and handlers would need modifying to feed back the protocol and handlers would need modifying to feed back the
destination TTL and initialise it. It would be necessary to destination TTL and initialise it. It would be necessary to
standardise the unit of TTL in terms of real time (as was the standardise the unit of TTL in terms of real time (as was the
skipping to change at page 77, line 38 skipping to change at page 79, line 43
o r = C_FNE/T_FNE o r = C_FNE/T_FNE
o b_max = b_0 o b_max = b_0
T_FNE should be a much shorter period than T_user: for instance T_FNE T_FNE should be a much shorter period than T_user: for instance T_FNE
could be in the order of minutes while T_user could be in the order of could be in the order of minutes while T_user could be in the order of
weeks. weeks.
G.2. Per-flow Rate Policing G.2. Per-flow Rate Policing
Per-flow policing aims to enforce congestion responsiveness on the Whilst we believe that simple per-user policing would be sufficient
shortest information timescale on a network path: packet roundtrips. to ensure senders comply with congestion control, some operators may
wish to police the rate response of each flow to congestion as well.
Although we do not believe this will be necessary, we include this
section to show how one could perform per-flow policing using
enforcement of TCP-fairness as an example. Per-flow policing aims to
enforce congestion responsiveness on the shortest information
timescale on a network path: packet roundtrips.
This again requires that the appropriate terms be agreed between a This again requires that the appropriate terms be agreed between a
network operator and its users, where a congestion responsiveness network operator and its users, where a congestion responsiveness
policy might be required for the use of a given network service policy might be required for the use of a given network service
(perhaps unless the user specifically requests otherwise). (perhaps unless the user specifically requests otherwise).
As an example, we describe below how a rate adaptation policer can be As an example, we describe below how a rate adaptation policer can be
designed when the applicable rate adaptation policy is TCP- designed when the applicable rate adaptation policy is TCP-
compliance. In that context, the average throughput of a flow will compliance. In that context, the average throughput of a flow will
be expected to be bounded by the value of the TCP throughput during be expected to be bounded by the value of the TCP throughput during
congestion avoidance, given n Mathis' formula [Mathis97] congestion avoidance, given in Mathis' formula [Mathis97]
x_TCP = k * s / ( T * sqrt(m) ) x_TCP = k * s / ( T * sqrt(m) )
where: where:
o x_TCP is the throughput of the TCP flow in packets per second, o x_TCP is the throughput of the TCP flow in packets per second,
o k is a constant upper-bounded by sqrt(3/2), o k is a constant upper-bounded by sqrt(3/2),
o s is the average packet size of the flow, o s is the average packet size of the flow,
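As a worked illustration of the formula above, the bound can be
computed as follows. This is a sketch only: it assumes that T is the
round-trip time in seconds and that m is the congestion (drop or
marking) fraction, and the function name tcp_rate_bound is
hypothetical. The result is expressed in units of s per second.

```python
import math

K_MAX = math.sqrt(3 / 2)  # upper bound on the constant k


def tcp_rate_bound(s, T, m, k=K_MAX):
    """Evaluate x_TCP = k * s / (T * sqrt(m)).

    s: average packet size of the flow
    T: assumed to be the round-trip time, in seconds
    m: assumed to be the congestion fraction, 0 < m <= 1
    """
    if T <= 0 or not 0 < m <= 1:
        raise ValueError("need T > 0 and 0 < m <= 1")
    return k * s / (T * math.sqrt(m))
```

Because m appears under a square root, quadrupling the congestion
fraction halves the bound, while doubling T halves it directly.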
skipping to change at page 81, line 8 skipping to change at page 83, line 20
H.2. Inflation Factor for Persistently Negative Flows H.2. Inflation Factor for Persistently Negative Flows
The following process is suggested to complement the simple algorithm The following process is suggested to complement the simple algorithm
above in order to protect against the various attacks from above in order to protect against the various attacks from
persistently negative flows described in Section 6.1.6. As explained persistently negative flows described in Section 6.1.6. As explained
in that section, the most important and first step is to estimate the in that section, the most important and first step is to estimate the
contribution of persistently negative flows to the bulk volume of contribution of persistently negative flows to the bulk volume of
downstream pre-congestion and to inflate this bulk volume as if these downstream pre-congestion and to inflate this bulk volume as if these
flows weren't there. The process below has been designed to give an flows weren't there. The process below has been designed to give an
unboased estimate, but it may be possible to define other processes unbiased estimate, but it may be possible to define other processes
that achieve similar ends. that achieve similar ends.
While the above simple metering algorithm is counting the bulk of While the above simple metering algorithm is counting the bulk of
traffic over an accounting period, the meter should also select a traffic over an accounting period, the meter should also select a
subset of the whole flow ID space that is small enough to be able to subset of the whole flow ID space that is small enough to be able to
realistically measure but large enough to give a realistic sample. realistically measure but large enough to give a realistic sample.
Many different samples of different subsets of the ID space should be Many different samples of different subsets of the ID space should be
taken at different times during the accounting period, preferably taken at different times during the accounting period, preferably
covering the whole ID space. During each sample, the meter should covering the whole ID space. During each sample, the meter should
count the volume of positive packets and subtract the volume of count the volume of positive packets and subtract the volume of
skipping to change at page 81, line 45 skipping to change at page 84, line 13
by the effect of persistently negative flows. by the effect of persistently negative flows.
Appendix I. Argument for holding back the ECN nonce Appendix I. Argument for holding back the ECN nonce
The ECN nonce is a mechanism that allows a /sending/ transport to The ECN nonce is a mechanism that allows a /sending/ transport to
detect if drop or ECN marking at a congested router has been detect if drop or ECN marking at a congested router has been
suppressed by a node somewhere in the feedback loop---another router suppressed by a node somewhere in the feedback loop---another router
or the receiver. or the receiver.
Space for the ECN nonce was set aside in [RFC3168] (currently Space for the ECN nonce was set aside in [RFC3168] (currently
proposed standard) while the full nonce mechanism is specified in RFC proposed standard) while the full nonce mechanism is specified in
3540 (currently experimental). The specification for [RFC4340] [RFC3540] (currently experimental). The specification for [RFC4340]
(currently proposed standard) requires that "Each DCCP sender SHOULD (currently proposed standard) requires that "Each DCCP sender SHOULD
set ECN Nonces on its packets...". It also mandates as a requirement set ECN Nonces on its packets...". It also mandates as a requirement
for all CCID profiles that "Any newly defined acknowledgement for all CCID profiles that "Any newly defined acknowledgement
mechanism MUST include a way to transmit ECN Nonce Echoes back to the mechanism MUST include a way to transmit ECN Nonce Echoes back to the
sender.", therefore: sender.", therefore:
o The CCID profile for TCP-like Congestion Control [RFC4341] o The CCID profile for TCP-like Congestion Control [RFC4341]
(currently proposed standard) says "The sender will use the ECN (currently proposed standard) says "The sender will use the ECN
Nonce for data packets, and the receiver will echo those nonces in Nonce for data packets, and the receiver will echo those nonces in
its Ack Vectors." its Ack Vectors."
o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342] o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342]
recommends that "The sender [use] Loss Intervals options' ECN recommends that "The sender [use] Loss Intervals options' ECN
Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to
probabilistically verify that the receiver is correctly reporting probabilistically verify that the receiver is correctly reporting
all dropped or marked packets." all dropped or marked packets."
The ECN nonce is used for three types of functions: The primary function of the ECN nonce is to protect the integrity of
the information about congestion: ECN marks and packet drops.
o if the sender wants to ensure the integrity of the information
about packet drops,
o if the sending transport chooses to act in the interests of a
congested router,
o if the sending transport wants to allocate its own resources in
proportion to the rates that each network path can sustain, based
on congestion control.
However, when the nonce is used to protect the integrity of However, when the nonce is used to protect the integrity of
information about packet drops, rather than ECN marks, a transport information about packet drops, rather than ECN marks, a transport
layer nonce will always be sufficient (because a drop loses the layer nonce will always be sufficient (because a drop loses the
transport header as well as the ECN field in the network header), transport header as well as the ECN field in the network header),
which would avoid using scarce IP header codepoint space. Similarly, which would avoid using scarce IP header codepoint space. Similarly,
a transport layer nonce would protect against a receiver sending a transport layer nonce would protect against a receiver sending
early acknowledgements. early acknowledgements [Savage99].
The other two functions need the ECN nonce to be in the network If the ECN nonce reveals integrity problems with the information
layer, but both require rather optimistic trust assumptions in order about congestion, the sending transport can use that knowledge for
to be useful. If the sending transport chooses to act in the two functions:
interests of a congested router, it can reduce its rate if it detects
some malicious party in the feedback loop may be suppressing ECN o to protect its own resources, by allocating them in proportion to
feedback. But it would only be useful to a router when /all/ senders the rates that each network path can sustain, based on congestion
using the router are trusted to act in the router's interest. control,
o and to protect congested routers in the network, by drastically
slowing down its connection to the destination with corrupt
congestion information.
If the sending transport chooses to act in the interests of congested
routers, it can reduce its rate if it detects some malicious party in
the feedback loop may be suppressing ECN feedback. But it would only
be useful to congested routers when /all/ senders using them are
trusted to act in the interest of the congested routers.
In the end, the only essential use of a network layer nonce is when In the end, the only essential use of a network layer nonce is when
sending transports (e.g. large servers) want to allocate their /own/ sending transports (e.g. large servers) want to allocate their /own/
resources in proportion to the rates that each network path can resources in proportion to the rates that each network path can
sustain, based on congestion control. In that case, the nonce allows sustain, based on congestion control. In that case, the nonce allows
senders to be assured that they aren't being duped into giving more senders to be assured that they aren't being duped into giving more
of their own resources to a particular flow. And if congestion of their own resources to a particular flow. And if congestion
suppression is detected, the sending transport can rate limit the suppression is detected, the sending transport can rate limit the
offending connection to protect its own resources. Certainly, this offending connection to protect its own resources. Certainly, this
is a useful function, but the IETF should carefully decide whether is a useful function, but the IETF should carefully decide whether
skipping to change at page 83, line 17 skipping to change at page 85, line 31
In contrast, re-ECN allows all routers to fully protect themselves In contrast, re-ECN allows all routers to fully protect themselves
from such attacks, without having to trust anyone - senders, from such attacks, without having to trust anyone - senders,
receivers, neighbouring networks. Re-ECN is therefore proposed in receivers, neighbouring networks. Re-ECN is therefore proposed in
preference to the ECN nonce on the basis that it addresses the preference to the ECN nonce on the basis that it addresses the
generic problem of accountability for congestion of a network's generic problem of accountability for congestion of a network's
resources at the IP layer. resources at the IP layer.
Delaying the ECN nonce is justified because the applicability of the Delaying the ECN nonce is justified because the applicability of the
ECN nonce seems too limited for it to consume a two-bit codepoint in ECN nonce seems too limited for it to consume a two-bit codepoint in
the IP header. the IP header. It therefore seems prudent to give time for an
alternative way to be found to do the one function the nonce is
essential for.
Moreover, while we have re-designed the re-ECN codepoints so that Moreover, while we have re-designed the re-ECN codepoints so that
they do not prevent the ECN nonce progressing, the same is not true they do not prevent the ECN nonce progressing, the same is not true
the other way round. If the ECN nonce started to see some deployment the other way round. If the ECN nonce started to see some deployment
(perhaps because it was blessed with proposed standard status), (perhaps because it was blessed with proposed standard status),
incremental deployment of re-ECN would effectively be impossible, incremental deployment of re-ECN would effectively be impossible,
because re-ECN marking fractions at inter-domain borders would be because re-ECN marking fractions at inter-domain borders would be
polluted by unknown levels of nonce traffic. polluted by unknown levels of nonce traffic.
The authors are aware that re-ECN must prove it has the potential it The authors are aware that re-ECN must prove it has the potential it
skipping to change at page 84, line 22 skipping to change at page 86, line 36
Email: arnaud.jacquet@bt.com Email: arnaud.jacquet@bt.com
URI: URI:
Alessandro Salvatori Alessandro Salvatori
BT BT
B54/77, Adastral Park B54/77, Adastral Park
Martlesham Heath Martlesham Heath
Ipswich IP5 3RE Ipswich IP5 3RE
UK UK
Email: sandr8@gmail.com Email: alessandro.salvatori@gmail.com
Martin Koyabe Martin Koyabe
BT BT
B54/69, Adastral Park PP2a Rigel House, Adastral Park
Martlesham Heath Martlesham Heath
Ipswich IP5 3RE Ipswich IP5 3RE
UK UK
Phone: +44 1473 646923 Phone: +44 1473 646923
Email: martin.koyabe@bt.com Email: martin.koyabe@bt.com
URI: URI:
Toby Moncaster
BT
B54/70, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Phone: +44 1473 648734
Email: toby.moncaster@bt.com
Full Copyright Statement Full Copyright Statement
Copyright (C) The Internet Society (2006). Copyright (C) The IETF Trust (2007).
This document is subject to the rights, licenses and restrictions This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors contained in BCP 78, and except as set forth therein, the authors
retain all their rights. retain all their rights.
This document and the information contained herein are provided on an This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property Intellectual Property
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information made any independent effort to identify any such rights. Information
skipping to change at page 85, line 45 skipping to change at page 88, line 45
such proprietary rights by implementers or users of this such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr. http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at this standard. Please address the information to the IETF at
ietf-ipr@ietf.org. ietf-ipr@ietf.org.
Acknowledgment Acknowledgments
Funding for the RFC Editor function is provided by the IETF Funding for the RFC Editor function is provided by the IETF
Administrative Support Activity (IASA). Administrative Support Activity (IASA). This document was produced
using xml2rfc v1.32 (of http://xml.resource.org/) from a source in
RFC-2629 XML format.
