draft-briscoe-tsvwg-re-ecn-tcp-02.txt   draft-briscoe-tsvwg-re-ecn-tcp-03.txt 
Transport Area Working Group B. Briscoe Transport Area Working Group B. Briscoe
Internet-Draft BT & UCL Internet-Draft BT & UCL
Expires: December 28, 2006 A. Jacquet Intended status: Informational A. Jacquet
A. Salvatori Expires: April 26, 2007 A. Salvatori
M. Koyabe M. Koyabe
BT BT
June 26, 2006 October 23, 2006
Re-ECN: Adding Accountability for Causing Congestion to TCP/IP Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
draft-briscoe-tsvwg-re-ecn-tcp-02 draft-briscoe-tsvwg-re-ecn-tcp-03
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 37 skipping to change at page 1, line 37
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on December 28, 2006. This Internet-Draft will expire on April 26, 2007.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2006). Copyright (C) The Internet Society (2006).
Abstract Abstract
This document introduces a new protocol for explicit congestion This document introduces a new protocol for explicit congestion
notification (ECN), termed re-ECN, which can be deployed notification (ECN), termed re-ECN, which can be deployed
incrementally around unmodified routers. The protocol arranges an incrementally around unmodified routers. The protocol arranges an
skipping to change at page 2, line 27 skipping to change at page 2, line 27
honestly. honestly.
Authors' Statement: Status (to be removed by the RFC Editor) Authors' Statement: Status (to be removed by the RFC Editor)
This document is posted as an Internet-Draft with the intent (at This document is posted as an Internet-Draft with the intent (at
least that of the authors) to eventually progress to standards track. least that of the authors) to eventually progress to standards track.
Although the re-ECN protocol is intended to make a simple but far- Although the re-ECN protocol is intended to make a simple but far-
reaching change to the Internet architecture, the most immediate reaching change to the Internet architecture, the most immediate
priority for the authors is to delay any move of the ECN nonce to priority for the authors is to delay any move of the ECN nonce to
Proposed Standard status. Proposed Standard status. The argument for this position is
developed in Appendix I.
The ECN nonce is an experimental RFC that allows /senders/ to check
the integrity of congestion feedback from /networks/. Therefore the
nonce only helps in scenarios where the sender is trusted to control
network congestion. On the other hand, the re-ECN protocol aims to
allow networks themselves to police cheating senders and
receivers and to police neighbouring networks. Re-ECN is therefore
proposed in preference to the ECN nonce on the basis that it
addresses the generic problem of accountability for congestion of a
network's resources at the IP layer.
Delaying the ECN nonce is justified by two factors:
o The ECN nonce would permanently consume a two-bit codepoint in
the IP header for a purpose specific to a limited trust model.
Although the nonce is a neat idea, its applicability seems too
limited to warrant space in the IP header;
o Although we have re-designed the re-ECN codepoints so that they do
not prevent the ECN nonce progressing, the same is not true the
other way round. If the ECN nonce started to see some deployment
(perhaps because it was blessed with proposed standard status),
incremental deployment of re-ECN would effectively be impossible,
because re-ECN marking fractions at inter-domain borders would be
polluted by unknown levels of nonce traffic.
The authors are aware that re-ECN must prove it has the potential it
claims if it is to displace the nonce. Therefore, every effort has
been made to complete a comprehensive specification of re-ECN so that
its potential can be assessed. We therefore seek the opinion of the
Internet community on whether the re-ECN protocol is sufficiently
useful to warrant standards action.
Changes from previous drafts (to be removed by the RFC Editor) Changes from previous drafts (to be removed by the RFC Editor)
From -00 to -01: From -00 to -01:
Encoding of re-ECN wire protocol changed for reasons given in Encoding of re-ECN wire protocol changed for reasons given in
Appendix B and consequently draft substantially re-written. Appendix B and consequently draft substantially re-written.
Substantial text added in sections on applications, incremental Substantial text added in sections on applications, incremental
deployment, architectural rationale and security considerations. deployment, architectural rationale and security considerations.
skipping to change at page 3, line 39 skipping to change at page 3, line 12
Text on (non-)issues with tunnels, encryption and link layer Text on (non-)issues with tunnels, encryption and link layer
congestion notification added (Section 5.6 & Section 5.7). congestion notification added (Section 5.6 & Section 5.7).
Section added giving evolvability arguments against encouraging Section added giving evolvability arguments against encouraging
bottleneck policing (Section 6.1.2). And text on re-ECN's bottleneck policing (Section 6.1.2). And text on re-ECN's
evolvability by design added to Section 6.1.3. evolvability by design added to Section 6.1.3.
Text on inter-domain policing (Section 6.1.6) and inter-domain Text on inter-domain policing (Section 6.1.6) and inter-domain
fail-safes (Section 6.1.7) added. fail-safes (Section 6.1.7) added.
From -02 to -03:
Started guidelines for re-ECN support in DCCP and SCTP.
Added annex on limitations of nonce mechanism.
Minor editorial changes throughout. Minor editorial changes throughout.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6
2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7
3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Background and Applicability . . . . . . . . . . . . . . . 8 3.1. Background and Applicability . . . . . . . . . . . . . . . 8
3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or 3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10 3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 10
3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12 3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 12
4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 14 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15
4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16
4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or 4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or
Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18 Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 18
4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 20
4.1.4. Extended ECN (EECN) Field Settings during Flow 4.1.4. Extended ECN (EECN) Field Settings during Flow
Start or after Idle Periods . . . . . . . . . . . . . 21 Start or after Idle Periods . . . . . . . . . . . . . 21
4.1.5. Pure ACKS, Retransmissions, Window Probes and 4.1.5. Pure ACKS, Retransmissions, Window Probes and
Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25 Partial ACKs . . . . . . . . . . . . . . . . . . . . . 25
4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 26
4.2.1. Guidelines for Adding Re-ECN to Other Transports . . . 26 4.2.1. General Guidelines for Adding Re-ECN to Other
5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 26 Transports . . . . . . . . . . . . . . . . . . . . . . 26
5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 26 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 26
4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 27
4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 27
5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 27
5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 27
5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 28
5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 29 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 30
5.4. Justification for Setting the First SYN to FNE . . . . . . 30 5.4. Justification for Setting the First SYN to FNE . . . . . . 31
5.5. Control and Management . . . . . . . . . . . . . . . . . . 31 5.5. Control and Management . . . . . . . . . . . . . . . . . . 32
5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 31 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 32
5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 32 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 33
5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 32 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 33
5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 33 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 34
6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 34 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 35
6.1. Policing Congestion Response . . . . . . . . . . . . . . . 34 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 35
6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 34 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 35
6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 35 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 36
6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 36 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 37
6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 43 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 44
6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 44 6.1.5. Rate Policing . . . . . . . . . . . . . . . . . . . . 45
6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 46 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 47
6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 50 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 51
6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 51
6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 51
6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 51 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 52
6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 52 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 53
6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 52 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 53
6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 53
6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 53
7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 53 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 54
7.1. Incremental Deployment Features . . . . . . . . . . . . . 53 7.1. Incremental Deployment Features . . . . . . . . . . . . . 54
7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 55
8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 60
9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 62 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 63
9.1. Policing Rate Response to Congestion . . . . . . . . . . . 62 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 63
9.2. Congestion Notification Integrity . . . . . . . . . . . . 63 9.2. Congestion Notification Integrity . . . . . . . . . . . . 63
9.3. Identifying Upstream and Downstream Congestion . . . . . . 64 9.3. Identifying Upstream and Downstream Congestion . . . . . . 64
10. Security Considerations . . . . . . . . . . . . . . . . . . . 64 10. Security Considerations . . . . . . . . . . . . . . . . . . . 65
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 66
12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 66 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 67
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 66 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 67
14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 66 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 67
15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 67
15.1. Normative References . . . . . . . . . . . . . . . . . . . 67 15.1. Normative References . . . . . . . . . . . . . . . . . . . 67
15.2. Informative References . . . . . . . . . . . . . . . . . . 67 15.2. Informative References . . . . . . . . . . . . . . . . . . 68
Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 70 Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 71
Appendix B. Justification for Two Codepoints Signifying Zero Appendix B. Justification for Two Codepoints Signifying Zero
Worth Packets . . . . . . . . . . . . . . . . . . . . 71 Worth Packets . . . . . . . . . . . . . . . . . . . . 72
Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 73 Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 74
Appendix D. Packet Marking During Flow Start . . . . . . . . . . 74 Appendix D. Packet Marking During Flow Start . . . . . . . . . . 75
Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 74 Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 75
Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 74 Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 75
Appendix G. Policer Designs to ensure Congestion Appendix G. Policer Designs to ensure Congestion
Responsiveness . . . . . . . . . . . . . . . . . . . 75 Responsiveness . . . . . . . . . . . . . . . . . . . 76
G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 75 G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 76
G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 76 G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 77
Appendix H. Downstream Congestion Metering Algorithms . . . . . . 79 Appendix H. Downstream Congestion Metering Algorithms . . . . . . 80
H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 79 H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 80
H.2. Inflation Factor for Persistently Negative Flows . . . . . 79 H.2. Inflation Factor for Persistently Negative Flows . . . . . 80
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 81 Appendix I. Argument for holding back the ECN nonce . . . . . . . 81
Intellectual Property and Copyright Statements . . . . . . . . . . 82 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 83
Intellectual Property and Copyright Statements . . . . . . . . . . 85
1. Introduction 1. Introduction
This document aims: This document aims:
o To provide a complete specification of the addition of the re-ECN o To provide a complete specification of the addition of the re-ECN
protocol to IP and guidelines on how to add it to transport layer protocol to IP and guidelines on how to add it to transport layer
protocols, including a complete specification of re-ECN in TCP as protocols, including a complete specification of re-ECN in TCP as
an example; an example;
skipping to change at page 7, line 38 skipping to change at page 7, line 38
(Section 5) layers, then the applications it can be put to, such as (Section 5) layers, then the applications it can be put to, such as
policing DDoS, QoS and congestion control (Section 6). Although policing DDoS, QoS and congestion control (Section 6). Although
these applications do not require standardisation themselves, they these applications do not require standardisation themselves, they
are described in a fair degree of detail in order to explain how re- are described in a fair degree of detail in order to explain how re-
ECN can be used. Given that re-ECN proposes to use the last undefined ECN can be used. Given that re-ECN proposes to use the last undefined
bit in the IPv4 header, we felt it necessary to outline the potential bit in the IPv4 header, we felt it necessary to outline the potential
that re-ECN could release in return for being given that bit. that re-ECN could release in return for being given that bit.
Deployment issues discussed throughout the document are brought Deployment issues discussed throughout the document are brought
together in Section 7, which is followed by a brief section together in Section 7, which is followed by a brief section
explaining the somewhat subtle rationale for the design, from an explaining the somewhat subtle rationale for the design from an
architectural perspective (Section 8). We end by describing related architectural perspective (Section 8). We end by describing related
work (Section 9), listing security considerations (Section 10) and work (Section 9), listing security considerations (Section 10) and
finally drawing conclusions (Section 12). finally drawing conclusions (Section 12).
2. Requirements notation 2. Requirements notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
skipping to change at page 8, line 45 skipping to change at page 8, line 45
The choice of two ECT code-points in the ECN field [RFC3168] The choice of two ECT code-points in the ECN field [RFC3168]
permitted future flexibility, optionally allowing the sender to permitted future flexibility, optionally allowing the sender to
encode the experimental ECN nonce [RFC3540] in the packet stream. encode the experimental ECN nonce [RFC3540] in the packet stream.
The nonce is designed to allow a sender to check the integrity of The nonce is designed to allow a sender to check the integrity of
congestion feedback. But Section 9.2 explains that it still gives no congestion feedback. But Section 9.2 explains that it still gives no
control over how fast the sender transmits as a result of the control over how fast the sender transmits as a result of the
feedback. On the other hand, re-ECN is designed both to ensure that feedback. On the other hand, re-ECN is designed both to ensure that
congestion is declared honestly and that the sender's rate responds congestion is declared honestly and that the sender's rate responds
appropriately. appropriately.
Re-ECN is based on a feedback arrangement called Re-ECN is based on a feedback arrangement called `re-
`re-feedback' [Re-fb]. The word is short for either receiver- feedback' [Re-fb]. The word is short for either receiver-aligned,
aligned, re-inserted or re-echoed feedback. But it actually works re-inserted or re-echoed feedback. But it actually works even when
even when no feedback is available. In fact it has been carefully no feedback is available. In fact it has been carefully designed to
designed to work for single datagram flows. Indeed, it even work for single datagram flows. Indeed, it even encourages
encourages aggregation of single packet flows by congestion control aggregation of single packet flows by congestion control proxies.
proxies. Then, even if the traffic mix of the Internet were to
become dominated by short messages, it would still be possible to Then, even if the traffic mix of the Internet were to become
control congestion effectively and efficiently. dominated by short messages, it would still be possible to control
congestion effectively and efficiently.
Changing the Internet's feedback architecture seems to imply Changing the Internet's feedback architecture seems to imply
considerable upheaval. But re-ECN can be deployed incrementally at considerable upheaval. But re-ECN can be deployed incrementally at
the transport layer around unmodified routers using existing fields the transport layer around unmodified routers using existing fields
in IP (v4 or v6). However it does also require the last undefined in IP (v4 or v6). However it does also require the last undefined
bit in the IPv4 header, which it uses in combination with the 2-bit bit in the IPv4 header, which it uses in combination with the 2-bit
ECN field to create four new codepoints. Nonetheless, changes to IP ECN field to create four new codepoints. Nonetheless, changes to IP
routers are RECOMMENDED in order to improve resilience against DoS routers are RECOMMENDED in order to improve resilience against DoS
attacks. Similarly, re-ECN works best if both the sender and attacks. Similarly, re-ECN works best if both the sender and
receiver transports are re-ECN-capable, but it can work with just receiver transports are re-ECN-capable, but it can work with just
skipping to change at page 10, line 13 skipping to change at page 10, line 13
be defined in another specification (e.g. [Re-PCN]). be defined in another specification (e.g. [Re-PCN]).
Although the RE flag is a separate, single bit field, it can be read Although the RE flag is a separate, single bit field, it can be read
as an extension to the two-bit ECN field; the three concatenated bits as an extension to the two-bit ECN field; the three concatenated bits
in what we will call the extended ECN field (EECN) making eight in what we will call the extended ECN field (EECN) making eight
codepoints. We will use the RFC3168 names of the ECN codepoints to codepoints. We will use the RFC3168 names of the ECN codepoints to
describe settings of the ECN field when the RE flag setting is "don't describe settings of the ECN field when the RE flag setting is "don't
care", but we also define the following six extended ECN codepoint care", but we also define the following six extended ECN codepoint
names for when we need to be more specific. names for when we need to be more specific.
+-------+-----------+------+--------------+-------------------------+ +-------+------------+------+--------------+------------------------+
| ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning |
| field | codepoint | flag | codepoint | | | field | codepoint | flag | codepoint | |
+-------+-----------+------+--------------+-------------------------+ +-------+------------+------+--------------+------------------------+
| 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable |
| | | | | transport | | | | | | transport |
| 00 | Not-ECT | 1 | FNE | Feedback not | | 00 | Not-ECT | 1 | FNE | Feedback not |
| | | | | established | | | | | | established |
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion |
| | | | | and RECT | | | | | | and RECT |
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | 01 | ECT(1) | 1 | RECT | Re-ECN capable |
| | | | | transport | | | | | | transport |
| 10 | ECT(0) | 0 | --- | Legacy ECN use only | | 10 | ECT(0) | 0 | --- | Legacy ECN use only |
| | | | | |
| 10 | ECT(0) | 1 | --CU-- | Currently unused | | 10 | ECT(0) | 1 | --CU-- | Currently unused |
| | | | | | | | | | | |
| 11 | CE | 0 | CE(0) | Re-Echo canceled by | | 11 | CE | 0 | CE(0) | Re-Echo canceled by |
| | | | | congestion experienced | | | | | | congestion experienced |
| 11 | CE | 1 | CE(-1) | Congestion experienced | | 11 | CE | 1 | CE(-1) | Congestion experienced |
+-------+-----------+------+--------------+-------------------------+ +-------+------------+------+--------------+------------------------+
Table 1: Extended ECN Codepoints Table 1: Extended ECN Codepoints
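As an informal illustration only (not part of either draft version), the codepoint names in Table 1 could be looked up from the two-bit ECN field and the RE flag roughly as follows. The names EECN_NAMES and eecn_name are hypothetical, chosen for this sketch.

```python
# Illustrative sketch: Table 1 expressed as a lookup keyed by
# (ECN field, RE flag). EECN_NAMES and eecn_name are hypothetical names.
EECN_NAMES = {
    (0b00, 0): "Not-RECT",  # Not re-ECN-capable transport
    (0b00, 1): "FNE",       # Feedback not established
    (0b01, 0): "Re-Echo",   # Re-echoed congestion and RECT
    (0b01, 1): "RECT",      # Re-ECN capable transport
    (0b10, 0): "---",       # Legacy ECN use only
    (0b10, 1): "--CU--",    # Currently unused
    (0b11, 0): "CE(0)",     # Re-Echo canceled by congestion experienced
    (0b11, 1): "CE(-1)",    # Congestion experienced
}


def eecn_name(ecn_field: int, re_flag: int) -> str:
    """Return the extended ECN codepoint name for an ECN field and RE flag."""
    if ecn_field not in range(4) or re_flag not in (0, 1):
        raise ValueError("ECN field must be 0..3 and RE flag must be 0 or 1")
    return EECN_NAMES[(ecn_field, re_flag)]
```

For example, eecn_name(0b01, 1) gives "RECT", matching the ECT(1) row with the RE flag set.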
3.3. Re-ECN Protocol Operation 3.3. Re-ECN Protocol Operation
In this section we will give an overview of the operation of the re- In this section we will give an overview of the operation of the re-
ECN protocol for TCP/IP, leaving a detailed specification to the ECN protocol for TCP/IP, leaving a detailed specification to the
following sections. Other transports will be discussed later. following sections. Other transports will be discussed later.
In summary, the protocol adds a third `re-echo' stage to the existing In summary, the protocol adds a third `re-echo' stage to the existing
skipping to change at page 12, line 44 skipping to change at page 12, line 44
of a negative metric arises because it is derived by subtracting one of a negative metric arises because it is derived by subtracting one
metric from another. Of course actual downstream congestion cannot metric from another. Of course actual downstream congestion cannot
be negative, only the metric can (whether due to time lags or be negative, only the metric can (whether due to time lags or
deliberate malice). deliberate malice).
Just as we will loosely talk of positive and negative flows, we will Just as we will loosely talk of positive and negative flows, we will
also talk of positive or negative packets, meaning packets that also talk of positive or negative packets, meaning packets that
contribute positively or negatively to the downstream congestion contribute positively or negatively to the downstream congestion
metric. metric.
Therefore packets we will talk of packets having `worth' of +1, 0 or Therefore we will talk of packets having `worth' of +1, 0 or -1,
-1, which, when multiplied by their size, indicates their which, when multiplied by their size, indicates their contribution to
contribution to the downstream congestion metric. the downstream congestion metric.
Figure 2 shows the main state transitions of the system once a flow Figure 2 shows the main state transitions of the system once a flow
is established, showing the worth of packets in each state. When the is established, showing the worth of packets in each state. When the
network congestion marks a packet it decrements its worth (moving network congestion marks a packet it decrements its worth (moving
from the left of the main square to the right). When the sender from the left of the main square to the right). When the sender
blanks the RE flag in order to re-echo congestion it increments the blanks the RE flag in order to re-echo congestion it increments the
worth of a packet (moving from the bottom of the main square to the worth of a packet (moving from the bottom of the main square to the
top). top).
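The two transitions just described can be sketched informally as below. This is an illustration under stated assumptions, not the normative behaviour: it covers only the four codepoints whose worth is stated in this overview (Re-Echo, RECT, CE(0), CE(-1)), and the function names worth, congestion_mark and re_echo are hypothetical.

```python
# Illustrative sketch of packet worth for the four codepoints used once a
# flow is established. All names here are hypothetical.
ECT1 = 0b01  # ECN field value ECT(1)
CE = 0b11    # ECN field value CE


def worth(ecn_field: int, re_flag: int) -> int:
    """Worth of a packet: +1 Re-Echo, 0 RECT, 0 CE(0), -1 CE(-1)."""
    if ecn_field not in (ECT1, CE) or re_flag not in (0, 1):
        raise ValueError("worth is only sketched here for ECT(1) and CE")
    blanked = 1 - re_flag             # blanked RE flag adds +1
    marked = 1 if ecn_field == CE else 0  # congestion mark subtracts 1
    return blanked - marked


def congestion_mark(ecn_field: int, re_flag: int) -> tuple:
    """Network marks congestion: ECN field becomes CE, RE flag untouched."""
    return (CE, re_flag)


def re_echo(ecn_field: int, re_flag: int) -> tuple:
    """Sender re-echoes congestion by blanking the RE flag."""
    return (ecn_field, 0)
```

Starting from RECT (worth 0), congestion_mark yields CE(-1) with worth -1, re_echo yields Re-Echo with worth +1, and applying both yields CE(0) with worth 0.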
Sender state Sent Worth Received Worth Sender state Sent Worth Received Worth
skipping to change at page 13, line 33 skipping to change at page 13, line 33
Figure 2: Re-ECN System State Diagram (bootstrap not shown) Figure 2: Re-ECN System State Diagram (bootstrap not shown)
The idea is that every time the network decrements the worth of a The idea is that every time the network decrements the worth of a
packet, the sender increments the worth of a later packet. Then, packet, the sender increments the worth of a later packet. Then,
over time, as many positive octets should arrive at the receiver as over time, as many positive octets should arrive at the receiver as
negative. Note we have said octets not packets, so if packets are of negative. Note we have said octets not packets, so if packets are of
different sizes, the worth should be incremented on enough octets to different sizes, the worth should be incremented on enough octets to
balance the octets in negative packets arriving at the receiver. It balance the octets in negative packets arriving at the receiver. It
is this balance that will allow the network to hold the sender is this balance that will allow the network to hold the sender
accountable for the congestion it causes, as we shall see. the accountable for the congestion it causes, as we shall see. The
informal outline below uses TCP as an example transport, but the idea informal outline below uses TCP as an example transport, but the idea
would be broadly similar for any transport that adapts its rate to would be broadly similar for any transport that adapts its rate to
congestion. congestion.
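As a rough worked example of the octet balance described above (an assumption-laden sketch, with hypothetical names and made-up packet sizes), the downstream congestion metric over a set of packets can be computed as the sum of each packet's worth multiplied by its size:

```python
# Illustrative sketch: balance of positive and negative octets.
# Each packet is a (worth, size_in_octets) pair; the sizes are invented.
def downstream_congestion_octets(packets) -> int:
    """Sum of worth * size over all packets (may be negative)."""
    return sum(w * size for w, size in packets)


# One 1500-octet packet was congestion marked by the network (worth -1).
# The sender later re-echoes on a 1000-octet and a 500-octet packet
# (worth +1 each), so positive octets balance negative octets.
arrivals = [
    (0, 1500),   # neutral packet
    (-1, 1500),  # congestion marked packet
    (+1, 1000),  # re-echo
    (+1, 500),   # re-echo
]
balance = downstream_congestion_octets(arrivals)
```

Here balance is 0. Had the sender re-echoed on only the 1000-octet packet, the result would be -500 octets, because the balance is counted in octets rather than packets.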
We will start with the sender in `flow established' state, Normally We will start with the sender in `flow established' state. Normally,
as acknowledgements of earlier packets arrive that don't feed back any as acknowledgements of earlier packets arrive that don't feed back any
congestion, the congestion window can be opened, so the sender goes congestion, the congestion window can be opened, so the sender goes
round the smaller sub-loop, sending RECT packets (worth 0) and round the smaller sub-loop, sending RECT packets (worth 0) and
returning to the flow established state to send another one. If a returning to the flow established state to send another one. If a
router congestion marks one of the packets, it decrements the router congestion marks one of the packets, it decrements the
packet's worth. The sender will have been continuing to traverse packet's worth. The sender will have been continuing to traverse
round the smaller feedback loop every time acknowledgements arrive. round the smaller feedback loop every time acknowledgements arrive.
But when congestion feedback returns from this packet that was marked But when congestion feedback returns from this packet that was marked
with -1 worth (the largest loop in the figure) the sender jumps to with -1 worth (the largest loop in the figure) the sender jumps to
the congestion echoed state in order to re-echo the congestion, the congestion echoed state in order to re-echo the congestion,
skipping to change at page 14, line 16 skipping to change at page 14, line 16
the same end to end feedback loop. the same end to end feedback loop.
If a packet carrying re-echoed congestion happens to also be If a packet carrying re-echoed congestion happens to also be
congestion marked, the +1 worth added by the sender will be cancelled congestion marked, the +1 worth added by the sender will be cancelled
out by the -1 network congestion marking. Although the two worth out by the -1 network congestion marking. Although the two worth
values correctly cancel out, neither the congestion marking nor the values correctly cancel out, neither the congestion marking nor the
re-echoed congestion are lost, because the RE bit and the ECN field re-echoed congestion are lost, because the RE bit and the ECN field
are orthogonal. So, whenever this happens, the receiver will are orthogonal. So, whenever this happens, the receiver will
correctly detect and re-echo the new congestion event as well (the correctly detect and re-echo the new congestion event as well (the
top sub-loop). When we need to distinguish, we will sometimes call a top sub-loop). When we need to distinguish, we will sometimes call a
packet marked RECT neutral (0 worth), while we will call the CE(0) packet marked RECT 'neutral' (0 worth), while we will call the CE(0)
marking canceled (also 0 worth). If a re-echoed packet isn't unlucky marking 'canceled' (also 0 worth). If a re-echoed packet isn't
enough to be further congestion marked, the sender will return to the unlucky enough to be further congestion marked, the sender will
flow established state and continue to send RECT packets (worth 0). return to the flow established state and continue to send RECT
packets (worth 0).
The table below specifies unambiguously the worth of each extended The table below specifies unambiguously the worth of each extended
ECN codepoint. Note the order is different from the previous table ECN codepoint. Note the order is different from the previous table
to better show how the worth increments and decrements. The FNE to better show how the worth increments and decrements. The FNE
codepoint is an exception. It is used in the flow bootstrap process codepoint is an exception. It is used in the flow bootstrap process
(explained later) and has the same positive (+1) worth as a packet (explained later) and has the same positive (+1) worth as a packet
with the Re-Echo codepoint. with the Re-Echo codepoint.
+-------+-----+----------------+-------+----------------------------+ +--------+------+----------------+-------+--------------------------+
| ECN | RE | Extended ECN | Worth | Re-ECN meaning | | ECN | RE | Extended ECN | Worth | Re-ECN meaning |
| field | bit | codepoint | | | | field | bit | codepoint | | |
+-------+-----+----------------+-------+----------------------------+ +--------+------+----------------+-------+--------------------------+
| 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | 00 | 0 | Not-RECT | ... | Not re-ECN-capable |
| | | | | transport | | | | | | transport |
| 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and |
| | | | | RECT | | | | | | RECT |
| 10 | 0 | --- | ... | Legacy ECN use only | | 10 | 0 | --- | ... | Legacy ECN use only |
| 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | 11 | 0 | CE(0) | 0 | Re-Echo canceled by |
| | | | | congestion experienced | | | | | | congestion experienced |
| 00 | 1 | FNE | +1 | Feedback not established | | 00 | 1 | FNE | +1 | Feedback not established |
| 01 | 1 | RECT | 0 | Re-ECN capable transport | | 01 | 1 | RECT | 0 | Re-ECN capable transport |
| 10 | 1 | --CU-- | ... | Currently unused | | 10 | 1 | --CU-- | ... | Currently unused |
| | | | | | | | | | | |
| 11 | 1 | CE(-1) | -1 | Congestion experienced | | 11 | 1 | CE(-1) | -1 | Congestion experienced |
+-------+-----+----------------+-------+----------------------------+ +--------+------+----------------+-------+--------------------------+
Table 3: 'Worth' of Extended ECN Codepoints Table 3: 'Worth' of Extended ECN Codepoints
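The worth values in Table 3 can be restated as a small lookup, sketched below in Python. This is an illustration only, not part of the protocol specification: the names EECN, worth, congestion_mark and octet_balance are hypothetical, and codepoints shown with "..." in the table are given no worth (None). The sketch assumes a congestion-marking router sets the ECN field to 11 and leaves the RE bit untouched, which is what makes the +1 of Re-Echo and the -1 of a congestion mark cancel to CE(0).

```python
# (ECN field, RE bit) -> (extended ECN codepoint, worth), transcribed from Table 3.
# None stands for the "..." entries, i.e. no worth is defined.
EECN = {
    (0b00, 0): ("Not-RECT", None),
    (0b01, 0): ("Re-Echo", +1),
    (0b10, 0): ("---", None),
    (0b11, 0): ("CE(0)", 0),
    (0b00, 1): ("FNE", +1),
    (0b01, 1): ("RECT", 0),
    (0b10, 1): ("--CU--", None),
    (0b11, 1): ("CE(-1)", -1),
}


def worth(ecn, re_bit):
    """Return the worth of a codepoint, or None if no worth is defined."""
    return EECN[(ecn, re_bit)][1]


def congestion_mark(ecn, re_bit):
    """Model a router marking a RECT or Re-Echo packet (assumed behaviour):
    the ECN field becomes 11 (CE) and the RE bit is left unchanged."""
    return (0b11, re_bit)


def octet_balance(packets):
    """Sum worth * size over (ecn, re_bit, size_in_octets) tuples.
    Packets without a defined worth are ignored."""
    total = 0
    for ecn, re_bit, size in packets:
        w = worth(ecn, re_bit)
        if w is not None:
            total += w * size
    return total
```

For instance, a flow that delivers one neutral RECT packet, one CE(-1) packet and one Re-Echo packet of equal size has an octet balance of zero, which is the balance the sender is expected to maintain.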
4. Transport Layers 4. Transport Layers
4.1. TCP 4.1. TCP
Re-ECN capability at the sender is essential. At the receiver it is Re-ECN capability at the sender is essential. At the receiver it is
optional, as long as the receiver has a basic (`vanilla flavour') optional, as long as the receiver has a basic (`vanilla flavour')
RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re- RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re-
ECN is not the first attempt to define the semantics of the ECN ECN is not the first attempt to define the semantics of the ECN
field, we give a table below summarising what happens for various field, we give a table below summarising what happens for various
combinations of capabilities of the sender S and receiver R, as combinations of capabilities of the sender S and receiver R, as
indicated in the first four columns below. The last column gives the indicated in the first four columns below. The last column gives the
mode a half-connection should be in after the first two segments of mode a half-connection should be in after the first two segments of
TCP's three-way handshake. TCP's three-way handshake.
+--------+---------------+-----------+---------+--------------------+ +--------+--------------+------------+---------+--------------------+
| Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R |
| | (RFC3540) | (RFC3168) | | Half-connection | | | (RFC3540) | (RFC3168) | | Half-connection |
| | | | | Mode | | | | | | Mode |
+--------+---------------+-----------+---------+--------------------+ +--------+--------------+------------+---------+--------------------+
| SR | | | | RECN | | SR | | | | RECN |
| S | R | | | RECN-Co | | S | R | | | RECN-Co |
| S | | R | | RECN-Co | | S | | R | | RECN-Co |
| S | | | R | Not-ECT | | S | | | R | Not-ECT |
+--------+---------------+-----------+---------+--------------------+ +--------+--------------+------------+---------+--------------------+
Table 4: Modes of TCP Half-connection for Combinations of ECN Table 4: Modes of TCP Half-connection for Combinations of ECN
Capabilities of Sender S and Receiver R Capabilities of Sender S and Receiver R
We will describe what happens in each mode, then describe how they We will describe what happens in each mode, then describe how they
are negotiated. The abbreviations for the modes in the above table are negotiated. The abbreviations for the modes in the above table
mean: mean:
RECN: Full re-ECN capable transport RECN: Full re-ECN capable transport
RECN-Co: Re-ECN sender in compatibility mode with a vanilla [RFC3168] RECN-Co: Re-ECN sender in compatibility mode with a
ECN receiver or an [RFC3540] ECN nonce-capable receiver. vanilla [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable
Implementation of this mode is OPTIONAL. receiver. Implementation of this mode is OPTIONAL.
Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when
at least one of the transports does not understand even basic ECN at least one of the transports does not understand even basic ECN
marking. marking.
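As an informal illustration, the mapping in Table 4 could be sketched as the Python function below. The function name and the string constants are hypothetical. The table does not say what a Re-ECT sender does if it has not implemented the OPTIONAL RECN-Co mode, so the fall-back to Not-ECT in that case is an assumption of this sketch.

```python
# Receiver capabilities, named after the column headings of Table 4.
RE_ECT, ECT_NONCE, ECT, NOT_ECT = "Re-ECT", "ECT-Nonce", "ECT", "Not-ECT"


def half_connection_mode(receiver, sender_has_compat_mode=True):
    """Return the S-R half-connection mode for a Re-ECT sender S,
    given the capability detected for receiver R."""
    if receiver == RE_ECT:
        return "RECN"
    if receiver in (ECT_NONCE, ECT):
        # RECN-Co is OPTIONAL to implement. Falling back to Not-ECT
        # when it is absent is an assumption, not taken from the table.
        return "RECN-Co" if sender_has_compat_mode else "Not-ECT"
    if receiver == NOT_ECT:
        return "Not-ECT"
    raise ValueError("unknown receiver capability: %r" % (receiver,))
```

A sender would evaluate this once it has learned the capability of the receiver, and only then treat the half-connection as being in RECN mode.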
Note that we use the term Re-ECT for a host transport that is re-ECN- Note that we use the term Re-ECT for a host transport that is re-ECN-
capable but RECN for the modes of the half connections between hosts capable but RECN for the modes of the half connections between hosts
when they are both Re-ECT. If a host transport is Re-ECT, this fact when they are both Re-ECT. If a host transport is Re-ECT, this fact
alone does NOT imply either of its half connections will necessarily alone does NOT imply either of its half connections will necessarily
be in RECN mode, at least not until it has confirmed that the other be in RECN mode, at least not until it has confirmed that the other
skipping to change at page 23, line 5 skipping to change at page 23, line 5
RECN mode: Given the constraints on TCP's initial window [RFC3390] RECN mode: Given the constraints on TCP's initial window [RFC3390]
and its exponential window increase during slow start and its exponential window increase during slow start
phase [RFC2581], it turns out that the sender SHOULD set FNE on phase [RFC2581], it turns out that the sender SHOULD set FNE on
the first and third data packets in its flow, assuming equal sized the first and third data packets in its flow, assuming equal sized
data packets once a flow is established. Appendix D presents the data packets once a flow is established. Appendix D presents the
calculation that led to this conclusion. Below, after running calculation that led to this conclusion. Below, after running
through the start of an example TCP session, we give the intuition through the start of an example TCP session, we give the intuition
learned from that calculation. learned from that calculation.
RECN-Co mode: A re-ECT sender that switches into re-ECN compatibility RECN-Co mode: A re-ECT sender that switches into re-ECN
mode or into Not-ECT mode (because it has detected the compatibility mode or into Not-ECT mode (because it has detected
corresponding host is not re-ECN capable) MUST limit its initial the corresponding host is not re-ECN capable) MUST limit its
window to 1 segment. The reasoning behind this constraint is initial window to 1 segment. The reasoning behind this constraint
given in Section 5.4. Having set this initial window, a re-ECN is given in Section 5.4. Having set this initial window, a re-ECN
sender in RECN-Co mode SHOULD set FNE on the first and third data sender in RECN-Co mode SHOULD set FNE on the first and third data
packets in a flow, as for RECN mode. packets in a flow, as for RECN mode.
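The start-up rules above can be summarised in a short Python sketch. The names startup_plan and sets_fne are hypothetical. A value of None for the initial window means this rule adds no further limit, so the [RFC3390] constraint applies unchanged. The text does not say whether FNE is set in Not-ECT mode; this sketch assumes it is not.

```python
def startup_plan(mode):
    """Return the initial-window limit (in segments, None = no extra limit)
    and the 1-based data packet numbers on which FNE SHOULD be set."""
    if mode == "RECN":
        return {"initial_window_segments": None, "fne_data_packets": (1, 3)}
    if mode == "RECN-Co":
        return {"initial_window_segments": 1, "fne_data_packets": (1, 3)}
    if mode == "Not-ECT":
        # Assumption: no FNE is set once the sender has fallen back to Not-ECT.
        return {"initial_window_segments": 1, "fne_data_packets": ()}
    raise ValueError("unknown mode: %r" % (mode,))


def sets_fne(mode, data_packet_number):
    """True if the given data packet (counted from 1) carries FNE."""
    return data_packet_number in startup_plan(mode)["fne_data_packets"]
```

With this sketch, sets_fne("RECN", 1) and sets_fne("RECN", 3) are true, while the second and fourth data packets carry no FNE.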
+----+------+----------------+-------+-------+---------------+------+ +----+------+----------------+-------+-------+---------------+------+
| | Data | TCP A(Re-ECT) | IP A | IP B | TCP B(Re-ECT) | Data | | | Data | TCP A(Re-ECT) | IP A | IP B | TCP B(Re-ECT) | Data |
+----+------+----------------+-------+-------+---------------+------+ +----+------+----------------+-------+-------+---------------+------+
| | Byte | SEQ ACK CTL | EECN | EECN | SEQ ACK CTL | Byte | | | Byte | SEQ ACK CTL | EECN | EECN | SEQ ACK CTL | Byte |
| -- | ---- | ------------- | ----- | ----- | ------------- | ---- | | -- | ---- | ------------- | ----- | ----- | ------------- | ---- |
| 1 | | 0100 SYN | FNE | --> | R.ECC=0 | | | 1 | | 0100 SYN | FNE | --> | R.ECC=0 | |
| | | CWR,ECE,NS | | | | | | | | CWR,ECE,NS | | | | |
skipping to change at page 26, line 7 skipping to change at page 26, line 7
This does not ensure precisely the same number of octets have RE This does not ensure precisely the same number of octets have RE
blanked as were CE marked. But we believe positive errors will blanked as were CE marked. But we believe positive errors will
cancel negative over a long enough period. {ToDo: However, more cancel negative over a long enough period. {ToDo: However, more
research is needed to prove whether this is so. If it is not, it may research is needed to prove whether this is so. If it is not, it may
be necessary to increment and decrement R in octets rather than be necessary to increment and decrement R in octets rather than
packets, by incrementing R as the product of D and the size in octets packets, by incrementing R as the product of D and the size in octets
of packets being sent (typically the MSS).} of packets being sent (typically the MSS).}
4.2. Other Transports 4.2. Other Transports
4.2.1. Guidelines for Adding Re-ECN to Other Transports 4.2.1. General Guidelines for Adding Re-ECN to Other Transports
Re-ECT sender transports that have established the receiver transport Re-ECT sender transports that have established the receiver transport
is at least ECN-capable (not necessarily re-ECN capable) MUST blank is at least ECN-capable (not necessarily re-ECN capable) MUST blank
the RE codepoint in packets carrying at least as many octets as the RE codepoint in packets carrying at least as many octets as
arrive at the receiver with the CE codepoint set. Re-ECN-capable sender arrive at the receiver with the CE codepoint set. Re-ECN-capable sender
transports should always initialise the ECN field to the ECT(1) transports should always initialise the ECN field to the ECT(1)
codepoint once a flow is established. codepoint once a flow is established.
If the sender transport does not have sufficient feedback to even If the sender transport does not have sufficient feedback to even
estimate the path's CE rate, it SHOULD set FNE continuously. If the estimate the path's CE rate, it SHOULD set FNE continuously. If the
sender transport has some, perhaps stale, feedback to estimate that sender transport has some, perhaps stale, feedback to estimate that
the path's CE rate is nearly definitely less than E%, the transport the path's CE rate is nearly definitely less than E%, the transport
MAY blank RE in packets for E% of sent octets, and set the RECT MAY blank RE in packets for E% of sent octets, and set the RECT
codepoint for the remainder. codepoint for the remainder.
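One possible way to spread RE-blanked packets over E% of sent octets is an octet credit counter, sketched below in Python. This is only an illustration under stated assumptions: the class name ReEchoPacer, the integer E and the credit scheme are not taken from this document. Passing None for E models the case where no feedback is available, in which FNE is set continuously.

```python
class ReEchoPacer:
    """Choose a codepoint per packet so that about E% of sent octets
    are carried in packets with RE blanked (the Re-Echo codepoint)."""

    def __init__(self, e_percent):
        if e_percent is not None and not 0 <= e_percent <= 100:
            raise ValueError("e_percent must be None or within 0..100")
        self.e_percent = e_percent
        self.owed_x100 = 0  # octets still owed as Re-Echo, scaled by 100

    def next_codepoint(self, size):
        """Return the codepoint for the next packet of `size` octets."""
        if self.e_percent is None:
            return "FNE"  # no estimate of the path's CE rate yet
        # Integer arithmetic avoids floating point drift in the credit.
        self.owed_x100 += self.e_percent * size
        if self.owed_x100 >= 100 * size:
            self.owed_x100 -= 100 * size
            return "Re-Echo"
        return "RECT"
```

With E = 3 and equal sized packets, this sketch blanks RE on every 33rd or 34th packet, so that 3 packets in each 100 carry the Re-Echo codepoint.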
The following sections give guidelines on how re-ECN support could be
added to RSVP or NSIS, to DCCP, and to SCTP - although separate
Internet drafts will be necessary to document the exact mechanics of
re-ECN in each of these protocols.
{ToDo: Give a brief outline of what would be expected for each of the {ToDo: Give a brief outline of what would be expected for each of the
following: following:
o UDP fire and forget (e.g. DNS) o UDP fire and forget (e.g. DNS)
o UDP streaming with no feedback o UDP streaming with no feedback
o UDP streaming with feedback o UDP streaming with feedback
o DCCP [RFC4340] } }
o RSVP and/or NSIS: A separate I-D has been submitted [Re-PCN] 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS
describing how re-ECN can be used in an edge-to-edge rather than
end-to-end scenario. It can then be used by downstream networks A separate I-D has been submitted [Re-PCN] describing how re-ECN can
to police whether upstream networks are blocking new flow be used in an edge-to-edge rather than end-to-end scenario. It can
reservations when downstream congestion is too high, even though then be used by downstream networks to police whether upstream
the congestion is in other operators' downstream networks. This networks are blocking new flow reservations when downstream
relates to current work in progress on Admission Control over congestion is too high, even though the congestion is in other
Diffserv using Pre-Congestion Notification, being reported to the operators' downstream networks. This relates to current work in
IETF TSVWG [CL-deploy]. progress on Admission Control over Diffserv using Pre-Congestion
Notification, being reported to the IETF TSVWG [CL-deploy].
4.2.3. Guidelines for adding Re-ECN to DCCP
Besides adjusting the initial feature negotiation sequence, re-ECN
could be operated in DCCP by defining a new option for
acknowledgements that includes a multibit field into which the
destination copies its ECC.
4.2.4. Guidelines for adding Re-ECN to SCTP
Annex 1 in RFC4340 gives the specifications for SCTP to support ECN.
Similar steps should be taken to support re-ECN. Besides adjusting
the initial feature negotiation sequence, re-ECN could be operated in
SCTP by defining a new control chunk that includes a multibit field
into which the destination copies its ECC.
5. Network Layer 5. Network Layer
5.1. Re-ECN IPv4 Wire Protocol 5.1. Re-ECN IPv4 Wire Protocol
The wire protocol of the ECN field in the IP header remains largely The wire protocol of the ECN field in the IP header remains largely
unchanged from [RFC3168]. However, an extension to the ECN field we unchanged from [RFC3168]. However, an extension to the ECN field we
call the RE (re-ECN extension) flag (Section 3.2) is defined in this call the RE (re-ECN extension) flag (Section 3.2) is defined in this
document. It doubles the extended ECN codepoint space, giving 8 document. It doubles the extended ECN codepoint space, giving 8
potential codepoints. The semantics of the extra codepoints are potential codepoints. The semantics of the extra codepoints are
skipping to change at page 29, line 26 skipping to change at page 30, line 9
field which we would expect to change en route. As the RE flag does field which we would expect to change en route. As the RE flag does
not need end-to-end authentication, we set the C flag to '1'. not need end-to-end authentication, we set the C flag to '1'.
{ToDo: A Congestion Hop by Hop Option ID will need to be registered {ToDo: A Congestion Hop by Hop Option ID will need to be registered
with IANA.} with IANA.}
5.3. Router Forwarding Behaviour 5.3. Router Forwarding Behaviour
Re-ECN works well without modifying the forwarding behaviour of any Re-ECN works well without modifying the forwarding behaviour of any
routers. However, below, two OPTIONAL changes to forwarding routers. However, below, two OPTIONAL changes to forwarding
behaviour are defined, which respectively enhance performance and behaviour are defined which respectively enhance performance and
improve a router's discrimination against flooding attacks. They are improve a router's discrimination against flooding attacks. They are
both OPTIONAL additions that we propose MAY apply by default to all both OPTIONAL additions that we propose MAY apply by default to all
Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN
marking behaviours [RFC3168]. Specifications for PHBs MAY define marking behaviours [RFC3168]. Specifications for PHBs MAY define
different forwarding behaviours from this default, but this is NOT different forwarding behaviours from this default, but this is NOT
REQUIRED. [Re-PCN] is one example. REQUIRED. [Re-PCN] is one example.
FNE indicates ECT: FNE indicates ECT:
The FNE codepoint tells a router to assume that the packet was The FNE codepoint tells a router to assume that the packet was
skipping to change at page 30, line 12 skipping to change at page 31, line 5
it MAY preferentially drop packets within the same Diffserv PHB it MAY preferentially drop packets within the same Diffserv PHB
using the preference order for extended ECN codepoints given in using the preference order for extended ECN codepoints given in
Table 7. Preferential dropping can be difficult to implement on Table 7. Preferential dropping can be difficult to implement on
some hardware, but if feasible it would discriminate against some hardware, but if feasible it would discriminate against
attack traffic if done as part of the overall policing framework attack traffic if done as part of the overall policing framework
of Section 6.1.3. If nowhere else, routers at the egress of a of Section 6.1.3. If nowhere else, routers at the egress of a
network SHOULD implement preferential drop (stronger than the MAY network SHOULD implement preferential drop (stronger than the MAY
above). For simplicity, preferences 4 & 5 MAY be merged into one above). For simplicity, preferences 4 & 5 MAY be merged into one
preference level. preference level.
+-------+-----+-----------+-------+------------+--------------------+ +-------+-----+------------+-------+------------+-------------------+
| ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning |
| field | bit | ECN | | (1 = drop | | | field | bit | ECN | | (1 = drop | |
| | | codepoint | | 1st) | | | | | codepoint | | 1st) | |
+-------+-----+-----------+-------+------------+--------------------+ +-------+-----+------------+-------+------------+-------------------+
| 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed |
| | | | | | congestion and | | | | | | | congestion and |
| | | | | | RECT | | | | | | | RECT |
| 00 | 1 | FNE | +1 | 4 | Feedback not | | 00 | 1 | FNE | +1 | 4 | Feedback not |
| | | | | | established | | | | | | | established |
| 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled |
| | | | | | by congestion | | | | | | | by congestion |
| | | | | | experienced | | | | | | | experienced |
| 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | 01 | 1 | RECT | 0 | 3 | Re-ECN capable |
| | | | | | transport | | | | | | | transport |
| 11 | 1 | CE(-1) | -1 | 3 | Congestion | | 11 | 1 | CE(-1) | -1 | 3 | Congestion |
| | | | | | experienced | | | | | | | experienced |
| 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | 10 | 1 | --CU-- | n/a | 2 | Currently Unused |
| 10 | 0 | --- | n/a | 2 | Legacy ECN use | | 10 | 0 | --- | n/a | 2 | Legacy ECN use |
| | | | | | only | | | | | | | only |
| 00 | 0 | Not-RECT | n/a | 1 | Not re-ECN-capable | | 00 | 0 | Not-RECT | n/a | 1 | Not |
| | | | | | re-ECN-capable |
| | | | | | transport | | | | | | | transport |
+-------+-----+-----------+-------+------------+--------------------+ +-------+-----+------------+-------+------------+-------------------+
Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth')
The above drop preferences are arranged to preserve packets with The above drop preferences are arranged to preserve packets with
more positive worth (Section 3.4), given senders of positive more positive worth (Section 3.4), given senders of positive
packets must have honestly declared downstream congestion. This packets must have honestly declared downstream congestion. This
is explained fully in Section 6 on applications, particularly when is explained fully in Section 6 on applications, particularly when
the application of re-ECN to protect against DDoS attacks is the application of re-ECN to protect against DDoS attacks is
described. described.
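The drop preferences of Table 7 could be applied as in the following Python sketch. It is illustrative only: the names DROP_PREF, drop_preference and pick_victim are hypothetical, Re-Echo is given preference 5 in the unmerged case, and breaking ties by queue position is an assumption that the table does not specify.

```python
# Drop preference per extended ECN codepoint (1 = drop first), from Table 7.
DROP_PREF = {
    "Re-Echo": 5,
    "FNE": 4,
    "CE(0)": 3,
    "RECT": 3,
    "CE(-1)": 3,
    "--CU--": 2,
    "---": 2,
    "Not-RECT": 1,
}


def drop_preference(codepoint, merge_4_and_5=False):
    """Return the drop preference, optionally merging levels 4 and 5."""
    pref = DROP_PREF[codepoint]
    if merge_4_and_5 and pref == 5:
        return 4
    return pref


def pick_victim(queue, merge_4_and_5=False):
    """Return the index of the packet to drop from a non-empty list of
    codepoints within one Diffserv PHB: the lowest preference number,
    with ties going to the earliest queue position (an assumption)."""
    return min(range(len(queue)),
               key=lambda i: drop_preference(queue[i], merge_4_and_5))
```

For a queue holding RECT, Re-Echo, Not-RECT and FNE packets in that order, the sketch selects the Not-RECT packet, so positive packets are preserved longest.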
skipping to change at page 31, line 9 skipping to change at page 32, line 5
Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and
the initial SYN MUST be set to FNE by Re-ECT client A the initial SYN MUST be set to FNE by Re-ECT client A
(Section 4.1.4). So an initial SYN may be marked CE(-1) rather than (Section 4.1.4). So an initial SYN may be marked CE(-1) rather than
dropped. This seems dangerous, because the sender has not yet dropped. This seems dangerous, because the sender has not yet
established whether the receiver is a legacy one that does not established whether the receiver is a legacy one that does not
understand congestion marking. It also seems to allow malicious understand congestion marking. It also seems to allow malicious
senders to take advantage of ECN marking to avoid so much drop when senders to take advantage of ECN marking to avoid so much drop when
launching SYN flooding attacks. Below we explain the features of the launching SYN flooding attacks. Below we explain the features of the
protocol design that remove both these dangers. protocol design that remove both these dangers.
ECN-capable initial SYN with a Not-ECT server: If the TCP server B is ECN-capable initial SYN with a Not-ECT server: If the TCP server B
re-ECN capable, provision is made for it to feed back a possible is re-ECN capable, provision is made for it to feed back a possible
congestion marked SYN in the SYN ACK (Section 4.1.4). But if the congestion marked SYN in the SYN ACK (Section 4.1.4). But if the
TCP client A finds out from the SYN ACK that the server was not TCP client A finds out from the SYN ACK that the server was not
ECN-capable, the TCP client MUST consider the first SYN as ECN-capable, the TCP client MUST consider the first SYN as
congestion marked before setting itself into Not-ECT mode. congestion marked before setting itself into Not-ECT mode.
Section 4.1.4 mandates that such a TCP client MUST also set its Section 4.1.4 mandates that such a TCP client MUST also set its
initial window to 1 segment. In this way we remove the need to initial window to 1 segment. In this way we remove the need to
cautiously avoid setting the first SYN to Not-RECT. This will cautiously avoid setting the first SYN to Not-RECT. This will
give worse performance while deployment is patchy, but better give worse performance while deployment is patchy, but better
performance once deployment is widespread. performance once deployment is widespread.
skipping to change at page 38, line 24 skipping to change at page 39, line 19
their own expected downstream congestion so that N1 can deploy a their own expected downstream congestion so that N1 can deploy a
policer at its ingress to check that S1 is complying with whatever policer at its ingress to check that S1 is complying with whatever
congestion control it should be using (Section 6.1.5). If N1 is congestion control it should be using (Section 6.1.5). If N1 is
extremely conservative it may police each flow, but it can choose extremely conservative it may police each flow, but it can choose
to just police the bulk amount of congestion each customer causes to just police the bulk amount of congestion each customer causes
without regard to flows, or if it is extremely liberal it need not without regard to flows, or if it is extremely liberal it need not
police congestion control at all. In any case, it is always police congestion control at all. In any case, it is always
preferable to police traffic at the very first ingress into an preferable to police traffic at the very first ingress into an
internetwork, before non-compliant traffic can cause any damage. internetwork, before non-compliant traffic can cause any damage.
Edge egress dropper: If the policer ensures the source has less right Edge egress dropper: If the policer ensures the source has less
to a high rate the higher it declares downstream congestion, the right to a high rate the higher it declares downstream congestion,
source has a clear incentive to understate downstream congestion. the source has a clear incentive to understate downstream
But, if flows of packets are understated when they enter the congestion. But, if flows of packets are understated when they
internetwork, they will have become negative by the time they enter the internetwork, they will have become negative by the time
leave. So, we introduce a dropper at the last network egress, they leave. So, we introduce a dropper at the last network
which drops packets in flows that persistently declare negative egress, which drops packets in flows that persistently declare
downstream congestion (see Section 6.1.4 for details). negative downstream congestion (see Section 6.1.4 for details).
..competitive routing ..competitive routing
.' : '. .' : '.
.' p e n a l:t i e s '. .' p e n a l:t i e s '.
: | : \ : : | : \ :
A : | : | : A : | : | :
|S <-----N1----> <---N2---> <---N4--> R domain |S <-----N1----> <---N2---> <---N4--> R domain
| : | : | : | : | : | :
| V | : | : | V | : | :
3% |--------+ | : | : 3% |--------+ | : | :
skipping to change at page 40, line 14 skipping to change at page 40, line 39
may all be allowed different responses to congestion. The figure may all be allowed different responses to congestion. The figure
depicts this downward pressure on N2 by the solid downward arrow depicts this downward pressure on N2 by the solid downward arrow
at the egress of N2. Then N2 has an incentive either to police at the egress of N2. Then N2 has an incentive either to police
the congestion response of its own ingress traffic (from N1) or to the congestion response of its own ingress traffic (from N1) or to
emulate policing by applying penalties to N1 in turn on the basis emulate policing by applying penalties to N1 in turn on the basis
of congestion counted at their mutual boundary. In this recursive of congestion counted at their mutual boundary. In this recursive
way, the incentives for each flow to respond correctly to way, the incentives for each flow to respond correctly to
congestion trace back with each flow precisely to each source, congestion trace back with each flow precisely to each source,
despite the mechanism not recognising flows (see Section 6.2.2). despite the mechanism not recognising flows (see Section 6.2.2).
Inter-domain congestion charging diversity: Any two networks are free Inter-domain congestion charging diversity: Any two networks are
to agree any of a range of penalty regimes between themselves free to agree any of a range of penalty regimes between themselves
within the following reasonable constraints. N2 should expect to within the following reasonable constraints. N2 should expect to
have to pay penalties to N4 where penalties monotonically increase have to pay penalties to N4 where penalties monotonically increase
with the volume of congestion and negative penalties are not with the volume of congestion and negative penalties are not
allowed. For instance, they may agree an SLA with tiered allowed. For instance, they may agree an SLA with tiered
congestion thresholds, where higher penalties apply the higher the congestion thresholds, where higher penalties apply the higher the
threshold that is broken. But the most obvious (and useful) form threshold that is broken. But the most obvious (and useful) form
of penalty is where N4 levies a charge on N2 proportional to the of penalty is where N4 levies a charge on N2 proportional to the
volume of downstream congestion N2 dumps into N4. In the volume of downstream congestion N2 dumps into N4. In the
explanation that follows, we assume this specific variant of explanation that follows, we assume this specific variant of
volume charging between networks - charging proportionate to the volume charging between networks - charging proportionate to the
skipping to change at page 43, line 45 skipping to change at page 44, line 19
fraction of negative octets introduced by congestion marking, leaving fraction of negative octets introduced by congestion marking, leaving
a balance of zero. If it is less (a negative flow), it implies that a balance of zero. If it is less (a negative flow), it implies that
the source is understating path congestion (which will reduce the the source is understating path congestion (which will reduce the
penalties that N2 owes N4). penalties that N2 owes N4).
If flows are positive, N4 need take no action---this simply means its If flows are positive, N4 need take no action---this simply means its
upstream neighbour is paying more penalties than it needs to, and the upstream neighbour is paying more penalties than it needs to, and the
source is going slower than it needs to. But, to protect itself source is going slower than it needs to. But, to protect itself
against persistently negative flows, N4 will need to install a against persistently negative flows, N4 will need to install a
dropper at its egress. Appendix E gives a suggested algorithm for dropper at its egress. Appendix E gives a suggested algorithm for
this dropper. There is not intention that the dropper algorithm this dropper. There is no intention that the dropper algorithm needs
needs to be standardised, it is merely provided to show that an to be standardised, it is merely provided to show that an efficient,
efficient, robust algorithm is possible. But whatever algorithm is robust algorithm is possible. But whatever algorithm is used must
used must meet the criteria below: meet the criteria below:
o It SHOULD introduce minimal false positives for honest flows; o It SHOULD introduce minimal false positives for honest flows;
o It SHOULD quickly detect and sanction dishonest flows (minimal o It SHOULD quickly detect and sanction dishonest flows (minimal
false negatives); false negatives);
o It MUST be invulnerable to state exhaustion attacks from malicious o It MUST be invulnerable to state exhaustion attacks from malicious
sources. For instance, if the dropper uses flow-state, it should sources. For instance, if the dropper uses flow-state, it should
not be possible for a source to send numerous packets, each with a not be possible for a source to send numerous packets, each with a
different flow ID, to force the dropper to exhaust its memory different flow ID, to force the dropper to exhaust its memory
capacity; capacity;
o It MUST introduce sufficient loss in goodput so that malicious o It MUST introduce sufficient loss in goodput so that malicious
skipping to change at page 44, line 35 skipping to change at page 45, line 9
setting the FNE codepoint at the start of a flow, even though there setting the FNE codepoint at the start of a flow, even though there
is a cost to the sender of setting FNE (positive `worth'). Indeed, is a cost to the sender of setting FNE (positive `worth'). Indeed,
with the FNE codepoint, the rate at which a sender can generate new with the FNE codepoint, the rate at which a sender can generate new
flows can be limited (Appendix G). In this respect, the FNE flows can be limited (Appendix G). In this respect, the FNE
codepoint works like Handley's state set-up bit [Steps_DoS]. codepoint works like Handley's state set-up bit [Steps_DoS].
Appendix E also gives an example dropper implementation that Appendix E also gives an example dropper implementation that
aggregates flow state. Dropper algorithms will often maintain a aggregates flow state. Dropper algorithms will often maintain a
moving average across flows of the fraction of RE blanked packets. moving average across flows of the fraction of RE blanked packets.
When maintaining an average across flows, a dropper SHOULD only allow When maintaining an average across flows, a dropper SHOULD only allow
flows into the average if they start with FNE, but it SHOULD not flows into the average if they start with FNE, but it SHOULD NOT
include packets with the FNE codepoint set in the average. A sender include packets with the FNE codepoint set in the average. A sender
sets the FNE codepoint when it does not have the benefit of feedback sets the FNE codepoint when it does not have the benefit of feedback
from the receiver. So, counting packets with FNE cleared would be from the receiver. So, counting packets with FNE cleared would be
likely to make the average unnecessarily positive, providing headroom likely to make the average unnecessarily positive, providing headroom
(or should we say footroom?) for dishonest (negative) traffic. (or should we say footroom?) for dishonest (negative) traffic.
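
As an illustration of the averaging rule above, the following minimal sketch keeps an exponentially weighted moving average of the fraction of RE blanked packets across flows. It is not the Appendix E algorithm. The class name, the smoothing weight and the unbounded set of admitted flow IDs are assumptions made for brevity; a deployable dropper would have to bound that state to resist the exhaustion attacks discussed earlier. As a further simplification, any FNE packet admits its flow, rather than only an FNE packet at the start of the flow.

```python
class BlankedFractionAverage:
    """Moving average, across flows, of the fraction of RE blanked packets.

    Hypothetical sketch: names and the EWMA weight are illustrative only.
    """

    def __init__(self, weight=0.01):
        self.weight = weight      # EWMA smoothing weight (assumed value)
        self.average = 0.0        # current moving average
        self.admitted = set()     # flow IDs admitted to the average (unbounded here)

    def observe(self, flow_id, fne, re_blanked):
        """Account for one packet of the given flow."""
        if fne:
            # An FNE packet admits the flow, but is not itself counted.
            self.admitted.add(flow_id)
            return
        if flow_id not in self.admitted:
            # Flows never seen with FNE stay out of the average.
            return
        sample = 1.0 if re_blanked else 0.0
        self.average += self.weight * (sample - self.average)


if __name__ == "__main__":
    avg = BlankedFractionAverage(weight=0.5)
    avg.observe("flow-a", fne=True, re_blanked=False)   # admits flow-a
    avg.observe("flow-a", fne=False, re_blanked=True)   # counted
    avg.observe("flow-b", fne=False, re_blanked=True)   # ignored, never sent FNE
    print(avg.average)
```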
If the dropper detects a persistently negative flow, it SHOULD drop If the dropper detects a persistently negative flow, it SHOULD drop
sufficient negative and neutral packets to force the flow to not be sufficient negative and neutral packets to force the flow to not be
negative. Drops SHOULD be focused on just sufficient packets in negative. Drops SHOULD be focused on just sufficient packets in
misbehaving flows to remove the negative bias while doing minimal misbehaving flows to remove the negative bias while doing minimal
skipping to change at page 54, line 5 skipping to change at page 54, line 25
that the feedback loop is not broken but useful data can be that the feedback loop is not broken but useful data can be
removed. removed.
7. Incremental Deployment 7. Incremental Deployment
7.1. Incremental Deployment Features 7.1. Incremental Deployment Features
The design of the re-ECN protocol started from the fact that the The design of the re-ECN protocol started from the fact that the
current ECN marking behaviour of routers was sufficient and that re- current ECN marking behaviour of routers was sufficient and that re-
feedback could be introduced around these routers by changing the feedback could be introduced around these routers by changing the
sender behaviour but not the routers. Otherwise, if had required sender behaviour but not the routers. Otherwise, if we had required
routers to be changed, the chance of encountering a path that had routers to be changed, the chance of encountering a path that had
every router upgraded would be vanishingly small during early every router upgraded would be vanishingly small during early
deployment, giving no incentive to start deployment. Also, as there deployment, giving no incentive to start deployment. Also, as there
is no new forwarding behaviour, routers and hosts do not have to is no new forwarding behaviour, routers and hosts do not have to
signal or negotiate anything. signal or negotiate anything.
However, networks that choose to protect themselves using re-ECN do However, networks that choose to protect themselves using re-ECN do
have to add new security functions at their trust boundaries with have to add new security functions at their trust boundaries with
others. They distinguish legacy traffic by its ECN field. Traffic others. They distinguish legacy traffic by its ECN field. Traffic
from Not-ECT transports is distinguishable by its Not-RECT marking. from Not-ECT transports is distinguishable by its Not-RECT marking.
skipping to change at page 55, line 25 skipping to change at page 55, line 47
None of these changes REQUIRE any modifications to routers. Also None of these changes REQUIRE any modifications to routers. Also
none of these changes affect anything about end to end congestion none of these changes affect anything about end to end congestion
control; they are all to do with allowing networks to police that end control; they are all to do with allowing networks to police that end
to end congestion control is well-behaved. to end congestion control is well-behaved.
7.2. Incremental Deployment Incentives 7.2. Incremental Deployment Incentives
It would only be worth standardising the re-ECN protocol if there It would only be worth standardising the re-ECN protocol if there
existed a coherent story for how it might be incrementally deployed. existed a coherent story for how it might be incrementally deployed.
In order for it to have a chance of deployment, everyone who needs to In order for it to have a chance of deployment, everyone who needs to
act, must have a strong incentive to act, and the incentives must act must have a strong incentive to act, and the incentives must
arise in the order that deployment would have to happen. Re-ECN arise in the order that deployment would have to happen. Re-ECN
works around unmodified ECN routers, but we can't just discuss why works around unmodified ECN routers, but we can't just discuss why
and how re-ECN deployment might build on ECN deployment, because and how re-ECN deployment might build on ECN deployment, because
there is precious little to build on in the first place. Instead, we there is precious little to build on in the first place. Instead, we
aim to show that re-ECN deployment could carry ECN with it. We focus aim to show that re-ECN deployment could carry ECN with it. We focus
on commercial deployment incentives, although some of the arguments on commercial deployment incentives, although some of the arguments
apply equally to academic or government sectors. apply equally to academic or government sectors.
ECN deployment: ECN deployment:
skipping to change at page 58, line 40 skipping to change at page 59, line 13
world to the religion of policing. Networks that chose not to world to the religion of policing. Networks that chose not to
deploy egress droppers would leave themselves open to being deploy egress droppers would leave themselves open to being
congested by senders in other networks. But that would be their congested by senders in other networks. But that would be their
choice. choice.
The important aspect of the egress dropper though is that it most The important aspect of the egress dropper though is that it most
protects the network that deploys it. If a network does not protects the network that deploys it. If a network does not
deploy an egress dropper, sources sending into it from other deploy an egress dropper, sources sending into it from other
networks will be able to understate the congestion they are networks will be able to understate the congestion they are
causing. Whereas, if a network deploys an egress dropper, it can causing. Whereas, if a network deploys an egress dropper, it can
know how much congestion other networks are dumping into it. And know how much congestion other networks are dumping into it, and
apply penalties or charges accordingly. So, whether or not a apply penalties or charges accordingly. So, whether or not a
network polices its own sources at ingress, it is in its interests network polices its own sources at ingress, it is in its interests
to deploy an egress dropper. to deploy an egress dropper.
Host support: Host support:
In the above deployment scenario, host operating system support In the above deployment scenario, host operating system support
for re-ECN came about through the cellular operators demanding it for re-ECN came about through the cellular operators demanding it
in device standards (i.e. 3GPP). Of course, increasingly, mobile in device standards (i.e. 3GPP). Of course, increasingly, mobile
devices are being built to support multiple wireless technologies. devices are being built to support multiple wireless technologies.
skipping to change at page 60, line 7 skipping to change at page 60, line 25
the motivator, but it seems optimistic to expect such a level of the motivator, but it seems optimistic to expect such a level of
joined-up thinking from today's communications industry. We joined-up thinking from today's communications industry. We
believe a single application alone must be a sufficient motivator. believe a single application alone must be a sufficient motivator.
In short, everyone gains from adding accountability to TCP/IP, In short, everyone gains from adding accountability to TCP/IP,
except the selfish or malicious. So, deployment incentives tend except the selfish or malicious. So, deployment incentives tend
to be strong. to be strong.
8. Architectural Rationale 8. Architectural Rationale
In the Internet's technical community the danger of not responding to In the Internet's technical community, the danger of not responding
congestion is well-understood, with its attendant risk of congestion to congestion is well-understood, as well as its attendant risk of
collapse [RFC3714]. However, many of the Internet's commercial congestion collapse [RFC3714]. However, one side of the Internet's
community consider that the very essence of IP is to provide open commercial community considers that the very essence of IP is to
access to the internetwork for all applications. Congestion is seen provide open access to the internetwork for all applications. They
as a symptom of over-conservative investment. And the goal of see congestion as a symptom of over-conservative investment, and rely
application design is to find novel ways to continue working despite on revising application designs to find novel ways to keep
congestion. They argue that the Internet was never intended to be applications working despite congestion. They argue that the
solely for TCP-friendly applications. Another side of the Internet's Internet was never intended to be solely for TCP-friendly
commercial community believe that it is no use providing a network applications. Meanwhile, another side of the Internet's commercial
for novel applications if it has insufficient capacity. And it will community believes that it is worthwhile providing a network for
always have insufficient capacity unless a greater share of novel applications only if it has sufficient capacity, which can
application revenues can be /assured/ for the infrastructure happen only if a greater share of application revenues can be
provider. Otherwise the major investments required will carry too /assured/ for the infrastructure provider. Otherwise the major
much risk and won't happen. investments required would carry too much risk and wouldn't happen.
The lesson articulated in [Tussle] is that we shouldn't embed our The lesson articulated in [Tussle] is that we shouldn't embed our
view on these arguments into the Internet at design time. Instead we view on these arguments into the Internet at design time. Instead we
should design the Internet so that the outcome of these arguments can should design the Internet so that the outcome of these arguments can
get decided at run-time. Re-ECN is designed in that spirit. Once get decided at run-time. Re-ECN is designed in that spirit. Once
the protocol is available, different network operators can choose how the protocol is available, different network operators can choose how
liberal they want to be in holding people accountable for the liberal they want to be in holding people accountable for the
congestion they cause. Some might boldly invest in capacity and not congestion they cause. Some might boldly invest in capacity and not
police its use at all, hoping that novel applications will result. police its use at all, hoping that novel applications will result.
Others might use re-ECN for fine-grained flow policing, expecting to Others might use re-ECN for fine-grained flow policing, expecting to
skipping to change at page 62, line 39 skipping to change at page 63, line 13
the network layer to modify the next guess. the network layer to modify the next guess.
9. Related Work 9. Related Work
{Due to lack of time, this section is incomplete. The reader is {Due to lack of time, this section is incomplete. The reader is
referred to the Related Work section of [Re-fb] for a brief selection referred to the Related Work section of [Re-fb] for a brief selection
of related ideas.} of related ideas.}
9.1. Policing Rate Response to Congestion 9.1. Policing Rate Response to Congestion
ATM network elements send congestion back-pressure messages [ITU- ATM network elements send congestion back-pressure
T.I.371] along each connection, duplicating any end to end feedback messages [ITU-T.I.371] along each connection, duplicating any end to
because they don't trust it. On the other hand, re-ECN ensures end feedback because they don't trust it. On the other hand, re-ECN
information in forwarded packets can be used for congestion ensures information in forwarded packets can be used for congestion
management without requiring a connection-oriented architecture and management without requiring a connection-oriented architecture and
re-using the overhead of fields that are already set aside for end to re-using the overhead of fields that are already set aside for end to
end congestion control (and routing loop detection in the case of re- end congestion control (and routing loop detection in the case of re-
TTL in Appendix F). TTL in Appendix F).
We borrowed ideas from policers in the literature [pBox],[XCHOKe], We borrowed ideas from policers in the literature [pBox],[XCHOKe],
AFD etc. for our rate equation policer. However, without the benefit AFD etc. for our rate equation policer. However, without the benefit
of re-ECN they don't police the correct rate for the condition of of re-ECN they don't police the correct rate for the condition of
their path. They detect unusually high /absolute/ rates, but only their path. They detect unusually high /absolute/ rates, but only
while the policer itself is congested, because they work by detecting while the policer itself is congested, because they work by detecting
skipping to change at page 63, line 25 skipping to change at page 63, line 47
accidental side-effect. They actually punish traffic that fills accidental side-effect. They actually punish traffic that fills
troughs as much as traffic that causes peaks in utilisation. In troughs as much as traffic that causes peaks in utilisation. In
practice network operators need to be able to allocate service by practice network operators need to be able to allocate service by
cost during congestion, and by value at other times. cost during congestion, and by value at other times.
9.2. Congestion Notification Integrity 9.2. Congestion Notification Integrity
The choice of two ECT code-points in the ECN field [RFC3168] The choice of two ECT code-points in the ECN field [RFC3168]
permitted future flexibility, optionally allowing the sender to permitted future flexibility, optionally allowing the sender to
encode the experimental ECN nonce [RFC3540] in the packet stream. encode the experimental ECN nonce [RFC3540] in the packet stream.
This mechanism has since been included in the specifications of DCCP
[RFC4340].
The ECN nonce is an elegant scheme that allows the sender to detect The ECN nonce is an elegant scheme that allows the sender to detect
if someone in the feedback loop tries to claim no congestion was if someone in the feedback loop - the receiver especially - tries to
experienced when it fact it was (whether drop or ECN marking). The claim no congestion was experienced when in fact congestion led to
sender chooses between the two ECT codepoints in a pseudo-random packet drops or ECN marks. For each packet it sends, the sender
sequence. Then, whenever the network marks a packet with CE, to deny chooses between the two ECT codepoints in a pseudo-random sequence.
the congestion happened, the cheater would have to guess which ECT Then, whenever the network marks a packet with CE, if the receiver
codepoint was overwritten, with only a 50:50 chance of being correct wants to deny congestion happened, she has to guess which ECT
each time. codepoint was overwritten. She has only a 50:50 chance of being
correct each time she denies a congestion mark or a drop, which
ultimately will give her away.
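
To make the 50:50 argument concrete, the sketch below computes the chance that a receiver conceals a given number of congestion events without ever guessing wrongly. It assumes each guess is independent and succeeds with probability one half, as the text states; the function name is hypothetical.

```python
def undetected_probability(concealed_events):
    """Probability of guessing the overwritten ECT codepoint correctly
    every time, over the given number of concealed marks or drops.

    Assumes independent guesses, each with a 50:50 chance of success.
    """
    if concealed_events < 0:
        raise ValueError("concealed_events must be non-negative")
    return 0.5 ** concealed_events


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        print(n, undetected_probability(n))
```

Under these assumptions the chance of escaping detection halves with every concealed event, which is why persistent suppression is expected to be caught.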
The assumption behind the ECN nonce is that a sender will want to The purpose of a network-layer nonce has to be the protection of the
detect whether a receiver is suppressing congestion feedback. This network in the first place, while a transport-layer nonce had better
is only true if the sender's interests are aligned with the be used to protect the sender from cheating receivers. Now, the
network's, or with the community of users as a whole. This may be assumption behind the ECN nonce is that a sender will want to detect
true for certain large senders, who are under close scrutiny and have whether a receiver is suppressing congestion feedback. This is only
a reputation to maintain. But we have to deal with a more hostile true if the sender's interests are aligned with the network's, or
world, where traffic may be dominated by peer-to-peer transfers, with the community of users as a whole. This may be true for certain
rather than downloads from a few popular sites. Often the `natural' large senders, who are under close scrutiny and have a reputation to
self-interest of a sender is not aligned with the interests of other maintain. But we have to deal with a more hostile world, where
traffic may be dominated by peer-to-peer transfers, rather than
downloads from a few popular sites. Often the `natural' self-
interest of a sender is not aligned with the interests of other
users. It often wishes to transfer data quickly to the receiver as users. It often wishes to transfer data quickly to the receiver as
much as the receiver wants the data quickly. much as the receiver wants the data quickly.
In contrast, the re-ECN protocol enables policing of an agreed rate- In contrast, the re-ECN protocol enables policing of an agreed rate-
response to congestion (e.g. TCP-friendliness) at the sender's response to congestion (e.g. TCP-friendliness) at the sender's
interface with the internetwork. It also ensures downstream networks interface with the internetwork. It also ensures downstream networks
can police their upstream neighbours, to encourage them to police can police their upstream neighbours, to encourage them to police
their users in turn. But most importantly, it requires the sender to their users in turn. But most importantly, it requires the sender to
declare path congestion to the network and it can remove traffic at declare path congestion to the network and it can remove traffic at
the egress if this declaration is dishonest. So it can police the egress if this declaration is dishonest. So it can police
skipping to change at page 67, line 22 skipping to change at page 68, line 5
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
S., Wroclawski, J., and L. Zhang, "Recommendations on S., Wroclawski, J., and L. Zhang, "Recommendations on
Queue Management and Congestion Avoidance in the Queue Management and Congestion Avoidance in the
Internet", RFC 2309, April 1998. Internet", RFC 2309, April 1998.
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999. Control", RFC 2581, April 1999.
[RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
Zhang, L., and V. Paxson, "Stream Control Transmission
Protocol", RFC 2960, October 2000.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001. RFC 3168, September 2001.
[RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
Initial Window", RFC 3390, October 2002. Initial Window", RFC 3390, October 2002.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
Congestion Notification (ECN) Signaling with Nonces", Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
RFC 3540, June 2003.
[RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion
Control Protocol (DCCP) Congestion Control ID 2: TCP-like
Congestion Control", RFC 4341, March 2006.
[RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for
Datagram Congestion Control Protocol (DCCP) Congestion
Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
March 2006.
15.2. Informative References 15.2. Informative References
[ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the [ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the
Internet to Support Real-Time Content Supply from a Large Internet to Support Real-Time Content Supply from a Large
Fraction of Broadband Residential Users", BT Technology Fraction of Broadband Residential Users", BT Technology
Journal (BTTJ) 23(2), April 2005. Journal (BTTJ) 23(2), April 2005.
[Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the
assumptions underlying mechanism design for the Internet", assumptions underlying mechanism design for the Internet",
skipping to change at page 69, line 28 skipping to change at page 70, line 25
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000. Timer", RFC 2988, November 2000.
[RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager",
RFC 3124, June 2001. RFC 3124, June 2001.
[RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header", [RFC3514] Bellovin, S., "The Security Flag in the IPv4 Header",
RFC 3514, April 2003. RFC 3514, April 2003.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, June 2003.
[RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
Control for Voice Traffic in the Internet", RFC 3714, Control for Voice Traffic in the Internet", RFC 3714,
March 2004. March 2004.
[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
[Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN
on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01
(work in progress), March 2006. (work in progress), March 2006.
[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
Salvatori, A., Soppera, A., and M. Koyabe, "Policing Salvatori, A., Soppera, A., and M. Koyabe, "Policing
Congestion Response in an Internetwork Using Re-Feedback", Congestion Response in an Internetwork Using Re-Feedback",
ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
www.acm.org/sigs/sigcomm/sigcomm2005/ www.acm.org/sigs/sigcomm/sigcomm2005/
techprog.html#session8>. techprog.html#session8>.
skipping to change at page 70, line 31 skipping to change at page 71, line 28
Protocols (ICNP-02) , November 2002, Protocols (ICNP-02) , November 2002,
<http://www.cc.gatech.edu/~akumar/xchoke.pdf>. <http://www.cc.gatech.edu/~akumar/xchoke.pdf>.
[pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End
Congestion Control in the Internet", IEEE/ACM Transactions Congestion Control in the Internet", IEEE/ACM Transactions
on Networking 7(4) 458--472, August 1999, on Networking 7(4) 458--472, August 1999,
<http://www.aciri.org/floyd/end2end-paper.html>. <http://www.aciri.org/floyd/end2end-paper.html>.
Appendix A. Precise Re-ECN Protocol Operation Appendix A. Precise Re-ECN Protocol Operation
{ToDo: fix this}
The protocol operation described in Section 3.3 was an approximation. The protocol operation described in Section 3.3 was an approximation.
In fact, standard ECN router marking combines 1% and 2% marking into In fact, standard ECN router marking combines 1% and 2% marking into
slightly less than 3% whole-path marking, because routers slightly less than 3% whole-path marking, because routers
deliberately mark CE whether or not it has already been marked by deliberately mark CE whether or not it has already been marked by
another router upstream. So the combined marking fraction would another router upstream. So the combined marking fraction would
actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. actually be 100% - (100% - 1%)(100% - 2%) = 2.98%.
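
The arithmetic above can be checked with a short sketch. It assumes each resource marks packets independently of any marking already applied upstream, which is the behaviour the paragraph describes; the function name is illustrative and not part of the protocol.

```python
def combined_marking_fraction(fractions):
    """Whole-path marking fraction for a sequence of per-resource
    marking fractions, assuming each resource marks independently."""
    unmarked = 1.0
    for p in fractions:
        unmarked *= 1.0 - p   # fraction of packets still unmarked
    return 1.0 - unmarked


if __name__ == "__main__":
    # The 1% and 2% example from the text.
    print(f"{combined_marking_fraction([0.01, 0.02]):.4%}")
```

For small marking fractions the result stays close to the simple sum, which is why the approximation in Section 3.3 is adequate for explanation.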
To generalise this we will need some notation. To generalise this we will need some notation.
o j represents the index of each resource (typically queues) along a o j represents the index of each resource (typically queues) along a
skipping to change at page 74, line 12 skipping to change at page 75, line 12
defines this combination as a non-ECN-setup SYN ACK, which remains defines this combination as a non-ECN-setup SYN ACK, which remains
true for vanilla and Nonce ECTs. But for re-ECN we define it as a true for vanilla and Nonce ECTs. But for re-ECN we define it as a
Re-ECN-setup SYN ACK. We didn't use a SYN ACK with both CWR and Re-ECN-setup SYN ACK. We didn't use a SYN ACK with both CWR and
ECE cleared to 0 because that would be the likely response from ECE cleared to 0 because that would be the likely response from
most Not-ECT receivers. And we didn't use a SYN ACK with both CWR most Not-ECT receivers. And we didn't use a SYN ACK with both CWR
and ECE set to 1 either, as at least one broken receiver and ECE set to 1 either, as at least one broken receiver
implementation echoes whatever flags were in the SYN into its SYN implementation echoes whatever flags were in the SYN into its SYN
ACK. Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1 ACK. Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1
& ECE=0. & ECE=0.
Choice of two alternative SYN ACKs: the NS flag may take either value Choice of two alternative SYN ACKs: the NS flag may take either
in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re-ECT value in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re-
server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to echo ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to
congestion experienced (CE) on the initial SYN. Otherwise a Re- echo congestion experienced (CE) on the initial SYN. Otherwise a
ECN-setup SYN ACK MUST be returned with NS=0. The only current Re-ECN-setup SYN ACK MUST be returned with NS=0. The only current
known use of the NS flag in a SYN ACK is to indicate support for known use of the NS flag in a SYN ACK is to indicate support for
the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1. the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1.
Given the ECN nonce MUST NOT be used for a RECN mode connection, a Given the ECN nonce MUST NOT be used for a RECN mode connection, a
Re-ECN-setup SYN ACK can use either setting of the NS flag without Re-ECN-setup SYN ACK can use either setting of the NS flag without
any risk of confusion, because the CWR & ECE flags will be any risk of confusion, because the CWR & ECE flags will be
reversed relative to those used by an ECN nonce SYN ACK. reversed relative to those used by an ECN nonce SYN ACK.
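
The flag combinations discussed in this appendix can be summarised in a small sketch. It only restates the mapping given in the text; the function name and the returned labels are hypothetical, and no real TCP stack API is implied.

```python
def classify_syn_ack(cwr, ece, ns):
    """Classify a SYN ACK by its CWR, ECE and NS flags (each 0 or 1).

    Labels are illustrative strings, not protocol-defined names.
    """
    if cwr and not ece:
        # Re-ECN-setup SYN ACK; NS=1 echoes CE seen on the initial SYN.
        if ns:
            return "Re-ECN-setup SYN ACK, CE echoed"
        return "Re-ECN-setup SYN ACK"
    if ece and not cwr:
        # Combination used by vanilla ECN; NS=1 indicates ECN nonce support.
        if ns:
            return "ECN nonce SYN ACK"
        return "ECN-setup SYN ACK"
    if not cwr and not ece:
        # Likely response from most Not-ECT receivers.
        return "non-ECN-setup SYN ACK"
    # Both set: possibly a broken receiver echoing the SYN flags.
    return "non-ECN-setup SYN ACK (flags possibly echoed)"


if __name__ == "__main__":
    for flags in ((1, 0, 0), (1, 0, 1), (0, 1, 1), (0, 0, 0), (1, 1, 0)):
        print(flags, classify_syn_ack(*flags))
```

Because the CWR and ECE values are reversed between the two setup forms, the NS flag can carry a different meaning in each without ambiguity.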
Appendix D. Packet Marking During Flow Start Appendix D. Packet Marking During Flow Start
{ToDo: Write up proof that sender should mark FNE on first and third {ToDo: Write up proof that sender should mark FNE on first and third
skipping to change at page 81, line 5 skipping to change at page 81, line 37
account from the subset I. Then the weighted mean of all these account from the subset I. Then the weighted mean of all these
samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I} samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I}
V_{bI}. V_{bI}.
If V_b is the result of the bulk accounting algorithm over the If V_b is the result of the bulk accounting algorithm over the
accounting period (Appendix H.1) it can be inflated by this factor accounting period (Appendix H.1) it can be inflated by this factor
a_S to get a good unbiased estimate of the volume of downstream a_S to get a good unbiased estimate of the volume of downstream
congestion over the accounting period a_S.V_b, without being polluted congestion over the accounting period a_S.V_b, without being polluted
by the effect of persistently negative flows. by the effect of persistently negative flows.
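
A minimal sketch of the inflation step follows. It treats V_{fI} and V_{bI} simply as the two per-subset volumes named in the formula above, supplied as pairs; the function names and the sample figures are hypothetical.

```python
def inflation_factor(samples):
    """Weighted mean a_S = sum(V_fI) / sum(V_bI) over sampled subsets I.

    `samples` is a list of (V_fI, V_bI) pairs, one per subset.
    """
    total_f = sum(v_f for v_f, _ in samples)
    total_b = sum(v_b for _, v_b in samples)
    if total_b == 0:
        raise ValueError("sum of V_bI must be non-zero")
    return total_f / total_b


def downstream_congestion_estimate(v_b, samples):
    """Inflate the bulk accounting result V_b by a_S, giving a_S.V_b."""
    return inflation_factor(samples) * v_b


if __name__ == "__main__":
    samples = [(12.0, 10.0), (33.0, 30.0)]   # made-up volumes
    print(inflation_factor(samples))
    print(downstream_congestion_estimate(200.0, samples))
```

Summing numerators and denominators before dividing weights each subset by its volume, so large samples influence a_S more than small ones.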
Appendix I. Argument for holding back the ECN nonce
The ECN nonce is a mechanism that allows a /sending/ transport to
detect if drop or ECN marking at a congested router has been
suppressed by a node somewhere in the feedback loop---another router
or the receiver.
Space for the ECN nonce was set aside in [RFC3168] (currently
proposed standard) while the full nonce mechanism is specified in RFC
3540 (currently experimental). The specification for [RFC4340]
(currently proposed standard) requires that "Each DCCP sender SHOULD
set ECN Nonces on its packets...". It also mandates as a requirement
for all CCID profiles that "Any newly defined acknowledgement
mechanism MUST include a way to transmit ECN Nonce Echoes back to the
sender.", therefore:
o The CCID profile for TCP-like Congestion Control [RFC4341]
(currently proposed standard) says "The sender will use the ECN
Nonce for data packets, and the receiver will echo those nonces in
its Ack Vectors."
o The CCID profile for TCP-Friendly Rate Control (TFRC) [RFC4342]
recommends that "The sender [use] Loss Intervals options' ECN
Nonce Echoes (and possibly any Ack Vectors' ECN Nonce Echoes) to
probabilistically verify that the receiver is correctly reporting
all dropped or marked packets."
The ECN nonce can serve three types of function. It is useful:
o if the sender wants to ensure the integrity of the information
about packet drops,
o if the sending transport chooses to act in the interests of a
congested router,
o if the sending transport wants to allocate its own resources in
proportion to the rates that each network path can sustain, based
on congestion control.
However, when the nonce is used to protect the integrity of
information about packet drops, rather than ECN marks, a transport
layer nonce will always be sufficient (because a drop loses the
transport header as well as the ECN field in the network header),
which would avoid using scarce IP header codepoint space. Similarly,
a transport layer nonce would protect against a receiver sending
early acknowledgements.
The other two functions need the ECN nonce to be in the network
layer, but both require rather optimistic trust assumptions in order
to be useful. If the sending transport chooses to act in the
interests of a congested router, it can reduce its rate if it detects
that some malicious party in the feedback loop may be suppressing ECN
feedback. But this would only be useful to a router when /all/
senders using the router are trusted to act in the router's interest.
In the end, the only essential use of a network layer nonce is when
sending transports (e.g. large servers) want to allocate their /own/
resources in proportion to the rates that each network path can
sustain, based on congestion control. In that case, the nonce allows
senders to be assured that they aren't being duped into giving more
of their own resources to a particular flow. And if congestion
suppression is detected, the sending transport can rate limit the
offending connection to protect its own resources. Certainly, this
is a useful function, but the IETF should carefully decide whether
such a single, very specific case warrants IP header space.
In contrast, re-ECN allows all routers to fully protect themselves
from such attacks, without having to trust anyone, whether senders,
receivers or neighbouring networks. Re-ECN is therefore proposed in
preference to the ECN nonce on the basis that it addresses the
generic problem of accountability for congestion of a network's
resources at the IP layer.
Delaying the ECN nonce is justified because the applicability of the
ECN nonce seems too limited for it to consume a two-bit codepoint in
the IP header.
Moreover, while we have re-designed the re-ECN codepoints so that
they do not prevent the ECN nonce progressing, the same is not true
the other way round. If the ECN nonce started to see some deployment
(perhaps because it was blessed with proposed standard status),
incremental deployment of re-ECN would effectively be impossible,
because re-ECN marking fractions at inter-domain borders would be
polluted by unknown levels of nonce traffic.
The authors are aware that re-ECN must prove it has the potential it
claims if it is to displace the nonce. Therefore, every effort has
been made to complete a comprehensive specification of re-ECN so that
its potential can be assessed. We now seek the opinion of the
Internet community on whether the re-ECN protocol is sufficiently
useful to warrant standards action.
Authors' Addresses
Bob Briscoe
BT & UCL
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Phone: +44 1473 645196
skipping to change at page 82, line 5 skipping to change at page 85, line 5
BT
B54/69, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Phone: +44 1473 646923
Email: martin.koyabe@bt.com
URI:
Intellectual Property Statement Full Copyright Statement
Copyright (C) The Internet Society (2006).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors
retain all their rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
skipping to change at page 82, line 29 skipping to change at page 85, line 45
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Disclaimer of Validity
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Copyright Statement
Copyright (C) The Internet Society (2006). This document is subject
to the rights, licenses and restrictions contained in BCP 78, and
except as set forth therein, the authors retain all their rights.
Acknowledgment Acknowledgment
Funding for the RFC Editor function is currently provided by the Funding for the RFC Editor function is provided by the IETF
Internet Society. Administrative Support Activity (IASA).
 End of changes. 68 change blocks. 
219 lines changed or deleted 340 lines changed or added
