draft-briscoe-tsvwg-re-ecn-tcp-05.txt   draft-briscoe-tsvwg-re-ecn-tcp-06.txt 
Transport Area Working Group B. Briscoe Transport Area Working Group B. Briscoe
Internet-Draft BT & UCL Internet-Draft BT & UCL
Intended status: Standards Track A. Jacquet Intended status: Standards Track A. Jacquet
Expires: July 13, 2008 T. Moncaster Expires: January 15, 2009 T. Moncaster
A. Smith A. Smith
BT BT
January 10, 2008 July 14, 2008
Re-ECN: Adding Accountability for Causing Congestion to TCP/IP Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
draft-briscoe-tsvwg-re-ecn-tcp-05 draft-briscoe-tsvwg-re-ecn-tcp-06
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 37 skipping to change at page 1, line 37
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on July 13, 2008. This Internet-Draft will expire on January 15, 2009.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2008). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This document introduces a new protocol for explicit congestion This document introduces a new protocol for explicit congestion
notification (ECN), termed re-ECN, which can be deployed notification (ECN), termed re-ECN, which can be deployed
incrementally around unmodified routers. The protocol arranges an incrementally around unmodified routers. It enables the upstream
extended ECN field in each packet so that, as it crosses any party at any trust boundary in the internetwork to be held
interface in an internetwork, it will carry a truthful prediction of responsible for the congestion they cause, or allow to be caused.
congestion on the remainder of its path. Then the upstream party at
any trust boundary in the internetwork can be held responsible for So, networks can introduce straightforward accountability for
the congestion they cause, or allow to be caused. So, networks can congestion and policing mechanisms for incoming traffic from end-
introduce straightforward accountability and policing mechanisms for customers or from neighbouring network domains. The protocol works
incoming traffic from end-customers or from neighbouring network by arranging an extended ECN field in each packet so that, as it
domains. The purpose of this document is to specify the re-ECN crosses any interface in an internetwork, it will carry a truthful
protocol at the IP layer and to give guidelines on any consequent prediction of congestion on the remainder of its path. The purpose
changes required to transport protocols. It includes the changes of this document is to specify the re-ECN protocol at the IP layer
required to TCP both as an example and as a specification. It also and to give guidelines on any consequent changes required to
gives examples of mechanisms that can use the protocol to ensure data transport protocols. It includes the changes required to TCP both as
sources respond correctly to congestion. And it describes example an example and as a specification. It also gives examples of
mechanisms that ensure the dominant selfish strategy of both network mechanisms that can use the protocol to ensure data sources respond
domains and end-points will be to set the extended ECN field correctly to congestion. And it describes example mechanisms that
honestly. ensure the dominant selfish strategy of both network domains and end-
points will be to set the extended ECN field honestly.
Authors' Statement: Status (to be removed by the RFC Editor) Authors' Statement: Status (to be removed by the RFC Editor)
Although the re-ECN protocol is intended to make a simple but far- Although the re-ECN protocol is intended to make a simple but far-
reaching change to the Internet architecture, the most immediate reaching change to the Internet architecture, the most immediate
priority for the authors is to delay any move of the ECN nonce to priority for the authors is to delay any move of the ECN nonce to
Proposed Standard status. The argument for this position is Proposed Standard status. The argument for this position is
developed in Appendix I. developed in Appendix I.
Changes from previous drafts (to be removed by the RFC Editor) Changes from previous drafts (to be removed by the RFC Editor)
Full diffs created using the rfcdiff tool are available at Full diffs created using the rfcdiff tool are available at
<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp>
From -04 to -05 (current version): From -05 to -06 (current version):
Clarifications made to Section 1 and Section 3.
Minor editorial changes throughout.
From -04 to -05:
Completed justification for packet marking with FNE during slow- Completed justification for packet marking with FNE during slow-
start (Appendix D). start (Appendix D).
Minor editorial changes throughout. Minor editorial changes throughout.
From -03 to -04: From -03 to -04:
Clarified reasons for holding back ECN nonce (Section 3.2 & Clarified reasons for holding back ECN nonce (Section 3.3 &
Appendix I). Appendix I).
Clarified Figure 1. Clarified Figure 2.
Added Section 4.1.1.1 on equivalence of drops and ECN marks. Added Section 4.1.1.1 on equivalence of drops and ECN marks.
Improved precision of Section 5.6 on IP in IP tunnels. Improved precision of Section 5.6 on IP in IP tunnels.
Explained that RTT fairness is possible to enforce, but unlikely to Explained that RTT fairness is possible to enforce, but unlikely to
be required (Section 6.1.3 & Appendix F). be required (Section 6.1.3 & Appendix F).
Explained that bulk per-user policing should be adequate but per- Explained that bulk per-user policing should be adequate but per-
flow policing is also possible if desired, though it is not likely flow policing is also possible if desired, though it is not likely
skipping to change at page 3, line 27 skipping to change at page 3, line 33
From -02 to -03: From -02 to -03:
Started guidelines for re-ECN support in DCCP and SCTP. Started guidelines for re-ECN support in DCCP and SCTP.
Added annex on limitations of nonce mechanism. Added annex on limitations of nonce mechanism.
Minor editorial changes throughout. Minor editorial changes throughout.
From -01 to -02: From -01 to -02:
Explanation on informal terminology in Section 3.4 clarified. Explanation on informal terminology in Section 3.5 clarified.
IPv6 wire protocol encoding added (Section 5.2). IPv6 wire protocol encoding added (Section 5.2).
Text on (non-)issues with tunnels, encryption and link layer Text on (non-)issues with tunnels, encryption and link layer
congestion notification added (Section 5.6 & Section 5.7). congestion notification added (Section 5.6 & Section 5.7).
Section added giving evolvability arguments against encouraging Section added giving evolvability arguments against encouraging
bottleneck policing (Section 6.1.2). And text on re-ECN's bottleneck policing (Section 6.1.2). And text on re-ECN's
evolvability by design added to Section 6.1.3. evolvability by design added to Section 6.1.3.
skipping to change at page 4, line 8 skipping to change at page 4, line 11
Encoding of re-ECN wire protocol changed for reasons given in Encoding of re-ECN wire protocol changed for reasons given in
Appendix B and consequently draft substantially re-written. Appendix B and consequently draft substantially re-written.
Substantial text added in sections on applications, incremental Substantial text added in sections on applications, incremental
deployment, architectural rationale and security considerations. deployment, architectural rationale and security considerations.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 6
2. Requirements notation . . . . . . . . . . . . . . . . . . . . 7 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 8
3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8 3. Protocol Overview . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Background and Applicability . . . . . . . . . . . . . . . 8 3.1. Background and Applicability . . . . . . . . . . . . . . . 8
3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or 3.2. Simplified Re-ECN Protocol . . . . . . . . . . . . . . . . 10
v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
3.3. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 11 v6) . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.4. Informal Terminology . . . . . . . . . . . . . . . . . . . 13 3.4. Re-ECN Protocol Operation . . . . . . . . . . . . . . . . 12
3.5. Informal Terminology . . . . . . . . . . . . . . . . . . . 14
4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15 4. Transport Layers . . . . . . . . . . . . . . . . . . . . . . . 15
4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 4.1. TCP . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4.1.1. RECN mode: Full re-ECN capable transport . . . . . . . 16 4.1.1. RECN mode: Full Re-ECN capable transport . . . . . . . 17
4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168
Nonce ECT Receiver . . . . . . . . . . . . . . . . . . 20 compliant ECN Receiver . . . . . . . . . . . . . . . . 20
4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21 4.1.3. Capability Negotiation . . . . . . . . . . . . . . . . 21
4.1.4. Extended ECN (EECN) Field Settings during Flow 4.1.4. Extended ECN (EECN) Field Settings during Flow
Start or after Idle Periods . . . . . . . . . . . . . 23 Start or after Idle Periods . . . . . . . . . . . . . 23
4.1.5. Pure ACKS, Retransmissions, Window Probes and 4.1.5. Pure ACKS, Retransmissions, Window Probes and
Partial ACKs . . . . . . . . . . . . . . . . . . . . . 26 Partial ACKs . . . . . . . . . . . . . . . . . . . . . 27
4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27 4.2. Other Transports . . . . . . . . . . . . . . . . . . . . . 27
4.2.1. General Guidelines for Adding Re-ECN to Other 4.2.1. General Guidelines for Adding Re-ECN to Other
Transports . . . . . . . . . . . . . . . . . . . . . . 27 Transports . . . . . . . . . . . . . . . . . . . . . . 27
4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28 4.2.2. Guidelines for adding Re-ECN to RSVP or NSIS . . . . . 28
4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28 4.2.3. Guidelines for adding Re-ECN to DCCP . . . . . . . . . 28
4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 28 4.2.4. Guidelines for adding Re-ECN to SCTP . . . . . . . . . 29
5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 28 5. Network Layer . . . . . . . . . . . . . . . . . . . . . . . . 29
5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 28 5.1. Re-ECN IPv4 Wire Protocol . . . . . . . . . . . . . . . . 29
5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30 5.2. Re-ECN IPv6 Wire Protocol . . . . . . . . . . . . . . . . 30
5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31 5.3. Router Forwarding Behaviour . . . . . . . . . . . . . . . 31
5.4. Justification for Setting the First SYN to FNE . . . . . . 32 5.4. Justification for Setting the First SYN to FNE . . . . . . 33
5.5. Control and Management . . . . . . . . . . . . . . . . . . 33 5.5. Control and Management . . . . . . . . . . . . . . . . . . 34
5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 33 5.5.1. Negative Balance Warning . . . . . . . . . . . . . . . 34
5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 34 5.5.2. Rate Response Control . . . . . . . . . . . . . . . . 35
5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 34 5.6. IP in IP Tunnels . . . . . . . . . . . . . . . . . . . . . 35
5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 35 5.7. Non-Issues . . . . . . . . . . . . . . . . . . . . . . . . 36
6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 36 6. Applications . . . . . . . . . . . . . . . . . . . . . . . . . 37
6.1. Policing Congestion Response . . . . . . . . . . . . . . . 36 6.1. Policing Congestion Response . . . . . . . . . . . . . . . 37
6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 36 6.1.1. The Policing Problem . . . . . . . . . . . . . . . . . 37
6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 37 6.1.2. The Case Against Bottleneck Policing . . . . . . . . . 38
6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 38 6.1.3. Re-ECN Incentive Framework . . . . . . . . . . . . . . 39
6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 45 6.1.4. Egress Dropper . . . . . . . . . . . . . . . . . . . . 46
6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47 6.1.5. Policing . . . . . . . . . . . . . . . . . . . . . . . 47
6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 48 6.1.6. Inter-domain Policing . . . . . . . . . . . . . . . . 49
6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52 6.1.7. Inter-domain Fail-safes . . . . . . . . . . . . . . . 52
6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53 6.1.8. Simulations . . . . . . . . . . . . . . . . . . . . . 53
6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53 6.2. Other Applications . . . . . . . . . . . . . . . . . . . . 53
6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53 6.2.1. DDoS Mitigation . . . . . . . . . . . . . . . . . . . 53
6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54 6.2.2. End-to-end QoS . . . . . . . . . . . . . . . . . . . . 54
6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 54 6.2.3. Traffic Engineering . . . . . . . . . . . . . . . . . 55
6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 54 6.2.4. Inter-Provider Service Monitoring . . . . . . . . . . 55
6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 54 6.3. Limitations . . . . . . . . . . . . . . . . . . . . . . . 55
7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 55 7. Incremental Deployment . . . . . . . . . . . . . . . . . . . . 56
7.1. Incremental Deployment Features . . . . . . . . . . . . . 55 7.1. Incremental Deployment Features . . . . . . . . . . . . . 56
7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57 7.2. Incremental Deployment Incentives . . . . . . . . . . . . 57
8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 61 8. Architectural Rationale . . . . . . . . . . . . . . . . . . . 62
9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 64 9. Related Work . . . . . . . . . . . . . . . . . . . . . . . . . 65
9.1. Policing Rate Response to Congestion . . . . . . . . . . . 64 9.1. Policing Rate Response to Congestion . . . . . . . . . . . 65
9.2. Congestion Notification Integrity . . . . . . . . . . . . 65 9.2. Congestion Notification Integrity . . . . . . . . . . . . 66
9.3. Identifying Upstream and Downstream Congestion . . . . . . 66 9.3. Identifying Upstream and Downstream Congestion . . . . . . 67
10. Security Considerations . . . . . . . . . . . . . . . . . . . 66 10. Security Considerations . . . . . . . . . . . . . . . . . . . 67
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68 11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 68
12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 68 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 69
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 68 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 69
14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69
15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 69 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 70
15.1. Normative References . . . . . . . . . . . . . . . . . . . 69 15.1. Normative References . . . . . . . . . . . . . . . . . . . 70
15.2. Informative References . . . . . . . . . . . . . . . . . . 70 15.2. Informative References . . . . . . . . . . . . . . . . . . 70
Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 73 Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 74
Appendix B. Justification for Two Codepoints Signifying Zero Appendix B. Justification for Two Codepoints Signifying Zero
Worth Packets . . . . . . . . . . . . . . . . . . . . 74 Worth Packets . . . . . . . . . . . . . . . . . . . . 75
Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76 Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76
Appendix D. Packet Marking with FNE During Flow Start . . . . . . 77 Appendix D. Packet Marking with FNE During Flow Start . . . . . . 78
Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 79 Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 80
Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 79 Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 80
Appendix G. Policer Designs to ensure Congestion Appendix G. Policer Designs to ensure Congestion
Responsiveness . . . . . . . . . . . . . . . . . . . 80 Responsiveness . . . . . . . . . . . . . . . . . . . 80
G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 80 G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 80
G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 81 G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 82
Appendix H. Downstream Congestion Metering Algorithms . . . . . . 84 Appendix H. Downstream Congestion Metering Algorithms . . . . . . 84
H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 84 H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 84
H.2. Inflation Factor for Persistently Negative Flows . . . . . 85 H.2. Inflation Factor for Persistently Negative Flows . . . . . 85
Appendix I. Argument for holding back the ECN nonce . . . . . . . 85 Appendix I. Argument for holding back the ECN nonce . . . . . . . 86
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 87 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 88
Intellectual Property and Copyright Statements . . . . . . . . . . 89 Intellectual Property and Copyright Statements . . . . . . . . . . 90
1. Introduction 1. Introduction
This document aims: This document aims:
o To provide a complete specification of the addition of the re-ECN o To provide a complete specification of the addition of the re-ECN
protocol to IP and guidelines on how to add it to transport layer protocol to IP and guidelines on how to add it to transport layer
protocols, including a complete specification of re-ECN in TCP as protocols, including a complete specification of re-ECN in TCP as
an example; an example;
o To show how a number of hard problems become much easier to solve o To show how a number of hard problems become much easier to solve
once re-ECN is available in IP. once re-ECN is available in IP.
In ECN [RFC3168] queues probabilistically mark packets as they
approach a congested state. The receiver informs the sender that it
has seen one or more marks. In re-ECN the sender must predict the
level of congestion on the path by re-inserting feedback according to
the marking scheme described later in this draft. This results in
packets that carry a prediction of downstream congestion. If a
sender understates expected congestion compared to actual congestion
then the network could discard packets or enact some other sanction.
A policer that limits the congestion caused (or bases penalties on
it) can also be introduced at the ingress of networks.
A few key points are worth adding:
o It takes one round trip before any feedback is received. For
this reason a sender must make a conservative prediction by
transmitting IP packets with a special Feedback Not Established
(FNE) marking.
o The prediction is carried in-band in normal data packets and, for
many transports, feedback can be carried in the normal
acknowledgements or control packets.
o The re-ECN protocol is independent of the transport. In TCP,
acknowledgements are used to convey the feedback from receiver to
sender. This memo concentrates on TCP as an example transport
protocol; however, the re-ECN protocol is compatible with any
transport where feedback can be sent from receiver to sender.
A general statement of the problem solved by re-ECN is to provide A general statement of the problem solved by re-ECN is to provide
sufficient information in each IP datagram to be able to hold senders sufficient information in each IP datagram to be able to hold senders
and whole networks accountable for the congestion they cause and whole networks accountable for the congestion they cause
downstream, before they cause it. But the every-day problems that downstream, before they cause it. But the every-day problems that
re-ECN can solve are much more recognisable than this rather generic re-ECN can solve are much more recognisable than this rather generic
statement: mitigating distributed denial of service (DDoS); statement: mitigating distributed denial of service (DDoS);
simplifying differentiation of quality of service (QoS); policing simplifying differentiation of quality of service (QoS); policing
compliance to congestion control; and so on. compliance to congestion control; and so on.
Uniquely, re-ECN manages to enable solutions to these problems Uniquely, re-ECN manages to enable solutions to these problems
skipping to change at page 6, line 45 skipping to change at page 7, line 26
For instance, some network owners want to block applications like For instance, some network owners want to block applications like
voice and video unless their network is compensated for the extra voice and video unless their network is compensated for the extra
share of bottleneck bandwidth taken. These real-time applications share of bottleneck bandwidth taken. These real-time applications
tend to be unresponsive when congestion arises. Whereas elastic TCP- tend to be unresponsive when congestion arises. Whereas elastic TCP-
based applications back away quickly, ending up taking a much smaller based applications back away quickly, ending up taking a much smaller
share of congested capacity for themselves. Other network owners share of congested capacity for themselves. Other network owners
want to invest in large amounts of capacity and make their gains from want to invest in large amounts of capacity and make their gains from
simplicity of operation and economies of scale. simplicity of operation and economies of scale.
Re-ECN allows the more conservative networks to police out flows that re-ECN allows the more conservative networks to police out flows that
have not asked to be unresponsive to congestion---not because they have not asked to be unresponsive to congestion---not because they
are voice or video---just because they don't respond to congestion. are voice or video---just because they don't respond to congestion.
But it also allows other networks to choose not to police. But it also allows other networks to choose not to police.
Crucially, when flows from liberal networks cross into a conservative Crucially, when flows from liberal networks cross into a conservative
network, re-ECN enables the conservative network to apply penalties network, re-ECN enables the conservative network to apply penalties
to its neighbouring networks for the congestion they allow to be to its neighbouring networks for the congestion they allow to be
caused. And these penalties can be applied to bulk data, without caused. And these penalties can be applied to bulk data, without
regard to flows. regard to flows.
Then, if unresponsive applications become so dominant that some of Then, if unresponsive applications become so dominant that some of
the more liberal networks experience congestion collapse [RFC3714], the more liberal networks experience congestion collapse [RFC3714],
they can change their minds and use re-ECN to apply tighter controls they can change their minds and use re-ECN to apply tighter controls
in order to bring congestion back under control. in order to bring congestion back under control.
Re-ECN works by arranging that each packet arrives at each network re-ECN works by arranging that each packet arrives at each network
element carrying a view of expected congestion on its own downstream element carrying a view of expected congestion on its own downstream
path, albeit averaged over multiple packets. Most usefully, path, albeit averaged over multiple packets. Most usefully,
congestion on the remainder of the path becomes visible in the IP congestion on the remainder of the path becomes visible in the IP
header at the first ingress. Many of the applications of re-ECN header at the first ingress. Many of the applications of re-ECN
involve a policer at this ingress using the view of downstream involve a policer at this ingress using the view of downstream
congestion arriving in packets to police or control the packet rate. congestion arriving in packets to police or control the packet rate.
Importantly, the scheme is recursive: a whole network harbouring Importantly, the scheme is recursive: a whole network harbouring
users causing congestion in downstream networks can be held users causing congestion in downstream networks can be held
responsible or policed by its downstream neighbour. responsible or policed by its downstream neighbour.
This document is structured as follows. First an overview of the re- This document is structured as follows. First an overview of the re-
ECN protocol is given (Section 3), outlining its attributes and ECN protocol is given (Section 3), outlining its attributes and
explaining conceptually how it works as a whole. The two main parts explaining conceptually how it works as a whole. The two main parts
of the document follow, as described above. That is, the protocol of the document follow. That is, the protocol specification divided
specification divided into transport (Section 4) and network into transport (Section 4) and network (Section 5) layers which
(Section 5) layers, then the applications it can be put to, such as contain most of the standards compliance terminology, then the
policing DDoS, QoS and congestion control (Section 6). Although applications re-ECN can be put to, such as policing DDoS, QoS and
these applications do not require standardisation themselves, they congestion control (Section 6). Although these applications do not
are described in a fair degree of detail in order to explain how re- require standardisation themselves, they are described in a fair
ECN can be used. Given re-ECN proposes to use the last undefined bit degree of detail in order to explain how re-ECN can be used. Given
in the IPv4 header, we felt it necessary to outline the potential re-ECN proposes to use the last undefined bit in the IPv4 header, we
that re-ECN could release in return for being given that bit. felt it necessary to outline the potential that re-ECN could release
in return for being given that bit.
Deployment issues discussed throughout the document are brought Deployment issues discussed throughout the document are brought
together in Section 7, which is followed by a brief section together in Section 7, which is followed by a brief section
explaining the somewhat subtle rationale for the design from an explaining the somewhat subtle rationale for the design from an
architectural perspective (Section 8). We end by describing related architectural perspective (Section 8). We end by describing related
work (Section 9), listing security considerations (Section 10) and work (Section 9), listing security considerations (Section 10) and
finally drawing conclusions (Section 12). finally drawing conclusions (Section 12).
2. Requirements notation 2. Requirements notation
skipping to change at page 8, line 15 skipping to change at page 8, line 45
document considers many cases where malicious nodes may not comply document considers many cases where malicious nodes may not comply
with the protocol. When such contingencies are described, if any of with the protocol. When such contingencies are described, if any of
the above keywords are not capitalised, that is deliberate. So, for the above keywords are not capitalised, that is deliberate. So, for
instance, the following two apparently contradictory sentences would instance, the following two apparently contradictory sentences would
be perfectly consistent: i) x MUST do this; ii) x may not do this. be perfectly consistent: i) x MUST do this; ii) x may not do this.
3. Protocol Overview 3. Protocol Overview
3.1. Background and Applicability 3.1. Background and Applicability
First we briefly recap the essentials of the ECN protocol [RFC3168]. The re-ECN protocol makes no changes to, and has no effect on, the TCP
Two bits in the IP protocol (v4 or v6) are assigned to the ECN field. congestion control algorithm or other rate responses to
The sender clears the field to "00" (Not-ECT) if either end-point congestion. re-ECN is not a new congestion control protocol; rather,
transport is not ECN-capable. Otherwise it indicates an ECN-capable it is orthogonal to congestion control itself. Re-ECN is concerned
transport (ECT) using either of the two code-points "10" or "01" with revealing information about congestion so that users and
(ECT(0) and ECT(1) resp.). networks can be held accountable for the congestion they cause, or
allow to be caused.
ECN-capable routers probabilistically set "11" if congestion is Re-ECN builds on ECN so we briefly recap the essentials of the ECN
protocol [RFC3168]. Two bits in the IP protocol (v4 or v6) are
assigned to the ECN field. The sender clears the field to "00" (Not-
ECT) if either end-point transport is not ECN-capable. Otherwise it
indicates an ECN-capable transport (ECT) using either of the two
code-points "10" or "01" (ECT(0) and ECT(1) resp.).
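As an illustration only, the four ECN code-points recapped above can be
written down as a small Python sketch. The names EcnCodepoint,
ecn_from_tos and is_ect are hypothetical and do not appear in either
draft; the only facts assumed are the bit patterns given above and that
the ECN field occupies the two least significant bits of the IPv4 TOS
(or IPv6 Traffic Class) octet, as in [RFC3168].

```python
from enum import IntEnum


class EcnCodepoint(IntEnum):
    """Two-bit ECN field values as recapped from RFC 3168 (names are illustrative)."""
    NOT_ECT = 0b00  # either end-point transport is not ECN-capable
    ECT_1 = 0b01    # ECN-capable transport, ECT(1)
    ECT_0 = 0b10    # ECN-capable transport, ECT(0)
    CE = 0b11       # congestion experienced


def ecn_from_tos(tos_byte: int) -> EcnCodepoint:
    """Return the ECN field held in the two low-order bits of the octet."""
    return EcnCodepoint(tos_byte & 0b11)


def is_ect(codepoint: EcnCodepoint) -> bool:
    """True if the packet declares an ECN-capable transport."""
    return codepoint in (EcnCodepoint.ECT_0, EcnCodepoint.ECT_1)
```

This sketch covers the two-bit ECN field only; it does not model the
extended ECN field that re-ECN defines.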
ECN-capable queues probabilistically set "11" if congestion is
experienced (CE), the marking probability increasing with the length experienced (CE), the marking probability increasing with the length
of the queue at its egress link (typically using the RED of the queue at its egress link (typically using the RED
algorithm [RFC2309]). However, they still drop rather than mark Not- algorithm [RFC2309]). However, they still drop rather than mark Not-
ECT packets. With multiple ECN-capable routers on a path, a flow of ECT packets. With multiple ECN-capable queues on a path, a flow of
packets accumulates the fraction of CE marking that each router adds. packets accumulates the fraction of CE marking that each queue adds.
The combined effect of the packet marking of all the routers along The combined effect of the packet marking of all the queues along the
the path signals congestion of the whole path to the receiver. So, path signals congestion of the whole path to the receiver. So, for
for example, if one router early in a path is marking 1% of packets example, if one queue early in a path is marking 1% of packets and
and another later in a path is marking 2%, flows that pass through another later in a path is marking 2%, flows that pass through both
both routers will experience approximately 3% marking (see Appendix A queues will experience approximately 3% marking (see Appendix A for a
for a precise treatment). precise treatment).
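The "approximately 3%" figure above can be checked with a short calculation. The sketch below is illustrative only: it assumes each queue marks packets independently of the others, and the function name is hypothetical. It does not reproduce the analysis in Appendix A.

```python
def combined_marking(fractions):
    """Fraction of packets carrying CE after passing every queue in turn.

    Assumes each queue marks independently, so a packet stays unmarked
    only if every queue on the path leaves it unmarked.
    """
    unmarked = 1.0
    for p in fractions:
        unmarked *= 1.0 - p
    return 1.0 - unmarked


path = [0.01, 0.02]  # the 1% and 2% queues from the example above
print(f"combined: {combined_marking(path):.4f}  simple sum: {sum(path):.4f}")
```

Under that assumption the combined fraction is 1 - (0.99 * 0.98) = 0.0298, slightly below the simple sum of 0.03, which is why the text says "approximately".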
The choice of two ECT code-points in the ECN field [RFC3168] The choice of two ECT code-points in the ECN field [RFC3168]
permitted future flexibility, optionally allowing the sender to permitted future flexibility, optionally allowing the sender to
encode the experimental ECN nonce [RFC3540] in the packet stream. encode the experimental ECN nonce [RFC3540] in the packet stream.
The nonce is designed to allow a sender to check the integrity of The nonce is designed to allow a sender to check the integrity of
congestion feedback. But Section 9.2 explains that it still gives no congestion feedback. But Section 9.2 explains that it still gives no
control over how fast the sender transmits as a result of the control over how fast the sender transmits as a result of the
feedback. On the other hand, re-ECN is designed both to ensure that feedback. On the other hand, re-ECN is designed both to ensure that
congestion is declared honestly and that the sender's rate responds congestion is declared honestly and that the sender's rate responds
appropriately. appropriately.
skipping to change at page 9, line 10 skipping to change at page 9, line 48
re-inserted or re-echoed feedback. But it actually works even when re-inserted or re-echoed feedback. But it actually works even when
no feedback is available. In fact it has been carefully designed to no feedback is available. In fact it has been carefully designed to
work for single datagram flows. It also encourages aggregation of work for single datagram flows. It also encourages aggregation of
single packet flows by congestion control proxies. Then, even if the single packet flows by congestion control proxies. Then, even if the
traffic mix of the Internet were to become dominated by short traffic mix of the Internet were to become dominated by short
messages, it would still be possible to control congestion messages, it would still be possible to control congestion
effectively and efficiently. effectively and efficiently.
Changing the Internet's feedback architecture seems to imply Changing the Internet's feedback architecture seems to imply
considerable upheaval. But re-ECN can be deployed incrementally at considerable upheaval. But re-ECN can be deployed incrementally at
the transport layer around unmodified routers using existing fields the transport layer around unmodified queues using existing fields in
in IP (v4 or v6). However it does also require the last undefined IP (v4 or v6). However it does also require the last undefined bit
bit in the IPv4 header, which it uses in combination with the 2-bit in the IPv4 header, which it uses in combination with the 2-bit ECN
ECN field to create four new codepoints. Nonetheless, changes to IP field to create four new codepoints. Nonetheless, we RECOMMEND
routers are RECOMMENDED in order to improve resilience against DoS adding optional preferential drop to IP queues based on the re-ECN
attacks. Similarly, re-ECN works best if both the sender and fields in order to improve resilience against DoS attacks.
receiver transports are re-ECN-capable, but it can work with just Similarly, re-ECN works best if both the sender and receiver
sender support. Section 7.1 summarises the incremental deployment transports are re-ECN-capable, but it can work with just sender
strategy. support. Section 7.1 summarises the incremental deployment strategy.
The re-ECN protocol makes no changes and has no effect on the TCP
congestion control algorithm or on other rate responses to
congestion. Re-ECN is only concerned with enabling the ingress
network to police that a source is complying with a congestion
control algorithm, which is orthogonal to congestion control itself.
Before re-ECN can be considered worthy of using up the last bit in Before re-ECN can be considered worthy of using up the last bit in
the IP header, we must be sure that all our claims are robust. We the IP header, we must be sure that all our claims are robust. We
have gradually been reducing the list of outstanding issues, but the have gradually been reducing the list of outstanding issues, but the
few that still remain are listed in Section 6.3. We expect new few that still remain are listed in Section 6.3. We expect new
attacks may still be found, but we offer the re-ECN protocol on the attacks may still be found, but we offer the re-ECN protocol on the
basis that it is built on fairly solid theoretical foundations and, basis that it is built on fairly solid theoretical foundations and,
so far, it has proved possible to keep it relatively robust. so far, it has proved possible to keep it relatively robust.
3.2. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6) 3.2. Simplified Re-ECN Protocol
We describe here the simplified re-ECN protocol. In this first
description we assume packets and segments are synonymous.
Packets are sent from a sender to a receiver. In Figure 1 the queues
(Q1 and Q2) are ECN enabled as per [RFC3168]. If congestion
occurs then packets are marked with the congestion experienced (CE)
flag exactly as in the ECN protocol [RFC3168]; the routers do not
need to be modified and do not need to know the re-ECN protocol. On
reception of marked packets the receiver notifies the sender of the
current count of marked packets. Note that this is the number of
packets marked rather than the setting of the ECE flag in ECN. The
sender uses this information to re-echo mark packets in exact
correspondence to the number of CE marked bytes observed at the
receiver.
+--------- Feedback----------+
| |
v |
+---+ +----+ +----+ +---+
| | RE | | | | | |
| S |--->| Q1 |--->| Q2 |--->| R |
| | | | | | | |
+---+ +----+ +----+ +---+
Figure 1: Simple Re-ECN
3.3. Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)
The re-ECN wire protocol uses the two bit ECN field broadly as in The re-ECN wire protocol uses the two bit ECN field broadly as in
RFC3168 [RFC3168] as described above, but with five differences of RFC3168 [RFC3168] as described above, but with five differences of
detail (brought together in a list in Section 7.1). This detail (brought together in a list in Section 7.1). This
specification defines a new re-ECN extension (RE) flag. We will specification defines a new re-ECN extension (RE) flag. We will
defer the definition of the actual position of the RE flag in the defer the definition of the actual position of the RE flag in the
IPv4 & v6 headers until Section 5. Until then it will suffice to use IPv4 & v6 headers until Section 5. When we don't need to choose
an abstraction of the IPv4 and v6 wire protocols by just calling it between IPv4 and v6 wire protocols it will suffice to call it the RE
the RE flag. flag.
Unlike the ECN field, the RE flag is intended to be set by the sender Unlike the ECN field, the RE flag is intended to be set by the sender
and remain unchanged along the path, although it can be read by and remain unchanged along the path, although it can be read by
network elements that understand the re-ECN protocol. It is feasible network elements that understand the re-ECN protocol. It is feasible
that a network element MAY change the setting of the RE flag, perhaps that a network element MAY change the setting of the RE flag, perhaps
acting as a proxy for an end-point, but such a protocol would have to acting as a proxy for an end-point, but such a protocol would have to
be defined in another specification (e.g. [Re-PCN]). be defined in another specification (e.g. [Re-PCN]).
Although the RE flag is a separate, single bit field, it can be read Although the RE flag is a separate, single bit field, it can be read
as an extension to the two-bit ECN field; the three concatenated bits as an extension to the two-bit ECN field; the three concatenated bits
in what we will call the extended ECN field (EECN) making eight in what we will call the extended ECN field (EECN) giving eight
codepoints. We will use the RFC3168 names of the ECN codepoints to codepoints. We will use the RFC3168 names of the ECN codepoints to
describe settings of the ECN field when the RE flag setting is "don't describe settings of the ECN field when the RE flag setting is "don't
care", but we also define the following six extended ECN codepoint care", but we also define the following six extended ECN codepoint
names for when we need to be more specific. names for when we need to be more specific.
RFC3168 ECN defines uses for all four codepoints of the two-bit ECN One of re-ECN's codepoints is an alternative use of the codepoint set
field. This memo widens the codepoint space to eight, and uses six aside in RFC3168 for the ECN nonce (ECT(1)). Transports using re-ECN
codepoints. One of re-ECN's codepoints is an alternative use of the do not need to use the ECN nonce as long as the sender is also
codepoint set aside in RFC3168 for the ECN nonce (ECT(1)). checking for transport protocol compliance
Transports not using re-ECN can still use the ECN nonce, while those [I-D.moncaster-tcpm-rcv-cheat]. The case for doing this is given in
using re-ECN do not need to as long as the sender is also checking Appendix I. Two re-ECN codepoints are given compatible uses to those
for transport protocol compliance [I-D.moncaster-tcpm-rcv-cheat]. defined in RFC3168 (Not-ECT and CE). The other codepoint used by
The case for doing this is given in Appendix I. Two re-ECN RFC3168 (ECT(0)) isn't used for re-ECN. Altogether this leaves one
codepoints are given compatible uses to those defined in RFC3168 codepoint of the eight unused by ECN or re-ECN and available for
(Not-ECT and CE). The other codepoint used by RFC3168 (ECT(0)) isn't future use.
used for re-ECN. Altogether this leaves one codepoint of the eight
unused and available for future use.
+-------+------------+------+--------------+------------------------+ +-------+------------+------+--------------+------------------------+
| ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning | | ECN | RFC3168 | RE | Extended ECN | Re-ECN meaning |
| field | codepoint | flag | codepoint | | | field | codepoint | flag | codepoint | |
+-------+------------+------+--------------+------------------------+ +-------+------------+------+--------------+------------------------+
| 00 | Not-ECT | 0 | Not-RECT | Not re-ECN-capable | | 00 | Not-ECT | 0 | Not-ECT | Not re-ECN-capable |
| | | | | transport | | | | | | transport |
| 00 | Not-ECT | 1 | FNE | Feedback not | | 00 | --- | 1 | FNE | Feedback not |
| | | | | established | | | | | | established |
| 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion | | 01 | ECT(1) | 0 | Re-Echo | Re-echoed congestion |
| | | | | and RECT | | | | | | and RECT |
| 01 | ECT(1) | 1 | RECT | Re-ECN capable | | 01 | --- | 1 | RECT | Re-ECN capable |
| | | | | transport | | | | | | transport |
| 10 | ECT(0) | 0 | --- | Legacy ECN use only | | 10 | ECT(0) | 0 | ECT(0) | RFC3168 ECN use only |
| | | | | | | | | | | |
| 10 | ECT(0) | 1 | --CU-- | Currently unused | | 10 | --- | 1 | --CU-- | Currently unused |
| | | | | | | | | | | |
| 11 | CE | 0 | CE(0) | Re-Echo canceled by | | 11 | CE | 0 | CE(0) | Re-Echo canceled by |
| | | | | congestion experienced | | | | | | congestion experienced |
| 11 | CE | 1 | CE(-1) | Congestion experienced | | 11 | --- | 1 | CE(-1) | Congestion experienced |
+-------+------------+------+--------------+------------------------+ +-------+------------+------+--------------+------------------------+
Table 1: Extended ECN Codepoints Table 1: Extended ECN Codepoints
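As a reading aid for Table 1, the eight codepoints can be written as a lookup keyed on the ECN field and the RE flag. This is a sketch, not part of the specification: the names follow the -06 column of the table, and the identifiers `EECN_CODEPOINTS`, `eecn_name` and `congestion_mark` are hypothetical. Treating FNE like Not-ECT at a queue (because its ECN field is also "00") is an assumption made here for illustration.

```python
# Keys are (2-bit ECN field, RE flag); names follow the -06 column of Table 1.
EECN_CODEPOINTS = {
    (0b00, 0): "Not-ECT",
    (0b00, 1): "FNE",
    (0b01, 0): "Re-Echo",
    (0b01, 1): "RECT",
    (0b10, 0): "ECT(0)",
    (0b10, 1): "--CU--",
    (0b11, 0): "CE(0)",
    (0b11, 1): "CE(-1)",
}


def eecn_name(ecn_bits, re_flag):
    """Return the extended ECN codepoint name for an (ECN, RE) pair."""
    return EECN_CODEPOINTS[(ecn_bits & 0b11, re_flag & 1)]


def congestion_mark(ecn_bits, re_flag):
    """Model a queue setting CE: the ECN field becomes 11, RE is untouched.

    Returns None when the ECN field is 00, because the text says such
    packets are dropped rather than marked (assumed here to cover FNE too).
    """
    if ecn_bits == 0b00:
        return None
    return (0b11, re_flag)


print(eecn_name(*congestion_mark(0b01, 1)))  # RECT after marking
print(eecn_name(*congestion_mark(0b01, 0)))  # Re-Echo after marking
```

Because the queue only touches the ECN field, a marked RECT packet becomes CE(-1) and a marked Re-Echo packet becomes CE(0), matching the last two rows of the table.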
3.3. Re-ECN Protocol Operation 3.4. Re-ECN Protocol Operation
In this section we will give an overview of the operation of the re- In this section we will give an overview of the operation of the re-
ECN protocol for TCP/IP, leaving a detailed specification to the ECN protocol for TCP/IP, leaving a detailed specification to the
following sections. Other transports will be discussed later. following sections. Other transports will be discussed later.
In summary, the protocol adds a third `re-echo' stage to the existing In summary, the protocol adds a third `re-echo' stage to the existing
TCP/IP ECN protocol. Whenever the network adds CE congestion TCP/IP ECN protocol. Whenever the network adds CE congestion
signalling to the IP header on the forward data path, the receiver signalling to the IP header on the forward data path, the receiver
feeds it back to the ingress using TCP, then the sender re-echoes it feeds it back to the ingress using TCP, then the sender re-echoes it
into the forward data path using the RE flag in the next packet. into the forward data path using the RE flag in the next packet.
Prior to receiving any feedback a sender will not know which setting Prior to receiving any feedback a sender will not know which setting
of the RE flag to use, so it sets the feedback not established (FNE) of the RE flag to use, so it sets the feedback not established (FNE)
codepoint. The network reads the FNE codepoint conservatively as codepoint. The network reads the FNE codepoint conservatively as
equivalent to re-echoed congestion. equivalent to re-echoed congestion.
Specifically, once a flow is established, a re-ECN sender always Specifically, once feedback from a flow is established, a re-ECN
initialises the ECN field to ECT(1). And it usually sets the RE flag sender always initialises the ECN field to ECT(1). And it usually
to "1". Whenever a router re-marks a packet to CE, the receiver sets the RE flag to "1". Whenever a queue marks a packet to CE, the
feeds back this event to the sender. On receiving this feedback, the receiver feeds back this event to the sender. On receiving this
re-ECN sender will clear the RE flag to "0" in the next packet it feedback, the re-ECN sender will clear the RE flag to "0" in the next
sends. packet it sends.
We chose to set and clear the RE flag this way round to ease We chose to set and clear the RE flag this way round to ease
incremental deployment (see Section 7.1). To avoid confusion we will incremental deployment (see Section 7.1). To avoid confusion we will
use the term `blanking' (rather than marking) when the RE flag is use the term `blanking' (rather than marking) when the RE flag is
cleared to "0". So, over a stream of packets, we will talk of the cleared to "0". So, over a stream of packets, we will talk of the
`RE blanking fraction' as the fraction of octets in packets with the `RE blanking fraction' as the fraction of octets in packets with the
RE flag cleared to "0". RE flag cleared to "0".
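The sender behaviour just described can be sketched as a small state holder. This is a simplified illustration under stated assumptions: it counts packets rather than octets, it omits the FNE bootstrap phase, and the class and method names are hypothetical rather than taken from the specification.

```python
class ReEcnSender:
    """Packet-granularity sketch of RE flag setting after feedback is established."""

    ECT1 = 0b01  # the sender always initialises the ECN field to ECT(1)

    def __init__(self):
        # CE marks reported by the receiver that have not yet been re-echoed.
        self.pending_re_echoes = 0

    def on_congestion_feedback(self, newly_marked=1):
        """Record feedback that `newly_marked` more packets arrived CE-marked."""
        self.pending_re_echoes += newly_marked

    def next_packet_fields(self):
        """Return (ECN field, RE flag) for the next outgoing packet."""
        if self.pending_re_echoes > 0:
            self.pending_re_echoes -= 1
            return (self.ECT1, 0)  # RE blanked: Re-Echo codepoint
        return (self.ECT1, 1)      # usual case: RECT codepoint


sender = ReEcnSender()
sender.on_congestion_feedback()
print([sender.next_packet_fields() for _ in range(3)])
```

Over a long stream, the share of packets returned with the RE flag blanked is what the text calls the RE blanking fraction.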
_ _ _ _ +---+ +----+ +----+ +---+
/ \ / \ / \ / \ | S |--| Q1 |----------------| Q2 |--| R |
| S |--| 0 | - - - - - - - - | i |--| D | +---+ +----+ +----+ +---+
\ _ / \ _ / \ _ / \ _ /
. . . . . . . .
^ . . . . ^ . . . .
| . . . . | . . . .
| . RE blanking fraction . . | . RE blanking fraction . .
3% |-------------------------------+======= 3% |-------------------------------+=======
| . . | . | . . | .
2% | . . | . 2% | . . | .
| . . CE marking fraction | . | . . CE marking fraction | .
1% | . +----------------------+ . 1% | . +----------------------+ .
| . | . . | . | . .
0% +---------------------------------------> 0% +--------------------------------------->
^ 0 ^ i ^ resource index ^ ^ ^
0 ^ 1 ^ 2 observation points L M N Observation points
| |
1.00% 2.00% marking fraction
Figure 1: A 2-Router Example (Imprecise) Figure 2: A 2-Queue Example (Imprecise)
Figure 1 uses a simple network to illustrate how re-ECN allows Figure 2 uses a simple network to illustrate how re-ECN allows queues
routers to measure downstream congestion. The horizontal axis to measure downstream congestion. The receiver views a CE marking
represents the index of each congestible resource (typically queues) fraction of 3% which is fed back to the sender. The sender sets an
along a path through the Internet. There may be many routers on the RE blanking fraction of 3% to match this. This RE blanking fraction
path, but we assume only two are currently congested (those with can be observed along the path as the RE flag is not changed by
resource index 0 and i). The two superimposed plots show the network nodes once set by the sender. This is shown by the
fraction of each extended ECN codepoint in a flow observed along this horizontal line at 3% in the figure. The CE marked fraction is shown
path. Given about 3% of packets reaching the destination are marked by the stepped line which rises to meet the RE blanking fraction line
CE, in response to feedback the sender will blank the RE flag in with steps at each queue where packets are marked. Two queues are
about 3% of packets it sends. Then approximate downstream congestion shown (Q1 and Q2) that are currently congested. Each time packets
can be measured at the observation points shown along the path by pass through, a fraction is marked (1% at Q1 and 2% at Q2). The
subtracting the CE marking fraction from the RE blanking fraction, as approximate downstream congestion can be measured at the observation
shown in the table below (Appendix A derives these approximations points shown along the path by subtracting the CE marking fraction
from a precise analysis). from the RE blanking fraction, as shown in the table below
(Appendix A derives these approximations from a precise analysis).
+-------------------+------------------------------+ +-------------------+------------------------------+
| Observation point | Approx downstream congestion | | Observation point | Approx downstream congestion |
+-------------------+------------------------------+ +-------------------+------------------------------+
| 0 | 3% - 0% = 3% | | L | 3% - 0% = 3% |
| 1 | 3% - 1% = 2% | | M | 3% - 1% = 2% |
| 2 | 3% - 3% = 0% | | N | 3% - 3% = 0% |
+-------------------+------------------------------+ +-------------------+------------------------------+
Table 2: Downstream Congestion Measured at Example Observation Points Table 2: Downstream Congestion Measured at Example Observation Points
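The subtraction in Table 2 can be reproduced directly. The sketch below uses the approximate fractions from the figure; the observation point labels follow the -06 column (L, M, N) and the variable and function names are hypothetical.

```python
RE_BLANKING = 0.03  # set by the sender, unchanged along the path

# Cumulative CE marking fraction seen at each observation point.
CE_AT_POINT = {"L": 0.00, "M": 0.01, "N": 0.03}


def downstream_congestion(re_blanking, ce_marking):
    """Approximate congestion still to come: whole-path minus upstream."""
    return re_blanking - ce_marking


for point, ce in CE_AT_POINT.items():
    print(f"{point}: {downstream_congestion(RE_BLANKING, ce):.0%}")
```

The RE blanking fraction plays the role of whole-path congestion and the CE fraction that of upstream congestion, so their difference is the estimate for the rest of the path.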
All along the path, whole-path congestion remains unchanged so it can All along the path, whole-path congestion remains unchanged so it can
be used as a reference against which to compare upstream congestion. be used as a reference against which to compare upstream congestion.
The difference predicts downstream congestion for the rest of the The difference predicts downstream congestion for the rest of the
path. Therefore, measuring the fractions of each codepoint at any path. Therefore, measuring the fractions of each codepoint at any
point in the Internet will reveal upstream, downstream and whole path point in the Internet will reveal upstream, downstream and whole path
congestion. congestion.
Note that we have introduced discussion of marking and blanking Note that we have introduced discussion of marking and blanking
fractions solely for illustration. To be absolutely clear, these fractions solely for illustration. To be absolutely clear, for TCP
fractions are averages that would result from the behaviour of a TCP these fractions are averages that would result from the behaviour of
protocol handler mechanically blanking outgoing packets in direct the protocol handler mechanically blanking outgoing packets in direct
response to incoming feedback---we are not saying any protocol response to incoming feedback---we are not saying any protocol
handler works with these average fractions directly. handler has to work with these average fractions directly.
3.4. Informal Terminology 3.5. Informal Terminology
In the rest of this memo we will loosely talk of positive or negative In the rest of this memo we will loosely talk of positive or negative
flows, meaning flows where the moving average of the downstream flows, meaning flows where the moving average of the downstream
congestion metric is persistently positive or negative. The notion congestion metric is persistently positive or negative. A negative
of a negative metric arises because it is derived by subtracting one flow is one where more CE marked packets than re-ECN blanked packets
metric from another. Of course actual downstream congestion cannot arrive. Likewise in positive flows more re-ECN blanked packets
be negative, only the metric can (whether due to time lags or arrive than CE marked packets. The notion of a negative metric
deliberate malice). arises because it is derived by subtracting one metric from another.
Of course actual downstream congestion cannot be negative, only the
metric can (whether due to time lags or deliberate malice).
Just as we will loosely talk of positive and negative flows, we will Just as we will loosely talk of positive and negative flows, we will
also talk of positive or negative packets, meaning packets that also talk of positive or negative packets, meaning packets that
contribute positively or negatively to the downstream congestion contribute positively or negatively to the downstream congestion
metric. metric.
Therefore we will talk of packets having `worth' of +1, 0 or -1, Therefore we will talk of packets having `worth' of +1, 0 or -1,
which, when multiplied by their size, indicates their contribution to which, when multiplied by their size, indicates their contribution to
the downstream congestion metric. the downstream congestion metric.
Figure 2 shows the main state transitions of the system once a flow The idea is that most packets start with zero worth. Every time the
is established, showing the worth of packets in each state. When the network decrements the worth of a packet, the sender increments the
network congestion marks a packet it decrements its worth (moving worth of a later packet. Then, over time, as many positive octets
from the left of the main square to the right). When the sender should arrive at the receiver as negative. Note we have said octets
blanks the RE flag in order to re-echo congestion it increments the not packets, so if packets are of different sizes, the worth should
worth of a packet (moving from the bottom of the main square to the be incremented on enough octets to balance the octets in negative
top). packets arriving at the receiver. It is this balance that will allow
the network to hold the sender accountable for the congestion it
Sender state Sent Worth Received Worth causes.
packet packet
+----------------------------------------------------+
| ^
V |
Congestion echoed -->Re-Echo +1 --+---> CE(0) 0 --+
(positive) | (canceled) |
V network |
| congestion |
| |
Flow established --> RECT 0 ----+-> CE(-1) -1 --+
^ (neutral) | | (negative)
| | |
| no V V
| congestion | |
+-----------<--------------+-+
Figure 2: Re-ECN System State Diagram (bootstrap not shown)
The idea is that every time the network decrements the worth of a
packet, the sender increments the worth of a later packet. Then,
over time, as many positive octets should arrive at the receiver as
negative. Note we have said octets not packets, so if packets are of
different sizes, the worth should be incremented on enough octets to
balance the octets in negative packets arriving at the receiver. It
is this balance that will allow the network to hold the sender
accountable for the congestion it causes, as we shall see. The
informal outline below uses TCP as an example transport, but the idea
would be broadly similar for any transport that adapts its rate to
congestion.
We will start with the sender in `flow established' state. Normally,
as acknowledgements of earlier packets arrive that don't feedback any
congestion, the congestion window can be opened, so the sender goes
round the smaller sub-loop, sending RECT packets (worth 0) and
returning to the flow established state to send another one. If a
router congestion marks one of the packets, it decrements the
packet's worth. The sender will have been continuing to traverse
round the smaller feedback loop every time acknowledgements arrive.
But when congestion feedback returns from this packet that was marked
with -1 worth (the largest loop in the figure) the sender jumps to
the congestion echoed state in order to re-echo the congestion,
incrementing the worth of the next packet to +1 by blanking its RE
flag. The sender then returns to the flow established state and
continues round the smaller loop, sending packets worth 0. Note that
the size of the loops is just an artefact of the figure; it is not
meant to imply that one loop is slower than the other - they are both
the same end to end feedback loop.
If a packet carrying re-echoed congestion happens to also be If a packet carrying re-echoed congestion happens to also be
congestion marked, the +1 worth added by the sender will be cancelled congestion marked, the +1 worth added by the sender will be cancelled
out by the -1 network congestion marking. Although the two worth out by the -1 network congestion marking. Although the two worth
values correctly cancel out, neither the congestion marking nor the values correctly cancel out, neither the congestion marking nor the
re-echoed congestion are lost, because the RE bit and the ECN field re-echoed congestion are lost, because the RE bit and the ECN field
are orthogonal. So, whenever this happens, the receiver will are orthogonal. So, whenever this happens, the receiver will
correctly detect and re-echo the new congestion event as well (the correctly detect and re-echo the new congestion event as well.
top sub-loop). When we need to distinguish, we will sometimes call a
packet marked RECT 'neutral' (0 worth), while we will call the CE(0)
marking 'canceled' (also 0 worth). If a re-echoed packet isn't
unlucky enough to be further congestion marked, the sender will
return to the flow established state and continue to send RECT
packets (worth 0).
The table below specifies unambiguously the worth of each extended The table below specifies unambiguously the worth of each extended
ECN codepoint. Note the order is different from the previous table ECN codepoint. Note the order is different from the previous table
to better show how the worth increments and decrements. The FNE to better show how the worth increments and decrements. The FNE
codepoint is an exception. It is used in the flow bootstrap process codepoint is used in the flow bootstrap process (explained later) and
(explained later) and has the same positive (+1) worth as a packet has the same positive (+1) worth as a packet with the Re-Echo
with the Re-Echo codepoint. codepoint.
+--------+------+----------------+-------+--------------------------+ +--------+------+----------------+-------+--------------------------+
| ECN | RE | Extended ECN | Worth | Re-ECN meaning | | ECN | RE | Extended ECN | Worth | Re-ECN meaning |
| field | bit | codepoint | | | | field | bit | codepoint | | |
+--------+------+----------------+-------+--------------------------+ +--------+------+----------------+-------+--------------------------+
| 00 | 0 | Not-RECT | ... | Not re-ECN-capable | | 00 | 0 | Not-RECT | ... | Not re-ECN-capable |
| | | | | transport | | | | | | transport |
| 00 | 1 | FNE | +1 | Feedback not established |
| 01 | 0 | Re-Echo | +1 | Re-echoed congestion and | | 01 | 0 | Re-Echo | +1 | Re-echoed congestion and |
| | | | | RECT | | | | | | RECT |
| 10 | 0 | --- | ... | Legacy ECN use only | | 10 | 0 | --- | ... | RFC3168 ECN use only |
| 11 | 0 | CE(0) | 0 | Re-Echo canceled by | | 11 | 0 | CE(0) | 0 | Re-Echo canceled by |
| | | | | congestion experienced | | | | | | congestion experienced |
| 00 | 1 | FNE | +1 | Feedback not established |
| 01 | 1 | RECT | 0 | Re-ECN capable transport | | 01 | 1 | RECT | 0 | Re-ECN capable transport |
| 10 | 1 | --CU-- | ... | Currently unused | | 10 | 1 | --CU-- | ... | Currently unused |
| | | | | | | | | | | |
| 11 | 1 | CE(-1) | -1 | Congestion experienced | | 11 | 1 | CE(-1) | -1 | Congestion experienced |
+--------+------+----------------+-------+--------------------------+ +--------+------+----------------+-------+--------------------------+
Table 3: 'Worth' of Extended ECN Codepoints Table 3: 'Worth' of Extended ECN Codepoints
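To illustrate how worth and packet size combine, the sketch below sums worth multiplied by size over a set of observed packets. Dividing by the total octets to obtain a fraction is an assumption made here for readability, as is ignoring codepoints whose worth is shown as "..." in Table 3. The names `WORTH` and `downstream_metric` are hypothetical.

```python
# Worth of each re-ECN codepoint, as listed in Table 3.
WORTH = {"FNE": +1, "Re-Echo": +1, "CE(0)": 0, "RECT": 0, "CE(-1)": -1}


def downstream_metric(packets):
    """Octet-weighted downstream congestion metric as a fraction.

    `packets` is an iterable of (codepoint name, size in octets).
    Codepoints with no worth in Table 3 are skipped (an assumption).
    """
    weighted = 0
    total_octets = 0
    for codepoint, size in packets:
        if codepoint not in WORTH:
            continue
        weighted += WORTH[codepoint] * size
        total_octets += size
    return weighted / total_octets if total_octets else 0.0


# 100 equal-sized packets with 3 blanked by the sender and 1 marked upstream.
observed = [("Re-Echo", 1500)] * 3 + [("CE(-1)", 1500)] + [("RECT", 1500)] * 96
print(downstream_metric(observed))
```

With 3% of octets positive and 1% negative, the metric comes out at 0.02. A flow whose metric stays below zero is what the text calls a negative flow.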
4. Transport Layers 4. Transport Layers
4.1. TCP 4.1. TCP
Re-ECN capability at the sender is essential. At the receiver it is Re-ECN capability at the sender is essential. At the receiver it is
optional, as long as the receiver has a basic (`vanilla flavour') optional, as long as the receiver has a basic RFC3168-compliant ECN-
RFC3168-compliant ECN-capable transport (ECT) [RFC3168]. Given re- capable transport (ECT) [RFC3168]. Given re-ECN is not the first
ECN is not the first attempt to define the semantics of the ECN attempt to define the semantics of the ECN field, we give a table
field, we give a table below summarising what happens for various below summarising what happens for various combinations of
combinations of capabilities of the sender S and receiver R, as capabilities of the sender S and receiver R, as indicated in the
indicated in the first four columns below. The last column gives the first four columns below. The last column gives the mode a half-
mode a half-connection should be in after the first two of the three connection should be in after the first two of the three TCP
TCP handshakes. handshakes.
+--------+--------------+------------+---------+--------------------+ +--------+--------------+------------+---------+--------------------+
| Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R | | Re-ECT | ECT-Nonce | ECT | Not-ECT | S-R |
| | (RFC3540) | (RFC3168) | | Half-connection | | | (RFC3540) | (RFC3168) | | Half-connection |
| | | | | Mode | | | | | | Mode |
+--------+--------------+------------+---------+--------------------+ +--------+--------------+------------+---------+--------------------+
| SR | | | | RECN | | SR | | | | RECN |
| S | R | | | RECN-Co | | S | R | | | RECN-Co |
| S | | R | | RECN-Co | | S | | R | | RECN-Co |
| S | | | R | Not-ECT | | S | | | R | Not-ECT |
skipping to change at page 16, line 32 skipping to change at page 16, line 37
Table 4: Modes of TCP Half-connection for Combinations of ECN Table 4: Modes of TCP Half-connection for Combinations of ECN
Capabilities of Sender S and Receiver R Capabilities of Sender S and Receiver R
We will describe what happens in each mode, then describe how they We will describe what happens in each mode, then describe how they
are negotiated. The abbreviations for the modes in the above table are negotiated. The abbreviations for the modes in the above table
mean: mean:
RECN: Full re-ECN capable transport RECN: Full re-ECN capable transport
RECN-Co: Re-ECN sender in compatibility mode with a RECN-Co: Re-ECN sender in compatibility mode with an RFC3168
vanilla [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable compliant [RFC3168] ECN receiver or an [RFC3540] ECN nonce-capable
receiver. Implementation of this mode is OPTIONAL. receiver. Implementation of this mode is OPTIONAL.
Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when Not-ECT: Not ECN-capable transport, as defined in [RFC3168] for when
at least one of the transports does not understand even basic ECN at least one of the transports does not understand even basic ECN
marking. marking.
Note that we use the term Re-ECT for a host transport that is re-ECN- Note that we use the term Re-ECT for a host transport that is re-ECN-
capable but RECN for the modes of the half connections between hosts capable but RECN for the modes of the half connections between hosts
when they are both Re-ECT. If a host transport is Re-ECT, this fact when they are both Re-ECT. If a host transport is Re-ECT, this fact
alone does NOT imply either of its half connections will necessarily alone does NOT imply either of its half connections will necessarily
be in RECN mode, at least not until it has confirmed that the other be in RECN mode, at least not until it has confirmed that the other
host is Re-ECT. host is Re-ECT.
4.1.1. RECN mode: Full re-ECN capable transport 4.1.1. RECN mode: Full Re-ECN capable transport
In full RECN mode, for each half connection, both the sender and the In full RECN mode, for each half connection, both the sender and the
receiver each maintain an unsigned integer counter we will call ECC receiver each maintain an unsigned integer counter we will call ECC
(echo congestion counter). The receiver maintains a count, modulo 8, (echo congestion counter). The receiver maintains a count of how
of how many times a CE marked packet has arrived during the half- many times a CE marked packet has arrived during the half-connection.
connection. Once a RECN connection is established, the three TCP Once a RECN connection is established, the three TCP option flags
option flags (ECE, CWR & NS) used for ECN-related functions in other (ECE, CWR & NS) used for ECN-related functions in other versions of
versions of ECN are used as a 3-bit field for the receiver to ECN are used as a 3-bit field for the receiver to repeatedly tell the
repeatedly tell the sender the current value of ECC whenever it sends sender the current value of ECC, modulo 8, whenever it sends a TCP
a TCP ACK. We will call this the echo congestion increment (ECI) ACK. We will call this the echo congestion increment (ECI) field.
field. This overloaded use of these 3 option flags as one 3-bit ECI This overloaded use of these 3 option flags as one 3-bit ECI field is
field is shown in Figure 4. The actual definition of the TCP header, shown in Figure 4. The actual definition of the TCP header,
including the addition of support for the ECN nonce, is shown for including the addition of support for the ECN nonce, is shown for
comparison in Figure 3. This specification does not redefine the comparison in Figure 3. This specification does not redefine the
names of these three TCP option flags, it merely overloads them with names of these three TCP option flags, it merely overloads them with
another definition once a flow is established. another definition once a flow is established.
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | N | C | E | U | A | P | R | S | F | | | | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I |
| | | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N |
skipping to change at page 17, line 41 skipping to change at page 17, line 47
| | | | G | K | H | T | N | N | | | | | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 4: Definition of the ECI field within bytes 13 and 14 of the Figure 4: Definition of the ECI field within bytes 13 and 14 of the
TCP Header, overloading the current definitions above for established TCP Header, overloading the current definitions above for established
RECN flows. RECN flows.
Receiver Action in RECN Mode Receiver Action in RECN Mode
Every time a CE marked packet arrives at a receiver in RECN mode, Every time a CE marked packet arrives at a receiver in RECN mode,
the receiver transport increments its local value of ECC modulo 8 the receiver transport increments its local value of ECC and MUST
and MUST echo its value to the sender in the ECI field of the next echo its value, modulo 8, to the sender in the ECI field of the
ACK. It MUST repeat the same value of ECI in every subsequent ACK next ACK. It MUST repeat the same value of ECI in every
until the next CE event, when it increments ECI again. subsequent ACK until the next CE event, when it increments ECI
again.
The increment of the local ECC values is modulo 8 so the field The increment of the local ECC values is modulo 8 so the field
value simply wraps round back to zero when it overflows. The value simply wraps round back to zero when it overflows. The
least significant bit is to the right (labelled bit 9). least significant bit is to the right (labelled bit 9).
A receiver in RECN mode MAY delay the echo of a CE to the next A receiver in RECN mode MAY delay the echo of a CE to the next
delayed-ACK, which would be necessary if ACK-withholding were delayed-ACK, which would be necessary if ACK-withholding were
implemented. implemented.
Sender Action in RECN Mode Sender Action in RECN Mode
On the arrival of every ACK, the sender compares the ECI field On the arrival of every ACK, the sender compares the ECI field
with its own ECC value, then replaces its local value with that with its own ECC value, then replaces its local value with that
from the ACK. The difference D is assumed to be the number of CE from the ACK. The difference D (D = (ECI + 8 - ECC mod 8) mod 8)
marked packets that arrived at the receiver since it sent the is assumed to be the number of CE marked packets that arrived at
previously received ACK (but see below for the sender's safety the receiver since it sent the previously received ACK (but see
strategy). Whenever the ECI field increments by D (and/or d drops below for the sender's safety strategy). Whenever the ECI field
are detected), the sender MUST clear the RE flag to "0" in the IP increments by D (and/or d drops are detected), the sender MUST
header of the next D' data packets it sends (where D' = D + d), clear the RE flag to "0" in the IP header of the next D' data
effectively re-echoing each single increment of ECI. Otherwise packets it sends (where D' = D + d), effectively re-echoing each
the data sender MUST send all data packets with RE set to "1". single increment of ECI. Otherwise the data sender MUST send all
data packets with RE set to "1".
As a general rule, once a flow is established, as well as setting As a general rule, once a flow is established, as well as setting
or clearing the RE flag as above, a data sender in RECN mode MUST or clearing the RE flag as above, a data sender in RECN mode MUST
always set the ECN field to ECT(1). However, the settings of the always set the ECN field to ECT(1). However, the settings of the
extended ECN field during flow start are defined in Section 4.1.4. extended ECN field during flow start are defined in Section 4.1.4.
As we have already emphasised, the re-ECN protocol makes no As we have already emphasised, the re-ECN protocol makes no
changes and has no effect on the TCP congestion control algorithm. changes and has no effect on the TCP congestion control algorithm.
So, each increment of ECI (or detection of a drop) also triggers So, the first increment of ECI (or detection of a drop) in a RTT
the standard TCP congestion response, but with no more than one triggers the standard TCP congestion response, no more than one
congestion response per round trip, as usual. congestion response per round trip, as usual. However, the sender
re-echoes every increment of ECI irrespective of RTTs.
A TCP sender also acts as the receiver for the other half- A TCP sender also acts as the receiver for the other half-
connection. The host will maintain two ECC values S.ECC and R.ECC connection. The host will maintain two ECC values S.ECC and R.ECC
as sender and receiver respectively. Every TCP header sent by a as sender and receiver respectively. Every TCP header sent by a
host in RECN mode will also repeat the prevailing value of R.ECC host in RECN mode will also repeat the prevailing value of R.ECC
in its ECI field. If a sender in RECN mode has to retransmit a in its ECI field. If a sender in RECN mode has to retransmit a
packet due to a suspected loss, the re-transmitted packet MUST packet due to a suspected loss, the re-transmitted packet MUST
carry the latest prevailing value of R.ECC when it is re- carry the latest prevailing value of R.ECC when it is re-
transmitted, which will not necessarily be the one it carried transmitted, which will not necessarily be the one it carried
originally. originally.
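
The receiver and sender actions above can be sketched as follows. This is only an illustrative model, not part of the specification: the class and method names (RecnReceiver, RecnSender, on_data_packet, on_ack, next_re_flag) are hypothetical, and the counter of packets still to be sent with RE cleared is an assumed bookkeeping device for D' = D + d. The sketch follows the newer wording, in which the receiver keeps a full ECC count and only the echoed ECI value is taken modulo 8.

```python
ECI_MODULUS = 8  # the ECI field is 3 bits wide (NS, CWR, ECE)


class RecnReceiver:
    """Receiver side of one half-connection in RECN mode (illustrative)."""

    def __init__(self):
        self.ecc = 0  # echo congestion counter, initialised to zero

    def on_data_packet(self, ce_marked):
        """Count a CE mark and return the ECI value to echo in the next ACK."""
        if ce_marked:
            self.ecc += 1
        return self.ecc % ECI_MODULUS


class RecnSender:
    """Sender side of one half-connection in RECN mode (illustrative)."""

    def __init__(self):
        self.ecc = 0          # sender's copy of the last echoed value
        self.re_to_clear = 0  # assumed bookkeeping: packets still owed RE = 0

    def on_ack(self, eci, drops=0):
        """Process an ACK carrying `eci`; `drops` is d, newly detected drops."""
        # D = (ECI + 8 - ECC mod 8) mod 8, as in the sender action text.
        d_marks = (eci + ECI_MODULUS - self.ecc % ECI_MODULUS) % ECI_MODULUS
        self.ecc = eci                        # replace local value with the ACK's
        self.re_to_clear += d_marks + drops   # D' = D + d
        return d_marks

    def next_re_flag(self):
        """Return the RE flag (0 or 1) for the next data packet."""
        if self.re_to_clear > 0:
            self.re_to_clear -= 1
            return 0
        return 1
```

For example, if the sender last saw ECI = 7 and the next ACK carries ECI = 1, the computed difference is 2, so the wrap of the 3-bit field is handled by the modulo arithmetic. The safety strategy for long runs of lost ACKs is not modelled here.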
4.1.1.1. Drops and Marks 4.1.1.1. Drops and Marks
Re-ECN is based on the ECN protocol [RFC3168] which in turn is Re-ECN is based on the ECN protocol [RFC3168]. In turn the
typically based on the RED algorithm [RFC2309]. This algorithm marks congestion markings ECN uses are typically based on the RED
packets as CE with a probability that increases as the size of the algorithm [RFC2309]. This algorithm marks packets as CE with a
router queue increases. However if the queue becomes too full then it probability that increases as the size of the router queue increases.
will revert to dropping packets. Because of this it is important However, if the queue becomes too full then it will revert to
that re-ECN treats each packet drop it detects as if it were actually dropping packets. Because of this it is important that a re-ECN
a CE mark. This ensures that it can continue to correctly echo sender treats each packet drop it detects as if it were actually a CE
congestion even through a highly congested path. mark. This ensures that it can continue to correctly echo congestion
even through a highly congested path.
In order to ensure that drops are correctly echoed the sender needs In order to ensure that drops are correctly echoed the sender needs
to add the number of drops detected per RTT to the difference in ECI to add the number of drops detected per RTT to the difference in ECI
value waiting to be echoed. A drop is defined as set out in value waiting to be echoed. Drop detection is defined as set out in
[RFC2581] -- if the connection is in slow start then a single [RFC2581] -- if the connection is in slow start then a single
duplicate acknowledgement will be treated as an indication of a drop. duplicate acknowledgement will be treated as an indication of a drop.
When the system is in the congestion avoidance stage then 3 duplicate When the system is in the congestion avoidance stage then 3 duplicate
acknowledgements will be treated as a sign of a drop. In all cases, acknowledgements will be treated as a sign of a drop. In all cases,
if a re-transmission time-out occurs then that will be treated as a if a re-transmission time-out occurs then that will be treated as a
drop. drop.
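
The drop-detection rule described in this section could be expressed roughly as below. This is a sketch of the rule as worded here, not of any particular TCP stack: the function name is_drop and its parameters are hypothetical, and the duplicate-ACK thresholds are simply the ones this section states.

```python
def is_drop(dup_acks, in_slow_start, rto_expired):
    """Return True if the sender should treat the event as a drop.

    dup_acks      -- consecutive duplicate acknowledgements seen so far
    in_slow_start -- True in slow start, False in congestion avoidance
    rto_expired   -- True if a re-transmission time-out has occurred
    """
    if rto_expired:
        return True  # a time-out counts as a drop in all cases
    # Thresholds as stated in this section: 1 in slow start, 3 otherwise.
    threshold = 1 if in_slow_start else 3
    return dup_acks >= threshold
```

Each drop detected this way would contribute to d in D' = D + d, so that the sender clears RE for a drop just as it would for a CE mark.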
4.1.1.2. Safety against Long Pure ACK Loss Sequences 4.1.1.2. Safety against Long Pure ACK Loss Sequences
The ECI method was chosen for echoing congestion marking because a The ECI method was chosen for echoing congestion marking because a
skipping to change at page 20, line 4 skipping to change at page 20, line 12
previous ACK but with a sequence number unchanged from the previously previous ACK but with a sequence number unchanged from the previously
received ACK, it SHOULD conservatively assume that the ECI field received ACK, it SHOULD conservatively assume that the ECI field
incremented by D' = L - ((L-D) mod 8), where D is the apparent incremented by D' = L - ((L-D) mod 8), where D is the apparent
increase in the ECI field. For example if the ACK arriving after 9 increase in the ECI field. For example if the ACK arriving after 9
pure ACK losses apparently increased ECI by 2, the assumed increment pure ACK losses apparently increased ECI by 2, the assumed increment
of ECI would still be 2. But if ECI apparently increased by 2 after of ECI would still be 2. But if ECI apparently increased by 2 after
11 pure ACK losses, ECI should be assumed to have increased by 10. 11 pure ACK losses, ECI should be assumed to have increased by 10.
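
The conservative assumption D' = L - ((L-D) mod 8) can be checked numerically with a short sketch. The function name assumed_eci_increment is hypothetical. The guard for L < D is an assumption added here, because the formula as written would otherwise yield a negative value in that case; the text above does not address it.

```python
def assumed_eci_increment(lost_acks, apparent_increase):
    """Return D', the increment of ECI the sender should assume.

    lost_acks         -- L, as used in the formula above
    apparent_increase -- D, the apparent increase in the ECI field (0..7)
    """
    L, D = lost_acks, apparent_increase
    if L < D:
        # Assumption, not from the text: never assume fewer marks than
        # the field itself shows.
        return D
    # Largest value not exceeding L that is congruent to D modulo 8.
    return L - ((L - D) % 8)
```

With L = 9 and D = 2 the result is 2, and with L = 11 and D = 2 the result is 10, matching the two examples given above.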
A re-ECN sender MAY implement a heuristic algorithm to predict beyond A re-ECN sender MAY implement a heuristic algorithm to predict beyond
reasonable doubt that the ECI field probably did not wrap within a reasonable doubt that the ECI field probably did not wrap within a
sequence of lost pure ACKs. But such an algorithm is NOT REQUIRED. sequence of lost pure ACKs. But such an algorithm is OPTIONAL. Such
Such an algorithm MUST NOT be used unless it is proven to work even an algorithm MUST NOT be used unless it is proven to work even in the
in the presence of correlation between high ACK loss rate on the back presence of correlation between high ACK loss rate on the back
channel and high CE marking rate on the forward channel. channel and high CE marking rate on the forward channel.
Whatever assumption a re-ECN sender makes about potentially lost CE Whatever assumption a re-ECN sender makes about potentially lost CE
marks, both its congestion control and its re-echoing behaviour marks, both its congestion control and its re-echoing behaviour
SHOULD be consistent with the assumption it makes. SHOULD be consistent with the assumption it makes.
4.1.2. RECN-Co mode: Re-ECT Sender with a Vanilla or Nonce ECT Receiver 4.1.2. RECN-Co mode: Re-ECT Sender with a RFC3168 compliant ECN
Receiver
If the half-connection is in RECN-Co mode, ECN feedback proceeds no If the half-connection is in RECN-Co mode, ECN feedback proceeds no
differently to that of vanilla ECN. In other words, the receiver differently to that of RFC3168 compliant ECN. In other words, the
sets the ECE flag repeatedly in the TCP header and the sender receiver sets the ECE flag repeatedly in the TCP header and the
responds by setting the CWR flag. Although RECN-Co mode is used when sender responds by setting the CWR flag. Although RECN-Co mode is
the receiver has not implemented the re-ECN protocol, the sender can used when the receiver has not implemented the re-ECN protocol, the
infer enough from its vanilla ECN feedback to set or clear the RE sender can infer enough from its RFC3168 compliant ECN feedback to
flag reasonably well. Specifically, every time the receiver toggles set or clear the RE flag reasonably well. Specifically, every time
the ECE field from "0" to "1" (or a loss is detected), as well as the receiver toggles the ECE field from "0" to "1" (or a loss is
setting CWR in the TCP flags, the re-ECN sender MUST blank the RE detected), as well as setting CWR in the TCP flags, the re-ECN sender
flag of the next packet to "0" as it would do in full RECN mode. MUST blank the RE flag of the next packet to "0" as it would do in
Otherwise, the data sender SHOULD send all other packets with RE set full RECN mode. Otherwise, the data sender SHOULD send all other
to "1". Once a flow is established, a re-ECN data sender in RECN-Co packets with RE set to "1". Once a flow is established, a re-ECN
mode MUST always set the ECN field to ECT(1). data sender in RECN-Co mode MUST always set the ECN field to ECT(1).
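
A minimal sketch of how a sender in RECN-Co mode might derive the RE flag from RFC3168 feedback is given below. The class RecnCoSender and its methods are hypothetical names, the setting of CWR is left out for brevity, and counting a single blanked packet when an ECE transition and a loss coincide on the same ACK is an assumption of this sketch.

```python
class RecnCoSender:
    """Re-ECN sender in compatibility mode (illustrative only)."""

    def __init__(self):
        self.prev_ece = 0     # ECE value seen on the previous ACK
        self.re_to_clear = 0  # packets still to be sent with RE = 0

    def on_ack(self, ece, loss_detected=False):
        """Blank RE on one packet per 0-to-1 toggle of ECE or per loss."""
        if (self.prev_ece == 0 and ece == 1) or loss_detected:
            self.re_to_clear += 1
        self.prev_ece = ece

    def next_packet_fields(self):
        """Return the RE flag and ECN field for the next data packet."""
        re_flag = 1
        if self.re_to_clear > 0:
            self.re_to_clear -= 1
            re_flag = 0
        # Once the flow is established the ECN field is always ECT(1).
        return {"RE": re_flag, "ECN": "ECT(1)"}
```

Because only the 0-to-1 transition is counted, repeated ECE=1 ACKs for the same mark do not cause further blanking.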
If a CE marked packet arrives at the receiver within a round trip If a CE marked packet arrives at the receiver within a round trip
time of a previous mark, the receiver will still be echoing ECE for time of a previous mark, the receiver will still be echoing ECE for
the last CE mark. Therefore, such a mark will be missed by the the last CE mark. Therefore, such a mark will be missed by the
sender. Of course, this isn't of concern for congestion control, but sender. Of course, this isn't of concern for congestion control, but
it does mean that very occasionally the RE blanking fraction will be it does mean that very occasionally the RE blanking fraction will be
understated. Therefore flows in RECN-Co mode may occasionally be understated. Therefore flows in RECN-Co mode may occasionally be
mistaken for very lightly cheating flows and consequently might mistaken for very lightly cheating flows and consequently might
suffer a small number of packet drops through an egress dropper suffer a small number of packet drops through an egress dropper
(Section 6.1.4). We expect re-ECN would be deployed for some time (Section 6.1.4). We expect re-ECN would be deployed for some time
before policers and droppers start to enforce it. So, given there is before policers and droppers start to enforce it. So, given there is
not much ECN deployment yet anyway, this minor problem may affect not much ECN deployment yet anyway, this minor problem may affect
only a very small proportion of flows, reducing to nothing over the only a very small proportion of flows, reducing to nothing over the
years as vanilla ECN hosts upgrade. The use of RECN-Co mode would years as RFC3168 compliant ECN hosts upgrade. The use of RECN-Co
need to be reviewed in the light of experience at the time of re-ECN mode would need to be reviewed in the light of experience at the time
deployment. of re-ECN deployment.
RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their RECN-Co mode is OPTIONAL. Re-ECN implementers who want to keep their
code simple, MAY choose not to implement this mode. If they do not, code simple, MAY choose not to implement this mode. If they do not,
a re-ECN sender SHOULD fall back to vanilla ECT mode in the presence a re-ECN sender SHOULD fall back to RFC3168 compliant ECT mode in the
of an ECN-capable receiver. It MAY choose to fall back to the ECT- presence of an ECN-capable receiver. It MAY choose to fall back to
Nonce mode, but if re-ECN implementers don't want to be bothered with the ECT-Nonce mode, but if re-ECN implementers don't want to be
RECN-Co mode, they probably won't want to add an ECT-Nonce mode bothered with RECN-Co mode, they probably won't want to add an ECT-
either. Nonce mode either.
4.1.2.1. Re-ECN support for the ECN Nonce 4.1.2.1. Re-ECN support for the ECN Nonce
A TCP half-connection in RECN-Co mode MUST NOT support the ECN A TCP half-connection in RECN-Co mode MUST NOT support the ECN
Nonce [RFC3540]. This means that the sending code of a re-ECN Nonce [RFC3540]. This means that the sending code of a re-ECN
implementation will never need to include ECN Nonce support. Re-ECN implementation will never need to include ECN Nonce support. Re-ECN
is intended to provide wider protection than the ECN nonce against is intended to provide wider protection than the ECN nonce against
congestion control misbehaviour, and re-ECN only requires support congestion control misbehaviour, and re-ECN only requires support
from the sender, therefore it is preferable to specifically rule out from the sender, therefore it is preferable to specifically rule out
the need for dual sender implementations. As a consequence, a re-ECN the need for dual sender implementations. As a consequence, a re-ECN
capable sender will never set ECT(0), so it will be easier for capable sender will never set ECT(0), so it will be easier for
network elements to discriminate re-ECN traffic flows from other ECN network elements to discriminate re-ECN traffic flows from other ECN
traffic, which will always contain some ECT(0) packets. traffic, which will always contain some ECT(0) packets.
However, a re-ECN implementation MAY OPTIONALLY include receiving However, a re-ECN implementation MAY OPTIONALLY include receiving
code that complies with the ECN Nonce protocol when interacting with code that complies with the ECN Nonce protocol when interacting with
a sender that supports the ECN nonce (rather than re-ECN), but this a sender that supports the ECN nonce (rather than re-ECN), but this
support is NOT REQUIRED. support is not required.
RFC3540 allows an ECN nonce sender to choose whether to sanction a RFC3540 allows an ECN nonce sender to choose whether to sanction a
receiver that does not ever set the nonce sum. Given re-ECN is receiver that does not ever set the nonce sum. Given re-ECN is
intended to provide wider protection than the ECN nonce against intended to provide wider protection than the ECN nonce against
congestion control misbehaviour, implementers of re-ECN receivers MAY congestion control misbehaviour, implementers of re-ECN receivers MAY
choose not to implement backwards compatibility with the ECN nonce choose not to implement backwards compatibility with the ECN nonce
capability. This may be because they deem that the risk of sanctions capability. This may be because they deem that the risk of sanctions
is low, perhaps because significant deployment of the ECN nonce seems is low, perhaps because significant deployment of the ECN nonce seems
unlikely at implementation time. unlikely at implementation time.
4.1.3. Capability Negotiation 4.1.3. Capability Negotiation
During the TCP hand-shake at the start of a connection, an originator During the TCP hand-shake at the start of a connection, an originator
of the connection (host A) with a re-ECN-capable transport MUST of the connection (host A) with a re-ECN-capable transport MUST
indicate it is Re-ECT by setting the TCP options NS=1, CWR=1 and indicate it is Re-ECT by setting the TCP flags NS=1, CWR=1 and ECE=1
ECE=1 in the initial SYN. in the initial SYN.
A responding Re-ECT host (host B) MUST return a SYN ACK with flags A responding Re-ECT host (host B) MUST return a SYN ACK with flags
CWR=1 and ECE=0. The responding host MUST NOT set this combination CWR=1 and ECE=0. The responding host MUST NOT set this combination
of flags unless the preceding SYN has already indicated Re-ECT of flags unless the preceding SYN has already indicated Re-ECT
support as above. A Re-ECT server (B) can use either setting of the support as above. Normally a Re-ECT server (B) will reply to a Re-
NS flag combined with this type of SYN ACK in response to a SYN from ECT client with NS=0, but if the initial SYN from Re-ECT client A is
a Re-ECT client (A). Normally a Re-ECT server will reply to a Re-ECT marked CE(-1), a Re-ECT server B MUST increment its local value of
client with NS=0, but in the special circumstance below it can return ECC. But B cannot reflect the value of ECC in the SYN ACK, because
a SYN ACK with NS=1. it is still using the 3 bits to negotiate connection capabilities.
So, server B MUST set the alternative TCP header flags in its SYN
If the initial SYN from Re-ECT client A is marked CE(-1), a Re-ECT ACK: NS=1, CWR=1 and ECE=0.
server B MUST increment its local value of ECC. But B cannot reflect
the value of ECC in the SYN ACK, because it is still using the 3 bits
to negotiate connection capabilities. So, server B MUST set the
alternative TCP header flags in its SYN ACK: NS=1, CWR=1 and ECE=0.
These handshakes are summarised in Table 5 below, with X meaning These handshakes are summarised in Table 5 below, with X indicating
`don't care'. The handshakes used for the other flavours of ECN are NS can be either 0 or 1 depending on whether congestion had been
experienced. The handshakes used for the other flavours of ECN are
also shown for comparison. To compress the width of the table, the also shown for comparison. To compress the width of the table, the
headings of the first four columns have been severely abbreviated, as headings of the first four columns have been severely abbreviated, as
follows: follows:
R: *R*e-ECT R: *R*e-ECT
N: ECT-*N*once (RFC3540) N: ECT-*N*once (RFC3540)
E: *E*CT (RFC3168) E: *E*CT (RFC3168)
skipping to change at page 22, line 47 skipping to change at page 23, line 6
Responder (B) Responder (B)
As soon as a re-ECN capable TCP server receives a SYN, it MUST set As soon as a re-ECN capable TCP server receives a SYN, it MUST set
its two half-connections into the modes given in Table 5. As soon as its two half-connections into the modes given in Table 5. As soon as
a re-ECN capable TCP client receives a SYN ACK, it MUST set its two a re-ECN capable TCP client receives a SYN ACK, it MUST set its two
half-connections into the modes given in Table 5. The half- half-connections into the modes given in Table 5. The half-
connections will remain in these modes for the rest of the connections will remain in these modes for the rest of the
connection, including for the third segment of TCP's three-way hand- connection, including for the third segment of TCP's three-way hand-
shake (the ACK). shake (the ACK).
{ToDo: Consider SYNs within a connection.} {ToDo: Consider RSTs within a connection.}
Recall that, if the SYN ACK reflects the same flag settings as the Recall that, if the SYN ACK reflects the same flag settings as the
preceding SYN (because there is a broken legacy implementation that preceding SYN (because there is a broken RFC3168 compliant
behaves this way), RFC3168 specifies that the whole connection MUST implementation that behaves this way), RFC3168 specifies that the
revert to Not-ECT. whole connection MUST revert to Not-ECT.
Also note that, whenever the SYN flag of a TCP segment is set Also note that, whenever the SYN flag of a TCP segment is set
(including when the ACK flag is also set), the NS, CWR and ECE flags (including when the ACK flag is also set), the NS, CWR and ECE flags
MUST NOT be interpreted as the 3-bit ECI value, which is only set as (i.e. the ECI field of the SYN ACK) MUST NOT be interpreted as the
a copy of the local ECC value in non-SYN packets. 3-bit ECI value, which is only set as a copy of the local ECC value
in non-SYN packets.
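
The Re-ECT part of the negotiation described in this section can be sketched as follows. All function names are hypothetical. Responses to SYNs that do not indicate Re-ECT are governed by Table 5 and are deliberately not modelled, so the sketch returns None for them. The bit order used to assemble ECI (NS most significant, ECE least significant) is inferred from the statement that the least significant bit is the one labelled bit 9, and should be read as an assumption.

```python
def reect_syn_flags():
    """TCP flags a Re-ECT originator (host A) sets on the initial SYN."""
    return {"NS": 1, "CWR": 1, "ECE": 1}


def is_reect_syn(flags):
    """True if the SYN indicates Re-ECT support."""
    return flags == {"NS": 1, "CWR": 1, "ECE": 1}


def reect_syn_ack_flags(syn_flags, syn_ce_marked):
    """Flags a Re-ECT responder (host B) returns to a Re-ECT SYN.

    Returns None for other kinds of SYN (see Table 5, not modelled here).
    """
    if not is_reect_syn(syn_flags):
        return None
    # NS=1 signals that the SYN arrived CE marked; ECC cannot be echoed
    # in the SYN ACK because the 3 bits are still used for negotiation.
    return {"NS": 1 if syn_ce_marked else 0, "CWR": 1, "ECE": 0}


def eci_from_flags(flags, syn_set):
    """Read the 3-bit ECI value, or None when the SYN flag is set."""
    if syn_set:
        return None  # NS, CWR and ECE are capability flags on SYN / SYN ACK
    return (flags["NS"] << 2) | (flags["CWR"] << 1) | flags["ECE"]
```

For instance, a Re-ECT server answering a CE marked Re-ECT SYN would return NS=1, CWR=1, ECE=0, and the client must not read those three bits as ECI = 6 because the SYN flag is set on that segment.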
4.1.4. Extended ECN (EECN) Field Settings during Flow Start or after 4.1.4. Extended ECN (EECN) Field Settings during Flow Start or after
Idle Periods Idle Periods
If the originator (A) of a TCP connection supports re-ECN it MUST set If the originator (A) of a TCP connection supports re-ECN it MUST set
the extended ECN (EECN) field in the IP header of the initial SYN the extended ECN (EECN) field in the IP header of the initial SYN
packet to the feedback not established (FNE) codepoint. packet to the feedback not established (FNE) codepoint.
FNE is a new extended ECN codepoint defined by this specification FNE is a new extended ECN codepoint defined by this specification
(Section 3.2). The feedback not established (FNE) codepoint is used (Section 3.3). The feedback not established (FNE) codepoint is used
when the transport does not have the benefit of ECN feedback so it when the transport does not have the benefit of ECN feedback so it
cannot decide whether to set or clear the RE flag. cannot decide whether to set or clear the RE flag.
If after receiving a SYN the server B has set its sending half- If after receiving a SYN the server B has set its sending half-
connection into RECN mode or RECN-Co mode, it MUST set the extended connection into RECN mode or RECN-Co mode, it MUST set the extended
ECN field in the IP header of its SYN ACK to the feedback not ECN field in the IP header of its SYN ACK to the feedback not
established (FNE) codepoint. Note the careful wording here, which established (FNE) codepoint. Note the careful wording here, which
means that Re-ECT server B MUST set FNE on a SYN ACK whether it is means that Re-ECT server B MUST set FNE on a SYN ACK whether it is
responding to a SYN from a Re-ECT client or from a client that is responding to a SYN from a Re-ECT client or from a client that is
merely ECN-capable. merely ECN-capable. This is because FNE indicates the transport is
ECN capable.
The original ECN specification [RFC3168] required SYNs and SYN ACKs The original ECN specification [RFC3168] required SYNs and SYN ACKs
to use the Not-ECT codepoint of the ECN field. The aim was to to use the Not-ECT codepoint of the ECN field. The aim was to
prevent well-known DoS attacks such as SYN flooding being able to prevent well-known DoS attacks such as SYN flooding being able to
gain from the advantage that ECN capability afforded over drop at gain from the advantage that ECN capability afforded over drop at
ECN-capable routers. ECN-capable routers.
For a SYN ACK, Kuzmanovic [I-D.ietf-tcpm-ecnsyn] has shown that this For a SYN ACK, Kuzmanovic [I-D.ietf-tcpm-ecnsyn] has shown that this
caution was unnecessary, and proposes to allow a SYN ACK to be ECN- caution was unnecessary, and proposes to allow a SYN ACK to be ECN-
capable to improve performance. We have gone further by proposing to capable to improve performance. By stipulating the FNE codepoint for
make the initial SYN ECN-capable too. By stipulating the FNE the initial SYN, we comply with RFC3168 in word but not in spirit,
codepoint for the initial SYN, we comply with RFC3168 in word but not because we have indeed set the ECN field to Not-ECT, but we have
in spirit, because we have indeed set the ECN field to Not-ECT, but extended the ECN field with another bit. And it will be seen
we have extended the ECN field with another bit. And it will be seen
(Section 5.3) that we have defined one setting of that bit to mean an (Section 5.3) that we have defined one setting of that bit to mean an
ECN-capable transport. Therefore, by proposing that the FNE ECN-capable transport. Therefore, by proposing that the FNE
codepoint MUST be used on the initial SYN of a connection, we have codepoint MUST be used on the initial SYN of a connection, we have
(deliberately) made the initial SYN ECN-capable. Section 5.4 gone further by proposing to make the initial SYN ECN-capable too.
justifies deciding to make the initial SYN ECN-capable. Section 5.4 justifies deciding to make the initial SYN ECN-capable.
Once a TCP half connection is in RECN mode or RECN-Co mode, FNE will Once a TCP half connection is in RECN mode or RECN-Co mode, FNE will
have already been set on the initial SYN and possibly the SYN ACK as have already been set on the initial SYN and possibly the SYN ACK as
above. But each re-ECN sender will have to set FNE cautiously on a above. But each re-ECN sender will have to set FNE cautiously on a
few data packets as well, given a number of packets will usually have few data packets as well, given a number of packets will usually have
to be sent before sufficient congestion feedback is received. The to be sent before sufficient congestion feedback is received. The
behaviour will be different depending on the mode of the half- behaviour will be different depending on the mode of the half-
connection: connection:
RECN mode: Given the constraints on TCP's initial window [RFC3390] RECN mode: Given the constraints on TCP's initial window [RFC3390]
and its exponential window increase during slow start and its exponential window increase during slow start
phase [RFC2581], it turns out that the sender SHOULD set FNE on phase [RFC2581], it turns out that the sender SHOULD set FNE on
the first and third data packets in its flow, assuming equal sized the first and third data packets in its flow after the initial
data packets once a flow is established. Appendix D presents the 3-way handshake, assuming equal sized data packets once a flow is
calculation that led to this conclusion. Below, after running established. Appendix D presents the calculation that led to this
through the start of an example TCP session, we give the intuition conclusion. Below, after running through the start of an example
learned from that calculation. TCP session, we give the intuition learned from that calculation.
RECN-Co mode: A re-ECT sender that switches into re-ECN RECN-Co mode: A re-ECT sender that switches into re-ECN
compatibility mode or into Not-ECT mode (because it has detected compatibility mode or into Not-ECT mode (because it has detected
the corresponding host is not re-ECN capable) MUST limit its the corresponding host is not re-ECN capable) MUST limit its
initial window to 1 segment. The reasoning behind this constraint initial window to 1 segment. The reasoning behind this constraint
is given in Section 5.4. Having set this initial window, a re-ECN is given in Section 5.4. Having set this initial window, a re-ECN
sender in RECN-Co mode SHOULD set FNE on the first and third data sender in RECN-Co mode SHOULD set FNE on the first and third data
packets in a flow, as for RECN mode. packets in a flow, as for RECN mode.
+----+------+----------------+-------+-------+---------------+------+ +----+------+----------------+-------+-------+---------------+------+
skipping to change at page 25, line 21 skipping to change at page 25, line 49
(EECN) field. (EECN) field.
Also shown on the receiving side of the table is the value of the Also shown on the receiving side of the table is the value of the
receiver's echo congestion counter (R.ECC) after processing the receiver's echo congestion counter (R.ECC) after processing the
incoming EECN header. Note that, once a host sets a half-connection incoming EECN header. Note that, once a host sets a half-connection
into RECN mode, it MUST initialise its local value of ECC to zero. into RECN mode, it MUST initialise its local value of ECC to zero.
The intuition that Appendix D gives for why a sender should set FNE The intuition that Appendix D gives for why a sender should set FNE
on the first and third data packets is as follows. At line 13, a on the first and third data packets is as follows. At line 13, a
packet sent by B is shown with an '*', which means it has been packet sent by B is shown with an '*', which means it has been
congestion marked by an intermediate router from RECT to CE(-1). On congestion marked by an intermediate queue from RECT to CE(-1). On
receiving this CE marked packet, client A increments its ECC counter receiving this CE marked packet, client A increments its ECC counter
to 1 as shown. This was the 7th data packet B sent, but before to 1 as shown. This was the 7th data packet B sent, but before
feedback about this event returns to B, it might well have sent many feedback about this event returns to B, it might well have sent many
more packets. Indeed, during exponential slow start, about as many more packets. Indeed, during exponential slow start, about as many
packets will be in flight (unacknowledged) as have been acknowledged. packets will be in flight (unacknowledged) as have been acknowledged.
So, when the feedback from the congestion event on B's 7th segment So, when the feedback from the congestion event on B's 7th segment
returns, B will have sent about 7 further packets that will still be returns, B will have sent about 7 further packets that will still be
in flight. At that stage, B's best estimate of the network's packet in flight. At that stage, B's best estimate of the network's packet
marking fraction will be 1/7. So, as B will have sent about 14 marking fraction will be 1/7. So, as B will have sent about 14
packets, it should have already marked 2 of them as FNE in order to packets, it should have already marked 2 of them as FNE in order to
skipping to change at page 26, line 19 skipping to change at page 26, line 46
that the design of network policers can be deterministic, this that the design of network policers can be deterministic, this
specification deliberately puts an absolute lower limit on how long a specification deliberately puts an absolute lower limit on how long a
connection can be idle before the packet that resumes the connection connection can be idle before the packet that resumes the connection
must be set to FNE, rather than relating it to the connection round must be set to FNE, rather than relating it to the connection round
trip time. We use the lower bound of the retransmission timeout trip time. We use the lower bound of the retransmission timeout
(RTO) [RFC2988], which is commonly used as the idle period before TCP (RTO) [RFC2988], which is commonly used as the idle period before TCP
must reduce to the restart window [RFC2581]. Note our specification must reduce to the restart window [RFC2581]. Note our specification
of re-ECN's idle period is NOT intended to change the idle period for of re-ECN's idle period is NOT intended to change the idle period for
TCP's restart, nor indeed for any other purposes. TCP's restart, nor indeed for any other purposes.
{ToDo: Describe how the sender falls back to legacy modes if packets {ToDo: Describe how the sender falls back to RFC3168 modes if packets
don't appear to be getting through (to work round firewalls don't appear to be getting through (to work round firewalls
discarding packets they consider unusual).} discarding packets they consider unusual).}
4.1.5. Pure ACKs, Retransmissions, Window Probes and Partial ACKs 4.1.5. Pure ACKs, Retransmissions, Window Probes and Partial ACKs
A re-ECN sender MUST clear the RE flag to "0" and set the ECN field A re-ECN sender MUST clear the RE flag to "0" and set the ECN field
to Not-ECT in pure ACKs, retransmissions and window probes, as to Not-ECT in pure ACKs, retransmissions and window probes, as
specified in [RFC3168]. Our eventual goal is for all packets to be specified in [RFC3168]. Our eventual goal is for all packets to be
sent with re-ECN enabled, and we believe the semantics of the ECI sent with re-ECN enabled, and we believe the semantics of the ECI
field go a long way towards being able to achieve this. However, we field go a long way towards being able to achieve this. However, we
skipping to change at page 26, line 46 skipping to change at page 27, line 28
general principle we work to is to remain compatible with TCP's general principle we work to is to remain compatible with TCP's
congestion control which is driven by congestion events at packet congestion control which is driven by congestion events at packet
granularity while at the same time aiming to blank the RE flag on at granularity while at the same time aiming to blank the RE flag on at
least as many octets in a flow as have been marked CE. least as many octets in a flow as have been marked CE.
Therefore, a re-ECN TCP receiver MUST increment its ECC value as many Therefore, a re-ECN TCP receiver MUST increment its ECC value as many
times as CE marked packets have been received. And that value MUST times as CE marked packets have been received. And that value MUST
be echoed to the sender in the first available ACK using the ECI be echoed to the sender in the first available ACK using the ECI
field. This ensures the TCP sender's congestion control receives field. This ensures the TCP sender's congestion control receives
timely feedback on congestion events at the same packet granularity timely feedback on congestion events at the same packet granularity
that they were generated on congested routers. that they were generated on congested queues.
Then, a re-ECN sender stores the difference D between its own ECC Then, a re-ECN sender stores the difference D between its own ECC
value and the incoming ECI field by incrementing a counter R. Then, R value and the incoming ECI field by incrementing a counter R. Then, R
is decremented by 1 for each subsequent packet that is sent with the RE is decremented by 1 for each subsequent packet that is sent with the RE
flag blanked, until R is no longer positive. Using this technique, flag blanked, until R is no longer positive. Using this technique,
whenever a re-ECN transport sends a not re-ECN capable (NRECN) packet whenever a re-ECN transport sends a not re-ECN capable packet (e.g. a
(e.g. a retransmission), the remaining packets required to have the retransmission), the remaining packets required to have the RE flag
RE flag blanked will be automatically carried over to subsequent blanked will be automatically carried over to subsequent packets,
packets, through the variable R. through the variable R.
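
The counter handling described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, and the modulo-8 wrap assumes the ECI field is 3 bits wide, which this section does not state.

```python
ECI_MODULUS = 8  # assumption: ECI is a 3-bit field, so counters wrap modulo 8


class ReEcnReceiver:
    """Receiver side: count CE marks and echo the count in the ECI field."""

    def __init__(self):
        self.ecc = 0  # echo congestion counter, zero on entering RECN mode

    def on_data_packet(self, ce_marked):
        # Increment once per CE-marked packet (packet granularity).
        if ce_marked:
            self.ecc = (self.ecc + 1) % ECI_MODULUS
        return self.ecc  # value to place in the ECI field of the next ACK


class ReEcnSender:
    """Sender side: carry outstanding RE blanking in the variable R."""

    def __init__(self):
        self.ecc = 0  # sender's local copy of the echoed counter
        self.r = 0    # packets that still need the RE flag blanked

    def on_ack(self, eci):
        # D is the number of new CE marks reported since the last ACK.
        d = (eci - self.ecc) % ECI_MODULUS
        self.r += d
        self.ecc = eci
        return d

    def next_codepoint(self, re_ecn_capable=True):
        # Pure ACKs, retransmissions and window probes are not re-ECN
        # capable: they leave R untouched, so the debt carries over.
        if not re_ecn_capable:
            return "Not-RECT"
        if self.r > 0:
            self.r -= 1
            return "Re-Echo"  # ECT(1) with the RE flag blanked
        return "RECT"         # ECT(1) with the RE flag set
```

A retransmission sent while R is positive returns "Not-RECT" and leaves R unchanged, so the next re-ECN capable packet is the one that carries the blanked RE flag.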
This does not ensure precisely the same number of octets have RE This does not ensure precisely the same number of octets have RE
blanked as were CE marked. But we believe positive errors will blanked as were CE marked. But we believe positive errors will
cancel negative over a long enough period. {ToDo: However, more cancel negative over a long enough period. {ToDo: However, more
research is needed to prove whether this is so. If it is not, it may research is needed to prove whether this is so. If it is not, it may
be necessary to increment and decrement R in octets rather than be necessary to increment and decrement R in octets rather than
packets, by incrementing R as the product of D and the size in octets packets, by incrementing R as the product of D and the size in octets
of packets being sent (typically the MSS).} of packets being sent (typically the MSS).}
4.2. Other Transports 4.2. Other Transports
4.2.1. General Guidelines for Adding Re-ECN to Other Transports 4.2.1. General Guidelines for Adding Re-ECN to Other Transports
Re-ECT sender transports that have established the receiver transport As a general rule, Re-ECT sender transports that have established the
is at least ECN-capable (not necessarily re-ECN capable) MUST blank receiver transport is at least ECN-capable (not necessarily re-ECN
the RE codepoint in packets carrying at least as many octets as capable) MUST blank the RE codepoint for at least as many octets as
arrive at the receiver with the CE codepoint set. Re-ECN-capable sender arrive at the receiver with the CE codepoint set. Re-ECN-capable sender
transports should always initialise the ECN field to the ECT(1) transports should always initialise the ECN field to the ECT(1)
codepoint once a flow is established. codepoint once a flow is established.
If the sender transport does not have sufficient feedback to even If the sender transport does not have sufficient feedback to even
estimate the path's CE rate, it SHOULD set FNE continuously. If the estimate the path's CE rate, it SHOULD set FNE continuously. If the
sender transport has some, perhaps stale, feedback to estimate that sender transport has some, perhaps stale, feedback to estimate that
the path's CE rate is nearly definitely less than E%, the transport the path's CE rate is nearly definitely less than E%, the transport
MAY blank RE in packets for E% of sent octets, and set the RECT MAY blank RE in packets for E% of sent octets, and set the RECT
codepoint for the remainder. codepoint for the remainder.
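
One way a sender could apply this guideline is sketched below. It is an assumption-laden illustration rather than a specified algorithm: the class name, the integer "owed octets" accumulator and the use of None to mean "no usable feedback" are all hypothetical choices.

```python
class OctetPacer:
    """Choose an extended ECN codepoint per packet from an estimate E (%)."""

    def __init__(self, e_percent=None):
        # e_percent is None when there is no feedback to estimate from.
        self.e_percent = e_percent
        self.owed = 0  # octets owed with RE blanked, in units of 1/100 octet

    def codepoint_for(self, size):
        if self.e_percent is None:
            return "FNE"  # no estimate of the CE rate: set FNE continuously
        # Each sent packet adds E% of its size to the blanking debt.
        self.owed += size * self.e_percent
        if self.owed >= size * 100:
            # Enough debt to cover this whole packet: blank RE on it.
            self.owed -= size * 100
            return "Re-Echo"
        return "RECT"
```

With E = 10 and equal 1000-octet packets, every tenth packet is sent with RE blanked, so blanked octets track 10% of sent octets.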
skipping to change at page 28, line 25 skipping to change at page 29, line 7
4.2.3. Guidelines for adding Re-ECN to DCCP 4.2.3. Guidelines for adding Re-ECN to DCCP
Beside adjusting the initial features negotiation sequence, operating Beside adjusting the initial features negotiation sequence, operating
re-ECN in DCCP [RFC4340] could be achieved by defining a new option re-ECN in DCCP [RFC4340] could be achieved by defining a new option
to be added to acknowledgments, that would include a multibit field to be added to acknowledgments, that would include a multibit field
where the destination could copy its ECC. where the destination could copy its ECC.
4.2.4. Guidelines for adding Re-ECN to SCTP 4.2.4. Guidelines for adding Re-ECN to SCTP
Annex 1 in [RFC2960] gives the specifications for SCTP to support Appendix A in [RFC4960] gives the specifications for SCTP to support
ECN. Similar steps should be taken to support re-ECN. Beside ECN. Similar steps should be taken to support re-ECN. Beside
adjusting the initial features negotiation sequence, operating re-ECN adjusting the initial features negotiation sequence, operating re-ECN
in SCTP could be achieved by defining a new control chunk, that would in SCTP could be achieved by defining a new control chunk, that would
include a multibit field where the destination could copy its ECC. include a multibit field where the destination could copy its ECC.
5. Network Layer 5. Network Layer
5.1. Re-ECN IPv4 Wire Protocol 5.1. Re-ECN IPv4 Wire Protocol
The wire protocol of the ECN field in the IP header remains largely The wire protocol of the ECN field in the IP header remains largely
unchanged from [RFC3168]. However, an extension to the ECN field we unchanged from [RFC3168]. However, an extension to the ECN field we
call the RE (re-ECN extension) flag (Section 3.2) is defined in this call the RE (Re-ECN extension) flag (Section 3.3) is defined in this
document. It doubles the extended ECN codepoint space, giving 8 document. It doubles the extended ECN codepoint space, giving 8
potential codepoints. The semantics of the extra codepoints are potential codepoints. The semantics of the extra codepoints are
backward compatible with the semantics of the 4 original codepoints backward compatible with the semantics of the 4 original codepoints
[RFC3168] (Section 7.1 collects together and summarises all the [RFC3168] (Section 7.1 collects together and summarises all the
changes defined in this document). changes defined in this document).
For IPv4, this document proposes that the new RE control flag will be For IPv4, this document proposes that the new RE control flag will be
positioned where the `reserved' control flag was at bit 48 of the positioned where the `reserved' control flag was at bit 48 of the
IPv4 header (counting from 0). Alternatively, some would call this IPv4 header (counting from 0). Alternatively, some would call this
bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4 bit 0 (counting from 0) of byte 7 (counting from 1) of the IPv4
skipping to change at page 30, line 21 skipping to change at page 30, line 50
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Header | Hdr ext Len | Option Type | Opt Length =4 | | Next Header | Hdr ext Len | Option Type | Opt Length =4 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|R| Reserved for future use | |R| Reserved for future use |
|E| | |E| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option Figure 6: Definition of a New IPv6 Congestion Hop by Hop Option
Header containing the Re-ECN Extension (RE) Control Flag Header containing the re-ECN Extension (RE) Control Flag
0 1 2 3 4 5 6 7 8 0 1 2 3 4 5 6 7 8
+-+-+-+-+-+-+-+-+- +-+-+-+-+-+-+-+-+-
|AIU|C|Option ID| |AIU|C|Option ID|
+-+-+-+-+-+-+-+-+- +-+-+-+-+-+-+-+-+-
Figure 7: Congestion Hop by Hop Option Type Encoding Figure 7: Congestion Hop by Hop Option Type Encoding
The Hop-by-Hop Options header enables packets to carry information to The Hop-by-Hop Options header enables packets to carry information to
be examined and processed by routers or nodes along the packet's be examined and processed by routers or nodes along the packet's
delivery path, including the source and destination nodes. For re- delivery path, including the source and destination nodes. For re-
skipping to change at page 30, line 44 skipping to change at page 31, line 25
Congestion extension header MUST be set to "00" meaning if Congestion extension header MUST be set to "00" meaning if
unrecognized `skip over option and continue processing the header'. unrecognized `skip over option and continue processing the header'.
Then, any routers or a receiver not upgraded with the optional re-ECN Then, any routers or a receiver not upgraded with the optional re-ECN
features described in this memo will simply ignore this header. But features described in this memo will simply ignore this header. But
routers with these optional re-ECN features or a re-ECN policing routers with these optional re-ECN features or a re-ECN policing
function, will process this Congestion extension header. function, will process this Congestion extension header.
The `C' flag MUST be set to "1" to specify that the Option Data The `C' flag MUST be set to "1" to specify that the Option Data
(currently only the RE control flag) can change en-route to the (currently only the RE control flag) can change en-route to the
packet's final destination. This ensures that, when an packet's final destination. This ensures that, when an
Authentication header (AH [RFC2402]) is present in the packet, for Authentication header (AH [RFC4302]) is present in the packet, for
any option whose data may change en-route, its entire Option Data any option whose data may change en-route, its entire Option Data
field will be treated as zero-valued octets when computing or field will be treated as zero-valued octets when computing or
verifying the packet's authenticating value. verifying the packet's authenticating value.
Although the RE control flag should not be changed along the path, we Although the RE control flag should not be changed along the path, we
expect that the rest of this option field that is currently `Reserved expect that the rest of this option field that is currently `Reserved
for future use' could be used for a multi-bit congestion notification for future use' could be used for a multi-bit congestion notification
field which we would expect to change en route. As the RE flag does field which we would expect to change en route. As the RE flag does
not need end-to-end authentication, we set the C flag to '1'. not need end-to-end authentication, we set the C flag to '1'.
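
The Option Type layout of Figure 7 and the Option Data layout of Figure 6 can be sketched in a few lines. The function names are hypothetical, and the 5-bit Option ID is left as a caller-supplied parameter because no value for it appears in this text.

```python
def congestion_option_type(option_id, aiu=0b00, c=1):
    """Pack the Option Type octet: AIU (2 bits), C (1 bit), Option ID (5 bits)."""
    if not 0 <= option_id < 32:
        raise ValueError("Option ID must fit in 5 bits")
    # AIU = 00: skip over the option if unrecognised.
    # C = 1: the Option Data may change en route.
    return (aiu << 6) | (c << 5) | option_id


def congestion_option_data(re_flag):
    """Build the 4-octet Option Data: RE in the first bit, the rest reserved."""
    return ((1 if re_flag else 0) << 31).to_bytes(4, "big")


def re_flag_from_option_data(data):
    """Read the RE flag back from the first bit of the Option Data."""
    return bool(data[0] & 0x80)
```

For example, with a placeholder Option ID of 1, `congestion_option_type(1)` yields 0x21 and `congestion_option_data(True)` yields the octets 80 00 00 00.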
skipping to change at page 31, line 19 skipping to change at page 31, line 48
5.3. Router Forwarding Behaviour 5.3. Router Forwarding Behaviour
Re-ECN works well without modifying the forwarding behaviour of any Re-ECN works well without modifying the forwarding behaviour of any
routers. However, below, two OPTIONAL changes to forwarding routers. However, below, two OPTIONAL changes to forwarding
behaviour are defined which respectively enhance performance and behaviour are defined which respectively enhance performance and
improve a router's discrimination against flooding attacks. They are improve a router's discrimination against flooding attacks. They are
both OPTIONAL additions that we propose MAY apply by default to all both OPTIONAL additions that we propose MAY apply by default to all
Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN Diffserv per-hop scheduling behaviours (PHBs) [RFC2475] and ECN
marking behaviours [RFC3168]. Specifications for PHBs MAY define marking behaviours [RFC3168]. Specifications for PHBs MAY define
different forwarding behaviours from this default, but this is NOT different forwarding behaviours from this default, but this is not
REQUIRED. [Re-PCN] is one example. required. [Re-PCN] is one example.
FNE indicates ECT: FNE indicates ECT:
The FNE codepoint tells a router to assume that the packet was The FNE codepoint tells a router to assume that the packet was
sent by an ECN-capable transport (see Section 5.4). Therefore an sent by an ECN-capable transport (see Section 5.4). Therefore an
FNE packet MAY be marked rather than dropped. Note that the FNE FNE packet MAY be marked rather than dropped. Note that the FNE
codepoint has been intentionally chosen so that, to legacy routers codepoint has been intentionally chosen so that, to RFC3168
(which do not inspect the RE flag) an FNE packet appears to be compliant routers (which do not inspect the RE flag) an FNE packet
Not-ECT so it will be dropped by legacy AQM algorithms. appears to be Not-ECT so it will be dropped by legacy AQM
algorithms.
A network operator MUST NOT configure a router to ECN mark rather A network operator MUST NOT configure a queue to ECN mark rather
than drop FNE packets unless it can guarantee that FNE packets than drop FNE packets unless it can guarantee that FNE packets
will be rate limited, either locally or upstream. The ingress will be rate limited, either locally or upstream. The ingress
policers discussed in Section 6.1.5 would count as rate limiters policers discussed in Section 6.1.5 would count as rate limiters
for this purpose. for this purpose.
Preferential Drop: If a re-ECN capable router experiences very high Preferential Drop: If a re-ECN capable router queue experiences very
load so that it has to drop arriving packets (e.g. a DoS attack), high load so that it has to drop arriving packets (e.g. a DoS
it MAY preferentially drop packets within the same Diffserv PHB attack), it MAY preferentially drop packets within the same
using the preference order for extended ECN codepoints given in Diffserv PHB using the preference order for extended ECN
Table 7. Preferential dropping can be difficult to implement on codepoints given in Table 7. Preferential dropping can be
some hardware, but if feasible it would discriminate against difficult to implement on some hardware, but if feasible it would
attack traffic if done as part of the overall policing framework discriminate against attack traffic if done as part of the overall
of Section 6.1.3. If nowhere else, routers at the egress of a policing framework of Section 6.1.3. If nowhere else, routers at
network SHOULD implement preferential drop (stronger than the MAY the egress of a network SHOULD implement preferential drop
above). For simplicity, preferences 4 & 5 MAY be merged into one (stronger than the MAY above). For simplicity, preferences 4 & 5
preference level. MAY be merged into one preference level.
+-------+-----+------------+-------+------------+-------------------+ +-------+-----+------------+-------+------------+-------------------+
| ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning | | ECN | RE | Extended | Worth | Drop Pref | Re-ECN meaning |
| field | bit | ECN | | (1 = drop | | | field | bit | ECN | | (1 = drop | |
| | | codepoint | | 1st) | | | | | codepoint | | 1st) | |
+-------+-----+------------+-------+------------+-------------------+ +-------+-----+------------+-------+------------+-------------------+
| 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed | | 01 | 0 | Re-Echo | +1 | 5/4 | Re-echoed |
| | | | | | congestion and | | | | | | | congestion and |
| | | | | | RECT | | | | | | | RECT |
| 00 | 1 | FNE | +1 | 4 | Feedback not | | 00 | 1 | FNE | +1 | 4 | Feedback not |
| | | | | | established | | | | | | | established |
| 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled | | 11 | 0 | CE(0) | 0 | 3 | Re-Echo canceled |
| | | | | | by congestion | | | | | | | by congestion |
| | | | | | experienced | | | | | | | experienced |
| 01 | 1 | RECT | 0 | 3 | Re-ECN capable | | 01 | 1 | RECT | 0 | 3 | Re-ECN capable |
| | | | | | transport | | | | | | | transport |
| 11 | 1 | CE(-1) | -1 | 3 | Congestion | | 11 | 1 | CE(-1) | -1 | 3 | Congestion |
| | | | | | experienced | | | | | | | experienced |
| 10 | 1 | --CU-- | n/a | 2 | Currently Unused | | 10 | 1 | --CU-- | n/a | 2 | Currently Unused |
| 10 | 0 | --- | n/a | 2 | Legacy ECN use | | 10 | 0 | --- | n/a | 2 | RFC3168 ECN use |
| | | | | | only | | | | | | | only |
| 00 | 0 | Not-RECT | n/a | 1 | Not | | 00 | 0 | Not-RECT | n/a | 1 | Not |
| | | | | | re-ECN-capable | | | | | | | Re-ECN-capable |
| | | | | | transport | | | | | | | transport |
+-------+-----+------------+-------+------------+-------------------+ +-------+-----+------------+-------+------------+-------------------+
Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth') Table 7: Drop Preference of EECN Codepoints (Sorted by `Worth')
The above drop preferences are arranged to preserve packets with The above drop preferences are arranged to preserve packets with
more positive worth (Section 3.4), given senders of positive more positive worth (Section 3.5), given senders of positive
packets must have honestly declared downstream congestion. This packets must have honestly declared downstream congestion. This
is explained fully in Section 6 on applications, particularly when is explained fully in Section 6 on applications, particularly when
the application of re-ECN to protect against DDoS attacks is the application of re-ECN to protect against DDoS attacks is
described. described.
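
Table 7 can be expressed as a lookup from the (ECN field, RE bit) pair to codepoint name, worth and drop preference. The sketch below is illustrative only: the function names are hypothetical, the "5/4" entry is stored as 5 with an optional merge to 4, the unnamed "10, 0" row is labelled "RFC3168-only" for convenience, and treating "n/a" worth as zero when summing is an assumption.

```python
# (ECN field, RE bit) -> (codepoint, worth, drop preference; 1 = drop first)
EECN = {
    (0b01, 0): ("Re-Echo", +1, 5),
    (0b00, 1): ("FNE", +1, 4),
    (0b11, 0): ("CE(0)", 0, 3),
    (0b01, 1): ("RECT", 0, 3),
    (0b11, 1): ("CE(-1)", -1, 3),
    (0b10, 1): ("CU", None, 2),
    (0b10, 0): ("RFC3168-only", None, 2),
    (0b00, 0): ("Not-RECT", None, 1),
}


def drop_victim(queue, merge_4_and_5=False):
    """Pick the packet to drop first among those in the same Diffserv PHB."""
    def pref(codepoint):
        p = EECN[codepoint][2]
        # Preferences 4 and 5 may be merged into one level for simplicity.
        return 4 if (merge_4_and_5 and p == 5) else p
    return min(queue, key=pref)  # lowest number is dropped first


def flow_worth(codepoints):
    """Sum positive and negative markings; 'n/a' is counted as zero here."""
    return sum(EECN[cp][1] or 0 for cp in codepoints)
```

Given a queue holding RECT, FNE, Not-RECT and Re-Echo packets, `drop_victim` selects the Not-RECT packet, which preserves the packets of more positive worth.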
5.4. Justification for Setting the First SYN to FNE 5.4. Justification for Setting the First SYN to FNE
Congested routers may mark an FNE packet to CE(-1) (Section 5.3), and the initial SYN MUST be set to FNE by Re-ECT client A (Section 4.1.4)
the initial SYN MUST be set to FNE by Re-ECT client A and (Section 5.3) says a queue MAY optionally treat an FNE packet as
(Section 4.1.4). So an initial SYN may be marked CE(-1) rather than ECN capable, so an initial SYN may be marked CE(-1) rather than
dropped. This seems dangerous, because the sender has not yet dropped. This seems dangerous, because the sender has not yet
established whether the receiver is a legacy one that does not established whether the receiver is an RFC3168 one that does not
understand congestion marking. It also seems to allow malicious understand congestion marking. It also seems to allow malicious
senders to take advantage of ECN marking to avoid so much drop when senders to take advantage of ECN marking to avoid so much drop when
launching SYN flooding attacks. Below we explain the features of the launching SYN flooding attacks. Below we explain the features of the
protocol design that remove both these dangers. protocol design that remove both these dangers.
ECN-capable initial SYN with a Not-ECT server: If the TCP server B ECN-capable initial SYN with a Not-ECT server: If the TCP server B
is re-ECN capable, provision is made for it to feed back a possible is re-ECN capable, provision is made for it to feed back a possible
congestion marked SYN in the SYN ACK (Section 4.1.4). But if the congestion marked SYN in the SYN ACK (Section 4.1.4). But if the
TCP client A finds out from the SYN ACK that the server was not TCP client A finds out from the SYN ACK that the server was not
ECN-capable, the TCP client MUST consider the first SYN as ECN-capable, the TCP client MUST conservatively consider the first
congestion marked before setting itself into Not-ECT mode. SYN as congestion marked before setting itself into Not-ECT mode.
Section 4.1.4 mandates that such a TCP client MUST also set its Section 4.1.4 mandates that such a TCP client MUST also set its
initial window to 1 segment. In this way we remove the need to initial window to 1 segment. In this way we remove the need to
cautiously avoid setting the first SYN to Not-RECT. This will cautiously avoid setting the first SYN to Not-RECT. This will
give worse performance while deployment is patchy, but better give worse performance while deployment is patchy, but better
performance once deployment is widespread. performance once deployment is widespread.
SYN flooding attacks can't exploit ECN-capability: Malicious hosts SYN flooding attacks can't exploit ECN-capability: Malicious hosts
may think they can use the advantage that ECN-marking gives over may think they can use the advantage that ECN-marking gives over
drop in launching classic SYN-flood attacks. But Section 5.3 drop in launching classic SYN-flood attacks. But Section 5.3
mandates that a router MUST only be configured to treat packets mandates that a router MUST only be configured to treat packets
with the FNE codepoint as ECN-capable if FNE packets are rate with the FNE codepoint as ECN-capable if FNE packets are rate
limited. Introduction of the FNE codepoint was a deliberate move limited somewhere. Introduction of the FNE codepoint was a
to enable transport-neutral handling of flow-start and flow state deliberate move to enable transport-neutral handling of flow-start
set-up in the IP layer where it belongs. It then becomes possible and flow state set-up in the IP layer where it belongs. It then
to protect against flooding attacks of all forms (not just SYN becomes possible to protect against flooding attacks of all forms
flooding) without transport-specific inspection for things like (not just SYN flooding) without transport-specific inspection for
the SYN flag in TCP headers. Then, for instance, SYN flooding things like the SYN flag in TCP headers. Then, for instance, SYN
attacks using IPSec ESP encryption can also be rate limited at the flooding attacks using IPSec ESP encryption can also be rate
IP layer. limited at the IP layer.
It might seem pedantic going to all this trouble to enable ECN on the It might seem pedantic going to all this trouble to enable ECN on the
initial packet of a flow, but it is motivated by a much wider concern initial packet of a flow, but it is motivated by a much wider concern
to ensure safe congestion control will still be possible even if the to ensure safe congestion control will still be possible even if the
application mix evolves to the point where the majority of flows application mix evolves to the point where the majority of flows
consist of a single window or even a single packet. It also allows consist of a single window or even a single packet. It also allows
denial of service attacks to be more easily isolated and prevented. denial of service attacks to be more easily isolated and prevented.
5.5. Control and Management 5.5. Control and Management
skipping to change at page 35, line 15 skipping to change at page 36, line 15
flag should be the same as the inner. If it isn't a management alarm flag should be the same as the inner. If it isn't a management alarm
should be raised. This behaviour is the same as the full- should be raised. This behaviour is the same as the full-
functionality variant of [RFC3168] at tunnel exit, but different at functionality variant of [RFC3168] at tunnel exit, but different at
tunnel entry. tunnel entry.
If tunnels are left as they are specified in [RFC3168], whether the If tunnels are left as they are specified in [RFC3168], whether the
limited or full-functionality variants are used, a problem arises limited or full-functionality variants are used, a problem arises
with re-ECN if a tunnel crosses an inter-domain boundary, because the with re-ECN if a tunnel crosses an inter-domain boundary, because the
difference between positive and negative markings will not be difference between positive and negative markings will not be
correctly accounted for. In a limited functionality ECN tunnel, the correctly accounted for. In a limited functionality ECN tunnel, the
flow will appear to be legacy traffic, and therefore may be wrongly flow will appear to be RFC3168 compliant traffic, and therefore may
rate limited. In a full-functionality ECN tunnel, the result will be wrongly rate limited. In a full-functionality ECN tunnel, the
depend on whether the tunnel entry copies the inner RE flag to the outer result will depend on whether the tunnel entry copies the inner RE flag
header or the RE flag in the outer header is always cleared. If the to the outer header or the RE flag in the outer header is always
former, the flow will tend to be too positive when accounted for at cleared. If the former, the flow will tend to be too positive when
borders. If the latter, it will be too negative. If the rules set accounted for at borders. If the latter, it will be too negative.
out in [ECN-tunnel] are followed then this will not be an issue. If the rules set out in [ECN-tunnel] are followed then this will not
be an issue.
5.7. Non-Issues 5.7. Non-Issues
The following issues might seem to cause unfavourable interactions The following issues might seem to cause unfavourable interactions
with re-ECN, but we will explain why they don't: with re-ECN, but we will explain why they don't:
o Various link layers support explicit congestion notification, such o Various link layers support explicit congestion notification, such
as Frame Relay and ATM. Explicit congestion notification is as Frame Relay and ATM. Explicit congestion notification is
proposed to be added to other link layers, such as Ethernet proposed to be added to other link layers, such as Ethernet
(802.3ar Ethernet congestion management) and MPLS [ECN-MPLS]; (802.3ar Ethernet congestion management) and MPLS [RFC5129];
o Encryption and IPSec. o Encryption and IPSec.
In the case of congestion notification at the link layer, each In the case of congestion notification at the link layer, each
particular link layer scheme either manages congestion on the link particular link layer scheme either manages congestion on the link
with its own link-level feedback (the usual arrangement in the cases with its own link-level feedback (the usual arrangement in the cases
of ATM and Frame Relay), or congestion notification from the link of ATM and Frame Relay), or congestion notification from the link
layer is merged into congestion notification at the IP level when the layer is merged into congestion notification at the IP level when the
frame headers are decapsulated at the end of the link (the frame headers are decapsulated at the end of the link (the
recommended arrangement in the Ethernet and MPLS cases). Given the recommended arrangement in the Ethernet and MPLS cases). Given the
skipping to change at page 36, line 6 skipping to change at page 37, line 7
is processed on the path by subtracting positive from negative is processed on the path by subtracting positive from negative
markings. markings.
In the case of encryption, as long as the tunnel issues described in In the case of encryption, as long as the tunnel issues described in
Section 5.6 are dealt with, payload encryption itself will not be a Section 5.6 are dealt with, payload encryption itself will not be a
problem. The design goal of re-ECN is to include downstream problem. The design goal of re-ECN is to include downstream
congestion in the IP header so that it is not necessary to burrow into congestion in the IP header so that it is not necessary to burrow into
inner headers. Obfuscation of flow identifiers is not a problem for inner headers. Obfuscation of flow identifiers is not a problem for
re-ECN policing elements. Re-ECN doesn't ever require flow re-ECN policing elements. Re-ECN doesn't ever require flow
identifiers to be valid; it only requires them to be unique. So if identifiers to be valid; it only requires them to be unique. So if
an IPSec encapsulating security payload (ESP [RFC2406]) or an an IPSec encapsulating security payload (ESP [RFC4305]) or an
authentication header (AH [RFC2402]) is used, the security parameters authentication header (AH [RFC4302]) is used, the security parameters
index (SPI) will be a sufficient flow identifier, as it is intended index (SPI) will be a sufficient flow identifier, as it is intended
to be unique to a flow without revealing actual port numbers. to be unique to a flow without revealing actual port numbers.
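As a rough illustration of the point that flow identifiers only need to be unique, a policing element might choose a flow key along the lines sketched below. This is not part of the re-ECN specification: PacketInfo and flow_key are hypothetical names, and the final fallback to the source and destination address pair is an assumption for packets that expose neither an SPI nor port numbers. Only the IP protocol numbers for ESP (50) and AH (51) are taken as given.

```python
from typing import NamedTuple, Optional, Tuple

ESP = 50  # IP protocol number for the Encapsulating Security Payload
AH = 51   # IP protocol number for the Authentication Header


class PacketInfo(NamedTuple):
    """Hypothetical summary of the header fields a policer can see."""
    src: str
    dst: str
    protocol: int
    spi: Optional[int] = None
    src_port: Optional[int] = None
    dst_port: Optional[int] = None


def flow_key(pkt: PacketInfo) -> Tuple:
    """Return a key that is unique per flow; it need not be meaningful."""
    # With IPsec the ports are hidden, but the SPI is visible and is
    # intended to be unique to a flow.
    if pkt.protocol in (ESP, AH) and pkt.spi is not None:
        return (pkt.src, pkt.dst, pkt.protocol, pkt.spi)
    # Ordinary transport headers: use the familiar 5-tuple.
    if pkt.src_port is not None and pkt.dst_port is not None:
        return (pkt.src, pkt.dst, pkt.protocol, pkt.src_port, pkt.dst_port)
    # Assumed fallback: treat the address pair as the flow.
    return (pkt.src, pkt.dst)
```

The key is only ever compared for equality, so any field that distinguishes flows without revealing port numbers would serve equally well.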
In general, even if endpoints use some locally agreed scheme to hide In general, even if endpoints use some locally agreed scheme to hide
port numbers, re-ECN policing elements can just consider the pair of port numbers, re-ECN policing elements can just consider the pair of
source and destination IP addresses as the flow identifier. Re-ECN source and destination IP addresses as the flow identifier. Re-ECN
encourages endpoints to at least tell the network layer that a encourages endpoints to at least tell the network layer that a
sequence of packets are all part of the same flow, if indeed they sequence of packets are all part of the same flow, if indeed they
are. The alternative would be for the sender to make each packet are. The alternative would be for the sender to make each packet
appear to be a new flow, which would require them all to be marked appear to be a new flow, which would require them all to be marked
skipping to change at page 39, line 9 skipping to change at page 40, line 9
delay using re-feedback. We give a simple outline of how this could delay using re-feedback. We give a simple outline of how this could
work in Appendix F. However, we do not expect this to be necessary, work in Appendix F. However, we do not expect this to be necessary,
as researchers tend to agree that only congestion control dynamics as researchers tend to agree that only congestion control dynamics
need to depend on RTT, not the rate that the algorithm would converge need to depend on RTT, not the rate that the algorithm would converge
on after a period of stability. on after a period of stability.
Figure 8 sketches the incentive framework that we will describe piece Figure 8 sketches the incentive framework that we will describe piece
by piece throughout this section. We will do a first pass in by piece throughout this section. We will do a first pass in
overview, then return to each piece in detail. We re-use the earlier overview, then return to each piece in detail. We re-use the earlier
example of how downstream congestion is derived by subtracting example of how downstream congestion is derived by subtracting
upstream congestion from path congestion (Figure 1) but depict upstream congestion from path congestion (Figure 2) but depict
multiple trust boundaries to turn it into an internetwork. For multiple trust boundaries to turn it into an internetwork. For
clarity, only downstream congestion is shown (the difference between clarity, only downstream congestion is shown (the difference between
the two earlier plots). The graph displays downstream path the two earlier plots). The graph displays downstream path
congestion seen in a typical flow as it traverses an example path congestion seen in a typical flow as it traverses an example path
from sender S to receiver R, across networks N1, N2 & N4. Everyone from sender S to receiver R, across networks N1, N2 & N3. Everyone
is shown using re-ECN correctly, but we intend to show why everyone is shown using re-ECN correctly, but we intend to show why everyone
would /choose/ to use it correctly, and honestly. would /choose/ to use it correctly, and honestly.
Three main types of self-interest can be identified: Three main types of self-interest can be identified:
o Users want to transmit data across the network as fast as o Users want to transmit data across the network as fast as
possible, paying as little as possible for the privilege. In this possible, paying as little as possible for the privilege. In this
respect, there is no distinction between senders and receivers, respect, there is no distinction between senders and receivers,
but we must be wary of potential malice by one on the other; but we must be wary of potential malice by one on the other;
o Network operators want to maximise revenues from the resources o Network operators want to maximise revenues from the resources
they invest in. They compete amongst themselves for the custom of they invest in. They compete amongst themselves for the custom of
users; users;
o Attackers (whether users or networks) want to use any opportunity o Attackers (whether users or networks) want to use any opportunity
to subvert the new re-ECN system for their own gain or to damage to subvert the new re-ECN system for their own gain or to damage
the service of their victims, whether targeted or random. the service of their victims, whether targeted or random.
[Figure 8 diagram: path S, N1, N2, N4 (in -05) or N3 (in -06), R, with a policer towards the ingress and a dropper at the egress. Downstream congestion (marking fraction, y-axis 0% to 3%) is plotted against resource index (x-axis, 0 to i): it starts at 3%, steps down to 2%, and reaches 0% at the egress. The -05 version also annotates marking fractions of 1.00% and 2.00%.]
Figure 8: Incentive Framework, showing creation of opposing pressures Figure 8: Incentive Framework, showing creation of opposing pressures
to under-declare and over-declare downstream congestion, using a to under-declare and over-declare downstream congestion, using a
policer and a dropper policer and a dropper
Source congestion control: We want to ensure that the sender will Source congestion control: We want to ensure that the sender will
throttle its rate as downstream congestion increases. Whatever throttle its rate as downstream congestion increases. Whatever
the agreed congestion response (whether TCP-compatible or some the agreed congestion response (whether TCP-compatible or some
enhanced QoS), to some extent it will always be against the enhanced QoS), to some extent it will always be against the
sender's interest to comply. sender's interest to comply.
skipping to change at page 41, line 9 skipping to change at page 42, line 6
Edge egress dropper: If the policer ensures the source has less Edge egress dropper: If the policer ensures the source has less
right to a high rate the higher it declares downstream congestion, right to a high rate the higher it declares downstream congestion,
the source has a clear incentive to understate downstream the source has a clear incentive to understate downstream
congestion. But, if flows of packets are understated when they congestion. But, if flows of packets are understated when they
enter the internetwork, they will have become negative by the time enter the internetwork, they will have become negative by the time
they leave. So, we introduce a dropper at the last network they leave. So, we introduce a dropper at the last network
egress, which drops packets in flows that persistently declare egress, which drops packets in flows that persistently declare
negative downstream congestion (see Section 6.1.4 for details). negative downstream congestion (see Section 6.1.4 for details).
[Figure 9 diagram (present in -05 only): the same path S, N1, N2, N4, R and the same downstream congestion plot (marking fraction 0% to 3% against resource index), with arrows labelled "competitive routing" and "penalties" acting at the inter-domain borders and an arrow labelled "sanctions" acting from below.]
Figure 9: Incentives at Inter-domain Borders
Inter-domain traffic policing: But next we must ask, if congestion Inter-domain traffic policing: But next we must ask, if congestion
arises downstream (say in N4), what is the ingress network's arises downstream (say in N3), what is the ingress network's
(N1's) incentive to police its customers' response? If N1 turns a (N1's) incentive to police its customers' response? If N1 turns a
blind eye, its own customers benefit while other networks suffer. blind eye, its own customers benefit while other networks suffer.
This is why all inter-domain QoS architectures (e.g. Intserv, This is why all inter-domain QoS architectures (e.g. Intserv,
Diffserv) police traffic each time it crosses a trust boundary. Diffserv) police traffic each time it crosses a trust boundary.
We have already shown that re-ECN gives a trustworthy measure of We have already shown that re-ECN gives a trustworthy measure of
the expected downstream congestion that a flow will cause by the expected downstream congestion that a flow will cause by
subtracting negative volume from positive at any intermediate subtracting negative volume from positive at any intermediate
point on a path. N4 (say) can use this measure to police all the point on a path. N3 (say) can use this measure to police all the
responses to congestion of all the sources beyond its upstream responses to congestion of all the sources beyond its upstream
neighbour (N2), but in bulk with one very simple passive neighbour (N2), but in bulk with one very simple passive
mechanism, rather than per flow, as we will now explain using mechanism, rather than per flow, as we will now explain.
Figure 9.
Emulating policing with inter-domain congestion penalties: Between Emulating policing with inter-domain congestion penalties: Between
high-speed networks, we would rather avoid per-flow policing, and high-speed networks, we would rather avoid per-flow policing, and
we would rather avoid holding back traffic while it is policed. we would rather avoid holding back traffic while it is policed.
Instead, once re-ECN has arranged headers to carry downstream Instead, once re-ECN has arranged headers to carry downstream
congestion honestly, N2 can contract to pay N4 penalties in congestion honestly, N2 can contract to pay N3 penalties in
proportion to a single bulk count of the congestion metrics proportion to a single bulk count of the congestion metrics
crossing their mutual trust boundary (Section 6.1.6). In this crossing their mutual trust boundary (Section 6.1.6). In this
way, N4 puts pressure on N2 to suppress downstream congestion, for way, N3 puts pressure on N2 to suppress downstream congestion, for
every flow passing through the border interface, even though they every flow passing through the border interface, even though they
will all start and end in different places, and even though they will all start and end in different places, and even though they
may all be allowed different responses to congestion. The figure may all be allowed different responses to congestion. The figure
depicts this downward pressure on N2 by the solid downward arrow depicts this downward pressure on N2 by the solid downward arrow
at the egress of N2. Then N2 has an incentive either to police at the egress of N2. Then N2 has an incentive either to police
the congestion response of its own ingress traffic (from N1) or to the congestion response of its own ingress traffic (from N1) or to
emulate policing by applying penalties to N1 in turn on the basis emulate policing by applying penalties to N1 in turn on the basis
of congestion counted at their mutual boundary. In this recursive of congestion counted at their mutual boundary. In this recursive
way, the incentives for each flow to respond correctly to way, the incentives for each flow to respond correctly to
congestion trace back with each flow precisely to each source, congestion trace back with each flow precisely to each source,
despite the mechanism not recognising flows (see Section 6.2.2). despite the mechanism not recognising flows (see Section 6.2.2).
Inter-domain congestion charging diversity: Any two networks are Inter-domain congestion charging diversity: Any two networks are
free to agree any of a range of penalty regimes between themselves free to agree any of a range of penalty regimes between themselves
but they would only provide the right incentives if they were but they would only provide the right incentives if they were
within the following reasonable constraints. N2 should expect to within the following reasonable constraints. N2 should expect to
have to pay penalties to N4 where penalties monotonically increase have to pay penalties to N3 where penalties monotonically increase
with the volume of congestion and negative penalties are not with the volume of congestion and negative penalties are not
allowed. For instance, they may agree an SLA with tiered allowed. For instance, they may agree an SLA with tiered
congestion thresholds, where higher penalties apply the higher the congestion thresholds, where higher penalties apply the higher the
threshold that is broken. But the most obvious (and useful) form threshold that is broken. But the most obvious (and useful) form
of penalty is where N4 levies a charge on N2 proportional to the of penalty is where N3 levies a charge on N2 proportional to the
volume of downstream congestion N2 dumps into N4. In the volume of downstream congestion N2 dumps into N3. In the
explanation that follows, we assume this specific variant of explanation that follows, we assume this specific variant of
volume charging between networks - charging proportionate to the volume charging between networks - charging proportionate to the
volume of congestion. volume of congestion.
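The proportional charging variant assumed above can be illustrated with a minimal sketch of a bulk border meter. It is only a sketch under stated assumptions: BorderMeter, penalty and the marking labels "positive", "neutral" and "negative" are hypothetical names, and the price is an arbitrary integer unit per octet. It counts octets in bulk, with no per-flow state, and applies a charge that increases monotonically with congestion volume and is never negative.

```python
from dataclasses import dataclass

# Hypothetical labels for the worth of a packet's marking.
WORTH = {"positive": +1, "neutral": 0, "negative": -1}


@dataclass
class BorderMeter:
    """Bulk counters at one border interface; no per-flow state."""
    positive_octets: int = 0
    negative_octets: int = 0

    def count(self, size_octets: int, marking: str) -> None:
        """Account for one packet crossing the border."""
        worth = WORTH[marking]
        if worth > 0:
            self.positive_octets += size_octets
        elif worth < 0:
            self.negative_octets += size_octets

    def congestion_volume(self) -> int:
        """Downstream congestion volume: positive minus negative octets."""
        return self.positive_octets - self.negative_octets


def penalty(congestion_volume: int, price_per_octet: int) -> int:
    """Charge proportional to congestion volume, floored at zero."""
    return max(0, congestion_volume) * price_per_octet
```

For example, after one 1500-octet positive packet, one 1500-octet neutral packet and one 500-octet negative packet, the meter reports a congestion volume of 1000 octets, so a price of 3 units per octet gives a penalty of 3000 units.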
We must make clear that we are not advocating that everyone should We must make clear that we are not advocating that everyone should
use this form of contract. We are well aware that the IETF tries use this form of contract. We are well aware that the IETF tries
to avoid standardising technology that depends on a particular to avoid standardising technology that depends on a particular
business model. And we strongly share this desire to encourage business model. And we strongly share this desire to encourage
diversity. But our aim is merely to show that border policing can diversity. But our aim is merely to show that border policing can
at least work with this one model, then we can assume that at least work with this one model, then we can assume that
skipping to change at page 43, line 28 skipping to change at page 44, line 4
inter-domain congestion charging, a domain seems to have a inter-domain congestion charging, a domain seems to have a
perverse incentive to fake congestion; N2's profit depends on the perverse incentive to fake congestion; N2's profit depends on the
difference between congestion at its ingress (its revenue) and at difference between congestion at its ingress (its revenue) and at
its egress (its cost). So, overstating internal congestion seems its egress (its cost). So, overstating internal congestion seems
to increase profit. However, smart border routing [Smart_rtg] by to increase profit. However, smart border routing [Smart_rtg] by
N1 will bias its routing towards the least cost routes. So, N2 N1 will bias its routing towards the least cost routes. So, N2
risks losing all its revenue to competitive routes if it risks losing all its revenue to competitive routes if it
overstates congestion (see Section 6.2.3). In other words, if N2 overstates congestion (see Section 6.2.3). In other words, if N2
is the least congested route, its ability to raise excess profits is the least congested route, its ability to raise excess profits
is limited by the congestion on the next least congested route. is limited by the congestion on the next least congested route.
This pressure on N2 to remain competitive is represented by the
dotted downward arrow at the ingress to N2 in Figure 9.
Closing the loop: All the above elements conspire to trap everyone Closing the loop: All the above elements conspire to trap everyone
between two opposing pressures (the downward and upward arrows in between two opposing pressures, ensuring the downstream congestion
Figure 8 & Figure 9), ensuring the downstream congestion metric metric arrives at the destination neither above nor below zero.
arrives at the destination neither above nor below zero. So, we So, we have arrived back where we started in our argument. The
have arrived back where we started in our argument. The ingress ingress edge network can rely on downstream congestion declared in
edge network can rely on downstream congestion declared in the the packet headers presented by the sender. So it can police the
packet headers presented by the sender. So it can police the
sender's congestion response accordingly. sender's congestion response accordingly.
Evolvability of congestion control: We have seen that re-ECN enables Evolvability of congestion control: We have seen that re-ECN enables
policing at the very first ingress. We have also seen that, as policing at the very first ingress. We have also seen that, as
flows continue on their path through further networks downstream, flows continue on their path through further networks downstream,
re-ECN removes the need for further per-domain ingress policing of re-ECN removes the need for further per-domain ingress policing of
all the different congestion responses allowed to each different all the different congestion responses allowed to each different
flow. This is why the evolvability of re-ECN policing is so flow. This is why the evolvability of re-ECN policing is so
superior to bottleneck policing or to any policing of different superior to bottleneck policing or to any policing of different
QoS for different flows. Even if all access networks choose to QoS for different flows. Even if all access networks choose to
skipping to change at page 44, line 35 skipping to change at page 45, line 8
except only the volume of packets marked with congestion experienced except only the volume of packets marked with congestion experienced
(CE) was counted. (CE) was counted.
However, below we explain why relying on classic feedback /required/ However, below we explain why relying on classic feedback /required/
congestion charging to be used, while re-ECN achieves the same congestion charging to be used, while re-ECN achieves the same
powerful outcome (given it is built on Kelly's foundations), but does powerful outcome (given it is built on Kelly's foundations), but does
not /require/ congestion charging. In brief, the problem with not /require/ congestion charging. In brief, the problem with
classic feedback is that the incentives have to trace the indirect classic feedback is that the incentives have to trace the indirect
path back to the sender---the long way round the feedback loop. For path back to the sender---the long way round the feedback loop. For
example, if classic feedback were used in Figure 8, N2 would have had example, if classic feedback were used in Figure 8, N2 would have had
to influence N1 via all of N4, R & S rather than directly. to influence N1 via all of N3, R & S rather than directly.
Inability to agree what is happening downstream: In order to police Inability to agree what is happening downstream: In order to police
its upstream neighbour's congestion response, the neighbours its upstream neighbour's congestion response, the neighbours
should be able to agree on the congestion to be responded to. should be able to agree on the congestion to be responded to.
Whatever the feedback regime, as packets change hands at each Whatever the feedback regime, as packets change hands at each
trust boundary, any path metrics they carry are verifiable by both trust boundary, any path metrics they carry are verifiable by both
neighbours. But, with a classic path metric, they can only agree neighbours. But, with a classic path metric, they can only agree
on the /upstream/ path congestion. on the /upstream/ path congestion.
Inaccessible back-channel: The network needs a whole-path congestion Inaccessible back-channel: The network needs a whole-path congestion
skipping to change at page 45, line 37 skipping to change at page 46, line 10
using the safer `sender pays' model. However, congestion charging is using the safer `sender pays' model. However, congestion charging is
only likely to be appropriate between domains. So, without losing only likely to be appropriate between domains. So, without losing
evolvability, re-ECN enables technical policing mechanisms that are evolvability, re-ECN enables technical policing mechanisms that are
more appropriate for end users than congestion pricing. more appropriate for end users than congestion pricing.
We now take a second pass over the incentive framework, filling in We now take a second pass over the incentive framework, filling in
the detail. the detail.
6.1.4. Egress Dropper 6.1.4. Egress Dropper
As traffic leaves the last network before the receiver (domain N4 in As traffic leaves the last network before the receiver (domain N3 in
Figure 8), the fraction of positive octets in a flow should match the Figure 8), the fraction of positive octets in a flow should match the
fraction of negative octets introduced by congestion marking, leaving fraction of negative octets introduced by congestion marking, leaving
a balance of zero. If it is less (a negative flow), it implies that a balance of zero. If it is less (a negative flow), it implies that
the source is understating path congestion (which will reduce the the source is understating path congestion (which will reduce the
penalties that N2 owes N4). penalties that N2 owes N3).
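The balance test described above can be sketched as follows. This is not the Appendix E algorithm, only an illustration of the bookkeeping: EgressBalance is a hypothetical name, the worth values (+1, 0, -1) are shorthand for positive, neutral and negative markings, and tolerance_octets is an assumed allowance so that a flow is not judged negative on a momentary deficit.

```python
from collections import defaultdict
from typing import Hashable


class EgressBalance:
    """Per-flow running balance of positive minus negative octets."""

    def __init__(self, tolerance_octets: int = 0) -> None:
        self.balance = defaultdict(int)
        self.tolerance_octets = tolerance_octets

    def observe(self, flow_id: Hashable, size_octets: int, worth: int) -> None:
        """Add one packet; worth is +1 (positive), 0 (neutral) or -1 (negative)."""
        self.balance[flow_id] += worth * size_octets

    def is_negative(self, flow_id: Hashable) -> bool:
        """True if the flow's deficit exceeds the assumed tolerance."""
        return self.balance[flow_id] < -self.tolerance_octets
```

A flow whose positive octets match the negative octets added by congestion marking stays at a balance of zero and is never flagged; a flow that understates congestion drifts below the tolerance.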
If flows are positive, N4 need take no action---this simply means its If flows are positive, N3 need take no action---this simply means its
upstream neighbour is paying more penalties than it needs to, and the upstream neighbour is paying more penalties than it needs to, and the
source is going slower than it needs to. But, to protect itself source is going slower than it needs to. But, to protect itself
against persistently negative flows, N4 will need to install a against persistently negative flows, N3 will need to install a
dropper at its egress. Appendix E gives a suggested algorithm for dropper at its egress. Appendix E gives a suggested algorithm for
this dropper. There is no intention that the dropper algorithm needs this dropper. There is no intention that the dropper algorithm needs
to be standardised; it is merely provided to show that an efficient, to be standardised; it is merely provided to show that an efficient,
robust algorithm is possible. But whatever algorithm is used must robust algorithm is possible. But whatever algorithm is used must
meet the criteria below: meet the criteria below:
o It SHOULD introduce minimal false positives for honest flows; o It SHOULD introduce minimal false positives for honest flows;
o It SHOULD quickly detect and sanction dishonest flows (minimal o It SHOULD quickly detect and sanction dishonest flows (minimal
false negatives); false negatives);
skipping to change at page 48, line 35 skipping to change at page 49, line 7
Of course, even if the sender does operate its own network, it may Of course, even if the sender does operate its own network, it may
arrange not to congestion mark traffic. Whether the sender does this arrange not to congestion mark traffic. Whether the sender does this
or not is of no concern to anyone else except the sender. Such a or not is of no concern to anyone else except the sender. Such a
sender will not be policed against its own network's contribution to sender will not be policed against its own network's contribution to
congestion, but the only resulting problem would be overload in the congestion, but the only resulting problem would be overload in the
sender's own network. sender's own network.
Finally, we must not forget that an easy way to circumvent re-ECN's Finally, we must not forget that an easy way to circumvent re-ECN's
defences is for the source to turn off re-ECN support, by setting the defences is for the source to turn off re-ECN support, by setting the
Not-RECT codepoint, implying legacy traffic. Therefore an ingress Not-RECT codepoint, implying RFC3168 compliant traffic. Therefore an
policer should put a general rate-limit on Not-RECT traffic, which ingress policer should put a general rate-limit on Not-RECT traffic,
SHOULD be lax during early, patchy deployment, but will have to which SHOULD be lax during early, patchy deployment, but will have to
become stricter as deployment widens. Similarly, flows starting become stricter as deployment widens. Similarly, flows starting
without an FNE packet can be confined by a strict rate-limit used for without an FNE packet can be confined by a strict rate-limit used for
the remainder of flows that haven't proved they are well-behaved by the remainder of flows that haven't proved they are well-behaved by
starting correctly (therefore they need not consume any flow state--- starting correctly (therefore they need not consume any flow state---
they are just confined to the `misbehaving' bin if they carry an they are just confined to the `misbehaving' bin if they carry an
unrecognised flow ID). unrecognised flow ID).
6.1.6. Inter-domain Policing 6.1.6. Inter-domain Policing
One of the main design goals of re-ECN is for border security One of the main design goals of re-ECN is for border security
skipping to change at page 51, line 39 skipping to change at page 52, line 9
Once an unbiased estimate of the effect of negative flows can be Once an unbiased estimate of the effect of negative flows can be
made, the problem reduces to detecting and preferably removing flows made, the problem reduces to detecting and preferably removing flows
that have gone negative as soon as possible. But importantly, that have gone negative as soon as possible. But importantly,
complete eradication of negative flows is no longer critical---best complete eradication of negative flows is no longer critical---best
endeavours will be sufficient. endeavours will be sufficient.
For instance, let us consider the case where a source sends traffic For instance, let us consider the case where a source sends traffic
with no positive markings at all, hoping to at least get as much with no positive markings at all, hoping to at least get as much
traffic delivered as network-based droppers will allow. The flow is traffic delivered as network-based droppers will allow. The flow is
likely to go at least slightly negative in the first network on the likely to go at least slightly negative in the first network on the
path (N1 if we use the example network layout in Figure 9). If all path (N1 if we use the example network layout in Figure 8). If all
networks use the algorithm in Appendix H.2 to inflate penalties at networks use the algorithm in Appendix H.2 to inflate penalties at
their border with an upstream network, they will remove the effect of their border with an upstream network, they will remove the effect of
negative flows. So, for instance, N2 will not be paying a penalty to negative flows. So, for instance, N2 will not be paying a penalty to
N1 for this flow. Further, because the flow contributes no positive N1 for this flow. Further, because the flow contributes no positive
markings at all, a dropper at the egress will completely remove it. markings at all, a dropper at the egress will completely remove it.
The remaining problem is that every network is carrying a flow that The remaining problem is that every network is carrying a flow that
is causing congestion to others but not being held to account for the is causing congestion to others but not being held to account for the
congestion it is causing. Whenever the fail-safe border algorithm congestion it is causing. Whenever the fail-safe border algorithm
(Section 6.1.7) or the border algorithm to compensate for negative (Section 6.1.7) or the border algorithm to compensate for negative
flows (Appendix H.2) detects a negative flow, it can instantiate a flows (Appendix H.2) detects a negative flow, it can instantiate a
focused dropper for that flow locally. It may be some time before focused dropper for that flow locally. It may be some time before
the flow is detected, but the more strongly negative the flow is, the the flow is detected, but the more strongly negative the flow is, the
more quickly it will be detected by the fail-safe algorithm. But, in more quickly it will be detected by the fail-safe algorithm. But, in
the meantime, it will not be distorting border incentives. Until it the meantime, it will not be distorting border incentives. Until it
is detected, if it contributes to drop anywhere, its packets will is detected, if it contributes to drop anywhere, its packets will
tend to be dropped before others if routers use the preferential drop tend to be dropped before others if queues use the preferential drop
rules in Section 5.3, which discriminate against non-positive rules in Section 5.3, which discriminate against non-positive
packets. All networks below the point where a flow goes negative packets. All networks below the point where a flow goes negative
(N1, N2 and N4 in this case) have an incentive to remove this flow, (N1, N2 and N3 in this case) have an incentive to remove this flow,
but the router where it first goes negative (in N1) can of course but the queue where it first goes negative (in N1) can of course
remove the problem for everyone downstream. remove the problem for everyone downstream.
In the case of DDoS attacks, Section 6.2.1 describes how re-ECN In the case of DDoS attacks, Section 6.2.1 describes how re-ECN
mitigates their force. mitigates their force.
6.1.7. Inter-domain Fail-safes 6.1.7. Inter-domain Fail-safes
The mechanisms described so far create incentives for rational The mechanisms described so far create incentives for rational
network operators to behave. That is, one operator aims to make network operators to behave. That is, one operator aims to make
another behave responsibly by applying penalties and expects a another behave responsibly by applying penalties and expects a
skipping to change at page 53, line 21 skipping to change at page 53, line 41
6.2. Other Applications 6.2. Other Applications
6.2.1. DDoS Mitigation 6.2.1. DDoS Mitigation
A flooding attack is inherently about congestion of a resource. A flooding attack is inherently about congestion of a resource.
Because re-ECN ensures the sources causing network congestion Because re-ECN ensures the sources causing network congestion
experience the cost of their own actions, it acts as a first line of experience the cost of their own actions, it acts as a first line of
defence against DDoS. As load focuses on a victim, upstream queues defence against DDoS. As load focuses on a victim, upstream queues
grow, requiring honest sources to pre-load packets with a higher grow, requiring honest sources to pre-load packets with a higher
fraction of positive packets. Once downstream routers are so fraction of positive packets. Once downstream queues are so
congested that they are dropping traffic, they will be CE marking the congested that they are dropping traffic, they will be CE marking the
traffic they do forward 100%. Honest sources will therefore be traffic they do forward 100%. Honest sources will therefore be
sending Re-Echo 100% (and therefore being severely rate-limited at sending Re-Echo 100% (and therefore being severely rate-limited at
the ingress). the ingress).
Senders under malicious control can either do the same as honest Senders under malicious control can either do the same as honest
sources, and be rate-limited at ingress, or they can understate sources, and be rate-limited at ingress, or they can understate
congestion by sending more neutral RECT packets than they should. If congestion by sending more neutral RECT packets than they should. If
sources understate congestion (i.e. do not re-echo sufficient sources understate congestion (i.e. do not re-echo sufficient
positive packets) and the preferential drop ranking is implemented on positive packets) and the preferential drop ranking is implemented on
routers (Section 5.3), these routers will preserve positive traffic queues (Section 5.3), these queues will preserve positive traffic
until last. So, the neutral traffic from malicious sources will all until last. So, the neutral traffic from malicious sources will all
be automatically dropped first. Either way, the malicious sources be automatically dropped first. Either way, the malicious sources
cannot send more than honest sources. cannot send more than honest sources.
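
As a rough illustration of the preferential drop argument above, the following Python sketch shows a queue that must discard some packets and picks the lowest-worth ones first. The `Packet` type, the integer encoding of worth (+1 positive, 0 neutral, -1 negative) and the `select_drops` helper are assumptions made for this example only; the normative drop rules are those of Section 5.3, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    flow: str   # illustrative flow label
    worth: int  # assumed encoding: +1 positive, 0 neutral, -1 negative


def select_drops(queue, n_drop):
    """Split queue into (kept, dropped), discarding lowest worth first."""
    # sorted() is stable, so arrival order is preserved within a worth class.
    ranked = sorted(range(len(queue)), key=lambda i: queue[i].worth)
    victims = set(ranked[:n_drop])
    kept = [p for i, p in enumerate(queue) if i not in victims]
    dropped = [p for i, p in enumerate(queue) if i in victims]
    return kept, dropped


# A source that understates congestion sends neutral packets,
# while an honest source under heavy congestion sends positive ones.
queue = [
    Packet("honest", +1),
    Packet("attacker", 0),
    Packet("attacker", 0),
    Packet("honest", +1),
]
kept, dropped = select_drops(queue, 2)
```

In this toy case both discarded packets belong to the understating source, which is the effect the paragraph above relies on.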
Further, hosts under malicious control will tend to be re-used for Further, hosts under malicious control will tend to be re-used for
many different attacks. They will therefore build up a long term many different attacks. They will therefore build up a long term
history of causing congestion. Therefore, as long as the population history of causing congestion. Therefore, as long as the population
of potentially compromisable hosts around the Internet is limited, of potentially compromisable hosts around the Internet is limited,
the per-user policing algorithms in Appendix G.1 will gradually the per-user policing algorithms in Appendix G.1 will gradually
throttle down zombies and other launchpads for attacks. Therefore, throttle down zombies and other launchpads for attacks. Therefore,
skipping to change at page 55, line 32 skipping to change at page 56, line 10
o We are considering the issue of whether it would be useful to o We are considering the issue of whether it would be useful to
truncate rather than drop packets that appear to be malicious, so truncate rather than drop packets that appear to be malicious, so
that the feedback loop is not broken but useful data can be that the feedback loop is not broken but useful data can be
removed. removed.
7. Incremental Deployment 7. Incremental Deployment
7.1. Incremental Deployment Features 7.1. Incremental Deployment Features
The design of the re-ECN protocol started from the fact that the The design of the re-ECN protocol started from the fact that the
current ECN marking behaviour of routers was sufficient and that re- current ECN marking behaviour of queues was sufficient and that re-
feedback could be introduced around these routers by changing the feedback could be introduced around these queues by changing the
sender behaviour but not the routers. Otherwise, if we had required sender behaviour but not the routers. Otherwise, if we had required
routers to be changed, the chance of encountering a path that had routers to be changed, the chance of encountering a path that had
every router upgraded would be vanishingly small during early every router upgraded would be vanishingly small during early
deployment, giving no incentive to start deployment. Also, as there deployment, giving no incentive to start deployment. Also, as there
is no new forwarding behaviour, routers and hosts do not have to is no new forwarding behaviour, routers and hosts do not have to
signal or negotiate anything. signal or negotiate anything.
However, networks that choose to protect themselves using re-ECN do However, networks that choose to protect themselves using re-ECN do
have to add new security functions at their trust boundaries with have to add new security functions at their trust boundaries with
others. They distinguish legacy traffic by its ECN field. Traffic others. They distinguish legacy traffic by its ECN field. Traffic
from Not-ECT transports is distinguishable by its Not-RECT marking. from Not-ECT transports is distinguishable by its Not-ECT marking.
Traffic from legacy ECN transports is distinguished from re-ECN by Traffic from RFC3168 compliant ECN transports is distinguished from
which of ECT(0) or ECT(1) is used. We chose to use ECT(1) for re-ECN re-ECN by which of ECT(0) or ECT(1) is used. We chose to use ECT(1)
traffic deliberately. Existing ECN sources set ECT(0) on either 50% for re-ECN traffic deliberately. Existing ECN sources set ECT(0) on
(the nonce) or 100% (the default) of packets, whereas re-ECN does not either 50% (the nonce) or 100% (the default) of packets, whereas re-
use ECT(0) at all. We can use this distinguishing feature of legacy ECN does not use ECT(0) at all. We can use this distinguishing
ECN traffic to separate it out for different treatment at the various feature of RFC3168 compliant ECN traffic to separate it out for
border security functions: egress dropping, ingress policing and different treatment at the various border security functions: egress
border policing. dropping, ingress policing and border policing.
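
The separation described above can be sketched as a small classifier over the two-bit ECN field. This is a minimal Python sketch, not the draft's algorithm: the bit values are those of RFC3168, but the function name, the returned labels, and the use of the RE flag to tell FNE apart from plain Not-ECT are assumptions for illustration. A CE-marked packet is reported separately because the ECN field alone no longer shows whether it started as ECT(0) or ECT(1).

```python
# Two-bit ECN field values as defined in RFC3168.
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def traffic_class(ecn_field, re_flag=False):
    """Return an illustrative label for border security functions."""
    if ecn_field == ECT0:
        # RFC3168 ECN traffic, or the unused codepoint if the RE flag is set.
        return "legacy-ecn"
    if ecn_field == NOT_ECT:
        # Assumption: a set RE flag marks feedback not established (FNE).
        return "re-ecn" if re_flag else "legacy-not-ect"
    if ecn_field == ECT1:
        return "re-ecn"
    if ecn_field == CE:
        # Origin cannot be recovered from the ECN field alone.
        return "ce"
    raise ValueError("ECN field must be a two-bit value")
```

A border function could then send the two legacy labels to a bulk rate limit and pass everything else to the re-ECN policers and droppers.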
The general principle we adopt is that an egress dropper will not The general principle we adopt is that an egress dropper will not
drop any legacy traffic, but ingress and border policers will limit drop any legacy traffic, but ingress and border policers will limit
the bulk rate of legacy traffic that can enter each network. Then, the bulk rate of legacy traffic (Not-ECT, ECT(0) and those marked
during early re-ECN deployment, operators can set very permissive (or with the unused codepoint) that can enter each network. Then, during
non-existent) rate-limits on legacy traffic, but once re-ECN early re-ECN deployment, operators can set very permissive (or non-
existent) rate-limits on legacy traffic, but once re-ECN
implementations are generally available, legacy traffic can be rate- implementations are generally available, legacy traffic can be rate-
limited increasingly harshly. Ultimately, an operator might choose limited increasingly harshly. Ultimately, an operator might choose
to block all legacy traffic entering its network, or at least only to block all legacy traffic entering its network, or at least only
allow through a trickle. allow through a trickle.
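
One way to picture a bulk rate limit that an operator tightens over time is a single token bucket shared by all legacy traffic. The draft does not prescribe a token bucket; it is used here only as a familiar stand-in, and the class name, parameters and units are hypothetical.

```python
class BulkLegacyPolicer:
    """Token bucket applied to the aggregate of legacy traffic (a sketch)."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s   # long-term limit; 0 blocks all traffic
        self.burst = burst_bytes       # bucket depth
        self.tokens = burst_bytes      # start with a full bucket
        self.last = 0.0                # time of the previous update, seconds

    def allow(self, size_bytes, now):
        """Return True if a legacy packet of size_bytes may enter at time now."""
        # Refill for the elapsed time, never beyond the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


# Permissive early on, harsher later: only the parameters change.
early = BulkLegacyPolicer(rate_bytes_per_s=10_000_000, burst_bytes=1_500_000)
late = BulkLegacyPolicer(rate_bytes_per_s=1000, burst_bytes=1500)
blocked = BulkLegacyPolicer(rate_bytes_per_s=0, burst_bytes=0)
```

Setting both parameters to zero corresponds to blocking legacy traffic altogether, while a small non-zero rate lets through only a trickle.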
Then, as the limits are set more strictly, the more legacy ECN Then, as the limits are set more strictly, the more RFC3168 ECN
sources will gain by upgrading to re-ECN. Thus, towards the end of sources will gain by upgrading to re-ECN. Thus, towards the end of
the voluntary incremental deployment period, legacy transports can be the voluntary incremental deployment period, RFC3168 compliant
given progressively stronger encouragement to upgrade. transports can be given progressively stronger encouragement to
upgrade.
The following list of minor changes brings together all the points The following list of minor changes brings together all the points
where Re-ECN semantics for use of the two-bit ECN field are different where re-ECN semantics for use of the two-bit ECN field are different
compared to RFC3168: compared to RFC3168:
o A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender o A re-ECN sender sets ECT(1) by default, whereas an RFC3168 sender
sets ECT(0) by default (Section 3.3); sets ECT(0) by default (Section 3.4);
o No provision is necessary for a re-ECN capable source transport to o No provision is necessary for a re-ECN capable source transport to
use the ECN nonce (Section 4.1.2.1); use the ECN nonce (Section 4.1.2.1);
o Routers MAY preferentially drop different extended ECN codepoints o Routers MAY preferentially drop different extended ECN codepoints
(Section 5.3); (Section 5.3);
o Packets carrying the feedback not established (FNE) codepoint MAY o Packets carrying the feedback not established (FNE) codepoint MAY
optionally be marked rather than dropped by routers, even though optionally be marked rather than dropped by routers, even though
their ECN field is Not-ECT (with the important caveat in their ECN field is Not-ECT (with the important caveat in
skipping to change at page 57, line 44 skipping to change at page 58, line 20
Deployment that requires co-ordination adds cost and delay and Deployment that requires co-ordination adds cost and delay and
tends to dilute any competitive advantage that might be gained. tends to dilute any competitive advantage that might be gained.
* ECN `only' gives a performance improvement. Making a product a * ECN `only' gives a performance improvement. Making a product a
bit faster (whether the product is a device or a network), bit faster (whether the product is a device or a network),
isn't usually a sufficient selling point to be worth the cost isn't usually a sufficient selling point to be worth the cost
of co-ordinating across the industry to deploy it. Network of co-ordinating across the industry to deploy it. Network
operators tend to avoid re-configuring a working network unless operators tend to avoid re-configuring a working network unless
launching a new product. launching a new product.
ECN and re-ECN for Edge-to-edge Assured QoS: ECN and Re-ECN for Edge-to-edge Assured QoS:
We believe the proposal to provide assured QoS sessions using a We believe the proposal to provide assured QoS sessions using a
form of ECN called pre-congestion notification (PCN) [PCN-arch] is form of ECN called pre-congestion notification (PCN) [PCN-arch] is
most likely to break the deadlock in ECN deployment first. It most likely to break the deadlock in ECN deployment first. It
only requires edge-to-edge deployment so it does not require only requires edge-to-edge deployment so it does not require
endpoint support. It can be deployed in a single network, then endpoint support. It can be deployed in a single network, then
grow incrementally to interconnected networks. And it provides a grow incrementally to interconnected networks. And it provides a
different `product' (internetworked assured QoS), rather than different `product' (internetworked assured QoS), rather than
merely making an existing product a bit faster. merely making an existing product a bit faster.
Not only could this assured QoS application kick-start ECN Not only could this assured QoS application kick-start ECN
deployment, it could also carry re-ECN deployment with it; because deployment, it could also carry re-ECN deployment with it; because
re-ECN can enable the assured QoS region to expand to a large re-ECN can enable the assured QoS region to expand to a large
internetwork where neighbouring networks do not trust each other. internetwork where neighbouring networks do not trust each other.
[Re-PCN] argues that re-ECN security should be built in to the QoS [Re-PCN] argues that re-ECN security should be built in to the QoS
system from the start, explaining why and how. system from the start, explaining why and how.
If ECN and re-ECN were deployed edge-to-edge for assured QoS, If ECN and re-ECN were deployed edge-to-edge for assured QoS,
operators would gain valuable experience. They would also clear operators would gain valuable experience. They would also clear
away many technical obstacles such as firewall configurations that away many technical obstacles such as firewall configurations that
block all but the legacy settings of the ECN field and the RE block all but the RFC3168 settings of the ECN field and the RE
flag. flag.
ECN in Access Networks: ECN in Access Networks:
The next obstacle to ECN deployment would be extension to access The next obstacle to ECN deployment would be extension to access
and backhaul networks, where considerable link layer differences and backhaul networks, where considerable link layer differences
make implementation non-trivial, particularly on congested make implementation non-trivial, particularly on congested
wireless links. ECN and re-ECN work fine during partial wireless links. ECN and re-ECN work fine during partial
deployment, but they will not be very useful if the most congested deployment, but they will not be very useful if the most congested
elements in networks are the last to support them. Access network elements in networks are the last to support them. Access network
skipping to change at page 60, line 44 skipping to change at page 61, line 21
So, if re-ECN were stipulated for cellular devices, it would So, if re-ECN were stipulated for cellular devices, it would
automatically appear in those devices connected to the wireless automatically appear in those devices connected to the wireless
fringes of fixed networks if they coupled cellular with WiFi or fringes of fixed networks if they coupled cellular with WiFi or
Bluetooth technology, for instance. Also, once implemented in the Bluetooth technology, for instance. Also, once implemented in the
operating system of one mobile device, it would tend to be found operating system of one mobile device, it would tend to be found
in other devices using the same family of operating system. in other devices using the same family of operating system.
Therefore, whether or not a fixed network deployed ECN, or Therefore, whether or not a fixed network deployed ECN, or
deployed re-ECN policers and droppers, many of its hosts might deployed re-ECN policers and droppers, many of its hosts might
well be using re-ECN over it. Indeed, they would be at an well be using re-ECN over it. Indeed, they would be at an
advantage when communicating with hosts across Re-ECN policed advantage when communicating with hosts across re-ECN policed
networks that rate limited Not-RECT traffic. networks that rate limited Not-RECT traffic.
Other possible scenarios: Other possible scenarios:
The above is thankfully not the only plausible scenario we can The above is thankfully not the only plausible scenario we can
think of. One of the many clubs of operators that meet regularly think of. One of the many clubs of operators that meet regularly
around the world might decide to act together to persuade a major around the world might decide to act together to persuade a major
operating system manufacturer to implement re-ECN. And they may operating system manufacturer to implement re-ECN. And they may
agree between them on an interconnection model that includes agree between them on an interconnection model that includes
congestion penalties. congestion penalties.
Re-ECN provides an interesting opportunity for device Re-ECN provides an interesting opportunity for device
manufacturers as well as network operators. Policers can be manufacturers as well as network operators. Policers can be
configured loosely when first deployed. Then as re-ECN take-up configured loosely when first deployed. Then as re-ECN take-up
increases, they can be tightened up, so that a network with re-ECN increases, they can be tightened up, so that a network with re-ECN
deployed can gradually squeeze down the service provided to legacy deployed can gradually squeeze down the service provided to
devices that have not upgraded to re-ECN. Many device vendors RFC3168 compliant devices that have not upgraded to re-ECN. Many
rely on replacement sales. And operating system companies rely device vendors rely on replacement sales. And operating system
heavily on new release sales. Also support services would like to companies rely heavily on new release sales. Also support
be able to force stragglers to upgrade. So, the ability to services would like to be able to force stragglers to upgrade.
throttle service to legacy operating systems is quite valuable. So, the ability to throttle service to RFC3168 compliant operating
systems is quite valuable.
Also, policing unresponsive sources may not be the only or even Also, policing unresponsive sources may not be the only or even
the first application that drives deployment. It may be policing the first application that drives deployment. It may be policing
causes of heavy congestion (e.g. peer-to-peer file-sharing). Or causes of heavy congestion (e.g. peer-to-peer file-sharing). Or
it may be mitigation of denial of service. Or we may be wrong in it may be mitigation of denial of service. Or we may be wrong in
thinking simpler QoS will not be the initial motivation for re-ECN thinking simpler QoS will not be the initial motivation for re-ECN
deployment. Indeed, the combined pressure for all these may be deployment. Indeed, the combined pressure for all these may be
the motivator, but it seems optimistic to expect such a level of the motivator, but it seems optimistic to expect such a level of
joined-up thinking from today's communications industry. We joined-up thinking from today's communications industry. We
believe a single application alone must be a sufficient motivator. believe a single application alone must be a sufficient motivator.
skipping to change at page 63, line 10 skipping to change at page 63, line 32
(policing) congestion control. But policing is only truly effective (policing) congestion control. But policing is only truly effective
at the first ingress into an internetwork, whereas path congestion at the first ingress into an internetwork, whereas path congestion
was previously only visible at the last egress. So, re-ECN was previously only visible at the last egress. So, re-ECN
democratises congestion information. Then the choice over who democratises congestion information. Then the choice over who
actually controls congestion can be made at run-time, not design actually controls congestion can be made at run-time, not design
time---a bit like an aircraft with dual controls. And different time---a bit like an aircraft with dual controls. And different
operators can make different choices. We believe non-architectural operators can make different choices. We believe non-architectural
approaches to this problem are unlikely to offer more than partial approaches to this problem are unlikely to offer more than partial
solutions (see Section 9). solutions (see Section 9).
Importantly, re-ECN does NOT REQUIRE assumptions about specific Importantly, re-ECN does not require assumptions about specific
congestion responses to be embedded in any network elements, except congestion responses to be embedded in any network elements, except
at the first ingress to the internetwork if that level of control is at the first ingress to the internetwork if that level of control is
desired by the ingress operator. But such tight policing will be a desired by the ingress operator. But such tight policing will be a
matter of agreement between the source and its access network matter of agreement between the source and its access network
operator. The ingress operator need not police congestion response operator. The ingress operator need not police congestion response
at flow granularity; it can simply hold a source responsible for the at flow granularity; it can simply hold a source responsible for the
aggregate congestion it causes, perhaps keeping it within a monthly aggregate congestion it causes, perhaps keeping it within a monthly
congestion quota. Or if the ingress network trusts the source, it congestion quota. Or if the ingress network trusts the source, it
can do nothing. can do nothing.
skipping to change at page 66, line 28 skipping to change at page 67, line 7
declare path congestion to the network and it can remove traffic at declare path congestion to the network and it can remove traffic at
the egress if this declaration is dishonest. So it can police the egress if this declaration is dishonest. So it can police
correctly, irrespective of whether the receiver tries to suppress correctly, irrespective of whether the receiver tries to suppress
congestion feedback or whether the sender ignores genuine congestion congestion feedback or whether the sender ignores genuine congestion
feedback. Therefore the re-ECN protocol addresses a much wider range feedback. Therefore the re-ECN protocol addresses a much wider range
of cheating problems, which includes the one addressed by the ECN of cheating problems, which includes the one addressed by the ECN
nonce. nonce.
9.3. Identifying Upstream and Downstream Congestion 9.3. Identifying Upstream and Downstream Congestion
Purple [Purple] proposes that routers should use the CWR flag in the Purple [Purple] proposes that queues should use the CWR flag in the
TCP header of ECN-capable flows to work out path congestion and TCP header of ECN-capable flows to work out path congestion and
therefore downstream congestion in a similar way to re-ECN. However, therefore downstream congestion in a similar way to re-ECN. However,
because CWR is in the transport layer, it is not always visible to because CWR is in the transport layer, it is not always visible to
network layer routers and policers. Purple's motivation was to network layer routers and policers. Purple's motivation was to
improve AQM, not policing. But, of course, nodes trying to avoid a improve AQM, not policing. But, of course, nodes trying to avoid a
policer would not be expected to allow CWR to be visible. policer would not be expected to allow CWR to be visible.
10. Security Considerations 10. Security Considerations
This whole memo concerns the deployment of a secure congestion This whole memo concerns the deployment of a secure congestion
skipping to change at page 68, line 31 skipping to change at page 69, line 9
11. IANA Considerations 11. IANA Considerations
This memo includes no request to IANA (yet). This memo includes no request to IANA (yet).
If this memo were to progress to standards track, it would list: If this memo were to progress to standards track, it would list:
o The new RE flag in IPv4 (Section 5.1) and its extension with the o The new RE flag in IPv4 (Section 5.1) and its extension with the
ECN field to create a new set of extended ECN (EECN) codepoints; ECN field to create a new set of extended ECN (EECN) codepoints;
o The definition of the EECN codepoints for default Diffserv PHBs o The definition of the EECN codepoints for default Diffserv PHBs
(Section 3.2) (Section 3.3)
o The new extension header for IPv6 (Section 5.2); o The new extension header for IPv6 (Section 5.2);
o The new combinations of flags in the TCP header for capability o The new combinations of flags in the TCP header for capability
negotiation (Section 4.1.3); negotiation (Section 4.1.3);
o The new ICMP message type (Section 5.5.1). o The new ICMP message type (Section 5.5.1).
12. Conclusions 12. Conclusions
skipping to change at page 69, line 12 skipping to change at page 69, line 36
Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley, Soppera, David Songhurst, Peter Hovell, Louise Burness, Phil Eardley,
Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright, Steve Rudkin, Marc Wennink, Fabrice Saffre, Cefn Hoile, Steve Wright,
John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru John Davey, Martin Koyabe, Carla Di Cairano-Gilfedder, Alexandru
Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd Murgu, Nigel Geffen, Pete Willis, John Adams (BT), Sally Floyd
(ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark (ICIR), Joe Babiarz, Kwok Ho-Chan (Nortel), Stephen Hailes, Mark
Handley (who developed the attack with canceled packets), Adam Handley (who developed the attack with canceled packets), Adam
Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft Greenhalgh (who developed the attack on DNS) (UCL), Jon Crowcroft
(Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who (Uni Cam), David Clark, Bill Lehr, Sharon Gillett, Steve Bauer (who
complemented our own dummy traffic attacks with others), Liz Maida complemented our own dummy traffic attacks with others), Liz Maida
(MIT), and comments from participants in the CRN/CFP Broadband and (MIT), and comments from participants in the CRN/CFP Broadband and
DoS-resistant Internet working groups. DoS-resistant Internet working groups. A special thank you to
Alessandro Salvatori for coming up with fiendish attacks on re-ECN.
14. Comments Solicited 14. Comments Solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
addressed to the IETF Transport Area working group's mailing list addressed to the IETF Transport Area working group's mailing list
<tsvwg@ietf.org>, and/or to the authors. <tsvwg@ietf.org>, and/or to the authors.
15. References 15. References
15.1. Normative References 15.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
S., Wroclawski, J., and L. Zhang, "Recommendations on
Queue Management and Congestion Avoidance in the
Internet", RFC 2309, April 1998.
[RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion [RFC2581] Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
Control", RFC 2581, April 1999. Control", RFC 2581, April 1999.
[RFC2960] Stewart, R., Xie, Q., Morneault, K., Sharp, C.,
Schwarzbauer, H., Taylor, T., Rytina, I., Kalla, M.,
Zhang, L., and V. Paxson, "Stream Control Transmission
Protocol", RFC 2960, October 2000.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", of Explicit Congestion Notification (ECN) to IP",
RFC 3168, September 2001. RFC 3168, September 2001.
[RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's [RFC3390] Allman, M., Floyd, S., and C. Partridge, "Increasing TCP's
Initial Window", RFC 3390, October 2002. Initial Window", RFC 3390, October 2002.
[RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram [RFC4340] Kohler, E., Handley, M., and S. Floyd, "Datagram
Congestion Control Protocol (DCCP)", RFC 4340, March 2006. Congestion Control Protocol (DCCP)", RFC 4340, March 2006.
[RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion [RFC4341] Floyd, S. and E. Kohler, "Profile for Datagram Congestion
Control Protocol (DCCP) Congestion Control ID 2: TCP-like Control Protocol (DCCP) Congestion Control ID 2: TCP-like
Congestion Control", RFC 4341, March 2006. Congestion Control", RFC 4341, March 2006.
[RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for [RFC4342] Floyd, S., Kohler, E., and J. Padhye, "Profile for
Datagram Congestion Control Protocol (DCCP) Congestion Datagram Congestion Control Protocol (DCCP) Congestion
Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342,
March 2006. March 2006.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol",
RFC 4960, September 2007.
15.2. Informative References 15.2. Informative References
[ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the [ARI05] Adams, J., Roberts, L., and A. IJsselmuiden, "Changing the
Internet to Support Real-Time Content Supply from a Large Internet to Support Real-Time Content Supply from a Large
Fraction of Broadband Residential Users", BT Technology Fraction of Broadband Residential Users", BT Technology
Journal (BTTJ) 23(2), April 2005. Journal (BTTJ) 23(2), April 2005.
[Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the [Bauer06] Bauer, S., Faratin, P., and R. Beverly, "Assessing the
assumptions underlying mechanism design for the Internet", assumptions underlying mechanism design for the Internet",
Proc. Workshop on the Economics of Networked Systems Proc. Workshop on the Economics of Networked Systems
skipping to change at page 70, line 39 skipping to change at page 71, line 11
Salvatori, A., "Closed Loop Traffic Policing", Politecnico Salvatori, A., "Closed Loop Traffic Policing", Politecnico
Torino and Institut Eurecom Masters Thesis , Torino and Institut Eurecom Masters Thesis ,
September 2005. September 2005.
[ECN-Deploy] [ECN-Deploy]
Floyd, S., "ECN (Explicit Congestion Notification) in Floyd, S., "ECN (Explicit Congestion Notification) in
TCP/IP; Implementation and Deployment of ECN", Web-page , TCP/IP; Implementation and Deployment of ECN", Web-page ,
May 2004, May 2004,
<http://www.icir.org/floyd/ecn.html#implementations>. <http://www.icir.org/floyd/ecn.html#implementations>.
[ECN-MPLS]
Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
Marking in MPLS", draft-ietf-tsvwg-ecn-mpls-01 (work in
progress), June 2007.
[ECN-tunnel] [ECN-tunnel]
Briscoe, B., "Layered Encapsulation of Congestion Briscoe, B., "Layered Encapsulation of Congestion
Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in
progress), June 2007. progress), June 2007.
[Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the
evolution of congestion control", Automatica 35(12)1969-- evolution of congestion control", Automatica 35(12)1969--
1985, December 1999, 1985, December 1999,
<http://www.statslab.cam.ac.uk/~frank/evol.html>. <http://www.statslab.cam.ac.uk/~frank/evol.html>.
[I-D.ietf-tcpm-ecnsyn] [I-D.ietf-tcpm-ecnsyn]
Kuzmanovic, A., "Adding Explicit Congestion Notification Kuzmanovic, A., "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", (ECN) Capability to TCP's SYN/ACK Packets",
draft-ietf-tcpm-ecnsyn-03 (work in progress), draft-ietf-tcpm-ecnsyn-05 (work in progress),
November 2007. February 2008.
[I-D.moncaster-tcpm-rcv-cheat] [I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., "A TCP Test to Allow Senders to Identify Moncaster, T., "A TCP Test to Allow Senders to Identify
Receiver Non-Compliance", Receiver Non-Compliance",
draft-moncaster-tcpm-rcv-cheat-02 (work in progress), draft-moncaster-tcpm-rcv-cheat-02 (work in progress),
November 2007. November 2007.
[ITU-T.I.371] [ITU-T.I.371]
ITU-T, "Traffic Control and Congestion Control in ITU-T, "Traffic Control and Congestion Control in
{B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004. {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004.
skipping to change at page 71, line 36 skipping to change at page 71, line 51
[Mathis97] [Mathis97]
Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The Mathis, M., Semke, J., Mahdavi, J., and T. Ott, "The
Macroscopic Behavior of the TCP Congestion Avoidance Macroscopic Behavior of the TCP Congestion Avoidance
Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997, Algorithm", ACM SIGCOMM CCR 27(3)67--82, July 1997,
<http://doi.acm.org/10.1145/263932.264023>. <http://doi.acm.org/10.1145/263932.264023>.
[PCN-arch] [PCN-arch]
Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R., Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R.,
Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion
Notification Architecture", Notification Architecture", draft-ietf-pcn-architecture-03
draft-eardley-pcn-architecture-00 (work in progress), (work in progress), February 2008.
June 2007.
[Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE: [Purple] Pletka, R., Waldvogel, M., and S. Mannal, "PURPLE:
Predictive Active Queue Management Utilizing Congestion Predictive Active Queue Management Utilizing Congestion
Information", Proc. Local Computer Networks (LCN 2003) , Information", Proc. Local Computer Networks (LCN 2003) ,
October 2003. October 2003.
[RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell, [RFC2208] Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell,
M., Romanow, A., Weinrib, A., and L. Zhang, "Resource M., Romanow, A., Weinrib, A., and L. Zhang, "Resource
ReSerVation Protocol (RSVP) Version 1 Applicability ReSerVation Protocol (RSVP) Version 1 Applicability
Statement Some Guidelines on Deployment", RFC 2208, Statement Some Guidelines on Deployment", RFC 2208,
September 1997. September 1997.
[RFC2402] Kent, S. and R. Atkinson, "IP Authentication Header", [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
RFC 2402, November 1998. S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
[RFC2406] Kent, S. and R. Atkinson, "IP Encapsulating Security S., Wroclawski, J., and L. Zhang, "Recommendations on
Payload (ESP)", RFC 2406, November 1998. Queue Management and Congestion Avoidance in the
Internet", RFC 2309, April 1998.
[RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z., [RFC2475] Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
and W. Weiss, "An Architecture for Differentiated and W. Weiss, "An Architecture for Differentiated
Services", RFC 2475, December 1998. Services", RFC 2475, December 1998.
[RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission [RFC2988] Paxson, V. and M. Allman, "Computing TCP's Retransmission
Timer", RFC 2988, November 2000. Timer", RFC 2988, November 2000.
[RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager", [RFC3124] Balakrishnan, H. and S. Seshan, "The Congestion Manager",
RFC 3124, June 2001. RFC 3124, June 2001.
skipping to change at page 72, line 33 skipping to change at page 72, line 47
Congestion Notification (ECN) Signaling with Nonces", Congestion Notification (ECN) Signaling with Nonces",
RFC 3540, June 2003. RFC 3540, June 2003.
[RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
Control for Voice Traffic in the Internet", RFC 3714, Control for Voice Traffic in the Internet", RFC 3714,
March 2004. March 2004.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005. Internet Protocol", RFC 4301, December 2005.
[RFC4302] Kent, S., "IP Authentication Header", RFC 4302,
December 2005.
[RFC4305] Eastlake, D., "Cryptographic Algorithm Implementation
Requirements for Encapsulating Security Payload (ESP) and
Authentication Header (AH)", RFC 4305, December 2005.
[RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
Marking in MPLS", RFC 5129, January 2008.
[Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN
on Bulk Data", draft-briscoe-re-pcn-border-cheat-00 (work on Bulk Data", draft-briscoe-re-pcn-border-cheat-01 (work
in progress), July 2007. in progress), February 2008.
[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
Salvatori, A., Soppera, A., and M. Koyabe, "Policing Salvatori, A., Soppera, A., and M. Koyabe, "Policing
Congestion Response in an Internetwork Using Re-Feedback", Congestion Response in an Internetwork Using Re-Feedback",
ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
www.acm.org/sigs/sigcomm/sigcomm2005/ www.acm.org/sigs/sigcomm/sigcomm2005/
techprog.html#session8>. techprog.html#session8>.
[Savage99] [Savage99]
Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
skipping to change at page 73, line 36 skipping to change at page 74, line 10
[pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End
Congestion Control in the Internet", IEEE/ACM Transactions Congestion Control in the Internet", IEEE/ACM Transactions
on Networking 7(4) 458--472, August 1999, on Networking 7(4) 458--472, August 1999,
<http://www.aciri.org/floyd/end2end-paper.html>. <http://www.aciri.org/floyd/end2end-paper.html>.
Appendix A. Precise Re-ECN Protocol Operation Appendix A. Precise Re-ECN Protocol Operation
{ToDo: fix this} {ToDo: fix this}
The protocol operation in the middle described in Section 3.3 was an The protocol operation in the middle described in Section 3.4 was an
approximation. In fact, standard ECN router marking combines 1% and approximation. In fact, standard ECN router marking combines 1% and
2% marking into slightly less than 3% whole-path marking, because 2% marking into slightly less than 3% whole-path marking, because
routers deliberately mark CE whether or not it has already been routers deliberately mark CE whether or not it has already been
marked by another router upstream. So the combined marking fraction marked by another router upstream. So the combined marking fraction
would actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. would actually be 100% - (100% - 1%)(100% - 2%) = 2.98%.
To generalise this we will need some notation. To generalise this we will need some notation.
o j represents the index of each resource (typically queues) along a o j represents the index of each resource (typically queues) along a
path, ranging from 0 at the first router to n-1 at the last. path, ranging from 0 at the first router to n-1 at the last.
skipping to change at page 74, line 37 skipping to change at page 75, line 12
p_0 = u_n p_0 = u_n
= 1 - (1 - m_1)(1 - m_2)... = 1 - (1 - m_1)(1 - m_2)...
Similarly, at some point j in the middle of the network, if p = 1 - Similarly, at some point j in the middle of the network, if p = 1 -
(1 - u_j)(1 - v_j), then (1 - u_j)(1 - v_j), then
v_j = 1 - (1 - p)/(1 - u_j) v_j = 1 - (1 - p)/(1 - u_j)
~= p - u_j; if u_j << 100% ~= p - u_j; if u_j << 100%
So, between the two routers in the example in Section 3.3, congestion So, between the two routers in the example in Section 3.4, congestion
downstream is downstream is
v_1 = 100.00% - (100% - 2.98%) / (100% - 1.00%) v_1 = 100.00% - (100% - 2.98%) / (100% - 1.00%)
= 2.00%, = 2.00%,
or a useful approximation of downstream congestion is or a useful approximation of downstream congestion is
v_1 ~= 2.98% - 1.00% v_1 ~= 2.98% - 1.00%
~= 1.98%. ~= 1.98%.
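
The arithmetic above can be checked with a minimal sketch. The function names (combined_marking, downstream_congestion) are illustrative assumptions and do not appear in the draft; only the two formulae, 1 - (1 - m_1)(1 - m_2)... and v_j = 1 - (1 - p)/(1 - u_j), are taken from the text.

```python
def combined_marking(fractions):
    """Whole-path marking fraction: 1 - (1 - m_1)(1 - m_2)..."""
    unmarked = 1.0
    for m in fractions:
        unmarked *= (1.0 - m)   # fraction of packets still unmarked after this queue
    return 1.0 - unmarked


def downstream_congestion(p, u_j):
    """Exact downstream congestion: v_j = 1 - (1 - p)/(1 - u_j)."""
    return 1.0 - (1.0 - p) / (1.0 - u_j)


# The two-router example: 1% marking upstream, 2% marking downstream.
p = combined_marking([0.01, 0.02])      # whole-path marking, about 2.98%
v1 = downstream_congestion(p, 0.01)     # exact value, about 2.00%
v1_approx = p - 0.01                    # approximation p - u_j, about 1.98%
```

The gap between v1 and v1_approx is the error introduced by subtracting fractions instead of dividing them; it stays small only while u_j << 100%.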
Appendix B. Justification for Two Codepoints Signifying Zero Worth Appendix B. Justification for Two Codepoints Signifying Zero Worth
Packets Packets
It may seem a waste of a codepoint to set aside two codepoints of the It may seem a waste of a codepoint to set aside two codepoints of the
Extended ECN field to signify zero worth (RECT and CE(0) are both Extended ECN field to signify zero worth (RECT and CE(0) are both
worth zero). The justification is subtle, but worth recording. worth zero). The justification is subtle, but worth recording.
The original version of re-ECN ([Re-fb] and draft-00 of this memo) The original version of Re-ECN ([Re-fb] and draft-00 of this memo)
used three codepoints for neutral (ECT(1)), positive (ECT(0)) and used three codepoints for neutral (ECT(1)), positive (ECT(0)) and
negative (CE) packets. The sender set packets to neutral unless re- negative (CE) packets. The sender set packets to neutral unless re-
echoing congestion, when it set them positive, in much the same way echoing congestion, when it set them positive, in much the same way
that it blanks the RE flag in the current protocol. However, routers that it blanks the RE flag in the current protocol. However, routers
were meant to mark congestion by setting packets negative (CE) were meant to mark congestion by setting packets negative (CE)
irrespective of whether they had previously been neutral or positive. irrespective of whether they had previously been neutral or positive.
However, we did not arrange for senders to remember which packet had However, we did not arrange for senders to remember which packet had
been sent with which codepoint, or for feedback to say exactly which been sent with which codepoint, or for feedback to say exactly which
packets arrived with which codepoints. The transport was meant to packets arrived with which codepoints. The transport was meant to
inflate the number of positive packets it sent to allow for a few inflate the number of positive packets it sent to allow for a few
being wiped out by congestion marking. We (wrongly) assumed that being wiped out by congestion marking. We (wrongly) assumed that
routers would congestion mark packets indiscriminately, so the routers would congestion mark packets indiscriminately, so the
transport could infer how many positive packets had been marked and transport could infer how many positive packets had been marked and
compensate accordingly by re-echoing. But this created a perverse compensate accordingly by re-echoing. But this created a perverse
incentive for routers to preferentially congestion mark positive incentive for routers to preferentially congestion mark positive
packets rather than neutral ones. packets rather than neutral ones.
We could have removed this perverse incentive by requiring re-ECN We could have removed this perverse incentive by requiring Re-ECN
senders to remember which packets they had sent with which codepoint, senders to remember which packets they had sent with which codepoint,
and feedback from the receiver to identify which packets arrived and feedback from the receiver to identify which packets arrived
as which. Then, if a positive packet was congestion marked to as which. Then, if a positive packet was congestion marked to
negative, the sender could have re-echoed twice to maintain the negative, the sender could have re-echoed twice to maintain the
balance between positive and negative at the receiver. balance between positive and negative at the receiver.
Instead, we chose to make re-echoing congestion (blanking RE) Instead, we chose to make re-echoing congestion (blanking RE)
orthogonal to congestion notification (marking CE), which required a orthogonal to congestion notification (marking CE), which required a
second neutral codepoint (the orthogonal scheme forms the main square second neutral codepoint. Then the receiver would be able to detect
of four codepoints in Figure 2). Then the receiver would be able to and echo a congestion event even if it arrived on a packet that had
detect and echo a congestion event even if it arrived on a packet originally been positive.
that had originally been positive.
If we had added extra complexity to the sender and receiver If we had added extra complexity to the sender and receiver
transports to track changes to individual packets, we could have made transports to track changes to individual packets, we could have made
it work, but then routers would have had an incentive to mark it work, but then routers would have had an incentive to mark
positive packets with half the probability of neutral packets. That positive packets with half the probability of neutral packets. That
in turn would have led router algorithms to become more complex. in turn would have led router algorithms to become more complex.
Then senders wouldn't know whether a mark had been introduced by a Then senders wouldn't know whether a mark had been introduced by a
simple or a complex router algorithm. That in turn would have simple or a complex router algorithm. That in turn would have
required another codepoint to distinguish between legacy ECN and new required another codepoint to distinguish between RFC3168 ECN and new
re-ECN router marking. Re-ECN router marking.
Once the cost of IP header codepoint real-estate was the same for Once the cost of IP header codepoint real-estate was the same for
both schemes, there was no doubt that the simpler option for both schemes, there was no doubt that the simpler option for
endpoints and for routers should be chosen. The resulting protocol endpoints and for routers should be chosen. The resulting protocol
also no longer needed the tricky inflation/deflation complexity of also no longer needed the tricky inflation/deflation complexity of
the original (broken) scheme. It was also much simpler to understand the original (broken) scheme. It was also much simpler to understand
conceptually. conceptually.
A further advantage of the new orthogonal four-codepoint scheme was A further advantage of the new orthogonal four-codepoint scheme was
that senders owned sole rights to change the RE flag and routers that senders owned sole rights to change the RE flag and routers
skipping to change at page 76, line 29 skipping to change at page 77, line 5
using such redundant relationships can improve the security of a using such redundant relationships can improve the security of a
scheme (cf. double-entry book-keeping or the ECN Nonce). scheme (cf. double-entry book-keeping or the ECN Nonce).
Alternatively, it might be necessary to exploit the redundancy in the Alternatively, it might be necessary to exploit the redundancy in the
future to encode an extra information channel. future to encode an extra information channel.
Appendix C. ECN Compatibility Appendix C. ECN Compatibility
The rationale for choosing the particular combinations of SYN and SYN The rationale for choosing the particular combinations of SYN and SYN
ACK flags in Section 4.1.3 is as follows. ACK flags in Section 4.1.3 is as follows.
Choice of SYN flags: A re-ECN sender can work with vanilla ECN Choice of SYN flags: A Re-ECN sender can work with RFC3168 compliant
receivers so we wanted to use the same flags as would be used in ECN receivers so we wanted to use the same flags as would be used
an ECN-setup SYN [RFC3168] (CWR=1, ECE=1). But at the same time, in an ECN-setup SYN [RFC3168] (CWR=1, ECE=1). But at the same
we wanted a server (host B) that is Re-ECT to be able to recognise time, we wanted a server (host B) that is Re-ECT to be able to
that the client (A) is also Re-ECT. We believe also setting NS=1 recognise that the client (A) is also Re-ECT. We believe also
in the initial SYN achieves both these objectives, as it should be setting NS=1 in the initial SYN achieves both these objectives, as
ignored by vanilla ECT receivers and by ECT-Nonce receivers. But it should be ignored by RFC3168 compliant ECT receivers and by
senders that are not Re-ECT should not set NS=1. At the time ECN ECT-Nonce receivers. But senders that are not Re-ECT should not
was defined, the NS flag was not defined, so setting NS=1 should set NS=1. At the time ECN was defined, the NS flag was not
be ignored by existing ECT receivers (but testing against defined, so setting NS=1 should be ignored by existing ECT
implementations may yet prove otherwise). The ECN Nonce receivers (but testing against implementations may yet prove
RFC [RFC3540] is silent on what the NS field might be set to in otherwise). The ECN Nonce RFC [RFC3540] is silent on what the NS
the TCP SYN, but we believe the intent was for a nonce client to field might be set to in the TCP SYN, but we believe the intent
set NS=0 in the initial SYN (again only testing will tell). was for a nonce client to set NS=0 in the initial SYN (again only
Therefore we define a Re-ECN-setup SYN as one with NS=1, CWR=1 & testing will tell). Therefore we define a Re-ECN-setup SYN as one
ECE=1 with NS=1, CWR=1 & ECE=1
Choice of SYN ACK flags: The client (A) needs to Choice of SYN ACK flags: The client (A) needs to
be able to determine whether the server (B) is Re-ECT. The be able to determine whether the server (B) is Re-ECT. The
original ECN specification required an ECT server to respond to an original ECN specification required an ECT server to respond to an
ECN-setup SYN with an ECN-setup SYN ACK of CWR=0 and ECE=1. There ECN-setup SYN with an ECN-setup SYN ACK of CWR=0 and ECE=1. There
is no room to modify this by setting the NS flag, as that is is no room to modify this by setting the NS flag, as that is
already set in the SYN ACK of an ECT-Nonce server. So we used the already set in the SYN ACK of an ECT-Nonce server. So we used the
only combination of CWR and ECE that would not be used by existing only combination of CWR and ECE that would not be used by existing
TCP receivers: CWR=1 and ECE=0. The original ECN specification TCP receivers: CWR=1 and ECE=0. The original ECN specification
defines this combination as a non-ECN-setup SYN ACK, which remains defines this combination as a non-ECN-setup SYN ACK, which remains
true for vanilla and Nonce ECTs. But for re-ECN we define it as a true for RFC3168 compliant and Nonce ECTs. But for Re-ECN we
Re-ECN-setup SYN ACK. We didn't use a SYN ACK with both CWR and define it as a Re-ECN-setup SYN ACK. We didn't use a SYN ACK with
ECE cleared to 0 because that would be the likely response from both CWR and ECE cleared to 0 because that would be the likely
most Not-ECT receivers. And we didn't use a SYN ACK with both CWR response from most Not-ECT receivers. And we didn't use a SYN ACK
and ECE set to 1 either, as at least one broken receiver with both CWR and ECE set to 1 either, as at least one broken
implementation echoes whatever flags were in the SYN into its SYN receiver implementation echoes whatever flags were in the SYN into
ACK. Therefore we define a Re-ECN-setup SYN ACK as one with CWR=1 its SYN ACK. Therefore we define a Re-ECN-setup SYN ACK as one
& ECE=0. with CWR=1 & ECE=0.
Choice of two alternative SYN ACKs: the NS flag may take either Choice of two alternative SYN ACKs: the NS flag may take either
value in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re- value in a Re-ECN-setup SYN ACK. Section 5.4 REQUIRES that a Re-
ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to
echo congestion experienced (CE) on the initial SYN. Otherwise a echo congestion experienced (CE) on the initial SYN. Otherwise a
Re-ECN-setup SYN ACK MUST be returned with NS=0. The only current Re-ECN-setup SYN ACK MUST be returned with NS=0. The only current
known use of the NS flag in a SYN ACK is to indicate support for known use of the NS flag in a SYN ACK is to indicate support for
the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1. the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1.
Given the ECN nonce MUST NOT be used for a RECN mode connection, a Given the ECN nonce MUST NOT be used for a RECN mode connection, a
Re-ECN-setup SYN ACK can use either setting of the NS flag without Re-ECN-setup SYN ACK can use either setting of the NS flag without
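
The flag combinations discussed in this appendix can be summarised in a short sketch. The function names and the returned labels are assumptions made for illustration; the flag values themselves (NS, CWR, ECE) are those given in the text, and the treatment of the remaining SYN ACK combinations as non-ECN-setup follows the RFC 3168 definitions the text refers to.

```python
def classify_syn(ns, cwr, ece):
    """Classify an initial SYN by its NS, CWR and ECE flags (each 0 or 1)."""
    if cwr == 1 and ece == 1:
        # NS=1 distinguishes a Re-ECT client; the text expects existing
        # ECT receivers to ignore it.
        return "Re-ECN-setup SYN" if ns == 1 else "ECN-setup SYN"
    return "non-ECN-setup SYN"


def classify_syn_ack(ns, cwr, ece):
    """Classify a SYN ACK as seen by a Re-ECT client.

    Returns (label, syn_ce_echoed). The second element is only
    meaningful for a Re-ECN-setup SYN ACK, where NS=1 echoes
    congestion experienced (CE) on the initial SYN.
    """
    if cwr == 1 and ece == 0:
        return ("Re-ECN-setup SYN ACK", ns == 1)
    if cwr == 0 and ece == 1:
        # RFC 3168 ECN-setup SYN ACK; NS here relates to the ECN nonce.
        return ("ECN-setup SYN ACK", False)
    # CWR=0 & ECE=0: likely Not-ECT. CWR=1 & ECE=1: possibly a broken
    # receiver echoing the SYN flags. Neither sets up ECN.
    return ("non-ECN-setup SYN ACK", False)
```

For example, classify_syn(1, 1, 1) yields "Re-ECN-setup SYN", and classify_syn_ack(1, 1, 0) yields a Re-ECN-setup SYN ACK that also echoes CE on the SYN.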
skipping to change at page 80, line 13 skipping to change at page 80, line 36
original intent in the early days of the Internet). original intent in the early days of the Internet).
In the longer term, precision could be improved if routers In the longer term, precision could be improved if routers
decremented TTL to represent exact propagation delay to the next decremented TTL to represent exact propagation delay to the next
router. That is, for a router to decrement TTL by, say, 1.8 time router. That is, for a router to decrement TTL by, say, 1.8 time
units it would alternate the decrement of every packet between 1 & 2 units it would alternate the decrement of every packet between 1 & 2
at a ratio of 1:4. Although this might sometimes require a seemingly at a ratio of 1:4. Although this might sometimes require a seemingly
dangerous null decrement, a packet in a loop would still decrement to dangerous null decrement, a packet in a loop would still decrement to
zero after 255 time units on average. As more routers were upgraded zero after 255 time units on average. As more routers were upgraded
to this more accurate TTL decrement, path delay estimates would to this more accurate TTL decrement, path delay estimates would
become increasingly accurate despite the presence of some legacy become increasingly accurate despite the presence of some RFC3168
routers that continued to always decrement the TTL by 1. compliant routers that continued to always decrement the TTL by 1.
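
One way to realise the alternating decrement described above is a simple credit accumulator. This is only a sketch under stated assumptions: the function name ttl_decrements is hypothetical, the text does not prescribe any particular scheduling of the 1s and 2s, and exact rational arithmetic is used here purely to avoid floating-point drift.

```python
import math
from fractions import Fraction


def ttl_decrements(target, n):
    """Return n integer TTL decrements whose running average tracks target.

    target is the desired per-hop decrement in time units, given as a
    Fraction (for example Fraction("1.8")).
    """
    credit = Fraction(0)
    decrements = []
    for _ in range(n):
        credit += target
        d = math.floor(credit)   # whole time units owed to this packet
        credit -= d              # carry the remainder to the next packet
        decrements.append(d)
    return decrements


# A target of 1.8 gives one decrement of 1 for every four decrements of 2.
pattern = ttl_decrements(Fraction("1.8"), 5)   # [1, 2, 2, 2, 2]

# A target below 1 produces the occasional null decrement mentioned above.
sparse = ttl_decrements(Fraction("0.5"), 4)    # [0, 1, 0, 1]
```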
Appendix G. Policer Designs to ensure Congestion Responsiveness Appendix G. Policer Designs to ensure Congestion Responsiveness
G.1. Per-user Policing G.1. Per-user Policing
User policing requires a policer on the ingress interface of the User policing requires a policer on the ingress interface of the
access router associated with the user. At that point, the traffic access router associated with the user. At that point, the traffic
of the user hasn't diverged on different routes yet; nor has it mixed of the user hasn't diverged on different routes yet; nor has it mixed
with traffic from other sources. with traffic from other sources.
skipping to change at page 84, line 30 skipping to change at page 85, line 8
V_b: accumulated congestion volume V_b: accumulated congestion volume
B: total data volume (in case it is needed) B: total data volume (in case it is needed)
A suitable pseudo-code algorithm for a border router is as follows: A suitable pseudo-code algorithm for a border router is as follows:
==================================================================== ====================================================================
V_b = 0 V_b = 0
B = 0 B = 0
for each re-ECN-capable packet { for each Re-ECN-capable packet {
b = readLength(packet) /* set b to packet size */ b = readLength(packet) /* set b to packet size */
B += b /* accumulate total volume */ B += b /* accumulate total volume */
if readEECN(packet) == (Re-Echo || FNE) { if readEECN(packet) == (Re-Echo || FNE) {
V_b += b /* increment... */ V_b += b /* increment... */
} elseif readEECN(packet) == CE(-1) { } elseif readEECN(packet) == CE(-1) {
V_b -= b /* ...or decrement V_b... */ V_b -= b /* ...or decrement V_b... */
} /*...depending on EECN field */ } /*...depending on EECN field */
} }
==================================================================== ====================================================================
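
The pseudo-code above can be rendered as a runnable sketch. The Packet class, the string labels used for the EECN field and the function name meter are assumptions made for illustration; the draft defines only readLength, readEECN and the counters V_b and B. The caller is assumed to pass in only Re-ECN-capable packets.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    length: int   # packet size in bytes (stands in for readLength)
    eecn: str     # Extended ECN codepoint label (stands in for readEECN)


def meter(packets):
    """Accumulate congestion volume V_b and total volume B over packets."""
    v_b = 0       # accumulated congestion volume
    b_total = 0   # total data volume (in case it is needed)
    for pkt in packets:
        b = pkt.length
        b_total += b                        # accumulate total volume
        if pkt.eecn in ("Re-Echo", "FNE"):
            v_b += b                        # positive packets increment V_b
        elif pkt.eecn == "CE(-1)":
            v_b -= b                        # negative packets decrement V_b
        # all other codepoints leave V_b unchanged
    return v_b, b_total


sample = [
    Packet(1500, "Re-Echo"),
    Packet(1500, "CE(-1)"),
    Packet(40, "FNE"),
    Packet(1000, "RECT"),
]
v_b, b_total = meter(sample)   # v_b == 40, b_total == 4040
```

Note that the test on the EECN field is written as a membership check, which is the intended reading of the condition (Re-Echo || FNE) in the pseudo-code.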
skipping to change at page 87, line 13 skipping to change at page 87, line 36
sending transports (e.g. large servers) want to allocate their /own/ sending transports (e.g. large servers) want to allocate their /own/
resources in proportion to the rates that each network path can resources in proportion to the rates that each network path can
sustain, based on congestion control. In that case, the nonce allows sustain, based on congestion control. In that case, the nonce allows
senders to be assured that they aren't being duped into giving more senders to be assured that they aren't being duped into giving more
of their own resources to a particular flow. And if congestion of their own resources to a particular flow. And if congestion
suppression is detected, the sending transport can rate limit the suppression is detected, the sending transport can rate limit the
offending connection to protect its own resources. Certainly, this offending connection to protect its own resources. Certainly, this
is a useful function, but the IETF should carefully decide whether is a useful function, but the IETF should carefully decide whether
such a single, very specific case warrants IP header space. such a single, very specific case warrants IP header space.
In contrast, re-ECN allows all routers to fully protect themselves In contrast, Re-ECN allows all routers to fully protect themselves
from such attacks, without having to trust anyone - senders, from such attacks, without having to trust anyone - senders,
receivers, neighbouring networks. Re-ECN is therefore proposed in receivers, neighbouring networks. Re-ECN is therefore proposed in
preference to the ECN nonce on the basis that it addresses the preference to the ECN nonce on the basis that it addresses the
generic problem of accountability for congestion of a network's generic problem of accountability for congestion of a network's
resources at the IP layer. resources at the IP layer.
Delaying the ECN nonce is justified because the applicability of the Delaying the ECN nonce is justified because the applicability of the
ECN nonce seems too limited for it to consume a two-bit codepoint in ECN nonce seems too limited for it to consume a two-bit codepoint in
the IP header. It therefore seems prudent to give time for an the IP header. It therefore seems prudent to give time for an
alternative way to be found to do the one function the nonce is alternative way to be found to do the one function the nonce is
essential for. essential for.
Moreover, while we have re-designed the re-ECN codepoints so that Moreover, while we have re-designed the Re-ECN codepoints so that
they do not prevent the ECN nonce progressing, the same is not true they do not prevent the ECN nonce progressing, the same is not true
the other way round. If the ECN nonce started to see some deployment the other way round. If the ECN nonce started to see some deployment
(perhaps because it was blessed with proposed standard status), (perhaps because it was blessed with proposed standard status),
incremental deployment of re-ECN would effectively be impossible, incremental deployment of Re-ECN would effectively be impossible,
because re-ECN marking fractions at inter-domain borders would be because Re-ECN marking fractions at inter-domain borders would be
polluted by unknown levels of nonce traffic. polluted by unknown levels of nonce traffic.
The authors are aware that re-ECN must prove it has the potential it The authors are aware that Re-ECN must prove it has the potential it
claims if it is to displace the nonce. Therefore, every effort has claims if it is to displace the nonce. Therefore, every effort has
been made to complete a comprehensive specification of re-ECN so that been made to complete a comprehensive specification of Re-ECN so that
its potential can be assessed. We therefore seek the opinion of the its potential can be assessed. We therefore seek the opinion of the
Internet community on whether the re-ECN protocol is sufficiently Internet community on whether the Re-ECN protocol is sufficiently
useful to warrant standards action. useful to warrant standards action.
Authors' Addresses Authors' Addresses
Bob Briscoe Bob Briscoe
BT & UCL BT & UCL
B54/77, Adastral Park B54/77, Adastral Park
Martlesham Heath Martlesham Heath
Ipswich IP5 3RE Ipswich IP5 3RE
UK UK
 End of changes. 174 change blocks. 
592 lines changed or deleted 577 lines changed or added
