draft-briscoe-tsvwg-re-ecn-tcp-04.txt   draft-briscoe-tsvwg-re-ecn-tcp-05.txt 
Transport Area Working Group B. Briscoe Transport Area Working Group B. Briscoe
Internet-Draft BT & UCL Internet-Draft BT & UCL
Intended status: Standards Track A. Jacquet Intended status: Standards Track A. Jacquet
Expires: January 10, 2008 A. Salvatori Expires: July 13, 2008 T. Moncaster
M. Koyabe A. Smith
T. Moncaster
BT BT
July 09, 2007 January 10, 2008
Re-ECN: Adding Accountability for Causing Congestion to TCP/IP Re-ECN: Adding Accountability for Causing Congestion to TCP/IP
draft-briscoe-tsvwg-re-ecn-tcp-04 draft-briscoe-tsvwg-re-ecn-tcp-05
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 38 skipping to change at page 1, line 37
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt. http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet-Draft will expire on January 10, 2008. This Internet-Draft will expire on July 13, 2008.
Copyright Notice Copyright Notice
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2008).
Abstract Abstract
This document introduces a new protocol for explicit congestion This document introduces a new protocol for explicit congestion
notification (ECN), termed re-ECN, which can be deployed notification (ECN), termed re-ECN, which can be deployed
incrementally around unmodified routers. The protocol arranges an incrementally around unmodified routers. The protocol arranges an
extended ECN field in each packet so that, as it crosses any extended ECN field in each packet so that, as it crosses any
interface in an internetwork, it will carry a truthful prediction of interface in an internetwork, it will carry a truthful prediction of
congestion on the remainder of its path. Then the upstream party at congestion on the remainder of its path. Then the upstream party at
any trust boundary in the internetwork can be held responsible for any trust boundary in the internetwork can be held responsible for
skipping to change at page 2, line 33 skipping to change at page 2, line 32
reaching change to the Internet architecture, the most immediate reaching change to the Internet architecture, the most immediate
priority for the authors is to delay any move of the ECN nonce to priority for the authors is to delay any move of the ECN nonce to
Proposed Standard status. The argument for this position is Proposed Standard status. The argument for this position is
developed in Appendix I. developed in Appendix I.
Changes from previous drafts (to be removed by the RFC Editor) Changes from previous drafts (to be removed by the RFC Editor)
Full diffs created using the rfcdiff tool are available at Full diffs created using the rfcdiff tool are available at
<http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp> <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#retcp>
From -03 to -04 (current version): From -04 to -05 (current version):
Completed justification for packet marking with FNE during
slow-start (Appendix D).
Minor editorial changes throughout.
From -03 to -04:
Clarified reasons for holding back ECN nonce (Section 3.2 & Clarified reasons for holding back ECN nonce (Section 3.2 &
Appendix I). Appendix I).
Clarified Figure 1. Clarified Figure 1.
Added Section 4.1.1.1 on equivalence of drops and ECN marks. Added Section 4.1.1.1 on equivalence of drops and ECN marks.
Improved precision of Section 5.6 on IP in IP tunnels. Improved precision of Section 5.6 on IP in IP tunnels.
skipping to change at page 5, line 28 skipping to change at page 5, line 28
12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 68 12. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 68
13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 68 13. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 68
14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69 14. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 69
15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 69 15. References . . . . . . . . . . . . . . . . . . . . . . . . . . 69
15.1. Normative References . . . . . . . . . . . . . . . . . . . 69 15.1. Normative References . . . . . . . . . . . . . . . . . . . 69
15.2. Informative References . . . . . . . . . . . . . . . . . . 70 15.2. Informative References . . . . . . . . . . . . . . . . . . 70
Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 73 Appendix A. Precise Re-ECN Protocol Operation . . . . . . . . . . 73
Appendix B. Justification for Two Codepoints Signifying Zero Appendix B. Justification for Two Codepoints Signifying Zero
Worth Packets . . . . . . . . . . . . . . . . . . . . 74 Worth Packets . . . . . . . . . . . . . . . . . . . . 74
Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76 Appendix C. ECN Compatibility . . . . . . . . . . . . . . . . . . 76
Appendix D. Packet Marking During Flow Start . . . . . . . . . . 77 Appendix D. Packet Marking with FNE During Flow Start . . . . . . 77
Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 77 Appendix E. Example Egress Dropper Algorithm . . . . . . . . . . 79
Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 77 Appendix F. Re-TTL . . . . . . . . . . . . . . . . . . . . . . . 79
Appendix G. Policer Designs to ensure Congestion Appendix G. Policer Designs to ensure Congestion
Responsiveness . . . . . . . . . . . . . . . . . . . 78 Responsiveness . . . . . . . . . . . . . . . . . . . 80
G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 78 G.1. Per-user Policing . . . . . . . . . . . . . . . . . . . . 80
G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 79 G.2. Per-flow Rate Policing . . . . . . . . . . . . . . . . . . 81
Appendix H. Downstream Congestion Metering Algorithms . . . . . . 82 Appendix H. Downstream Congestion Metering Algorithms . . . . . . 84
H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 82 H.1. Bulk Downstream Congestion Metering Algorithm . . . . . . 84
H.2. Inflation Factor for Persistently Negative Flows . . . . . 83 H.2. Inflation Factor for Persistently Negative Flows . . . . . 85
Appendix I. Argument for holding back the ECN nonce . . . . . . . 84 Appendix I. Argument for holding back the ECN nonce . . . . . . . 85
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 85 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 87
Intellectual Property and Copyright Statements . . . . . . . . . . 88 Intellectual Property and Copyright Statements . . . . . . . . . . 89
1. Introduction 1. Introduction
This document aims: This document aims:
o To provide a complete specification of the addition of the re-ECN o To provide a complete specification of the addition of the re-ECN
protocol to IP and guidelines on how to add it to transport layer protocol to IP and guidelines on how to add it to transport layer
protocols, including a complete specification of re-ECN in TCP as protocols, including a complete specification of re-ECN in TCP as
an example; an example;
skipping to change at page 70, line 47 skipping to change at page 70, line 47
<http://www.icir.org/floyd/ecn.html#implementations>. <http://www.icir.org/floyd/ecn.html#implementations>.
[ECN-MPLS] [ECN-MPLS]
Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
Marking in MPLS", draft-ietf-tsvwg-ecn-mpls-01 (work in Marking in MPLS", draft-ietf-tsvwg-ecn-mpls-01 (work in
progress), June 2007. progress), June 2007.
[ECN-tunnel] [ECN-tunnel]
Briscoe, B., "Layered Encapsulation of Congestion Briscoe, B., "Layered Encapsulation of Congestion
Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in Notification", draft-briscoe-tsvwg-ecn-tunnel-00 (work in
progress), July 2007. progress), June 2007.
[Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the
evolution of congestion control", Automatica 35(12)1969-- evolution of congestion control", Automatica 35(12)1969--
1985, December 1999, 1985, December 1999,
<http://www.statslab.cam.ac.uk/~frank/evol.html>. <http://www.statslab.cam.ac.uk/~frank/evol.html>.
[I-D.ietf-tcpm-ecnsyn] [I-D.ietf-tcpm-ecnsyn]
Kuzmanovic, A., "Adding Explicit Congestion Notification Kuzmanovic, A., "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", (ECN) Capability to TCP's SYN/ACK Packets",
draft-ietf-tcpm-ecnsyn-01 (work in progress), draft-ietf-tcpm-ecnsyn-03 (work in progress),
October 2006. November 2007.
[I-D.moncaster-tcpm-rcv-cheat] [I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., "A TCP Test to Allow Senders to Identify Moncaster, T., "A TCP Test to Allow Senders to Identify
Receiver Non-Compliance", Receiver Non-Compliance",
draft-moncaster-tcpm-rcv-cheat-01 (work in progress), draft-moncaster-tcpm-rcv-cheat-02 (work in progress),
June 2007. November 2007.
[ITU-T.I.371] [ITU-T.I.371]
ITU-T, "Traffic Control and Congestion Control in ITU-T, "Traffic Control and Congestion Control in
{B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004. {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004.
[Jiang02] Jiang, H. and D. Dovrolis, "The Macroscopic Behavior of [Jiang02] Jiang, H. and D. Dovrolis, "The Macroscopic Behavior of
the TCP Congestion Avoidance Algorithm", ACM SIGCOMM the TCP Congestion Avoidance Algorithm", ACM SIGCOMM
CCR 32(3)75-88, July 2002, CCR 32(3)75-88, July 2002,
<http://doi.acm.org/10.1145/571697.571725>. <http://doi.acm.org/10.1145/571697.571725>.
skipping to change at page 72, line 34 skipping to change at page 72, line 34
RFC 3540, June 2003. RFC 3540, June 2003.
[RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion
Control for Voice Traffic in the Internet", RFC 3714, Control for Voice Traffic in the Internet", RFC 3714,
March 2004. March 2004.
[RFC4301] Kent, S. and K. Seo, "Security Architecture for the [RFC4301] Kent, S. and K. Seo, "Security Architecture for the
Internet Protocol", RFC 4301, December 2005. Internet Protocol", RFC 4301, December 2005.
[Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN [Re-PCN] Briscoe, B., "Emulating Border Flow Policing using Re-ECN
on Bulk Data", draft-briscoe-tsvwg-re-ecn-border-cheat-01 on Bulk Data", draft-briscoe-re-pcn-border-cheat-00 (work
(work in progress), March 2006. in progress), July 2007.
[Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., [Re-fb] Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
Salvatori, A., Soppera, A., and M. Koyabe, "Policing Salvatori, A., Soppera, A., and M. Koyabe, "Policing
Congestion Response in an Internetwork Using Re-Feedback", Congestion Response in an Internetwork Using Re-Feedback",
ACM SIGCOMM CCR 35(4)277--288, August 2005, <http:// ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
www.acm.org/sigs/sigcomm/sigcomm2005/ www.acm.org/sigs/sigcomm/sigcomm2005/
techprog.html#session8>. techprog.html#session8>.
[Savage99] [Savage99]
Savage, S., Cardwell, N., Wetherall, D., and T. Anderson, Savage, S., Cardwell, N., Wetherall, D., and T. Anderson,
skipping to change at page 73, line 36 skipping to change at page 73, line 36
[pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End
Congestion Control in the Internet", IEEE/ACM Transactions Congestion Control in the Internet", IEEE/ACM Transactions
on Networking 7(4) 458--472, August 1999, on Networking 7(4) 458--472, August 1999,
<http://www.aciri.org/floyd/end2end-paper.html>. <http://www.aciri.org/floyd/end2end-paper.html>.
Appendix A. Precise Re-ECN Protocol Operation Appendix A. Precise Re-ECN Protocol Operation
{ToDo: fix this} {ToDo: fix this}
The protocol operation described in Section 3.3 was an approximation. The protocol operation in the middle described in Section 3.3 was an
In fact, standard ECN router marking combines 1% and 2% marking into approximation. In fact, standard ECN router marking combines 1% and
slightly less than 3% whole-path marking, because routers 2% marking into slightly less than 3% whole-path marking, because
deliberately mark CE whether or not it has already been marked by routers deliberately mark CE whether or not it has already been
another router upstream. So the combined marking fraction would marked by another router upstream. So the combined marking fraction
actually be 100% - (100% - 1%)(100% - 2%) = 2.98%. would actually be 100% - (100% - 1%)(100% - 2%) = 2.98%.
To generalise this we will need some notation. To generalise this we will need some notation.
o j represents the index of each resource (typically queues) along a o j represents the index of each resource (typically queues) along a
path, ranging from 0 at the first router to n-1 at the last. path, ranging from 0 at the first router to n-1 at the last.
o m_j represents the fraction of octets *m*arked CE by a particular o m_j represents the fraction of octets *m*arked CE by a particular
router (whether or not they are already marked) because of router (whether or not they are already marked) because of
congestion of resource j. congestion of resource j.
skipping to change at page 77, line 29 skipping to change at page 77, line 29
ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to ECT server MUST set the NS flag to 1 in a Re-ECN-setup SYN ACK to
echo congestion experienced (CE) on the initial SYN. Otherwise a echo congestion experienced (CE) on the initial SYN. Otherwise a
Re-ECN-setup SYN ACK MUST be returned with NS=0. The only current Re-ECN-setup SYN ACK MUST be returned with NS=0. The only current
known use of the NS flag in a SYN ACK is to indicate support for known use of the NS flag in a SYN ACK is to indicate support for
the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1. the ECN nonce, which will be negotiated by setting CWR=0 & ECE=1.
Given the ECN nonce MUST NOT be used for a RECN mode connection, a Given the ECN nonce MUST NOT be used for a RECN mode connection, a
Re-ECN-setup SYN ACK can use either setting of the NS flag without Re-ECN-setup SYN ACK can use either setting of the NS flag without
any risk of confusion, because the CWR & ECE flags will be any risk of confusion, because the CWR & ECE flags will be
reversed relative to those used by an ECN nonce SYN ACK. reversed relative to those used by an ECN nonce SYN ACK.
Appendix D. Packet Marking During Flow Start Appendix D. Packet Marking with FNE During Flow Start
{ToDo: Write up proof that sender should mark FNE on first and third FNE (feedback not established) packets have two functions. Their
data packets, even with the largest allowed initial window.} main role is to announce the start of a new flow when feedback has
not yet been established. However, they also serve to balance the
expected feedback, and can be used where the rate of transmission
changes suddenly. Although this should not happen under TCP, this
speculative use of FNE marking underpins the following argument for
why the first and third packets should be set to FNE.
The proportion of FNE packets in each round trip should be a high
estimate of the potential error in the balance between the number of
congestion-marked packets and the number of Re-Echo packets already
issued.
Let's call:
S: the number of the TCP segments sent so far
F: the number of FNE packets sent so far
R: the number of Re-Echo packets sent so far
A: the number of acknowledgments received so far
C: the number of acknowledgments echoing a CE packet
In normal operation, when we want to send packet S+1, we first need
to check that enough Re-Echo packets have been issued:
If R<C, then S+1 will be a Re-Echo packet
Next we need to estimate the amount of congestion observed so far.
If congestion were stationary, it could be estimated as C/A. A
pessimistic bound is (C+1)/(A+1), which assumes that the next
acknowledgment will echo a CE packet; we use that more pessimistic
estimate to drive the generation of FNE packets.
The number of CE packets expected by the time packet S+1 is
acknowledged is therefore (S+1)*(C+1)/(A+1). Packet S+1 should be
set to FNE if that expected value exceeds the sum of FNE and Re-Echo
packets sent so far.
If (F+R)<(S+1)*(C+1)/(A+1),
then S+1 will be set to FNE
else S+1 will be set to RECT
So the full test should be:
When packet (S+1) is about to be sent...
If R<C,
then S+1 will be set to Re-Echo
Else if (F+R)<(S+1)*(C+1)/(A+1),
then S+1 will be set to FNE
Else S+1 will be set to RECT
This means that at any point, given A, R, F, C and S, the source
could send another k RECT packets, where k < (F+R)*(A+1)/(C+1)-S.
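The full test above can be sketched in a few lines of Python. This is
only an illustration, not part of the specification: the function
name next_marking is hypothetical, and the comparison
(F+R)<(S+1)*(C+1)/(A+1) is cross-multiplied so that it stays in
integer arithmetic, which is an implementation choice made here and
not something the draft prescribes.

```python
def next_marking(S, F, R, A, C):
    """Return the marking for packet S+1 (illustrative sketch).

    S: segments sent so far          F: FNE packets sent so far
    R: Re-Echo packets sent so far   A: acknowledgments received so far
    C: acknowledgments echoing a CE packet
    """
    # First make sure every echoed CE mark has been re-echoed.
    if R < C:
        return "Re-Echo"
    # (F+R) < (S+1)*(C+1)/(A+1), multiplied through by (A+1) > 0
    # to avoid floating-point division.
    if (F + R) * (A + 1) < (S + 1) * (C + 1):
        return "FNE"
    return "RECT"
```

For example, next_marking(0, 0, 0, 0, 0) selects FNE for the very
first packet of a flow, because no FNE or Re-Echo packet has yet been
sent to cover the pessimistic estimate of one expected mark.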
The above scheme is independent of the actions of both the dropper
and the policer, and does not depend on the rate adaptation
discipline of the source. It only defines Re-Echo packets as
notification of effective end-to-end congestion (as witnessed in the
previous round trip), and FNE packets as notification of speculative
end-to-end congestion, based on a high estimate of congestion.
In practice, for any source:
o for the first packet, A=R=F=C=S=0 ==> 1 FNE
o if the acknowledgment doesn't echo a mark
* for the second packet, A=F=S=1 R=C=0 ==> 1 RECT
* for the third packet, S=2 A=F=1 R=C=0 ==> 1 FNE
o if no acknowledgement for these two packets echoes a congestion
mark, then {A=S=3 F=2 R=C=0} which gives k<2*4/1-3, so the source
could send another 4 RECT packets. ==> 4 RECT
o if no acknowledgement for these four packets echoes a congestion
mark, then {A=S=7 F=2 R=C=0} which gives k<2*8/1-7, so the source
could send another 8 RECT packets. ==> 8 RECT
This behaviour happens to match TCP's congestion window control in
slow start, which is why for TCP sources, only the first and third
packet need be FNE packets.
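The walk-through above can be reproduced with a small simulation.
The sketch below rests on assumptions that are not stated in the
draft: an initial window of one segment that doubles every round
trip, no loss, no CE marks, and every packet of a round being
acknowledged before the next round is sent. The function names
next_marking and slow_start_markings are hypothetical.

```python
def next_marking(S, F, R, A, C):
    """Marking for packet S+1, following the full test in this appendix."""
    if R < C:
        return "Re-Echo"
    # Integer form of (F+R) < (S+1)*(C+1)/(A+1).
    if (F + R) * (A + 1) < (S + 1) * (C + 1):
        return "FNE"
    return "RECT"


def slow_start_markings(rounds):
    """Return one list of markings per round trip (assumed idealised slow start)."""
    S = F = R = A = C = 0
    window = 1  # assumed initial window of one segment
    result = []
    for _ in range(rounds):
        marks = []
        for _ in range(window):
            mark = next_marking(S, F, R, A, C)
            marks.append(mark)
            S += 1
            if mark == "FNE":
                F += 1
            elif mark == "Re-Echo":
                R += 1
        A = S        # assumption: all packets acknowledged, none echoing CE
        window *= 2  # assumption: window doubles each round trip
        result.append(marks)
    return result


if __name__ == "__main__":
    for i, marks in enumerate(slow_start_markings(4), start=1):
        print(i, marks)
```

Under these assumptions only the first and third packets come out as
FNE, and the third and fourth round trips consist of 4 and 8 RECT
packets respectively, in line with the bullets above.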
A source that opened its congestion window any faster would have to
insert more FNE packets. As another example, a UDP source sending
VBR traffic might need to send several FNE packets ahead of the
traffic peaks it generates.
Appendix E. Example Egress Dropper Algorithm Appendix E. Example Egress Dropper Algorithm
{ToDo: Write up the basic algorithm with flow state, then the {ToDo: Write up the basic algorithm with flow state, then the
aggregated one.} aggregated one.}
Appendix F. Re-TTL Appendix F. Re-TTL
This Appendix gives an overview of a proposal to be able to overload This Appendix gives an overview of a proposal to be able to overload
the TTL field in the IP header to monitor downstream propagation the TTL field in the IP header to monitor downstream propagation
skipping to change at page 86, line 29 skipping to change at page 88, line 29
BT BT
B54/70, Adastral Park B54/70, Adastral Park
Martlesham Heath Martlesham Heath
Ipswich IP5 3RE Ipswich IP5 3RE
UK UK
Phone: +44 1473 647284 Phone: +44 1473 647284
Email: arnaud.jacquet@bt.com Email: arnaud.jacquet@bt.com
URI: URI:
Alessandro Salvatori Toby Moncaster
BT
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Email: alessandro.salvatori@gmail.com
Martin Koyabe
BT BT
PP2a Rigel House, Adastral Park B54/70, Adastral Park
Martlesham Heath Martlesham Heath
Ipswich IP5 3RE Ipswich IP5 3RE
UK UK
Phone: +44 1473 646923 Phone: +44 1473 648734
Email: martin.koyabe@bt.com Email: toby.moncaster@bt.com
URI:
Toby Moncaster Alan Smith
BT BT
B54/70, Adastral Park B54/76, Adastral Park
Martlesham Heath Martlesham Heath
Ipswich IP5 3RE Ipswich IP5 3RE
UK UK
Phone: +44 1473 648734 Phone: +44 1473 640404
Email: toby.moncaster@bt.com Email: alan.p.smith@bt.com
Full Copyright Statement Full Copyright Statement
Copyright (C) The IETF Trust (2007). Copyright (C) The IETF Trust (2008).
This document is subject to the rights, licenses and restrictions This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the authors contained in BCP 78, and except as set forth therein, the authors
retain all their rights. retain all their rights.
This document and the information contained herein are provided on an This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
 End of changes. 22 change blocks. 
55 lines changed or deleted 138 lines changed or added
