< draft-ietf-tcpm-accecn-reqs-03.txt | draft-ietf-tcpm-accecn-reqs-03-bb.txt > | |||
---|---|---|---|---|
TCP Maintenance and Minor Extensions (tcpm) M. Kuehlewind, Ed. | TCP Maintenance and Minor Extensions M. Kuehlewind, Ed. | |||
Internet-Draft University of Stuttgart | (tcpm) University of Stuttgart | |||
Intended status: Informational R. Scheffenegger | Internet-Draft R. Scheffenegger | |||
Expires: January 16, 2014 NetApp, Inc. | Intended status: Informational NetApp, Inc. | |||
July 15, 2013 | Expires: February 14, 2014 August 13, 2013 | |||
Problem Statement and Requirements for a More Accurate ECN Feedback | Problem Statement and Requirements for Fine-Grained ECN Feedback | |||
draft-ietf-tcpm-accecn-reqs-03 | draft-ietf-tcpm-accecn-reqs-03-bb | |||
Abstract | Abstract | |||
Explicit Congestion Notification (ECN) is an IP/TCP mechanism where | Explicit Congestion Notification (ECN) is an IP/TCP mechanism where | |||
network nodes can mark IP packets instead of dropping them to | network nodes can mark IP packets instead of dropping them to | |||
indicate congestion to the end-points. An ECN-capable receiver will | indicate congestion to the end-points. An ECN-capable receiver will | |||
feed back this information to the sender. ECN is specified for TCP in | feed back this information to the sender. ECN is specified for TCP in | |||
such a way that only one feedback signal can be transmitted per | such a way that only one feedback signal can be transmitted per | |||
Round-Trip Time (RTT). Recently, new TCP mechanisms like ConEx or | Round-Trip Time (RTT). Recently, new TCP mechanisms like ConEx or | |||
DCTCP need more accurate ECN feedback information in the case where | DCTCP need fine-grained ECN feedback information in the case where | |||
more than one marking is received in one RTT. This documents | more than one marking is received in one RTT. This document | |||
specifies requirement for different ECN feedback scheme in the TCP | specifies requirements for an update to the TCP protocol so that it | |||
header to provide more than one feedback signal per RTT. | can provide ECN feedback signals that are more fine-grained than just | |||
once per round trip. | ||||
Status of This Memo | Review Comments, dated 13 Aug 2013. | |||
This is a review by Bob Briscoe, not the work of the authors. The | ||||
changes suggested in this review are just that: suggestions. They | ||||
have been written as mods to the XML source of the draft merely for | ||||
convenience. A diff against the original draft-03 will be the best | ||||
way to read this. It is expected that the authors may accept some | ||||
changes and reject others. There is no implication that any of these | ||||
changes are acceptable to the authors. The motivation for some of | ||||
the suggested changes is in the accompanying email sent to the tcpm | ||||
list. | ||||
Status of this Memo | ||||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on January 16, 2014. | This Internet-Draft will expire on February 14, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 | 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 | |||
2. Overview ECN and ECN Nonce in IP/TCP . . . . . . . . . . . . 4 | 2. Recap of Classic ECN and ECN Nonce in IP/TCP . . . . . . . . . 6 | |||
3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
4. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 6 | 4. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 9 | |||
4.1. Re-use of ECN/NS Header Bits . . . . . . . . . . . . . . 6 | 4.1. Re-use of ECN/NS Header Bits . . . . . . . . . . . . . . . 9 | |||
4.2. Use of Other Header Bits . . . . . . . . . . . . . . . . 7 | 4.2. Using Other Header Bits . . . . . . . . . . . . . . . . . 10 | |||
4.3. TCP Option . . . . . . . . . . . . . . . . . . . . . . . 7 | 4.3. Using a TCP Option . . . . . . . . . . . . . . . . . . . . 11 | |||
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | |||
7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | |||
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
8.1. Normative References . . . . . . . . . . . . . . . . . . 8 | 8.1. Normative References . . . . . . . . . . . . . . . . . . . 12 | |||
8.2. Informative References . . . . . . . . . . . . . . . . . 8 | 8.2. Informative References . . . . . . . . . . . . . . . . . . 12 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
1. Introduction | 1. Introduction | |||
Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP | Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP | |||
mechanism where network nodes can mark IP packets instead of dropping | mechanism where network nodes can mark IP packets instead of dropping | |||
them to indicate congestion to the end-points. An ECN-capable | them to indicate congestion to the end-points. An ECN-capable | |||
receiver will feed back this information to the sender. ECN is | receiver will feed back this information to the sender. ECN is | |||
specified for TCP in such a way that only one feedback signal can be | specified for TCP in such a way that only one feedback signal can be | |||
transmitted per Round-Trip Time (RTT). This is sufficient for | transmitted per Round-Trip Time (RTT). This is sufficient for pre- | |||
current congestion control mechanisms, as only one reduction in | existing congestion control mechanisms that perform only one | |||
sending rate is performed per RTT independent of the number of ECN | reduction in sending rate per RTT, independent of the number of ECN | |||
congestion marks. But recently proposed mechanisms like Congestion | congestion marks. But recently proposed/deployed mechanisms like | |||
Exposure (ConEx) or DCTCP [Ali10] need more accurate ECN feedback | Congestion Exposure (ConEx) [RFC6789] or DCTCP [Ali10] need more | |||
information in the case where more than one marking is received in | fine-grained ECN feedback information to work correctly in the case | |||
one RTT to work correctly. | where more than one marking is received in any one RTT. | |||
The following scenarios should briefly show where the accurate | ConEx is an experimental approach that allows the sender to re-insert | |||
feedback is needed or provides additional value: | the congestion feedback it sees into the forward data path. This is | |||
primarily so that any traffic management can be proportionate to | ||||
actual congestion caused by traffic, rather than limiting traffic | ||||
based on rate or volume in case it might cause congestion [RFC6789]. | ||||
A ConEx sender uses selective acknowledgements (SACK [RFC2018]) for | ||||
fine-grained feedback of loss signals, but currently TCP offers no | ||||
equivalent fine-grained feedback for ECN. | ||||
A Standard (RFC5681) TCP sender that supports ConEx: | DCTCP offers very low and predictable queueing delay. DCTCP requires | |||
In this case the congestion control algorithm still ignores | switches/routers to have ECN enabled and configured with no signal | |||
multiple marks per RTT, while the ConEx mechanism uses the | smoothing, so it is currently only used in private networks, e.g. | |||
extra information per RTT to re-echo more precise congestion | internal to data centres. DCTCP was released in Microsoft Windows 8, | |||
information. | and implementations exist for Linux and FreeBSD. | |||
The changes DCTCP makes to TCP are not currently the subject of any | ||||
IETF standardisation activity. The different DCTCP implementations | ||||
alter TCP's ECN feedback protocol [RFC3168] in unspecified | ||||
proprietary ways, and they either omit capability negotiation, or | ||||
they use non-interoperable negotiation. A primary motivation for | ||||
this document is to prevent each proprietary implementation from | ||||
inventing its own handshake, which could lead to _de facto_ | ||||
consumption of the few flags that remain available for standardising | ||||
capability negotiation. Also, those variants that use the feedback | ||||
protocol proposed in [Ali10] only work if there are no losses at all, | ||||
and otherwise they become confused. | ||||
To remedy these problems, Section 3 of this document lists | ||||
requirements for a robust and interoperable fine-grained TCP/ECN | ||||
feedback protocol that all implementations of ConEx and/or DCTCP can | ||||
use. A few solutions have already been proposed, so Section 4 | ||||
demonstrates how to use the requirements to compare them, by briefly | ||||
sketching their high level design choices and discussing the benefits | ||||
and drawbacks of each. | ||||
The following scenarios briefly show where fine-grained feedback is | ||||
needed or adds value: | ||||
An RFC5681 TCP sender that supports ConEx: | ||||
In this case the ConEx mechanism uses the extra information | ||||
per RTT to re-echo the precise congestion information, but | ||||
the congestion control algorithm still ignores multiple marks | ||||
per RTT [RFC5681]. | ||||
A sender using DCTCP congestion control without ConEx: | A sender using DCTCP congestion control without ConEx: | |||
The congestion control algorithm uses the extra info per RTT | The DCTCP congestion control algorithm uses the extra | |||
to perform its decrease depending on the number of congestion | feedback information per RTT to decrease its rate depending | |||
marks. | on the extent of congestion marks (not just the existence of | |||
at least one mark per RTT). | ||||
A sender that uses DCTCP congestion control and supports ConEx: | A sender that uses DCTCP congestion control and supports ConEx: | |||
Both the congestion control algorithm and ConEx use the | Both the congestion control algorithm and ConEx use the fine- | |||
accurate ECN feedback mechanism. | grained ECN feedback mechanism. | |||
A standard TCP sender (using RFC5681 congestion control algorithm) | An RFC5681 TCP sender without ConEx: | |||
without ConEx: | No fine-grained feedback is necessary here. The congestion | |||
No accurate feedback is necessary here. The congestion | control algorithm still reacts to only one signal per RTT. | |||
control algorithm still react only on one signal per RTT. | ||||
But it is best to have one generic feedback mechanism, | But it is best to have one generic feedback mechanism, | |||
whether it is used or not. | whether it is used or not. | |||
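The difference between the scenarios above can be illustrated with a minimal sketch. It assumes the RFC 3168 response of halving the window at most once per RTT for the RFC5681 sender, and the window update described in [Ali10] for DCTCP. The function names and the gain g = 1/16 are illustrative assumptions, not text from either draft.

```python
def classic_cwnd(cwnd: float, marks_in_rtt: int) -> float:
    """RFC5681-style sender: one halving per RTT if any mark was seen."""
    return cwnd / 2 if marks_in_rtt > 0 else cwnd


def dctcp_update(cwnd: float, alpha: float, marked: int, acked: int,
                 g: float = 1 / 16) -> tuple:
    """DCTCP-style sender (after [Ali10]): reduce in proportion to marks.

    marked / acked is the fraction of packets that carried a CE mark in
    the last RTT; alpha is a moving average of that fraction.
    """
    fraction = marked / acked if acked else 0.0
    alpha = (1 - g) * alpha + g * fraction
    if marked:
        cwnd = cwnd * (1 - alpha / 2)
    return cwnd, alpha


# The classic sender cannot use the extra information ...
same_few = classic_cwnd(100.0, 1)
same_many = classic_cwnd(100.0, 50)

# ... whereas the DCTCP-style sender needs the number of marks per RTT.
few_cwnd, _ = dctcp_update(100.0, 0.0, marked=1, acked=100)
many_cwnd, _ = dctcp_update(100.0, 0.0, marked=50, acked=100)
```

With only one ECE signal per RTT, the DCTCP-style sender could not compute the marked fraction at all, which is why it depends on finer feedback than classic ECN provides.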
This document summarizes the requirements for a new more accurate ECN | ||||
feedback scheme. While a new feedback scheme should still deliver | ||||
identical performance as classic ECN, this document also clarifies | ||||
what has to be taken into consideration in addition. Thus the listed | ||||
requirements should be addressed in the specification of a more | ||||
accurate ECN feedback scheme. Moreover, as a large set of proposals | ||||
already exists, a few high level design choices are sketched and | ||||
briefly discussed, to demonstrate some of the benefits and drawbacks | ||||
of each of these potential schemes. | ||||
1.1. Requirements Language | 1.1. Requirements Language | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
We use the following terminology from [RFC3168] and [RFC3540]: | We use the following terminology from [RFC3168] and [RFC3540]: | |||
The ECN field in the IP header: | The ECN field in the IP header: | |||
skipping to change at page 4, line 12 | skipping to change at page 6, line 12 | |||
The ECN flags in the TCP header: | The ECN flags in the TCP header: | |||
CWR: the Congestion Window Reduced flag, | CWR: the Congestion Window Reduced flag, | |||
ECE: the ECN-Echo flag, and | ECE: the ECN-Echo flag, and | |||
NS: ECN Nonce Sum. | NS: ECN Nonce Sum. | |||
In this document, the ECN feedback scheme as specified in [RFC3168] | In this document, the ECN feedback scheme as specified in [RFC3168] | |||
is called the 'classic ECN' and any new proposal the 'more accurate | is called the 'classic ECN' and any new proposal the 'fine-grained | |||
ECN feedback' scheme. A 'congestion mark' is defined as an IP packet | ECN feedback' scheme. A 'congestion mark' is defined as an IP packet | |||
where the CE codepoint is set. A 'congestion event' refers to one or | where the CE codepoint is set. A 'congestion episode' refers to one | |||
more congestion marks belong to the same overload situation in the | or more congestion marks belonging to the same overload situation in | |||
network (usually during one RTT). A TCP segment with the | the network (usually during one RTT). A TCP segment with the | |||
acknowledgment flag set is simply called ACK. | acknowledgment flag set is simply called an ACK. | |||
2. Overview ECN and ECN Nonce in IP/TCP | 2. Recap of Classic ECN and ECN Nonce in IP/TCP | |||
ECN requires two bits in the IP header. The ECN capability of a | ECN requires two bits in the IP header. The ECN capability of a | |||
packet is indicated when either one of the two bits is set. An ECN | packet is indicated when either one of the two bits is set. An ECN | |||
sender can set one or the other bit to indicate an ECN-capable | sender can set one or the other bit to indicate an ECN-capable | |||
transport (ECT) which results in two signals, ECT(0) and ECT(1). A | transport (ECT) which results in two signals, ECT(0) and ECT(1). A | |||
network node can set both bits simultaneously when it experiences | network node can set both bits simultaneously when it experiences | |||
congestion. When both bits are set the packet is regarded as | congestion. When both bits are set the packet is regarded as | |||
"Congestion Experienced" (CE). | "Congestion Experienced" (CE). | |||
In the TCP header the first two bits in byte 14 are defined for the | In the TCP header the first two bits in byte 14 are defined as ECN | |||
use of ECN. The TCP mechanism for signaling the reception of a | feedback for each half-connection. A TCP receiver signals the | |||
congestion mark uses the ECN-Echo (ECE) flag in the TCP header. To | reception of a congestion mark using the ECN-Echo (ECE) flag in the | |||
enable the TCP receiver to determine when to stop setting the ECN- | TCP header. For reliability, the receiver continues to set the ECE | |||
Echo flag, the CWR flag is set by the sender upon reception of the | flag on every ACK. To enable the TCP receiver to determine when to | |||
feedback signal. This leads always to a full RTT of ACKs with ECE | stop setting the ECN-Echo flag, the sender sets the CWR flag upon | |||
set. Thus any additional CE markings arriving within this RTT can | reception of an ECE feedback signal. This always leads to a full RTT | |||
not signaled back anymore. | of ACKs with ECE set. Thus the receiver cannot signal back any | |||
additional CE markings arriving within the same RTT. | ||||
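The limitation described above can be sketched as a small model of the receiver. This is a simplified illustration under stated assumptions, not an implementation of [RFC3168]: the class and function names are hypothetical, every segment is assumed to be acknowledged, and only the ECE latch and the CWR reset are modelled. The codepoint values are those defined for the ECN field in [RFC3168].

```python
from enum import IntEnum


class Ecn(IntEnum):
    """Two-bit ECN field in the IP header [RFC3168]."""
    NOT_ECT = 0b00
    ECT_1 = 0b01
    ECT_0 = 0b10
    CE = 0b11  # both bits set by a congested network node


class ClassicEcnReceiver:
    """Hypothetical, simplified model of the classic ECN echo rule."""

    def __init__(self) -> None:
        self.echoing = False

    def on_segment(self, ecn: Ecn, cwr: bool) -> bool:
        """Process one arriving data segment; return ECE for its ACK."""
        if cwr:
            self.echoing = False  # the sender has signalled its reaction
        if ecn is Ecn.CE:
            self.echoing = True   # latch ECE until the next CWR
        return self.echoing


def echoed_flags(arrivals):
    """Return the ECE flag carried on the ACK for each (ecn, cwr) arrival."""
    receiver = ClassicEcnReceiver()
    return [receiver.on_segment(ecn, cwr) for ecn, cwr in arrivals]


one_mark = [(Ecn.CE, False), (Ecn.ECT_0, False),
            (Ecn.ECT_0, False), (Ecn.ECT_0, True)]
three_marks = [(Ecn.CE, False), (Ecn.CE, False),
               (Ecn.CE, False), (Ecn.ECT_0, True)]
```

Both arrival patterns produce the same sequence of ECE flags, so the sender cannot tell one congestion mark from three within the same RTT.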
ECN-Nonce [RFC3540] is an optional addition to ECN that is used to | The ECN Nonce [RFC3540] is an experimental addition to ECN that the | |||
protect the TCP sender against accidental or malicious concealment of | TCP sender can use to protect itself against accidental or malicious | |||
marked or dropped packets. This addition defines the last bit of | concealment of marked or dropped packets. This addition defines the | |||
byte 13 in the TCP header as the Nonce Sum (NS) bit. With ECN-Nonce | last bit of byte 13 in the TCP header as the Nonce Sum (NS) flag. | |||
a nonce sum is maintain that counts the occurrence of ECT(1) packets. | The receiver maintains a nonce sum that counts the occurrence of | |||
ECT(1) packets, and signals the least significant bit of this sum on | ||||
the NS flag. | ||||
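The nonce sum described above can be sketched as follows. This is a minimal illustration only: the function name and the string labels for the codepoints are assumptions, and the per-byte-range bookkeeping of [RFC3540] is omitted.

```python
def nonce_sum(codepoints) -> int:
    """Return the one-bit nonce sum: the least significant bit of the
    number of ECT(1) packets received."""
    ect1_count = sum(1 for codepoint in codepoints if codepoint == "ECT(1)")
    return ect1_count & 1


# The sender chooses ECT(0) or ECT(1) per packet and remembers its choices.
sent = ["ECT(1)", "ECT(0)", "ECT(1)", "ECT(1)"]

# A network node overwrites the third packet with CE, erasing its nonce.
received = ["ECT(1)", "ECT(0)", "CE", "ECT(1)"]

expected_ns = nonce_sum(sent)
observed_ns = nonce_sum(received)
```

A receiver that wanted to conceal the mark would have to report the expected value, but it cannot know whether the CE-marked packet originally carried ECT(0) or ECT(1), so it would have to guess.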
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 |   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15 | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
|               |           | N | C | E | U | A | P | R | S | F | | |               |           | N | C | E | U | A | P | R | S | F | | |||
| Header Length | Reserved  | S | W | C | R | C | S | S | Y | I | | | Header Length | Reserved  | S | W | C | R | C | S | S | Y | I | | |||
|               |           |   | R | E | G | K | H | T | N | N | | |               |           |   | R | E | G | K | H | T | N | N | | |||
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
Figure 1: The (post-ECN Nonce) definition of the TCP header flags | Figure 1: The (post-ECN Nonce) definition of the TCP header flags | |||
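As an illustration of the layout in Figure 1, the following sketch extracts the header length and the flags from the 16-bit word at octets 12 and 13 (counting from zero) of a TCP header. The function and table names are hypothetical; the bit positions are read off the figure, with bit 0 as the most significant bit.

```python
import struct

# Bit positions as numbered in Figure 1 (bit 0 is the most significant).
FLAG_POSITIONS = {
    "NS": 7, "CWR": 8, "ECE": 9, "URG": 10, "ACK": 11,
    "PSH": 12, "RST": 13, "SYN": 14, "FIN": 15,
}


def parse_flags_word(tcp_header: bytes):
    """Return (header length in 32-bit words, set of flag names)."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    header_length = word >> 12
    flags = {name for name, position in FLAG_POSITIONS.items()
             if word & (1 << (15 - position))}
    return header_length, flags


# A 20-octet header with header length 5 and NS, ECE and ACK set.
example_word = (5 << 12) | 0x0100 | 0x0040 | 0x0010
example_header = bytes(12) + struct.pack("!H", example_word) + bytes(6)
parsed = parse_flags_word(example_header)
```

The three flags NS, CWR and ECE occupy adjacent bit positions, which is what makes it possible to treat them together as a single 3-bit field.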
However, it is believed that the ECN nonce has never been deployed. | ||||
Therefore, if a sender tried to protect itself with the nonce, any | ||||
receiver wishing to conceal marked or dropped packets merely has to | ||||
appear like all the other receivers that have not implemented the | ||||
nonce, and simply not provide any nonce feedback. An alternative for | ||||
a sender to assure feedback integrity has been proposed where the | ||||
sender occasionally inserts an ECN mark or loss itself, and checks | ||||
that the receiver feeds it back faithfully | ||||
[I-D.moncaster-tcpm-rcv-cheat]. This alternative requires no | ||||
standardisation and consumes no header bits or codepoints, as well as | ||||
releasing the ECT(1) codepoint in the IP header and the NS flag in | ||||
the TCP header for other uses. | ||||
3. Requirements | 3. Requirements | |||
The requirements of the accurate ECN feedback protocol, for the use | At minimum, a new feedback scheme should deliver feedback no worse | |||
of e.g. Conex or DCTCP, are to have a fairly accurate (not | than classic ECN feedback. However, to be useful for e.g. ConEx or | |||
necessarily perfect), timely and protected signaling. This leads to | DCTCP, a fine-grained ECN feedback protocol will also need to be | |||
the following requirements, which should be discussed for any | fairly accurate (not necessarily perfect), timely and amenable to | |||
proposed more accurate ECN feedback scheme: | integrity protection. This leads to the following requirements, | |||
which should all be addressed in the specification of a fine-grained | ||||
ECN feedback scheme: | ||||
Resilience | Resilience | |||
The ECN feedback signal is carried within the ACK. TCP ACKs | The ECN feedback signal is carried within the ACK. Pure TCP | |||
can get lost. Moreover, delayed ACKs are mostly used with | ACKs can be lost without recovery. Therefore, a fine-grained | |||
TCP. That means in most cases only every second data packet | ECN feedback extension has to take ACK loss into account. | |||
triggers an ACK. In a high congestion situation where most | ||||
of the packets are marked with CE, an accurate feedback | ||||
mechanism must still be able to signal sufficient congestion | ||||
information. Thus the accurate ECN feedback extension has to | ||||
take delayed ACK and ACK loss into account. | ||||
Timeliness | Timeliness | |||
The CE mark is induced by a network node on the transmission | A CE mark is induced by a network node on the transmission | |||
path and echoed by the receiver in the TCP ACK. Thus when | path and echoed by the receiver in the TCP ACK. Thus when | |||
this information arrives at the sender, its naturally already | this information arrives at the sender, it is naturally | |||
about one RTT old. With a sufficient ACK rate a further | already about one RTT old. With a sufficient ACK rate a | |||
delay of a small number of ACK can be tolerated but with | further delay of a small number of ACKs can be tolerated. | |||
large delays this information will be out dated due to high | However, this information will become stale with larger | |||
dynamic in the network. TCP congestion control which | delays, given the dynamic nature of networks. TCP congestion | |||
introduces parts of these dynamics operates on a time scale | control (which itself partly introduces these dynamics) | |||
of one RTT. Thus the congestion feedback information should | operates on a time scale of one RTT. Thus, to be timely, | |||
be delivered timely (within one RTT). | congestion feedback information should be delivered within | |||
about one RTT. | ||||
Integrity | Integrity | |||
With ECN Nonce, a misbehaving receiver or network node can be | Given the problems with the ECN nonce identified above, this | |||
detected with good probability. If the accurate ECN feedback | document only requires that the integrity of fine-grained ECN | |||
is reusing the NS bit, it is encouraged to ensure integrity | feedback can be assured; it does not require that the ECN | |||
at least as good as ECN Nonce. If this is not possible, | nonce is the mechanism employed to achieve this. Indeed, it | |||
alternative approaches should be provided how a mechanism | entertains the possibility that a fine-grained ECN feedback | |||
using the accurate ECN feedback extension can re-ensure | scheme might re-use the nonce sum (NS) flag in the TCP | |||
integrity or give strong incentives for the receiver and | header. If fine-grained ECN feedback does re-use the NS | |||
network node to cooperate honestly. | flag, an alternative should be provided that assures the | |||
integrity of the feedback at least as well as the ECN nonce | ||||
or that gives strong incentives for the receiver and network | ||||
nodes to cooperate honestly. | ||||
Accuracy | Accuracy | |||
Classic ECN feeds back one congestion notification per RTT, | Classic ECN feeds back one congestion notification per RTT, | |||
as this is supposed to be used for TCP congestion control | which is sufficient for classic TCP congestion control which | |||
which reduces the sending rate at most once per RTT. The | reduces the sending rate at most once per RTT. The fine- | |||
accurate ECN feedback scheme has to ensure that if a | grained ECN feedback scheme has to ensure that, if a | |||
congestion events occurs at least one congestion notification | congestion episode occurs, at least one congestion | |||
is echoed and received per RTT as classic ECN would do. Of | notification is echoed and received per RTT as classic ECN | |||
course, the goal of this extension is to reconstruct the | would do. Of course, the goal of this extension is to | |||
number of CE markings (more) accurately and in the best case | reconstruct the number of CE markings more accurately and in | |||
even to reconstruct the (exact) number of payload bytes that | the best case even to reconstruct the exact number of payload | |||
a CE marked packet was carrying. However, a sender should | bytes that a CE marked packet was carrying. However, a | |||
not assume to get the exact number of congestion markings or | sender should not assume to get the exact number of | |||
marked bytes in all situations. | congestion markings or marked bytes in all situations. | |||
Delayed ACKs are commonly used with TCP. That means in most | ||||
cases only every second data packet triggers an ACK. Thus a | ||||
fine-grained ECN feedback extension has to take delayed ACKs | ||||
into account. In a high congestion situation where most of | ||||
the packets are marked with CE, a fine-grained feedback | ||||
mechanism must still be able to signal sufficient congestion | ||||
information. Ideally, it would be possible for the sender to | ||||
determine which of the packets covered by a delayed ACK were | ||||
congestion marked, e.g. if the flow consists of packets of | ||||
different sizes, or to allow for future protocols where the | ||||
order of the markings may be important. Also, an ideal fine- | ||||
grained feedback protocol would still work if delayed ACKs | ||||
covered more than two packets. | ||||
Complexity | Complexity | |||
Of course, the more accurate ECN feedback can also be used, | The implementation should be as simple as possible and only a | |||
even if only one ECN feedback signal per RTT is need. The | ||||
implementation should be as simple as possible and only a | ||||
minimum of additional state information should be needed. | minimum of additional state information should be needed. | |||
Overhead | Overhead | |||
A more accurate ecn feedback signal should limit the | A fine-grained ECN feedback signal should limit the | |||
additional network load. As feedback information has to be | additional network load, because ECN feedback is ultimately | |||
provided timely and frequently, potentially all or a large | not critical information (in the worst case, loss will still | |||
fraction of TCP acknowledgments will carry this information. | be available as a congestion signal of last resort). As | |||
Ideally, no additional segments are exchanged compared to a | feedback information has to be provided frequently and in a | |||
standard RFC3168 TCP session, while the overhead in each | timely fashion, potentially all or a large fraction of TCP | |||
segment is kept minimal. Further, a feedback mechanism | acknowledgments will carry this information. Ideally, no | |||
should be prepared to proved a method to fall-back to well | additional segments should be exchanged compared to an | |||
known RFC3168 signaling, if the new signal is suppressed by | RFC3168 TCP session, and the overhead in each segment should | |||
be minimised. | ||||
Backward and forward compatibility | ||||
Given fine-grained ECN feedback will involve a change to the | ||||
TCP protocol, it will need to be negotiated between the two | ||||
TCP endpoints. If either end does not support fine-grained | ||||
feedback, they should both be able to fall-back to classic | ||||
ECN feedback. | ||||
A fine-grained ECN feedback extension should aim to be able | ||||
to traverse most existing middleboxes. Further, a feedback | ||||
mechanism should provide a method to fall-back to classic | ||||
RFC3168 signaling if the new signal is suppressed by certain | ||||
middleboxes. | middleboxes. | |||
In order to avoid a fork in the TCP protocol specifications, | ||||
if experiments with the new fine-grained ECN feedback | ||||
protocol are successful, it is intended to eventually update | ||||
RFC3168 for any TCP/ECN sender, not just for ConEx or DCTCP | ||||
senders. Therefore, even if only one ECN feedback signal per | ||||
RTT is needed, it should be possible to use fine-grained ECN | ||||
feedback. | ||||
4. Design Approaches | 4. Design Approaches | |||
All discussed approaches aim to provide accurate ECN feedback | The schemes proposed so far are outlined below. The main | |||
information as long as no ACK loss occurs and the congestion rate is | differentiator is their resilience in the face of loss of pure ACKs, | |||
reasonable. Otherwise the proposed schemes have different resilience | which largely depends on the number of bits used for the encoding. | |||
characteristics depending on the number of used bits for the | ||||
encoding. While classic ECN provides a reliable (inaccurate) | ||||
feedback of a maximum of one congestion signal per RTT, the proposed | ||||
schemes do not implement any acknowledgement mechanism. | ||||
4.1. Re-use of ECN/NS Header Bits | 4.1. Re-use of ECN/NS Header Bits | |||
The three ECN/NS header, ECE, CWR and NS are re-used (not only for | The three ECN header flags (ECE, CWR and NS) are re-used both during | |||
additional capability negotiation during the TCP handshake exchange | the TCP handshake for capability negotiation and during the | |||
but) to signal the current value of an CE counter at the receiver. | subsequent TCP session for the receiver to signal the current value | |||
This approach only provides a limited resilience against ACK lost | of its congestion signal counter. This approach provides resilience | |||
depending of the number of used bits. | against ACK loss by repeating the CE counter on each ACK, but | |||
resilience against loss of a string of pure ACKs is limited, | ||||
dependent on the number of bits used. | ||||
There are several codings proposed so far: An one bit scheme sends | Several codings have been proposed so far: | |||
one ECE for each CE received (while the CWR could be used to | ||||
introduce redundant information in next ACK to increase the | ||||
robustness against ACK loss). An 3 bit counter scheme uses all three | ||||
bits for continuously feeding the three most significant bits of a CE | ||||
counter back. An 3 bit codepoint scheme encodes either a CE counter | ||||
or an ECT(1) counter in 8 codepoints. | ||||
The proposed schemes provides accumulated information on ECN-CE- | o A one bit scheme sends one ECE for each CE received (to increase | |||
the robustness against ACK loss CWR could be used to introduce | ||||
redundant information on the next ACK); | ||||
o A 3-bit counter scheme continuously feeds back the three least | ||||
significant bits of a CE counter; | ||||
o A 3-bit codepoint scheme encodes either a CE counter or an ECT(1) | ||||
counter in 8 codepoints. | ||||
The proposed schemes provide accumulated information on ECN-CE- | ||||
marking feedback, similar to the number of acknowledged bytes in the | marking feedback, similar to the number of acknowledged bytes in the | |||
TCP header. Due to the limited number of bits the ECN feedback | TCP header. Due to the limited number of bits the ECN feedback | |||
information will wrap-around more often (than the acknowledgement). | information will wrap much more often than the acknowledgement field. | |||
Thus with a smaller number of ACK losses it is already possible to | Thus feedback information could be lost due to a relatively small | |||
loose feedback information. The resilience could be increased by | sequence of pure-ACK losses. Resilience could be increased by | |||
introducing redundancy, e.g. send each counter increase twice or more | introducing redundancy, e.g. send each counter increase two or more | |||
times. Of course any of these additional mechanisms will increasee | times. Of course any of these additional mechanisms will increase | |||
the complexity. If the congestion rate is larger than the ACK rate | the complexity. If the congestion rate is larger than the ACK rate | |||
(multiplied with the number of feedback information that can be | (multiplied by the number of congestion marks that can be signaled | |||
signaled per ACK), the congestion information cannot correctly be | per ACK), the congestion information cannot be correctly fed back. | |||
feed back. Thus an accurate ECN feedback mechanism needs to be able | Thus an accurate ECN feedback mechanism needs to be able to cover the | |||
to also cover the worst case situation where every packet is CE | worst case situation where every packet is CE marked. This can | |||
marked. This can potentially be realized by dynamically adapt the | potentially be realized by dynamically adapting the ACK rate and | |||
ACK rate and redundancy which again increases complexity and also | redundancy, which again increases complexity and perhaps the | |||
potentially the signaling overhead. For all schemes, an integrity | signaling overhead as well. | |||
check is only provided if ECN Nonce can be supported. | ||||
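The wrap-around behaviour described above for the 3-bit counter scheme can be illustrated with a minimal sketch. It is not taken from any of the proposals: the function names, the per-ACK simulation loop, and the assumption that the sender decodes each arriving ACK as the smallest non-negative counter increase are all hypothetical and chosen only for illustration.

```python
# Minimal sketch (assumptions, not part of any proposal): the receiver feeds
# back the three least significant bits of its CE counter on every ACK, and
# the sender interprets each arriving field as the smallest non-negative
# increase since the last ACK it received.

COUNTER_BITS = 3              # assumed width of the feedback field
MODULUS = 1 << COUNTER_BITS   # field values 0..7, then the counter wraps


def encode_ce_counter(ce_total):
    """Receiver side: field value carried on an ACK."""
    return ce_total % MODULUS


def decode_ce_delta(prev_field, new_field):
    """Sender side: CE marks inferred since the last ACK that arrived.

    Only correct if fewer than MODULUS marks occurred in between.
    """
    return (new_field - prev_field) % MODULUS


def simulate(marks_per_ack, lost_acks):
    """Return (true CE total at receiver, CE total inferred by sender).

    marks_per_ack[i] is the number of new CE marks covered by ACK i;
    lost_acks is the set of ACK indices that never reach the sender.
    """
    ce_total = 0
    sender_total = 0
    last_field = 0
    for i, marks in enumerate(marks_per_ack):
        ce_total += marks
        field = encode_ce_counter(ce_total)
        if i in lost_acks:
            continue  # the sender never sees this field value
        sender_total += decode_ce_delta(last_field, field)
        last_field = field
    return ce_total, sender_total


# A single lost ACK is repaired by the next ACK that arrives.
print(simulate([1, 2, 0, 3], {1}))
# A string of lost ACKs spanning a full wrap hides 8 marks from the sender.
print(simulate([2] * 6, {0, 1, 2, 3}))
```

In this sketch, repeating the counter on every ACK recovers from isolated ACK loss, whereas a run of losses during which the counter advances by 8 or more leaves the sender undercounting by a multiple of 8.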
4.2. Use of Other Header Bits | 4.2. Using Other Header Bits | |||
As seen in Figure 1, there are currently three unused flag bits in | As seen in Figure 1, there are currently three unused flag bits in | |||
the TCP header. The proposed 3-bit or codepoint schemes could be | the TCP header. The proposed 3-bit or codepoint schemes could be | |||
extended by one or more bits, to add higher resilience against ACK | extended by one or more bits, to add higher resilience against ACK | |||
loss. The relative gain would be proportionally higher resilience | loss. The relative gain would be proportionally higher resilience | |||
against ACK loss, while the respective drawbacks would remain | against ACK loss, while the respective drawbacks would remain | |||
identical. | identical. | |||
Moreover, the Urgent Pointer could be used if the Urgent Flag is not | Alternatively, the receiver could use bits in the Urgent Pointer | |||
set. As this is often the case, the resiliency could by increased | field to signal more bits of its congestion signal counter, but only | |||
without additional signaling overhead. | whenever it does not set the Urgent Flag. As this is often the case, | |||
resilience could be increased without additional header overhead. | ||||
4.3. TCP Option | Any proposal to use such bits would need to check the likelihood that | |||
some middleboxes might discard or 'normalise' the currently unused | ||||
flag bits or a non-zero Urgent Pointer when the Urgent Flag is | ||||
cleared. | ||||
4.3. Using a TCP Option | ||||
Alternatively, a new TCP option could be introduced, to help | Alternatively, a new TCP option could be introduced, to help | |||
maintain the accuracy and integrity of the ECN feedback between | maintain the accuracy and integrity of ECN feedback between | |||
receiver and sender. Such an option could provide higher resilience | receiver and sender. Such an option could provide higher resilience | |||
and even more information. E.g. ECN for RTP/UDP provides explicit | and even more information. For instance, ECN for RTP/UDP provides | |||
the number of ECT(0), ECT(1), CE, non-ECT marked and lost packets. | explicit counts of ECT(0), ECT(1), CE, non-ECT marked and | |||
However, deploying new TCP options has its own challenges. Moreover, | lost packets. However, deploying new TCP options has its own | |||
to actually achieve a high resilience, this option would need to be | challenges. Moreover, to achieve a high resilience, this option | |||
carried by either all or a large number ACKs. Thus this approach | would need to be carried by most or all ACKs, which would add | |||
would introduce considerable signaling overhead while ECN feedback is | considerable signaling overhead. Anyway, such a TCP option could be | |||
not such a critical information (as in the worst case, loss will | used in addition to a more accurate ECN feedback scheme in the TCP | |||
still be available to provide a strong congestion feedback signal). | header or in addition to classic ECN, only when available and needed. | |||
Anyway, such a TCP option could also be used in addition to a more | ||||
accurate ECN feedback scheme in the TCP header or in addition to | ||||
classic ECN, only when available and needed. | ||||
5. Acknowledgements | 5. Acknowledgements | |||
6. IANA Considerations | 6. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
7. Security Considerations | 7. Security Considerations | |||
If this scheme is used as input for congestion control, the | Given ECN feedback is used as input for congestion control, the | |||
respective algorithm might not react appropriately if ECN feedback | respective algorithm would not react appropriately if fine-grained | |||
information got lost. As those schemes should still react | ECN feedback were lost and the resilience mechanism to recover it was | |||
appropriately to loss, this drawback can not lead to a congestion | inadequate. This resilience requirement is articulated in Section 3. | |||
collapse though. | However, it should be noted that fine-grained ECN feedback is not the | |||
last resort against congestion collapse, because if there is | ||||
insufficient response to ECN, loss will ensue, and TCP will still | ||||
react appropriately to loss. | ||||
Providing wrong feedback information could otherwise lead to | A receiver could suppress ECN feedback information leading to its | |||
throttling of certain connections. This problem is identical in the | connections consuming excess sender or network resources. This | |||
classic ECN feedback scheme and should be addressed by an additional | problem is similar to that seen with the classic ECN feedback scheme | |||
integrity check like ECN Nonce. | and should be addressed by integrity checking as required in | |||
Section 3. | ||||
8. References | 8. References | |||
8.1. Normative References | 8.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
of Explicit Congestion Notification (ECN) to IP", RFC | of Explicit Congestion Notification (ECN) to IP", | |||
3168, September 2001. | RFC 3168, September 2001. | |||
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
Congestion Notification (ECN) Signaling with Nonces", RFC | Congestion Notification (ECN) Signaling with Nonces", | |||
3540, June 2003. | RFC 3540, June 2003. | |||
8.2. Informative References | 8.2. Informative References | |||
[Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, | [Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, | |||
P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP: | P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP: | |||
Efficient Packet Transport for the Commoditized Data | Efficient Packet Transport for the Commoditized Data | |||
Center", Jan 2010. | Center", Jan 2010. | |||
[I-D.briscoe-tsvwg-re-ecn-tcp] | [I-D.briscoe-tsvwg-re-ecn-tcp] | |||
Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | |||
"Re-ECN: Adding Accountability for Causing Congestion to | "Re-ECN: Adding Accountability for Causing Congestion to | |||
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in | TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in | |||
progress), October 2010. | progress), October 2010. | |||
[I-D.kuehlewind-tcpm-accurate-ecn-option] | [I-D.kuehlewind-tcpm-accurate-ecn-option] | |||
Kuehlewind, M. and R. Scheffenegger, "Accurate ECN | Kuehlewind, M. and R. Scheffenegger, "Accurate ECN | |||
Feedback Option in TCP", draft-kuehlewind-tcpm-accurate- | Feedback Option in TCP", | |||
ecn-option-01 (work in progress), July 2012. | draft-kuehlewind-tcpm-accurate-ecn-option-01 (work in | |||
progress), July 2012. | ||||
[I-D.moncaster-tcpm-rcv-cheat] | ||||
Moncaster, T., "A TCP Test to Allow Senders to Identify | ||||
Receiver Non-Compliance", | ||||
draft-moncaster-tcpm-rcv-cheat-01 (work in progress), | ||||
June 2007. | ||||
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | ||||
Selective Acknowledgment Options", RFC 2018, October 1996. | ||||
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | |||
Ramakrishnan, "Adding Explicit Congestion Notification | Ramakrishnan, "Adding Explicit Congestion Notification | |||
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, June | (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, | |||
2009. | June 2009. | |||
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
[RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding | [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding | |||
Acknowledgement Congestion Control to TCP", RFC 5690, | Acknowledgement Congestion Control to TCP", RFC 5690, | |||
February 2010. | February 2010. | |||
[RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion | ||||
Exposure (ConEx) Concepts and Use Cases", RFC 6789, | ||||
December 2012. | ||||
Authors' Addresses | Authors' Addresses | |||
Mirja Kuehlewind (editor) | Mirja Kuehlewind (editor) | |||
University of Stuttgart | University of Stuttgart | |||
Pfaffenwaldring 47 | Pfaffenwaldring 47 | |||
Stuttgart 70569 | Stuttgart 70569 | |||
Germany | Germany | |||
Email: mirja.kuehlewind@ikr.uni-stuttgart.de | Email: mirja.kuehlewind@ikr.uni-stuttgart.de | |||
Richard Scheffenegger | Richard Scheffenegger | |||
NetApp, Inc. | NetApp, Inc. | |||
Am Euro Platz 2 | Am Euro Platz 2 | |||
Vienna 1120 | Vienna, 1120 | |||
Austria | Austria | |||
Phone: +43 1 3676811 3146 | Phone: +43 1 3676811 3146 | |||
Email: rs@netapp.com | Email: rs@netapp.com | |||