| < draft-ietf-tcpm-accecn-reqs-03.txt | draft-ietf-tcpm-accecn-reqs-03-bb.txt > | |||
|---|---|---|---|---|
| TCP Maintenance and Minor Extensions (tcpm) M. Kuehlewind, Ed. | TCP Maintenance and Minor Extensions M. Kuehlewind, Ed. | |||
| Internet-Draft University of Stuttgart | (tcpm) University of Stuttgart | |||
| Intended status: Informational R. Scheffenegger | Internet-Draft R. Scheffenegger | |||
| Expires: January 16, 2014 NetApp, Inc. | Intended status: Informational NetApp, Inc. | |||
| July 15, 2013 | Expires: February 14, 2014 August 13, 2013 | |||
| Problem Statement and Requirements for a More Accurate ECN Feedback | Problem Statement and Requirements for Fine-Grained ECN Feedback | |||
| draft-ietf-tcpm-accecn-reqs-03 | draft-ietf-tcpm-accecn-reqs-03-bb | |||
| Abstract | Abstract | |||
| Explicit Congestion Notification (ECN) is an IP/TCP mechanism where | Explicit Congestion Notification (ECN) is an IP/TCP mechanism where | |||
| network nodes can mark IP packets instead of dropping them to | network nodes can mark IP packets instead of dropping them to | |||
| indicate congestion to the end-points. An ECN-capable receiver will | indicate congestion to the end-points. An ECN-capable receiver will | |||
| feed back this information to the sender. ECN is specified for TCP in | feed back this information to the sender. ECN is specified for TCP in | |||
| such a way that only one feedback signal can be transmitted per | such a way that only one feedback signal can be transmitted per | |||
| Round-Trip Time (RTT). Recently, new TCP mechanisms like ConEx or | Round-Trip Time (RTT). Recently, new TCP mechanisms like ConEx or | |||
| DCTCP need more accurate ECN feedback information in the case where | DCTCP need fine-grained ECN feedback information in the case where | |||
| more than one marking is received in one RTT. This documents | more than one marking is received in one RTT. This document | |||
| specifies requirement for different ECN feedback scheme in the TCP | specifies requirements for an update to the TCP protocol so that it | |||
| header to provide more than one feedback signal per RTT. | can provide ECN feedback signals that are more fine-grained than just | |||
| once per round trip. | ||||
| Status of This Memo | Review Comments, dated 13 Aug 2013. | |||
| This is a review by Bob Briscoe, not the work of the authors. The | ||||
| changes suggested in this review are just that: suggestions. They | ||||
| have been written as mods to the XML source of the draft merely for | ||||
| convenience. A diff against the original draft-03 will be the best | ||||
| way to read this. It is expected that the authors may accept some | ||||
| changes and reject others. There is no implication that any of these | ||||
| changes are acceptable to the authors. The motivation for some of | ||||
| the suggested changes is in the accompanying email sent to the tcpm | ||||
| list. | ||||
| Status of this Memo | ||||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on January 16, 2014. | This Internet-Draft will expire on February 14, 2014. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2013 IETF Trust and the persons identified as the | Copyright (c) 2013 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 | 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5 | |||
| 2. Overview ECN and ECN Nonce in IP/TCP . . . . . . . . . . . . 4 | 2. Recap of Classic ECN and ECN Nonce in IP/TCP . . . . . . . . . 6 | |||
| 3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 5 | 3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 4. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 6 | 4. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 4.1. Re-use of ECN/NS Header Bits . . . . . . . . . . . . . . 6 | 4.1. Re-use of ECN/NS Header Bits . . . . . . . . . . . . . . . 9 | |||
| 4.2. Use of Other Header Bits . . . . . . . . . . . . . . . . 7 | 4.2. Using Other Header Bits . . . . . . . . . . . . . . . . . 10 | |||
| 4.3. TCP Option . . . . . . . . . . . . . . . . . . . . . . . 7 | 4.3. Using a TCP Option . . . . . . . . . . . . . . . . . . . . 11 | |||
| 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 | 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 | 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11 | |||
| 8. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 | 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 8.1. Normative References . . . . . . . . . . . . . . . . . . 8 | 8.1. Normative References . . . . . . . . . . . . . . . . . . . 12 | |||
| 8.2. Informative References . . . . . . . . . . . . . . . . . 8 | 8.2. Informative References . . . . . . . . . . . . . . . . . . 12 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 1. Introduction | 1. Introduction | |||
| Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP | Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP | |||
| mechanism where network nodes can mark IP packets instead of dropping | mechanism where network nodes can mark IP packets instead of dropping | |||
| them to indicate congestion to the end-points. An ECN-capable | them to indicate congestion to the end-points. An ECN-capable | |||
| receiver will feed back this information to the sender. ECN is | receiver will feed back this information to the sender. ECN is | |||
| specified for TCP in such a way that only one feedback signal can be | specified for TCP in such a way that only one feedback signal can be | |||
| transmitted per Round-Trip Time (RTT). This is sufficient for | transmitted per Round-Trip Time (RTT). This is sufficient for pre- | |||
| current congestion control mechanisms, as only one reduction in | existing congestion control mechanisms that perform only one | |||
| sending rate is performed per RTT independent of the number of ECN | reduction in sending rate per RTT, independent of the number of ECN | |||
| congestion marks. But recently proposed mechanisms like Congestion | congestion marks. But recently proposed/deployed mechanisms like | |||
| Exposure (ConEx) or DCTCP [Ali10] need more accurate ECN feedback | Congestion Exposure (ConEx) [RFC6789] or DCTCP [Ali10] need more | |||
| information in the case where more than one marking is received in | fine-grained ECN feedback information to work correctly in the case | |||
| one RTT to work correctly. | where more than one marking is received in any one RTT. | |||
| The following scenarios should briefly show where the accurate | ConEx is an experimental approach that allows the sender to re-insert | |||
| feedback is needed or provides additional value: | the congestion feedback it sees into the forward data path. This is | |||
| primarily so that any traffic management can be proportionate to | ||||
| actual congestion caused by traffic, rather than limiting traffic | ||||
| based on rate or volume in case it might cause congestion [RFC6789]. | ||||
| A ConEx sender uses selective acknowledgements (SACK [RFC2018]) for | ||||
| fine-grained feedback of loss signals, but currently TCP offers no | ||||
| equivalent fine-grained feedback for ECN. | ||||
| A Standard (RFC5681) TCP sender that supports ConEx: | DCTCP offers very low and predictable queueing delay. DCTCP requires | |||
| In this case the congestion control algorithm still ignores | switches/routers to have ECN enabled and configured with no signal | |||
| multiple marks per RTT, while the ConEx mechanism uses the | smoothing, so it is currently only used in private networks, e.g. | |||
| extra information per RTT to re-echo more precise congestion | internal to data centres. DCTCP was released in Microsoft Windows 8, | |||
| information. | and implementations exist for Linux and FreeBSD. | |||
| The changes DCTCP makes to TCP are not currently the subject of any | ||||
| IETF standardisation activity. The different DCTCP implementations | ||||
| alter TCP's ECN feedback protocol [RFC3168] in unspecified | ||||
| proprietary ways, and they either omit capability negotiation, or | ||||
| they use non-interoperable negotiation. A primary motivation for | ||||
| this document is to prevent each proprietary implementation from | ||||
| inventing its own handshake, which could lead to _de facto_ | ||||
| consumption of the few flags that remain available for standardising | ||||
| capability negotiation. Also, those variants that use the feedback | ||||
| protocol proposed in [Ali10] only work if there are no losses at all, | ||||
| and otherwise they become confused. | ||||
| To remedy these problems, Section 3 of this document lists | ||||
| requirements for a robust and interoperable fine-grained TCP/ECN | ||||
| feedback protocol that all implementations of ConEx and/or DCTCP can | ||||
| use. A few solutions have already been proposed, so Section 4 | ||||
| demonstrates how to use the requirements to compare them, by briefly | ||||
| sketching their high level design choices and discussing the benefits | ||||
| and drawbacks of each. | ||||
| The following scenarios briefly show where fine-grained feedback is | ||||
| needed or adds value: | ||||
| An RFC5681 TCP sender that supports ConEx: | ||||
| In this case the ConEx mechanism uses the extra information | ||||
| per RTT to re-echo the precise congestion information, but | ||||
| the congestion control algorithm still ignores multiple marks | ||||
| per RTT [RFC5681]. | ||||
| A sender using DCTCP congestion control without ConEx: | A sender using DCTCP congestion control without ConEx: | |||
| The congestion control algorithm uses the extra info per RTT | The DCTCP congestion control algorithm uses the extra | |||
| to perform its decrease depending on the number of congestion | feedback information per RTT to decrease its rate depending | |||
| marks. | on the extent of congestion marks (not just the existence of | |||
| at least one mark per RTT). | ||||
| A sender that uses DCTCP congestion control and supports ConEx: | A sender that uses DCTCP congestion control and supports ConEx: | |||
| Both the congestion control algorithm and ConEx use the | Both the congestion control algorithm and ConEx use the fine- | |||
| accurate ECN feedback mechanism. | grained ECN feedback mechanism. | |||
| A standard TCP sender (using RFC5681 congestion control algorithm) | An RFC5681 TCP sender without ConEx: | |||
| without ConEx: | No fine-grained feedback is necessary here. The congestion | |||
| No accurate feedback is necessary here. The congestion | control algorithm still reacts to only one signal per RTT. | |||
| control algorithm still react only on one signal per RTT. | ||||
| But it is best to have one generic feedback mechanism, | But it is best to have one generic feedback mechanism, | |||
| whether it is used or not. | whether it is used or not. | |||
| This document summarizes the requirements for a new more accurate ECN | ||||
| feedback scheme. While a new feedback scheme should still deliver | ||||
| identical performance as classic ECN, this document also clarifies | ||||
| what has to be taken into consideration in addition. Thus the listed | ||||
| requirements should be addressed in the specification of a more | ||||
| accurate ECN feedback scheme. Moreover, as a large set of proposals | ||||
| already exists, a few high level design choices are sketched and | ||||
| briefly discussed, to demonstrate some of the benefits and drawbacks | ||||
| of each of these potential schemes. | ||||
| 1.1. Requirements Language | 1.1. Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
| We use the following terminology from [RFC3168] and [RFC3540]: | We use the following terminology from [RFC3168] and [RFC3540]: | |||
| The ECN field in the IP header: | The ECN field in the IP header: | |||
| skipping to change at page 4, line 12 | skipping to change at page 6, line 12 | |||
| The ECN flags in the TCP header: | The ECN flags in the TCP header: | |||
| CWR: the Congestion Window Reduced flag, | CWR: the Congestion Window Reduced flag, | |||
| ECE: the ECN-Echo flag, and | ECE: the ECN-Echo flag, and | |||
| NS: ECN Nonce Sum. | NS: ECN Nonce Sum. | |||
| In this document, the ECN feedback scheme as specified in [RFC3168] | In this document, the ECN feedback scheme as specified in [RFC3168] | |||
| is called the 'classic ECN' and any new proposal the 'more accurate | is called the 'classic ECN' and any new proposal the 'fine-grained | |||
| ECN feedback' scheme. A 'congestion mark' is defined as an IP packet | ECN feedback' scheme. A 'congestion mark' is defined as an IP packet | |||
| where the CE codepoint is set. A 'congestion event' refers to one or | where the CE codepoint is set. A 'congestion episode' refers to one | |||
| more congestion marks belong to the same overload situation in the | or more congestion marks belonging to the same overload situation in | |||
| network (usually during one RTT). A TCP segment with the | the network (usually during one RTT). A TCP segment with the | |||
| acknowledgment flag set is simply called ACK. | acknowledgment flag set is simply called an ACK. | |||
| 2. Overview ECN and ECN Nonce in IP/TCP | 2. Recap of Classic ECN and ECN Nonce in IP/TCP | |||
| ECN requires two bits in the IP header. The ECN capability of a | ECN requires two bits in the IP header. The ECN capability of a | |||
| packet is indicated when either one of the two bits is set. An ECN | packet is indicated when either one of the two bits is set. An ECN | |||
| sender can set one or the other bit to indicate an ECN-capable | sender can set one or the other bit to indicate an ECN-capable | |||
| transport (ECT) which results in two signals, ECT(0) and ECT(1). A | transport (ECT) which results in two signals, ECT(0) and ECT(1). A | |||
| network node can set both bits simultaneously when it experiences | network node can set both bits simultaneously when it experiences | |||
| congestion. When both bits are set the packet is regarded as | congestion. When both bits are set the packet is regarded as | |||
| "Congestion Experienced" (CE). | "Congestion Experienced" (CE). | |||
| In the TCP header the first two bits in byte 14 are defined for the | In the TCP header the first two bits in byte 14 are defined as ECN | |||
| use of ECN. The TCP mechanism for signaling the reception of a | feedback for each half-connection. A TCP receiver signals the | |||
| congestion mark uses the ECN-Echo (ECE) flag in the TCP header. To | reception of a congestion mark using the ECN-Echo (ECE) flag in the | |||
| enable the TCP receiver to determine when to stop setting the ECN- | TCP header. For reliability, the receiver continues to set the ECE | |||
| Echo flag, the CWR flag is set by the sender upon reception of the | flag on every ACK. To enable the TCP receiver to determine when to | |||
| feedback signal. This leads always to a full RTT of ACKs with ECE | stop setting the ECN-Echo flag, the sender sets the CWR flag upon | |||
| set. Thus any additional CE markings arriving within this RTT can | reception of an ECE feedback signal. This always leads to a full RTT | |||
| not signaled back anymore. | of ACKs with ECE set. Thus the receiver cannot signal back any | |||
| additional CE markings arriving within the same RTT. | ||||
| ECN-Nonce [RFC3540] is an optional addition to ECN that is used to | The ECN Nonce [RFC3540] is an experimental addition to ECN that the | |||
| protect the TCP sender against accidental or malicious concealment of | TCP sender can use to protect itself against accidental or malicious | |||
| marked or dropped packets. This addition defines the last bit of | concealment of marked or dropped packets. This addition defines the | |||
| byte 13 in the TCP header as the Nonce Sum (NS) bit. With ECN-Nonce | last bit of byte 13 in the TCP header as the Nonce Sum (NS) flag. | |||
| a nonce sum is maintain that counts the occurrence of ECT(1) packets. | The receiver maintains a nonce sum that counts the occurrence of | |||
| ECT(1) packets, and signals the least significant bit of this sum on | ||||
| the NS flag. | ||||
| 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| | | | N | C | E | U | A | P | R | S | F | | | | | N | C | E | U | A | P | R | S | F | | |||
| | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | |||
| | | | | R | E | G | K | H | T | N | N | | | | | | R | E | G | K | H | T | N | N | | |||
| +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ | |||
| Figure 1: The (post-ECN Nonce) definition of the TCP header flags | Figure 1: The (post-ECN Nonce) definition of the TCP header flags | |||
| However, it is believed that the ECN nonce has never been deployed. | ||||
| Therefore, if a sender tried to protect itself with the nonce, any | ||||
| receiver wishing to conceal marked or dropped packets merely has to | ||||
| appear like all the other receivers that have not implemented the | ||||
| nonce, and simply not provide any nonce feedback. An alternative for | ||||
| a sender to assure feedback integrity has been proposed where the | ||||
| sender occasionally inserts an ECN mark or loss itself, and checks | ||||
| that the receiver feeds it back faithfully | ||||
| [I-D.moncaster-tcpm-rcv-cheat]. This alternative requires no | ||||
| standardisation and consumes no header bits or codepoints, as well as | ||||
| releasing the ECT(1) codepoint in the IP header and the NS flag in | ||||
| the TCP header for other uses. | ||||
| 3. Requirements | 3. Requirements | |||
| The requirements of the accurate ECN feedback protocol, for the use | At minimum, a new feedback scheme should deliver feedback no worse | |||
| of e.g. Conex or DCTCP, are to have a fairly accurate (not | than classic ECN feedback. However, to be useful for e.g. ConEx or | |||
| necessarily perfect), timely and protected signaling. This leads to | DCTCP, a fine-grained ECN feedback protocol will also need to be | |||
| the following requirements, which should be discussed for any | fairly accurate (not necessarily perfect), timely and amenable to | |||
| proposed more accurate ECN feedback scheme: | integrity protection. This leads to the following requirements, | |||
| which should all be addressed in the specification of a fine-grained | ||||
| ECN feedback scheme: | ||||
| Resilience | Resilience | |||
| The ECN feedback signal is carried within the ACK. TCP ACKs | The ECN feedback signal is carried within the ACK. Pure TCP | |||
| can get lost. Moreover, delayed ACKs are mostly used with | ACKs can be lost without recovery. Therefore, a fine-grained | |||
| TCP. That means in most cases only every second data packet | ECN feedback extension has to take ACK loss into account. | |||
| triggers an ACK. In a high congestion situation where most | ||||
| of the packets are marked with CE, an accurate feedback | ||||
| mechanism must still be able to signal sufficient congestion | ||||
| information. Thus the accurate ECN feedback extension has to | ||||
| take delayed ACK and ACK loss into account. | ||||
| Timeliness | Timeliness | |||
| The CE mark is induced by a network node on the transmission | A CE mark is induced by a network node on the transmission | |||
| path and echoed by the receiver in the TCP ACK. Thus when | path and echoed by the receiver in the TCP ACK. Thus when | |||
| this information arrives at the sender, its naturally already | this information arrives at the sender, it is naturally | |||
| about one RTT old. With a sufficient ACK rate a further | already about one RTT old. With a sufficient ACK rate a | |||
| delay of a small number of ACK can be tolerated but with | further delay of a small number of ACKs can be tolerated. | |||
| large delays this information will be out dated due to high | However, this information will become stale with larger | |||
| dynamic in the network. TCP congestion control which | delays, given the dynamic nature of networks. TCP congestion | |||
| introduces parts of these dynamics operates on a time scale | control (which itself partly introduces these dynamics) | |||
| of one RTT. Thus the congestion feedback information should | operates on a time scale of one RTT. Thus, to be timely, | |||
| be delivered timely (within one RTT). | congestion feedback information should be delivered within | |||
| about one RTT. | ||||
| Integrity | Integrity | |||
| With ECN Nonce, a misbehaving receiver or network node can be | Given the problems with the ECN nonce identified above, this | |||
| detected with good probability. If the accurate ECN feedback | document only requires that the integrity of fine-grained ECN | |||
| is reusing the NS bit, it is encouraged to ensure integrity | feedback can be assured; it does not require that the ECN | |||
| at least as good as ECN Nonce. If this is not possible, | nonce is the mechanism employed to achieve this. Indeed, it | |||
| alternative approaches should be provided how a mechanism | entertains the possibility that a fine-grained ECN feedback | |||
| using the accurate ECN feedback extension can re-ensure | scheme might re-use the nonce sum (NS) flag in the TCP | |||
| integrity or give strong incentives for the receiver and | header. If fine-grained ECN feedback does re-use the NS | |||
| network node to cooperate honestly. | flag, an alternative should be provided that assures the | |||
| integrity of the feedback at least as well as the ECN nonce | ||||
| or that gives strong incentives for the receiver and network | ||||
| nodes to cooperate honestly. | ||||
| Accuracy | Accuracy | |||
| Classic ECN feeds back one congestion notification per RTT, | Classic ECN feeds back one congestion notification per RTT, | |||
| as this is supposed to be used for TCP congestion control | which is sufficient for classic TCP congestion control which | |||
| which reduces the sending rate at most once per RTT. The | reduces the sending rate at most once per RTT. The fine- | |||
| accurate ECN feedback scheme has to ensure that if a | grained ECN feedback scheme has to ensure that, if a | |||
| congestion events occurs at least one congestion notification | congestion episode occurs, at least one congestion | |||
| is echoed and received per RTT as classic ECN would do. Of | notification is echoed and received per RTT as classic ECN | |||
| course, the goal of this extension is to reconstruct the | would do. Of course, the goal of this extension is to | |||
| number of CE markings (more) accurately and in the best case | reconstruct the number of CE markings more accurately and in | |||
| even to reconstruct the (exact) number of payload bytes that | the best case even to reconstruct the exact number of payload | |||
| a CE marked packet was carrying. However, a sender should | bytes that a CE marked packet was carrying. However, a | |||
| not assume it will get the exact number of congestion markings or | sender should not assume it will get the exact number of | |||
| marked bytes in all situations. | congestion markings or marked bytes in all situations. | |||
| Delayed ACKs are commonly used with TCP. That means in most | ||||
| cases only every second data packet triggers an ACK. Thus a | ||||
| fine-grained ECN feedback extension has to take delayed ACKs | ||||
| into account. In a high congestion situation where most of | ||||
| the packets are marked with CE, a fine-grained feedback | ||||
| mechanism must still be able to signal sufficient congestion | ||||
| information. Ideally, it would be possible for the sender to | ||||
| determine which of the packets covered by a delayed ACK were | ||||
| congestion marked, e.g. if the flow consists of packets of | ||||
| different sizes, or to allow for future protocols where the | ||||
| order of the markings may be important. Also, an ideal fine- | ||||
| grained feedback protocol would still work if delayed ACKs | ||||
| covered more than two packets. | ||||
| Complexity | Complexity | |||
| Of course, the more accurate ECN feedback can also be used, | The implementation should be as simple as possible and only a | |||
| even if only one ECN feedback signal per RTT is need. The | ||||
| implementation should be as simple as possible and only a | ||||
| minimum of additional state information should be needed. | minimum of additional state information should be needed. | |||
| Overhead | Overhead | |||
| A more accurate ecn feedback signal should limit the | A fine-grained ECN feedback signal should limit the | |||
| additional network load. As feedback information has to be | additional network load, because ECN feedback is ultimately | |||
| provided timely and frequently, potentially all or a large | not critical information (in the worst case, loss will still | |||
| fraction of TCP acknowledgments will carry this information. | be available as a congestion signal of last resort). As | |||
| Ideally, no additional segments are exchanged compared to a | feedback information has to be provided frequently and in a | |||
| standard RFC3168 TCP session, while the overhead in each | timely fashion, potentially all or a large fraction of TCP | |||
| segment is kept minimal. Further, a feedback mechanism | acknowledgments will carry this information. Ideally, no | |||
| should be prepared to proved a method to fall-back to well | additional segments should be exchanged compared to an | |||
| known RFC3168 signaling, if the new signal is suppressed by | RFC3168 TCP session, and the overhead in each segment should | |||
| be minimised. | ||||
| Backward and forward compatibility | ||||
| Given that fine-grained ECN feedback will involve a change to the | ||||
| TCP protocol, it will need to be negotiated between the two | ||||
| TCP endpoints. If either end does not support fine-grained | ||||
| feedback, they should both be able to fall back to classic | ||||
| ECN feedback. | ||||
| A fine-grained ECN feedback extension should aim to be able | ||||
| to traverse most existing middleboxes. Further, a feedback | ||||
| mechanism should provide a method to fall back to classic | ||||
| RFC3168 signaling if the new signal is suppressed by certain | ||||
| middleboxes. | middleboxes. | |||
| In order to avoid a fork in the TCP protocol specifications, | ||||
| if experiments with the new fine-grained ECN feedback | ||||
| protocol are successful, it is intended to eventually update | ||||
| RFC3168 for any TCP/ECN sender, not just for ConEx or DCTCP | ||||
| senders. Therefore, even if only one ECN feedback signal per | ||||
| RTT is needed, it should be possible to use fine-grained ECN | ||||
| feedback. | ||||
| 4. Design Approaches | 4. Design Approaches | |||
| All discussed approaches aim to provide accurate ECN feedback | The schemes proposed so far are outlined below. The main | |||
| information as long as no ACK loss occurs and the congestion rate is | differentiator is their resilience in the face of loss of pure ACKs, | |||
| reasonable. Otherwise the proposed schemes have different resilience | which largely depends on the number of bits used for the encoding. | |||
| characteristics depending on the number of used bits for the | ||||
| encoding. While classic ECN provides a reliable (inaccurate) | ||||
| feedback of a maximum of one congestion signal per RTT, the proposed | ||||
| schemes do not implement any acknowledgement mechanism. | ||||
| 4.1. Re-use of ECN/NS Header Bits | 4.1. Re-use of ECN/NS Header Bits | |||
| The three ECN/NS header, ECE, CWR and NS are re-used (not only for | The three ECN header flags (ECE, CWR and NS) are re-used both during | |||
| additional capability negotiation during the TCP handshake exchange | the TCP handshake for capability negotiation and during the | |||
| but) to signal the current value of an CE counter at the receiver. | subsequent TCP session for the receiver to signal the current value | |||
| This approach only provides a limited resilience against ACK lost | of its congestion signal counter. This approach provides resilience | |||
| depending of the number of used bits. | against ACK loss by repeating the CE counter on each ACK, but | |||
| resilience against loss of a string of pure ACKs is limited, | ||||
| dependent on the number of bits used. | ||||
| There are several codings proposed so far: An one bit scheme sends | Several codings have been proposed so far: | |||
| one ECE for each CE received (while the CWR could be used to | ||||
| introduce redundant information in next ACK to increase the | ||||
| robustness against ACK loss). An 3 bit counter scheme uses all three | ||||
| bits for continuously feeding the three most significant bits of a CE | ||||
| counter back. An 3 bit codepoint scheme encodes either a CE counter | ||||
| or an ECT(1) counter in 8 codepoints. | ||||
| The proposed schemes provides accumulated information on ECN-CE- | o A one bit scheme sends one ECE for each CE received (to increase | |||
| the robustness against ACK loss, CWR could be used to introduce | ||||
| redundant information on the next ACK); | ||||
| o A 3-bit counter scheme continuously feeds back the three least | ||||
| significant bits of a CE counter; | ||||
| o A 3-bit codepoint scheme encodes either a CE counter or an ECT(1) | ||||
| counter in 8 codepoints. | ||||
| The proposed schemes provide accumulated information on ECN-CE- | ||||
| marking feedback, similar to the number of acknowledged bytes in the | marking feedback, similar to the number of acknowledged bytes in the | |||
| TCP header. Due to the limited number of bits the ECN feedback | TCP header. Due to the limited number of bits the ECN feedback | |||
| information will wrap-around more often (than the acknowledgement). | information will wrap much more often than the acknowledgement field. | |||
| Thus with a smaller number of ACK losses it is already possible to | Thus feedback information could be lost due to a relatively small | |||
| loose feedback information. The resilience could be increased by | sequence of pure-ACK losses. Resilience could be increased by | |||
| introducing redundancy, e.g. send each counter increase twice or more | introducing redundancy, e.g. send each counter increase two or more | |||
| times. Of course any of these additional mechanisms will increasee | times. Of course any of these additional mechanisms will increase | |||
| the complexity. If the congestion rate is larger than the ACK rate | the complexity. If the congestion rate is larger than the ACK rate | |||
| (multiplied with the number of feedback information that can be | (multiplied by the number of congestion marks that can be signaled | |||
| signaled per ACK), the congestion information cannot correctly be | per ACK), the congestion information cannot be correctly fed back. | |||
| feed back. Thus an accurate ECN feedback mechanism needs to be able | Thus an accurate ECN feedback mechanism needs to be able to cover the | |||
| to also cover the worst case situation where every packet is CE | worst case situation where every packet is CE marked. This can | |||
| marked. This can potentially be realized by dynamically adapt the | potentially be realized by dynamically adapting the ACK rate and | |||
| ACK rate and redundancy which again increases complexity and also | redundancy, which again increases complexity and perhaps the | |||
| potentially the signaling overhead. For all schemes, an integrity | signaling overhead as well. | |||
| check is only provided if ECN Nonce can be supported. | ||||
| 4.2. Use of Other Header Bits | 4.2. Using Other Header Bits | |||
| As seen in Figure 1, there are currently three unused flag bits in | As seen in Figure 1, there are currently three unused flag bits in | |||
| the TCP header. The proposed 3 bit or codepoint schemes could be | the TCP header. The proposed 3 bit or codepoint schemes could be | |||
| extended by one or more bits, to add higher resilience against ACK | extended by one or more bits, to add higher resilience against ACK | |||
| loss. The relative gain would be proportionally higher resilience | loss. The relative gain would be proportionally higher resilience | |||
| against ACK loss, while the respective drawbacks would remain | against ACK loss, while the respective drawbacks would remain | |||
| identical. | identical. | |||
| Moreover, the Urgent Pointer could be used if the Urgent Flag is not | Alternatively, the receiver could use bits in the Urgent Pointer | |||
| set. As this is often the case, the resiliency could by increased | field to signal more bits of its congestion signal counter, but only | |||
| without additional signaling overhead. | whenever it does not set the Urgent Flag. As this is often the case, | |||
| resilience could be increased without additional header overhead. | ||||
| 4.3. TCP Option | Any proposal to use such bits would need to check the likelihood that | |||
| some middleboxes might discard or 'normalise' the currently unused | ||||
| flag bits or a non-zero Urgent Pointer when the Urgent Flag is | ||||
| cleared. | ||||
| 4.3. Using a TCP Option | ||||
| Alternatively, a new TCP option could be introduced, to help | Alternatively, a new TCP option could be introduced, to help | |||
| maintain the accuracy and integrity of the ECN feedback between | maintain the accuracy and integrity of ECN feedback between | |||
| receiver and sender. Such an option could provide higher resilience | receiver and sender. Such an option could provide higher resilience | |||
| and even more information. E.g. ECN for RTP/UDP provides explicit | and even more information. For instance, ECN for RTP/UDP provides | |||
| the number of ECT(0), ECT(1), CE, non-ECT marked and lost packets. | the explicit number of ECT(0), ECT(1), CE, non-ECT marked and | |||
| However, deploying new TCP options has its own challenges. Moreover, | lost packets. However, deploying new TCP options has its own | |||
| to actually achieve a high resilience, this option would need to be | challenges. Moreover, to achieve a high resilience, this option | |||
| carried by either all or a large number ACKs. Thus this approach | would need to be carried by most or all ACKs, which would add | |||
| would introduce considerable signaling overhead while ECN feedback is | considerable signaling overhead. Anyway, such a TCP option could be | |||
| not such a critical information (as in the worst case, loss will | used in addition to a more accurate ECN feedback scheme in the TCP | |||
| still be available to provide a strong congestion feedback signal). | header or in addition to classic ECN, only when available and needed. | |||
| Anyway, such a TCP option could also be used in addition to a more | ||||
| accurate ECN feedback scheme in the TCP header or in addition to | ||||
| classic ECN, only when available and needed. | ||||
| 5. Acknowledgements | 5. Acknowledgements | |||
| 6. IANA Considerations | 6. IANA Considerations | |||
| This memo includes no request to IANA. | This memo includes no request to IANA. | |||
| 7. Security Considerations | 7. Security Considerations | |||
| If this scheme is used as input for congestion control, the | Given ECN feedback is used as input for congestion control, the | |||
| respective algorithm might not react appropriately if ECN feedback | respective algorithm would not react appropriately if fine-grained | |||
| information got lost. As those schemes should still react | ECN feedback were lost and the resilience mechanism to recover it was | |||
| appropriately to loss, this drawback can not lead to a congestion | inadequate. This resilience requirement is articulated in Section 3. | |||
| collapse though. | However, it should be noted that fine-grained ECN feedback is not the | |||
| last resort against congestion collapse, because if there is | ||||
| insufficient response to ECN, loss will ensue, and TCP will still | ||||
| react appropriately to loss. | ||||
| Providing wrong feedback information could otherwise lead to | A receiver could suppress ECN feedback information leading to its | |||
| throttling of certain connections. This problem is identical in the | connections consuming excess sender or network resources. This | |||
| classic ECN feedback scheme and should be addressed by an additional | problem is similar to that seen with the classic ECN feedback scheme | |||
| integrity check like ECN Nonce. | and should be addressed by integrity checking as required in | |||
| Section 3. | ||||
| 8. References | 8. References | |||
| 8.1. Normative References | 8.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", RFC | of Explicit Congestion Notification (ECN) to IP", | |||
| 3168, September 2001. | RFC 3168, September 2001. | |||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
| Congestion Notification (ECN) Signaling with Nonces", RFC | Congestion Notification (ECN) Signaling with Nonces", | |||
| 3540, June 2003. | RFC 3540, June 2003. | |||
| 8.2. Informative References | 8.2. Informative References | |||
| [Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, | [Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, | |||
| P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP: | P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP: | |||
| Efficient Packet Transport for the Commoditized Data | Efficient Packet Transport for the Commoditized Data | |||
| Center", Jan 2010. | Center", Jan 2010. | |||
| [I-D.briscoe-tsvwg-re-ecn-tcp] | [I-D.briscoe-tsvwg-re-ecn-tcp] | |||
| Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, | |||
| "Re-ECN: Adding Accountability for Causing Congestion to | "Re-ECN: Adding Accountability for Causing Congestion to | |||
| TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in | TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in | |||
| progress), October 2010. | progress), October 2010. | |||
| [I-D.kuehlewind-tcpm-accurate-ecn-option] | [I-D.kuehlewind-tcpm-accurate-ecn-option] | |||
| Kuehlewind, M. and R. Scheffenegger, "Accurate ECN | Kuehlewind, M. and R. Scheffenegger, "Accurate ECN | |||
| Feedback Option in TCP", draft-kuehlewind-tcpm-accurate- | Feedback Option in TCP", | |||
| ecn-option-01 (work in progress), July 2012. | draft-kuehlewind-tcpm-accurate-ecn-option-01 (work in | |||
| progress), July 2012. | ||||
| [I-D.moncaster-tcpm-rcv-cheat] | ||||
| Moncaster, T., "A TCP Test to Allow Senders to Identify | ||||
| Receiver Non-Compliance", | ||||
| draft-moncaster-tcpm-rcv-cheat-01 (work in progress), | ||||
| June 2007. | ||||
| [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP | ||||
| Selective Acknowledgment Options", RFC 2018, October 1996. | ||||
| [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. | |||
| Ramakrishnan, "Adding Explicit Congestion Notification | Ramakrishnan, "Adding Explicit Congestion Notification | |||
| (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, June | (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, | |||
| 2009. | June 2009. | |||
| [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion | |||
| Control", RFC 5681, September 2009. | Control", RFC 5681, September 2009. | |||
| [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding | [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding | |||
| Acknowledgement Congestion Control to TCP", RFC 5690, | Acknowledgement Congestion Control to TCP", RFC 5690, | |||
| February 2010. | February 2010. | |||
| [RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion | ||||
| Exposure (ConEx) Concepts and Use Cases", RFC 6789, | ||||
| December 2012. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Mirja Kuehlewind (editor) | Mirja Kuehlewind (editor) | |||
| University of Stuttgart | University of Stuttgart | |||
| Pfaffenwaldring 47 | Pfaffenwaldring 47 | |||
| Stuttgart 70569 | Stuttgart 70569 | |||
| Germany | Germany | |||
| Email: mirja.kuehlewind@ikr.uni-stuttgart.de | Email: mirja.kuehlewind@ikr.uni-stuttgart.de | |||
| Richard Scheffenegger | Richard Scheffenegger | |||
| NetApp, Inc. | NetApp, Inc. | |||
| Am Euro Platz 2 | Am Euro Platz 2 | |||
| Vienna 1120 | Vienna, 1120 | |||
| Austria | Austria | |||
| Phone: +43 1 3676811 3146 | Phone: +43 1 3676811 3146 | |||
| Email: rs@netapp.com | Email: rs@netapp.com | |||
| End of changes. 48 change blocks. | ||||
| 202 lines changed or deleted | 312 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
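
Illustration (not part of either draft revision): Section 4.1 notes that a 3-bit CE counter wraps often, so a short run of lost pure ACKs can hide congestion marks. The sketch below is one possible way to model that. The function names, the two-packets-per-ACK pattern, and the assumption that the sender takes the smallest non-negative difference between successive counter values are all hypothetical choices made for this example; they are not taken from any of the proposed encodings.

```python
COUNTER_BITS = 3
MODULUS = 1 << COUNTER_BITS  # 8 distinct counter values


def encode_feedback(ce_count: int) -> int:
    """Receiver side: value carried on each ACK (three least significant bits)."""
    return ce_count % MODULUS


def newly_marked(prev_field: int, new_field: int) -> int:
    """Sender side: CE marks inferred between two ACKs.

    Assumes (hypothetically) that fewer than MODULUS marks occurred between
    the two ACKs the sender actually received.
    """
    return (new_field - prev_field) % MODULUS


def sender_estimate(ack_fields):
    """Accumulate the sender's view of total CE marks from the ACKs that arrived."""
    total, prev = 0, 0
    for field in ack_fields:
        total += newly_marked(prev, field)
        prev = field
    return total


# Worst case from Section 4.1: every packet is CE marked. Each ACK covers
# two packets (hypothetical numbers), so the receiver's count grows by 2.
ce_counts = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
fields = [encode_feedback(c) for c in ce_counts]

no_loss = sender_estimate(fields)
# Two consecutive ACKs lost (2nd and 3rd): 6 marks hidden, still recoverable.
short_loss = sender_estimate([fields[0]] + fields[3:])
# Three consecutive ACKs lost (2nd to 4th): 8 marks hidden, counter wraps fully.
long_loss = sender_estimate([fields[0]] + fields[4:])

print(no_loss, short_loss, long_loss)
```

With these numbers the sender recovers the full count after losing two consecutive ACKs, but under-counts by exactly one counter cycle (8 marks) after losing three, which is the failure mode that the redundancy and extra-bit proposals in Sections 4.1 and 4.2 aim to make less likely.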