< draft-ietf-tcpm-accecn-reqs-03.txt   draft-ietf-tcpm-accecn-reqs-03-bb.txt >
TCP Maintenance and Minor Extensions (tcpm) M. Kuehlewind, Ed. TCP Maintenance and Minor Extensions M. Kuehlewind, Ed.
Internet-Draft University of Stuttgart (tcpm) University of Stuttgart
Intended status: Informational R. Scheffenegger Internet-Draft R. Scheffenegger
Expires: January 16, 2014 NetApp, Inc. Intended status: Informational NetApp, Inc.
July 15, 2013 Expires: February 14, 2014 August 13, 2013
Problem Statement and Requirements for a More Accurate ECN Feedback Problem Statement and Requirements for Fine-Grained ECN Feedback
draft-ietf-tcpm-accecn-reqs-03 draft-ietf-tcpm-accecn-reqs-03-bb
Abstract Abstract
Explicit Congestion Notification (ECN) is an IP/TCP mechanism where Explicit Congestion Notification (ECN) is an IP/TCP mechanism where
network nodes can mark IP packets instead of dropping them to network nodes can mark IP packets instead of dropping them to
indicate congestion to the end-points. An ECN-capable receiver will indicate congestion to the end-points. An ECN-capable receiver will
feed back this information to the sender. ECN is specified for TCP in feed back this information to the sender. ECN is specified for TCP in
such a way that only one feedback signal can be transmitted per such a way that only one feedback signal can be transmitted per
Round-Trip Time (RTT). Recently, new TCP mechanisms like ConEx or Round-Trip Time (RTT). Recently, new TCP mechanisms like ConEx or
DCTCP need more accurate ECN feedback information in the case where DCTCP need fine-grained ECN feedback information in the case where
more than one marking is received in one RTT. This documents more than one marking is received in one RTT. This document
specifies requirement for different ECN feedback scheme in the TCP specifies requirements for an update to the TCP protocol so that it
header to provide more than one feedback signal per RTT. can provide ECN feedback signals that are more fine-grained than just
once per round trip.
Status of This Memo Review Comments, dated 13 Aug 2013.
This is a review by Bob Briscoe, not the work of the authors. The
changes suggested in this review are just that: suggestions. They
have been written as mods to the XML source of the draft merely for
convenience. A diff against the original draft-03 will be the best
way to read this. It is expected that the authors may accept some
changes and reject others. There is no implication that any of these
changes are acceptable to the authors. The motivation for some of
the suggested changes is in the accompanying email sent to the tcpm
list.
Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 16, 2014. This Internet-Draft will expire on February 14, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2013 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 5
2. Overview ECN and ECN Nonce in IP/TCP . . . . . . . . . . . . 4 2. Recap of Classic ECN and ECN Nonce in IP/TCP . . . . . . . . . 6
3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . 5 3. Requirements . . . . . . . . . . . . . . . . . . . . . . . . . 7
4. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 6 4. Design Approaches . . . . . . . . . . . . . . . . . . . . . . 9
4.1. Re-use of ECN/NS Header Bits . . . . . . . . . . . . . . 6 4.1. Re-use of ECN/NS Header Bits . . . . . . . . . . . . . . . 9
4.2. Use of Other Header Bits . . . . . . . . . . . . . . . . 7 4.2. Using Other Header Bits . . . . . . . . . . . . . . . . . 10
4.3. TCP Option . . . . . . . . . . . . . . . . . . . . . . . 7 4.3. Using a TCP Option . . . . . . . . . . . . . . . . . . . . 11
5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 8 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 11
6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 8 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11
7. Security Considerations . . . . . . . . . . . . . . . . . . . 8 7. Security Considerations . . . . . . . . . . . . . . . . . . . 11
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 8 8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12
8.1. Normative References . . . . . . . . . . . . . . . . . . 8 8.1. Normative References . . . . . . . . . . . . . . . . . . . 12
8.2. Informative References . . . . . . . . . . . . . . . . . 8 8.2. Informative References . . . . . . . . . . . . . . . . . . 12
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 9 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP Explicit Congestion Notification (ECN) [RFC3168] is an IP/TCP
mechanism where network nodes can mark IP packets instead of dropping mechanism where network nodes can mark IP packets instead of dropping
them to indicate congestion to the end-points. An ECN-capable them to indicate congestion to the end-points. An ECN-capable
receiver will feed back this information to the sender. ECN is receiver will feed back this information to the sender. ECN is
specified for TCP in such a way that only one feedback signal can be specified for TCP in such a way that only one feedback signal can be
transmitted per Round-Trip Time (RTT). This is sufficient for transmitted per Round-Trip Time (RTT). This is sufficient for pre-
current congestion control mechanisms, as only one reduction in existing congestion control mechanisms that perform only one
sending rate is performed per RTT independent of the number of ECN reduction in sending rate per RTT, independent of the number of ECN
congestion marks. But recently proposed mechanisms like Congestion congestion marks. But recently proposed/deployed mechanisms like
Exposure (ConEx) or DCTCP [Ali10] need more accurate ECN feedback Congestion Exposure (ConEx) [RFC6789] or DCTCP [Ali10] need more
information in the case where more than one marking is received in fine-grained ECN feedback information to work correctly in the case
one RTT to work correctly. where more than one marking is received in any one RTT.
The following scenarios should briefly show where the accurate ConEx is an experimental approach that allows the sender to re-insert
feedback is needed or provides additional value: the congestion feedback it sees into the forward data path. This is
primarily so that any traffic management can be proportionate to
actual congestion caused by traffic, rather than limiting traffic
based on rate or volume in case it might cause congestion [RFC6789].
A ConEx sender uses selective acknowledgements (SACK [RFC2018]) for
fine-grained feedback of loss signals, but currently TCP offers no
equivalent fine-grained feedback for ECN.
A Standard (RFC5681) TCP sender that supports ConEx: DCTCP offers very low and predictable queueing delay. DCTCP requires
In this case the congestion control algorithm still ignores switches/routers to have ECN enabled and configured with no signal
multiple marks per RTT, while the ConEx mechanism uses the smoothing, so it is currently only used in private networks, e.g.
extra information per RTT to re-echo more precise congestion internal to data centres. DCTCP was released in Microsoft Windows 8,
information. and implementations exist for Linux and FreeBSD.
The changes DCTCP makes to TCP are not currently the subject of any
IETF standardisation activity. The different DCTCP implementations
alter TCP's ECN feedback protocol [RFC3168] in unspecified
proprietary ways, and they either omit capability negotiation, or
they use non-interoperable negotiation. A primary motivation for
this document is to prevent each proprietary implementation from
inventing its own handshake, which could lead to _de facto_
consumption of the few flags that remain available for standardising
capability negotiation. Also, those variants that use the feedback
protocol proposed in [Ali10] only work if there are no losses at all,
and otherwise they become confused.
To remedy these problems, Section 3 of this document lists
requirements for a robust and interoperable fine-grained TCP/ECN
feedback protocol that all implementations of ConEx and/or DCTCP can
use. A few solutions have already been proposed, so Section 4
demonstrates how to use the requirements to compare them, by briefly
sketching their high level design choices and discussing the benefits
and drawbacks of each.
The following scenarios briefly show where fine-grained feedback is
needed or adds value:
An RFC5681 TCP sender that supports ConEx:
In this case the ConEx mechanism uses the extra information
per RTT to re-echo the precise congestion information, but
the congestion control algorithm still ignores multiple marks
per RTT [RFC5681].
A sender using DCTCP congestion control without ConEx: A sender using DCTCP congestion control without ConEx:
The congestion control algorithm uses the extra info per RTT The DCTCP congestion control algorithm uses the extra
to perform its decrease depending on the number of congestion feedback information per RTT to decrease its rate depending
marks. on the extent of congestion marks (not just the existence of
at least one mark per RTT).
A sender using DCTCP congestion control and supporting ConEx: A sender using DCTCP congestion control and supporting ConEx:
Both the congestion control algorithm and ConEx use the Both the congestion control algorithm and ConEx use the fine-
accurate ECN feedback mechanism. grained ECN feedback mechanism.
A standard TCP sender (using RFC5681 congestion control algorithm) An RFC5681 TCP sender without ConEx:
without ConEx: No fine-grained feedback is necessary here. The congestion
No accurate feedback is necessary here. The congestion control algorithm still reacts to only one signal per RTT.
control algorithm still react only on one signal per RTT.
But it is best to have one generic feedback mechanism, But it is best to have one generic feedback mechanism,
whether it is used or not. whether it is used or not.
This document summarizes the requirements for a new more accurate ECN
feedback scheme. While a new feedback scheme should still deliver
identical performance as classic ECN, this document also clarifies
what has to be taken into consideration in addition. Thus the listed
requirements should be addressed in the specification of a more
accurate ECN feedback scheme. Moreover, as a large set of proposals
already exists, a few high level design choices are sketched and
briefly discussed, to demonstrate some of the benefits and drawbacks
of each of these potential schemes.
1.1. Requirements Language 1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
We use the following terminology from [RFC3168] and [RFC3540]: We use the following terminology from [RFC3168] and [RFC3540]:
The ECN field in the IP header: The ECN field in the IP header:
skipping to change at page 4, line 12 skipping to change at page 6, line 12
The ECN flags in the TCP header: The ECN flags in the TCP header:
CWR: the Congestion Window Reduced flag, CWR: the Congestion Window Reduced flag,
ECE: the ECN-Echo flag, and ECE: the ECN-Echo flag, and
NS: ECN Nonce Sum. NS: ECN Nonce Sum.
In this document, the ECN feedback scheme as specified in [RFC3168] In this document, the ECN feedback scheme as specified in [RFC3168]
is called the 'classic ECN' and any new proposal the 'more accurate is called the 'classic ECN' and any new proposal the 'fine-grained
ECN feedback' scheme. A 'congestion mark' is defined as an IP packet ECN feedback' scheme. A 'congestion mark' is defined as an IP packet
where the CE codepoint is set. A 'congestion event' refers to one or where the CE codepoint is set. A 'congestion episode' refers to one
more congestion marks belong to the same overload situation in the or more congestion marks belonging to the same overload situation in
network (usually during one RTT). A TCP segment with the the network (usually during one RTT). A TCP segment with the
acknowledgment flag set is simply called ACK. acknowledgment flag set is simply called an ACK.
2. Overview ECN and ECN Nonce in IP/TCP 2. Recap of Classic ECN and ECN Nonce in IP/TCP
ECN requires two bits in the IP header. The ECN capability of a ECN requires two bits in the IP header. The ECN capability of a
packet is indicated when either one of the two bits is set. An ECN packet is indicated when either one of the two bits is set. An ECN
sender can set one or the other bit to indicate an ECN-capable sender can set one or the other bit to indicate an ECN-capable
transport (ECT) which results in two signals, ECT(0) and ECT(1). A transport (ECT) which results in two signals, ECT(0) and ECT(1). A
network node can set both bits simultaneously when it experiences network node can set both bits simultaneously when it experiences
congestion. When both bits are set the packet is regarded as congestion. When both bits are set the packet is regarded as
"Congestion Experienced" (CE). "Congestion Experienced" (CE).
In the TCP header the first two bits in byte 14 are defined for the In the TCP header the first two bits in byte 14 are defined as ECN
use of ECN. The TCP mechanism for signaling the reception of a feedback for each half-connection. A TCP receiver signals the
congestion mark uses the ECN-Echo (ECE) flag in the TCP header. To reception of a congestion mark using the ECN-Echo (ECE) flag in the
enable the TCP receiver to determine when to stop setting the ECN- TCP header. For reliability, the receiver continues to set the ECE
Echo flag, the CWR flag is set by the sender upon reception of the flag on every ACK. To enable the TCP receiver to determine when to
feedback signal. This leads always to a full RTT of ACKs with ECE stop setting the ECN-Echo flag, the sender sets the CWR flag upon
set. Thus any additional CE markings arriving within this RTT can reception of an ECE feedback signal. This always leads to a full RTT
not signaled back anymore. of ACKs with ECE set. Thus the receiver cannot signal back any
additional CE markings arriving within the same RTT.
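The ECE/CWR behaviour recapped above can be sketched as a small receiver model. This is a simplified illustration, not text from either draft version: the class and function names are hypothetical, one ACK per data segment is assumed (no delayed ACKs), and the two-bit codepoint values are those given in [RFC3168].

```python
# ECN codepoints of the two-bit IP header field, as defined in RFC 3168.
NOT_ECT, ECT_1, ECT_0, CE = 0b00, 0b01, 0b10, 0b11


class ClassicEcnReceiver:
    """Simplified model of the classic ECN receiver's ECE latch (illustrative)."""

    def __init__(self):
        self.echoing = False  # True while ECE is set on outgoing ACKs

    def on_segment(self, ecn_field, cwr):
        """Process one data segment; return the ECE flag for the ACK it triggers."""
        if cwr:
            self.echoing = False  # sender has reacted, so stop echoing
        if ecn_field == CE:
            self.echoing = True   # start (or restart) echoing
        return self.echoing


def ece_sequence(ecn_fields, cwr_flags=None):
    """Return the ECE flag carried by each ACK for a run of data segments."""
    if cwr_flags is None:
        cwr_flags = [False] * len(ecn_fields)
    receiver = ClassicEcnReceiver()
    return [receiver.on_segment(f, c) for f, c in zip(ecn_fields, cwr_flags)]


# One CE mark and three CE marks within the same RTT (no CWR seen yet).
one_mark = ece_sequence([ECT_0, CE, ECT_0, ECT_0, ECT_0])
three_marks = ece_sequence([ECT_0, CE, CE, ECT_0, CE])
print(one_mark == three_marks)  # prints True
```

Both runs produce the same string of ECE flags, so in this model the sender cannot tell one mark from three. That is the limitation the requirements in Section 3 are meant to address.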
ECN-Nonce [RFC3540] is an optional addition to ECN that is used to The ECN Nonce [RFC3540] is an experimental addition to ECN that the
protect the TCP sender against accidental or malicious concealment of TCP sender can use to protect itself against accidental or malicious
marked or dropped packets. This addition defines the last bit of concealment of marked or dropped packets. This addition defines the
byte 13 in the TCP header as the Nonce Sum (NS) bit. With ECN-Nonce last bit of byte 13 in the TCP header as the Nonce Sum (NS) flag.
a nonce sum is maintain that counts the occurrence of ECT(1) packets. The receiver maintains a nonce sum that counts the occurrence of
ECT(1) packets, and signals the least significant bit of this sum on
the NS flag.
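As a rough sketch of the nonce sum described above, the receiver can be modelled as counting ECT(1) arrivals and echoing the least significant bit of that count. The function name and the string labels for codepoints are assumptions made for this illustration only; the resynchronisation rules of [RFC3540] are not modelled.

```python
def ns_flags(codepoints):
    """Return the NS flag a receiver would send after each arriving packet.

    Each codepoint is one of the strings "ECT(0)", "ECT(1)" or "CE".
    """
    ect1_count = 0
    flags = []
    for codepoint in codepoints:
        if codepoint == "ECT(1)":
            ect1_count += 1
        flags.append(ect1_count & 1)  # least significant bit of the sum
    return flags


print(ns_flags(["ECT(1)", "ECT(0)", "ECT(1)", "CE"]))  # prints [1, 1, 0, 0]
```

A CE mark overwrites whichever ECT codepoint the sender chose, so a receiver that wanted to conceal the mark would have to guess the missing nonce in order to keep its NS flag consistent with the sender's own sum.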
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| | | N | C | E | U | A | P | R | S | F | | | | N | C | E | U | A | P | R | S | F |
| Header Length | Reserved | S | W | C | R | C | S | S | Y | I | | Header Length | Reserved | S | W | C | R | C | S | S | Y | I |
| | | | R | E | G | K | H | T | N | N | | | | | R | E | G | K | H | T | N | N |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
Figure 1: The (post-ECN Nonce) definition of the TCP header flags Figure 1: The (post-ECN Nonce) definition of the TCP header flags
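The layout in Figure 1 can be read out of a raw TCP header roughly as follows. This is a minimal sketch: the function name and dictionary keys are hypothetical, and it assumes the 16-bit word of Figure 1 sits at octet offset 12 (counting from zero) of the header, with bit 0 of the figure as the most significant bit.

```python
import struct

# Masks for the flag bits of Figure 1 (bit 0 of the figure is the MSB).
FLAG_MASKS = {
    "NS": 0x100, "CWR": 0x080, "ECE": 0x040,
    "URG": 0x020, "ACK": 0x010, "PSH": 0x008,
    "RST": 0x004, "SYN": 0x002, "FIN": 0x001,
}


def parse_flags_word(tcp_header):
    """Decode header length, reserved bits and flags from a TCP header."""
    (word,) = struct.unpack_from("!H", tcp_header, 12)
    fields = {name: bool(word & mask) for name, mask in FLAG_MASKS.items()}
    fields["header_length_words"] = word >> 12   # bits 0-3 of Figure 1
    fields["reserved"] = (word >> 9) & 0b111     # bits 4-6, currently unused
    return fields


# A 20-octet header with header length 5 and the ECE and ACK flags set.
example = bytes(12) + struct.pack("!H", 0x5050) + bytes(6)
flags = parse_flags_word(example)
print(flags["ECE"], flags["ACK"], flags["NS"])  # prints True True False
```

The three bits returned as `reserved` are the unused flag bits that Section 4.2 considers as a possible home for additional feedback.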
However, it is believed that the ECN nonce has never been deployed.
Therefore, if a sender tried to protect itself with the nonce, any
receiver wishing to conceal marked or dropped packets merely has to
appear like all the other receivers that have not implemented the
nonce, and simply not provide any nonce feedback. An alternative for
a sender to assure feedback integrity has been proposed where the
sender occasionally inserts an ECN mark or loss itself, and checks
that the receiver feeds it back faithfully
[I-D.moncaster-tcpm-rcv-cheat]. This alternative requires no
standardisation and consumes no header bits or codepoints, as well as
releasing the ECT(1) codepoint in the IP header and the NS flag in
the TCP header for other uses.
3. Requirements 3. Requirements
The requirements of the accurate ECN feedback protocol, for the use At minimum, a new feedback scheme should deliver feedback no worse
of e.g. Conex or DCTCP, are to have a fairly accurate (not than classic ECN feedback. However, to be useful for e.g. ConEx or
necessarily perfect), timely and protected signaling. This leads to DCTCP, a fine-grained ECN feedback protocol will also need to be
the following requirements, which should be discussed for any fairly accurate (not necessarily perfect), timely and amenable to
proposed more accurate ECN feedback scheme: integrity protection. This leads to the following requirements,
which should all be addressed in the specification of a fine-grained
ECN feedback scheme:
Resilience Resilience
The ECN feedback signal is carried within the ACK. TCP ACKs The ECN feedback signal is carried within the ACK. Pure TCP
can get lost. Moreover, delayed ACKs are mostly used with ACKs can be lost without recovery. Therefore, a fine-grained
TCP. That means in most cases only every second data packet ECN feedback extension has to take ACK loss into account.
triggers an ACK. In a high congestion situation where most
of the packets are marked with CE, an accurate feedback
mechanism must still be able to signal sufficient congestion
information. Thus the accurate ECN feedback extension has to
take delayed ACK and ACK loss into account.
Timeliness Timeliness
The CE mark is induced by a network node on the transmission A CE mark is induced by a network node on the transmission
path and echoed by the receiver in the TCP ACK. Thus when path and echoed by the receiver in the TCP ACK. Thus when
this information arrives at the sender, its naturally already this information arrives at the sender, it is naturally
about one RTT old. With a sufficient ACK rate a further already about one RTT old. With a sufficient ACK rate a
delay of a small number of ACK can be tolerated but with further delay of a small number of ACKs can be tolerated.
large delays this information will be out dated due to high However, this information will become stale with larger
dynamic in the network. TCP congestion control which delays, given the dynamic nature of networks. TCP congestion
introduces parts of these dynamics operates on a time scale control (which itself partly introduces these dynamics)
of one RTT. Thus the congestion feedback information should operates on a time scale of one RTT. Thus, to be timely,
be delivered timely (within one RTT). congestion feedback information should be delivered within
about one RTT.
Integrity Integrity
With ECN Nonce, a misbehaving receiver or network node can be Given the problems with the ECN nonce identified above, this
detected with good probability. If the accurate ECN feedback document only requires that the integrity of fine-grained ECN
is reusing the NS bit, it is encouraged to ensure integrity feedback can be assured; it does not require that the ECN
at least as good as ECN Nonce. If this is not possible, nonce is the mechanism employed to achieve this. Indeed, it
alternative approaches should be provided how a mechanism entertains the possibility that a fine-grained ECN feedback
using the accurate ECN feedback extension can re-ensure scheme might re-use the nonce sum (NS) flag in the TCP
integrity or give strong incentives for the receiver and header. If fine-grained ECN feedback does re-use the NS
network node to cooperate honestly. flag, an alternative should be provided that assures the
integrity of the feedback at least as well as the ECN nonce
or that gives strong incentives for the receiver and network
nodes to cooperate honestly.
Accuracy Accuracy
Classic ECN feeds back one congestion notification per RTT, Classic ECN feeds back one congestion notification per RTT,
as this is supposed to be used for TCP congestion control which is sufficient for classic TCP congestion control which
which reduces the sending rate at most once per RTT. The reduces the sending rate at most once per RTT. The fine-
accurate ECN feedback scheme has to ensure that if a grained ECN feedback scheme has to ensure that, if a
congestion events occurs at least one congestion notification congestion episode occurs, at least one congestion
is echoed and received per RTT as classic ECN would do. Of notification is echoed and received per RTT as classic ECN
course, the goal of this extension is to reconstruct the would do. Of course, the goal of this extension is to
number of CE markings (more) accurately and in the best case reconstruct the number of CE markings more accurately and in
even to reconstruct the (exact) number of payload bytes that the best case even to reconstruct the exact number of payload
a CE marked packet was carrying. However, a sender should bytes that a CE marked packet was carrying. However, a
not expect to get the exact number of congestion markings or sender should not expect to get the exact number of
marked bytes in all situations. congestion markings or marked bytes in all situations.
Delayed ACKs are commonly used with TCP. That means in most
cases only every second data packet triggers an ACK. Thus a
fine-grained ECN feedback extension has to take delayed ACKs
into account. In a high congestion situation where most of
the packets are marked with CE, a fine-grained feedback
mechanism must still be able to signal sufficient congestion
information. Ideally, it would be possible for the sender to
determine which of the packets covered by a delayed ACK were
congestion marked, e.g. if the flow consists of packets of
different sizes, or to allow for future protocols where the
order of the markings may be important. Also, an ideal fine-
grained feedback protocol would still work if delayed ACKs
covered more than two packets.
Complexity Complexity
Of course, the more accurate ECN feedback can also be used, The implementation should be as simple as possible and only a
even if only one ECN feedback signal per RTT is need. The
implementation should be as simple as possible and only a
minimum of additional state information should be needed. minimum of additional state information should be needed.
Overhead Overhead
A more accurate ecn feedback signal should limit the A fine-grained ECN feedback signal should limit the
additional network load. As feedback information has to be additional network load, because ECN feedback is ultimately
provided timely and frequently, potentially all or a large not critical information (in the worst case, loss will still
fraction of TCP acknowledgments will carry this information. be available as a congestion signal of last resort). As
Ideally, no additional segments are exchanged compared to a feedback information has to be provided frequently and in a
standard RFC3168 TCP session, while the overhead in each timely fashion, potentially all or a large fraction of TCP
segment is kept minimal. Further, a feedback mechanism acknowledgments will carry this information. Ideally, no
should be prepared to proved a method to fall-back to well additional segments should be exchanged compared to an
known RFC3168 signaling, if the new signal is suppressed by RFC3168 TCP session, and the overhead in each segment should
be minimised.
Backward and forward compatibility
Given fine-grained ECN feedback will involve a change to the
TCP protocol, it will need to be negotiated between the two
TCP endpoints. If either end does not support fine-grained
feedback, they should both be able to fall-back to classic
ECN feedback.
A fine-grained ECN feedback extension should aim to be able
to traverse most existing middleboxes. Further, a feedback
mechanism should provide a method to fall-back to classic
RFC3168 signaling if the new signal is suppressed by certain
middleboxes. middleboxes.
In order to avoid a fork in the TCP protocol specifications,
if experiments with the new fine-grained ECN feedback
protocol are successful, it is intended to eventually update
RFC3168 for any TCP/ECN sender, not just for ConEx or DCTCP
senders. Therefore, even if only one ECN feedback signal per
RTT is needed, it should be possible to use fine-grained ECN
feedback.
4. Design Approaches 4. Design Approaches
All discussed approaches aim to provide accurate ECN feedback The schemes proposed so far are outlined below. The main
information as long as no ACK loss occurs and the congestion rate is differentiator is their resilience in the face of loss of pure ACKs,
reasonable. Otherwise the proposed schemes have different resilience which largely depends on the number of bits used for the encoding.
characteristics depending on the number of used bits for the
encoding. While classic ECN provides a reliable (inaccurate)
feedback of a maximum of one congestion signal per RTT, the proposed
schemes do not implement any acknowledgement mechanism.
4.1. Re-use of ECN/NS Header Bits 4.1. Re-use of ECN/NS Header Bits
The three ECN/NS header, ECE, CWR and NS are re-used (not only for The three ECN header flags (ECE, CWR and NS) are re-used both during
additional capability negotiation during the TCP handshake exchange the TCP handshake for capability negotiation and during the
but) to signal the current value of an CE counter at the receiver. subsequent TCP session for the receiver to signal the current value
This approach only provides a limited resilience against ACK lost of its congestion signal counter. This approach provides resilience
depending of the number of used bits. against ACK loss by repeating the CE counter on each ACK, but
resilience against loss of a string of pure ACKs is limited,
dependent on the number of bits used.
There are several codings proposed so far: An one bit scheme sends Several codings have been proposed so far:
one ECE for each CE received (while the CWR could be used to
introduce redundant information in next ACK to increase the
robustness against ACK loss). An 3 bit counter scheme uses all three
bits for continuously feeding the three most significant bits of a CE
counter back. An 3 bit codepoint scheme encodes either a CE counter
or an ECT(1) counter in 8 codepoints.
The proposed schemes provides accumulated information on ECN-CE- o A one bit scheme sends one ECE for each CE received (to increase
the robustness against ACK loss, CWR could be used to introduce
redundant information on the next ACK);
o A 3-bit counter scheme continuously feeds back the three least
significant bits of a CE counter;
o A 3-bit codepoint scheme encodes either a CE counter or an ECT(1)
counter in 8 codepoints.
The proposed schemes provide accumulated information on ECN-CE-
marking feedback, similar to the number of acknowledged bytes in the marking feedback, similar to the number of acknowledged bytes in the
TCP header. Due to the limited number of bits the ECN feedback TCP header. Due to the limited number of bits the ECN feedback
information will wrap-around more often (than the acknowledgement). information will wrap much more often than the acknowledgement field.
Thus with a smaller number of ACK losses it is already possible to Thus feedback information could be lost due to a relatively small
loose feedback information. The resilience could be increased by sequence of pure-ACK losses. Resilience could be increased by
introducing redundancy, e.g. send each counter increase twice or more introducing redundancy, e.g. send each counter increase two or more
times. Of course any of these additional mechanisms will increasee times. Of course any of these additional mechanisms will increase
the complexity. If the congestion rate is larger than the ACK rate the complexity. If the congestion rate is larger than the ACK rate
(multiplied with the number of feedback information that can be (multiplied by the number of congestion marks that can be signaled
signaled per ACK), the congestion information cannot correctly be per ACK), the congestion information cannot be correctly fed back.
feed back. Thus an accurate ECN feedback mechanism needs to be able Thus an accurate ECN feedback mechanism needs to be able to cover the
to also cover the worst case situation where every packet is CE worst case situation where every packet is CE marked. This can
marked. This can potentially be realized by dynamically adapt the potentially be realized by dynamically adapting the ACK rate and
ACK rate and redundancy which again increases complexity and also redundancy, which again increases complexity and perhaps the
potentially the signaling overhead. For all schemes, an integrity signaling overhead as well.
check is only provided if ECN Nonce can be supported.
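The wrap-around problem described above can be illustrated with a small sketch. It assumes a 3-bit cumulative CE counter (one of the field widths mentioned for the proposed schemes), one ACK per received packet, and a sender that takes the smallest non-negative counter increase consistent with what it sees. The function names are hypothetical and are not taken from any of the proposals.

```python
def counter_delta(prev_field: int, new_field: int, bits: int = 3) -> int:
    """Smallest non-negative increase consistent with two observed counter fields."""
    modulus = 1 << bits
    return (new_field - prev_field) % modulus


def simulate(ce_marks_per_ack, acks_delivered, bits: int = 3):
    """Return (true CE count at receiver, CE count inferred by sender).

    ce_marks_per_ack[i] is the number of CE marks newly covered by ACK i;
    acks_delivered[i] is False if ACK i was lost on the reverse path.
    """
    modulus = 1 << bits
    true_total = 0       # receiver's unbounded view of CE marks
    inferred_total = 0   # what the sender reconstructs from the wrapped field
    last_seen = 0        # last counter field value the sender received
    for marks, delivered in zip(ce_marks_per_ack, acks_delivered):
        true_total += marks
        if delivered:
            field = true_total % modulus  # only the low bits fit in the header
            inferred_total += counter_delta(last_seen, field, bits)
            last_seen = field
    return true_total, inferred_total


# Six consecutive ACK losses: 7 marks still fit below the 3-bit modulus of 8.
print(simulate([1] * 7, [False] * 6 + [True]))
# Eight consecutive ACK losses: the counter wraps and 8 marks go unnoticed.
print(simulate([1] * 9, [False] * 8 + [True]))
```

Under these assumptions the sender's estimate stays exact as long as fewer than 2^bits marks accumulate between two delivered ACKs. Once that bound is exceeded, the estimate silently undercounts by a multiple of 2^bits.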
4.2. Use of Other Header Bits 4.2. Using Other Header Bits
As seen in Figure 1, there are currently three unused flag bits in As seen in Figure 1, there are currently three unused flag bits in
the TCP header. The proposed 3 bit or codepoint schemes could be the TCP header. The proposed 3 bit or codepoint schemes could be
extended by one or more bits, to add higher resilience against ACK extended by one or more bits, to add higher resilience against ACK
loss. The relative gain would be proportionally higher resilience loss. The relative gain would be proportionally higher resilience
against ACK loss, while the respective drawbacks would remain against ACK loss, while the respective drawbacks would remain
identical. identical.
Moreover, the Urgent Pointer could be used if the Urgent Flag is not Alternatively, the receiver could use bits in the Urgent Pointer
set. As this is often the case, the resiliency could by increased field to signal more bits of its congestion signal counter, but only
without additional signaling overhead. whenever it does not set the Urgent Flag. As this is often the case,
resilience could be increased without additional header overhead.
4.3. TCP Option Any proposal to use such bits would need to check the likelihood that
some middleboxes might discard or 'normalise' the currently unused
flag bits or a non-zero Urgent Pointer when the Urgent Flag is
cleared.
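As a rough sketch of the Urgent Pointer idea, the following shows how a receiver might expose additional counter bits only on segments where the Urgent Flag is not set. The bit layout (low 3 bits in flag bits, the next 16 bits in the Urgent Pointer field) and the names encode and decode are assumptions made for illustration. No such layout is defined in this document.

```python
FLAG_BITS = 3       # assumed width of the counter carried in TCP flag bits
URG_PTR_BITS = 16   # width of the TCP Urgent Pointer field


def encode(counter: int, urg_set: bool):
    """Return (flag_field, urgent_pointer); urgent_pointer is None if URG is set."""
    flag_field = counter & ((1 << FLAG_BITS) - 1)
    if urg_set:
        # The Urgent Pointer carries real urgent data and cannot be reused.
        return flag_field, None
    upper = (counter >> FLAG_BITS) & ((1 << URG_PTR_BITS) - 1)
    return flag_field, upper


def decode(flag_field: int, urgent_pointer):
    """Return (counter value, modulus the value is known under)."""
    if urgent_pointer is None:
        return flag_field, 1 << FLAG_BITS
    value = (urgent_pointer << FLAG_BITS) | flag_field
    return value, 1 << (FLAG_BITS + URG_PTR_BITS)


# With URG clear the sender sees the counter modulo 2**19 instead of 2**3.
print(decode(*encode(9, urg_set=False)))
# With URG set only the 3 flag bits are available.
print(decode(*encode(9, urg_set=True)))
```

A sender using such a scheme would still need a fallback for segments where the extra bits are unavailable or have been altered by a middlebox, which is the concern raised in the preceding paragraph.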
4.3. Using a TCP Option
Alternatively, a new TCP option could be introduced, to help Alternatively, a new TCP option could be introduced, to help
maintain the accuracy and integrity of the ECN feedback between maintain the accuracy and integrity of ECN feedback between
receiver and sender. Such an option could provide higher resilience receiver and sender. Such an option could provide higher resilience
and even more information. E.g. ECN for RTP/UDP provides explicit and even more information. For instance, ECN for RTP/UDP provides
the number of ECT(0), ECT(1), CE, non-ECT marked and lost packets. the explicit number of ECT(0), ECT(1), CE, non-ECT marked and
However, deploying new TCP options has its own challenges. Moreover, lost packets. However, deploying new TCP options has its own
to actually achieve a high resilience, this option would need to be challenges. Moreover, to achieve a high resilience, this option
carried by either all or a large number ACKs. Thus this approach would need to be carried by most or all ACKs, which would add
would introduce considerable signaling overhead while ECN feedback is considerable signaling overhead. Anyway, such a TCP option could be
not such a critical information (as in the worst case, loss will used in addition to a more accurate ECN feedback scheme in the TCP
still be available to provide a strong congestion feedback signal). header or in addition to classic ECN, only when available and needed.
Anyway, such a TCP option could also be used in addition to a more
accurate ECN feedback scheme in the TCP header or in addition to
classic ECN, only when available and needed.
5. Acknowledgements 5. Acknowledgements
6. IANA Considerations 6. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
7. Security Considerations 7. Security Considerations
If this scheme is used as input for congestion control, the Given ECN feedback is used as input for congestion control, the
respective algorithm might not react appropriately if ECN feedback respective algorithm would not react appropriately if fine-grained
information got lost. As those schemes should still react ECN feedback were lost and the resilience mechanism to recover it was
appropriately to loss, this drawback can not lead to a congestion inadequate. This resilience requirement is articulated in Section 3.
collapse though. However, it should be noted that fine-grained ECN feedback is not the
last resort against congestion collapse, because if there is
insufficient response to ECN, loss will ensue, and TCP will still
react appropriately to loss.
Providing wrong feedback information could otherwise lead to A receiver could suppress ECN feedback information leading to its
throttling of certain connections. This problem is identical in the connections consuming excess sender or network resources. This
classic ECN feedback scheme and should be addressed by an additional problem is similar to that seen with the classic ECN feedback scheme
integrity check like ECN Nonce. and should be addressed by integrity checking as required in
Section 3.
8. References 8. References
8.1. Normative References 8.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
of Explicit Congestion Notification (ECN) to IP", RFC of Explicit Congestion Notification (ECN) to IP",
3168, September 2001. RFC 3168, September 2001.
[RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
Congestion Notification (ECN) Signaling with Nonces", RFC Congestion Notification (ECN) Signaling with Nonces",
3540, June 2003. RFC 3540, June 2003.
8.2. Informative References 8.2. Informative References
[Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, [Ali10] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,
P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP: P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP:
Efficient Packet Transport for the Commoditized Data Efficient Packet Transport for the Commoditized Data
Center", Jan 2010. Center", Jan 2010.
[I-D.briscoe-tsvwg-re-ecn-tcp] [I-D.briscoe-tsvwg-re-ecn-tcp]
Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
"Re-ECN: Adding Accountability for Causing Congestion to "Re-ECN: Adding Accountability for Causing Congestion to
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in
progress), October 2010. progress), October 2010.
[I-D.kuehlewind-tcpm-accurate-ecn-option] [I-D.kuehlewind-tcpm-accurate-ecn-option]
Kuehlewind, M. and R. Scheffenegger, "Accurate ECN Kuehlewind, M. and R. Scheffenegger, "Accurate ECN
Feedback Option in TCP", draft-kuehlewind-tcpm-accurate- Feedback Option in TCP",
ecn-option-01 (work in progress), July 2012. draft-kuehlewind-tcpm-accurate-ecn-option-01 (work in
progress), July 2012.
[I-D.moncaster-tcpm-rcv-cheat]
Moncaster, T., "A TCP Test to Allow Senders to Identify
Receiver Non-Compliance",
draft-moncaster-tcpm-rcv-cheat-01 (work in progress),
June 2007.
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K. [RFC5562] Kuzmanovic, A., Mondal, A., Floyd, S., and K.
Ramakrishnan, "Adding Explicit Congestion Notification Ramakrishnan, "Adding Explicit Congestion Notification
(ECN) Capability to TCP's SYN/ACK Packets", RFC 5562, June (ECN) Capability to TCP's SYN/ACK Packets", RFC 5562,
2009. June 2009.
[RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion [RFC5681] Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
Control", RFC 5681, September 2009. Control", RFC 5681, September 2009.
[RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding [RFC5690] Floyd, S., Arcia, A., Ros, D., and J. Iyengar, "Adding
Acknowledgement Congestion Control to TCP", RFC 5690, Acknowledgement Congestion Control to TCP", RFC 5690,
February 2010. February 2010.
[RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion
Exposure (ConEx) Concepts and Use Cases", RFC 6789,
December 2012.
Authors' Addresses Authors' Addresses
Mirja Kuehlewind (editor) Mirja Kuehlewind (editor)
University of Stuttgart University of Stuttgart
Pfaffenwaldring 47 Pfaffenwaldring 47
Stuttgart 70569 Stuttgart 70569
Germany Germany
Email: mirja.kuehlewind@ikr.uni-stuttgart.de Email: mirja.kuehlewind@ikr.uni-stuttgart.de
Richard Scheffenegger Richard Scheffenegger
NetApp, Inc. NetApp, Inc.
Am Euro Platz 2 Am Euro Platz 2
Vienna 1120 Vienna, 1120
Austria Austria
Phone: +43 1 3676811 3146 Phone: +43 1 3676811 3146
Email: rs@netapp.com Email: rs@netapp.com