< draft-ietf-conex-tcp-modifications-07.txt   draft-ietf-conex-tcp-modifications-07-bb.txt >
Congestion Exposure (ConEx) M. Kuehlewind, Ed. Congestion Exposure (ConEx) M. Kuehlewind, Ed.
Internet-Draft ETH Zurich Internet-Draft ETH Zurich
Intended status: Experimental R. Scheffenegger Intended status: Experimental R. Scheffenegger
Expires: August 18, 2015 NetApp, Inc. Expires: September 9, 2015 NetApp, Inc.
February 14, 2015 March 8, 2015
TCP modifications for Congestion Exposure TCP modifications for Congestion Exposure
draft-ietf-conex-tcp-modifications-07 draft-ietf-conex-tcp-modifications-07
Abstract Abstract
Congestion Exposure (ConEx) is a mechanism by which senders inform Congestion Exposure (ConEx) is a mechanism by which senders inform
the network about the congestion encountered by previous packets on the network about expected congestion based on congestion feedback
the same flow. This document describes the necessary modifications from previous packets in the same flow. This document describes the
to use ConEx with the Transmission Control Protocol (TCP). necessary modifications to use ConEx with the Transmission Control
Protocol (TCP).
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 18, 2015. This Internet-Draft will expire on September 9, 2015.
Copyright Notice Copyright Notice
Copyright (c) 2015 IETF Trust and the persons identified as the Copyright (c) 2015 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 18
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
2. Sender-side Modifications . . . . . . . . . . . . . . . . . . 3 2. Sender-side Modifications . . . . . . . . . . . . . . . . . . 3
3. Accounting congestion . . . . . . . . . . . . . . . . . . . . 4 3. Counting congestion . . . . . . . . . . . . . . . . . . . . . 4
3.1. Loss Detection . . . . . . . . . . . . . . . . . . . . . 5 3.1. Loss Detection . . . . . . . . . . . . . . . . . . . . . 5
3.1.1. Without SACK Support . . . . . . . . . . . . . . . . 6 3.1.1. General Approach . . . . . . . . . . . . . . . . . . 6
3.1.2. Without SACK Support . . . . . . . . . . . . . . . . 6
3.2. ECN . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2. ECN . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2.1. Accurate ECN feedback . . . . . . . . . . . . . . . . 8 3.2.1. Accurate ECN feedback . . . . . . . . . . . . . . . . 9
3.2.2. Classic ECN support . . . . . . . . . . . . . . . . . 8 3.2.2. Classic ECN support . . . . . . . . . . . . . . . . . 9
4. Setting the ConEx Bits . . . . . . . . . . . . . . . . . . . 9 4. Setting the ConEx Bits . . . . . . . . . . . . . . . . . . . 10
4.1. Setting the E and the L Bit . . . . . . . . . . . . . . . 9 4.1. Setting the E or the L Flag . . . . . . . . . . . . . . . 10
4.2. Credit Bits . . . . . . . . . . . . . . . . . . . . . . . 9 4.2. Setting the Credit Flag . . . . . . . . . . . . . . . . . 11
5. Loss of ConEx information . . . . . . . . . . . . . . . . . . 11 5. Loss of ConEx information . . . . . . . . . . . . . . . . . . 13
6. Timeliness of the ConEx Signals . . . . . . . . . . . . . . . 11 6. Timeliness of the ConEx Signals . . . . . . . . . . . . . . . 14
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 14
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
9. Security Considerations . . . . . . . . . . . . . . . . . . . 12 9. Security Considerations . . . . . . . . . . . . . . . . . . . 14
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 12 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 15
10.1. Normative References . . . . . . . . . . . . . . . . . . 12 10.1. Normative References . . . . . . . . . . . . . . . . . . 15
10.2. Informative References . . . . . . . . . . . . . . . . . 13 10.2. Informative References . . . . . . . . . . . . . . . . . 16
Appendix A. Revision history . . . . . . . . . . . . . . . . . . 14 Appendix A. Revision history . . . . . . . . . . . . . . . . . . 17
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 20
1. Introduction 1. Introduction
Congestion Exposure (ConEx) is a mechanism by which senders inform Congestion Exposure (ConEx) is a mechanism by which senders inform
the network about the congestion encountered by previous packets on the network about expected congestion based on congestion feedback
the same flow. ConEx concepts and use cases are further explained in from previous packets in the same flow. ConEx concepts and use cases
[RFC6789]. The abstract ConEx mechanism is explained in are further explained in [RFC6789]. The abstract ConEx mechanism is
[draft-ietf-conex-abstract-mech]. This document describes the explained in [draft-ietf-conex-abstract-mech]. This document
necessary modifications to use ConEx with the Transmission Control describes the necessary modifications to use ConEx with the
Protocol (TCP). Transmission Control Protocol (TCP).
The needed markings to provide ConEx signaling are defined in the The markings for ConEx signaling are defined in the ConEx Destination
ConEx Destination Option (CDO) for IPv6 [draft-ietf-conex-destopt]. Option (CDO) for IPv6 [draft-ietf-conex-destopt]. Specifically, the
Specifically, the use of four bits is defined: the X (ConEx- use of four flags is defined: X (ConEx-capable), L (loss
capable), the L (loss experienced), the E (ECN experienced) and C experienced), E (ECN experienced) and C (credit).
(credit) bit.
ConEx signaling is based on loss or Explicit Congestion Notification ConEx signaling is based on loss or Explicit Congestion Notification
(ECN) marks [RFC3168] as a congestion indication. This congestion (ECN) marks [RFC3168] as congestion indications. The sender collects
information is retrieved by the sender based on existing feedback this congestion information based on existing TCP feedback mechanisms
mechanisms from the receiver to the sender in TCP. No changes are from the receiver to the sender. No changes are needed at the
needed at the receiver to implement ConEx signaling. Therefore no receiver to implement ConEx signaling. Therefore no additional
additional negotiation is needed to implement and use ConEx at the negotiation is needed to implement and use ConEx at the sender. This
sender. This document specifies actions needed by sender to provide document specifies the sender's actions that are needed to provide
meaningful ConEx information to the network. meaningful ConEx information to the network.
Section 2 provides an overview of the needed modifications for TCP Section 2 provides an overview of the modifications needed for TCP
senders to implement ConEx. First congestion information have to be senders to implement ConEx. First congestion information has to be
extracted from loss or ECN feedback in TCP as described in section 3. extracted from TCP's loss or ECN feedback as described in section 3.
Section 4 details how to set the CDO marking based on the accounted Section 4 details how to set the CDO marking based on this congestion
congestion information. Section 6 finally discusses timeliness of information. Section 5 discusses loss of packets carrying ConEx
the ConEx feedback signal as congestion is a temporary state. information. Section 6 [CREF1]discusses timeliness of the ConEx
feedback signal, given congestion is a temporary state.
This document describes congestion accounting for both TCP with and This document describes congestion accounting for TCP with and
without the Selective Acknowledgment (SACK) extension [RFC2018] in without the Selective Acknowledgment (SACK) extension [RFC2018] (in
section 3.1. However, ConEx benefits from more accurate information section 3.1). However, ConEx benefits from the more accurate
about the number of packets dropped in the network. It is therefore information that SACK provides about the number of bytes dropped in
recommended to use the SACK extension when using TCP with ConEx. The the network. It is therefore preferable[CREF2] to use the SACK
detailed mechanism to respectively set the L bit in response to loss- extension when using TCP with ConEx. The detailed mechanism to set
based congestion feedback signal is given in section 4.1. the L flag in response to loss-based congestion feedback signal is
given in section 4.1.
While loss-based congestion feedback should be minimized, ECN could Whereas loss has to be minimized, ECN can provide more fine-grained
actually provide more fine-grained feedback information. ConEx-based feedback information. ConEx-based traffic measurement or management
traffic measurement or management mechanisms would benefit from this. mechanisms could benefit from this. Unfortunately, the current ECN
Unfortunately, the current ECN feedback mechanism does not reflect feedback mechanism does not reflect multiple congestion markings if
multiple congestion markings which occur within the same Round-Trip they occur within the same Round-Trip Time (RTT). A more accurate
Time (RTT). A more accurate feedback extension to ECN is proposed in feedback extension to ECN (AccECN) is proposed in a separate document
a separate document [draft-kuehlewind-tcpm-accurate-ecn], as this is [draft-kuehlewind-tcpm-accurate-ecn], as this is also useful for
also useful for other mechanisms. other mechanisms.
The congestion accounting for both, with the classic ECN feedback as Congestion accounting for both classic ECN feedback and AccECN
well as a more accurate ECN feedback are explained in detail in feedback is explained in detail in section 3.2. Setting the E flag
section 3.2 while the setting of the E bit in response to ECN-based in response to ECN-based congestion feedback is again detailed in
congestion feedback is again detailed in section 4.1. section 4.1.
1.1. Requirements Language 1.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
2. Sender-side Modifications 2. Sender-side Modifications
This section gives an overview of actions that need to be taken by a This section gives an overview of actions that need to be taken by a
TCP sender that would like to use ConEx signaling. TCP sender modified to use ConEx signaling.
A ConEx sender MUST negotiate for both SACK and ECN or the more In the TCP handshake, a ConEx sender MUST negotiate for SACK and ECN
accurate ECN feedback in the TCP handshake if these TCP extension are preferably with AccECN feedback. Therefore a ConEx sender MUST also
available at the sender. Therefore a ConEx sender SHOULD also
implement SACK and ECN. Depending on the capability of the receiver, implement SACK and ECN. Depending on the capability of the receiver,
the following operation modes exist: the following operation modes exist:
o SACK-accECN-ConEx (SACK and accurate ECN feedback) +------+-----+
| SACK | ECN |
o accECN-ConEx (no SACK but accurate ECN feedback) +------+-----+
| S | A |
o ECN-ConEx (no SACK and no accurate ECN feedback but 'classic' ECN) | S | C |
| S | - |
o SACK-ECN-ConEx (SACK and 'classic' instead of accurate ECN) | - | A |
| - | C |
| - | - |
+------+-----+
o SACK-ConEx (SACK but no ECN at all) S: SACK enabled; A: AccECN enabled; C: Classic ECN [RFC3168] enabled
o Basic-ConEx (neither SACK nor ECN) Table 1: ConEx modes.
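As an illustration only, the six combinations listed above can be written as a lookup from the negotiated capabilities to a mode label. This is a minimal sketch: the function name `conex_mode` and the single-letter ECN codes as arguments are assumptions made for the example, and the labels are taken from the bulleted list in the earlier revision.

```python
def conex_mode(sack: bool, ecn: str) -> str:
    """Return a mode label for the capabilities negotiated in the handshake.

    sack: True if SACK was negotiated.
    ecn:  "A" for AccECN, "C" for classic ECN [RFC3168], "-" for no ECN.
    The function name and argument encoding are illustrative assumptions.
    """
    modes = {
        (True, "A"): "SACK-accECN-ConEx",
        (True, "C"): "SACK-ECN-ConEx",
        (True, "-"): "SACK-ConEx",
        (False, "A"): "accECN-ConEx",
        (False, "C"): "ECN-ConEx",
        (False, "-"): "Basic-ConEx",
    }
    return modes[(sack, ecn)]
```

For example, `conex_mode(False, "-")` gives the label for a receiver that supports neither SACK nor ECN.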
A ConEx sender MUST expose all congestion information to the network A ConEx sender MUST expose all congestion information to the network
according to the congestion information received by ECN or based on according to the congestion information received by ECN or based on
loss information provided by the TCP feedback loop. A TCP sender loss information provided by the TCP feedback loop. A TCP sender
SHOULD account congestion byte-wise (and not packet-wise). A sender SHOULD count congestion byte-wise (rather than packet-wise; see next
MUST mark subsequent packets (after the congestion notification) with paragraph). After any congestion notification, a sender MUST mark
the respective ConEx bit in the IP header. Furthermore, a ConEx subsequent packets with the appropriate ConEx flag in the IP header.
sender must send enough credit to cover all experienced congestion Furthermore, a ConEx sender must send enough credit to cover all
for the connection so far, as well as the risk of congestion for the experienced congestion for the connection so far, as well as the risk
current transmission (see Section 4.2). of congestion for the current transmission (see Section 4.2).
With SACK only the number of lost payload bytes is known, but not the With SACK the number of lost payload bytes is known, but not the
number of packets carrying these bytes. With classic ECN only an number of packets carrying these bytes. With classic ECN only an
indication is given that a marking occurred but not the exact number indication is given that a marking occurred but not the exact number
of payload bytes nor packets. As network congestion is usually byte- of payload bytes nor packets. As network congestion is usually byte-
congestion [draft-briscoe-tsvwg-byte-pkt-mark], the exact number of congestion [RFC7141], the byte-size of a packet marked with a CDO
flag is defined to represent that number of bytes of congestion
signalling [draft-ietf-conex-destopt]. Therefore the exact number of
bytes should be taken into account, if available, to make the ConEx bytes should be taken into account, if available, to make the ConEx
signal as exact as possible. signal as exact as possible.
Detailed mechanisms for congestion accounting in each operation mode Detailed mechanisms for congestion accounting in each operation mode
are described in the next section. Further handling of the IPv6 bits are described in the next section.
itself if congestion was accounted is described in the subsequent
section afterwards.
3. Accounting congestion 3. Counting congestion
A ConEx sender maintains two counters: one that accounts congestion A ConEx TCP sender maintains two counters: one that counts congestion
based on the information retrived by loss detection, and a second based on the information retrieved by loss detection, and a second
that accounts for ECN based congestion feedback (in TCP). These that accounts for ECN based congestion feedback. These counters hold
counters hold the number of outstanding bytes that should be ConEx the number of outstanding bytes that should be ConEx marked with
marked either with the E bit or the L bit in subsequent packets. respectively the E flag or the L flag in subsequent packets.
The outstanding bytes for congestion indications based on loss are The outstanding bytes for congestion indications based on loss are
maintained in the loss exposure gauge (LEG) and the accounting is added to the loss exposure gauge (LEG), as explained in Section 3.1.
explained in Section 3.1.
The outstanding bytes accounted based on ECN feedback information are The outstanding bytes counted based on ECN feedback information are
maintained in the congestion exposure gauge (CEG). The accounting of added to the congestion exposure gauge (CEG) as explained in
these bytes from the ECN feedback is explained in more detail next in
Section 3.2. Section 3.2.
Furthermore, those counters will be reduced every time a ConEx When the sender sends a ConEx capable packet with the E or L flag set
capable packet with the E or L bit set is sent. This is explained it reduces the respective counter by the byte-size of the packet.
for both counters in Section 4.1. This is explained for both counters in Section 4.1.
Usually all bytes of an IP packet must be accounted. Therefore the Usually all bytes of an IP packet must be counted. Therefore the
sender SHOULD take the headers into account, too. If equal sized sender SHOULD take the payload and headers into account, up to and
packets, or at least equally distributed packet sizes can be assumed, including the IP header. Therefore, as well as the TCP payload
the sender MAY only account the TCP payload bytes. In this case bytes, an appropriate number of header bytes SHOULD be added to the
there should be about the same number of ConEx marked packets as the gauge for each packet of congestion feedback. And the sender SHOULD
original packets that were causing the congestion. Thus both contain subtract header bytes from the gauge for each marked packet sent.
about the same number of header bytes. This case is assumed for
simplification in the following sections.
Otherwise if this is not the case and a sender sends different sized If equal-sized packets, or at least equally distributed packet sizes
packets (with unequally distributed packet sizes), the sender needs can be assumed, the sender MAY only add and subtract TCP payload
to memorize or estimate the number of ECN-marked or lost packets. A bytes. In this case there should be about the same number of ConEx
sender might be able to reconstruct the number of packets and thus marked packets as the original packets that were causing the
the header bytes if the packet sizes of all packets that were sent congestion. Thus both contain about the same number of header bytes
during the last RTT are known. Otherwise if no additional so they will cancel out. This case is assumed for simplicity in the
information is available the worst case number of packets and thus following sections.
header bytes should be estimated in a conservative way based on a
minimum packet size (of all packets sent in the last RTT). If the Otherwise, if a sender sends different sized packets (with unequally
number of ConEx marked packets is smaller (or larger) than the distributed packet sizes), the sender needs to memorize or estimate
estimated number of ECN-marked or lost packets, the additional header the number of lost or ECN-marked packets. A sender might be able to
bytes should the added to (or can be subtracted from) the respective reconstruct the number of packets and thus the header bytes if the
counter. packet sizes of all packets that were sent during the last RTT are
known. Otherwise, if no additional information is available, the
conservative or even worst case number of packets and thus header
bytes should be estimated, e.g. based on the minimum packet size (of
all packets sent in the last RTT). If the number of ConEx marked
packets is smaller (or larger) than the estimated number of lost or
ECN-marked packets, the additional header bytes should be added to
(or can be subtracted from) the respective counter.[CREF3]
3.1. Loss Detection 3.1. Loss Detection
3.1.1. General Approach
A ConEx sender MUST maintain a loss exposure gauge (LEG), indicating This section applies whether or not SACK support is available. The
the number of outstanding bytes that must be sent with the ConEx L following section deals with the case when SACK is not available.
bit. When a data segment is retransmitted, LEG will be increased by
the size of the TCP payload bytes contained by the retransmission, TCP feedback is designed so that the sender can detect losses in
assuming equal sized segments such that the retransmitted packet will order to retransmit the lost data. Therefore, it might be naively
have the same number of header bytes as the original ones. assumed that a TCP sender only needs to set the ConEx L flag on all
retransmissions in order to signal the amount of bytes lost.
However, this will not always be the case. Therefore the process of
loss detection is described here and separately the process of ConEx
marking is described in Section 4.1.[CREF4]
A ConEx sender needs to[CREF5] maintain a local signed counter that
shall be called the loss exposure gauge (LEG), indicating the number
of outstanding bytes to be sent with the ConEx L flag. When a TCP
sender decides that a data segment needs to be retransmitted, it will
increase LEG by the size of the TCP payload bytes in the
retransmission (assuming equal sized segments such that the
retransmitted packet will have the same number of header bytes as the
original ones).
Any retransmission may be spurious. To accommodate that, a ConEx Any retransmission may be spurious. To accommodate that, a ConEx
sender SHOULD make use of heuristics to detect such spurious sender SHOULD make use of heuristics to detect such spurious
retransmissions (e.g. F-RTO [RFC5682], DSACK [RFC3708], and Eifel retransmissions (e.g. F-RTO [RFC5682], DSACK [RFC3708], and Eifel
[RFC3522], [RFC4015]). When such a heuristic has determined, that a [RFC3522], [RFC4015]). When such a heuristic has determined that a
certain number of packets were retransmitted erroneously, the ConEx certain number of packets were retransmitted erroneously, the ConEx
sender should subtract the payload size of these TCP packets from sender SHOULD subtract the payload size of these TCP packets from
LEG. LEG.[CREF6]
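The LEG bookkeeping described above could be sketched as follows. This is a hedged illustration, not the draft's implementation: the class and method names are hypothetical, the counter is treated as signed, and payload bytes are used throughout under the equal-sized-segment simplification. The decrement on sending an L-marked packet follows the general statement in Section 3; the marking decision itself belongs to Section 4.1 and is not modelled here.

```python
class LossExposureGauge:
    """Signed byte counter (LEG) of bytes still to be sent with the L flag.

    Class and method names are illustrative assumptions.
    """

    def __init__(self) -> None:
        self.leg = 0  # signed: corrections may push it below zero

    def on_retransmission(self, payload_bytes: int) -> None:
        # A segment is retransmitted: its payload counts as lost bytes.
        self.leg += payload_bytes

    def on_spurious_retransmission(self, payload_bytes: int) -> None:
        # A heuristic (e.g. F-RTO, DSACK, Eifel) found the retransmission
        # unnecessary, so the bytes are taken back out of the gauge.
        self.leg -= payload_bytes

    def on_l_marked_packet_sent(self, packet_bytes: int) -> None:
        # An L-marked packet was sent: it exposes that many bytes.
        self.leg -= packet_bytes
```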
3.1.1. Without SACK Support 3.1.2. Without SACK Support
If multiple losses occur within one RTT and SACK is not used, it may If multiple losses occur within one RTT and SACK is not used, it may
take several RTTs until all lost data is retransmitted. With the take several RTTs until all lost data is retransmitted. With the
scheme described above, the ConEx information will be delayed scheme described above, the ConEx information will be delayed
strongly but timeliness is important for ConEx. considerably, but timeliness is important for ConEx.
For ConEx it is not important to know which data got lost but only For ConEx it is not important to know which data got lost but only
how much. During the first RTT after the initial loss detection, the how much.[CREF7] During the first RTT after the initial loss
amount of received data and thus also the amount of lost data can be detection, the amount of received data and thus also the amount of
estimated based on the number of received ACKs. Thus without SACK, lost data can be estimated based on the number of received ACKs.
the needed information for the ConEx feedback can be available with Thus without SACK, the information needed for ConEx feedback can be
an additionally delay of one RTT by using the following estimation available with an additional delay of one RTT by using the following
algorithm and an additional Loss Estimation Counter (LEC): estimation algorithm and an additional Loss Estimation Counter (LEC):
flight_bytes: current flight size in bytes flight_bytes: current flight size in bytes
retransmit_bytes: payload size of the retransmission retransmit_bytes: payload size of the retransmission
At the first retransmission in a congestion event LEC is set: At the first retransmission in a congestion event LEC is set:
LEC = flight_bytes - 3*SMSS LEC = flight_bytes - 3*SMSS
(At this point of time in the transmission, in the worst case, (At this point of time in the transmission, in the worst case,
all packets in flight minus three that triggered the dupACKs all packets in flight minus three that triggered the dupACKs
skipping to change at page 7, line 13 skipping to change at page 7, line 40
that should be ConEx L marked.) that should be ConEx L marked.)
After the first RTT for each following retransmission: After the first RTT for each following retransmission:
if (LEC > 0): LEC -= retransmit_bytes if (LEC > 0): LEC -= retransmit_bytes
else if (LEC==0): LEG += retransmit_bytes else if (LEC==0): LEG += retransmit_bytes
if (LEC < 0): LEG += -LEC if (LEC < 0): LEG += -LEC
(The LEG is not increased for those bytes that were (The LEG is not increased for those bytes that were
already accounted.) already counted.)
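The two visible steps of the estimation algorithm can be transliterated as in the sketch below. The update rule for the first RTT is skipped in this comparison, so it is deliberately left out rather than guessed. The class name, method names and the SMSS value of 1460 bytes are assumptions for the example. Resetting LEC to zero after an overshoot is also an assumption, made so that the same overshoot is not added to LEG again on the next retransmission.

```python
SMSS = 1460  # assumed sender maximum segment size in bytes


class LossEstimator:
    """LEG plus the Loss Estimation Counter (LEC) for senders without SACK."""

    def __init__(self) -> None:
        self.leg = 0  # loss exposure gauge, in bytes
        self.lec = 0  # loss estimation counter, in bytes

    def first_retransmission(self, flight_bytes: int) -> None:
        # Worst case: everything in flight was lost except the three
        # segments that triggered the duplicate ACKs.
        self.lec = flight_bytes - 3 * SMSS

    def later_retransmission(self, retransmit_bytes: int) -> None:
        # Rule for retransmissions after the first RTT.
        if self.lec > 0:
            self.lec -= retransmit_bytes  # bytes already counted
        elif self.lec == 0:
            self.leg += retransmit_bytes  # bytes not yet counted
        if self.lec < 0:
            self.leg += -self.lec  # count only the overshoot
            self.lec = 0  # assumption: avoid re-adding the overshoot
```

With LEC at 2000 bytes and three later retransmissions of 1460 bytes each, this sketch adds 2380 bytes to LEG: the 4380 retransmitted bytes minus the 2000 that were already counted.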
3.2. ECN 3.2. ECN
ECN [RFC3168] is an IP/TCP mechanism that allows network nodes to ECN [RFC3168] is an IP/TCP mechanism that allows network nodes to
mark packets with the Congestion Experienced (CE) mark instead of mark packets with the Congestion Experienced (CE) mark instead of
(early) dropping them when congestion occurs. As soon as a CE mark dropping them when congestion occurs.
is seen at the receiver, with classic ECN it will feed this
information back to the sender by setting the Echo Congestion
Experienced (ECE) bit in the TCP header of all subsequent ACKs until
a packet with Congestion Window Reduced (CWR) bit in the TCP header
is received to acknowledge the reception of the congestion
notification. The sender sets the CWR bit in the TCP header once
when the first ECE of a congestion notification is received.
A receiver can support 'classic' ECN, a more accurate ECN feedback A receiver might support 'classic' ECN, the more accurate ECN
scheme, or neither. In the case ECN is not supported at all, of feedback scheme (AccECN), or neither. In the case that ECN is not
course, no ECN marks will occur, thus the E bit will never be set. supported for a connection, of course, no ECN marks will occur; thus
Otherwise, a ConEx sender must maintain a counter, the congestion the sender will never set the E flag. Otherwise, a ConEx sender must
exposure gauge (CEG), for the number of outstanding bytes that have maintain a signed counter, the congestion exposure gauge (CEG), for
to be ConEx marked with the E bit. the number of outstanding bytes that have to be ConEx marked with the
E flag.
The CEG is increased when ECN information is received from an ECN- The CEG is increased when ECN information is received from an ECN-
capable receiver supporting the 'classic' ECN scheme or the accurate capable receiver supporting the 'classic' ECN scheme or the accurate
ECN feedback scheme. When the ConEx sender receives an ACK ECN feedback scheme. When the ConEx sender receives an ACK
indicating one or more segments were received with a CE mark, CEG is indicating one or more segments were received with a CE mark, CEG is
increased by the appropriate number of bytes as described further increased by the appropriate number of bytes as described further
below. below.
Unfortunately in case of duplicate acknowledgements the number of Unfortunately in case of duplicate acknowledgements the number of
newly acknowledged bytes will be zero even though (CE marked) data newly acknowledged bytes will be zero even though (CE marked) data
has been received. Therefore, we increase the CEG by DeliveredData, has been received. Therefore, we increase the CEG by DeliveredData,
as defined below: as defined below:
DeliveredData = acked_bytes + SACK_diff + (is_dup)*1SMSS - DeliveredData = acked_bytes + SACK_diff + (is_dup)*1SMSS -
(is_after_dup)*num_dup*1SMSS (is_after_dup)*num_dup*1SMSS
DeliveredData covers the number of bytes which has been newly DeliveredData covers the number of bytes that has been newly
delivered to the receiver. Therefore on each arrival of an ACK, delivered to the receiver. Therefore on each arrival of an ACK,
DeliveredData will be increased by the newly acknowledged bytes DeliveredData will be increased by the newly acknowledged bytes
(acked_bytes) as indicated by the current ACK, relative to all past (acked_bytes) as indicated by the current ACK, relative to all past
ACKs. ACKs. The formula depends on whether SACK is available, as follows:
Moreover with SACK, DeliveredData is increased by the number of bytes With SACK: DeliveredData is increased by the number of bytes
provided by (new) SACK information (SACK_diff). Note, if less provided by (new) SACK information (SACK_diff). Note, if less
unacknowledged bytes are announced in the new SACK information than unacknowledged bytes are announced in the new SACK information
in the previous ACK, SACK_diff can be negative. In this case, data than in the previous ACK, SACK_diff can be negative. In this
is newly acknowledged (in acked_byte), that has previously already case, data is newly acknowledged (in acked_bytes), that has
been accounted to DeliveredData based on SACK information. previously already been accumulated into DeliveredData based on
SACK information.
Without SACK, DeliveredData is estimated to be 1 SMSS on duplicate Without SACK: DeliveredData is estimated to be 1 SMSS on duplicate
acknowledgements. For the subsequent partial or full ACK, acknowledgements. For the subsequent partial or full ACK,
DeliveredData is estimated to be the newly acknowledged bytes, minus DeliveredData is estimated to be the newly acknowledged bytes,
one SMSS for each preceding duplicate ACK. Therefore is_dup is one minus one SMSS for each preceding duplicate ACK. Therefore is_dup
if the current ACK is a duplicated ACK without SACK, and zero is one if the current ACK is a duplicated ACK without SACK, and
otherwise. is_after_dup is only one for the next full or partial ACK zero otherwise. is_after_dup is only one for the next full or
after a number of duplicated ACKs without SACK and num_dup counts the partial ACK after a number of duplicated ACKs without SACK and
number of duplicated ACKs in a row. num_dup counts the number of duplicated ACKs in a row.[CREF8]
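The DeliveredData formula can be written directly as a function. This is a sketch under stated assumptions: the function name, the keyword defaults and the SMSS value of 1460 bytes are illustrative, and the caller is assumed to have already classified the ACK (is_dup, is_after_dup, num_dup) as described above.

```python
def delivered_data(acked_bytes: int, sack_diff: int = 0,
                   is_dup: bool = False, is_after_dup: bool = False,
                   num_dup: int = 0, smss: int = 1460) -> int:
    """Bytes newly delivered to the receiver, as signalled by one ACK.

    acked_bytes:  bytes newly acknowledged cumulatively by this ACK
    sack_diff:    change in SACKed bytes (may be negative); 0 without SACK
    is_dup:       True for a duplicate ACK without SACK
    is_after_dup: True for the first full or partial ACK after duplicates
    num_dup:      number of duplicate ACKs in a row before this ACK
    """
    return (acked_bytes + sack_diff
            + int(is_dup) * smss
            - int(is_after_dup) * num_dup * smss)
```

Without SACK, three duplicate ACKs each yield one SMSS, and a following ACK that newly acknowledges four segments yields one more SMSS, so the total matches the four segments actually delivered.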
The two cases, with and without more accurate ECN depending on the With classic ECN, as soon as a CE mark is seen at the receiver, it
receiver capability, are discussed in the following sections. will feed this information back to the sender by setting the Echo
Congestion Experienced (ECE) flag in the TCP header of subsequent
ACKs. Once the sender receives the first ECE of a congestion
notification, it sets the CWR flag in the TCP header once. When this
packet with Congestion Window Reduced (CWR) flag in the TCP header
arrives at the receiver, acknowledging its first ECE feedback, the
receiver stops setting ECE.
Thus, with classic ECN, one congestion marked packet causes
continuous congestion feedback for a whole round trip, thus hiding
the arrival of any further congestion marked packets during that
round trip. The more accurate ECN feedback scheme (AccECN) has been
defined to ensure that feedback properly reflects the extent of
congestion marking. The two cases, with and without a receiver
capable of AccECN, are discussed in the following sections.
3.2.1. Accurate ECN feedback 3.2.1. Accurate ECN feedback
With a more accurate ECN feedback scheme either the number of marked With the [CREF9] more accurate ECN feedback scheme (AccECN) either
packets/received CE marks or directly the number of marked bytes is the number of marked packets or the number of marked bytes is known.
known. In the later case the CEG can directly be increased by the In the latter case the CEG can directly be increased by the number of
number of marked bytes. Otherwise if D is assumed to be the number marked bytes. Otherwise if D is assumed to be the number of marks,
of marks, the gauge CEG will be conservatively increased by one SMSS the gauge (CEG) will be conservatively increased by one SMSS for each
for each marking or at max the number of newly acknowledged bytes: marking or at max the number of newly acknowledged bytes:
CEG += min(SMSS*D, DeliveredData) CEG += min(SMSS*D, DeliveredData)
3.2.2. Classic ECN support 3.2.2. Classic ECN support
If the ConEx sender fully conforms to the semantics of the ECN If the ConEx sender fully conforms to the semantics of ECN signaling
signaling as defined by [RFC5562], it will receive one full RTT of as defined by [RFC5562],[CREF10] it will receive one full RTT of ACKs
ACKs with the ECE flag set whenever at least one CE mark was received with the ECE flag set whenever at least one CE mark was received by
by the receiver. As the sender cannot estimate how much packets have the receiver. As the sender cannot estimate how many packets have
actually been CE marked during this RTT, the most conservative actually been CE marked during this RTT, the most conservative
assumption should be taken, namely assuming that all packets were assumption MAY be taken, namely assuming that all packets were
marked. This can be achieved by increasing the CEG by DeliveredData marked. This can be achieved by increasing the CEG by DeliveredData
for each ACK with the ECE flag: for each ACK with the ECE flag:
CEG += DeliveredData CEG += DeliveredData
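The two CEG update rules (Sections 3.2.1 and 3.2.2) can be put side by side in a small sketch. The function name, its parameters and the default SMSS of 1460 bytes are hypothetical and chosen only for illustration.

```python
def update_ceg(ceg, delivered_data, ece=False, accecn_marks=None, smss=1460):
    """Return the new congestion exposure gauge (CEG) after one ACK.

    accecn_marks: number of newly reported CE marks (D) when the receiver
                  supports accurate ECN feedback, otherwise None.
    ece:          True if a classic ECN ACK carries the ECE flag.
    """
    if accecn_marks is not None:
        # Accurate feedback: one SMSS per mark, capped by the bytes delivered.
        return ceg + min(smss * accecn_marks, delivered_data)
    if ece:
        # Classic ECN: conservatively assume every delivered byte was marked.
        return ceg + delivered_data
    return ceg


accecn_one_mark = update_ceg(0, 2920, accecn_marks=1)
accecn_capped = update_ceg(0, 1460, accecn_marks=3)
classic_ece = update_ceg(0, 2920, ece=True)
no_feedback = update_ceg(500, 2920)
```

With accurate feedback a single mark on an ACK covering two segments adds one SMSS, whereas classic ECN adds the full DeliveredData for every ACK that carries ECE.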
Optionally a ConEx sender could implement an Advanced Compatibility Optionally a ConEx sender could implement the following technique,
Mode: called advanced compatibility mode, to considerably improve its
estimate of the number of ECN-marked packets:
To extract more than one ECE indication per RTT, a ConEx sender could To extract more than one ECE indication per RTT, a ConEx sender could
set the CWR flag opportunistically to force the receiver to signal set the CWR flag continuously to force the receiver to signal only
only one ECE per CE mark. Unfortunately, the use of delayed ACKs one ECE per CE mark. Unfortunately, the use of delayed ACKs
[RFC5681], as it is usually done today, will prevent a feedback of [RFC5681] (which is common) will prevent feedback of every CE mark;
every CE mark. If an CWR confirmation will be received before the if a CWR confirmation is received before the ECE can be sent out on
ECE can be sent out with the next ACK, ECN feedback information the next ACK, ECN feedback information could get lost. Thus a sender
information could get lost. Thus a sender should set CWR only on SHOULD set CWR only on those data segments that will actually trigger
those data segments, that will actually trigger a (delayed) ACK. The a (delayed) ACK. The sender would need an additional control loop to
sender would need an additional control loop to estimate which data estimate which data segments will trigger an ACK in order to extract
segment will trigger an ACK. But such a more sophisticated more timely congestion notifications. Still the CEG SHOULD be
heuristics could extract congestion notifications more timely. Still increased by DeliveredData, as one or more CE marked packets could be
the CEG need to be increased by DeliveredData, as one or more CE acknowledged by one delayed ACK.
marked packets could be acknowledged by one delayed ACK.
The repetition of ECE in classic ECN is intended to ensure reliable
delivery of congestion feedback. The following argument is intended
to prove that suppressing repetitions of ECE is safe against possible
congestion collapse due to lost congestion feedback.
With advanced compatibility mode, if an ACK containing ECE is lost,
the continual CWRs prevent it being repeated, so it will remain lost.
Therefore, if congestion is light on the forward path and heavy on
the reverse, most of the light congestion signals will be lost. If
loss of feedback exacerbates congestion on the forward path, more
forward packets will be CE marked, increasing the likelihood that
feedback from at least one CE will get through per RTT. As long as
one ECE reaches the sender per RTT, the sender's congestion response
will be the same as if CWR were not continuous. The only way that
heavy congestion on the forward path could be completely hidden would
be if all ACKs on the reverse path were lost. If total ACK loss
persisted, the sender would time out and do a congestion response
anyway. Therefore, the problem seems confined to potential suppression
of a congestion response during light congestion.
Anyway, even if loss of all ECN feedback led to no congestion
response, the worst that could happen would be loss instead of ECN-
signalled congestion on the forward path. Given compatibility mode
does not affect loss feedback, there would be no risk of congestion
collapse.
4. Setting the ConEx Bits 4. Setting the ConEx Bits
By setting the X bit a packet is marked as ConEx-capable. All By setting the X flag, a packet is marked as ConEx-capable. All
packets carrying payload MUST be marked with the X bit set including packets carrying payload MUST be marked with the X flag set,
retransmissions. No congestion feedback information are available including retransmissions. No congestion feedback information is
about control packets such as pure ACKs which are not carrying any available about control packets such as pure ACKs which are not
payload. Thus these packets should not be taken into account when carrying any payload. Thus these packets should not be taken into
determining ConEx information. These packets MUST carry a ConEx account when determining ConEx information. These packets MUST carry
Destination Option with the X bit unset. a ConEx Destination Option with the X flag unset.[CREF11]
4.1. Setting the E and the L Bit 4.1. Setting the E or the L Flag
As long as the CEG or LEG counter is positive, ConEx-capable packets As long as the LEG or CEG counter is positive, the sender MUST mark
SHOULD be marked with E or L respectively, and the CEG or LEG counter each ConEx-capable packet with L or E respectively, and decrease the
is decreased by the TCP payload bytes carried in this packet. If the LEG or CEG counter by the TCP payload bytes carried in the marked
CEG or LEG counter is negative, the respective counter SHOULD be packet (assuming headers are not being counted because packet sizes
reset to zero within one RTT after it was decreased the last time or are regular). No matter how small the value of LEG or CEG, if it is
one RTT after recovery if no further congestion occurred. positive, to ensure ConEx signals are timely, the sender MUST NOT
defer packet marking. Therefore the value of LEG and CEG will
commonly be negative.
If SACK information is not available spurious retransmission are more Multiple ConEx flags may be required for signaling at the same time.
likely. In this case it might be valuable to slightly delay the This may happen, for example, during excessive congestion when an ACK
ConEx loss feedback until a spurious retransmission might be is received by the sender that simultaneously indicates that at least
detected. But the ConEx signal MUST NOT be delayed more than one RTT one segment has been lost, and that one or more ECN marks were
if as long as data packets are sent out. received. Another case when this might happen is when ACKs are lost,
so that a subsequent ACK carries summary information not previously
available to the sender.
4.2. Credit Bits Whenever both LEG and CEG are positive, the sender MUST mark each
ConEx-capable packet with both L and E. If a credit signal is also
pending (see Section 4.2), the C flag can be set as well.
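A minimal sketch of the marking rule in the revised text follows. The dictionary layout, the flag names as strings and the function name are assumptions made for this illustration only.

```python
def mark_packet(gauges, payload_bytes):
    """Return the set of ConEx flags for one outgoing packet.

    gauges is a dict with the keys "LEG" and "CEG" (in bytes) and is
    updated in place. Packets without payload carry no X, L or E flag.
    """
    flags = set()
    if payload_bytes == 0:
        return flags                  # pure ACK: X unset
    flags.add("X")
    if gauges["LEG"] > 0:             # mark at once, however small the gauge
        flags.add("L")
        gauges["LEG"] -= payload_bytes
    if gauges["CEG"] > 0:
        flags.add("E")
        gauges["CEG"] -= payload_bytes
    return flags


gauges = {"LEG": 1000, "CEG": 2000}
first = mark_packet(gauges, 1460)     # both gauges positive: L and E
second = mark_packet(gauges, 1460)    # only CEG still positive: E
third = mark_packet(gauges, 1460)     # both gauges now negative: X only
```

Because marking is never deferred, a gauge smaller than one payload is overdrawn by the first marked packet, which is why the gauges are commonly left negative.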
The ConEx abstract mechanism requires that sufficient credit must be 4.2. Setting the Credit Flag
signaled in advance to cover the expected congestion during the
feedback delay of one RTT. A ConEx sender should maintain a counter
of the sent credits c in bytes. If congestion occurs, credits will
be consumed and the c counter should be reduced by the number of
bytes that where lost or estimated to be ECN-marked. If the risk of
congestion was estimated wrongly and thus too few credits were sent,
the c counter becomes zero but can not get negative.
The number of credits sent should always equal the number of bytes in The ConEx abstract mechanism [draft-ietf-conex-abstract-mech]
flight, as all packets could potentially get lost or congestion requires that sufficient credit must be signaled in advance to cover
marked. Thus a ConEx sender should monitor the number of bytes in the expected congestion during the feedback delay of one RTT.
flight f. If f ever becomes larger than c, the ConEx sender SHOULD
send new credits. Remember that c will be decreased if congestion
occurs.
In TCP Slow Start, the congestion window might grow much larger than This section proposes concrete algorithms for determining how much
during the rest of the transmission. Thus a sender could consider to credit to signal during congestion avoidance and slow start.
sent fewer than f credits but risking potential penalization by an However, experimentation in better credit setting algorithms is
audit. In any case the credits should at least cover the increase in expected and encouraged. The wider goal of ConEx is to reflect the
sending rate. As the sending rate increases exponentially in Slow 'cost' of the risk of causing congestion on those that contribute
Start, thus double every RTT, a ConEx sender should at least cover most to it. Thus, experimentation is encouraged in better ways to
half the number of packets in flight by credits. Note, that the improve or maintain performance while reducing the risk of causing
number of losses or markings within one RTT does not only depend congestion, and therefore reducing the need to signal so much credit.
actions taken by the sender. In general, the behavior of the cross
traffic, and if Active Queue Management (AQM) is used, the respective
parameterization influence how many packets get dropped or marked.
But if the used AQM is not overly aggressive with ECN marking,
sending halve the flight size as credits should be sufficient for
both, congestion signaled by loss or ECN. Marking every fourth
packet will allow the respective number of credits in Slow Start as
it can be seen in Figure Figure 1.
RTT1 |------XC------>| For a simple credit algorithm, a ConEx sender SHOULD maintain a
|------X------->| counter of the sent credits c in bytes. If congestion occurs,
|------X------->| credit=1 in_flight=3 credits will be consumed and the c counter SHOULD be reduced by the
number of bytes that were lost or estimated to be ECN-marked. If
the risk of congestion was estimated wrongly and thus too few credits
were sent, the c counter becomes zero but cannot go negative.
During TCP congestion avoidance, the amount of credit sent SHOULD
exceed the amount of congestion experienced by at least the number of
bytes in flight, as all packets could potentially get lost or
congestion marked.[CREF12] Thus a ConEx sender should monitor the
number of bytes in flight f. Whenever f becomes larger than c, the
ConEx sender SHOULD set the C flag on each ConEx-capable packet and
increase c by the size of each marked packet until it is no less than
f again.
Recall that c will be decreased whenever congestion occurs, therefore
c will need to be replenished as soon as c drops below f. Also
recall that the sender can set the C flag on a ConEx-capable packet
whether or not the E or L flags are also set.
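The congestion avoidance rule for the credit counter can be sketched as two small helpers. The function names and the byte values in the example are hypothetical; f is taken to be the flight size at the moment a packet is sent.

```python
def consume_credit(c, congestion_bytes):
    """Reduce the credit counter on congestion; it cannot go negative."""
    return max(0, c - congestion_bytes)


def credit_on_send(c, f, packet_bytes):
    """Decide whether to set the C flag on a packet being sent.

    Returns (set_c_flag, new_c). The flag is set while the bytes in
    flight f exceed the credit counter c.
    """
    if f > c:
        return True, c + packet_bytes
    return False, c


c = 14600                                    # credit matches ten segments
c = consume_credit(c, 1460)                  # one segment lost
after_loss = c
flag1, c = credit_on_send(c, 14600, 1460)    # c < f: replenish
flag2, c = credit_on_send(c, 14600, 1460)    # c == f: no credit needed
```

Setting the C flag here is independent of the L and E decisions, so the same packet may carry all three.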
In TCP slow start, the congestion window might grow much larger than
during the rest of the transmission. Thus a sender could consider
sending fewer than f credits but risking being penalized by an audit
function. In any case the credits SHOULD at least cover the increase
in sending rate.[CREF13] Given the sending rate doubles every RTT in
Slow Start, a ConEx sender should at least cover half the number of
packets in flight by credits. Note that the number of losses or
markings within one RTT does not solely depend on the sender's
actions. In general, the behavior of the cross traffic, whether
active queue management (AQM) is used and how it is parameterized
influence how many packets might be dropped or marked. As long as
any AQM encountered is not overly aggressive with ECN marking,
sending half the flight size as credits should be sufficient whether
congestion is signaled by loss or ECN. Marking C on every second
packet in the initial window and every fourth packet in slow start
will introduce the correct amount of credit as can be seen in
Figure 1.[CREF14] This behaviour is most easily achieved by using the
following formula to update c as every packet is sent during slow
start:
c = (f+1)/2, using integer division.
f c=(f+1)/2
RTT1 |------XC------>| 1 1
|------X------->| 2 1
|------XC------>| 3 2
| | | |
RTT2 |------X------->| RTT2 |------X------->| 3 2
|------XC------>| |------X------->| 4 2
|------X------->| |------X------->| 4 2
|------X------->| |------XC------>| 5 3
|------X------->| |------X------->| 5 3
|------XC------>| credit=3 in_flight=6 |------X------->| 6 3
| | | |
RTT3 |------X------->| RTT3 |------X------->| 6 3
|------X------->| |------XC------>| 7 4
|------X------->| |------X------->| 7 4
|------XC------>| |------X------->| 8 4
|------X------->| |------X------->| 8 4
|------X------->| |------XC------>| 9 5
|------X------->| |------X------->| 9 5
|------XC------>| |------X------->| 10 5
|------X------->| |------X------->| 10 5
|------X------->| |------XC------>| 11 6
|------X------->| |------X------->| 11 6
|------XC------>| credit=6 in_flight=12 |------X------->| 12 6
| . | | . |
| : | | : |
Figure 1: Credits in Slow Start (with an initial window of 3) Figure 1: Credits in Slow Start (with an initial window of 3)
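The f and c columns of the revised Figure 1 can be reproduced with a short simulation of the update c = (f+1)/2. This is a sketch under simplifying assumptions that are not part of the draft: counters are kept in packets rather than bytes, every packet is acknowledged individually, two packets are sent per ACK, and no credit is consumed by congestion.

```python
def send_packet(f, c):
    """Send one packet; return (f, c, c_flag) after the slow start update."""
    f += 1
    new_c = (f + 1) // 2              # integer division, as in the text
    return f, new_c, new_c > c        # set C whenever c has to grow


def slow_start_trace(initial_window=3, rtts=3):
    """Return one list of (f, c, c_flag) tuples per round trip."""
    f, c, window, trace = 0, 0, initial_window, []
    for rtt in range(rtts):
        row = []
        if rtt == 0:
            for _ in range(window):
                f, c, flag = send_packet(f, c)
                row.append((f, c, flag))
        else:
            for _ in range(window):   # one ACK per packet of the last round
                f -= 1
                for _ in range(2):    # slow start: two packets per ACK
                    f, c, flag = send_packet(f, c)
                    row.append((f, c, flag))
            window *= 2
        trace.append(row)
    return trace


trace = slow_start_trace()
```

For an initial window of 3 the trace sets C on the first and third packet of RTT1 and then on every fourth packet, ending RTT3 with f = 12 and c = 6 as shown in the figure.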
It is possible that the audit looses state due to e.g. rerouting or It is possible that a TCP flow will encounter an audit function
memory limitations. Therefore, the sender needs to detect this case without relevant flow state, due to e.g. rerouting or memory
and resend credits. Thus a ConEx sender should reset the credit limitations. Therefore, the sender needs to detect this case and
count c to zero if losses occur in two subsequent RTTs (assuming that resend credits. Thus a ConEx sender should reset the credit count c
the sending rate was correctly reduced based on the received to zero if losses occur in two subsequent RTTs (assuming that the
congestion signal). sending rate was correctly reduced based on the received congestion
signal). [CREF15]
5. Loss of ConEx information 5. Loss of ConEx information
Packets carrying ConEx can also get lost. A ConEx sender must Packets carrying ConEx signals could be discarded themselves. This
remember which packet was marked with either the L, the E or the C will be a second order problem (e.g. if the loss probability is 0.1%,
bit. If one of these packets is detected to be lost, the should the probability of losing a loss signal will be 0.1% of 0.1% =
increase the respective gauge, LEG or CEG, by the number of lost 0.0001%). Therefore, an implementer MAY choose to ignore this
payload bytes. problem, accepting instead the risk that an audit function might
slightly increase the loss level (e.g. from 0.1000% to 0.1001%).
Nonetheless, a ConEx sender SHOULD remember which packet was marked
with either the L, the E or the C flag. If one of these packets is
detected as lost, the sender SHOULD increase the respective gauge(s),
LEG or CEG, by the number of lost payload bytes in addition to
increasing LEG for the loss.
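The bookkeeping described above can be sketched as follows. The data structures and names are assumptions made for illustration. Because the text names only LEG and CEG, this sketch leaves the handling of a lost C-marked packet open.

```python
def on_packet_lost(gauges, sent_flags, seq, payload_bytes):
    """Update the gauges when the packet with sequence number seq is lost.

    sent_flags maps sequence numbers to the set of ConEx flags that the
    packet carried when it was sent.
    """
    gauges["LEG"] += payload_bytes            # the loss itself
    flags = sent_flags.pop(seq, set())
    if "L" in flags:
        gauges["LEG"] += payload_bytes        # the L signal was lost too
    if "E" in flags:
        gauges["CEG"] += payload_bytes        # the E signal was lost too
    return gauges


gauges = {"LEG": 0, "CEG": 0}
sent_flags = {1000: {"X", "L", "E"}, 2460: {"X"}}
on_packet_lost(gauges, sent_flags, 1000, 1460)   # marked packet lost
after_marked_loss = dict(gauges)
on_packet_lost(gauges, sent_flags, 2460, 1460)   # unmarked packet lost
```

An implementer who chooses to ignore the second order problem would keep only the first update of LEG and drop the per-packet flag memory.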
6. Timeliness of the ConEx Signals 6. Timeliness of the ConEx Signals
ConEx signals can only be evaluated by a network node with a time ConEx signals will only be useful to a network node within a time
delay of about one RTT after the congestion occured. To avoid delay of about one RTT after the congestion occurred. To avoid
further delays, a ConEx sender SHOULD sent the ConEx signaling with further delays, a ConEx sender SHOULD send the ConEx signaling on the
the next available packet. In cases where it is preferable to next available packet.
slightly delay the ConEx signal, the sender MUST NOT delay the ConEx
signal more than one RTT.
Multiple ConEx bits may become available for signaling at the same Any or all of the ConEx flags can be used in the same packet, which
time, for example when an ACK is received by the sender, that allows delay to be minimised when multiple signals are pending.
indicates at the same time that at least one segment has been lost,
and that one or more ECN marks were received. This may happen during If a flow becomes application-limited, there could be insufficient
excessive congestion, where the queues overflow even though ECN was bytes to send to reduce the gauges to zero or below. In such cases,
used and currently all packets are marked, while others have to be the sender cannot help but delay ConEx signals. Nonetheless, as long
dropped nevertheless. Another possibility when this may happen are as the sender is marking all outgoing packets, an audit function is
lost ACKs, so that a subsequent ACK carries summary information not unlikely to penalize ConEx-marked packets. Therefore, no matter how
previously available to the sender. As ConEx-capable packet can long a gauge has been positive, a sender MUST NOT reduce the gauge by
carry different ConEx marks at the same time, these information do more than the ConEx marked bytes it has sent.
not need to be distributed over several packets and thus can be sent
without further delay. If the CEG or LEG counter is negative, the respective counter SHOULD
be reset to zero within one RTT after it was decreased the last time
or one RTT after recovery if no further congestion occurred.
[CREF16]
If SACK information is not available, spurious retransmissions are more
likely. In this case it might be valuable to slightly delay the
ConEx loss feedback until a spurious retransmission might be
detected. But the ConEx signal MUST NOT be delayed more than one RTT
as long as data packets are sent out.[CREF17]
7. Acknowledgements 7. Acknowledgements
The authors would like to thank Bob Briscoe who contributed with this The authors would like to thank Bob Briscoe who contributed with this
initial ideas and valuable feedback. Moreover, thanks to Jana initial ideas [I-D.briscoe-conex-re-ecn-tcp] and valuable feedback.
Iyengar who provided valuable feedback. Moreover, thanks to Jana Iyengar who provided valuable feedback.
8. IANA Considerations 8. IANA Considerations
This document does not have any requests to IANA. This document does not have any requests to IANA.
9. Security Considerations 9. Security Considerations
With some of the advanced ECN compatibility modes it is possible to General ConEx security considerations are covered extensively in the
miss congestion notifications. Thus a sender will not decrease its ConEx abstract mechanism [draft-ietf-conex-abstract-mech]. This
sending rate. If the congestion is persistent, the likelihood to section covers TCP-specific concerns.
receive a congestion notification increases. In the worst case the
sender will still react correctly to loss. This will prevent a The ConEx modifications to TCP provide no mechanism for a receiver to
congestion collapse. force a sender not to use ConEx. A receiver can degrade the accuracy
of ConEx by claiming that it does not support SACK, AccECN or ECN,
but the sender will never have to turn ConEx off. The receiver
cannot force the sender to have to mark ConEx more conservatively, in
order to cover the risk of any inaccuracy. Instead the sender can
choose to mark inaccurately, which will only increase the likelihood
of loss at an audit function. Thus the receiver will only harm
itself.
Assuming the sender is limited in some way by a congestion allowance
or quota, a receiver could spoof more loss or ECN congestion feedback
than it actually experiences, in an attempt to make the sender draw
down its allowance faster than necessary. However, over-declaring
congestion simply makes the sender slow down. If the receiver is
interested in the content it will not want to harm its own
performance.
However, if the receiver is solely interested in making the sender
draw down its allowance, the net effect will depend on the sender's
congestion control algorithm. With New Reno [RFC5681], doubling
congestion feedback causes the sender to consume sqrt(2) = 1.4 times
more congestion allowance. However, to improve scaling, congestion
control algorithms are tending towards less responsive algorithms
like Cubic or Compound TCP, and ultimately to linear algorithms like
DCTCP [DCTCP]. In each case, if the receiver doubles congestion
feedback, it causes the sender to respectively consume more allowance
by a factor of 1.2, 1.15 or 1, where 1 implies the attack has become
completely ineffective.
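The quoted factors can be checked with a short calculation. It assumes that the sending rate of each algorithm scales as p^(-b) with the congestion level p, so that the rate at which allowance is consumed scales as p^(1-b). The exponents used below (0.5, 0.75, 0.8 and 1) are approximations assumed for this sketch and are not stated in the draft.

```python
# Assumed response-function exponents b, where rate is proportional to p**(-b).
ASSUMED_EXPONENTS = {
    "New Reno": 0.5,
    "Cubic": 0.75,
    "Compound TCP": 0.8,
    "DCTCP": 1.0,
}


def allowance_factor(b, feedback_multiplier=2.0):
    """Factor by which allowance consumption grows when feedback is inflated.

    Congestion volume per unit time is p * rate, which is proportional to
    p**(1 - b), so multiplying p by m multiplies consumption by m**(1 - b).
    """
    return feedback_multiplier ** (1.0 - b)


factors = {name: round(allowance_factor(b), 2)
           for name, b in ASSUMED_EXPONENTS.items()}
```

Under these assumptions the factors come out at about 1.41, 1.19, 1.15 and 1.0, in line with the values of 1.4, 1.2, 1.15 and 1 given above.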
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996. Selective Acknowledgment Options", RFC 2018, October 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
skipping to change at page 13, line 29 skipping to change at page 16, line 22
Destination Option for ConEx", draft-ietf-conex-destopt-04 Destination Option for ConEx", draft-ietf-conex-destopt-04
(work in progress), March 2013. (work in progress), March 2013.
10.2. Informative References 10.2. Informative References
[DCTCP] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel, [DCTCP] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,
P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP: P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP:
Efficient Packet Transport for the Commoditized Data Efficient Packet Transport for the Commoditized Data
Center", Jan 2010. Center", Jan 2010.
[I-D.briscoe-tsvwg-re-ecn-tcp] [I-D.briscoe-conex-re-ecn-tcp]
Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
"Re-ECN: Adding Accountability for Causing Congestion to "Re-ECN: Adding Accountability for Causing Congestion to
TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in TCP/IP", draft-briscoe-conex-re-ecn-tcp-04 (work in
progress), October 2010. progress), July 2014.
[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm [RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
for TCP", RFC 3522, April 2003. for TCP", RFC 3522, April 2003.
[RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective [RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective
Acknowledgement (DSACKs) and Stream Control Transmission Acknowledgement (DSACKs) and Stream Control Transmission
Protocol (SCTP) Duplicate Transmission Sequence Numbers Protocol (SCTP) Duplicate Transmission Sequence Numbers
(TSNs) to Detect Spurious Retransmissions", RFC 3708, (TSNs) to Detect Spurious Retransmissions", RFC 3708,
February 2004. February 2004.
skipping to change at page 14, line 14 skipping to change at page 17, line 5
[RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata, [RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
"Forward RTO-Recovery (F-RTO): An Algorithm for Detecting "Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
Spurious Retransmission Timeouts with TCP", RFC 5682, Spurious Retransmission Timeouts with TCP", RFC 5682,
September 2009. September 2009.
[RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion [RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion
Exposure (ConEx) Concepts and Use Cases", RFC 6789, Exposure (ConEx) Concepts and Use Cases", RFC 6789,
December 2012. December 2012.
[draft-briscoe-tsvwg-byte-pkt-mark] [RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion
Briscoe, B. and J. Manner, "Byte and Packet Congestion Notification", BCP 41, RFC 7141, February 2014.
Notification", draft-briscoe-tsvwg-byte-pkt-mark-010 (work
in progress), May 2013.
[draft-kuehlewind-tcpm-accurate-ecn] [draft-kuehlewind-tcpm-accurate-ecn]
Kuehlewind, M. and R. Scheffenegger, "More Accurate ECN Kuehlewind, M. and R. Scheffenegger, "More Accurate ECN
Feedback in TCP", draft-kuehlewind-tcpm-accurate-ecn-02 Feedback in TCP", draft-kuehlewind-tcpm-accurate-ecn-02
(work in progress), Jun 2013. (work in progress), Jun 2013.
Appendix A. Revision history Appendix A. Revision history
RFC Editior: This section is to be removed before RFC publication. RFC Editor: This section is to be removed before RFC publication.
00 ... initial draft, early submission to meet deadline. 00 ... initial draft, early submission to meet deadline.
01 ... refined draft, updated LEG "drain" from per-packet to RTT- 01 ... refined draft, updated LEG "drain" from per-packet to RTT-
based. based.
02 ... added Section 5 and expanded discussion about ECN interaction. 02 ... added Section 5 and expanded discussion about ECN interaction.
03 ... expanded the discussion around credit bits. 03 ... expanded the discussion around credit bits.
04 ... review comments of Jana addressed. (Change in full compliance 04 ... review comments of Jana addressed. (Change in full compliance
mode.) mode.)
05 ... changes on Loss Detection without SACK, support of classic ECN 05 ... changes on Loss Detection without SACK, support of classic ECN
and credit handling. and credit handling.
Editorial Comments
[CREF1] BB: 'finally' here would mean "At last (sigh), here's what
you've all been waiting for." :-)
[CREF2] BB: Avoid 'recommended', which could be confused with the
normative upper-cased word. The normative language later is
good and sufficient.
[CREF3] BB: I don't understand this last sentence. How does the sender
suddenly know something it didn't know before?
[CREF4] BB: I've added this sentence, but only to give you an excuse for
having devised all this mechanism. However, I really don't know
why you're going to all this trouble to be so accurate and
timely. TCP never retransmits less data than is lost. And over
the years TCP designers have been reducing the amount of
unnecessary retransmission, and reducing retransmission delay.
So I suggest we just mark retransmissions with the L flag.
Done! No need even for a loss exposure gauge. ...If the sender
is faced with insufficient information such that the universe of
TCP designers has been unable to minimise unnecessary or delayed
retransmissions, why try to do better than everyone has so far
managed? Just accept that you will be over-declaring or
sluggishly declaring ConEx. And assume that deployment of all
the techniques to reduce late or spurious losses is proceeding,
and we can walk on their shoulders.
[CREF5] BB: I suggest removing MUST, because we cannot mandate a
particular implementation technique.
[CREF6] BB: If these mechanisms are being used, surely they will be
being used to /prevent/ spurious retransmissions (not just count
them but still retransmit anyway). So, if we increase LEG only
when a retransmission actually occurs, is that not sufficient?
[CREF7] BB: OK, I get that. But, as above, why worry about optimising a
case that is becoming rare, because everyone recognised late
retransmission was a problem, so SACK is pretty much universally
deployed. Would you be unhappy if all this was deleted?
Perhaps relegate to an appendix? But is it really so necessary?
[CREF8] BB: I think 3 has been used instead of num_dup in the LEC
algorithm earlier.
[CREF9] BB: I changed 'a' to 'the'. Did you mean a generally more
accurate scheme, or the AccECN scheme in particular? If the
latter, as it stands, the AccECN scheme doesn't give marked
bytes.
[CREF10] BB: Surely RFC5562 only adds ECT on the SYN/ACK. Is it really
necessary to even refer to it in this draft? Whatever, it
doesn't seem particularly relevant to this sentence. Or did
you mean RFC3168?
[CREF11] BB: I thought the result of the discussion about how to say
whether the X flag is set in conex-destopt was that X is set
irrespective of whether loss or ECN marking of the packet
itself can be detected. The relevant sentence in conex-destopt
is: "This [X=0] can be the case if no congestion feedback is
(currently) available e.g. in TCP if one endpoint has been
receiving data but sending nothing but pure ACKs (no user data)
for some time."
[CREF12] BB: I would prefer if this were stated at the maximum required,
not a recommended value. The idea is to hold as much credit as
the /likely/ worst-case congestion, not the /absolute/ worst
case (I did experiments to find the variance of congestion in
my PhD).
[CREF13] BB: Again, rather than a SHOULD, can we make this a
recommendation that is part of the reason for ConEx
experimentation? - especially if variants like hybrid SS are
enabled.
[CREF14] BB: Just marking every fourth packet doesn't work for a general
IW. During the IW, mark the first packet and every other
packet, then after IW mark every fourth packet (to determine
precisely which is the first packet to mark after the IW,
maintain a packet counter and double it when IW ends).
[CREF15] BB: Whoa! This is rather excessively conservative isn't it?
There will often be a loss in 2 consecutive RTTs due to normal
congestion. If there's a re-route, I think the new audit will
drop a whole window, so the sender will naturally send a whole
window's worth of credit with the retransmissions. Am I wrong?
[CREF16] BB: This adds complexity. I would suggest this is a MAY. It
depends on how audit is done whether it is necessary, so this
will depend on experiments. For instance, in the audit
function I designed, there was a long term and a short term
comparison, and the long term one became more relaxed the
longer the flow had been behaving. (Note I have also suggested
moving this and the next para from "Setting E/L" to
"Timeliness")
[CREF17] BB: As before, I disagree with the need for this para - this is
trying to optimise a case that is rare because it's known to be
sub-optimal, by compromising ConEx timeliness. SACK is nearly
universal. If SACK isn't available, things are bound to be non-
optimal. The solution is for the receiver to deploy SACK like
nearly every other receiver has done, not to add more
complexity to the sender and more delay to ConEx.
Authors' Addresses Authors' Addresses
Mirja Kuehlewind (editor) Mirja Kuehlewind (editor)
ETH Zurich ETH Zurich
Switzerland Switzerland
Email: mirja.kuehlewind@tik.ee.ethz.ch Email: mirja.kuehlewind@tik.ee.ethz.ch
Richard Scheffenegger Richard Scheffenegger
NetApp, Inc. NetApp, Inc.
 End of changes. 74 change blocks. 
314 lines changed or deleted 547 lines changed or added
