Diff: draft-ietf-conex-tcp-modifications-07.txt - draft-ietf-conex-tcp-modifications-07-bb.txt

	< draft-ietf-conex-tcp-modifications-07.txt	draft-ietf-conex-tcp-modifications-07-bb.txt >

	Congestion Exposure (ConEx) M. Kuehlewind, Ed.	Congestion Exposure (ConEx) M. Kuehlewind, Ed.
	Internet-Draft ETH Zurich	Internet-Draft ETH Zurich
	Intended status: Experimental R. Scheffenegger	Intended status: Experimental R. Scheffenegger

	Expires: August 18, 2015 NetApp, Inc.	Expires: September 9, 2015 NetApp, Inc.
	February 14, 2015	March 8, 2015

	TCP modifications for Congestion Exposure	TCP modifications for Congestion Exposure
	draft-ietf-conex-tcp-modifications-07	draft-ietf-conex-tcp-modifications-07

	Abstract	Abstract

	Congestion Exposure (ConEx) is a mechanism by which senders inform	Congestion Exposure (ConEx) is a mechanism by which senders inform

	the network about the congestion encountered by previous packets on	the network about expected congestion based on congestion feedback
	the same flow. This document describes the necessary modifications	from previous packets in the same flow. This document describes the
	to use ConEx with the Transmission Control Protocol (TCP).	necessary modifications to use ConEx with the Transmission Control
		Protocol (TCP).

	Status of This Memo	Status of This Memo

	This Internet-Draft is submitted in full conformance with the	This Internet-Draft is submitted in full conformance with the
	provisions of BCP 78 and BCP 79.	provisions of BCP 78 and BCP 79.

	Internet-Drafts are working documents of the Internet Engineering	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF). Note that other groups may also distribute	Task Force (IETF). Note that other groups may also distribute
	working documents as Internet-Drafts. The list of current Internet-	working documents as Internet-Drafts. The list of current Internet-
	Drafts is at http://datatracker.ietf.org/drafts/current/.	Drafts is at http://datatracker.ietf.org/drafts/current/.

	Internet-Drafts are draft documents valid for a maximum of six months	Internet-Drafts are draft documents valid for a maximum of six months
	and may be updated, replaced, or obsoleted by other documents at any	and may be updated, replaced, or obsoleted by other documents at any
	time. It is inappropriate to use Internet-Drafts as reference	time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."	material or to cite them other than as "work in progress."


	This Internet-Draft will expire on August 18, 2015.	This Internet-Draft will expire on September 9, 2015.

	Copyright Notice	Copyright Notice

	Copyright (c) 2015 IETF Trust and the persons identified as the	Copyright (c) 2015 IETF Trust and the persons identified as the
	document authors. All rights reserved.	document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal	This document is subject to BCP 78 and the IETF Trust's Legal
	Provisions Relating to IETF Documents	Provisions Relating to IETF Documents
	(http://trustee.ietf.org/license-info) in effect on the date of	(http://trustee.ietf.org/license-info) in effect on the date of
	publication of this document. Please review these documents	publication of this document. Please review these documents
	carefully, as they describe your rights and restrictions with respect	carefully, as they describe your rights and restrictions with respect
	to this document. Code Components extracted from this document must	to this document. Code Components extracted from this document must
	include Simplified BSD License text as described in Section 4.e of	include Simplified BSD License text as described in Section 4.e of
	the Trust Legal Provisions and are provided without warranty as	the Trust Legal Provisions and are provided without warranty as
	described in the Simplified BSD License.	described in the Simplified BSD License.

	Table of Contents	Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 18
	1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3	1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
	2. Sender-side Modifications . . . . . . . . . . . . . . . . . . 3	2. Sender-side Modifications . . . . . . . . . . . . . . . . . . 3

	3. Accounting congestion . . . . . . . . . . . . . . . . . . . . 4	3. Counting congestion . . . . . . . . . . . . . . . . . . . . . 4
	3.1. Loss Detection . . . . . . . . . . . . . . . . . . . . . 5	3.1. Loss Detection . . . . . . . . . . . . . . . . . . . . . 5

	3.1.1. Without SACK Support . . . . . . . . . . . . . . . . 6	3.1.1. General Approach . . . . . . . . . . . . . . . . . . 6
		3.1.2. Without SACK Support . . . . . . . . . . . . . . . . 6
	3.2. ECN . . . . . . . . . . . . . . . . . . . . . . . . . . . 7	3.2. ECN . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

	3.2.1. Accurate ECN feedback . . . . . . . . . . . . . . . . 8	3.2.1. Accurate ECN feedback . . . . . . . . . . . . . . . . 9
	3.2.2. Classic ECN support . . . . . . . . . . . . . . . . . 8	3.2.2. Classic ECN support . . . . . . . . . . . . . . . . . 9
	4. Setting the ConEx Bits . . . . . . . . . . . . . . . . . . . 9	4. Setting the ConEx Bits . . . . . . . . . . . . . . . . . . . 10
	4.1. Setting the E and the L Bit . . . . . . . . . . . . . . . 9	4.1. Setting the E or the L Flag . . . . . . . . . . . . . . . 10
	4.2. Credit Bits . . . . . . . . . . . . . . . . . . . . . . . 9	4.2. Setting the Credit Flag . . . . . . . . . . . . . . . . . 11
	5. Loss of ConEx information . . . . . . . . . . . . . . . . . . 11	5. Loss of ConEx information . . . . . . . . . . . . . . . . . . 13
	6. Timeliness of the ConEx Signals . . . . . . . . . . . . . . . 11	6. Timeliness of the ConEx Signals . . . . . . . . . . . . . . . 14
	7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 12	7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 14
	8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12	8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
	9. Security Considerations . . . . . . . . . . . . . . . . . . . 12	9. Security Considerations . . . . . . . . . . . . . . . . . . . 14
	10. References . . . . . . . . . . . . . . . . . . . . . . . . . 12	10. References . . . . . . . . . . . . . . . . . . . . . . . . . 15
	10.1. Normative References . . . . . . . . . . . . . . . . . . 12	10.1. Normative References . . . . . . . . . . . . . . . . . . 15
	10.2. Informative References . . . . . . . . . . . . . . . . . 13	10.2. Informative References . . . . . . . . . . . . . . . . . 16
	Appendix A. Revision history . . . . . . . . . . . . . . . . . . 14	Appendix A. Revision history . . . . . . . . . . . . . . . . . . 17
	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 15	Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 20

	1. Introduction	1. Introduction

	Congestion Exposure (ConEx) is a mechanism by which senders inform	Congestion Exposure (ConEx) is a mechanism by which senders inform

	the network about the congestion encountered by previous packets on	the network about expected congestion based on congestion feedback
	the same flow. ConEx concepts and use cases are further explained in	from previous packets in the same flow. ConEx concepts and use cases
	[RFC6789]. The abstract ConEx mechanism is explained in	are further explained in [RFC6789]. The abstract ConEx mechanism is
	[draft-ietf-conex-abstract-mech]. This document describes the	explained in [draft-ietf-conex-abstract-mech]. This document
	necessary modifications to use ConEx with the Transmission Control	describes the necessary modifications to use ConEx with the
	Protocol (TCP).	Transmission Control Protocol (TCP).


	The needed markings to provide ConEx signaling are defined in the	The markings for ConEx signaling are defined in the ConEx Destination
	ConEx Destination Option (CDO) for IPv6 [draft-ietf-conex-destopt].	Option (CDO) for IPv6 [draft-ietf-conex-destopt]. Specifically, the
	Specifically, the use of four bits are defined: the X (ConEx-	use of four flags are defined: X (ConEx-capable), L (loss
	capable), the L (loss experienced), the E (ECN experienced) and C	experienced), E (ECN experienced) and C (credit).
	(credit) bit.

	ConEx signaling is based on loss or Explicit Congestion Notification	ConEx signaling is based on loss or Explicit Congestion Notification

	(ECN) marks [RFC3168] as a congestion indication. This congestion	(ECN) marks [RFC3168] as congestion indications. The sender collects
	information is retrieved by the sender based on existing feedback	this congestion information based on existing TCP feedback mechanisms
	mechanisms from the receiver to the sender in TCP. No changes are	from the receiver to the sender. No changes are needed at the
	needed at the receiver to implement ConEx signaling. Therefore no	receiver to implement ConEx signaling. Therefore no additional
	additional negotiation is needed to implement and use ConEx at the	negotiation is needed to implement and use ConEx at the sender. This
	sender. This document specifies actions needed by sender to provide	document specifies the sender's actions that are needed to provide
	meaningful ConEx information to the network.	meaningful ConEx information to the network.


	Section 2 provides an overview of the needed modifications for TCP	Section 2 provides an overview of the modifications needed for TCP
	senders to implement ConEx. First congestion information have to be	senders to implement ConEx. First congestion information has to be
	extracted from loss or ECN feedback in TCP as described in section 3.	extracted from TCP's loss or ECN feedback as described in section 3.
	Section 4 details how to set the CDO marking based on the accounted	Section 4 details how to set the CDO marking based on this congestion
	congestion information. Section 6 finally discusses timeliness of	information. Section 5 discusses loss of packets carrying ConEx
	the ConEx feedback signal as congestion is a temporary state.	information. Section 6 [CREF1]discusses timeliness of the ConEx
		feedback signal, given congestion is a temporary state.


	This document describes congestion accounting for both TCP with and	This document describes congestion accounting for TCP with and
	without the Selective Acknowledgment (SACK) extension [RFC2018] in	without the Selective Acknowledgment (SACK) extension [RFC2018] (in
	section 3.1. However, ConEx benefits from more accurate information	section 3.1). However, ConEx benefits from the more accurate
	about the number of packets dropped in the network. It is therefore	information that SACK provides about the number of bytes dropped in
	recommended to use the SACK extension when using TCP with ConEx. The	the network. It is therefore preferable[CREF2] to use the SACK
	detailed mechanism to respectively set the L bit in response to loss-	extension when using TCP with ConEx. The detailed mechanism to set
	based congestion feedback signal is given in section 4.1.	the L flag in response to loss-based congestion feedback signal is
		given in section 4.1.


	While loss-based congestion feedback should be minimized, ECN could	Whereas loss has to be minimized, ECN can provide more fine-grained
	actually provide more fine-grained feedback information. ConEx-based	feedback information. ConEx-based traffic measurement or management
	traffic measurement or management mechanisms would benefit from this.	mechanisms could benefit from this. Unfortunately, the current ECN
	Unfortunately, the current ECN feedback mechanism does not reflect	feedback mechanism does not reflect multiple congestion markings if
	multiple congestion markings which occur within the same Round-Trip	they occur within the same Round-Trip Time (RTT). A more accurate
	Time (RTT). A more accurate feedback extension to ECN is proposed in	feedback extension to ECN (AccECN) is proposed in a separate document
	a separate document [draft-kuehlewind-tcpm-accurate-ecn], as this is	[draft-kuehlewind-tcpm-accurate-ecn], as this is also useful for
	also useful for other mechanisms.	other mechanisms.


	The congestion accounting for both, with the classic ECN feedback as	Congestion accounting for both classic ECN feedback and AccECN
	well as a more accurate ECN feedback are explained in detail in	feedback is explained in detail in section 3.2. Setting the E flag
	section 3.2 while the setting of the E bit in response to ECN-based	in response to ECN-based congestion feedback is again detailed in
	congestion feedback is again detailed in section 4.1.	section 4.1.

	1.1. Requirements Language	1.1. Requirements Language

	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
	document are to be interpreted as described in [RFC2119].	document are to be interpreted as described in [RFC2119].

	2. Sender-side Modifications	2. Sender-side Modifications

	This section gives an overview of actions that need to be taken by a	This section gives an overview of actions that need to be taken by a

	TCP sender that would like to use ConEx signaling.	TCP sender modified to use ConEx signaling.


	A ConEx sender MUST negotiate for both SACK and ECN or the more	In the TCP handshake, a ConEx sender MUST negotiate for SACK and ECN
	accurate ECN feedback in the TCP handshake if these TCP extension are	preferably with AccECN feedback. Therefore a ConEx sender MUST also
	available at the sender. Therefore a ConEx sender SHOULD also
	implement SACK and ECN. Depending on the capability of the receiver,	implement SACK and ECN. Depending on the capability of the receiver,
	the following operation modes exist:	the following operation modes exist:


	o SACK-accECN-ConEx (SACK and accurate ECN feedback)	+------+-----+
		| SACK | ECN |
	o accECN-ConEx (no SACK but accurate ECN feedback)	+------+-----+
		|  S   |  A  |
	o ECN-ConEx (no SACK and no accurate ECN feedback but 'classic' ECN)	|  S   |  C  |
		|  S   |  -  |
	o SACK-ECN-ConEx (SACK and 'classic' instead of accurate ECN)	|  -   |  A  |
		|  -   |  C  |
		|  -   |  -  |
		+------+-----+


	o SACK-ConEx (SACK but no ECN at all)	S: SACK enabled; A: AccECN enabled; C: Classic ECN [RFC3168] enabled


	o Basic-ConEx (neither SACK nor ECN)	Table 1: ConEx modes.
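
As an illustration only, the six operation modes can be read as the cross product of the two receiver capabilities in Table 1. The sketch below maps a (SACK, ECN) pair to the mode names used in the older revision's list; the function name `conex_mode` and the single-letter ECN codes as arguments are assumptions made for this example, not part of either draft.

```python
def conex_mode(sack: bool, ecn: str) -> str:
    """Return the operation mode name for a receiver capability pair.

    sack: True if SACK was negotiated ("S" in Table 1), False for "-".
    ecn:  "A" for AccECN, "C" for classic ECN, "-" for no ECN.
    """
    modes = {
        (True, "A"): "SACK-accECN-ConEx",
        (True, "C"): "SACK-ECN-ConEx",
        (True, "-"): "SACK-ConEx",
        (False, "A"): "accECN-ConEx",
        (False, "C"): "ECN-ConEx",
        (False, "-"): "Basic-ConEx",
    }
    if (sack, ecn) not in modes:
        raise ValueError(f"unknown ECN capability code: {ecn!r}")
    return modes[(sack, ecn)]
```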

	A ConEx sender MUST expose all congestion information to the network	A ConEx sender MUST expose all congestion information to the network
	according to the congestion information received by ECN or based on	according to the congestion information received by ECN or based on
	loss information provided by the TCP feedback loop. A TCP sender	loss information provided by the TCP feedback loop. A TCP sender

	SHOULD account congestion byte-wise (and not packet-wise). A sender	SHOULD count congestion byte-wise (rather than packet-wise; see next
	MUST mark subsequent packets (after the congestion notification) with	paragraph). After any congestion notification, a sender MUST mark
	the respective ConEx bit in the IP header. Furthermore, a ConEx	subsequent packets with the appropriate ConEx flag in the IP header.
	sender must send enough credit to cover all experienced congestion	Furthermore, a ConEx sender must send enough credit to cover all
	for the connection so far, as well as the risk of congestion for the	experienced congestion for the connection so far, as well as the risk
	current transmission (see Section 4.2).	of congestion for the current transmission (see Section 4.2).


	With SACK only the number of lost payload bytes is known, but not the	With SACK the number of lost payload bytes is known, but not the
	number of packets carrying these bytes. With classic ECN only an	number of packets carrying these bytes. With classic ECN only an
	indication is given that a marking occurred but not the exact number	indication is given that a marking occurred but not the exact number
	of payload bytes nor packets. As network congestion is usually byte-	of payload bytes nor packets. As network congestion is usually byte-

	congestion [draft-briscoe-tsvwg-byte-pkt-mark], the exact number of	congestion [RFC7141], the byte-size of a packet marked with a CDO
		flag is defined to represent that number of bytes of congestion
		signalling [draft-ietf-conex-destopt]. Therefore the exact number of
	bytes should be taken into account, if available, to make the ConEx	bytes should be taken into account, if available, to make the ConEx
	signal as exact as possible.	signal as exact as possible.

	Detailed mechanisms for congestion accounting in each operation mode	Detailed mechanisms for congestion accounting in each operation mode

	are described in the next section. Further handling of the IPv6 bits	are described in the next section.
	itself if congestion was accounted is described in the subsequent
	section afterwards.


	3. Accounting congestion	3. Counting congestion


	A ConEx sender maintains two counters: one that accounts congestion	A ConEx TCP sender maintains two counters: one that counts congestion
	based on the information retrived by loss detection, and a second	based on the information retrieved by loss detection, and a second
	that accounts for ECN based congestion feedback (in TCP). These	that accounts for ECN based congestion feedback. These counters hold
	counters hold the number of outstanding bytes that should be ConEx	the number of outstanding bytes that should be ConEx marked with
	marked either with the E bit or the L bit in subsequent packets.	respectively the E flag or the L flag in subsequent packets.

	The outstanding bytes for congestion indications based on loss are	The outstanding bytes for congestion indications based on loss are

	maintained in the loss exposure gauge (LEG) and the accounting is	added to the loss exposure gauge (LEG), as explained in Section 3.1.
	explained in Section 3.1.


	The outstanding bytes accounted based on ECN feedback information are	The outstanding bytes counted based on ECN feedback information are
	maintained in the congestion exposure gauge (CEG). The accounting of	added to the congestion exposure gauge (CEG) as explained in
	these bytes from the ECN feedback is explained in more detail next in
	Section 3.2.	Section 3.2.


	Furthermore, those counters will be reduced every time a ConEx	When the sender sends a ConEx capable packet with the E or L flag set
	capable packet with the E or L bit set is sent. This is explained	it reduces the respective counter by the byte-size of the packet.
	for both counters in Section 4.1.	This is explained for both counters in Section 4.1.


	Usually all bytes of an IP packet must be accounted. Therefore the	Usually all bytes of an IP packet must be counted. Therefore the
	sender SHOULD take the headers into account, too. If equal sized	sender SHOULD take the payload and headers into account, up to and
	packets, or at least equally distributed packet sizes can be assumed,	including the IP header. Therefore, as well as the TCP payload
	the sender MAY only account the TCP payload bytes. In this case	bytes, an appropriate number of header bytes SHOULD be added to the
	there should be about the same number of ConEx marked packets as the	gauge for each packet of congestion feedback. And the sender SHOULD
	original packets that were causing the congestion. Thus both contain	subtract header bytes from the gauge for each marked packet sent.
	about the same number of header bytes. This case is assumed for
	simplification in the following sections.


	Otherwise if this is not the case and a sender sends different sized	If equal-sized packets, or at least equally distributed packet sizes
	packets (with unequally distributed packet sizes), the sender needs	can be assumed, the sender MAY only add and subtract TCP payload
	to memorize or estimate the number of ECN-marked or lost packets. A	bytes. In this case there should be about the same number of ConEx
	sender might be able to reconstruct the number of packets and thus	marked packets as the original packets that were causing the
	the header bytes if the packet sizes of all packets that were sent	congestion. Thus both contain about the same number of header bytes
	during the last RTT are known. Otherwise if no additional	so they will cancel out. This case is assumed for simplicity in the
	information is available the worst case number of packets and thus	following sections.
	header bytes should be estimated in a conservative way based on a
	minimum packet size (of all packets sent in the last RTT). If the	Otherwise, if a sender sends different sized packets (with unequally
	number of ConEx marked packets is smaller (or larger) than the	distributed packet sizes), the sender needs to memorize or estimate
	estimated number of ECN-marked or lost packets, the additional header	the number of lost or ECN-marked packets. A sender might be able to
	bytes should the added to (or can be subtracted from) the respective	reconstruct the number of packets and thus the header bytes if the
	counter.	packet sizes of all packets that were sent during the last RTT are
		known. Otherwise, if no additional information is available, the
		conservative or even worst case number of packets and thus header
		bytes should be estimated, e.g. based on the minimum packet size (of
		all packets sent in the last RTT). If the number of ConEx marked
		packets is smaller (or larger) than the estimated number of lost or
		ECN-marked packets, the additional header bytes should be added to
		(or can be subtracted from) the respective counter.[CREF3]
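
One possible reading of the conservative header estimate, sketched under stated assumptions: when only the number of congested payload bytes is known, the worst case is that they were carried in packets of the minimum payload size seen in the last RTT, which maximises the packet count and therefore the header bytes. The function name, its parameters, and the default of 60 header bytes (a 40-byte IPv6 header plus a 20-byte TCP header without options) are illustrative choices, not taken from the draft.

```python
import math


def estimate_header_bytes(congested_payload_bytes: int,
                          min_payload_size: int,
                          header_size: int = 60) -> int:
    """Conservatively estimate header bytes for lost or ECN-marked data.

    Assumes every affected packet carried only min_payload_size bytes,
    the smallest payload sent in the last RTT (worst case packet count).
    """
    if min_payload_size <= 0:
        raise ValueError("min_payload_size must be positive")
    if congested_payload_bytes <= 0:
        return 0
    packets = math.ceil(congested_payload_bytes / min_payload_size)
    return packets * header_size
```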

	3.1. Loss Detection	3.1. Loss Detection

		3.1.1. General Approach


	A ConEx sender MUST maintain a loss exposure gauge (LEG), indicating	This section applies whether or not SACK support is available. The
	the number of outstanding bytes that must be sent with the ConEx L	following section deals with the case when SACK is not available.
	bit. When a data segment is retransmitted, LEG will be increased by
	the size of the TCP payload bytes contained by the retransmission,	TCP feedback is designed so that the sender can detect losses in
	assuming equal sized segments such that the retransmitted packet will	order to retransmit the lost data. Therefore, it might be naively
	have the same number of header bytes as the original ones.	assumed that a TCP sender only needs to set the ConEx L flag on all
		retransmissions in order to signal the amount of bytes lost.
		However, this will not always be the case. Therefore the process of
		loss detection is described here and separately the process of ConEx
		marking is described in Section 4.1.[CREF4]

		A ConEx sender needs to[CREF5] maintain a local signed counter that
		shall be called the loss exposure gauge (LEG), indicating the number
		of outstanding bytes to be sent with the ConEx L flag. When a TCP
		sender decides that a data segment needs to be retransmitted, it will
		increase LEG by the size of the TCP payload bytes in the
		retransmission (assuming equal sized segments such that the
		retransmitted packet will have the same number of header bytes as the
		original ones).

	Any retransmission may be spurious. To accommodate that, a ConEx	Any retransmission may be spurious. To accommodate that, a ConEx
	sender SHOULD make use of heuristics to detect such spurious	sender SHOULD make use of heuristics to detect such spurious
	retransmissions (e.g. F-RTO [RFC5682], DSACK [RFC3708], and Eifel	retransmissions (e.g. F-RTO [RFC5682], DSACK [RFC3708], and Eifel

	[RFC3522], [RFC4015]). When such a heuristic has determined, that a	[RFC3522], [RFC4015]). When such a heuristic has determined that a
	certain number of packets were retransmitted erroneously, the ConEx	certain number of packets were retransmitted erroneously, the ConEx

	sender should subtract the payload size of these TCP packets from	sender SHOULD subtract the payload size of these TCP packets from
	LEG.	LEG.[CREF6]
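
A minimal sketch of why the gauge is useful as a signed counter, assuming payload-only counting as in the text above. The event names and the function `update_leg` are hypothetical; the decrement on sending an L-marked packet follows the rule stated in Section 3.

```python
def update_leg(leg: int, event: str, payload_bytes: int) -> int:
    """Return the new loss exposure gauge (LEG) value after one event."""
    if event == "retransmit":
        # A segment was judged lost and is retransmitted.
        return leg + payload_bytes
    if event == "l_marked_sent":
        # A packet with the ConEx L flag was sent.
        return leg - payload_bytes
    if event == "spurious":
        # A heuristic found that the retransmission was unnecessary.
        return leg - payload_bytes
    raise ValueError(f"unknown event: {event!r}")


leg = 0
leg = update_leg(leg, "retransmit", 1460)     # loss assumed
leg = update_leg(leg, "l_marked_sent", 1460)  # exposure already signalled
leg = update_leg(leg, "spurious", 1460)       # loss turned out to be spurious
```

In this trace the gauge ends below zero, which records that more loss was signalled than actually occurred.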


	3.1.1. Without SACK Support	3.1.2. Without SACK Support

	If multiple losses occur within one RTT and SACK is not used, it may	If multiple losses occur within one RTT and SACK is not used, it may
	take several RTTs until all lost data is retransmitted. With the	take several RTTs until all lost data is retransmitted. With the
	scheme described above, the ConEx information will be delayed	scheme described above, the ConEx information will be delayed

	strongly but timeliness is important for ConEx.	considerably, but timeliness is important for ConEx.

	For ConEx it is not important to know which data got lost but only	For ConEx it is not important to know which data got lost but only

	how much. During the first RTT after the initial loss detection, the	how much.[CREF7] During the first RTT after the initial loss
	amount of received data and thus also the amount of lost data can be	detection, the amount of received data and thus also the amount of
	estimated based on the number of received ACKs. Thus without SACK,	lost data can be estimated based on the number of received ACKs.
	the needed information for the ConEx feedback can be available with	Thus without SACK, the information needed for ConEx feedback can be
	an additionally delay of one RTT by using the following estimation	available with an additional delay of one RTT by using the following
	algorithm and an additional Loss Estimation Counter (LEC):	estimation algorithm and an additional Loss Estimation Counter (LEC):

	flight_bytes: current flight size in bytes	flight_bytes: current flight size in bytes
	retransmit_bytes: payload size of the retransmission	retransmit_bytes: payload size of the retransmission

	At the first retransmission in a congestion event LEC is set:	At the first retransmission in a congestion event LEC is set:

	LEC = flight_bytes - 3*SMSS	LEC = flight_bytes - 3*SMSS

	(At this point of time in the transmission, in the worst case,	(At this point of time in the transmission, in the worst case,
	all packets in flight minus three that triggered the dupACKs	all packets in flight minus three that triggered the dupACKs

	skipping to change at page 7, line 13	skipping to change at page 7, line 40
	that should be ConEx L marked.)	that should be ConEx L marked.)

	After the first RTT for each following retransmission:	After the first RTT for each following retransmission:

	if (LEC > 0): LEC -= retransmit_bytes	if (LEC > 0): LEC -= retransmit_bytes
	else if (LEC==0): LEG += retransmit_bytes	else if (LEC==0): LEG += retransmit_bytes

	if (LEC < 0): LEG += -LEC	if (LEC < 0): LEG += -LEC

	(The LEG is not increased for those bytes that were	(The LEG is not increased for those bytes that were

	already accounted.)	already counted.)
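
The two steps visible in this diff excerpt could be sketched as follows. The steps that the diff skips between them are not reproduced here, so this is not a complete implementation of the estimation algorithm. The class name, its fields, and the reset of LEC to zero after a negative value has been moved into LEG are assumptions made for the example; the pseudocode above does not state the reset explicitly.

```python
from dataclasses import dataclass


@dataclass
class LossEstimator:
    smss: int     # sender maximum segment size in bytes
    lec: int = 0  # Loss Estimation Counter
    leg: int = 0  # loss exposure gauge

    def on_first_retransmission(self, flight_bytes: int) -> None:
        # Worst case: everything in flight was lost except the three
        # segments that triggered the duplicate ACKs.
        self.lec = flight_bytes - 3 * self.smss

    def on_later_retransmission(self, retransmit_bytes: int) -> None:
        # Applies to retransmissions after the first RTT.
        if self.lec > 0:
            self.lec -= retransmit_bytes
        elif self.lec == 0:
            self.leg += retransmit_bytes
        if self.lec < 0:
            self.leg += -self.lec
            self.lec = 0  # assumption: avoid counting the excess twice
```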

	3.2. ECN	3.2. ECN

	ECN [RFC3168] is an IP/TCP mechanism that allows network nodes to	ECN [RFC3168] is an IP/TCP mechanism that allows network nodes to
	mark packets with the Congestion Experienced (CE) mark instead of	mark packets with the Congestion Experienced (CE) mark instead of

	(early) dropping them when congestion occurs. As soon as a CE mark	dropping them when congestion occurs.
	is seen at the receiver, with classic ECN it will feed this
	information back to the sender by setting the Echo Congestion
	Experienced (ECE) bit in the TCP header of all subsequent ACKs until
	a packet with Congestion Window Reduced (CWR) bit in the TCP header
	is received to acknowledge the reception of the congestion
	notification. The sender sets the CWR bit in the TCP header once
	when the first ECE of a congestion notification is received.


	A receiver can support 'classic' ECN, a more accurate ECN feedback	A receiver might support 'classic' ECN, the more accurate ECN
	scheme, or neither. In the case ECN is not supported at all, of	feedback scheme (AccECN), or neither. In the case that ECN is not
	course, no ECN marks will occur, thus the E bit will never be set.	supported for a connection, of course, no ECN marks will occur; thus
	Otherwise, a ConEx sender must maintain a counter, the congestion	the sender will never set the E flag. Otherwise, a ConEx sender must
	exposure gauge (CEG), for the number of outstanding bytes that have	maintain a signed counter, the congestion exposure gauge (CEG), for
	to be ConEx marked with the E bit.	the number of outstanding bytes that have to be ConEx marked with the
		E flag.

	The CEG is increased when ECN information is received from an ECN-	The CEG is increased when ECN information is received from an ECN-
	capable receiver supporting the 'classic' ECN scheme or the accurate	capable receiver supporting the 'classic' ECN scheme or the accurate
	ECN feedback scheme. When the ConEx sender receives an ACK	ECN feedback scheme. When the ConEx sender receives an ACK
	indicating one or more segments were received with a CE mark, CEG is	indicating one or more segments were received with a CE mark, CEG is
	increased by the appropriate number of bytes as described further	increased by the appropriate number of bytes as described further
	below.	below.

	Unfortunately in case of duplicate acknowledgements the number of	Unfortunately in case of duplicate acknowledgements the number of
	newly acknowledged bytes will be zero even though (CE marked) data	newly acknowledged bytes will be zero even though (CE marked) data
	has been received. Therefore, we increase the CEG by DeliveredData,	has been received. Therefore, we increase the CEG by DeliveredData,
	as defined below:	as defined below:

	DeliveredData = acked_bytes + SACK_diff + (is_dup)*1SMSS -	DeliveredData = acked_bytes + SACK_diff + (is_dup)*1SMSS -
	(is_after_dup)*num_dup*1SMSS	(is_after_dup)*num_dup*1SMSS


	DeliveredData covers the number of bytes which has been newly	DeliveredData covers the number of bytes that has been newly
	delivered to the receiver. Therefore on each arrival of an ACK,	delivered to the receiver. Therefore on each arrival of an ACK,
	DeliveredData will be increased by the newly acknowledged bytes	DeliveredData will be increased by the newly acknowledged bytes
	(acked_bytes) as indicated by the current ACK, relative to all past	(acked_bytes) as indicated by the current ACK, relative to all past

	ACKs.	ACKs. The formula depends on whether SACK is available, as follows:


	Moreover with SACK, DeliveredData is increased by the number of bytes	With SACK: DeliveredData is increased by the number of bytes
	provided by (new) SACK information (SACK_diff). Note, if less	provided by (new) SACK information (SACK_diff). Note, if less

	unacknowledged bytes are announced in the new SACK information than	unacknowledged bytes are announced in the new SACK information
	in the previous ACK, SACK_diff can be negative. In this case, data	than in the previous ACK, SACK_diff can be negative. In this
	is newly acknowledged (in acked_byte), that has previously already	case, data is newly acknowledged (in acked_bytes), that has
	been accounted to DeliveredData based on SACK information.	previously already been accumulated into DeliveredData based on
		SACK information.


	Without SACK, DeliveredData is estimated to be 1 SMSS on duplicate	Without SACK: DeliveredData is estimated to be 1 SMSS on duplicate
	acknowledgements. For the subsequent partial or full ACK,	acknowledgements. For the subsequent partial or full ACK,

	DeliveredData is estimated to be the newly acknowledged bytes, minus	DeliveredData is estimated to be the newly acknowledged bytes,
	one SMSS for each preceding duplicate ACK. Therefore is_dup is one	minus one SMSS for each preceding duplicate ACK. Therefore is_dup
	if the current ACK is a duplicated ACK without SACK, and zero	is one if the current ACK is a duplicated ACK without SACK, and
	otherwise. is_after_dup is only one for the next full or partial ACK	zero otherwise. is_after_dup is only one for the next full or
	after a number of duplicated ACKs without SACK and num_dup counts the	partial ACK after a number of duplicated ACKs without SACK and
	number of duplicated ACKs in a row.	num_dup counts the number of duplicated ACKs in a row.[CREF8]
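The DeliveredData accounting described above can be sketched as follows. This is a minimal illustration reconstructed from the prose, not the draft's normative formula. The variable names acked_bytes, SACK_diff, is_dup, is_after_dup and num_dup come from the text; the function name and the SMSS value of 1460 bytes are assumptions made for the example.

```python
SMSS = 1460  # assumed sender maximum segment size in bytes (illustrative)


def delivered_data(acked_bytes, sack_diff=0, is_dup=False,
                   is_after_dup=False, num_dup=0, smss=SMSS):
    """Estimate the bytes newly delivered to the receiver for one ACK.

    With SACK, sack_diff carries the change in SACKed bytes (may be negative).
    Without SACK, each duplicate ACK is counted as one SMSS, and the next
    partial or full ACK subtracts what the duplicates already counted.
    """
    delivered = acked_bytes + sack_diff
    if is_dup:
        delivered += smss
    if is_after_dup:
        delivered -= num_dup * smss
    return delivered
```

For example, without SACK, three duplicate ACKs each yield one SMSS, and a following ACK that cumulatively acknowledges four segments yields only the one segment not yet counted.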


	The two cases, with and without more accurate ECN depending on the	With classic ECN, as soon as a CE mark is seen at the receiver, it
	receiver capability, are discussed in the following sections.	will feed this information back to the sender by setting the Echo
		Congestion Experienced (ECE) flag in the TCP header of subsequent
		ACKs. Once the sender receives the first ECE of a congestion
		notification, it sets the CWR flag in the TCP header once. When this
		packet with Congestion Window Reduced (CWR) flag in the TCP header
		arrives at the receiver, acknowledging its first ECE feedback, the
		receiver stops setting ECE.

		Thus, with classic ECN, one congestion marked packet causes
		continuous congestion feedback for a whole round trip, thus hiding
		the arrival of any further congestion marked packets during that
		round trip. The more accurate ECN feedback scheme (AccECN) has been
		defined to ensure that feedback properly reflects the extent of
		congestion marking. The two cases, with and without a receiver
		capable of AccECN, are discussed in the following sections.

	3.2.1. Accurate ECN feedback	3.2.1. Accurate ECN feedback


	With a more accurate ECN feedback scheme either the number of marked	With the [CREF9] more accurate ECN feedback scheme (AccECN) either
	packets/received CE marks or directly the number of marked bytes is	the number of marked packets or the number of marked bytes is known.
	known. In the later case the CEG can directly be increased by the	In the latter case the CEG can directly be increased by the number of
	number of marked bytes. Otherwise if D is assumed to be the number	marked bytes. Otherwise if D is assumed to be the number of marks,
	of marks, the gauge CEG will be conservatively increased by one SMSS	the gauge (CEG) will be conservatively increased by one SMSS for each
	for each marking or at max the number of newly acknowledged bytes:	marking or at max the number of newly acknowledged bytes:

	CEG += min(SMSS*D, DeliveredData)	CEG += min(SMSS*D, DeliveredData)
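As a small sketch of the update above, assuming D counts newly reported CE marks in packets and that SMSS is 1460 bytes (both illustrative assumptions, as is the function name):

```python
def ceg_increase_accecn(d_marks, delivered_data, smss=1460):
    """Bytes to add to CEG: one SMSS per mark, capped by DeliveredData."""
    return min(smss * d_marks, delivered_data)
```

The cap matters when an ACK reports more marks than the bytes it newly acknowledges could account for at one SMSS per mark.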

	3.2.2. Classic ECN support	3.2.2. Classic ECN support


	If the ConEx sender fully conforms to the semantics of the ECN	If the ConEx sender fully conforms to the semantics of ECN signaling
	signaling as defined by [RFC5562], it will receive one full RTT of	as defined by [RFC5562],[CREF10] it will receive one full RTT of ACKs
	ACKs with the ECE flag set whenever at least one CE mark was received	with the ECE flag set whenever at least one CE mark was received by
	by the receiver. As the sender cannot estimate how much packets have	the receiver. As the sender cannot estimate how many packets have
	actually been CE marked during this RTT, the most conservative	actually been CE marked during this RTT, the most conservative

	assumption should be taken, namely assuming that all packets were	assumption MAY be taken, namely assuming that all packets were
	marked. This can be achieved by increasing the CEG by DeliveredData	marked. This can be achieved by increasing the CEG by DeliveredData
	for each ACK with the ECE flag:	for each ACK with the ECE flag:

	CEG += DeliveredData	CEG += DeliveredData
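The conservative classic ECN rule can be illustrated as below. The representation of an ACK as an (ece, delivered_data) pair and the function name are assumptions made only for this sketch.

```python
def ceg_after_acks(acks, ceg=0):
    """Apply 'CEG += DeliveredData' for every ACK carrying the ECE flag.

    acks is an iterable of (ece, delivered_data) pairs, where ece is a bool
    and delivered_data is the DeliveredData estimate in bytes for that ACK.
    """
    for ece, delivered in acks:
        if ece:
            ceg += delivered
    return ceg
```

With one CE mark, every ACK of the following round trip carries ECE, so the whole round trip's DeliveredData ends up in the gauge.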


	Optionally a ConEx sender could implement an Advanced Compatibility	Optionally a ConEx sender could implement the following technique,
	Mode:	called advanced compatibility mode, to considerably improve its
		estimate of the number of ECN-marked packets:

	To extract more than one ECE indication per RTT, a ConEx sender could	To extract more than one ECE indication per RTT, a ConEx sender could

	set the CWR flag opportunistically to force the receiver to signal	set the CWR flag continuously to force the receiver to signal only
	only one ECE per CE mark. Unfortunately, the use of delayed ACKs	one ECE per CE mark. Unfortunately, the use of delayed ACKs
	[RFC5681], as it is usually done today, will prevent a feedback of	[RFC5681] (which is common) will prevent feedback of every CE mark;
	every CE mark. If an CWR confirmation will be received before the	if a CWR confirmation is received before the ECE can be sent out on
	ECE can be sent out with the next ACK, ECN feedback information	the next ACK, ECN feedback information could get lost. Thus a sender
	information could get lost. Thus a sender should set CWR only on	SHOULD set CWR only on those data segments that will actually trigger
	those data segments, that will actually trigger a (delayed) ACK. The	a (delayed) ACK. The sender would need an additional control loop to
	sender would need an additional control loop to estimated which data	estimate which data segments will trigger an ACK in order to extract
	segment will trigger an ACK. But such a more sophisticated	more timely congestion notifications. Still the CEG SHOULD be
	heuristics could extract congestion notifications more timely. Still	increased by DeliveredData, as one or more CE marked packets could be
	the CEG need to be increased by DeliveredData, as one or more CE	acknowledged by one delayed ACK.
	marked packets could be acknowledged by one delayed ACK.
		The repetition of ECE in classic ECN is intended to ensure reliable
		delivery of congestion feedback. The following argument is intended
		to prove that suppressing repetitions of ECE is safe against possible
		congestion collapse due to lost congestion feedback.

		With advanced compatibility mode, if an ACK containing ECE is lost,
		the continual CWRs prevent it being repeated, so it will remain lost.
		Therefore, if congestion is light on the forward path and heavy on
		the reverse, most of the light congestion signals will be lost. If
		loss of feedback exacerbates congestion on the forward path, more
		forward packets will be CE marked, increasing the likelihood that
		feedback from at least one CE will get through per RTT. As long as
		one ECE reaches the sender per RTT, the sender's congestion response
		will be the same as if CWR were not continuous. The only way that
		heavy congestion on the forward path could be completely hidden would
		be if all ACKs on the reverse path were lost. If total ACK loss
		persisted, the sender would time out and do a congestion response
		anyway. Therefore, the problem seems confined to potential suppression
		of a congestion response during light congestion.

		Anyway, even if loss of all ECN feedback led to no congestion
		response, the worst that could happen would be loss instead of ECN-
		signalled congestion on the forward path. Given compatibility mode
		does not affect loss feedback, there would be no risk of congestion
		collapse.

	4. Setting the ConEx Bits	4. Setting the ConEx Bits


	By setting the X bit a packet is marked as ConEx-capable. All	By setting the X flag, a packet is marked as ConEx-capable. All
	packets carrying payload MUST be marked with the X bit set including	packets carrying payload MUST be marked with the X flag set,
	retransmissions. No congestion feedback information are available	including retransmissions. No congestion feedback information is
	about control packets such as pure ACKs which are not carrying any	available about control packets such as pure ACKs which are not
	payload. Thus these packets should not be taken into account when	carrying any payload. Thus these packets should not be taken into
	determining ConEx information. These packet MUST carry a ConEx	account when determining ConEx information. These packets MUST carry
	Destination Option with the X bit unset.	a ConEx Destination Option with the X flag unset.[CREF11]


	4.1. Setting the E and the L Bit	4.1. Setting the E or the L Flag


	As long as the CEG or LEG counter is positive, ConEx-capable packets	As long as the LEG or CEG counter is positive, the sender MUST mark
	SHOULD be marked with E or L respectively, and the CEG or LEG counter	each ConEx-capable packet with L or E respectively, and decrease the
	is decreased by the TCP payload bytes carried in this packet. If the	LEG or CEG counter by the TCP payload bytes carried in the marked
	CEG or LEG counter is negative, the respective counter SHOULD be	packet (assuming headers are not being counted because packet sizes
	reset to zero within one RTT after it was decreased the last time or	are regular). No matter how small the value of LEG or CEG, if it is
	one RTT after recovery if no further congestion occurred.	positive, to ensure ConEx signals are timely, the sender MUST NOT
		defer packet marking. Therefore the value of LEG and CEG will
		commonly be negative.


	If SACK information is not available spurious retransmission are more	Multiple ConEx flags may be required for signaling at the same time.
	likely. In this case it might be valuable to slightly delay the	This may happen, for example, during excessive congestion when an ACK
	ConEx loss feedback until a spurious retransmission might be	is received by the sender that simultaneously indicates that at least
	detected. But the ConEx signal MUST NOT be delayed more than one RTT	one segment has been lost, and that one or more ECN marks were
	if as long as data packets are sent out.	received. Another case when this might happen is when ACKs are lost,
		so that a subsequent ACK carries summary information not previously
		available to the sender.


	4.2. Credit Bits	Whenever both LEG and CEG are positive, the sender MUST mark each
		ConEx-capable packet with both L and E. If a credit signal is also
		pending (see Section 4.2), the C flag can be set as well.
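The marking rule in the new text of this section can be sketched as follows. It is a simplified illustration: gauges are plain integers in bytes, only TCP payload bytes are counted, and the function name and return shape are assumptions.

```python
def mark_packet(payload_bytes, leg, ceg):
    """Decide the L and E flags for one outgoing ConEx-capable packet.

    Any positive gauge triggers a mark immediately, however small it is,
    and is then decreased by the payload size, so it may become negative.
    Returns (flags, new_leg, new_ceg).
    """
    flags = set()
    if leg > 0:
        flags.add("L")
        leg -= payload_bytes
    if ceg > 0:
        flags.add("E")
        ceg -= payload_bytes
    return flags, leg, ceg
```

Because marking is never deferred, a gauge holding only 100 bytes still causes a full packet to be marked, which is why the gauges are commonly negative.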


	The ConEx abstract mechanism requires that sufficient credit must be	4.2. Setting the Credit Flag
	signaled in advance to cover the expected congestion during the
	feedback delay of one RTT. A ConEx sender should maintain a counter
	of the sent credits c in bytes. If congestion occurs, credits will
	be consumed and the c counter should be reduced by the number of
	bytes that where lost or estimated to be ECN-marked. If the risk of
	congestion was estimated wrongly and thus too few credits were sent,
	the c counter becomes zero but can not get negative.


	The number of credits sent should always equal the number of bytes in	The ConEx abstract mechanism [draft-ietf-conex-abstract-mech]
	flight, as all packets could potentially get lost or congestion	requires that sufficient credit must be signaled in advance to cover
	marked. Thus a ConEx sender should monitor the number of bytes in	the expected congestion during the feedback delay of one RTT.
	flight f. If f ever becomes larger than c, the ConEx sender SHOULD
	send new credits. Remember that c will be decreased if congestion
	occurs.


	In TCP Slow Start, the congestion window might grow much larger than	This section proposes concrete algorithms for determining how much
	during the rest of the transmission. Thus a sender could consider to	credit to signal during congestion avoidance and slow start.
	sent fewer than f credits but risking potential penalization by an	However, experimentation in better credit setting algorithms is
	audit. In any case the credits should at least cover the increase in	expected and encouraged. The wider goal of ConEx is to reflect the
	sending rate. As the sending rate increases exponentially in Slow	'cost' of the risk of causing congestion on those that contribute
	Start, thus double every RTT, a ConEx sender should at least cover	most to it. Thus, experimentation is encouraged in better ways to
	half the number of packets in flight by credits. Note, that the	improve or maintain performance while reducing the risk of causing
	number of losses or markings within one RTT does not only depend	congestion, and therefore reducing the need to signal so much credit.
	actions taken by the sender. In general, the behavior of the cross
	traffic, and if Active Queue Management (AQM) is used, the respective
	parameterization influence how many packets get dropped or marked.
	But if the used AQM is not overly aggressive with ECN marking,
	sending halve the flight size as credits should be sufficient for
	both, congestion signaled by loss or ECN. Marking every fourth
	packet will allow the respective number of credits in Slow Start as
	it can be seen in Figure Figure 1.


	RTT1 \|------XC------>\|	For a simple credit algorithm, a ConEx sender SHOULD maintain a
	\|------X------->\|	counter of the sent credits c in bytes. If congestion occurs,
	\|------X------->\| credit=1 in_flight=3	credits will be consumed and the c counter SHOULD be reduced by the
		number of bytes that were lost or estimated to be ECN-marked. If
		the risk of congestion was estimated wrongly and thus too few credits
		were sent, the c counter becomes zero but cannot go negative.

		During TCP congestion avoidance, the amount of credit sent SHOULD
		exceed the amount of congestion experienced by at least the number of
		bytes in flight, as all packets could potentially get lost or
		congestion marked.[CREF12] Thus a ConEx sender should monitor the
		number of bytes in flight f. Whenever f becomes larger than c, the
		ConEx sender SHOULD set the C flag on each ConEx-capable packet and
		increase c by the size of each marked packet until it is no less than
		f again.

		Recall that c will be decreased whenever congestion occurs, therefore
		c will need to be replenished as soon as c drops below f. Also
		recall that the sender can set the C flag on a ConEx-capable packet
		whether or not the E or L flags are also set.
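A minimal sketch of the congestion avoidance credit rule, assuming c and f are tracked in bytes and that f already includes the packet about to be sent. The function names are hypothetical.

```python
def credit_on_send(payload_bytes, f, c):
    """Return (set_c_flag, new_c) for one outgoing ConEx-capable packet.

    While the bytes in flight f exceed the credit counter c, the packet is
    marked with C and c grows by the size of the marked packet.
    """
    if f > c:
        return True, c + payload_bytes
    return False, c


def credit_on_congestion(congested_bytes, c):
    """Consume credit for lost or ECN-marked bytes; c never goes negative."""
    return max(0, c - congested_bytes)
```

After a congestion event reduces c, subsequent calls to credit_on_send mark packets until c is no less than f again.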

		In TCP slow start, the congestion window might grow much larger than
		during the rest of the transmission. Thus a sender could consider
		sending fewer than f credits but risking being penalized by an audit
		function. In any case the credits SHOULD at least cover the increase
		in sending rate.[CREF13] Given the sending rate doubles every RTT in
		Slow Start, a ConEx sender should at least cover half the number of
		packets in flight by credits. Note that the number of losses or
		markings within one RTT does not solely depend on the sender's
		actions. In general, the behavior of the cross traffic, whether
		active queue management (AQM) is used and how it is parameterized
		influence how many packets might be dropped or marked. As long as
		any AQM encountered is not overly aggressive with ECN marking,
		sending half the flight size as credits should be sufficient whether
		congestion is signaled by loss or ECN. Marking C on every second
		packet in the initial window and every fourth packet in slow start
		will introduce the correct amount of credit as can be seen in
		Figure 1.[CREF14] This behaviour is most easily achieved by using the
		following formula to update c as every packet is sent during slow
		start:

		c = (f+1)/2, using integer division.

		f c=(f+1)/2
		RTT1 \|------XC------>\| 1 1
		\|------X------->\| 2 1
		\|------XC------>\| 3 2
	\| \|	\| \|

	RTT2 \|------X------->\|	RTT2 \|------X------->\| 3 2
	\|------XC------>\|	\|------X------->\| 4 2
	\|------X------->\|	\|------X------->\| 4 2
	\|------X------->\|	\|------XC------>\| 5 3
	\|------X------->\|	\|------X------->\| 5 3
	\|------XC------>\| credit=3 in_flight=6	\|------X------->\| 6 3
	\| \|	\| \|

	RTT3 \|------X------->\|	RTT3 \|------X------->\| 6 3
	\|------X------->\|	\|------XC------>\| 7 4
	\|------X------->\|	\|------X------->\| 7 4
	\|------XC------>\|	\|------X------->\| 8 4
	\|------X------->\|	\|------X------->\| 8 4
	\|------X------->\|	\|------XC------>\| 9 5
	\|------X------->\|	\|------X------->\| 9 5
	\|------XC------>\|	\|------X------->\| 10 5
	\|------X------->\|	\|------X------->\| 10 5
	\|------X------->\|	\|------XC------>\| 11 6
	\|------X------->\|	\|------X------->\| 11 6
	\|------XC------>\| credit=6 in_flight=12	\|------X------->\| 12 6
	\| . \|	\| . \|
	\| : \|	\| : \|

	Figure 1: Credits in Slow Start (with an initial window of 3)	Figure 1: Credits in Slow Start (with an initial window of 3)
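The right-hand version of Figure 1 can be reproduced with the short sketch below. It counts f and c in packets, as the figure does, rather than in bytes, and it assumes the C flag is set exactly when the formula c = (f+1)/2 raises c. The function name and the list-based input are illustrative assumptions.

```python
def slow_start_credit_marks(flight_sizes):
    """Return, per sent packet, whether the C flag is set.

    flight_sizes lists the flight size f (in packets) observed as each
    packet is sent. c is updated with integer division, c = (f + 1) // 2,
    and a packet carries C whenever that update increases c.
    """
    c = 0
    marks = []
    for f in flight_sizes:
        new_c = (f + 1) // 2
        marks.append(new_c > c)
        c = new_c
    return marks
```

Feeding the f values of RTT1 and RTT2 from the figure (1, 2, 3, then 3, 4, 4, 5, 5, 6) marks the first and third packets of the initial window and then the fourth packet of RTT2, matching the XC rows shown.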


	It is possible that the audit looses state due to e.g. rerouting or	It is possible that a TCP flow will encounter an audit function
	memory limitations. Therefore, the sender needs to detect this case	without relevant flow state, due to e.g. rerouting or memory
	and resend credits. Thus a ConEx sender should reset the credit	limitations. Therefore, the sender needs to detect this case and
	count c to zero if losses occur in two subsequent RTTs (assuming that	resend credits. Thus a ConEx sender should reset the credit count c
	the sending rate was correctly reduced based on the received	to zero if losses occur in two subsequent RTTs (assuming that the
	congestion signal).	sending rate was correctly reduced based on the received congestion
		signal). [CREF15]

	5. Loss of ConEx information	5. Loss of ConEx information


	Packets carrying ConEx can also get lost. A ConEx sender must	Packets carrying ConEx signals could be discarded themselves. This
	remember which packet was marked with either the L, the E or the C	will be a second order problem (e.g. if the loss probability is 0.1%,
	bit. If one of these packets is detected to be lost, the should	the probability of losing a loss signal will be 0.1% of 0.1% =
	increase the respective gauge, LEG or CEG, by the number of lost	0.0001%). Therefore, an implementer MAY choose to ignore this
	payload bytes.	problem, accepting instead the risk that an audit function might
		slightly increase the loss level (e.g. from 0.1000% to 0.1001%).
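The second-order arithmetic in the example above can be checked with a few lines. This assumes, as the text does, that the loss of a ConEx-marked packet is independent of the original loss and occurs with the same probability; the function name is hypothetical.

```python
def second_order_loss(p_loss):
    """Return (probability a loss signal is itself lost, level seen by audit).

    Both values are fractions, so 0.001 corresponds to 0.1%.
    """
    p_signal_lost = p_loss * p_loss
    return p_signal_lost, p_loss + p_signal_lost
```

For a loss probability of 0.1% this gives 0.0001% for the lost signals and 0.1001% for the level an audit function might observe.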

		Nonetheless, a ConEx sender SHOULD remember which packet was marked
		with either the L, the E or the C flag. If one of these packets is
		detected as lost, the sender SHOULD increase the respective gauge(s),
		LEG or CEG, by the number of lost payload bytes in addition to
		increasing LEG for the loss.
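One way to read the rule above is sketched below. It covers only the L and E flags, since the text does not say how a lost C-marked packet should be accounted for. The function name and the set-based flag representation are assumptions.

```python
def on_marked_packet_lost(flags, payload_bytes, leg, ceg):
    """Update the gauges when a ConEx-marked packet is detected as lost.

    LEG always grows by the payload for the loss itself. In addition, the
    signal the packet carried is re-armed: L adds to LEG, E adds to CEG.
    Returns (new_leg, new_ceg).
    """
    leg += payload_bytes          # the loss of this packet
    if "L" in flags:
        leg += payload_bytes      # the lost loss signal
    if "E" in flags:
        ceg += payload_bytes      # the lost ECN signal
    return leg, ceg
```

This requires the sender to remember, per unacknowledged packet, which flags it carried.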

	6. Timeliness of the ConEx Signals	6. Timeliness of the ConEx Signals


	ConEx signals can only be evaluated by a network node with a time	ConEx signals will only be useful to a network node within a time
	delay of about one RTT after the congestion occured. To avoid	delay of about one RTT after the congestion occurred. To avoid
	further delays, a ConEx sender SHOULD sent the ConEx signaling with	further delays, a ConEx sender SHOULD send the ConEx signaling on the
	the next available packet. In cases where it is preferable to	next available packet.
	slightly delay the ConEx signal, the sender MUST NOT delay the ConEx
	signal more than one RTT.


	Multiple ConEx bits may become available for signaling at the same	Any or all of the ConEx flags can be used in the same packet, which
	time, for example when an ACK is received by the sender, that	allows delay to be minimised when multiple signals are pending.
	indicates at the same time that at least one segment has been lost,
	and that one or more ECN marks were received. This may happen during	If a flow becomes application-limited, there could be insufficient
	excessive congestion, where the queues overflow even though ECN was	bytes to send to reduce the gauges to zero or below. In such cases,
	used and currently all packets are marked, while others have to be	the sender cannot help but delay ConEx signals. Nonetheless, as long
	dropped nevertheless. Another possibility when this may happen are	as the sender is marking all outgoing packets, an audit function is
	lost ACKs, so that a subsequent ACK carries summary information not	unlikely to penalize ConEx-marked packets. Therefore, no matter how
	previously available to the sender. As ConEx-capable packet can	long a gauge has been positive, a sender MUST NOT reduce the gauge by
	carry different ConEx marks at the same time, these information do	more than the ConEx marked bytes it has sent.
	not need to be distributed over several packets and thus can be sent
	without further delay.	If the CEG or LEG counter is negative, the respective counter SHOULD
		be reset to zero within one RTT after it was decreased the last time
		or one RTT after recovery if no further congestion occurred.
		[CREF16]

		If SACK information is not available, spurious retransmissions are more
		likely. In this case it might be valuable to slightly delay the
		ConEx loss feedback until a spurious retransmission might be
		detected. But the ConEx signal MUST NOT be delayed more than one RTT
		as long as data packets are sent out.[CREF17]

	7. Acknowledgements	7. Acknowledgements

	The authors would like to thank Bob Briscoe who contributed with his	The authors would like to thank Bob Briscoe who contributed with his

	initial ideas and valuable feedback. Moreover, thanks to Jana	initial ideas [I-D.briscoe-conex-re-ecn-tcp] and valuable feedback.
	Iyengar who provided valuable feedback.	Moreover, thanks to Jana Iyengar who provided valuable feedback.

	8. IANA Considerations	8. IANA Considerations

	This document does not have any requests to IANA.	This document does not have any requests to IANA.

	9. Security Considerations	9. Security Considerations


	With some of the advanced ECN compatibility modes it is possible to	General ConEx security considerations are covered extensively in the
	miss congestion notifications. Thus a sender will not decrease its	ConEx abstract mechanism [draft-ietf-conex-abstract-mech]. This
	sending rate. If the congestion is persistent, the likelihood to	section covers TCP-specific concerns.
	receive a congestion notification increases. In the worst case the
	sender will still react correctly to loss. This will prevent a	The ConEx modifications to TCP provide no mechanism for a receiver to
	congestion collapse.	force a sender not to use ConEx. A receiver can degrade the accuracy
		of ConEx by claiming that it does not support SACK, AccECN or ECN,
		but the sender will never have to turn ConEx off. The receiver
		cannot force the sender to have to mark ConEx more conservatively, in
		order to cover the risk of any inaccuracy. Instead the sender can
		choose to mark inaccurately, which will only increase the likelihood
		of loss at an audit function. Thus the receiver will only harm
		itself.

		Assuming the sender is limited in some way by a congestion allowance
		or quota, a receiver could spoof more loss or ECN congestion feedback
		than it actually experiences, in an attempt to make the sender draw
		down its allowance faster than necessary. However, over-declaring
		congestion simply makes the sender slow down. If the receiver is
		interested in the content it will not want to harm its own
		performance.

		However, if the receiver is solely interested in making the sender
		draw down its allowance, the net effect will depend on the sender's
		congestion control algorithm. With New Reno [RFC5681], doubling
		congestion feedback causes the sender to consume sqrt(2) = 1.4 times
		more congestion allowance. However, to improve scaling, congestion
		control algorithms are tending towards less responsive algorithms
		like Cubic or Compound TCP, and ultimately to linear algorithms like
		DCTCP [DCTCP]. In each case, if the receiver doubles congestion
		feedback, it causes the sender to respectively consume more allowance
		by a factor of 1.2, 1.15 or 1, where 1 implies the attack has become
		completely ineffective.
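The factors quoted above are consistent with a simple scaling argument, sketched here under stated assumptions. If the steady-state rate of a congestion control scales as p^(-B) with congestion probability p, then the congestion volume consumed (p times the rate) scales as p^(1-B), so doubling p multiplies consumption by 2^(1-B). The exponents B below are commonly cited approximations chosen to match the text. They are assumptions of this sketch and are not taken from the draft.

```python
# Assumed response-function exponents B, where rate ~ p**(-B).
ASSUMED_EXPONENTS = {
    "New Reno": 0.5,
    "Cubic": 0.75,
    "Compound TCP": 0.8,
    "DCTCP": 1.0,
}


def allowance_factor(b, feedback_multiplier=2.0):
    """Factor by which consumed congestion allowance grows when the
    receiver inflates congestion feedback by feedback_multiplier."""
    return feedback_multiplier ** (1.0 - b)


factors = {name: allowance_factor(b) for name, b in ASSUMED_EXPONENTS.items()}
```

Rounded, the results are about 1.4, 1.2, 1.15 and 1, in the same order as the algorithms listed in the paragraph.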

	10. References	10. References

	10.1. Normative References	10.1. Normative References

	[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP	[RFC2018] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
	Selective Acknowledgment Options", RFC 2018, October 1996.	Selective Acknowledgment Options", RFC 2018, October 1996.

	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate	[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
	Requirement Levels", BCP 14, RFC 2119, March 1997.	Requirement Levels", BCP 14, RFC 2119, March 1997.

	skipping to change at page 13, line 29	skipping to change at page 16, line 22
	Destination Option for ConEx", draft-ietf-conex-destopt-04	Destination Option for ConEx", draft-ietf-conex-destopt-04
	(work in progress), March 2013.	(work in progress), March 2013.

	10.2. Informative References	10.2. Informative References

	[DCTCP] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,	[DCTCP] Alizadeh, M., Greenberg, A., Maltz, D., Padhye, J., Patel,
	P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP:	P., Prabhakar, B., Sengupta, S., and M. Sridharan, "DCTCP:
	Efficient Packet Transport for the Commoditized Data	Efficient Packet Transport for the Commoditized Data
	Center", Jan 2010.	Center", Jan 2010.


	[I-D.briscoe-tsvwg-re-ecn-tcp]	[I-D.briscoe-conex-re-ecn-tcp]
	Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,	Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
	"Re-ECN: Adding Accountability for Causing Congestion to	"Re-ECN: Adding Accountability for Causing Congestion to

	TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-09 (work in	TCP/IP", draft-briscoe-conex-re-ecn-tcp-04 (work in
	progress), October 2010.	progress), July 2014.

	[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm	[RFC3522] Ludwig, R. and M. Meyer, "The Eifel Detection Algorithm
	for TCP", RFC 3522, April 2003.	for TCP", RFC 3522, April 2003.

	[RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective	[RFC3708] Blanton, E. and M. Allman, "Using TCP Duplicate Selective
	Acknowledgement (DSACKs) and Stream Control Transmission	Acknowledgement (DSACKs) and Stream Control Transmission
	Protocol (SCTP) Duplicate Transmission Sequence Numbers	Protocol (SCTP) Duplicate Transmission Sequence Numbers
	(TSNs) to Detect Spurious Retransmissions", RFC 3708,	(TSNs) to Detect Spurious Retransmissions", RFC 3708,
	February 2004.	February 2004.


	skipping to change at page 14, line 14	skipping to change at page 17, line 5

	[RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,	[RFC5682] Sarolahti, P., Kojo, M., Yamamoto, K., and M. Hata,
	"Forward RTO-Recovery (F-RTO): An Algorithm for Detecting	"Forward RTO-Recovery (F-RTO): An Algorithm for Detecting
	Spurious Retransmission Timeouts with TCP", RFC 5682,	Spurious Retransmission Timeouts with TCP", RFC 5682,
	September 2009.	September 2009.

	[RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion	[RFC6789] Briscoe, B., Woundy, R., and A. Cooper, "Congestion
	Exposure (ConEx) Concepts and Use Cases", RFC 6789,	Exposure (ConEx) Concepts and Use Cases", RFC 6789,
	December 2012.	December 2012.


	[draft-briscoe-tsvwg-byte-pkt-mark]	[RFC7141] Briscoe, B. and J. Manner, "Byte and Packet Congestion
	Briscoe, B. and J. Manner, "Byte and Packet Congestion	Notification", BCP 41, RFC 7141, February 2014.
	Notification", draft-briscoe-tsvwg-byte-pkt-mark-010 (work
	in progress), May 2013.

	[draft-kuehlewind-tcpm-accurate-ecn]	[draft-kuehlewind-tcpm-accurate-ecn]
	Kuehlewind, M. and R. Scheffenegger, "More Accurate ECN	Kuehlewind, M. and R. Scheffenegger, "More Accurate ECN
	Feedback in TCP", draft-kuehlewind-tcpm-accurate-ecn-02	Feedback in TCP", draft-kuehlewind-tcpm-accurate-ecn-02
	(work in progress), Jun 2013.	(work in progress), Jun 2013.

	Appendix A. Revision history	Appendix A. Revision history


	RFC Editior: This section is to be removed before RFC publication.	RFC Editor: This section is to be removed before RFC publication.

	00 ... initial draft, early submission to meet deadline.	00 ... initial draft, early submission to meet deadline.

	01 ... refined draft, updated LEG "drain" from per-packet to RTT-	01 ... refined draft, updated LEG "drain" from per-packet to RTT-
	based.	based.

	02 ... added Section 5 and expanded discussion about ECN interaction.	02 ... added Section 5 and expanded discussion about ECN interaction.

	03 ... expanded the discussion around credit bits.	03 ... expanded the discussion around credit bits.

	04 ... review comments of Jana addressed. (Change in full compliance	04 ... review comments of Jana addressed. (Change in full compliance
	mode.)	mode.)

	05 ... changes on Loss Detection without SACK, support of classic ECN	05 ... changes on Loss Detection without SACK, support of classic ECN
	and credit handling.	and credit handling.


		Editorial Comments

		[CREF1] BB: "finally" here would mean "At last (sigh), here's what
		you've all been waiting for." :-)

		[CREF2] BB: Avoid 'recommended', which could be confused with the
		normative upper-cased word. The normative language later is
		good and sufficient.

		[CREF3] BB: I don't understand this last sentence. How does the sender
		suddenly know something it didn't know before?

		[CREF4] BB: I've added this sentence, but only to give you an excuse for
		having devised all this mechanism. However, I really don't know
		why you're going to all this trouble to be so accurate and
		timely. TCP never retransmits less data than is lost. And over
		the years TCP designers have been reducing the amount of
		unnecessary retransmission, and reducing retransmission delay.
		So I suggest we just mark retransmissions with the L flag.
		Done! No need even for a loss exposure gauge. ...If the sender
		is faced with insufficient information such that the universe of
		TCP designers has been unable to minimise unnecessary or delayed
		retransmissions, why try to do better than everyone has so far
		managed? Just accept that you will be over-declaring or
		sluggishly declaring ConEx. And assume that deployment of all
		the techniques to reduce late or spurious losses is proceeding,
		and we can walk on their shoulders.

		[CREF5] BB: I suggest removing MUST, because we cannot mandate a
		particular implementation technique.

		[CREF6] BB: If these mechanisms are being used, surely they will be
		being used to /prevent/ spurious retransmissions (not just count
		them but still retransmit anyway). So, if we increase LEG only
		when a retransmission actually occurs, is that not sufficient?

		[CREF7] BB: OK, I get that. But, as above, why worry about optimising a
		case that is becoming rare, because everyone recognised late
		retransmission was a problem, so SACK is pretty much universally
		deployed. Would you be unhappy if all this was deleted?
		Perhaps relegate to an appendix? But is it really so necessary?

		[CREF8] BB: I think 3 has been used instead of num_dup in the LEC
		algorithm earlier.

		[CREF9] BB: I changed 'a' to 'the'. Did you mean a generally more
		accurate scheme, or the AccECN scheme in particular? If the
		latter, as it stands, the AccECN scheme doesn't give marked
		bytes.

		[CREF10] BB: Surely RFC5562 only adds ECT on the SYN/ACK. Is it really
		necessary to even refer to it in this draft? Whatever, it
		doesn't seem particularly relevant to this sentence. Or did
		you mean RFC3168?

		[CREF11] BB: I thought the result of the discussion about how to say
		whether the X flag is set in conex-destopt was that X is set
		irrespective of whether loss or ECN marking of the packet
		itself can be detected. The relevant sentence in conex-destopt
		is: "This [X=0] can be the case if no congestion feedback is
		(currently) available e.g. in TCP if one endpoint has been
		receiving data but sending nothing but pure ACKs (no user data)
		for some time."

		[CREF12] BB: I would prefer if this were stated at the maximum required,
		not a recommended value. The idea is to hold as much credit as
		the /likely/ worst-case congestion, not the /absolute/ worst
		case (I did experiments to find the variance of congestion in
		my PhD).

		[CREF13] BB: Again, rather than a SHOULD, can we make this a
		recommendation that is part of the reason for ConEx
		experimentation? - especially if variants like hybrid SS are
		enabled.

		[CREF14] BB: Just marking every fourth packet doesn't work for a general
		IW. During the IW, mark the first packet and every other
		packet, then after IW mark every fourth packet (to determine
		precisely which is the first packet to mark after the IW,
		maintain a packet counter and double it when IW ends).

		[CREF15] BB: Whoa! This is rather excessively conservative isn't it?
		There will often be a loss in 2 consecutive RTTs due to normal
		congestion. If there's a re-route, I think the new audit will
		drop a whole window, so the sender will naturally send a whole
		window's worth of credit with the retransmissions. Am I wrong?

		[CREF16] BB: This adds complexity. I would suggest this is a MAY. It
		depends on how audit is done whether it is necessary, so this
		will depend on experiments. For instance, in the audit
		function I designed, there was a long term and a short term
		comparison, and the long term one became more relaxed the
		longer the flow had been behaving. (Note I have also suggested
		moving this and the next para from "Setting E/L" to
		"Timeliness")

		[CREF17] BB: As before, I disagree with the need for this para - this is
		trying to optimise a case that is rare because it's known to be
		sub-optimal, by compromising ConEx timeliness. SACK is nearly
		universal. If SACK isn't available, things are bound to be non-
		optimal. The solution is for the receiver to deploy SACK like
		nearly every other receiver has done, not to add more
		complexity to the sender and more delay to ConEx.

	Authors' Addresses	Authors' Addresses

	Mirja Kuehlewind (editor)	Mirja Kuehlewind (editor)
	ETH Zurich	ETH Zurich
	Switzerland	Switzerland

	Email: mirja.kuehlewind@tik.ee.ethz.ch	Email: mirja.kuehlewind@tik.ee.ethz.ch

	Richard Scheffenegger	Richard Scheffenegger
	NetApp, Inc.	NetApp, Inc.

End of changes. 74 change blocks.
	314 lines changed or deleted	547 lines changed or added
This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/