Diff: draft-mathis-conex-abstract-mech-00a.txt - draft-mathis-conex-abstract-mech-00c.txt

	< draft-mathis-conex-abstract-mech-00a.txt	draft-mathis-conex-abstract-mech-00c.txt >

	Congestion Exposure (ConEx) M. Mathis	Congestion Exposure (ConEx) M. Mathis
	Working Group Google	Working Group Google

	Internet-Draft October 14, 2010	Internet-Draft B. Briscoe
	Intended status: Informational	Intended status: Informational BT
	Expires: April 17, 2011	Expires: April 18, 2011 October 15, 2010


	ConEx Concepts and Abstract Mechanism	Congestion Exposure (ConEx) Concepts and Abstract Mechanism
	draft-mathis-conex-abstract-mech-00a	draft-mathis-conex-abstract-mech-00c

	Abstract	Abstract


	This document describes and abstract mechanism by which senders	This document describes an abstract mechanism by which senders inform
	inform the network about the congestion encountered by previous	the network about the congestion encountered by packets earlier in
	packets on the same flow. Today, the network may signal congestion	the same flow. Today, the network may signal congestion to the
	by ECN markings or by dropping packets, and the receiver passes this	receiver by ECN markings or by dropping packets, and the receiver may
	information back to the sender in transport-layer acknowledgments.	pass this information back to the sender in transport-layer feedback.
	The mechanism to be developed by the CONEX WG will enable the sender	The mechanism to be developed by the ConEx WG will enable the sender
	to also relay the congestion information back into the network in-	to also relay this congestion information back into the network in-
	band at the IP layer, such that the total level of congestion is	band at the IP layer, such that the total level of congestion is
	visible to all IP devices along the path, from where it could, for	visible to all IP devices along the path, from where it could, for
	example, be provided as input to traffic management.	example, be provided as input to traffic management.

	Status of This Memo	Status of This Memo

	This Internet-Draft is submitted in full conformance with the	This Internet-Draft is submitted in full conformance with the
	provisions of BCP 78 and BCP 79.	provisions of BCP 78 and BCP 79.

	Internet-Drafts are working documents of the Internet Engineering	Internet-Drafts are working documents of the Internet Engineering
	Task Force (IETF). Note that other groups may also distribute	Task Force (IETF). Note that other groups may also distribute
	working documents as Internet-Drafts. The list of current Internet-	working documents as Internet-Drafts. The list of current Internet-
	Drafts is at http://datatracker.ietf.org/drafts/current/.	Drafts is at http://datatracker.ietf.org/drafts/current/.

	Internet-Drafts are draft documents valid for a maximum of six months	Internet-Drafts are draft documents valid for a maximum of six months
	and may be updated, replaced, or obsoleted by other documents at any	and may be updated, replaced, or obsoleted by other documents at any
	time. It is inappropriate to use Internet-Drafts as reference	time. It is inappropriate to use Internet-Drafts as reference
	material or to cite them other than as "work in progress."	material or to cite them other than as "work in progress."


	This Internet-Draft will expire on April 17, 2011.	This Internet-Draft will expire on April 18, 2011.

	Copyright Notice	Copyright Notice

	Copyright (c) 2010 IETF Trust and the persons identified as the	Copyright (c) 2010 IETF Trust and the persons identified as the
	document authors. All rights reserved.	document authors. All rights reserved.

	This document is subject to BCP 78 and the IETF Trust's Legal	This document is subject to BCP 78 and the IETF Trust's Legal
	Provisions Relating to IETF Documents	Provisions Relating to IETF Documents
	(http://trustee.ietf.org/license-info) in effect on the date of	(http://trustee.ietf.org/license-info) in effect on the date of
	publication of this document. Please review these documents	publication of this document. Please review these documents
	carefully, as they describe your rights and restrictions with respect	carefully, as they describe your rights and restrictions with respect
	to this document. Code Components extracted from this document must	to this document. Code Components extracted from this document must
	include Simplified BSD License text as described in Section 4.e of	include Simplified BSD License text as described in Section 4.e of
	the Trust Legal Provisions and are provided without warranty as	the Trust Legal Provisions and are provided without warranty as
	described in the Simplified BSD License.	described in the Simplified BSD License.

	Table of Contents	Table of Contents


	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3	1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
	1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 4	1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
	2. Requirements for the Congestion Exposure Signal . . . . . . . . 4	2. Requirements for the Congestion Exposure Signal . . . . . . . 5
	3. Representing Congestion Exposure . . . . . . . . . . . . . . . 5	3. Representing Congestion Exposure . . . . . . . . . . . . . . . 7
	3.1. One Simple Encoding . . . . . . . . . . . . . . . . . . . . 6	3.1. Strawman Encoding . . . . . . . . . . . . . . . . . . . . 7
	3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 6	3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 8
	3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . . 7	3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . 9
	3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . . 7	3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . 9
	3.3.1. Separate Bits . . . . . . . . . . . . . . . . . . . . . 7	3.3.1. Independent Bits . . . . . . . . . . . . . . . . . . . 9
	3.3.2. Enumerated Encoding . . . . . . . . . . . . . . . . . . 8	3.3.2. Codepoint Encoding . . . . . . . . . . . . . . . . . . 10
	4. Congestion Exposure Components . . . . . . . . . . . . . . . . 8	4. Congestion Exposure Components . . . . . . . . . . . . . . . . 10
	4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 8	4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 10
	4.2. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 8	4.2. Receivers (Optionally Modified) . . . . . . . . . . . . . 11
	4.2.1. Audit . . . . . . . . . . . . . . . . . . . . . . . . . 8	4.3. Audit . . . . . . . . . . . . . . . . . . . . . . . . . . 11
	4.2.2. Policers and Shapers . . . . . . . . . . . . . . . . . 8	4.4. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 12
	5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8	4.4.1. Congestion Policers . . . . . . . . . . . . . . . . . 12
	6. Security Considerations . . . . . . . . . . . . . . . . . . . . 8	4.4.2. Other Policy Devices . . . . . . . . . . . . . . . . . 12
	7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 9	5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
	8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9	6. Security Considerations . . . . . . . . . . . . . . . . . . . 13
	9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 9	7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 13
	10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9	8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13
	10.1. Normative References . . . . . . . . . . . . . . . . . . . 9	9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 13
	10.2. Informative References . . . . . . . . . . . . . . . . . . 9	10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
		10.1. Normative References . . . . . . . . . . . . . . . . . . . 13
		10.2. Informative References . . . . . . . . . . . . . . . . . . 13

	1. Introduction	1. Introduction

	One of the required functions of a transport protocol is controlling	One of the required functions of a transport protocol is controlling
	congestion in the network. There are three techniques in use today	congestion in the network. There are three techniques in use today

	for signaling congestion:	for the network to signal congestion to a transport:

	o The most common congestion signal is packet loss. When congested,	o The most common congestion signal is packet loss. When congested,
	the network simply discards some packets either as part of an	the network simply discards some packets either as part of an
	explicit control function [RFC2309] or as the consequence of a	explicit control function [RFC2309] or as the consequence of a
	queue overflow or other resource starvation. The transport	queue overflow or other resource starvation. The transport
	receiver detects that some data is missing and signals such	receiver detects that some data is missing and signals such
	through transport acknowledgments to the transport sender (e.g.	through transport acknowledgments to the transport sender (e.g.

	TCP SACK options). The sender retransmits the missing data (if a	TCP SACK options). The sender performs the appropriate congestion
	reliable protocol) and then performs the mandatory congestion	control rate reduction (e.g. [RFC5681] for TCP) and, if it is a
	control adjustment [RFC5681].	reliable transport, it retransmits the missing data.


	o Some experimental transport protocols and TCP variants [Vegas,	o If the transport supports explicit congestion notification (ECN)
	I-D.ietf-ledbat-congestion...] sense queuing delays in the network	[RFC3168] or pre-congestion notification (PCN) [RFC5670], the
	before the network itself signals congestion. From the	transport sender indicates this by setting an ECN-capable
	perspective of this document, these algorithm and related	transport (ECT) codepoint in every packet. Network devices can
	techniques prevent congestion, therefore they are out of scope and	then explicitly signal congestion to the receiver by setting ECN
	are not discussed further in this document.	bits in the IP header of such packets. The transport receiver
		communicates these ECN signals back to the sender, which then
		performs the appropriate congestion control rate reduction.


	o With Explicit Congestion Notification (ECN) [RFC3168], network	o Some experimental transport protocols and TCP variants [Vegas]
	devices explicitly indicate congestion by setting ECN bits in the	sense queuing delays in the network and reduce their rate before
	IP header. The transport receiver communicates these signals back	the network has to signal congestion using loss or ECN. A purely
	to the sender, which then performs the mandatory congestion	delay-sensing transport will tend to be pushed out by other
	control adjustment.	competing transports that do not back off until they have driven
		the queue into loss. Therefore, modern delay-sensing algorithms
		use delay in some combination with loss to signal congestion (e.g.
		LEDBAT [I-D.ietf-ledbat-congestion], Compound
		[I-D.sridharan-tcpm-ctcp]). In the rest of this document, we will
		confine the discussion to concrete signals of congestion such as
		loss and ECN. We will not discuss delay-sensing further, because
		it can only avoid these more concrete signals of congestion in
		some circumstances.

	In all cases the congestion signals follow the route indicated in	In all cases the congestion signals follow the route indicated in
	Figure 1. A congested network device sends a signal in the data	Figure 1. A congested network device sends a signal in the data
	stream on the forward path to the transport receiver, the receiver	stream on the forward path to the transport receiver, the receiver

	passes it back to the sender through transport level acknowledgments,	passes it back to the sender through transport level feedback, and
	and the sender makes some congestion control adjustment.	the sender makes some congestion control adjustment.

	This document proposes to extend the capabilities of the Internet	This document proposes to extend the capabilities of the Internet

	suite with the addition of a Congestion Exposure Signal that relays	protocol suite with the addition of a Congestion Exposure Signal
	the congestion information from the Transport Sender back through the	that, to a first approximation, relays the congestion information
	network layer. That signal is shown Figure 1. It would be visible	from the transport sender back through the internetwork layer. That
	to all network layer devices along the forward (data) path and is	signal is shown in Figure 1. It would be visible to all internetwork
	intended to support a number of new policy mechanism that might be	layer devices along the forward (data) path and is intended to
		support a number of new policy-controlled mechanisms that might be
	used to manage traffic.	used to manage traffic.

		123456789012345678901234567890123456789012345678901234567890123456789
	1234567890123456789012345678901234567890123456789012345678901234567890	+---------+                                               +---------+
	-----------              -------------                     -----------	|         |<==Feedback Path==============================<|         |
	|         |              |(Congested)|                     |         |	|         |<--Transport Layer returned Congestion Signal-<|         |
	|         |>==Data=Path=>|  Network  |>=====Data=Path=====>|         |	|         |                                               |         |
	|Transport|              |  Device   |>-Congestion-Signal->|Transport|	|Transport|                                               |Transport|
	| Sender  |              -------------                     | Receiver|	| Sender  |>-(new)-IP layer Congestion Exposure Signal--->| Receiver|
	|         |                                                |         |	|         |       (Carried in Data Packet Headers)        |         |
	|         |<====ACK=Path==================================<|         |	|         |             +-----------+                     |         |
	|         |<---Transport Layer returned Congestion Signal-<|         |	|         |>=Data=Path=>|(Congested)|>=====Data=Path=====>|         |
	|         |                                                |         |	|         |             |  Network  |>-Congestion-Signal->|         |
	|         |>-(new)-IP layer Congestion Exposure Signal---->|         |	|         |             |  Device   |                     |         |
	-----------           (Carried in Data Packets)            -----------	+---------+             +-----------+                     +---------+

	Not shown are policy devices along the data path that observe the	Not shown are policy devices along the data path that observe the
	Congestion Exposure Signal, and use the information to monitor or	Congestion Exposure Signal, and use the information to monitor or

	manage traffic. These are discussed in Section 4.2.	manage traffic. These are discussed in Section 4.4.

	Figure 1	Figure 1


	1.1. Requirements Language	1.1. Terminology

	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",	The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this	"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
	document are to be interpreted as described in RFC 2119 [RFC2119].	document are to be interpreted as described in RFC 2119 [RFC2119].


		ConEx signals in IP packet headers from the sender to the network
		{ToDo: These are placeholders for whatever words we decide to use}:

		Not-ConEx (aka White) The transport is not ConEx-capable

		ConEx (aka Grey) The transport is ConEx-capable

		Re-Echo-Loss (aka Purple) The transport has experienced a loss.

		Re-Echo-ECN (aka Black) The transport has experienced an ECN mark

		Credit (aka Green) The transport is building up credit to allow for
		any future delay in expected ConEx signals

		ConEx-Marked Any of Re-Echo-Loss, Re-Echo-ECN or Credit.

		ConEx-Unmarked ConEx, but not ConEx-Marked.

	2. Requirements for the Congestion Exposure Signal	2. Requirements for the Congestion Exposure Signal


	a. The Congestion Exposure Signal must be visible to the network	Ideally, all the following requirements would be met by a Congestion
	layer along the entire path from the transport sender to the	Exposure Signal. However, it is already known that some compromises
	transport receiver. Equivalently, it must be present in the IPv4	will be necessary; therefore all the requirements are expressed with
	or IPv6 header. A corollary of this is that existing (legacy)	the keyword 'SHOULD' rather than 'MUST'. The only mandatory
	networking gear must at the very minimum pass the Congestion	requirement is that a concrete protocol description MUST give sound
	Exposures Signal without modification.	reasoning if it chooses not to meet any of these requirements:


	b. The Congestion Exposure Signal must be useful under only partial	a. The Congestion Exposure Signal SHOULD be visible to internetwork
	deployment. A minimal deployment must only require changes to	layer devices along the entire path from the transport sender to
	the transport senders. Furthermore, partial deployment should	the transport receiver. Equivalently, it SHOULD be present in
	create incentives for additional deployment, both in terms of	the IPv4 or IPv6 header, and in the outermost IP header if using
	enabling Congestion Exposure on more devices and adding richer	IP in IP tunnelling. The Congestion Exposure Signal SHOULD be
	features to existing devices. It is anticipated that ConEx	immutable once set by the transport sender. A corollary of these
	deployment will be asymptotic, and some residual class of hosts	requirements is that existing (legacy) networking gear SHOULD
	and network equipment will never fully support the Congestion	pass the Congestion Exposure Signal silently without
	Exposure Protocol.	modification.


	c. The Congestion Exposure Signal must be timely and accurate. It	b. The Congestion Exposure Signal SHOULD be useful under only
	must not be delayed by significantly more than one RTT from the	partial deployment. A minimal deployment SHOULD only require
	congestion event which triggered the signal. There must be	changes to transport senders. Furthermore, partial deployment
	techniques to audit the Congestion Exposure Signal by comparing	SHOULD create incentives for additional deployment, both in terms
	it to the actual congestion signals on the forward data path.	of enabling Congestion Exposure on more devices and adding richer
	The auditing mechanism must have a capability for providing	features to existing devices. Nonetheless, ConEx deployment need
	strong disincentives for miss-reporting congestion, such as by	never be universal, and it is anticipated that some hosts and
		some transports may never support the Congestion Exposure
		Protocol and some networks may never use the Congestion Exposure
		Signals.

		c. The Congestion Exposure Signal SHOULD be accurate. In
		potentially hostile environments such as the public Internet, it
		SHOULD be possible for techniques to be deployed to audit the
		Congestion Exposure Signal by comparing it to the actual
		congestion signals on the forward data path. The auditing
		mechanism must have a capability for providing sufficient
		disincentives against misreported congestion, such as by
	throttling traffic that reports less congestion than it is	throttling traffic that reports less congestion than it is
	actually experiencing.	actually experiencing.


		d. The Congestion Exposure Signal SHOULD be timely. There will be a
		delay between the time when an auditing device sees an actual
		congestion signal and when it sees the subsequent Congestion
		Exposure Signal from the sender. The minimum delay will be one
		round trip, but it may be much longer depending on the
		transport's choice of feedback delay (consider RTCP [RFC3550] for
		example). It is not practical to expect auditing devices in the
		network to make allowance for such feedback delays. Instead, the
		sender SHOULD be able to send Congestion Exposure signals in
		advance, as 'credit' for any audit device to hold as a balance
		against the risk of congestion during the feedback delay. This
		design choice simplifies auditing devices and correctly makes the
		transport responsible for both minimising feedback delay and
		minimising sharp increases in packets in flight that would risk
		causing excessive congestion to others. This issue is discussed
		in more detail in Section 4.3.

	It is important to note that the auditing requirement implies a	It is important to note that the auditing requirement implies a
	number of additional constraints: The basic auditing technique is to	number of additional constraints: The basic auditing technique is to

	count both congestion signals and Congestion Exposure Signals	count both actual congestion signals and Congestion Exposure Signals
	someplace along the data path. For congestion signaled by ECN, this	someplace along the data path:
	is most accurate when done near the transport receiver. The total
	number of ECN marks seen near the receiver should always be equal to
	or less than the number of Congestion Exposure Signals seen one RTT
	later.


	Auditing loss based Congestion Exposure can most easily be	o For congestion signaled by ECN, auditing is most accurate when
	implemented near the sender, since down stream losses appear as	located near the transport receiver. Within any flow or aggregate
	duplicate data for all reliable protocols (and duplicate sequence	of flows, the total volume of ECN marked data seen near the
	numbers for TCP). The auditor can detect losses by observing both	receiver should always be equal to or less than the volume of data
	the original transmission and the retransmission after the loss.	tagged with Congestion Exposure Signals.
	(This method does assume that IPsec is not in use).


	Given that loss based and ECN based Congestion Exposure are best	o For congestion signaled by loss, totally accurate auditing is not
	audited at different locations, it is likely that they will need to	believed to be possible in the general case, because it involves a
	have distinct encodings. In addition the simplest mechanism to	network node detecting the absence of some packets, when it cannot
	address the one RTT delay between the congestion event and the	necessarily see the transport protocol sequence numbers and when
	Congestion Exposure Signal is to pre-mark some packets with a special	the missing packets might simply be taking a different route. But
	Congestion Exposure credit prior any true congestion marks. This	there are common cases where sufficient audit accuracy should be
	technique is described in more detail in Section 4.2.1.	possible:

		* For non-IPsec traffic conforming to standard TCP sequence
		numbering on a single path, an auditor could detect losses by
		observing both the original transmission and the retransmission
		after the loss. Such auditing would be most accurate near the
		sender.

		* For networks designed so that losses predominantly occur under
		the management of one IP-aware node on the path, the auditor
		could be located at this bottleneck. It could simply compare
		Congestion Exposure Signals with actual local losses. This is
		a good model for most consumer access networks and audit
		accuracy could well be sufficient even if losses occasionally
		occurred at other nodes in the network, such as border gateways
		(see Section 4.3 for details).

		Given that loss-based and ECN-based Congestion Exposure might
		sometimes be best audited at different locations, having distinct
		encodings would widen the design space for the auditing function.
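The ECN audit rule above (the volume of ECN-marked data seen near the receiver should never exceed the volume of data tagged with Congestion Exposure Signals) can be sketched as a simple byte balance. This is only an illustrative sketch, not part of either draft revision: the class name EcnAuditor, its methods, and the idea of a single per-flow balance are assumptions, and it presumes the auditor has already classified each packet. Credit sent in advance, as in requirement (d), is modelled simply as ConEx-marked bytes that arrive before the congestion they cover.

```python
class EcnAuditor:
    """Hypothetical per-flow auditor located near the transport receiver."""

    def __init__(self) -> None:
        self.congested_bytes = 0  # bytes seen carrying an ECN congestion mark
        self.exposed_bytes = 0    # bytes seen ConEx-Marked (re-echo or credit)

    def observe(self, size: int, ecn_marked: bool, conex_marked: bool) -> None:
        """Account for one packet of `size` bytes."""
        if ecn_marked:
            self.congested_bytes += size
        if conex_marked:
            self.exposed_bytes += size

    def balance(self) -> int:
        """Exposed volume minus actual congestion volume, in bytes."""
        return self.exposed_bytes - self.congested_bytes

    def is_compliant(self) -> bool:
        """True while the flow has exposed at least as much as it experienced."""
        return self.balance() >= 0


auditor = EcnAuditor()
auditor.observe(1500, ecn_marked=False, conex_marked=True)   # credit sent in advance
auditor.observe(1500, ecn_marked=True, conex_marked=False)   # congestion arrives later
print(auditor.balance(), auditor.is_compliant())
```

A negative balance would be the trigger for whatever sanction the audit function applies, such as the throttling mentioned in requirement (c); the sanction itself is outside this sketch.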

	3. Representing Congestion Exposure	3. Representing Congestion Exposure

	Most protocol specifications start with a description of packet	Most protocol specifications start with a description of packet

	formats and code points with their associated meanings. This	formats and codepoints with their associated meanings. This document
	document does not: It is already known that choosing the encoding for	does not: It is already known that choosing the encoding for the
	the Congestion Exposure Signal is likely to entail some engineering	Congestion Exposure Signal is likely to entail some engineering
	compromises that have the potential to reduce the protocol's	compromises that have the potential to reduce the protocol's
	usefulness in some settings. Rather than making these engineering	usefulness in some settings. Rather than making these engineering
	choices prematurely, this document sidesteps the encoding problem by	choices prematurely, this document sidesteps the encoding problem by

	describing an abstract representation of Congestion Exposure Signal.	describing an abstract representation of a Congestion Exposure
	All of the elements of the protocol can be defined in terms of this	Signal. All of the elements of the protocol can be defined in terms
	abstract representation. Most important, the preliminary use cases	of this abstract representation. Most important, the preliminary use
	for the protocol are described in terms of the abstract	cases for the protocol are described in terms of the abstract
	representation in companion documents.	representation in companion documents [I-D.conex-concepts-uses].

	Once we have some example use cases we can evaluate different	Once we have some example use cases we can evaluate different

	encoding schemes. Since theses schemes are likely to include some	encoding schemes. Since these schemes are likely to include some
	conflated code points, some information will be lost resulting in	conflated code points, some information will be lost resulting in
	weakening or disabling some of the algorithms and eliminating some	weakening or disabling some of the algorithms and eliminating some
	use cases.	use cases.

	The goal of this approach is to be as complete as possible for	The goal of this approach is to be as complete as possible for
	discovering the potential usage and capabilities of the Congestion	discovering the potential usage and capabilities of the Congestion

	Exposure protocol, so we have some hope of of making optimal design	Exposure protocol, so we have some hope of making optimal design
	decisions when choosing the encoding.	decisions when choosing the encoding.


	3.1. One Simple Encoding	3.1. Strawman Encoding

	As an aid to the reader, it might be helpful to describe one simple
	encoding of the Congestion Exposure protocol: set IPv4 header bit 48
	(aka the "evil bit" [RFC3514]) on all retransmissions or once per ECN
	signaled window reduction. Clearly network devices along the forward
	path can see this bit and act on it. For example they can count
	marked and unmarked packets to estimate the congestion levels along
	the path.


	However this encoding has been forbidden by RFC xxxx, which seeks to	As an aid to the reader, it might be helpful to describe a naive
	preserve the last unallocated bit in the IPv4 header for some	strawman encoding of the Congestion Exposure protocol described
	unspecifed future use.	solely in terms of TCP: set the Reserved bit in the IPv4 header (bit
		48 counting from zero [RFC0791]--aka the "evil bit" [RFC3514]) on all
		retransmissions or once per ECN signaled window reduction. Clearly
		network devices along the forward path can see this bit and act on
		it. For example they can count marked and unmarked packets to
		estimate the congestion levels along the path.
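The counting that a network device could perform on the strawman bit can be sketched as follows. This is a minimal illustration under stated assumptions, not text from either revision: the name StrawmanCounter and its methods are hypothetical, and the device is assumed to have already read the state of the bit from each packet header.

```python
from dataclasses import dataclass


@dataclass
class StrawmanCounter:
    """Hypothetical counter of packets with and without the strawman bit set."""
    marked: int = 0
    unmarked: int = 0

    def observe(self, bit_set: bool) -> None:
        """Count one packet according to the state of the strawman bit."""
        if bit_set:
            self.marked += 1
        else:
            self.unmarked += 1

    def marked_fraction(self) -> float:
        """Fraction of observed packets that were marked (0.0 if none seen)."""
        total = self.marked + self.unmarked
        return self.marked / total if total else 0.0


counter = StrawmanCounter()
for bit_set in (False, False, False, True):
    counter.observe(bit_set)
print(counter.marked_fraction())
```

The sketch counts packets because that is what the strawman description says; a byte-weighted count would be a straightforward variation.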


	Furthermore this encoding, by itself, does not sufficiently support	However, the IESG has chartered the ConEx working group to establish
	partial deployment or strong auditing and might motivate users and/or	that there is sufficient demand for an IPv6 ConEx protocol before
	applications to misrepresent the congestion that they are be causing.	using the last available bit in the IPv4 header. Furthermore this
		encoding, by itself, does not sufficiently support partial deployment
		or strong auditing and might motivate users and/or applications to
		misrepresent the congestion that they are causing.


	However, this simple encoding does present a clear mental model of	Nonetheless, this strawman encoding does present a clear mental model
	how the Congestion Exposure protocol functions and is very useful for	of how the Congestion Exposure protocol might function under various
	conducting thought experiments about how the protocol might function	uses.
	under various uses.

	3.2. ECN Based Encoding	3.2. ECN Based Encoding


	Bob Briscoe's PhD thesis [Refb-dis], and many derivative works	The re-ECN specification [I-D.briscoe-tsvwg-re-ecn-tcp] presents an
	including RE-ECN [I-D.briscoe-tsvwg-re-ecn-tcp] present an ECN based	ECN based implementation of ConEx. The central theme of this work is
	implementation of ConEx. The central theme of this work includes	an audit mechanism that can provide sufficient disincentives against
	strong disincentives for misrepresenting congestion	misrepresenting congestion [I-D.briscoe-tsvwg-re-ecn-motiv], which is
	[I-D.briscoe-tsvwg-re-ecn-motiv]. However, it also pre-supposes the	analysed extensively in Briscoe's PhD dissertation [Refb-dis].
	full deployment of ECN, and does not adequately signal congestion
	indicated by packet loss. Furthermore, given that after 10 years ECN
	still has not been widely deployed, it does not seem prudent to
	require its deployment as a prerequisite for deploying a Congestion
	Exposure protocol.


	As it currently stands, this work fails to meet the "partial	The re-ECN encoding is tightly integrated with the encoding of ECN in
	deployment" requirement described above in section Section 2.	the IP header. However, re-ECN can be incrementally deployed on
		hosts whether or not any networks support ECN marking and whether or
		not any networks take note of re-ECN markings. Nonetheless, the
		audit function has only been formally analysed where at least one
		autonomous network has deployed ECN marking, which it uses to audit
		whether the Congestion Exposure Signal matches actual congestion.

		Thus, even if networks have not deployed ECN, re-ECN acts perfectly
		well as a loss-based Congestion Exposure protocol. As such, a
		network could potentially audit re-ECN signals against losses using
		the loss-based audit techniques in Section 4.3, rather than deploying
		ECN.

		Although re-ECN does not require networks to support ECN, it still
		embodies a major incremental deployment challenge; a sender cannot
		use re-ECN unless the receiver at least supports ECN. Most operating
		systems currently being supplied (late 2010) implement ECN, but it is
		turned off by default at the client end, even though it is on by
		default at the server end. This is primarily because one home
		gateway model widely supplied in 2006 crashes if a TCP client behind
		it attempts to use ECN (there are issues with some other home
		gateways from that era, but they are surmountable with ECN black-hole
		detection).

		Given that, 10 years after standardisation, ECN has still not been
		widely enabled on TCP clients, if at all possible the Congestion
		Exposure protocol should not require the receiver to be ECN capable.
		Therefore, as it currently stands, the re-ECN encoding would fail to
		meet the "partial deployment" requirement of Section 2.

	For a tutorial background on Re-Feedback techniques, see [,,] {Bob:	For a tutorial background on Re-Feedback techniques, see [,,] {Bob:
	Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE	Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE
	Spectrum article? Re-ECN Web page?}.	Spectrum article? Re-ECN Web page?}.

	3.2.1. ECN Changes	3.2.1. ECN Changes


	It is important to note that Briscoe's work proposes some relatively	Although the re-ECN protocol requires no changes to the network side
	minor modifications to the ECN protocol specified in RFC 3168. They	of the ECN protocol, it is important to note that it does propose
	include: redefining the ECT(0) and ECT(1) code points (this is	some relatively minor modifications to the host-to-host aspects of
	consistent with RFC3168 but requires deprecating [RFC3540]);	the ECN protocol specified in RFC 3168. They include: redefining the
	permitting routers to send ECN signals at a different threshold than	ECT(1) code point (the change is consistent with RFC3168 but requires
	packet loss; modifications to the ECN negotiations carried on the SYN	deprecating the experimental ECN nonce [RFC3540]); modifications to
	and SYN-ACK; and using a different state machine to carry ECN signals	the ECN negotiations carried on the SYN and SYN-ACK; and using a
	in the transport acknowledgments from the Receiver to the Sender.	different state machine to carry ECN signals in the transport
	This later change permits the transport protocol to carry multiple	acknowledgments from the Receiver to the Sender. This last change
	congestion signals per round trip, and greatly simplifies accurate	permits the transport protocol to carry multiple congestion signals
	auditing.	per round trip, and greatly simplifies accurate auditing.

	All of these adjustments to RFC 3168 may also be needed in a future	All of these adjustments to RFC 3168 may also be needed in a future

	standardized Congestion Exposure protocol. There will be very	standardized Congestion Exposure protocol. There will need to be
	careful considerations about any proposed changes to ECN or other	very careful consideration of any proposed changes to ECN or other
	existing protocols, because any such changes increase the cost of	existing protocols, because any such changes increase the cost of
	deployment.	deployment.

	3.3. Abstract Encoding	3.3. Abstract Encoding


	{ToDo: Not really done, extra terse}	The Congestion Exposure protocol could take one of two different
		encodings: independently settable bits or an enumerated set of
		mutually exclusive codepoints.


	Model with two different encodings: individual bits or as an	In both cases, the amount of congestion is signaled by the volume of
	enumerated set. Enumerated encoding is probably good enough for most	marked data--just as the volume of lost data or ECN marked data
	purposes, but it must not be forgotten that it does lose some small	signals the amount of congestion experienced. Thus the size of each
	amount of information.	packet carrying a Congestion Exposure Signal is significant.


	3.3.1. Separate Bits	3.3.1. Independent Bits


	One bit each for	This encoding involves a field of four flag bits, each of which the
		sender can set independently to indicate to the network that:


	o Not supported (implicit signal from legacy transport senders)	ConEx (Not-ConEx) The transport is (or is not) using ConEx with this
		packet (the protocol MUST be arranged so that legacy transport
		senders implicitly send Not-ConEx)


	o Congestion indicated by packet losses	Re-Echo-Loss (Not-Re-Echo-Loss) The transport has (or has not)
		experienced a loss


	o ECN signaled congestion	Re-Echo-ECN (Not-Re-Echo-ECN) The transport has (or has not)
		experienced ECN signaled congestion


	o Pre-congestion credit (AKA green). See Section 4.2.1 devices	Credit (Not-Credit) The transport is (or is not) building up
	below.	congestion credit (see Section 4.3 on audit devices)


	3.3.2. Enumerated Encoding	3.3.2. Codepoint Encoding


	For enumerated encoding some marks must be delayed such that each	This encoding involves a bit-field large enough to signal one of the
	packet only carries at most one mark.	following five codepoints:


	ENUM {Not_Supported, No_Mark, Black_ECN, Black_Loss, Green}	ENUM {Not-ConEx, ConEx, Re-Echo-Loss, Re-Echo-ECN, Credit}

		Each named codepoint has the same meaning as in the encoding using
		independent bits (Section 3.3.1). The use of any one codepoint
		implies the negative of all the others, except that the last three
		codepoints (Re-Echo-Loss, Re-Echo-ECN and Credit) also imply that
		ConEx is supported.

		Inherently, the semantics of most of the enumerated codepoints are
		mutually exclusive. 'Credit' is the only one that might need to be
		used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
		that requirement is questionable. It must not be forgotten that the
		enumerated encoding loses the flexibility to signal these two
		combinations, whereas the encoding with four independent bits is not
		so limited. Alternatively two extra codepoints could be assigned to
		these two combinations of semantics.
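The difference between the two encodings can be sketched in a few lines
of Python. This is an illustration only: the class names, the bit
positions and the numeric codepoint values are assumptions made for the
example, since the draft assigns none. The helper to_codepoint() is
likewise hypothetical; it shows that a Credit mark combined with either
Re-Echo mark has no equivalent among the five codepoints.

```python
from enum import Enum, IntFlag


class ConExBits(IntFlag):
    """Independent-bits encoding: four flags set separately (bit positions assumed)."""
    CONEX = 0b0001
    RE_ECHO_LOSS = 0b0010
    RE_ECHO_ECN = 0b0100
    CREDIT = 0b1000


class ConExCodepoint(Enum):
    """Codepoint encoding: exactly one value per packet (numeric values assumed)."""
    NOT_CONEX = 0
    CONEX = 1
    RE_ECHO_LOSS = 2
    RE_ECHO_ECN = 3
    CREDIT = 4


# Single marks that have a direct codepoint equivalent.
_MARK_TO_CODEPOINT = {
    ConExBits.RE_ECHO_LOSS: ConExCodepoint.RE_ECHO_LOSS,
    ConExBits.RE_ECHO_ECN: ConExCodepoint.RE_ECHO_ECN,
    ConExBits.CREDIT: ConExCodepoint.CREDIT,
}


def to_codepoint(bits):
    """Map a flag combination to one codepoint, or None if it cannot be expressed."""
    bits = ConExBits(bits)
    if bits == ConExBits(0):
        return ConExCodepoint.NOT_CONEX
    if not bits & ConExBits.CONEX:
        # Assumption of this sketch: a mark without the ConEx flag is invalid.
        return None
    marks = bits & ~ConExBits.CONEX
    if marks == ConExBits(0):
        return ConExCodepoint.CONEX
    # Combinations such as Credit plus Re-Echo-ECN are not in the table.
    return _MARK_TO_CODEPOINT.get(marks)
```

Under these assumptions, to_codepoint(ConExBits.CONEX |
ConExBits.RE_ECHO_ECN) yields the Re-Echo-ECN codepoint, whereas adding
ConExBits.CREDIT to that combination yields None unless extra
codepoints are assigned.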

		{ToDo: Default behaviour for Currently Unused codepoints}

		{ToDo: Signal from Policer to Receiver to distinguish policy-induced
		drop from congestion-induced drop}

		Some might prefer to refer to the codepoints by colour. In the same
		order as the enumeration above, the colours are as follows; the same
		colours (with the omission of Purple) were used to describe the
		re-ECN codepoints:

		ENUM {White, Grey, Purple, Black, Green}.

	4. Congestion Exposure Components	4. Congestion Exposure Components


		{ToDo: Picture of the components, similar to that in the last
		slideset about conex-concepts-uses?}

	4.1. Modified Senders	4.1. Modified Senders


	Send Congestion Exposure Signals per congestion signals.	The sending transport needs to be modified to send Congestion
		Exposure Signals in response to congestion feedback signals.


	4.2. Policy Devices	4.2. Receivers (Optionally Modified)


	4.2.1. Audit	The receiving transport may already feed back sufficiently useful
		signals to the sender so that it does not need to be altered.


	For loss: detect retransmissions by monitoring sequence numbers.	However, a TCP receiver feeds back ECN congestion signals no more
	Assure that #retransmissions<=#Black_Loss	than once within a round trip. The sender may require more precise
		feedback from the receiver; otherwise it will appear to be
		understating its Congestion Exposure Signals (see Section 3.2.1).


	(May need to include a fudge factor, because it would be more robust	Ideally, Congestion Exposure should be added to a transport like TCP
	to mark the packet after a retransmission. Otherwise network devices	without mandatory modifications to the receiver. But an optional
	that discard marked packets will cause connectivity failures, rather	modification to the receiver could be recommended for precision.
	than poor performance).	This was the approach taken when adding re-ECN to TCP
		[I-D.briscoe-tsvwg-re-ecn-tcp].


	For ECN: count Congestion Exposure Signals and ECN. Would normally	4.3. Audit
	need to delay ECN by one RTT to avoid false positives. Alternative:
	use Green (pre-credits) to assure that #ECN<=#Black_ECN+#GREEN, even
	though the #Black_ECN is delayed by one RTT.


	4.2.2. Policers and Shapers	To audit Congestion Exposure Signals against actual losses an auditor
		could use one of the following techniques:


	{ToDo: Beware these terms are defined differently than the	TCP-specific approach: The auditor could monitor TCP flows or
	conventional usage.}	aggregates of flows, only holding state on a flow if it first
		sends a Credit or a Re-Echo-Loss marking. The auditor could
		detect retransmissions by monitoring sequence numbers. It would
		assure that (volume of retransmitted data) <= (volume of data
		marked Re-Echo-Loss). Traffic would only be auditable in this way
		if it conformed to the standard TCP protocol and the IP payload
		was not encrypted (e.g. with IPsec).


	{ToDo: Abridge from existing doc?}	Predominant bottleneck approach: Unlike the above TCP-specific
		solution, this technique would work for IP packets carrying any
		transport layer protocol, and whether encrypted or not. But it
		only works well for networks designed so that losses predominantly
		occur under the management of one IP-aware node on the path. The
		auditor could then be located at this bottleneck. It could simply
		compare Congestion Exposure Signals with actual local losses.
		Most consumer access networks are designed to this model, e.g. the
		radio network controller (RNC) in a cellular network or the
		broadband remote access server (BRAS) in a digital subscriber line
		(DSL) network.

		The accuracy of an auditor at one predominant bottleneck might
		still be sufficient, even if losses occasionally occurred at other
		nodes in the network (e.g. border gateways). Although the auditor
		at the predominant bottleneck would not always be able to detect
		losses at other nodes, transports would not know where losses were
		occurring either. Therefore any transport would not know which
		losses it could cheat on without getting caught, and which ones it
		couldn't.

		To audit Congestion Exposure Signals against actual ECN markings or
		losses, the auditor could work as follows: monitor flows or
		aggregates of flows, only holding state on a flow if it first sends a
		Credit or either Re-Echo marking. Count the number of bytes marked
		with Credit or Re-Echo-ECN. Separately count the number of bytes
		marked with ECN. Use Credits to assure that
		#ECN <= #Re-Echo-ECN + #Credit, even though the Re-Echo-ECN markings
		are delayed by at least one RTT.
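The byte-counting check described above might be sketched as follows.
This is a minimal illustration under stated assumptions, not a
specified algorithm: the class EcnAuditor, its method names and its
per-packet arguments are hypothetical, and the rule that state is only
created once a flow first sends a Credit or Re-Echo marking is left
out.

```python
from dataclasses import dataclass


@dataclass
class EcnAuditor:
    """Byte counters for one flow or aggregate (hypothetical structure)."""
    ecn_bytes: int = 0      # bytes that arrived carrying an ECN congestion mark
    exposed_bytes: int = 0  # bytes marked with Credit or Re-Echo-ECN

    def observe(self, size, ecn_marked=False, re_echo_ecn=False, credit=False):
        """Account for one packet of `size` bytes."""
        if ecn_marked:
            self.ecn_bytes += size
        if re_echo_ecn or credit:
            # Counted once per packet, even if both marks are present.
            self.exposed_bytes += size

    def compliant(self):
        """True while #ECN <= #Re-Echo-ECN + #Credit holds, measured in bytes."""
        return self.ecn_bytes <= self.exposed_bytes
```

A flow that sends Credit before any ECN mark arrives stays compliant
during the round trip it takes for the matching Re-Echo-ECN marking to
appear; a flow that sends neither fails the check as soon as its first
ECN-marked packet is seen.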

		Note that an auditing device involves no policy configuration; it
		merely enforces protocol compliance, not policy.

		4.4. Policy Devices

		4.4.1. Congestion Policers

		Note that a congestion policer can be implemented in a very similar
		way to a bit-rate policer, but its effect is focused solely on
		traffic causing congestion downstream, not on all traffic just in
		case it causes congestion.

		It monitors all ConEx traffic entering a network, or some
		identifiable subset. Using Congestion Exposure signals, it measures
		the amount of congestion being caused by this traffic. If this
		exceeds a policy-configured 'congestion-bit-rate' the congestion
		policer will limit all the monitored ConEx traffic. A congestion
		policer can be implemented by a simple token bucket. But unlike a
		bit-rate policer, it only removes tokens when forwarding packets that
		are ConEx marked. See [CongPol] for details.
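A token bucket of this kind might look like the sketch below. It is an
illustration under assumptions rather than the design in [CongPol]: the
class name, the parameters fill_rate and depth, and the choice to let
the bucket run into debt are all hypothetical, and which markings count
as "ConEx marked" is left to the caller.

```python
class CongestionPolicer:
    """Token bucket drained only by ConEx-marked bytes (hypothetical sketch)."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # allowed congestion, in marked bytes per second
        self.depth = depth          # maximum burst of marked bytes
        self.tokens = depth
        self.last = 0.0

    def forward(self, now, size, conex_marked):
        """Account for one packet; return True while traffic is within the allowance."""
        elapsed = now - self.last
        self.last = now
        # Refill at the configured rate, never beyond the bucket depth.
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        if conex_marked:
            # Unlike a bit-rate policer, unmarked packets cost nothing.
            self.tokens -= size
        # While the bucket is in debt, all monitored traffic is out of profile.
        return self.tokens >= 0
```

How out-of-profile traffic is then limited (dropped, delayed or
otherwise) is a separate policy decision that this sketch does not
model.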

		4.4.2. Other Policy Devices

		Other policy devices that use Congestion Exposure signaling might
		monitor traffic based on Congestion Exposure Signals in much the same
		way as the monitoring element of a Congestion Policer. But the
		resulting action could be different. It might re-route traffic or
		downgrade the class of service.

		It might do nothing directly to the traffic, but instead report
		measurements of Congestion Exposure Signals to systems designed to
		control congestion indirectly. For instance the measurements might
		be used to trigger penalty clauses in contracts, to levy charges
		between networks based on congestion or simply to notify customers
		who cause excessive congestion.

	5. IANA Considerations	5. IANA Considerations

	This memo includes no request to IANA.	This memo includes no request to IANA.

	Note to RFC Editor: this section may be removed on publication as an	Note to RFC Editor: this section may be removed on publication as an
	RFC.	RFC.

	6. Security Considerations	6. Security Considerations


	{ToDo:}	Significant parts of this whole document are about the auditability
		of Congestion Exposure Signals, in particular Section 4.3.

	7. Conclusions	7. Conclusions

	{ToDo:}	{ToDo:}

	8. Acknowledgements	8. Acknowledgements


	{ToDo:}	This document was improved by review comments from Toby Moncaster.

	9. Comments Solicited	9. Comments Solicited

	Comments and questions are encouraged and very welcome. They can be	Comments and questions are encouraged and very welcome. They can be
	addressed to the IETF Congestion Exposure (ConEx) working group	addressed to the IETF Congestion Exposure (ConEx) working group
	mailing list <conex@ietf.org>, and/or to the authors.	mailing list <conex@ietf.org>, and/or to the authors.

	10. References	10. References

	10.1. Normative References	10.1. Normative References

	[RFC2119] Bradner, S., "Key words for use in	[RFC2119] Bradner, S., "Key words for use in
	RFCs to Indicate Requirement	RFCs to Indicate Requirement
	Levels", BCP 14, RFC 2119,	Levels", BCP 14, RFC 2119,
	March 1997.	March 1997.

	10.2. Informative References	10.2. Informative References


		[CongPol] Jacquet, A., Briscoe, B., and T.
		Moncaster, "Policing Freedom to Use
		the Internet Resource Pool", Proc
		ACM Workshop on Re-Architecting the
		Internet (ReArch'08) ,
		December 2008, <http://
		www.bobbriscoe.net/
		pubs.html#polfree>.

	[I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A.,	[I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A.,
	Moncaster, T., and A. Smith, "Re-	Moncaster, T., and A. Smith, "Re-
	ECN: A Framework for adding	ECN: A Framework for adding
	Congestion Accountability to	Congestion Accountability to
	TCP/IP", draft-briscoe-tsvwg-re-	TCP/IP", draft-briscoe-tsvwg-re-
	ecn-tcp-motivation-01 (work in	ecn-tcp-motivation-01 (work in
	progress), September 2009.	progress), September 2009.

	[I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A.,	[I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A.,
	Moncaster, T., and A. Smith, "Re-	Moncaster, T., and A. Smith, "Re-
	ECN: Adding Accountability for	ECN: Adding Accountability for
	Causing Congestion to TCP/IP",	Causing Congestion to TCP/IP",
	draft-briscoe-tsvwg-re-ecn-tcp-08	draft-briscoe-tsvwg-re-ecn-tcp-08
	(work in progress), September 2009.	(work in progress), September 2009.


		[I-D.conex-concepts-uses] Briscoe, B., Woundy, R., Moncaster,
		T., and J. Leslie, "ConEx Concepts
		and Use Cases", draft-moncaster-
		conex-concepts-uses-01 (work in
		progress), July 2010.

	[I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low	[I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low
	Extra Delay Background Transport	Extra Delay Background Transport
	(LEDBAT)",	(LEDBAT)",
	draft-ietf-ledbat-congestion-02	draft-ietf-ledbat-congestion-02
	(work in progress), July 2010.	(work in progress), July 2010.


		[I-D.sridharan-tcpm-ctcp] Sridharan, M., Tan, K., Bansal, D.,
		and D. Thaler, "Compound TCP: A New
		TCP Congestion Control for High-
		Speed and Long Distance Networks",
		draft-sridharan-tcpm-ctcp-02 (work
		in progress), November 2008.

		[RFC0791] Postel, J., "Internet Protocol",
		STD 5, RFC 791, September 1981.

	[RFC2309] Braden, B., Clark, D., Crowcroft,	[RFC2309] Braden, B., Clark, D., Crowcroft,
	J., Davie, B., Deering, S., Estrin,	J., Davie, B., Deering, S., Estrin,
	D., Floyd, S., Jacobson, V.,	D., Floyd, S., Jacobson, V.,
	Minshall, G., Partridge, C.,	Minshall, G., Partridge, C.,
	Peterson, L., Ramakrishnan, K.,	Peterson, L., Ramakrishnan, K.,
	Shenker, S., Wroclawski, J., and L.	Shenker, S., Wroclawski, J., and L.
	Zhang, "Recommendations on Queue	Zhang, "Recommendations on Queue
	Management and Congestion Avoidance	Management and Congestion Avoidance
	in the Internet", RFC 2309,	in the Internet", RFC 2309,
	April 1998.	April 1998.

	skipping to change at page 10, line 27	skipping to change at page 15, line 16

	[RFC3514] Bellovin, S., "The Security Flag in	[RFC3514] Bellovin, S., "The Security Flag in
	the IPv4 Header", RFC 3514,	the IPv4 Header", RFC 3514,
	April 2003.	April 2003.

	[RFC3540] Spring, N., Wetherall, D., and D.	[RFC3540] Spring, N., Wetherall, D., and D.
	Ely, "Robust Explicit Congestion	Ely, "Robust Explicit Congestion
	Notification (ECN) Signaling with	Notification (ECN) Signaling with
	Nonces", RFC 3540, June 2003.	Nonces", RFC 3540, June 2003.


		[RFC3550] Schulzrinne, H., Casner, S.,
		Frederick, R., and V. Jacobson,
		"RTP: A Transport Protocol for
		Real-Time Applications", STD 64,
		RFC 3550, July 2003.

		[RFC5670] Eardley, P., "Metering and Marking
		Behaviour of PCN-Nodes", RFC 5670,
		November 2009.

	[RFC5681] Allman, M., Paxson, V., and E.	[RFC5681] Allman, M., Paxson, V., and E.
	Blanton, "TCP Congestion Control",	Blanton, "TCP Congestion Control",
	RFC 5681, September 2009.	RFC 5681, September 2009.

	[Refb-dis] Briscoe, B., "Re-feedback: Freedom	[Refb-dis] Briscoe, B., "Re-feedback: Freedom
	with Accountability for Causing	with Accountability for Causing
	Congestion in a Connectionless	Congestion in a Connectionless
	Internetwork", UCL PhD	Internetwork", UCL PhD
	Dissertation , 2009, <http://	Dissertation , 2009, <http://
	bobbriscoe.net/projects/refb/	bobbriscoe.net/projects/refb/

	skipping to change at page 11, line 5	skipping to change at page 16, line 5

	[Vegas] Brakmo, L. and L. Peterson, "TCP	[Vegas] Brakmo, L. and L. Peterson, "TCP
	Vegas: End-to-End Congestion	Vegas: End-to-End Congestion
	Avoidance on a Global Internet",	Avoidance on a Global Internet",
	IEEE Journal on Selected Areas in	IEEE Journal on Selected Areas in
	Communications 13(8)1465--80,	Communications 13(8)1465--80,
	October 1995, <http://	October 1995, <http://
	ieeexplore.ieee.org/iel1/49/9740/	ieeexplore.ieee.org/iel1/49/9740/
	00464716.pdf?arnumber=464716>.	00464716.pdf?arnumber=464716>.


	Author's Address	Authors' Addresses

	Matt Mathis	Matt Mathis
	Google	Google

	Phone:	Phone:
	Fax:	Fax:
	EMail: mattmathis at google.com	EMail: mattmathis at google.com
	URI:	URI:


		Bob Briscoe
		BT
		B54/77, Adastral Park
		Martlesham Heath
		Ipswich IP5 3RE
		UK

		Phone: +44 1473 645196
		EMail: bob.briscoe@bt.com
		URI: http://bobbriscoe.net/

End of changes. 63 change blocks.
	204 lines changed or deleted	444 lines changed or added