< draft-mathis-conex-abstract-mech-00a.txt   draft-mathis-conex-abstract-mech-00c.txt >
Congestion Exposure (ConEx) M. Mathis Congestion Exposure (ConEx) M. Mathis
Working Group Google Working Group Google
Internet-Draft October 14, 2010 Internet-Draft B. Briscoe
Intended status: Informational Intended status: Informational BT
Expires: April 17, 2011 Expires: April 18, 2011 October 15, 2010
ConEx Concepts and Abstract Mechanism Congestion Exposure (ConEx) Concepts and Abstract Mechanism
draft-mathis-conex-abstract-mech-00a draft-mathis-conex-abstract-mech-00c
Abstract Abstract
This document describes and abstract mechanism by which senders This document describes an abstract mechanism by which senders inform
inform the network about the congestion encountered by previous the network about the congestion encountered by packets earlier in
packets on the same flow. Today, the network may signal congestion the same flow. Today, the network may signal congestion to the
by ECN markings or by dropping packets, and the receiver passes this receiver by ECN markings or by dropping packets, and the receiver may
information back to the sender in transport-layer acknowledgments. pass this information back to the sender in transport-layer feedback.
The mechanism to be developed by the CONEX WG will enable the sender The mechanism to be developed by the ConEx WG will enable the sender
to also relay the congestion information back into the network in- to also relay this congestion information back into the network in-
band at the IP layer, such that the total level of congestion is band at the IP layer, such that the total level of congestion is
visible to all IP devices along the path, from where it could, for visible to all IP devices along the path, from where it could, for
example, be provided as input to traffic management. example, be provided as input to traffic management.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 17, 2011. This Internet-Draft will expire on April 18, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Requirements Language . . . . . . . . . . . . . . . . . . . 4 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4
2. Requirements for the Congestion Exposure Signal . . . . . . . . 4 2. Requirements for the Congestion Exposure Signal . . . . . . . 5
3. Representing Congestion Exposure . . . . . . . . . . . . . . . 5 3. Representing Congestion Exposure . . . . . . . . . . . . . . . 7
3.1. One Simple Encoding . . . . . . . . . . . . . . . . . . . . 6 3.1. Strawman Encoding . . . . . . . . . . . . . . . . . . . . 7
3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 6 3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 8
3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . . 7 3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . 9
3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . . 7 3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . 9
3.3.1. Separate Bits . . . . . . . . . . . . . . . . . . . . . 7 3.3.1. Independent Bits . . . . . . . . . . . . . . . . . . . 9
3.3.2. Enumerated Encoding . . . . . . . . . . . . . . . . . . 8 3.3.2. Codepoint Encoding . . . . . . . . . . . . . . . . . . 10
4. Congestion Exposure Components . . . . . . . . . . . . . . . . 8 4. Congestion Exposure Components . . . . . . . . . . . . . . . . 10
4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 8 4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 10
4.2. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 8 4.2. Receivers (Optionally Modified) . . . . . . . . . . . . . 11
4.2.1. Audit . . . . . . . . . . . . . . . . . . . . . . . . . 8 4.3. Audit . . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.2.2. Policers and Shapers . . . . . . . . . . . . . . . . . 8 4.4. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 12
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . . 8 4.4.1. Congestion Policers . . . . . . . . . . . . . . . . . 12
6. Security Considerations . . . . . . . . . . . . . . . . . . . . 8 4.4.2. Other Policy Devices . . . . . . . . . . . . . . . . . 12
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . 9 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 9 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13
9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 9 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 13
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 9 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13
10.1. Normative References . . . . . . . . . . . . . . . . . . . 9 9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 13
10.2. Informative References . . . . . . . . . . . . . . . . . . 9 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13
10.1. Normative References . . . . . . . . . . . . . . . . . . . 13
10.2. Informative References . . . . . . . . . . . . . . . . . . 13
1. Introduction 1. Introduction
One of the required functions of a transport protocol is controlling One of the required functions of a transport protocol is controlling
congestion in the network. There are three techniques in use today congestion in the network. There are three techniques in use today
for signaling congestion: for the network to signal congestion to a transport:
o The most common congestion signal is packet loss. When congested, o The most common congestion signal is packet loss. When congested,
the network simply discards some packets either as part of an the network simply discards some packets either as part of an
explicit control function [RFC2309] or as the consequence of a explicit control function [RFC2309] or as the consequence of a
queue overflow or other resource starvation. The transport queue overflow or other resource starvation. The transport
receiver detects that some data is missing and signals such receiver detects that some data is missing and signals such
through transport acknowledgments to the transport sender (e.g. through transport acknowledgments to the transport sender (e.g.
TCP SACK options). The sender retransmits the missing data (if a TCP SACK options). The sender performs the appropriate congestion
reliable protocol) and then performs the mandatory congestion control rate reduction (e.g. [RFC5681] for TCP) and, if it is a
control adjustment [RFC5681]. reliable transport, it retransmits the missing data.
o Some experimental transport protocols and TCP variants [Vegas, o If the transport supports explicit congestion notification (ECN)
I-D.ietf-ledbat-congestion...] sense queuing delays in the network [RFC3168] or pre-congestion notification (PCN) [RFC5670], the
before the network itself signals congestion. From the transport sender indicates this by setting an ECN-capable
perspective of this document, these algorithm and related transport (ECT) codepoint in every packet. Network devices can
techniques prevent congestion, therefore they are out of scope and then explicitly signal congestion to the receiver by setting ECN
are not discussed further in this document. bits in the IP header of such packets. The transport receiver
communicates these ECN signals back to the sender, which then
performs the appropriate congestion control rate reduction.
o With Explicit Congestion Notification (ECN) [RFC3168], network o Some experimental transport protocols and TCP variants [Vegas]
devices explicitly indicate congestion by setting ECN bits in the sense queuing delays in the network and reduce their rate before
IP header. The transport receiver communicates these signals back the network has to signal congestion using loss or ECN. A purely
to the sender, which then performs the mandatory congestion delay-sensing transport will tend to be pushed out by other
control adjustment. competing transports that do not back off until they have driven
the queue into loss. Therefore, modern delay-sensing algorithms
use delay in some combination with loss to signal congestion (e.g.
LEDBAT [I-D.ietf-ledbat-congestion], Compound
[I-D.sridharan-tcpm-ctcp]). In the rest of this document, we will
confine the discussion to concrete signals of congestion such as
loss and ECN. We will not discuss delay-sensing further, because
it can only avoid these more concrete signals of congestion in
some circumstances.
In all cases the congestion signals follow the route indicated in In all cases the congestion signals follow the route indicated in
Figure 1. A congested network device sends a signal in the data Figure 1. A congested network device sends a signal in the data
stream on the forward path to the transport receiver, the receiver stream on the forward path to the transport receiver, the receiver
passes it back to the sender through transport level acknowledgments, passes it back to the sender through transport level feedback, and
and the sender makes some congestion control adjustment. the sender makes some congestion control adjustment.
This document proposes to extend the capabilities of the Internet This document proposes to extend the capabilities of the Internet
suite with the addition of a Congestion Exposure Signal that relays protocol suite with the addition of a Congestion Exposure Signal
the congestion information from the Transport Sender back through the that, to a first approximation, relays the congestion information
network layer. That signal is shown Figure 1. It would be visible from the transport sender back through the internetwork layer. That
to all network layer devices along the forward (data) path and is signal is shown in Figure 1. It would be visible to all internetwork
intended to support a number of new policy mechanism that might be layer devices along the forward (data) path and is intended to
support a number of new policy-controlled mechanisms that might be
used to manage traffic. used to manage traffic.
(draft-mathis-conex-abstract-mech-00a)

   -----------              -------------                     -----------
   |         |              |(Congested)|                     |         |
   |         |>==Data=Path=>|  Network  |>=====Data=Path=====>|         |
   |Transport|              |  Device   |>-Congestion-Signal->|Transport|
   | Sender  |              -------------                     | Receiver|
   |         |                                                |         |
   |         |<====ACK=Path==================================<|         |
   |         |<---Transport Layer returned Congestion Signal-<|         |
   |         |                                                |         |
   |         |>-(new)-IP layer Congestion Exposure Signal---->|         |
   -----------           (Carried in Data Packets)            -----------

(draft-mathis-conex-abstract-mech-00c)

   +---------+                                               +---------+
   |         |<==Feedback Path==============================<|         |
   |         |<--Transport Layer returned Congestion Signal-<|         |
   |         |                                               |         |
   |Transport|                                               |Transport|
   | Sender  |>-(new)-IP layer Congestion Exposure Signal--->| Receiver|
   |         |       (Carried in Data Packet Headers)        |         |
   |         |             +-----------+                     |         |
   |         |>=Data=Path=>|(Congested)|>=====Data=Path=====>|         |
   |         |             |  Network  |>-Congestion-Signal->|         |
   |         |             |  Device   |                     |         |
   +---------+             +-----------+                     +---------+
Not shown are policy devices along the data path that observe the Not shown are policy devices along the data path that observe the
Congestion Exposure Signal, and use the information to monitor or Congestion Exposure Signal, and use the information to monitor or
manage traffic. These are discussed in Section 4.2. manage traffic. These are discussed in Section 4.4.
Figure 1 Figure 1
1.1. Requirements Language 1.1. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [RFC2119]. document are to be interpreted as described in RFC 2119 [RFC2119].
ConEx signals in IP packet headers from the sender to the network
{ToDo: These are placeholders for whatever words we decide to use}:
Not-ConEx (aka White) The transport is not ConEx-capable
ConEx (aka Grey) The transport is ConEx-capable
Re-Echo-Loss (aka Purple) The transport has experienced a loss.
Re-Echo-ECN (aka Black) The transport has experienced an ECN mark
Credit (aka Green) The transport is building up credit to allow for
any future delay in expected ConEx signals
ConEx-Marked Any of Re-Echo-Loss, Re-Echo-ECN or Credit.
ConEx-Unmarked ConEx, but not ConEx-Marked.
2. Requirements for the Congestion Exposure Signal 2. Requirements for the Congestion Exposure Signal
a. The Congestion Exposure Signal must be visible to the network Ideally, all the following requirements would be met by a Congestion
layer along the entire path from the transport sender to the Exposure Signal. However it is already known that some compromises
transport receiver. Equivalently, it must be present in the IPv4 will be necessary, therefore all the requirements are expressed with
or IPv6 header. A corollary of this is that existing (legacy) the keyword 'SHOULD' rather than 'MUST'. The only mandatory
networking gear must at the very minimum pass the Congestion requirement is that a concrete protocol description MUST give sound
Exposures Signal without modification. reasoning if it chooses not to meet any of these requirements:
b. The Congestion Exposure Signal must be useful under only partial a. The Congestion Exposure Signal SHOULD be visible to internetwork
deployment. A minimal deployment must only require changes to layer devices along the entire path from the transport sender to
the transport senders. Furthermore, partial deployment should the transport receiver. Equivalently, it SHOULD be present in
create incentives for additional deployment, both in terms of the IPv4 or IPv6 header, and in the outermost IP header if using
enabling Congestion Exposure on more devices and adding richer IP in IP tunnelling. The Congestion Exposure Signal SHOULD be
features to existing devices. It is anticipated that ConEx immutable once set by the transport sender. A corollary of these
deployment will be asymptotic, and some residual class of hosts requirements is that existing (legacy) networking gear SHOULD
and network equipment will never fully support the Congestion pass the Congestion Exposure Signal silently without
Exposure Protocol. modification.
c. The Congestion Exposure Signal must be timely and accurate. It b. The Congestion Exposure Signal SHOULD be useful under only
must not be delayed by significantly more than one RTT from the partial deployment. A minimal deployment SHOULD only require
congestion event which triggered the signal. There must be changes to transport senders. Furthermore, partial deployment
techniques to audit the Congestion Exposure Signal by comparing SHOULD create incentives for additional deployment, both in terms
it to the actual congestion signals on the forward data path. of enabling Congestion Exposure on more devices and adding richer
The auditing mechanism must have a capability for providing features to existing devices. Nonetheless, ConEx deployment need
strong disincentives for miss-reporting congestion, such as by never be universal, and it is anticipated that some hosts and
some transports may never support the Congestion Exposure
Protocol and some networks may never use the Congestion Exposure
Signals.
c. The Congestion Exposure Signal SHOULD be accurate. In
potentially hostile environments such as the public Internet, it
SHOULD be possible for techniques to be deployed to audit the
Congestion Exposure Signal by comparing it to the actual
congestion signals on the forward data path. The auditing
mechanism must have a capability for providing sufficient
disincentives against misreported congestion, such as by
throttling traffic that reports less congestion than it is throttling traffic that reports less congestion than it is
actually experiencing. actually experiencing.
d. The Congestion Exposure Signal SHOULD be timely. There will be a
delay between the time when an auditing device sees an actual
congestion signal and when it sees the subsequent Congestion
Exposure Signal from the sender. The minimum delay will be one
round trip, but it may be much longer depending on the
transport's choice of feedback delay (consider RTCP [RFC3550] for
example). It is not practical to expect auditing devices in the
network to make allowance for such feedback delays. Instead, the
sender SHOULD be able to send Congestion Exposure signals in
advance, as 'credit' for any audit device to hold as a balance
against the risk of congestion during the feedback delay. This
design choice simplifies auditing devices and correctly makes the
transport responsible for both minimising feedback delay and
minimising sharp increases in packets in flight that would risk
causing excessive congestion to others. This issue is discussed
in more detail in Section 4.3.
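As a rough illustration of the 'credit' idea in requirement (d), the following Python sketch shows how an audit device might hold a per-flow balance that the sender builds up in advance and that actual congestion signals draw down. The class and method names are hypothetical, and the byte-based accounting and the "negative balance means under-declared" rule are assumptions made for this sketch; the draft does not specify an algorithm.

```python
class CreditAuditor:
    """Toy per-flow audit balance (hypothetical; not specified by the draft)."""

    def __init__(self):
        # Bytes of ConEx marking not yet matched by actual congestion.
        self.balance = 0

    def on_conex_marked(self, size):
        # Credit or Re-Echo markings sent by the transport add to the balance.
        self.balance += size

    def on_congestion(self, size):
        # An actual congestion signal (loss or ECN mark) draws the balance down.
        self.balance -= size

    def under_declared(self):
        # A negative balance means congestion was seen that the sender
        # had not covered in advance.
        return self.balance < 0


audit = CreditAuditor()
audit.on_conex_marked(1500)      # sender pre-marks one packet as credit
audit.on_congestion(1500)        # congestion during the feedback delay
covered = audit.under_declared() # False: the credit covered it
audit.on_congestion(1500)        # further congestion, no credit left
exposed = audit.under_declared() # True: the flow is now under-declaring
```

Under these assumptions the auditor needs no knowledge of the round trip time: the transport carries the burden of keeping the balance non-negative.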
It is important to note that the auditing requirement implies a It is important to note that the auditing requirement implies a
number of additional constraints: The basic auditing technique is to number of additional constraints: The basic auditing technique is to
count both congestion signals and Congestion Exposure Signals count both actual congestion signals and Congestion Exposure Signals
someplace along the data path. For congestion signaled by ECN, this someplace along the data path:
is most accurate when done near the transport receiver. The total
number of ECN marks seen near the receiver should always be equal to
or less than the number of Congestion Exposure Signals seen one RTT
later.
Auditing loss based Congestion Exposure can most easily be o For congestion signaled by ECN, auditing is most accurate when
implemented near the sender, since down stream losses appear as located near the transport receiver. Within any flow or aggregate
duplicate data for all reliable protocols (and duplicate sequence of flows, the total volume of ECN marked data seen near the
numbers for TCP). The auditor can detect losses by observing both receiver should always be equal to or less than the volume of data
the original transmission and the retransmission after the loss. tagged with Congestion Exposure Signals.
(This method does assume that IPsec is not in use).
Given that loss based and ECN based Congestion Exposure are best o For congestion signaled by loss, totally accurate auditing is not
audited at different locations, it is likely that they will need to believed to be possible in the general case, because it involves a
have distinct encodings. In addition the simplest mechanism to network node detecting the absence of some packets, when it cannot
address the one RTT delay between the congestion event and the necessarily see the transport protocol sequence numbers and when
Congestion Exposure Signal is to pre-mark some packets with a special the missing packets might simply be taking a different route. But
Congestion Exposure credit prior any true congestion marks. This there are common cases where sufficient audit accuracy should be
technique is described in more detail in Section 4.2.1. possible:
* For non-IPsec traffic conforming to standard TCP sequence
numbering on a single path, an auditor could detect losses by
observing both the original transmission and the retransmission
after the loss. Such auditing would be most accurate near the
sender.
* For networks designed so that losses predominantly occur under
the management of one IP-aware node on the path, the auditor
could be located at this bottleneck. It could simply compare
Congestion Exposure Signals with actual local losses. This is
a good model for most consumer access networks and audit
accuracy could well be sufficient even if losses occasionally
occurred at other nodes in the network, such as border gateways
(see Section 4.3 for details).
Given that loss-based and ECN-based Congestion Exposure might
sometimes be best audited at different locations, having distinct
encodings would widen the design space for the auditing function.
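The ECN audit rule above (the volume of ECN marked data should not exceed the volume of data tagged with Congestion Exposure Signals) can be sketched as a check over an observed packet trace. This is a minimal sketch only: the tuple layout, the function name and the per-flow granularity are assumptions for illustration, and a real auditor would also have to allow for feedback delay.

```python
from collections import defaultdict


def flows_under_declaring(packets):
    """Return the flows whose ECN marked volume exceeds their ConEx volume.

    packets: iterable of (flow_id, size_bytes, ecn_marked, conex_marked).
    The record layout is hypothetical, chosen only for this example.
    """
    ecn_bytes = defaultdict(int)
    conex_bytes = defaultdict(int)
    for flow_id, size, ecn_marked, conex_marked in packets:
        if ecn_marked:
            ecn_bytes[flow_id] += size
        if conex_marked:
            conex_bytes[flow_id] += size
    # A flow fails the audit if it declared less congestion than it received.
    return sorted(f for f in ecn_bytes if ecn_bytes[f] > conex_bytes[f])


trace = [
    ("a", 1500, True, False),   # flow a receives an ECN mark ...
    ("a", 1500, False, True),   # ... and later re-echoes the same volume
    ("b", 1500, True, False),   # flow b receives an ECN mark ...
    ("b", 500, False, True),    # ... but re-echoes only part of it
]
suspects = flows_under_declaring(trace)
```

Volumes are counted in bytes rather than packets so that a sender cannot satisfy the audit by marking only small packets.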
3. Representing Congestion Exposure 3. Representing Congestion Exposure
Most protocol specifications start with a description of packet Most protocol specifications start with a description of packet
formats and code points with their associated meanings. This formats and codepoints with their associated meanings. This document
document does not: It is already known that choosing the encoding for does not: It is already known that choosing the encoding for the
the Congestion Exposure Signal is likely to entail some engineering Congestion Exposure Signal is likely to entail some engineering
compromises that have the potential to reduce the protocol's compromises that have the potential to reduce the protocol's
usefulness in some settings. Rather than making these engineering usefulness in some settings. Rather than making these engineering
choices prematurely, this document sidesteps the encoding problem by choices prematurely, this document sidesteps the encoding problem by
describing an abstract representation of Congestion Exposure Signal. describing an abstract representation of a Congestion Exposure
All of the elements of the protocol can be defined in terms of this Signal. All of the elements of the protocol can be defined in terms
abstract representation. Most important, the preliminary use cases of this abstract representation. Most important, the preliminary use
for the protocol are described in terms of the abstract cases for the protocol are described in terms of the abstract
representation in companion documents. representation in companion documents [I-D.conex-concepts-uses].
Once we have some example use cases we can evaluate different Once we have some example use cases we can evaluate different
encoding schemes. Since theses schemes are likely to include some encoding schemes. Since these schemes are likely to include some
conflated code points, some information will be lost resulting in conflated code points, some information will be lost resulting in
weakening or disabling some of the algorithms and eliminating some weakening or disabling some of the algorithms and eliminating some
use cases. use cases.
The goal of this approach is to be as complete as possible for The goal of this approach is to be as complete as possible for
discovering the potential usage and capabilities of the Congestion discovering the potential usage and capabilities of the Congestion
Exposure protocol, so we have some hope of of making optimal design Exposure protocol, so we have some hope of making optimal design
decisions when choosing the encoding. decisions when choosing the encoding.
3.1. One Simple Encoding 3.1. Strawman Encoding
As an aid to the reader, it might be helpful to describe one simple
encoding of the Congestion Exposure protocol: set IPv4 header bit 48
(aka the "evil bit" [RFC3514]) on all retransmissions or once per ECN
signaled window reduction. Clearly network devices along the forward
path can see this bit and act on it. For example they can count
marked and unmarked packets to estimate the congestion levels along
the path.
However this encoding has been forbidden by RFC xxxx, which seeks to As an aid to the reader, it might be helpful to describe a naive
preserve the last unallocated bit in the IPv4 header for some strawman encoding of the Congestion Exposure protocol described
unspecifed future use. solely in terms of TCP: set the Reserved bit in the IPv4 header (bit
48 counting from zero [RFC0791]--aka the "evil bit" [RFC3514]) on all
retransmissions or once per ECN signaled window reduction. Clearly
network devices along the forward path can see this bit and act on
it. For example they can count marked and unmarked packets to
estimate the congestion levels along the path.
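To make the strawman concrete, the following Python sketch shows how a device on the forward path might test the bit and count marked against unmarked packets. Bit 48, counting from zero, is the most significant bit of byte 6 of the IPv4 header (the Reserved flag). The function names are hypothetical, and treating the marked fraction of packets as the congestion estimate is an assumption of this sketch.

```python
def strawman_marked(ipv4_header: bytes) -> bool:
    # Bit 48 counting from zero is the top bit of byte 6, i.e. the
    # Reserved flag at the start of the Flags field.
    return bool(ipv4_header[6] & 0x80)


def marked_fraction(headers) -> float:
    # Fraction of observed packets that carry the strawman marking.
    headers = list(headers)
    if not headers:
        return 0.0
    return sum(strawman_marked(h) for h in headers) / len(headers)


# Two minimal 20-byte headers: one unmarked, one with the Reserved bit set.
unmarked = bytes([0x45, 0, 0, 40, 0, 0, 0x00, 0]) + bytes(12)
marked = bytes([0x45, 0, 0, 40, 0, 0, 0x80, 0]) + bytes(12)
estimate = marked_fraction([unmarked, unmarked, unmarked, marked])
```

Counting packets keeps the sketch close to the wording above; a volume-based estimate would weight each packet by its length instead.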
Furthermore this encoding, by itself, does not sufficiently support However, the IESG has chartered the ConEx working group to establish
partial deployment or strong auditing and might motivate users and/or that there is sufficient demand for an IPv6 ConEx protocol before
applications to misrepresent the congestion that they are be causing. using the last available bit in the IPv4 header. Furthermore this
encoding, by itself, does not sufficiently support partial deployment
or strong auditing and might motivate users and/or applications to
misrepresent the congestion that they are causing.
However, this simple encoding does present a clear mental model of Nonetheless, this strawman encoding does present a clear mental model
how the Congestion Exposure protocol functions and is very useful for of how the Congestion Exposure protocol might function under various
conducting thought experiments about how the protocol might function uses.
under various uses.
3.2. ECN Based Encoding 3.2. ECN Based Encoding
Bob Briscoe's PhD thesis [Refb-dis], and many derivative works The re-ECN specification [I-D.briscoe-tsvwg-re-ecn-tcp] presents an
including RE-ECN [I-D.briscoe-tsvwg-re-ecn-tcp] present an ECN based ECN based implementation of ConEx. The central theme of this work is
implementation of ConEx. The central theme of this work includes an audit mechanism that can provide sufficient disincentives against
strong disincentives for misrepresenting congestion misrepresenting congestion [I-D.briscoe-tsvwg-re-ecn-motiv], which is
[I-D.briscoe-tsvwg-re-ecn-motiv]. However, it also pre-supposes the analysed extensively in Briscoe's PhD dissertation [Refb-dis].
full deployment of ECN, and does not adequately signal congestion
indicated by packet loss. Furthermore, given that after 10 years ECN
still has not been widely deployed, it does not seem prudent to
require its deployment as a prerequisite for deploying a Congestion
Exposure protocol.
As it currently stands, this work fails to meet the "partial The re-ECN encoding is tightly integrated with the encoding of ECN in
deployment" requirement described above in section Section 2. the IP header. However, re-ECN can be incrementally deployed on
hosts whether or not any networks support ECN marking and whether or
not any networks take note of re-ECN markings. Nonetheless, the
audit function has only been formally analysed where at least one
autonomous network has deployed ECN marking, which it uses to audit
whether the Congestion Exposure Signal matches actual congestion.
Thus, even if networks have not deployed ECN, re-ECN acts perfectly
well as a loss-based Congestion Exposure protocol. As such, a
network could potentially audit re-ECN signals against losses using
the loss-based audit techniques in Section 4.3, rather than deploying
ECN.
Although re-ECN does not require networks to support ECN, it still
embodies a major incremental deployment challenge; a sender cannot
use re-ECN unless the receiver at least supports ECN. Most operating
systems currently being supplied (late 2010) implement ECN, but it is
turned off by default at the client end, even though it is on by
default at the server end. This is primarily because one home
gateway model widely supplied in 2006 crashes if a TCP client behind
it attempts to use ECN (there are issues with some other home
gateways from that era, but they are surmountable with ECN black-hole
detection).
Given that, 10 years after standardisation, ECN has still not been
widely enabled on TCP clients, if at all possible the Congestion
Exposure protocol should not require the receiver to be ECN capable.
Therefore, as it currently stands, the re-ECN encoding would fail to
meet the "partial deployment" requirement of Section 2.
For a tutorial background on Re-Feedback techniques, see [,,] {Bob: For a tutorial background on Re-Feedback techniques, see [,,] {Bob:
Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE
Spectrum article? Re-ECN Web page?}. Spectrum article? Re-ECN Web page?}.
3.2.1. ECN Changes 3.2.1. ECN Changes
It is important to note that Briscoe's work proposes some relatively Although the re-ECN protocol requires no changes to the network side
minor modifications to the ECN protocol specified in RFC 3168. They of the ECN protocol, it is important to note that it does propose
include: redefining the ECT(0) and ECT(1) code points (this is some relatively minor modifications to the host-to-host aspects of
consistent with RFC3168 but requires deprecating [RFC3540]); the ECN protocol specified in RFC 3168. They include: redefining the
permitting routers to send ECN signals at a different threshold than ECT(1) code point (the change is consistent with RFC3168 but requires
packet loss; modifications to the ECN negotiations carried on the SYN deprecating the experimental ECN nonce [RFC3540]); modifications to
and SYN-ACK; and using a different state machine to carry ECN signals the ECN negotiations carried on the SYN and SYN-ACK; and using a
in the transport acknowledgments from the Receiver to the Sender. different state machine to carry ECN signals in the transport
This later change permits the transport protocol to carry multiple acknowledgments from the Receiver to the Sender. This last change
congestion signals per round trip, and greatly simplifies accurate permits the transport protocol to carry multiple congestion signals
auditing. per round trip, and greatly simplifies accurate auditing.
All of these adjustments to RFC 3168 may also be needed in a future All of these adjustments to RFC 3168 may also be needed in a future
standardized Congestion Exposure protocol. There will be very standardized Congestion Exposure protocol. There will need to be
careful considerations about any proposed changes to ECN or other very careful consideration of any proposed changes to ECN or other
existing protocols, because any such changes increase the cost of existing protocols, because any such changes increase the cost of
deployment. deployment.
3.3. Abstract Encoding 3.3. Abstract Encoding
{ToDo: Not really done, extra terse} The Congestion Exposure protocol could take one of two different
encodings: independently settable bits or an enumerated set of
mutually exclusive codepoints.
Model with two different encodings: individual bits or as an In both cases, the amount of congestion is signaled by the volume of
enumerated set. Enumerated encoding is probably good enough for most marked data--just as the volume of lost data or ECN marked data
purposes, but it must not be forgotten that it does lose some small signals the amount of congestion experienced. Thus the size of each
amount of information. packet carrying a Congestion Exposure Signal is significant.
3.3.1. Separate Bits 3.3.1. Independent Bits
One bit each for This encoding involves a field of four flag bits, each of which the
sender can set independently to indicate to the network that:
o Not supported (implicit signal from legacy transport senders) ConEx (Not-ConEx) The transport is (or is not) using ConEx with this
packet (the protocol MUST be arranged so that legacy transport
senders implicitly send Not-ConEx)
o Congestion indicated by packet losses Re-Echo-Loss (Not-Re-Echo-Loss) The transport has (or has not)
experienced a loss
o ECN signaled congestion Re-Echo-ECN (Not-Re-Echo-ECN) The transport has (or has not)
experienced ECN signaled congestion
o Pre-congestion credit (AKA green). See Section 4.2.1 devices Credit (Not-Credit) The transport is (or is not) building up
below. congestion credit (see Section 4.3 on audit devices)
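The four independent flags can be sketched as a small bit-field. This is only an illustration: the draft assigns no wire values, so the bit positions and the names ConExFlags and is_conex are hypothetical. The one property taken from the text is that a legacy sender, which sets nothing, must read as Not-ConEx.

```python
from enum import IntFlag


class ConExFlags(IntFlag):
    """Four independently settable flags (bit positions are hypothetical)."""
    CONEX = 0b0001         # transport is using ConEx with this packet
    RE_ECHO_LOSS = 0b0010  # transport has experienced a loss
    RE_ECHO_ECN = 0b0100   # transport has experienced ECN-signaled congestion
    CREDIT = 0b1000        # transport is building up congestion credit


def is_conex(flags: ConExFlags) -> bool:
    """An all-zero field (what a legacy sender emits) is Not-ConEx."""
    return bool(flags & ConExFlags.CONEX)


# Independent bits allow combinations, e.g. Re-Echo-Loss together with Credit.
example = ConExFlags.CONEX | ConExFlags.RE_ECHO_LOSS | ConExFlags.CREDIT
```

With this layout the sender can combine any of the signals in one packet, which is the flexibility that the codepoint encoding gives up.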
3.3.2. Enumerated Encoding 3.3.2. Codepoint Encoding
For enumerated encoding some marks must be delayed such that each This encoding involves a bit-field large enough to signal one of the
packet only carries at most one mark. following five codepoints:
ENUM {Not_Supported, No_Mark, Black_ECN, Black_Loss, Green} ENUM {Not-ConEx, ConEx, Re-Echo-Loss, Re-Echo-ECN, Credit}
Each named codepoint has the same meaning as in the encoding using
independent bits (Section 3.3.1). The use of any one codepoint
implies the negative of all the others, except the last three
codepoints (Re-Echo-Loss, Re-Echo-ECN and Credit) obviously also
imply ConEx is supported.
Inherently, the semantics of most of the enumerated codepoints are
mutually exclusive. 'Credit' is the only one that might need to be
used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even
that requirement is questionable. It must not be forgotten that the
enumerated encoding loses the flexibility to signal these two
combinations, whereas the encoding with four independent bits is not
so limited. Alternatively two extra codepoints could be assigned to
these two combinations of semantics.
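A minimal sketch of the codepoint encoding follows, assuming numeric values that the draft does not define. Because each packet can carry only one codepoint, a sender that owes two signals at once has to spread them over successive packets. The MarkQueue helper below is a hypothetical illustration of that deferral, not a mechanism specified here.

```python
from collections import deque
from enum import Enum


class Codepoint(Enum):
    """Mutually exclusive codepoints (numeric values are hypothetical)."""
    NOT_CONEX = 0
    CONEX = 1
    RE_ECHO_LOSS = 2
    RE_ECHO_ECN = 3
    CREDIT = 4


class MarkQueue:
    """Defers signals so that each packet carries at most one of them."""

    def __init__(self) -> None:
        self._pending = deque()

    def request(self, codepoint: Codepoint) -> None:
        # Remember a signal that still has to be sent.
        self._pending.append(codepoint)

    def next_codepoint(self) -> Codepoint:
        # Oldest pending signal first; plain ConEx when nothing is owed.
        return self._pending.popleft() if self._pending else Codepoint.CONEX


queue = MarkQueue()
queue.request(Codepoint.RE_ECHO_LOSS)
queue.request(Codepoint.CREDIT)
sent = [queue.next_codepoint() for _ in range(3)]
```

Here a loss signal and a credit that fall due together go out on two consecutive packets, and the third packet carries plain ConEx.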
{ToDo: Default behaviour for Currently Unused codepoints}
{ToDo: Signal from Policer to Receiver to distinguish policy-induced
drop from congestion-induced drop}
Some might prefer to use the following colours respectively for each
codepoint. The same colours (with the exception of Purple) were used
to describe re-ECN codepoints:
ENUM {White, Grey, Purple, Black, Green}.
4. Congestion Exposure Components 4. Congestion Exposure Components
{ToDo: Picture of the components, similar to that in the last
slideset about conex-concepts-uses?}
4.1. Modified Senders 4.1. Modified Senders
Send Congestion Exposure Signals per congestion signals. The sending transport needs to be modified to send Congestion
Exposure Signals in response to congestion feedback signals.
4.2. Policy Devices 4.2. Receivers (Optionally Modified)
4.2.1. Audit The receiving transport may already feed back sufficiently useful
signals to the sender so that it does not need to be altered.
For loss: detect retransmissions by monitoring sequence numbers. However, a TCP receiver feeds back ECN congestion signals no more
Assure that #retransmissions<=#Black_Loss than once within a round trip. The sender may require more precise
feedback from the receiver; otherwise it will appear to be
understating its Congestion Exposure Signals (see Section 3.2.1).
(May need to include a fudge factor, because it would be more robust Ideally, Congestion Exposure should be added to a transport like TCP
to mark the packet after a retransmission. Otherwise network devices without mandatory modifications to the receiver. But an optional
that discard marked packets will cause connectivity failures, rather modification to the receiver could be recommended for precision.
than poor performance). This was the approach taken when adding re-ECN to TCP
[I-D.briscoe-tsvwg-re-ecn-tcp].
For ECN: count Congestion Exposure Signals and ECN. Would normally 4.3. Audit
need to delay ECN by one RTT to avoid false positives. Alternative:
use Green (pre-credits) to assure that #ECN<=#Black_ECN+#GREEN, even
though the #Black_ECN is delayed by one RTT.
4.2.2. Policers and Shapers To audit Congestion Exposure Signals against actual losses an auditor
could use one of the following techniques:
{ToDo: Beware these terms are defined differently than the TCP-specific approach: The auditor could monitor TCP flows or
conventional usage.} aggregates of flows, only holding state on a flow if it first
sends a Credit or a Re-Echo-Loss marking. The auditor could
detect retransmissions by monitoring sequence numbers. It would
assure that (volume of retransmitted data) <= (volume of data
marked Re-Echo-Loss). Traffic would only be auditable in this way
if it conformed to the standard TCP protocol and the IP payload
was not encrypted (e.g. with IPsec).
{ToDo: Abridge from existing doc?} Predominant bottleneck approach: Unlike the above TCP-specific
solution, this technique would work for IP packets carrying any
transport layer protocol, and whether encrypted or not. But it
only works well for networks designed so that losses predominantly
occur under the management of one IP-aware node on the path. The
auditor could then be located at this bottleneck. It could simply
compare Congestion Exposure Signals with actual local losses.
Most consumer access networks are designed to this model, e.g. the
radio network controller (RNC) in a cellular network or the
broadband remote access server (BRAS) in a digital subscriber line
(DSL) network.
The accuracy of an auditor at one predominant bottleneck might
still be sufficient, even if losses occasionally occurred at other
nodes in the network (e.g. border gateways). Although the auditor
at the predominant bottleneck would not always be able to detect
losses at other nodes, transports would not know where losses were
occurring either. Therefore no transport would know which
losses it could cheat on without getting caught, and which ones it
couldn't.
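The TCP-specific approach can be sketched as follows. This is a simplified illustration under stated assumptions: sequence numbers do not wrap, packets are not reordered, and any bytes below the highest sequence number already seen count as retransmitted. The class name LossAuditor and its fields are hypothetical.

```python
class LossAuditor:
    """Checks (retransmitted bytes) <= (bytes marked Re-Echo-Loss) per flow."""

    def __init__(self) -> None:
        self.flows = {}

    def observe(self, flow_id, seq, length, re_echo_loss=False, credit=False):
        """Record one TCP segment; return True while the flow is compliant."""
        state = self.flows.get(flow_id)
        if state is None:
            # Hold state only once the flow sends Credit or Re-Echo-Loss.
            if not (re_echo_loss or credit):
                return True
            state = self.flows[flow_id] = {"next_seq": seq, "retx": 0, "marked": 0}
        if re_echo_loss:
            state["marked"] += length
        end = seq + length
        # Bytes below next_seq were seen before, so treat them as retransmitted.
        state["retx"] += max(0, min(end, state["next_seq"]) - seq)
        state["next_seq"] = max(state["next_seq"], end)
        return state["retx"] <= state["marked"]
```

The return value is only the instantaneous comparison. A deployed auditor would probably need some tolerance, because a sender may mark a packet shortly after the retransmission rather than on it.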
To audit Congestion Exposure Signals against actual ECN markings or
losses, the auditor could work as follows: monitor flows or
aggregates of flows, only holding state on a flow if it first sends a
Credit or either Re-Echo marking. Count the number of bytes marked
with Credit or Re-Echo-ECN. Separately count the number of bytes
marked with ECN. Use Credits to assure that
#ECN <= #Re-Echo-ECN + #Credit, even though the Re-Echo-ECN markings
are delayed by at least one RTT.
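The byte-counting check for ECN can be sketched as below. It is an illustration only: the class EcnAuditor and its parameter names are hypothetical, per-flow state handling is omitted, and a packet carrying both Re-Echo-ECN and Credit is assumed to count towards both totals.

```python
class EcnAuditor:
    """Checks #ECN <= #Re-Echo-ECN + #Credit, counted in bytes."""

    def __init__(self) -> None:
        self.ecn_bytes = 0
        self.re_echo_bytes = 0
        self.credit_bytes = 0

    def observe(self, length, ce_marked=False, re_echo_ecn=False, credit=False):
        """Record one packet; return True while the inequality holds."""
        if ce_marked:
            self.ecn_bytes += length
        if re_echo_ecn:
            self.re_echo_bytes += length
        if credit:
            self.credit_bytes += length
        return self.ecn_bytes <= self.re_echo_bytes + self.credit_bytes
```

Credit sent in advance is what keeps the inequality true during the round trip before the matching Re-Echo-ECN marking can arrive.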
Note that an auditing device involves no policy configuration; it
merely enforces protocol compliance, not policy.
4.4. Policy Devices
4.4.1. Congestion Policers
Note that a congestion policer can be implemented in a very similar
way to a bit-rate policer, but its effect is focused solely on
traffic causing congestion downstream, not on all traffic just in
case it causes congestion.
It monitors all ConEx traffic entering a network, or some
identifiable subset. Using Congestion Exposure signals, it measures
the amount of congestion being caused by this traffic. If this
exceeds a policy-configured 'congestion-bit-rate' the congestion
policer will limit all the monitored ConEx traffic. A congestion
policer can be implemented by a simple token bucket. But unlike a
bit-rate policer, it only removes tokens when forwarding packets that
are ConEx marked. See [CongPol] for details.
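A token-bucket congestion policer might look like the sketch below. The names CongestionPolicer, fill_rate and depth are hypothetical, and the caller is assumed to decide which markings count as "marked" and what limiting means in practice (drop, delay or otherwise). See [CongPol] for the actual design.

```python
class CongestionPolicer:
    """Token bucket drained only by ConEx-marked bytes."""

    def __init__(self, fill_rate: float, depth: float) -> None:
        self.fill_rate = fill_rate  # allowed congestion-bit-rate, bytes/second
        self.depth = depth          # maximum burst of marked bytes
        self.tokens = depth
        self.last = 0.0

    def allow(self, now: float, length: int, marked: bool) -> bool:
        """Return False while the monitored traffic should be limited."""
        # Refill at the policy-configured rate, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        self.last = now
        # Unlike a bit-rate policer, unmarked packets remove no tokens.
        if marked:
            self.tokens -= length
        # While the bucket is in deficit, all monitored traffic is limited.
        return self.tokens >= 0
```

Unmarked traffic passes at any rate as long as the bucket is not in deficit, which is how the policer's effect stays focused on traffic that causes congestion downstream.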
4.4.2. Other Policy Devices
Other policy devices that use Congestion Exposure signaling might
monitor traffic based on Congestion Exposure Signals in much the same
way as the monitoring element of a Congestion Policer. But the
resulting action could be different. It might re-route traffic or
downgrade the class of service.
It might do nothing directly to the traffic, but instead report
measurements of Congestion Exposure Signals to systems designed to
control congestion indirectly. For instance the measurements might
be used to trigger penalty clauses in contracts, to levy charges
between networks based on congestion or simply to notify customers
who cause excessive congestion.
5. IANA Considerations 5. IANA Considerations
This memo includes no request to IANA. This memo includes no request to IANA.
Note to RFC Editor: this section may be removed on publication as an Note to RFC Editor: this section may be removed on publication as an
RFC. RFC.
6. Security Considerations 6. Security Considerations
{ToDo:} Significant parts of this whole document are about the auditability
of Congestion Exposure Signals, in particular Section 4.3.
7. Conclusions 7. Conclusions
{ToDo:} {ToDo:}
8. Acknowledgements 8. Acknowledgements
{ToDo:} This document was improved by review comments from Toby Moncaster.
9. Comments Solicited 9. Comments Solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
addressed to the IETF Congestion Exposure (ConEx) working group addressed to the IETF Congestion Exposure (ConEx) working group
mailing list <conex@ietf.org>, and/or to the authors. mailing list <conex@ietf.org>, and/or to the authors.
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in [RFC2119] Bradner, S., "Key words for use in
RFCs to Indicate Requirement RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, Levels", BCP 14, RFC 2119,
March 1997. March 1997.
10.2. Informative References 10.2. Informative References
[CongPol] Jacquet, A., Briscoe, B., and T.
Moncaster, "Policing Freedom to Use
the Internet Resource Pool", Proc
ACM Workshop on Re-Architecting the
Internet (ReArch'08) ,
December 2008, <http://
www.bobbriscoe.net/
pubs.html#polfree>.
[I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A., [I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A.,
Moncaster, T., and A. Smith, "Re- Moncaster, T., and A. Smith, "Re-
ECN: A Framework for adding ECN: A Framework for adding
Congestion Accountability to Congestion Accountability to
TCP/IP", draft-briscoe-tsvwg-re- TCP/IP", draft-briscoe-tsvwg-re-
ecn-tcp-motivation-01 (work in ecn-tcp-motivation-01 (work in
progress), September 2009. progress), September 2009.
[I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A., [I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A.,
Moncaster, T., and A. Smith, "Re- Moncaster, T., and A. Smith, "Re-
ECN: Adding Accountability for ECN: Adding Accountability for
Causing Congestion to TCP/IP", Causing Congestion to TCP/IP",
draft-briscoe-tsvwg-re-ecn-tcp-08 draft-briscoe-tsvwg-re-ecn-tcp-08
(work in progress), September 2009. (work in progress), September 2009.
[I-D.conex-concepts-uses] Briscoe, B., Woundy, R., Moncaster,
T., and J. Leslie, "ConEx Concepts
and Use Cases", draft-moncaster-
conex-concepts-uses-01 (work in
progress), July 2010.
[I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low [I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low
Extra Delay Background Transport Extra Delay Background Transport
(LEDBAT)", (LEDBAT)",
draft-ietf-ledbat-congestion-02 draft-ietf-ledbat-congestion-02
(work in progress), July 2010. (work in progress), July 2010.
[I-D.sridharan-tcpm-ctcp] Sridharan, M., Tan, K., Bansal, D.,
and D. Thaler, "Compound TCP: A New
TCP Congestion Control for High-
Speed and Long Distance Networks",
draft-sridharan-tcpm-ctcp-02 (work
in progress), November 2008.
[RFC0791] Postel, J., "Internet Protocol",
STD 5, RFC 791, September 1981.
[RFC2309] Braden, B., Clark, D., Crowcroft, [RFC2309] Braden, B., Clark, D., Crowcroft,
J., Davie, B., Deering, S., Estrin, J., Davie, B., Deering, S., Estrin,
D., Floyd, S., Jacobson, V., D., Floyd, S., Jacobson, V.,
Minshall, G., Partridge, C., Minshall, G., Partridge, C.,
Peterson, L., Ramakrishnan, K., Peterson, L., Ramakrishnan, K.,
Shenker, S., Wroclawski, J., and L. Shenker, S., Wroclawski, J., and L.
Zhang, "Recommendations on Queue Zhang, "Recommendations on Queue
Management and Congestion Avoidance Management and Congestion Avoidance
in the Internet", RFC 2309, in the Internet", RFC 2309,
April 1998. April 1998.
skipping to change at page 10, line 27 skipping to change at page 15, line 16
[RFC3514] Bellovin, S., "The Security Flag in [RFC3514] Bellovin, S., "The Security Flag in
the IPv4 Header", RFC 3514, the IPv4 Header", RFC 3514,
April 2003. April 2003.
[RFC3540] Spring, N., Wetherall, D., and D. [RFC3540] Spring, N., Wetherall, D., and D.
Ely, "Robust Explicit Congestion Ely, "Robust Explicit Congestion
Notification (ECN) Signaling with Notification (ECN) Signaling with
Nonces", RFC 3540, June 2003. Nonces", RFC 3540, June 2003.
[RFC3550] Schulzrinne, H., Casner, S.,
Frederick, R., and V. Jacobson,
"RTP: A Transport Protocol for
Real-Time Applications", STD 64,
RFC 3550, July 2003.
[RFC5670] Eardley, P., "Metering and Marking
Behaviour of PCN-Nodes", RFC 5670,
November 2009.
[RFC5681] Allman, M., Paxson, V., and E. [RFC5681] Allman, M., Paxson, V., and E.
Blanton, "TCP Congestion Control", Blanton, "TCP Congestion Control",
RFC 5681, September 2009. RFC 5681, September 2009.
[Refb-dis] Briscoe, B., "Re-feedback: Freedom [Refb-dis] Briscoe, B., "Re-feedback: Freedom
with Accountability for Causing with Accountability for Causing
Congestion in a Connectionless Congestion in a Connectionless
Internetwork", UCL PhD Internetwork", UCL PhD
Dissertation , 2009, <http:// Dissertation , 2009, <http://
bobbriscoe.net/projects/refb/ bobbriscoe.net/projects/refb/
skipping to change at page 11, line 5 skipping to change at page 16, line 5
[Vegas] Brakmo, L. and L. Peterson, "TCP [Vegas] Brakmo, L. and L. Peterson, "TCP
Vegas: End-to-End Congestion Vegas: End-to-End Congestion
Avoidance on a Global Internet", Avoidance on a Global Internet",
IEEE Journal on Selected Areas in IEEE Journal on Selected Areas in
Communications 13(8)1465--80, Communications 13(8)1465--80,
October 1995, <http:// October 1995, <http://
ieeexplore.ieee.org/iel1/49/9740/ ieeexplore.ieee.org/iel1/49/9740/
00464716.pdf?arnumber=464716>. 00464716.pdf?arnumber=464716>.
Author's Address Authors' Addresses
Matt Mathis Matt Mathis
Google Google
Phone: Phone:
Fax: Fax:
EMail: mattmathis at google.com EMail: mattmathis at google.com
URI: URI:
Bob Briscoe
BT
B54/77, Adastral Park
Martlesham Heath
Ipswich IP5 3RE
UK
Phone: +44 1473 645196
EMail: bob.briscoe@bt.com
URI: http://bobbriscoe.net/
 End of changes. 63 change blocks. 
204 lines changed or deleted 444 lines changed or added
