< draft-mathis-conex-abstract-mech-00b.txt | draft-mathis-conex-abstract-mech-00c.txt > | |||
---|---|---|---|---|
Congestion Exposure (ConEx) M. Mathis | Congestion Exposure (ConEx) M. Mathis | |||
Working Group Google | Working Group Google | |||
Internet-Draft B. Briscoe | Internet-Draft B. Briscoe | |||
Intended status: Informational BT | Intended status: Informational BT | |||
Expires: April 17, 2011 October 14, 2010 | Expires: April 18, 2011 October 15, 2010 | |||
Congestion Exposure (ConEx) Concepts and Abstract Mechanism | Congestion Exposure (ConEx) Concepts and Abstract Mechanism | |||
draft-mathis-conex-abstract-mech-00b | draft-mathis-conex-abstract-mech-00c | |||
Abstract | Abstract | |||
This document describes an abstract mechanism by which senders inform | This document describes an abstract mechanism by which senders inform | |||
the network about the congestion encountered by packets earlier in | the network about the congestion encountered by packets earlier in | |||
the same flow. Today, the network may signal congestion to the | the same flow. Today, the network may signal congestion to the | |||
receiver by ECN markings or by dropping packets, and the receiver may | receiver by ECN markings or by dropping packets, and the receiver may | |||
pass this information back to the sender in transport-layer feedback. | pass this information back to the sender in transport-layer feedback. | |||
The mechanism to be developed by the ConEx WG will enable the sender | The mechanism to be developed by the ConEx WG will enable the sender | |||
to also relay this congestion information back into the network in- | to also relay this congestion information back into the network in- | |||
skipping to change at page 1, line 40 | skipping to change at page 1, line 40 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on April 17, 2011. | This Internet-Draft will expire on April 18, 2011. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2010 IETF Trust and the persons identified as the | Copyright (c) 2010 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 17 | skipping to change at page 2, line 17 | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
2. Requirements for the Congestion Exposure Signal . . . . . . . 5 | 2. Requirements for the Congestion Exposure Signal . . . . . . . 5 | |||
3. Representing Congestion Exposure . . . . . . . . . . . . . . . 7 | 3. Representing Congestion Exposure . . . . . . . . . . . . . . . 7 | |||
3.1. One Simple Encoding . . . . . . . . . . . . . . . . . . . 7 | 3.1. Strawman Encoding . . . . . . . . . . . . . . . . . . . . 7 | |||
3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 8 | 3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 8 | |||
3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . 8 | 3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . 9 | |||
3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . 9 | 3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . 9 | |||
3.3.1. Separate Bits . . . . . . . . . . . . . . . . . . . . 9 | 3.3.1. Independent Bits . . . . . . . . . . . . . . . . . . . 9 | |||
3.3.2. Enumerated Encoding . . . . . . . . . . . . . . . . . 9 | 3.3.2. Codepoint Encoding . . . . . . . . . . . . . . . . . . 10 | |||
4. Congestion Exposure Components . . . . . . . . . . . . . . . . 9 | 4. Congestion Exposure Components . . . . . . . . . . . . . . . . 10 | |||
4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 9 | 4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 10 | |||
4.2. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 9 | 4.2. Receivers (Optionally Modified) . . . . . . . . . . . . . 11 | |||
4.2.1. Audit . . . . . . . . . . . . . . . . . . . . . . . . 9 | 4.3. Audit . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
4.2.2. Policers and Shapers . . . . . . . . . . . . . . . . . 10 | 4.4. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 12 | |||
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | 4.4.1. Congestion Policers . . . . . . . . . . . . . . . . . 12 | |||
6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | 4.4.2. Other Policy Devices . . . . . . . . . . . . . . . . . 12 | |||
7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | |||
8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | |||
9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 10 | 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . . 10 | 9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 13 | |||
10.2. Informative References . . . . . . . . . . . . . . . . . . 11 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
10.1. Normative References . . . . . . . . . . . . . . . . . . . 13 | ||||
10.2. Informative References . . . . . . . . . . . . . . . . . . 13 | ||||
1. Introduction | 1. Introduction | |||
One of the required functions of a transport protocol is controlling | One of the required functions of a transport protocol is controlling | |||
congestion in the network. There are three techniques in use today | congestion in the network. There are three techniques in use today | |||
for the network to signal congestion to a transport: | for the network to signal congestion to a transport: | |||
o The most common congestion signal is packet loss. When congested, | o The most common congestion signal is packet loss. When congested, | |||
the network simply discards some packets either as part of an | the network simply discards some packets either as part of an | |||
explicit control function [RFC2309] or as the consequence of a | explicit control function [RFC2309] or as the consequence of a | |||
skipping to change at page 4, line 25 | skipping to change at page 4, line 25 | |||
| Sender |>-(new)-IP layer Congestion Exposure Signal--->| Receiver| | | Sender |>-(new)-IP layer Congestion Exposure Signal--->| Receiver| | |||
| | (Carried in Data Packet Headers) | | | | | (Carried in Data Packet Headers) | | | |||
| | +-----------+ | | | | | +-----------+ | | | |||
| |>=Data=Path=>|(Congested)|>=====Data=Path=====>| | | | |>=Data=Path=>|(Congested)|>=====Data=Path=====>| | | |||
| | | Network |>-Congestion-Signal->| | | | | | Network |>-Congestion-Signal->| | | |||
| | | Device | | | | | | | Device | | | | |||
+---------+ +-----------+ +---------+ | +---------+ +-----------+ +---------+ | |||
Not shown are policy devices along the data path that observe the | Not shown are policy devices along the data path that observe the | |||
Congestion Exposure Signal, and use the information to monitor or | Congestion Exposure Signal, and use the information to monitor or | |||
manage traffic. These are discussed in Section 4.2. | manage traffic. These are discussed in Section 4.4. | |||
Figure 1 | Figure 1 | |||
1.1. Terminology | 1.1. Terminology | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
ConEx signals in IP packet headers from the sender to the network | ConEx signals in IP packet headers from the sender to the network | |||
{ToDo: These are placeholders for whatever words we decide to use}: | {ToDo: These are placeholders for whatever words we decide to use}: | |||
Re-Echo Loss (aka Black-Loss) The transport has experienced a loss. | Not-ConEx (aka White) The transport is not ConEx-capable | |||
Re-Echo ECN (aka Black-ECN) The transport has experienced an ECN | ConEx (aka Grey) The transport is ConEx-capable | |||
mark | ||||
Pre-Echo (aka Green) The transport is building up credit to allow | Re-Echo-Loss (aka Purple) The transport has experienced a loss. | |||
for any future delay in expected ConEx signals | ||||
Neutral (aka Grey) The transport is ConEx-capable | Re-Echo-ECN (aka Black) The transport has experienced an ECN mark | |||
Not-ConEx (aka White) The transport is not ConEx-capable | ||||
Credit (aka Green) The transport is building up credit to allow for | ||||
any future delay in expected ConEx signals | ||||
ConEx-Marked Any of Re-Echo-Loss, Re-Echo-ECN or Credit. | ||||
ConEx-Unmarked ConEx, but not ConEx-Marked. | ||||
2. Requirements for the Congestion Exposure Signal | 2. Requirements for the Congestion Exposure Signal | |||
Ideally, all the following requirements would be met by a Congestion | ||||
Exposure Signal. However, it is already known that some compromises | ||||
will be necessary, so all the requirements are expressed with the | ||||
keyword 'SHOULD' rather than 'MUST'. The only mandatory requirement | ||||
is that a concrete protocol description MUST give sound reasoning if | ||||
it chooses not to meet any of these requirements: | ||||
a. The Congestion Exposure Signal SHOULD be visible to internetwork | a. The Congestion Exposure Signal SHOULD be visible to internetwork | |||
layer devices along the entire path from the transport sender to | layer devices along the entire path from the transport sender to | |||
the transport receiver. Equivalently, it SHOULD be present in | the transport receiver. Equivalently, it SHOULD be present in | |||
the IPv4 or IPv6 header, and in the outermost IP header if using | the IPv4 or IPv6 header, and in the outermost IP header if using | |||
IP in IP tunnelling. The Congestion Exposure Signal SHOULD be | IP in IP tunnelling. The Congestion Exposure Signal SHOULD be | |||
immutable once set by the transport sender. A corollary of these | immutable once set by the transport sender. A corollary of these | |||
requirements is that existing (legacy) networking gear SHOULD | requirements is that existing (legacy) networking gear SHOULD | |||
pass the Congestion Exposure Signal silently without | pass the Congestion Exposure Signal silently without | |||
modification. | modification. | |||
skipping to change at page 5, line 47 | skipping to change at page 6, line 8 | |||
actually experiencing. | actually experiencing. | |||
d. The Congestion Exposure Signal SHOULD be timely. There will be a | d. The Congestion Exposure Signal SHOULD be timely. There will be a | |||
delay between the time when an auditing device sees an actual | delay between the time when an auditing device sees an actual | |||
congestion signal and when it sees the subsequent Congestion | congestion signal and when it sees the subsequent Congestion | |||
Exposure Signal from the sender. The minimum delay will be one | Exposure Signal from the sender. The minimum delay will be one | |||
round trip, but it may be much longer depending on the | round trip, but it may be much longer depending on the | |||
transport's choice of feedback delay (consider RTCP [RFC3550] for | transport's choice of feedback delay (consider RTCP [RFC3550] for | |||
example). It is not practical to expect auditing devices in the | example). It is not practical to expect auditing devices in the | |||
network to make allowance for such feedback delays. Instead, the | network to make allowance for such feedback delays. Instead, the | |||
sender MUST be able to send Congestion Exposure signals in | sender SHOULD be able to send Congestion Exposure signals in | |||
advance, as 'credit' for any audit device to hold as a balance | advance, as 'credit' for any audit device to hold as a balance | |||
against the risk of congestion during the feedback delay. This | against the risk of congestion during the feedback delay. This | |||
design choice simplifies auditing devices and correctly makes the | design choice simplifies auditing devices and correctly makes the | |||
transport responsible for both minimising feedback delay and | transport responsible for both minimising feedback delay and | |||
minimising sharp increases in packets in flight that would risk | minimising sharp increases in packets in flight that would risk | |||
causing excessive congestion to others. This issue is discussed | causing excessive congestion to others. This issue is discussed | |||
in more detail in Section 4.2.1. | in more detail in Section 4.3. | |||
It is important to note that the auditing requirement implies a | It is important to note that the auditing requirement implies a | |||
number of additional constraints: The basic auditing technique is to | number of additional constraints: The basic auditing technique is to | |||
count both actual congestion signals and Congestion Exposure Signals | count both actual congestion signals and Congestion Exposure Signals | |||
someplace along the data path: | someplace along the data path: | |||
o For congestion signaled by ECN, auditing is most accurate when | o For congestion signaled by ECN, auditing is most accurate when | |||
located near the transport receiver. Within any flow or aggregate | located near the transport receiver. Within any flow or aggregate | |||
of flows, the total volume of ECN marked data seen near the | of flows, the total volume of ECN marked data seen near the | |||
receiver should always be equal to or less than the volume of data | receiver should always be equal to or less than the volume of data | |||
skipping to change at page 6, line 28 | skipping to change at page 6, line 37 | |||
o For congestion signaled by loss, totally accurate auditing is not | o For congestion signaled by loss, totally accurate auditing is not | |||
believed to be possible in the general case, because it involves a | believed to be possible in the general case, because it involves a | |||
network node detecting the absence of some packets, when it cannot | network node detecting the absence of some packets, when it cannot | |||
necessarily see the transport protocol sequence numbers and when | necessarily see the transport protocol sequence numbers and when | |||
the missing packets might simply be taking a different route. But | the missing packets might simply be taking a different route. But | |||
there are common cases where sufficient audit accuracy should be | there are common cases where sufficient audit accuracy should be | |||
possible: | possible: | |||
* For non-IPsec traffic conforming to standard TCP sequence | * For non-IPsec traffic conforming to standard TCP sequence | |||
numbering on a single path, the auditor could detect losses by | numbering on a single path, an auditor could detect losses by | |||
observing both the original transmission and the retransmission | observing both the original transmission and the retransmission | |||
after the loss. Such auditing would be most accurate near the | after the loss. Such auditing would be most accurate near the | |||
sender. | sender. | |||
* For networks designed so that losses predominantly occur under | * For networks designed so that losses predominantly occur under | |||
the management of one IP-aware node on the path, the auditor | the management of one IP-aware node on the path, the auditor | |||
could be located at this bottleneck. It could simply compare | could be located at this bottleneck. It could simply compare | |||
Congestion Exposure Signals with actual local losses. Most | Congestion Exposure Signals with actual local losses. This is | |||
consumer access networks are design to this model, e.g. the | a good model for most consumer access networks and audit | |||
radio network controller (RNC) in a cellular network or the | accuracy could well be sufficient even if losses occasionally | |||
broadband remote access server (BRAS) in a digital subscriber | occurred at other nodes in the network, such as border gateways | |||
line (DSL) network. Unlike the above TCP-specific solution, | (see Section 4.3 for details). | |||
this would work for IP packets carrying any transport layer | ||||
protocol, and whether encrypted or not. | ||||
The accuracy of an auditor at one predominant bottleneck might | ||||
still be sufficient, even if losses occasionally occurred at | ||||
other nodes in the network (e.g. border gateways). Although | ||||
the auditor at the predominant bottleneck would not always be | ||||
able to detect losses at other nodes, transports would not know | ||||
where losses were occurring either. Therefore any transport | ||||
would not know which losses it could cheat on without getting | ||||
caught, and which ones it couldn't. | ||||
Given that loss-based and ECN-based Congestion Exposure might | Given that loss-based and ECN-based Congestion Exposure might | |||
sometimes be best audited at different locations, have distinct | sometimes be best audited at different locations, having distinct | |||
encodings would widen the design space for the auditing function. | encodings would widen the design space for the auditing function. | |||
{Bob: Got to here making suggested changes.} | ||||
3. Representing Congestion Exposure | 3. Representing Congestion Exposure | |||
Most protocol specifications start with a description of packet | Most protocol specifications start with a description of packet | |||
formats and code points with their associated meanings. This | formats and codepoints with their associated meanings. This document | |||
document does not: It is already known that choosing the encoding for | does not: It is already known that choosing the encoding for the | |||
the Congestion Exposure Signal is likely to entail some engineering | Congestion Exposure Signal is likely to entail some engineering | |||
compromises that have the potential to reduce the protocol's | compromises that have the potential to reduce the protocol's | |||
usefulness in some settings. Rather than making these engineering | usefulness in some settings. Rather than making these engineering | |||
choices prematurely, this document sidesteps the encoding problem by | choices prematurely, this document sidesteps the encoding problem by | |||
describing an abstract representation of Congestion Exposure Signal. | describing an abstract representation of a Congestion Exposure | |||
All of the elements of the protocol can be defined in terms of this | Signal. All of the elements of the protocol can be defined in terms | |||
abstract representation. Most important, the preliminary use cases | of this abstract representation. Most important, the preliminary use | |||
for the protocol are described in terms of the abstract | cases for the protocol are described in terms of the abstract | |||
representation in companion documents. | representation in companion documents [I-D.conex-concepts-uses]. | |||
Once we have some example use cases we can evaluate different | Once we have some example use cases we can evaluate different | |||
encoding schemes. Since these schemes are likely to include some | encoding schemes. Since these schemes are likely to include some | |||
conflated code points, some information will be lost resulting in | conflated code points, some information will be lost resulting in | |||
weakening or disabling some of the algorithms and eliminating some | weakening or disabling some of the algorithms and eliminating some | |||
use cases. | use cases. | |||
The goal of this approach is to be as complete as possible for | The goal of this approach is to be as complete as possible for | |||
discovering the potential usage and capabilities of the Congestion | discovering the potential usage and capabilities of the Congestion | |||
Exposure protocol, so we have some hope of making optimal design | Exposure protocol, so we have some hope of making optimal design | |||
decisions when choosing the encoding. | decisions when choosing the encoding. | |||
3.1. One Simple Encoding | 3.1. Strawman Encoding | |||
As an aid to the reader, it might be helpful to describe one simple | ||||
encoding of the Congestion Exposure protocol: set IPv4 header bit 48 | ||||
(aka the "evil bit" [RFC3514]) on all retransmissions or once per ECN | ||||
signaled window reduction. Clearly network devices along the forward | ||||
path can see this bit and act on it. For example they can count | ||||
marked and unmarked packets to estimate the congestion levels along | ||||
the path. | ||||
However this encoding has been forbidden by RFC xxxx, which seeks to | As an aid to the reader, it might be helpful to describe a naive | |||
preserve the last unallocated bit in the IPv4 header for some | strawman encoding of the Congestion Exposure protocol described | |||
unspecifed future use. | solely in terms of TCP: set the Reserved bit in the IPv4 header (bit | |||
48 counting from zero [RFC0791]--aka the "evil bit" [RFC3514]) on all | ||||
retransmissions or once per ECN signaled window reduction. Clearly | ||||
network devices along the forward path can see this bit and act on | ||||
it. For example they can count marked and unmarked packets to | ||||
estimate the congestion levels along the path. | ||||
Furthermore this encoding, by itself, does not sufficiently support | However, the IESG has chartered the ConEx working group to establish | |||
partial deployment or strong auditing and might motivate users and/or | that there is sufficient demand for an IPv6 ConEx protocol before | |||
applications to misrepresent the congestion that they are be causing. | using the last available bit in the IPv4 header. Furthermore this | |||
encoding, by itself, does not sufficiently support partial deployment | ||||
or strong auditing and might motivate users and/or applications to | ||||
misrepresent the congestion that they are causing. | ||||
However, this simple encoding does present a clear mental model of | Nonetheless, this strawman encoding does present a clear mental model | |||
how the Congestion Exposure protocol functions and is very useful for | of how the Congestion Exposure protocol might function under various | |||
conducting thought experiments about how the protocol might function | uses. | |||
under various uses. | ||||
3.2. ECN Based Encoding | 3.2. ECN Based Encoding | |||
Bob Briscoe's PhD thesis [Refb-dis], and many derivative works | The re-ECN specification [I-D.briscoe-tsvwg-re-ecn-tcp] presents an | |||
including RE-ECN [I-D.briscoe-tsvwg-re-ecn-tcp] present an ECN based | ECN based implementation of ConEx. The central theme of this work is | |||
implementation of ConEx. The central theme of this work includes | an audit mechanism that can provide sufficient disincentives against | |||
strong disincentives for misrepresenting congestion | misrepresenting congestion [I-D.briscoe-tsvwg-re-ecn-motiv], which is | |||
[I-D.briscoe-tsvwg-re-ecn-motiv]. However, it also pre-supposes the | analysed extensively in Briscoe's PhD dissertation [Refb-dis]. | |||
full deployment of ECN, and does not adequately signal congestion | ||||
indicated by packet loss. Furthermore, given that after 10 years ECN | ||||
still has not been widely deployed, it does not seem prudent to | ||||
require its deployment as a prerequisite for deploying a Congestion | ||||
Exposure protocol. | ||||
As it currently stands, this work fails to meet the "partial | The re-ECN encoding is tightly integrated with the encoding of ECN in | |||
deployment" requirement described above in section Section 2. | the IP header. However, re-ECN can be incrementally deployed on | |||
hosts whether or not any networks support ECN marking and whether or | ||||
not any networks take note of re-ECN markings. Nonetheless, the | ||||
audit function has only been formally analysed where at least one | ||||
autonomous network has deployed ECN marking, which it uses to audit | ||||
whether the Congestion Exposure Signal matches actual congestion. | ||||
Thus, even if networks have not deployed ECN, re-ECN acts perfectly | ||||
well as a loss-based Congestion Exposure protocol. As such, a | ||||
network could potentially audit re-ECN signals against losses using | ||||
the loss-based audit techniques in Section 4.3, rather than deploying | ||||
ECN. | ||||
Although re-ECN does not require networks to support ECN, it still | ||||
embodies a major incremental deployment challenge; a sender cannot | ||||
use re-ECN unless the receiver at least supports ECN. Most operating | ||||
systems currently being supplied (late 2010) implement ECN, but it is | ||||
turned off by default at the client end, even though it is on by | ||||
default at the server end. This is primarily because one home | ||||
gateway model widely supplied in 2006 crashes if a TCP client behind | ||||
it attempts to use ECN (there are issues with some other home | ||||
gateways from that era, but they are surmountable with ECN black-hole | ||||
detection). | ||||
Given that, 10 years after standardisation, ECN has still not been | ||||
widely enabled on TCP clients, if at all possible the Congestion | ||||
Exposure protocol should not require the receiver to be ECN capable. | ||||
Therefore, as it currently stands, the re-ECN encoding would fail to | ||||
meet the "partial deployment" requirement of Section 2. | ||||
For a tutorial background on Re-Feedback techniques, see [,,] {Bob: | For a tutorial background on Re-Feedback techniques, see [,,] {Bob: | |||
Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE | Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE | |||
Spectrum article? Re-ECN Web page?}. | Spectrum article? Re-ECN Web page?}. | |||
3.2.1. ECN Changes | 3.2.1. ECN Changes | |||
It is important to note that Briscoe's work proposes some relatively | Although the re-ECN protocol requires no changes to the network side | |||
minor modifications to the ECN protocol specified in RFC 3168. They | of the ECN protocol, it is important to note that it does propose | |||
include: redefining the ECT(0) and ECT(1) code points (this is | some relatively minor modifications to the host-to-host aspects of | |||
consistent with RFC3168 but requires deprecating [RFC3540]); | the ECN protocol specified in RFC 3168. They include: redefining the | |||
permitting routers to send ECN signals at a different threshold than | ECT(1) code point (the change is consistent with RFC3168 but requires | |||
packet loss; modifications to the ECN negotiations carried on the SYN | deprecating the experimental ECN nonce [RFC3540]); modifications to | |||
and SYN-ACK; and using a different state machine to carry ECN signals | the ECN negotiations carried on the SYN and SYN-ACK; and using a | |||
in the transport acknowledgments from the Receiver to the Sender. | different state machine to carry ECN signals in the transport | |||
This later change permits the transport protocol to carry multiple | acknowledgments from the Receiver to the Sender. This last change | |||
congestion signals per round trip, and greatly simplifies accurate | permits the transport protocol to carry multiple congestion signals | |||
auditing. | per round trip, and greatly simplifies accurate auditing. | |||
All of these adjustments to RFC 3168 may also be needed in a future | All of these adjustments to RFC 3168 may also be needed in a future | |||
standardized Congestion Exposure protocol. There will be very | standardized Congestion Exposure protocol. There will need to be | |||
careful considerations about any proposed changes to ECN or other | very careful consideration of any proposed changes to ECN or other | |||
existing protocols, because any such changes increase the cost of | existing protocols, because any such changes increase the cost of | |||
deployment. | deployment. | |||
3.3. Abstract Encoding | 3.3. Abstract Encoding | |||
{ToDo: Not really done, extra terse} | The Congestion Exposure protocol could take one of two different | |||
encodings: independently settable bits or an enumerated set of | ||||
mutually exclusive codepoints. | ||||
Model with two different encodings: individual bits or as an | In both cases, the amount of congestion is signaled by the volume of | |||
enumerated set. Enumerated encoding is probably good enough for most | marked data--just as the volume of lost data or ECN marked data | |||
purposes, but it must not be forgotten that it does lose some small | signals the amount of congestion experienced. Thus the size of each | |||
amount of information. | packet carrying a Congestion Exposure Signal is significant. | |||
3.3.1. Separate Bits | 3.3.1. Independent Bits | |||
One bit each for | This encoding involves a field of four flag bits, each of which the | |||
sender can set independently to indicate to the network that: | ||||
o Not supported (implicit signal from legacy transport senders) | ConEx (Not-ConEx) The transport is (or is not) using ConEx with this | |||
packet (the protocol MUST be arranged so that legacy transport | ||||
senders implicitly send Not-ConEx) | ||||
o Congestion indicated by packet losses | Re-Echo-Loss (Not-Re-Echo-Loss) The transport has (or has not) | |||
experienced a loss | ||||
o ECN signaled congestion | Re-Echo-ECN (Not-Re-Echo-ECN) The transport has (or has not) | |||
experienced ECN signaled congestion | ||||
o Pre-congestion credit (AKA green). See Section 4.2.1 devices | Credit (Not-Credit) The transport is (or is not) building up | |||
below. | congestion credit (see Section 4.3 on audit devices) | |||
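The independent-bits encoding can be sketched as a small flag set. This is only an illustration: the bit positions, the names ConExFlags and marked_bytes, and the (size, flags) packet representation are assumptions made here, since the draft does not assign bit positions. The helper sums packet sizes because the amount of congestion is signaled by the volume of marked data.

```python
from enum import IntFlag


class ConExFlags(IntFlag):
    # Hypothetical bit positions; the draft does not assign any.
    CONEX = 0b0001
    RE_ECHO_LOSS = 0b0010
    RE_ECHO_ECN = 0b0100
    CREDIT = 0b1000


# All bits clear: what a legacy transport sender implicitly sends.
NOT_CONEX = ConExFlags(0)


def marked_bytes(packets, flag):
    """Sum the sizes of the packets that carry `flag`.

    `packets` is an iterable of (size_in_bytes, ConExFlags) pairs.
    """
    return sum(size for size, flags in packets if flags & flag)


packets = [
    (1500, ConExFlags.CONEX | ConExFlags.RE_ECHO_LOSS),
    (500, ConExFlags.CONEX | ConExFlags.RE_ECHO_LOSS | ConExFlags.CREDIT),
    (1000, NOT_CONEX),
]
```

Because the bits are independent, the second packet can carry Re-Echo-Loss and Credit at the same time.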
3.3.2. Enumerated Encoding | 3.3.2. Codepoint Encoding | |||
For enumerated encoding some marks must be delayed such that each | This encoding involves a bit-field large enough to signal one of the | |||
packet only carries at most one mark. | following five codepoints: | |||
ENUM {Not_Supported, No_Mark, Black_ECN, Black_Loss, Green} | ENUM {Not-ConEx, ConEx, Re-Echo-Loss, Re-Echo-ECN, Credit} | |||
Each named codepoint has the same meaning as in the encoding using | ||||
independent bits (Section 3.3.1). The use of any one codepoint | ||||
implies the negative of all the others, except the last three | ||||
codepoints (Re-Echo-Loss, Re-Echo-ECN and Credit) obviously also | ||||
imply ConEx is supported. | ||||
Inherently, the semantics of most of the enumerated codepoints are | ||||
mutually exclusive. 'Credit' is the only one that might need to be | ||||
used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even | ||||
that requirement is questionable. It must not be forgotten that the | ||||
enumerated encoding loses the flexibility to signal these two | ||||
combinations, whereas the encoding with four independent bits is not | ||||
so limited. Alternatively two extra codepoints could be assigned to | ||||
these two combinations of semantics. | ||||
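With mutually exclusive codepoints, a sender that owes more than one marking has to spread them over successive packets. The sketch below shows one way this might be done. The numeric values, the priority order and the name next_codepoint are assumptions for illustration only; the draft specifies none of them.

```python
from enum import Enum


class Codepoint(Enum):
    # Numeric values are hypothetical; the draft does not assign them.
    NOT_CONEX = 0
    CONEX = 1
    RE_ECHO_LOSS = 2
    RE_ECHO_ECN = 3
    CREDIT = 4


# Assumed order in which owed markings are sent; not taken from the draft.
PRIORITY = [Codepoint.RE_ECHO_LOSS, Codepoint.RE_ECHO_ECN, Codepoint.CREDIT]


def next_codepoint(pending):
    """Pick the single codepoint for the next packet.

    `pending` lists the markings the sender still owes. Returns the
    codepoint to send now and the markings deferred to later packets.
    """
    for codepoint in PRIORITY:
        if codepoint in pending:
            remaining = list(pending)
            remaining.remove(codepoint)
            return codepoint, remaining
    # Nothing owed: a plain ConEx-capable packet.
    return Codepoint.CONEX, []
```

A sender owing both Credit and Re-Echo-Loss would send Re-Echo-Loss on one packet and Credit on the next, which is the delay that the four independent bits avoid.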
{ToDo: Default behaviour for Currently Unused codepoints} | ||||
{ToDo: Signal from Policer to Receiver to distinguish policy-induced | ||||
drop from congestion-induced drop} | ||||
Some might prefer to use the following colours respectively for each | ||||
codepoint. The same colours as follows (with the omission of Purple) | ||||
were used to describe re-ECN codepoints: | ||||
ENUM {White, Grey, Purple, Black, Green}. | ||||
4. Congestion Exposure Components | 4. Congestion Exposure Components | |||
{ToDo: Picture of the components, similar to that in the last | ||||
slideset about conex-concepts-uses?} | ||||
4.1. Modified Senders | 4.1. Modified Senders | |||
Send Congestion Exposure Signals per congestion signals. | The sending transport needs to be modified to send Congestion | |||
Exposure Signals in response to congestion feedback signals. | ||||
4.2. Policy Devices | 4.2. Receivers (Optionally Modified) | |||
4.2.1. Audit | The receiving transport may already feed back sufficiently useful | |||
signals to the sender so that it does not need to be altered. | ||||
For loss: detect retransmissions by monitoring sequence numbers. | However, a TCP receiver feeds back ECN congestion signals no more | |||
Assure that #retransmissions<=#Black_Loss | than once within a round trip. The sender may require more precise | |||
feedback from the receiver otherwise it will appear to be | ||||
understating its Congestion Exposure Signals (see Section 3.2.1). | ||||
(May need to include a fudge factor, because it would be more robust | Ideally, Congestion Exposure should be added to a transport like TCP | |||
to mark the packet after a retransmission. Otherwise network devices | without mandatory modifications to the receiver. But an optional | |||
that discard marked packets will cause connectivity failures, rather | modification to the receiver could be recommended for precision. | |||
than poor performance). | This was the approach taken when adding re-ECN to TCP | |||
[I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
For ECN: count Congestion Exposure Signals and ECN. Would normally | 4.3. Audit | |||
need to delay ECN by one RTT to avoid false positives. Alternative: | ||||
use Green (pre-credits) to assure that #ECN<=#Black_ECN+#GREEN, even | ||||
though the #Black_ECN is delayed by one RTT. | ||||
4.2.2. Policers and Shapers | To audit Congestion Exposure Signals against actual losses an auditor | |||
could use one of the following techniques: | ||||
{ToDo: Beware these terms are defined differently than the | TCP-specific approach: The auditor could monitor TCP flows or | |||
conventional usage.} | aggregates of flows, only holding state on a flow if it first | |||
sends a Credit or a Re-Echo-Loss marking. The auditor could | ||||
detect retransmissions by monitoring sequence numbers. It would | ||||
assure that (volume of retransmitted data) <= (volume of data | ||||
marked Re-Echo-Loss). Traffic would only be auditable in this way | ||||
if it conformed to the standard TCP protocol and the IP payload | ||||
was not encrypted (e.g. with IPsec). | ||||
{ToDo: Abridge from existing doc?} | Predominant bottleneck approach: Unlike the above TCP-specific | |||
solution, this technique would work for IP packets carrying any | ||||
transport layer protocol, and whether encrypted or not. But it | ||||
only works well for networks designed so that losses predominantly | ||||
occur under the management of one IP-aware node on the path. The | ||||
auditor could then be located at this bottleneck. It could simply | ||||
compare Congestion Exposure Signals with actual local losses. | ||||
Most consumer access networks are designed to this model, e.g. the | ||||
radio network controller (RNC) in a cellular network or the | ||||
broadband remote access server (BRAS) in a digital subscriber line | ||||
(DSL) network. | ||||
The accuracy of an auditor at one predominant bottleneck might | ||||
still be sufficient, even if losses occasionally occurred at other | ||||
nodes in the network (e.g. border gateways). Although the auditor | ||||
at the predominant bottleneck would not always be able to detect | ||||
losses at other nodes, transports would not know where losses were | ||||
occurring either. Therefore any transport would not know which | ||||
losses it could cheat on without getting caught, and which ones it | ||||
couldn't. | ||||
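The TCP-specific check might be sketched as follows for a single flow. This is a deliberately simplified illustration, not the draft's design: the class name LossAuditor is hypothetical, sequence-number wraparound is ignored, reordered segments would be miscounted as retransmissions, and the rule of only holding state once a flow sends Credit or Re-Echo-Loss is omitted.

```python
class LossAuditor:
    """Per-flow check that retransmitted volume <= Re-Echo-Loss volume."""

    def __init__(self):
        self.next_seq = None      # one past the highest byte seen so far
        self.retransmitted = 0    # bytes seen more than once
        self.re_echo_loss = 0     # bytes marked Re-Echo-Loss

    def observe(self, seq, length, re_echo_loss=False):
        """Record one TCP segment starting at `seq` with `length` bytes."""
        if re_echo_loss:
            self.re_echo_loss += length
        end = seq + length
        if self.next_seq is None:
            self.next_seq = end
            return
        if seq < self.next_seq:
            # The part of the segment below next_seq has been seen before.
            self.retransmitted += min(end, self.next_seq) - seq
        self.next_seq = max(self.next_seq, end)

    def compliant(self):
        return self.retransmitted <= self.re_echo_loss
```

In this sketch a flow becomes non-compliant as soon as it retransmits data and stays so until it has marked at least the same volume with Re-Echo-Loss. A real auditor would presumably need some tolerance, because the marking follows the retransmission.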
To audit Congestion Exposure Signals against actual ECN markings or | ||||
losses, the auditor could work as follows: monitor flows or | ||||
aggregates of flows, only holding state on a flow if it first sends a | ||||
Credit or either Re-Echo marking. Count the number of bytes marked | ||||
with Credit or Re-Echo-ECN. Separately count the number of bytes | ||||
marked with ECN. Use Credits to assure that #ECN<=#Re-Echo- | ||||
ECN+#Credit, even though the Re-Echo-ECN markings are delayed by at | ||||
least one RTT. | ||||
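The byte-counting check for ECN can be sketched in a few lines. The class name EcnAuditor and its method signatures are hypothetical, and the sketch keeps counters for a single flow or aggregate without the state-holding rule described above.

```python
class EcnAuditor:
    """Check that ECN-marked bytes <= Re-Echo-ECN bytes + Credit bytes."""

    def __init__(self):
        self.ecn = 0          # bytes that arrived ECN marked
        self.re_echo_ecn = 0  # bytes marked Re-Echo-ECN by the sender
        self.credit = 0       # bytes marked Credit by the sender

    def observe(self, size, ecn_marked=False, re_echo_ecn=False, credit=False):
        """Record one packet of `size` bytes and the markings it carries."""
        if ecn_marked:
            self.ecn += size
        if re_echo_ecn:
            self.re_echo_ecn += size
        if credit:
            self.credit += size

    def compliant(self):
        return self.ecn <= self.re_echo_ecn + self.credit
```

Credit sent in advance keeps the inequality satisfied during the round trip before the matching Re-Echo-ECN markings arrive.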
Note that an auditing device involves no policy configuration; it | ||||
merely enforces protocol compliance, not policy. | ||||
4.4. Policy Devices | ||||
4.4.1. Congestion Policers | ||||
Note that a congestion policer can be implemented in a very similar | ||||
way to a bit-rate policer, but its effect is focused solely on | ||||
traffic causing congestion downstream, not on all traffic just in | ||||
case it causes congestion. | ||||
It monitors all ConEx traffic entering a network, or some | ||||
identifiable subset. Using Congestion Exposure signals, it measures | ||||
the amount of congestion being caused by this traffic. If this | ||||
exceeds a policy-configured 'congestion-bit-rate' the congestion | ||||
policer will limit all the monitored ConEx traffic. A congestion | ||||
policer can be implemented by a simple token bucket. But unlike a | ||||
bit-rate policer, it only removes tokens when forwarding packets that | ||||
are ConEx marked. See [CongPol] for details. | ||||
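A minimal token-bucket sketch of such a policer follows. It is an illustration under stated assumptions rather than the design in [CongPol]: the class name CongestionPolicer, the parameter names, the explicit `now` argument and the choice to drop packets while the bucket is empty are all hypothetical, and "marked" is taken to mean that the packet carries a Congestion Exposure Signal.

```python
class CongestionPolicer:
    """Token bucket drained only by ConEx-marked packets."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # allowed congestion-bit-rate, bytes/s
        self.depth = depth          # maximum tokens (burst allowance), bytes
        self.tokens = depth
        self.last = 0.0             # time of the previous packet, seconds

    def forward(self, now, size, marked):
        """Return True if the packet is forwarded, False if it is limited."""
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        self.last = now
        if self.tokens <= 0:
            # Allowance used up: limit all monitored traffic, marked or not.
            return False
        if marked:
            # Unlike a bit-rate policer, only marked packets consume tokens.
            self.tokens -= size
        return True
```

Unmarked traffic passes freely while tokens remain, so a sender that causes no congestion downstream is never limited regardless of its bit-rate.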
4.4.2. Other Policy Devices | ||||
Other policy devices that use Congestion Exposure signaling might | ||||
monitor traffic based on Congestion Exposure Signals in much the same | ||||
way as the monitoring element of a Congestion Policer. But the | ||||
resulting action could be different. It might re-route traffic or | ||||
downgrade the class of service. | ||||
It might do nothing directly to the traffic, but instead report | ||||
measurements of Congestion Exposure Signals to systems designed to | ||||
control congestion indirectly. For instance the measurements might | ||||
be used to trigger penalty clauses in contracts, to levy charges | ||||
between networks based on congestion or simply to notify customers | ||||
who cause excessive congestion. | ||||
5. IANA Considerations | 5. IANA Considerations | |||
This memo includes no request to IANA. | This memo includes no request to IANA. | |||
Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section may be removed on publication as an | |||
RFC. | RFC. | |||
6. Security Considerations | 6. Security Considerations | |||
{ToDo:} | Significant parts of this whole document are about the auditability | |||
of Congestion Exposure Signals, in particular Section 4.3. | ||||
7. Conclusions | 7. Conclusions | |||
{ToDo:} | {ToDo:} | |||
8. Acknowledgements | 8. Acknowledgements | |||
This document was improved by review comments from Toby Moncaster. | This document was improved by review comments from Toby Moncaster. | |||
9. Comments Solicited | 9. Comments Solicited | |||
skipping to change at page 11, line 7 | skipping to change at page 13, line 42 | |||
10.1. Normative References | 10.1. Normative References | |||
[RFC2119] Bradner, S., "Key words for use in | [RFC2119] Bradner, S., "Key words for use in | |||
RFCs to Indicate Requirement | RFCs to Indicate Requirement | |||
Levels", BCP 14, RFC 2119, | Levels", BCP 14, RFC 2119, | |||
March 1997. | March 1997. | |||
10.2. Informative References | 10.2. Informative References | |||
[CongPol] Jacquet, A., Briscoe, B., and T. | ||||
Moncaster, "Policing Freedom to Use | ||||
the Internet Resource Pool", Proc | ||||
ACM Workshop on Re-Architecting the | ||||
Internet (ReArch'08) , | ||||
December 2008, <http:// | ||||
www.bobbriscoe.net/ | ||||
pubs.html#polfree>. | ||||
[I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A., | [I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A., | |||
Moncaster, T., and A. Smith, "Re- | Moncaster, T., and A. Smith, "Re- | |||
ECN: A Framework for adding | ECN: A Framework for adding | |||
Congestion Accountability to | Congestion Accountability to | |||
TCP/IP", draft-briscoe-tsvwg-re- | TCP/IP", draft-briscoe-tsvwg-re- | |||
ecn-tcp-motivation-01 (work in | ecn-tcp-motivation-01 (work in | |||
progress), September 2009. | progress), September 2009. | |||
[I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A., | [I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A., | |||
Moncaster, T., and A. Smith, "Re- | Moncaster, T., and A. Smith, "Re- | |||
ECN: Adding Accountability for | ECN: Adding Accountability for | |||
Causing Congestion to TCP/IP", | Causing Congestion to TCP/IP", | |||
draft-briscoe-tsvwg-re-ecn-tcp-08 | draft-briscoe-tsvwg-re-ecn-tcp-08 | |||
(work in progress), September 2009. | (work in progress), September 2009. | |||
[I-D.conex-concepts-uses] Briscoe, B., Woundy, R., Moncaster, | ||||
T., and J. Leslie, "ConEx Concepts | ||||
and Use Cases", draft-moncaster- | ||||
conex-concepts-uses-01 (work in | ||||
progress), July 2010. | ||||
[I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low | [I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low | |||
Extra Delay Background Transport | Extra Delay Background Transport | |||
(LEDBAT)", | (LEDBAT)", | |||
draft-ietf-ledbat-congestion-02 | draft-ietf-ledbat-congestion-02 | |||
(work in progress), July 2010. | (work in progress), July 2010. | |||
[I-D.sridharan-tcpm-ctcp] Sridharan, M., Tan, K., Bansal, D., | [I-D.sridharan-tcpm-ctcp] Sridharan, M., Tan, K., Bansal, D., | |||
and D. Thaler, "Compound TCP: A New | and D. Thaler, "Compound TCP: A New | |||
TCP Congestion Control for High- | TCP Congestion Control for High- | |||
Speed and Long Distance Networks", | Speed and Long Distance Networks", | |||
draft-sridharan-tcpm-ctcp-02 (work | draft-sridharan-tcpm-ctcp-02 (work | |||
in progress), November 2008. | in progress), November 2008. | |||
[RFC0791] Postel, J., "Internet Protocol", | ||||
STD 5, RFC 791, September 1981. | ||||
[RFC2309] Braden, B., Clark, D., Crowcroft, | [RFC2309] Braden, B., Clark, D., Crowcroft, | |||
J., Davie, B., Deering, S., Estrin, | J., Davie, B., Deering, S., Estrin, | |||
D., Floyd, S., Jacobson, V., | D., Floyd, S., Jacobson, V., | |||
Minshall, G., Partridge, C., | Minshall, G., Partridge, C., | |||
Peterson, L., Ramakrishnan, K., | Peterson, L., Ramakrishnan, K., | |||
Shenker, S., Wroclawski, J., and L. | Shenker, S., Wroclawski, J., and L. | |||
Zhang, "Recommendations on Queue | Zhang, "Recommendations on Queue | |||
Management and Congestion Avoidance | Management and Congestion Avoidance | |||
in the Internet", RFC 2309, | in the Internet", RFC 2309, | |||
April 1998. | April 1998. | |||
End of changes. 53 change blocks. | ||||
136 lines changed or deleted | 286 lines changed or added | |||