| < draft-mathis-conex-abstract-mech-00b.txt | draft-mathis-conex-abstract-mech-00c.txt > | |||
|---|---|---|---|---|
| Congestion Exposure (ConEx) M. Mathis | Congestion Exposure (ConEx) M. Mathis | |||
| Working Group Google | Working Group Google | |||
| Internet-Draft B. Briscoe | Internet-Draft B. Briscoe | |||
| Intended status: Informational BT | Intended status: Informational BT | |||
| Expires: April 17, 2011 October 14, 2010 | Expires: April 18, 2011 October 15, 2010 | |||
| Congestion Exposure (ConEx) Concepts and Abstract Mechanism | Congestion Exposure (ConEx) Concepts and Abstract Mechanism | |||
| draft-mathis-conex-abstract-mech-00b | draft-mathis-conex-abstract-mech-00c | |||
| Abstract | Abstract | |||
| This document describes an abstract mechanism by which senders inform | This document describes an abstract mechanism by which senders inform | |||
| the network about the congestion encountered by packets earlier in | the network about the congestion encountered by packets earlier in | |||
| the same flow. Today, the network may signal congestion to the | the same flow. Today, the network may signal congestion to the | |||
| receiver by ECN markings or by dropping packets, and the receiver may | receiver by ECN markings or by dropping packets, and the receiver may | |||
| pass this information back to the sender in transport-layer feedback. | pass this information back to the sender in transport-layer feedback. | |||
| The mechanism to be developed by the ConEx WG will enable the sender | The mechanism to be developed by the ConEx WG will enable the sender | |||
| to also relay this congestion information back into the network in- | to also relay this congestion information back into the network in- | |||
| skipping to change at page 1, line 40 | skipping to change at page 1, line 40 | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on April 17, 2011. | This Internet-Draft will expire on April 18, 2011. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2010 IETF Trust and the persons identified as the | Copyright (c) 2010 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 17 | skipping to change at page 2, line 17 | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 2. Requirements for the Congestion Exposure Signal . . . . . . . 5 | 2. Requirements for the Congestion Exposure Signal . . . . . . . 5 | |||
| 3. Representing Congestion Exposure . . . . . . . . . . . . . . . 7 | 3. Representing Congestion Exposure . . . . . . . . . . . . . . . 7 | |||
| 3.1. One Simple Encoding . . . . . . . . . . . . . . . . . . . 7 | 3.1. Strawman Encoding . . . . . . . . . . . . . . . . . . . . 7 | |||
| 3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 8 | 3.2. ECN Based Encoding . . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . 8 | 3.2.1. ECN Changes . . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . 9 | 3.3. Abstract Encoding . . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3.1. Separate Bits . . . . . . . . . . . . . . . . . . . . 9 | 3.3.1. Independent Bits . . . . . . . . . . . . . . . . . . . 9 | |||
| 3.3.2. Enumerated Encoding . . . . . . . . . . . . . . . . . 9 | 3.3.2. Codepoint Encoding . . . . . . . . . . . . . . . . . . 10 | |||
| 4. Congestion Exposure Components . . . . . . . . . . . . . . . . 9 | 4. Congestion Exposure Components . . . . . . . . . . . . . . . . 10 | |||
| 4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 9 | 4.1. Modified Senders . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 4.2. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 9 | 4.2. Receivers (Optionally Modified) . . . . . . . . . . . . . 11 | |||
| 4.2.1. Audit . . . . . . . . . . . . . . . . . . . . . . . . 9 | 4.3. Audit . . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 4.2.2. Policers and Shapers . . . . . . . . . . . . . . . . . 10 | 4.4. Policy Devices . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | 4.4.1. Congestion Policers . . . . . . . . . . . . . . . . . 12 | |||
| 6. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | 4.4.2. Other Policy Devices . . . . . . . . . . . . . . . . . 12 | |||
| 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10 | 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 | |||
| 9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 10 | 7. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 10.1. Normative References . . . . . . . . . . . . . . . . . . . 10 | 9. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 10.2. Informative References . . . . . . . . . . . . . . . . . . 11 | 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 10.1. Normative References . . . . . . . . . . . . . . . . . . . 13 | ||||
| 10.2. Informative References . . . . . . . . . . . . . . . . . . 13 | ||||
| 1. Introduction | 1. Introduction | |||
| One of the required functions of a transport protocol is controlling | One of the required functions of a transport protocol is controlling | |||
| congestion in the network. There are three techniques in use today | congestion in the network. There are three techniques in use today | |||
| for the network to signal congestion to a transport: | for the network to signal congestion to a transport: | |||
| o The most common congestion signal is packet loss. When congested, | o The most common congestion signal is packet loss. When congested, | |||
| the network simply discards some packets either as part of an | the network simply discards some packets either as part of an | |||
| explicit control function [RFC2309] or as the consequence of a | explicit control function [RFC2309] or as the consequence of a | |||
| skipping to change at page 4, line 25 | skipping to change at page 4, line 25 | |||
| | Sender |>-(new)-IP layer Congestion Exposure Signal--->| Receiver| | | Sender |>-(new)-IP layer Congestion Exposure Signal--->| Receiver| | |||
| | | (Carried in Data Packet Headers) | | | | | (Carried in Data Packet Headers) | | | |||
| | | +-----------+ | | | | | +-----------+ | | | |||
| | |>=Data=Path=>|(Congested)|>=====Data=Path=====>| | | | |>=Data=Path=>|(Congested)|>=====Data=Path=====>| | | |||
| | | | Network |>-Congestion-Signal->| | | | | | Network |>-Congestion-Signal->| | | |||
| | | | Device | | | | | | | Device | | | | |||
| +---------+ +-----------+ +---------+ | +---------+ +-----------+ +---------+ | |||
| Not shown are policy devices along the data path that observe the | Not shown are policy devices along the data path that observe the | |||
| Congestion Exposure Signal, and use the information to monitor or | Congestion Exposure Signal, and use the information to monitor or | |||
| manage traffic. These are discussed in Section 4.2. | manage traffic. These are discussed in Section 4.4. | |||
| Figure 1 | Figure 1 | |||
| 1.1. Terminology | 1.1. Terminology | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in RFC 2119 [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
| ConEx signals in IP packet headers from the sender to the network | ConEx signals in IP packet headers from the sender to the network | |||
| {ToDo: These are placeholders for whatever words we decide to use}: | {ToDo: These are placeholders for whatever words we decide to use}: | |||
| Re-Echo Loss (aka Black-Loss) The transport has experienced a loss. | Not-ConEx (aka White) The transport is not ConEx-capable | |||
| Re-Echo ECN (aka Black-ECN) The transport has experienced an ECN | ConEx (aka Grey) The transport is ConEx-capable | |||
| mark | ||||
| Pre-Echo (aka Green) The transport is building up credit to allow | Re-Echo-Loss (aka Purple) The transport has experienced a loss. | |||
| for any future delay in expected ConEx signals | ||||
| Neutral (aka Grey) The transport is ConEx-capable | Re-Echo-ECN (aka Black) The transport has experienced an ECN mark | |||
| Not-ConEx (aka White) The transport is not ConEx-capable | ||||
| Credit (aka Green) The transport is building up credit to allow for | ||||
| any future delay in expected ConEx signals | ||||
| ConEx-Marked Any of Re-Echo-Loss, Re-Echo-ECN or Credit. | ||||
| ConEx-Unmarked ConEx, but not ConEx-Marked. | ||||
| 2. Requirements for the Congestion Exposure Signal | 2. Requirements for the Congestion Exposure Signal | |||
| Ideally, all the following requirements would be met by a Congestion | ||||
| Exposure Signal. However, it is already known that some compromises | ||||
| will be necessary; therefore all the requirements are expressed with | ||||
| the keyword 'SHOULD' rather than 'MUST'. The only mandatory | ||||
| requirement is that a concrete protocol description MUST give sound | ||||
| reasoning if it chooses not to meet any of these requirements: | ||||
| a. The Congestion Exposure Signal SHOULD be visible to internetwork | a. The Congestion Exposure Signal SHOULD be visible to internetwork | |||
| layer devices along the entire path from the transport sender to | layer devices along the entire path from the transport sender to | |||
| the transport receiver. Equivalently, it SHOULD be present in | the transport receiver. Equivalently, it SHOULD be present in | |||
| the IPv4 or IPv6 header, and in the outermost IP header if using | the IPv4 or IPv6 header, and in the outermost IP header if using | |||
| IP in IP tunnelling. The Congestion Exposure Signal SHOULD be | IP in IP tunnelling. The Congestion Exposure Signal SHOULD be | |||
| immutable once set by the transport sender. A corollary of these | immutable once set by the transport sender. A corollary of these | |||
| requirements is that existing (legacy) networking gear SHOULD | requirements is that existing (legacy) networking gear SHOULD | |||
| pass the Congestion Exposure Signal silently without | pass the Congestion Exposure Signal silently without | |||
| modification. | modification. | |||
| skipping to change at page 5, line 47 | skipping to change at page 6, line 8 | |||
| actually experiencing. | actually experiencing. | |||
| d. The Congestion Exposure Signal SHOULD be timely. There will be a | d. The Congestion Exposure Signal SHOULD be timely. There will be a | |||
| delay between the time when an auditing device sees an actual | delay between the time when an auditing device sees an actual | |||
| congestion signal and when it sees the subsequent Congestion | congestion signal and when it sees the subsequent Congestion | |||
| Exposure Signal from the sender. The minimum delay will be one | Exposure Signal from the sender. The minimum delay will be one | |||
| round trip, but it may be much longer depending on the | round trip, but it may be much longer depending on the | |||
| transport's choice of feedback delay (consider RTCP [RFC3550] for | transport's choice of feedback delay (consider RTCP [RFC3550] for | |||
| example). It is not practical to expect auditing devices in the | example). It is not practical to expect auditing devices in the | |||
| network to make allowance for such feedback delays. Instead, the | network to make allowance for such feedback delays. Instead, the | |||
| sender MUST be able to send Congestion Exposure signals in | sender SHOULD be able to send Congestion Exposure signals in | |||
| advance, as 'credit' for any audit device to hold as a balance | advance, as 'credit' for any audit device to hold as a balance | |||
| against the risk of congestion during the feedback delay. This | against the risk of congestion during the feedback delay. This | |||
| design choice simplifies auditing devices and correctly makes the | design choice simplifies auditing devices and correctly makes the | |||
| transport responsible for both minimising feedback delay and | transport responsible for both minimising feedback delay and | |||
| minimising sharp increases in packets in flight that would risk | minimising sharp increases in packets in flight that would risk | |||
| causing excessive congestion to others. This issue is discussed | causing excessive congestion to others. This issue is discussed | |||
| in more detail in Section 4.2.1. | in more detail in Section 4.3. | |||
| It is important to note that the auditing requirement implies a | It is important to note that the auditing requirement implies a | |||
| number of additional constraints: The basic auditing technique is to | number of additional constraints: The basic auditing technique is to | |||
| count both actual congestion signals and Congestion Exposure Signals | count both actual congestion signals and Congestion Exposure Signals | |||
| someplace along the data path: | someplace along the data path: | |||
| o For congestion signaled by ECN, auditing is most accurate when | o For congestion signaled by ECN, auditing is most accurate when | |||
| located near the transport receiver. Within any flow or aggregate | located near the transport receiver. Within any flow or aggregate | |||
| of flows, the total volume of ECN marked data seen near the | of flows, the total volume of ECN marked data seen near the | |||
| receiver should always be equal to or less than the volume of data | receiver should always be equal to or less than the volume of data | |||
| skipping to change at page 6, line 28 | skipping to change at page 6, line 37 | |||
| o For congestion signaled by loss, totally accurate auditing is not | o For congestion signaled by loss, totally accurate auditing is not | |||
| believed to be possible in the general case, because it involves a | believed to be possible in the general case, because it involves a | |||
| network node detecting the absence of some packets, when it cannot | network node detecting the absence of some packets, when it cannot | |||
| necessarily see the transport protocol sequence numbers and when | necessarily see the transport protocol sequence numbers and when | |||
| the missing packets might simply be taking a different route. But | the missing packets might simply be taking a different route. But | |||
| there are common cases where sufficient audit accuracy should be | there are common cases where sufficient audit accuracy should be | |||
| possible: | possible: | |||
| * For non-IPsec traffic conforming to standard TCP sequence | * For non-IPsec traffic conforming to standard TCP sequence | |||
| numbering on a single path, the auditor could detect losses by | numbering on a single path, an auditor could detect losses by | |||
| observing both the original transmission and the retransmission | observing both the original transmission and the retransmission | |||
| after the loss. Such auditing would be most accurate near the | after the loss. Such auditing would be most accurate near the | |||
| sender. | sender. | |||
| * For networks designed so that losses predominantly occur under | * For networks designed so that losses predominantly occur under | |||
| the management of one IP-aware node on the path, the auditor | the management of one IP-aware node on the path, the auditor | |||
| could be located at this bottleneck. It could simply compare | could be located at this bottleneck. It could simply compare | |||
| Congestion Exposure Signals with actual local losses. Most | Congestion Exposure Signals with actual local losses. This is | |||
| consumer access networks are design to this model, e.g. the | a good model for most consumer access networks and audit | |||
| radio network controller (RNC) in a cellular network or the | accuracy could well be sufficient even if losses occasionally | |||
| broadband remote access server (BRAS) in a digital subscriber | occurred at other nodes in the network, such as border gateways | |||
| line (DSL) network. Unlike the above TCP-specific solution, | (see Section 4.3 for details). | |||
| this would work for IP packets carrying any transport layer | ||||
| protocol, and whether encrypted or not. | ||||
| The accuracy of an auditor at one predominant bottleneck might | ||||
| still be sufficient, even if losses occasionally occurred at | ||||
| other nodes in the network (e.g. border gateways). Although | ||||
| the auditor at the predominant bottleneck would not always be | ||||
| able to detect losses at other nodes, transports would not know | ||||
| where losses were occurring either. Therefore any transport | ||||
| would not know which losses it could cheat on without getting | ||||
| caught, and which ones it couldn't. | ||||
| Given that loss-based and ECN-based Congestion Exposure might | Given that loss-based and ECN-based Congestion Exposure might | |||
| sometimes be best audited at different locations, have distinct | sometimes be best audited at different locations, having distinct | |||
| encodings would widen the design space for the auditing function. | encodings would widen the design space for the auditing function. | |||
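
The counting technique described above for ECN-based auditing can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `EcnAuditor`, its fields, and the rule that a negative balance means understated congestion are hypothetical and are not defined by either revision of the draft. Counting is by bytes rather than packets, since congestion is signaled by the volume of marked data.

```python
from dataclasses import dataclass


@dataclass
class EcnAuditor:
    """Hypothetical per-flow (or per-aggregate) audit counters.

    Assumption: the auditor sits near the transport receiver, where it can
    see both the ECN marks applied by the network and the Congestion
    Exposure Signals (including credit) set by the sender.
    """

    exposed_bytes: int = 0    # bytes in packets carrying a Congestion Exposure Signal
    congested_bytes: int = 0  # bytes in packets carrying an actual ECN congestion mark

    def observe(self, size: int, conex_marked: bool, ecn_marked: bool) -> None:
        """Account for one packet of `size` bytes."""
        if conex_marked:
            self.exposed_bytes += size
        if ecn_marked:
            self.congested_bytes += size

    def balance(self) -> int:
        """Exposure signalled so far minus congestion actually observed."""
        return self.exposed_bytes - self.congested_bytes

    def understated(self) -> bool:
        """True when observed congestion exceeds what the sender has exposed."""
        return self.balance() < 0
```

In this sketch a sender that sends credit in advance starts with a positive balance, so a congestion mark seen during the feedback delay does not immediately push the balance below zero.
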
| {Bob: Got to here making suggested changes.} | ||||
| 3. Representing Congestion Exposure | 3. Representing Congestion Exposure | |||
| Most protocol specifications start with a description of packet | Most protocol specifications start with a description of packet | |||
| formats and code points with their associated meanings. This | formats and codepoints with their associated meanings. This document | |||
| document does not: It is already known that choosing the encoding for | does not: It is already known that choosing the encoding for the | |||
| the Congestion Exposure Signal is likely to entail some engineering | Congestion Exposure Signal is likely to entail some engineering | |||
| compromises that have the potential to reduce the protocol's | compromises that have the potential to reduce the protocol's | |||
| usefulness in some settings. Rather than making these engineering | usefulness in some settings. Rather than making these engineering | |||
| choices prematurely, this document sidesteps the encoding problem by | choices prematurely, this document sidesteps the encoding problem by | |||
| describing an abstract representation of Congestion Exposure Signal. | describing an abstract representation of a Congestion Exposure | |||
| All of the elements of the protocol can be defined in terms of this | Signal. All of the elements of the protocol can be defined in terms | |||
| abstract representation. Most important, the preliminary use cases | of this abstract representation. Most important, the preliminary use | |||
| for the protocol are described in terms of the abstract | cases for the protocol are described in terms of the abstract | |||
| representation in companion documents. | representation in companion documents [I-D.conex-concepts-uses]. | |||
| Once we have some example use cases we can evaluate different | Once we have some example use cases we can evaluate different | |||
| encoding schemes. Since these schemes are likely to include some | encoding schemes. Since these schemes are likely to include some | |||
| conflated code points, some information will be lost resulting in | conflated code points, some information will be lost resulting in | |||
| weakening or disabling some of the algorithms and eliminating some | weakening or disabling some of the algorithms and eliminating some | |||
| use cases. | use cases. | |||
| The goal of this approach is to be as complete as possible for | The goal of this approach is to be as complete as possible for | |||
| discovering the potential usage and capabilities of the Congestion | discovering the potential usage and capabilities of the Congestion | |||
| Exposure protocol, so we have some hope of making optimal design | Exposure protocol, so we have some hope of making optimal design | |||
| decisions when choosing the encoding. | decisions when choosing the encoding. | |||
| 3.1. One Simple Encoding | 3.1. Strawman Encoding | |||
| As an aid to the reader, it might be helpful to describe one simple | ||||
| encoding of the Congestion Exposure protocol: set IPv4 header bit 48 | ||||
| (aka the "evil bit" [RFC3514]) on all retransmissions or once per ECN | ||||
| signaled window reduction. Clearly network devices along the forward | ||||
| path can see this bit and act on it. For example they can count | ||||
| marked and unmarked packets to estimate the congestion levels along | ||||
| the path. | ||||
| However this encoding has been forbidden by RFC xxxx, which seeks to | As an aid to the reader, it might be helpful to describe a naive | |||
| preserve the last unallocated bit in the IPv4 header for some | strawman encoding of the Congestion Exposure protocol described | |||
| unspecifed future use. | solely in terms of TCP: set the Reserved bit in the IPv4 header (bit | |||
| 48 counting from zero [RFC0791]--aka the "evil bit" [RFC3514]) on all | ||||
| retransmissions or once per ECN signaled window reduction. Clearly | ||||
| network devices along the forward path can see this bit and act on | ||||
| it. For example they can count marked and unmarked packets to | ||||
| estimate the congestion levels along the path. | ||||
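
As a rough illustration of how a forward-path device might read the strawman marking, the sketch below tests bit 48 of an IPv4 header (counting from zero, this is the most significant bit of the seventh octet, the Reserved flag) and computes the fraction of marked packets. The function names are hypothetical, and counting packets rather than bytes is a simplification chosen for brevity.

```python
from typing import Iterable

IPV4_MIN_HEADER_LEN = 20  # octets, header without options


def strawman_marked(ipv4_header: bytes) -> bool:
    """Return True if the Reserved flag (header bit 48, counting from zero) is set."""
    if len(ipv4_header) < IPV4_MIN_HEADER_LEN:
        raise ValueError("truncated IPv4 header")
    # Bit 48 is the most significant bit of octet 6 (48 // 8 == 6).
    return bool(ipv4_header[6] & 0x80)


def marked_fraction(headers: Iterable[bytes]) -> float:
    """Fraction of observed packets carrying the strawman mark (0.0 if none seen)."""
    total = 0
    marked = 0
    for header in headers:
        total += 1
        if strawman_marked(header):
            marked += 1
    return marked / total if total else 0.0
```
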
| Furthermore this encoding, by itself, does not sufficiently support | However, the IESG has chartered the ConEx working group to establish | |||
| partial deployment or strong auditing and might motivate users and/or | that there is sufficient demand for an IPv6 ConEx protocol before | |||
| applications to misrepresent the congestion that they are be causing. | using the last available bit in the IPv4 header. Furthermore this | |||
| encoding, by itself, does not sufficiently support partial deployment | ||||
| or strong auditing and might motivate users and/or applications to | ||||
| misrepresent the congestion that they are causing. | ||||
| However, this simple encoding does present a clear mental model of | Nonetheless, this strawman encoding does present a clear mental model | |||
| how the Congestion Exposure protocol functions and is very useful for | of how the Congestion Exposure protocol might function under various | |||
| conducting thought experiments about how the protocol might function | uses. | |||
| under various uses. | ||||
| 3.2. ECN Based Encoding | 3.2. ECN Based Encoding | |||
| Bob Briscoe's PhD thesis [Refb-dis], and many derivative works | The re-ECN specification [I-D.briscoe-tsvwg-re-ecn-tcp] presents an | |||
| including RE-ECN [I-D.briscoe-tsvwg-re-ecn-tcp] present an ECN based | ECN based implementation of ConEx. The central theme of this work is | |||
| implementation of ConEx. The central theme of this work includes | an audit mechanism that can provide sufficient disincentives against | |||
| strong disincentives for misrepresenting congestion | misrepresenting congestion [I-D.briscoe-tsvwg-re-ecn-motiv], which is | |||
| [I-D.briscoe-tsvwg-re-ecn-motiv]. However, it also pre-supposes the | analysed extensively in Briscoe's PhD dissertation [Refb-dis]. | |||
| full deployment of ECN, and does not adequately signal congestion | ||||
| indicated by packet loss. Furthermore, given that after 10 years ECN | ||||
| still has not been widely deployed, it does not seem prudent to | ||||
| require its deployment as a prerequisite for deploying a Congestion | ||||
| Exposure protocol. | ||||
| As it currently stands, this work fails to meet the "partial | The re-ECN encoding is tightly integrated with the encoding of ECN in | |||
| deployment" requirement described above in section Section 2. | the IP header. However, re-ECN can be incrementally deployed on | |||
| hosts whether or not any networks support ECN marking and whether or | ||||
| not any networks take note of re-ECN markings. Nonetheless, the | ||||
| audit function has only been formally analysed where at least one | ||||
| autonomous network has deployed ECN marking, which it uses to audit | ||||
| whether the Congestion Exposure Signal matches actual congestion. | ||||
| Thus, even if networks have not deployed ECN, re-ECN acts perfectly | ||||
| well as a loss-based Congestion Exposure protocol. As such, a | ||||
| network could potentially audit re-ECN signals against losses using | ||||
| the loss-based audit techniques in Section 4.3, rather than deploying | ||||
| ECN. | ||||
| Although re-ECN does not require networks to support ECN, it still | ||||
| embodies a major incremental deployment challenge; a sender cannot | ||||
| use re-ECN unless the receiver at least supports ECN. Most operating | ||||
| systems currently being supplied (late 2010) implement ECN, but it is | ||||
| turned off by default at the client end, even though it is on by | ||||
| default at the server end. This is primarily because one home | ||||
| gateway model widely supplied in 2006 crashes if a TCP client behind | ||||
| it attempts to use ECN (there are issues with some other home | ||||
| gateways from that era, but they are surmountable with ECN black-hole | ||||
| detection). | ||||
| Given that, 10 years after standardisation, ECN has still not been | ||||
| widely enabled on TCP clients, if at all possible the Congestion | ||||
| Exposure protocol should not require the receiver to be ECN capable. | ||||
| Therefore, as it currently stands, the re-ECN encoding would fail to | ||||
| meet the "partial deployment" requirement of Section 2. | ||||
| For a tutorial background on Re-Feedback techniques, see [,,] {Bob: | For a tutorial background on Re-Feedback techniques, see [,,] {Bob: | |||
| Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE | Matt, What did you have in mind here? SIGCOMM'05 paper? IEEE | |||
| Spectrum article? Re-ECN Web page?}. | Spectrum article? Re-ECN Web page?}. | |||
| 3.2.1. ECN Changes | 3.2.1. ECN Changes | |||
| It is important to note that Briscoe's work proposes some relatively | Although the re-ECN protocol requires no changes to the network side | |||
| minor modifications to the ECN protocol specified in RFC 3168. They | of the ECN protocol, it is important to note that it does propose | |||
| include: redefining the ECT(0) and ECT(1) code points (this is | some relatively minor modifications to the host-to-host aspects of | |||
| consistent with RFC3168 but requires deprecating [RFC3540]); | the ECN protocol specified in RFC 3168. They include: redefining the | |||
| permitting routers to send ECN signals at a different threshold than | ECT(1) code point (the change is consistent with RFC3168 but requires | |||
| packet loss; modifications to the ECN negotiations carried on the SYN | deprecating the experimental ECN nonce [RFC3540]); modifications to | |||
| and SYN-ACK; and using a different state machine to carry ECN signals | the ECN negotiations carried on the SYN and SYN-ACK; and using a | |||
| in the transport acknowledgments from the Receiver to the Sender. | different state machine to carry ECN signals in the transport | |||
| This later change permits the transport protocol to carry multiple | acknowledgments from the Receiver to the Sender. This last change | |||
| congestion signals per round trip, and greatly simplifies accurate | permits the transport protocol to carry multiple congestion signals | |||
| auditing. | per round trip, and greatly simplifies accurate auditing. | |||
| All of these adjustments to RFC 3168 may also be needed in a future | All of these adjustments to RFC 3168 may also be needed in a future | |||
| standardized Congestion Exposure protocol. There will be very | standardized Congestion Exposure protocol. There will need to be | |||
| careful considerations about any proposed changes to ECN or other | very careful consideration of any proposed changes to ECN or other | |||
| existing protocols, because any such changes increase the cost of | existing protocols, because any such changes increase the cost of | |||
| deployment. | deployment. | |||
| 3.3. Abstract Encoding | 3.3. Abstract Encoding | |||
| {ToDo: Not really done, extra terse} | The Congestion Exposure protocol could take one of two different | |||
| encodings: independently settable bits or an enumerated set of | ||||
| mutually exclusive codepoints. | ||||
| Model with two different encodings: individual bits or as an | In both cases, the amount of congestion is signaled by the volume of | |||
| enumerated set. Enumerated encoding is probably good enough for most | marked data--just as the volume of lost data or ECN marked data | |||
| purposes, but it must not be forgotten that it does lose some small | signals the amount of congestion experienced. Thus the size of each | |||
| amount of information. | packet carrying a Congestion Exposure Signal is significant. | |||
| 3.3.1. Separate Bits | 3.3.1. Independent Bits | |||
| One bit each for | This encoding involves a field of four flag bits, each of which the | |||
| sender can set independently to indicate to the network that: | ||||
| o Not supported (implicit signal from legacy transport senders) | ConEx (Not-ConEx) The transport is (or is not) using ConEx with this | |||
| packet (the protocol MUST be arranged so that legacy transport | ||||
| senders implicitly send Not-ConEx) | ||||
| o Congestion indicated by packet losses | Re-Echo-Loss (Not-Re-Echo-Loss) The transport has (or has not) | |||
| experienced a loss | ||||
| o ECN signaled congestion | Re-Echo-ECN (Not-Re-Echo-ECN) The transport has (or has not) | |||
| experienced ECN signaled congestion | ||||
| o Pre-congestion credit (AKA green). See Section 4.2.1 devices | Credit (Not-Credit) The transport is (or is not) building up | |||
| below. | congestion credit (see Section 4.3 on audit devices) | |||
| 3.3.2. Enumerated Encoding | 3.3.2. Codepoint Encoding | |||
| For enumerated encoding some marks must be delayed such that each | This encoding involves a bit-field large enough to signal one of the | |||
| packet only carries at most one mark. | following five codepoints: | |||
| ENUM {Not_Supported, No_Mark, Black_ECN, Black_Loss, Green} | ENUM {Not-ConEx, ConEx, Re-Echo-Loss, Re-Echo-ECN, Credit} | |||
| Each named codepoint has the same meaning as in the encoding using | ||||
| independent bits (Section 3.3.1). The use of any one codepoint | ||||
| implies the negative of all the others, except the last three | ||||
| codepoints (Re-Echo-Loss, Re-Echo-ECN and Credit) obviously also | ||||
| imply ConEx is supported. | ||||
| Inherently, the semantics of most of the enumerated codepoints are | ||||
| mutually exclusive. 'Credit' is the only one that might need to be | ||||
| used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even | ||||
| that requirement is questionable. It must not be forgotten that the | ||||
| enumerated encoding loses the flexibility to signal these two | ||||
| combinations, whereas the encoding with four independent bits is not | ||||
| so limited. Alternatively two extra codepoints could be assigned to | ||||
| these two combinations of semantics. | ||||
| {ToDo: Default behaviour for Currently Unused codepoints} | ||||
| {ToDo: Signal from Policer to Receiver to distinguish policy-induced | ||||
| drop from congestion-induced drop} | ||||
| Some might prefer to use the following colours respectively for each | ||||
| codepoint. The same colours as follows (with the omission of Purple) | ||||
| were used to describe re-ECN codepoints: | ||||
| ENUM {White, Grey, Purple, Black, Green}. | ||||
| 4. Congestion Exposure Components | 4. Congestion Exposure Components | |||
| {ToDo: Picture of the components, similar to that in the last | ||||
| slideset about conex-concepts-uses?} | ||||
| 4.1. Modified Senders | 4.1. Modified Senders | |||
| Send Congestion Exposure Signals per congestion signals. | The sending transport needs to be modified to send Congestion | |||
| Exposure Signals in response to congestion feedback signals. | ||||
| 4.2. Policy Devices | 4.2. Receivers (Optionally Modified) | |||
| 4.2.1. Audit | The receiving transport may already feed back sufficiently useful | |||
| signals to the sender so that it does not need to be altered. | ||||
| For loss: detect retransmissions by monitoring sequence numbers. | However, a TCP receiver feeds back ECN congestion signals no more | |||
| Assure that #retransmissions<=#Black_Loss | than once within a round trip. The sender may require more precise | |||
| feedback from the receiver; otherwise it will appear to be | ||||
| understating its Congestion Exposure Signals (see Section 3.2.1). | ||||
| (May need to include a fudge factor, because it would be more robust | Ideally, Congestion Exposure should be added to a transport like TCP | |||
| to mark the packet after a retransmission. Otherwise network devices | without mandatory modifications to the receiver. But an optional | |||
| that discard marked packets will cause connectivity failures, rather | modification to the receiver could be recommended for precision. | |||
| than poor performance). | This was the approach taken when adding re-ECN to TCP | |||
| [I-D.briscoe-tsvwg-re-ecn-tcp]. | ||||
| For ECN: count Congestion Exposure Signals and ECN. Would normally | 4.3. Audit | |||
| need to delay ECN by one RTT to avoid false positives. Alternative: | ||||
| use Green (pre-credits) to assure that #ECN<=#Black_ECN+#GREEN, even | ||||
| though the #Black_ECN is delayed by one RTT. | ||||
| 4.2.2. Policers and Shapers | To audit Congestion Exposure Signals against actual losses an auditor | |||
| could use one of the following techniques: | ||||
| {ToDo: Beware these terms are defined differently than the | TCP-specific approach: The auditor could monitor TCP flows or | |||
| conventional usage.} | aggregates of flows, only holding state on a flow if it first | |||
| sends a Credit or a Re-Echo-Loss marking. The auditor could | ||||
| detect retransmissions by monitoring sequence numbers. It would | ||||
| assure that (volume of retransmitted data) <= (volume of data | ||||
| marked Re-Echo-Loss). Traffic would only be auditable in this way | ||||
| if it conformed to the standard TCP protocol and the IP payload | ||||
| was not encrypted (e.g. with IPsec). | ||||
| {ToDo: Abridge from existing doc?} | Predominant bottleneck approach: Unlike the above TCP-specific | |||
| solution, this technique would work for IP packets carrying any | ||||
| transport layer protocol, and whether encrypted or not. But it | ||||
| only works well for networks designed so that losses predominantly | ||||
| occur under the management of one IP-aware node on the path. The | ||||
| auditor could then be located at this bottleneck. It could simply | ||||
| compare Congestion Exposure Signals with actual local losses. | ||||
| Most consumer access networks are designed to this model, e.g. the | ||||
| radio network controller (RNC) in a cellular network or the | ||||
| broadband remote access server (BRAS) in a digital subscriber line | ||||
| (DSL) network. | ||||
| The accuracy of an auditor at one predominant bottleneck might | ||||
| still be sufficient, even if losses occasionally occurred at other | ||||
| nodes in the network (e.g. border gateways). Although the auditor | ||||
| at the predominant bottleneck would not always be able to detect | ||||
| losses at other nodes, transports would not know where losses were | ||||
| occurring either. Therefore any transport would not know which | ||||
| losses it could cheat on without getting caught, and which ones it | ||||
| couldn't. | ||||
| To audit Congestion Exposure Signals against actual ECN markings or | ||||
| losses, the auditor could work as follows: monitor flows or | ||||
| aggregates of flows, only holding state on a flow if it first sends a | ||||
| Credit or either Re-Echo marking. Count the number of bytes marked | ||||
| with Credit or Re-Echo-ECN. Separately count the number of bytes | ||||
| marked with ECN. Use Credits to assure that #ECN<=#Re-Echo- | ||||
| ECN+#Credit, even though the Re-Echo-ECN markings are delayed by at | ||||
| least one RTT. | ||||
| Note that an auditing device involves no policy configuration; it | ||||
| merely enforces protocol compliance, not policy. | ||||
| 4.4. Policy Devices | ||||
| 4.4.1. Congestion Policers | ||||
| Note that a congestion policer can be implemented in a very similar | ||||
| way to a bit-rate policer, but its effect is focused solely on | ||||
| traffic causing congestion downstream, not on all traffic just in | ||||
| case it causes congestion. | ||||
| It monitors all ConEx traffic entering a network, or some | ||||
| identifiable subset. Using Congestion Exposure signals, it measures | ||||
| the amount of congestion being caused by this traffic. If this | ||||
| exceeds a policy-configured 'congestion-bit-rate' the congestion | ||||
| policer will limit all the monitored ConEx traffic. A congestion | ||||
| policer can be implemented by a simple token bucket. But unlike a | ||||
| bit-rate policer, it only removes tokens when forwarding packets that | ||||
| are ConEx marked. See [CongPol] for details. | ||||
| 4.4.2. Other Policy Devices | ||||
| Other policy devices that use Congestion Exposure signaling might | ||||
| monitor traffic based on Congestion Exposure Signals in much the same | ||||
| way as the monitoring element of a Congestion Policer. But the | ||||
| resulting action could be different. It might re-route traffic or | ||||
| downgrade the class of service. | ||||
| It might do nothing directly to the traffic, but instead report | ||||
| measurements of Congestion Exposure Signals to systems designed to | ||||
| control congestion indirectly. For instance the measurements might | ||||
| be used to trigger penalty clauses in contracts, to levy charges | ||||
| between networks based on congestion or simply to notify customers | ||||
| who cause excessive congestion. | ||||
| 5. IANA Considerations | 5. IANA Considerations | |||
| This memo includes no request to IANA. | This memo includes no request to IANA. | |||
| Note to RFC Editor: this section may be removed on publication as an | Note to RFC Editor: this section may be removed on publication as an | |||
| RFC. | RFC. | |||
| 6. Security Considerations | 6. Security Considerations | |||
| {ToDo:} | Significant parts of this whole document are about the auditability | |||
| of Congestion Exposure Signals, in particular Section 4.3. | ||||
| 7. Conclusions | 7. Conclusions | |||
| {ToDo:} | {ToDo:} | |||
| 8. Acknowledgements | 8. Acknowledgements | |||
| This document was improved by review comments from Toby Moncaster. | This document was improved by review comments from Toby Moncaster. | |||
| 9. Comments Solicited | 9. Comments Solicited | |||
| skipping to change at page 11, line 7 | skipping to change at page 13, line 42 | |||
| 10.1. Normative References | 10.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in | [RFC2119] Bradner, S., "Key words for use in | |||
| RFCs to Indicate Requirement | RFCs to Indicate Requirement | |||
| Levels", BCP 14, RFC 2119, | Levels", BCP 14, RFC 2119, | |||
| March 1997. | March 1997. | |||
| 10.2. Informative References | 10.2. Informative References | |||
| [CongPol] Jacquet, A., Briscoe, B., and T. | ||||
| Moncaster, "Policing Freedom to Use | ||||
| the Internet Resource Pool", Proc | ||||
| ACM Workshop on Re-Architecting the | ||||
| Internet (ReArch'08) , | ||||
| December 2008, <http:// | ||||
| www.bobbriscoe.net/ | ||||
| pubs.html#polfree>. | ||||
| [I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A., | [I-D.briscoe-tsvwg-re-ecn-motiv] Briscoe, B., Jacquet, A., | |||
| Moncaster, T., and A. Smith, "Re- | Moncaster, T., and A. Smith, "Re- | |||
| ECN: A Framework for adding | ECN: A Framework for adding | |||
| Congestion Accountability to | Congestion Accountability to | |||
| TCP/IP", draft-briscoe-tsvwg-re- | TCP/IP", draft-briscoe-tsvwg-re- | |||
| ecn-tcp-motivation-01 (work in | ecn-tcp-motivation-01 (work in | |||
| progress), September 2009. | progress), September 2009. | |||
| [I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A., | [I-D.briscoe-tsvwg-re-ecn-tcp] Briscoe, B., Jacquet, A., | |||
| Moncaster, T., and A. Smith, "Re- | Moncaster, T., and A. Smith, "Re- | |||
| ECN: Adding Accountability for | ECN: Adding Accountability for | |||
| Causing Congestion to TCP/IP", | Causing Congestion to TCP/IP", | |||
| draft-briscoe-tsvwg-re-ecn-tcp-08 | draft-briscoe-tsvwg-re-ecn-tcp-08 | |||
| (work in progress), September 2009. | (work in progress), September 2009. | |||
| [I-D.conex-concepts-uses] Briscoe, B., Woundy, R., Moncaster, | ||||
| T., and J. Leslie, "ConEx Concepts | ||||
| and Use Cases", draft-moncaster- | ||||
| conex-concepts-uses-01 (work in | ||||
| progress), July 2010. | ||||
| [I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low | [I-D.ietf-ledbat-congestion] Shalunov, S. and G. Hazel, "Low | |||
| Extra Delay Background Transport | Extra Delay Background Transport | |||
| (LEDBAT)", | (LEDBAT)", | |||
| draft-ietf-ledbat-congestion-02 | draft-ietf-ledbat-congestion-02 | |||
| (work in progress), July 2010. | (work in progress), July 2010. | |||
| [I-D.sridharan-tcpm-ctcp] Sridharan, M., Tan, K., Bansal, D., | [I-D.sridharan-tcpm-ctcp] Sridharan, M., Tan, K., Bansal, D., | |||
| and D. Thaler, "Compound TCP: A New | and D. Thaler, "Compound TCP: A New | |||
| TCP Congestion Control for High- | TCP Congestion Control for High- | |||
| Speed and Long Distance Networks", | Speed and Long Distance Networks", | |||
| draft-sridharan-tcpm-ctcp-02 (work | draft-sridharan-tcpm-ctcp-02 (work | |||
| in progress), November 2008. | in progress), November 2008. | |||
| [RFC0791] Postel, J., "Internet Protocol", | ||||
| STD 5, RFC 791, September 1981. | ||||
| [RFC2309] Braden, B., Clark, D., Crowcroft, | [RFC2309] Braden, B., Clark, D., Crowcroft, | |||
| J., Davie, B., Deering, S., Estrin, | J., Davie, B., Deering, S., Estrin, | |||
| D., Floyd, S., Jacobson, V., | D., Floyd, S., Jacobson, V., | |||
| Minshall, G., Partridge, C., | Minshall, G., Partridge, C., | |||
| Peterson, L., Ramakrishnan, K., | Peterson, L., Ramakrishnan, K., | |||
| Shenker, S., Wroclawski, J., and L. | Shenker, S., Wroclawski, J., and L. | |||
| Zhang, "Recommendations on Queue | Zhang, "Recommendations on Queue | |||
| Management and Congestion Avoidance | Management and Congestion Avoidance | |||
| in the Internet", RFC 2309, | in the Internet", RFC 2309, | |||
| April 1998. | April 1998. | |||
| End of changes. 53 change blocks. | ||||
| 136 lines changed or deleted | 286 lines changed or added | |||
This html diff was produced by rfcdiff 1.40. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
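
The congestion policer described in Section 4.4.1 of the newer revision (a token bucket that only removes tokens when forwarding ConEx-marked packets) can be illustrated with a minimal sketch. This is not taken from either draft or from [CongPol]. The class name, parameter names, the choice to treat Re-Echo-Loss, Re-Echo-ECN and Credit as the "marked" codepoints, and the choice to drop packets as the limiting action are all assumptions made for illustration only.

```python
from enum import Enum


class Codepoint(Enum):
    # The five codepoints named in Section 3.3.2 of the newer revision.
    NOT_CONEX = 0
    CONEX = 1
    RE_ECHO_LOSS = 2
    RE_ECHO_ECN = 3
    CREDIT = 4


# Assumption: these are the codepoints that count as "ConEx marked".
MARKED = {Codepoint.RE_ECHO_LOSS, Codepoint.RE_ECHO_ECN, Codepoint.CREDIT}


class CongestionPolicer:
    """Hypothetical token-bucket congestion policer (illustrative only)."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # allowed 'congestion-bit-rate', in bytes/s
        self.depth = depth          # bucket depth, in bytes
        self.tokens = depth         # start with a full bucket
        self.last = 0.0             # time of the previous packet, in seconds

    def forward(self, now, size, codepoint):
        """Return True if the packet is forwarded, False if it is dropped."""
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.depth, self.tokens + elapsed * self.fill_rate)
        self.last = now

        # Assumption: Not-ConEx traffic is outside the scope of this policer.
        if codepoint is Codepoint.NOT_CONEX:
            return True

        # Bucket exhausted: limit all monitored ConEx traffic (here, by dropping).
        if self.tokens <= 0:
            return False

        # Unlike a bit-rate policer, only marked packets consume tokens.
        if codepoint in MARKED:
            self.tokens -= size
        return True
```

In this sketch an unmarked ConEx packet passes without cost while the bucket holds tokens, so the policer constrains only the volume of marked data. Once marked data has exhausted the bucket, every ConEx packet is limited until the bucket refills. A real device might delay or deprioritise packets instead of dropping them.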