TOC |
|
This document describes an abstract mechanism by which senders inform the network about the congestion encountered by packets earlier in the same flow. Today, the network may signal congestion to the receiver by ECN markings or by dropping packets, and the receiver may pass this information back to the sender in transport-layer feedback. The mechanism to be developed by the ConEx WG will enable the sender to also relay this congestion information back into the network in-band at the IP layer, such that the total level of congestion is visible to all IP devices along the path, from where it could, for example, be provided as input to traffic management.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”
This Internet-Draft will expire on April 22, 2011.
Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
1.
Introduction
1.1.
Terminology
2.
Requirements for the ConEx Signal
3.
Representing Congestion Exposure
3.1.
Strawman Encoding
3.2.
ECN Based Encoding
3.2.1.
ECN Changes
3.3.
Abstract Encoding
3.3.1.
Independent Bits
3.3.2.
Codepoint Encoding
4.
Congestion Exposure Components
4.1.
Modified Senders
4.2.
Receivers (Optionally Modified)
4.3.
Audit
4.4.
Policy Devices
4.4.1.
Congestion Policers
4.4.2.
Other Policy Devices
5.
IANA Considerations
6.
Security Considerations
7.
Conclusions
8.
Acknowledgements
9.
Comments Solicited
10.
References
10.1.
Normative References
10.2.
Informative References
TOC |
One of the required functions of a transport protocol is controlling congestion in the network. There are three techniques in use today for the network to signal congestion to a transport:
In all cases the congestion signals follow the route indicated in Figure 1. A congested network device sends a signal in the data stream on the forward path to the transport receiver, the receiver passes it back to the sender through transport level feedback, and the sender makes some congestion control adjustment.
This document proposes to extend the capabilities of the Internet protocol suite with the addition of a ConEx Signal that, to a first approximation, relays the congestion information from the transport sender back through the internetwork layer. That signal is shown in Figure 1. It would be visible to all internetwork layer devices along the forward (data) path and is intended to support a number of new policy-controlled mechanisms that might be used to manage traffic.
123456789012345678901234567890123456789012345678901234567890123456789 +---------+ +---------+ | |<==Feedback Path==============================<| | | |<--Transport Layer returned Congestion Signal-<| | | | | | |Transport| |Transport| | Sender |>---------(new)-IP layer ConEx Signal--------->| Receiver| | | (Carried in Data Packet Headers) | | | | +-----------+ | | | |>=Data=Path=>|(Congested)|>=====Data=Path=====>| | | | | Network |>-Congestion-Signal->| | | | | Device | | | +---------+ +-----------+ +---------+
Not shown are policy devices along the data path that observe the ConEx Signal, and use the information to monitor or manage traffic. These are discussed in Section 4.4 (Policy Devices).
Figure 1 |
TOC |
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119] (Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” March 1997.).
ConEx signals in IP packet headers from the sender to the network {ToDo: These are placeholders for whatever words we decide to use}:
- Not-ConEx:
- The transport is not ConEx-capable
- ConEx-Capable:
- The transport is ConEx-Capable. This is the opposite of Not-ConEx and implies one of the following signals
- Re-Echo-Loss:
- (aka Purple) The transport has experienced a loss
- Re-Echo-ECN:
- (aka Black) The transport has experienced an ECN mark
- Credit:
- (aka Green) The transport is building up credit to allow for any future delay in expected ConEx signals
- ConEx-Not-Marked:
- The transport is ConEx-capable but is signaling none of Re-Echo-Loss, Re-Echo-ECN or Credit
- ConEx-Marked:
- At least one of Re-Echo-Loss, Re-Echo-ECN or Credit.
TOC |
Ideally, all the following requirements would be met by a Congestion Exposure Signal. However it is already known that some compromises will be necessary, therefore all the requirements are expressed with the keyword 'SHOULD' rather than 'MUST'. The only mandatory requirement is that a concrete protocol description MUST give sound reasoning if it chooses not to meet any of these requirements:
- a.
- The ConEx Signal SHOULD be visible to internetwork layer devices along the entire path from the transport sender to the transport receiver. Equivalently, it SHOULD be present in the IPv4 or IPv6 header, and in the outermost IP header if using IP in IP tunneling. The ConEx Signal SHOULD be immutable once set by the transport sender. A corollary of these requirements is that existing (legacy) networking gear SHOULD pass the Congestion Exposure Signal silently without modification.
- b.
- The ConEx Signal SHOULD be useful under only partial deployment. A minimal deployment SHOULD only require changes to transport senders. Furthermore, partial deployment SHOULD create incentives for additional deployment, both in terms of enabling ConEx on more devices and adding richer features to existing devices. Nonetheless, ConEx deployment need never be universal, and it is anticipated that some hosts and some transports may never support the ConEx Protocol and some networks may never use the ConEx Signals.
- c.
- The ConEx Signal SHOULD be accurate. In potentially hostile environments such as the public Internet, it SHOULD be possible for techniques to be deployed to audit the Congestion Exposure Signal by comparing it to the actual congestion signals on the forward data path. The auditing mechanism must have a capability for providing sufficient disincentives against misreported congestion, such as by throttling traffic that reports less congestion than it is actually experiencing.
- d.
- The ConEx Signal SHOULD be timely. There will be a delay between the time when an auditing device sees an actual congestion signal and when it sees the subsequent Congestion Exposure Signal from the sender. The minimum delay will be one round trip, but it may be much longer depending on the transport's choice of feedback delay (consider RTCP [RFC3550] (Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications,” July 2003.) for example). It is not practical to expect auditing devices in the network to make allowance for such feedback delays. Instead, the sender SHOULD be able to send ConEx signals in advance, as 'credit' for any audit device to hold as a balance against the risk of congestion during the feedback delay. This design choice simplifies auditing devices and correctly makes the transport responsible for both minimizing feedback delay and minimizing sharp increases in packets in flight that would risk causing excessive congestion to others. This issue is discussed in more detail in Section 4.3 (Audit).
It is important to note that the auditing requirement implies a number of additional constraints: The basic auditing technique is to count both actual congestion signals and ConEx Signals someplace along the data path:
Given that loss-based and ECN-based ConEx might sometimes be best audited at different locations, having distinct encodings would widen the design space for the auditing function.
TOC |
Most protocol specifications start with a description of packet formats and codepoints with their associated meanings. This document does not: It is already known that choosing the encoding for the ConEx Signal is likely to entail some engineering compromises that have the potential to reduce the protocol's usefulness in some settings. Rather than making these engineering choices prematurely, this document side steps the encoding problem by describing an abstract representation of ConEx Signals. All of the elements of the protocol can be defined in terms of this abstract representation. Most important, the preliminary use cases for the protocol are described in terms of the abstract representation in companion documents [I‑D.conex‑concepts‑uses] (Briscoe, B., Woundy, R., Moncaster, T., and J. Leslie, “ConEx Concepts and Use Cases,” July 2010.).
Once we have some example use cases we can evaluate different encoding schemes. Since these schemes are likely to include some conflated code points, some information will be lost resulting in weakening or disabling some of the algorithms and eliminating some use cases.
The goal of this approach is to be as complete as possible for discovering the potential usage and capabilities of the ConEx protocol, so we have some hope of making optimal design decisions when choosing the encoding.
TOC |
As an aid to the reader, it might be helpful to describe a naïve strawman encoding of the ConEx protocol described solely in terms of TCP: set the Reserved bit in the IPv4 header (bit 48 counting from zero [RFC0791] (Postel, J., “Internet Protocol,” September 1981.)—aka the "evil bit" [RFC3514] (Bellovin, S., “The Security Flag in the IPv4 Header,” April 2003.)) on all retransmissions or once per ECN signaled window reduction. Clearly network devices along the forward path can see this bit and act on it. For example they can count marked and unmarked packets to estimate the congestion levels along the path.
However, the IESG has chartered the ConEx working group to establish that there is sufficient demand for an IPv6 ConEx protocol before using the last available bit in the IPv4 header. Furthermore this encoding, by itself, does not sufficiently support partial deployment or strong auditing and might motivate users and/or applications to misrepresent the congestion that they are causing.
Nonetheless, this strawman encoding does present a clear mental model of how the ConEx protocol might function under various uses.
TOC |
Ideally ConEx and ECN are orthogonal signals and SHOULD be entirely independent. However, given the limited number of header bit and/or code points, these signals may have to share code points, at least partially.
The re-ECN specification [I‑D.briscoe‑tsvwg‑re‑ecn‑tcp] (Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, “Re-ECN: Adding Accountability for Causing Congestion to TCP/IP,” October 2010.) presents an implementation of ConEx that is tightly integrated with the encoding of ECN in the IP header. The central theme of this work is an audit mechanism that can provide sufficient disincentives against misrepresenting congestion [I‑D.briscoe‑tsvwg‑re‑ecn‑motiv] (Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, “Re-ECN: A Framework for adding Congestion Accountability to TCP/IP,” September 2009.), which is analyzed extensively in Briscoe's PhD dissertation [Refb‑dis] (Briscoe, B., “Re-feedback: Freedom with Accountability for Causing Congestion in a Connectionless Internetwork,” 2009.).
Re-ECN is a good example of one chosen set of compromises attempting to meet the requirements of Section 2 (Requirements for the ConEx Signal). However, the present document takes a step back, aiming to state the ideal requirements in order to allow the Internet community to assess whether other compromises are possible.
In particular, different incremental deployment choices may be desirable to meet the partial deployment requirement of Section 2 (Requirements for the ConEx Signal). Re-ECN requires the receiver to be at least ECN-capable as well as requiring an update to the sender. Although ConEx will inherently require change at the sender, it would be preferable if it could work, even partially, with any receiver.
The chosen ConEx protocol certainly must not require ECN to be deployed in any network. In this respect re-ECN is already a good example—it acts perfectly well as a loss-based ConEx protocol it the loss-based audit techniques in Section 4.3 (Audit) are used. However, it would still be desirable to avoid the dependence on an ECN receiver.
For a tutorial background on Re-Feedback techniques, see [Re‑fb (Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., Salvatori, A., Soppera, A., and M. Koyabe, “Policing Congestion Response in an Internetwork Using Re-Feedback,” August 2005.), FairerFaster (Briscoe, B., “A Fairer, Faster Internet Protocol,” December 2008.)].
TOC |
Although the re-ECN protocol requires no changes to the network side of the ECN protocol, it is important to note that it does propose some relatively minor modifications to the host-to-host aspects of the ECN protocol specified in RFC 3168. They include: redefining the ECT(1) code point (the change is consistent with RFC3168 but requires deprecating the experimental ECN nonce [RFC3540] (Spring, N., Wetherall, D., and D. Ely, “Robust Explicit Congestion Notification (ECN) Signaling with Nonces,” June 2003.)); modifications to the ECN negotiations carried on the SYN and SYN-ACK; and using a different state machine to carry ECN signals in the transport acknowledgments from the Receiver to the Sender. This last change permits the transport protocol to carry multiple congestion signals per round trip, and greatly simplifies accurate auditing.
All of these adjustments to RFC 3168 may also be needed in a future standardized ConEx protocol. There will need to be very careful consideration of any proposed changes to ECN or other existing protocols, because any such changes increase the cost of deployment.
TOC |
The ConEx protocol could take one of two different encodings: independently settable bits or an enumerated set of mutually exclusive codepoints.
In both cases, the amount of congestion is signaled by the volume of marked data—just as the volume of lost data or ECN marked data signals the amount of congestion experienced. Thus the size of each packet carrying a ConEx Signal is significant.
TOC |
This encoding involves flag bits, each of which the sender can set independently to indicate to the network one of the following four signals:
- ConEx (Not-ConEx)
- The transport is (or is not) using ConEx with this packet (the protocol MUST be arranged so that legacy transport senders implicitly send Not-ConEx)
- Re-Echo-Loss (Not-Re-Echo-Loss)
- The transport has (or has not) experienced a loss
- Re-Echo-ECN (Not-Re-Echo-ECN)
- The transport has (or has not) experienced ECN signaled congestion
- Credit (Not-Credit)
- The transport is (or is not) building up congestion credit (see Section 4.3 (Audit) on audit devices)
TOC |
This encoding involves signaling one of the following five codepoints:
ENUM {Not-ConEx, ConEx, Re-Echo-Loss, Re-Echo-ECN, Credit}
Each named codepoint has the same meaning as in the encoding using independent bits (Section 3.3.1 (Independent Bits)). The use of any one codepoint implies the negative of all the others, except the last three codepoints (Re-Echo-Loss, Re-Echo-ECN and Credit) obviously also imply ConEx is supported.
Inherently, the semantics of most of the enumerated codepoints are mutually exclusive. 'Credit' is the only one that might need to be used in combination with either Re-Echo-Loss or Re-Echo-ECN, but even that requirement is questionable. It must not be forgotten that the enumerated encoding loses the flexibility to signal these two combinations, whereas the encoding with four independent bits is not so limited. Alternatively two extra codepoints could be assigned to these two combinations of semantics.
TOC |
{ToDo: Picture of the components, similar to that in the last slideset about conex-concepts-uses?}
TOC |
The sending transport needs to be modified to send Congestion Exposure Signals in response to congestion feedback signals.
TOC |
The receiving transport may already feedback sufficiently useful signals to the sender so that it does not need to be altered.
However, a TCP receiver feeds back ECN congestion signals no more than once within a round trip. The sender may require more precise feedback from the receiver otherwise it will appear to be understating its ConEx Signals (see Section 3.2.1 (ECN Changes)).
Ideally, ConEx should be added to a transport like TCP without mandatory modifications to the receiver. But an optional modification to the receiver could be recommended for precision. This was the approach taken when adding re-ECN to TCP [I‑D.briscoe‑tsvwg‑re‑ecn‑tcp] (Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, “Re-ECN: Adding Accountability for Causing Congestion to TCP/IP,” October 2010.).
TOC |
To audit ConEx Signals against actual losses an auditor could use one of the following techniques:
- TCP-specific approach:
- The auditor could monitor TCP flows or aggregates of flows, only holding state on a flow if it first sends a Credit or a Re-Echo-Loss marking. The auditor could detect retransmissions by monitoring sequence numbers. It would assure that (volume of retransmitted data) <= (volume of data marked Re-Echo-Loss). Traffic would only be auditable in this way if it conformed to the standard TCP protocol and the IP payload was not encrypted (e.g. with IPsec).
- Predominant bottleneck approach:
- Unlike the above TCP-specific solution, this technique would work for IP packets carrying any transport layer protocol, and whether encrypted or not. But it only works well for networks designed so that losses predominantly occur under the management of one IP-aware node on the path. The auditor could then be located at this bottleneck. It could simply compare ConEx Signals with actual local losses. Most consumer access networks are design to this model, e.g. the radio network controller (RNC) in a cellular network or the broadband remote access server (BRAS) in a digital subscriber line (DSL) network.
The accuracy of an auditor at one predominant bottleneck might still be sufficient, even if losses occasionally occurred at other nodes in the network (e.g. border gateways). Although the auditor at the predominant bottleneck would not always be able to detect losses at other nodes, transports would not know where losses were occurring either. Therefore any transport would not know which losses it could cheat on without getting caught, and which ones it couldn't.
To audit ConEx Signals against actual ECN markings or losses, the auditor could work as follows: monitor flows or aggregates of flows, only holding state on a flow if it first sends a Credit or either Re-Echo marking. Count the number of bytes marked with Credit or Re-Echo-ECN. Separately count the number of bytes marked with ECN. Use Credits to assure that #ECN<=#Re-Echo-ECN+#Credit, even though the Re-Echo-ECN markings are delayed by at least one RTT.
TOC |
Policy devices are characterised by a need to be configured with a policy related to the users or neighboring networks being served. In contrast, the auditing devices referred to in the previous section primarily enforce compliance with the ConEx protocol and do not need to be configured with any client-specific policy.
TOC |
Note that a congestion policer can be implemented in a very similar way to a bit-rate policer, but its effect is focused solely on traffic causing congestion downstream, not on all traffic just in case it causes congestion.
It monitors all ConEx traffic entering a network, or some identifiable subset. Using ConEx signals, it measures the amount of congestion being caused by this traffic. If this exceeds a policy-configured 'congestion-bit-rate' the congestion policer will limit all the monitored ConEx traffic. A congestion policer can be implemented by a simple token bucket. But unlike a bit-rate policer, it only removes tokens when forwarding packets that a ConEx marked. See [CongPol] (Jacquet, A., Briscoe, B., and T. Moncaster, “Policing Freedom to Use the Internet Resource Pool,” December 2008.) for details.
TOC |
Other policy devices that use ConEx signaling might traffic traffic based on ConEx Signals in much the same way as the monitoring element of a Congestion Policer. But the resulting action could be different. It might re-route traffic or downgrade the class of service.
It might do nothing directly to the traffic, but instead report measurements of ConEx Signals to systems designed to control congestion indirectly. For instance the measurements might be used to trigger penalty clauses in contracts, to levy charges between networks based on congestion or simply to notify customers who cause excessive congestion.
an auditing device only needs to enforce protocol compliance, it does not need to reflect any policy.
TOC |
This memo includes no request to IANA.
Note to RFC Editor: this section may be removed on publication as an RFC.
TOC |
Significant parts of this whole document are about the auditability of ConEx Signals, in particular Section 4.3 (Audit).
TOC |
{ToDo:}
TOC |
This document was improved by review comments from Toby Moncaster.
TOC |
Comments and questions are encouraged and very welcome. They can be addressed to the IETF Congestion Exposure (ConEx) working group mailing list <conex@ietf.org>, and/or to the authors.
TOC |
TOC |
[RFC2119] | Bradner, S., “Key words for use in RFCs to Indicate Requirement Levels,” BCP 14, RFC 2119, March 1997 (TXT, HTML, XML). |
TOC |
[CongPol] | Jacquet, A., Briscoe, B., and T. Moncaster, “Policing Freedom to Use the Internet Resource Pool,” Proc ACM Workshop on Re-Architecting the Internet (ReArch'08) , December 2008 (PDF). |
[FairerFaster] | Briscoe, B., “A Fairer, Faster Internet Protocol,” IEEE Spectrum Dec 2008:38--43, December 2008 (HTML). |
[I-D.briscoe-tsvwg-re-ecn-motiv] | Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, “Re-ECN: A Framework for adding Congestion Accountability to TCP/IP,” draft-briscoe-tsvwg-re-ecn-tcp-motivation-01 (work in progress), September 2009 (TXT). |
[I-D.briscoe-tsvwg-re-ecn-tcp] | Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith, “Re-ECN: Adding Accountability for Causing Congestion to TCP/IP,” draft-briscoe-tsvwg-re-ecn-tcp-09 (work in progress), October 2010 (TXT). |
[I-D.conex-concepts-uses] | Briscoe, B., Woundy, R., Moncaster, T., and J. Leslie, “ConEx Concepts and Use Cases,” draft-moncaster-conex-concepts-uses-01 (work in progress), July 2010 (TXT). |
[I-D.ietf-ledbat-congestion] | Shalunov, S. and G. Hazel, “Low Extra Delay Background Transport (LEDBAT),” draft-ietf-ledbat-congestion-02 (work in progress), July 2010 (TXT). |
[I-D.sridharan-tcpm-ctcp] | Sridharan, M., Tan, K., Bansal, D., and D. Thaler, “Compound TCP: A New TCP Congestion Control for High-Speed and Long Distance Networks,” draft-sridharan-tcpm-ctcp-02 (work in progress), November 2008 (TXT). |
[RFC0791] | Postel, J., “Internet Protocol,” STD 5, RFC 791, September 1981 (TXT). |
[RFC2309] | Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and L. Zhang, “Recommendations on Queue Management and Congestion Avoidance in the Internet,” RFC 2309, April 1998 (TXT, HTML, XML). |
[RFC3168] | Ramakrishnan, K., Floyd, S., and D. Black, “The Addition of Explicit Congestion Notification (ECN) to IP,” RFC 3168, September 2001 (TXT). |
[RFC3514] | Bellovin, S., “The Security Flag in the IPv4 Header,” RFC 3514, April 2003 (TXT). |
[RFC3540] | Spring, N., Wetherall, D., and D. Ely, “Robust Explicit Congestion Notification (ECN) Signaling with Nonces,” RFC 3540, June 2003 (TXT). |
[RFC3550] | Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications,” STD 64, RFC 3550, July 2003 (TXT, PS, PDF). |
[RFC5670] | Eardley, P., “Metering and Marking Behaviour of PCN-Nodes,” RFC 5670, November 2009 (TXT). |
[RFC5681] | Allman, M., Paxson, V., and E. Blanton, “TCP Congestion Control,” RFC 5681, September 2009 (TXT). |
[Re-fb] | Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C., Salvatori, A., Soppera, A., and M. Koyabe, “Policing Congestion Response in an Internetwork Using Re-Feedback,” ACM SIGCOMM CCR 35(4)277--288, August 2005 (PDF). |
[Refb-dis] | Briscoe, B., “Re-feedback: Freedom with Accountability for Causing Congestion in a Connectionless Internetwork,” UCL PhD Dissertation , 2009 (PDF). |
[Vegas] | Brakmo, L. and L. Peterson, “TCP Vegas: End-to-End Congestion Avoidance on a Global Internet,” IEEE Journal on Selected Areas in Communications 13(8)1465--80, October 1995 (PDF). |
TOC |
Matt Mathis | |
Phone: | |
Fax: | |
EMail: | mattmathis at google.com |
URI: | |
Bob Briscoe | |
BT | |
B54/77, Adastral Park | |
Martlesham Heath | |
Ipswich IP5 3RE | |
UK | |
Phone: | +44 1473 645196 |
EMail: | bob.briscoe@bt.com |
URI: | http://bobbriscoe.net/ |