Congestion Exposure (ConEx)                                    M. Mathis
Working Group                                                     Google
Internet-Draft                                          October 14, 2010
Intended status: Informational
Expires: April 17, 2011


                 ConEx Concepts and Abstract Mechanism
                  draft-mathis-conex-abstract-mech-00a

Abstract

   This document describes an abstract mechanism by which senders
   inform the network about the congestion encountered by previous
   packets on the same flow.  Today, the network may signal congestion
   by ECN markings or by dropping packets, and the receiver passes this
   information back to the sender in transport-layer acknowledgments.
   The mechanism to be developed by the ConEx WG will enable the sender
   to also relay the congestion information back into the network
   in-band at the IP layer, such that the total level of congestion is
   visible to all IP devices along the path, where it could, for
   example, be provided as input to traffic management.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 17, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Mathis                   Expires April 17, 2011                 [Page 1]

Internet-Draft    ConEx Concepts and Abstract Mechanism     October 2010


Table of Contents

   1.  Introduction
     1.1.  Requirements Language
   2.  Requirements for the Congestion Exposure Signal
   3.  Representing Congestion Exposure
     3.1.  One Simple Encoding
     3.2.  ECN Based Encoding
       3.2.1.  ECN Changes
     3.3.  Abstract Encoding
       3.3.1.  Separate Bits
       3.3.2.  Enumerated Encoding
   4.  Congestion Exposure Components
     4.1.  Modified Senders
     4.2.  Policy Devices
       4.2.1.  Audit
       4.2.2.  Policers and Shapers
   5.  IANA Considerations
   6.  Security Considerations
   7.  Conclusions
   8.  Acknowledgements
   9.  Comments Solicited
   10. References
     10.1.  Normative References
     10.2.  Informative References

1.  Introduction

   One of the required functions of a transport protocol is controlling
   congestion in the network.  There are three techniques in use today
   for signaling congestion:

   o  The most common congestion signal is packet loss.  When congested,
      the network simply discards some packets, either as part of an
      explicit control function [RFC2309] or as the consequence of a
      queue overflow or other resource starvation.  The transport
      receiver detects that some data is missing and signals this
      through transport acknowledgments to the transport sender (e.g.,
      TCP SACK options).  The sender retransmits the missing data (if
      the protocol is reliable) and then performs the mandatory
      congestion control adjustment [RFC5681].

   o  Some experimental transport protocols and TCP variants [Vegas]
      [I-D.ietf-ledbat-congestion] sense queuing delays in the network
      before the network itself signals congestion.  From the
      perspective of this document, these algorithms and related
      techniques prevent congestion rather than signal it; therefore
      they are out of scope and are not discussed further in this
      document.

   o  With Explicit Congestion Notification (ECN) [RFC3168], network
      devices explicitly indicate congestion by setting ECN bits in the
      IP header.  The transport receiver communicates these signals back
      to the sender, which then performs the mandatory congestion
      control adjustment.

   In both the loss and ECN cases, the congestion signals follow the
   route indicated in Figure 1.  A congested network device sends a
   signal in the data stream on the forward path to the transport
   receiver, the receiver passes it back to the sender through
   transport-level acknowledgments, and the sender makes some
   congestion control adjustment.

   This document proposes to extend the capabilities of the Internet
   protocol suite with the addition of a Congestion Exposure Signal that
   relays the congestion information from the transport sender back
   through the network layer.
   That signal is shown in Figure 1.  It would be visible to all
   network-layer devices along the forward (data) path and is intended
   to support a number of new policy mechanisms that might be used to
   manage traffic.

   -----------              -------------                     -----------
   |         |              |(Congested)|                     |         |
   |         |>==Data=Path=>|  Network  |>=====Data=Path=====>|         |
   |Transport|              |  Device   |>-Congestion-Signal->|Transport|
   | Sender  |              -------------                     | Receiver|
   |         |                                                |         |
   |         |<====ACK=Path==================================<|         |
   |         |<---Transport Layer returned Congestion Signal-<|         |
   |         |                                                |         |
   |         |>-(new)-IP layer Congestion Exposure Signal---->|         |
   -----------           (Carried in Data Packets)            -----------

   Not shown are policy devices along the data path that observe the
   Congestion Exposure Signal and use the information to monitor or
   manage traffic.  These are discussed in Section 4.2.

                                 Figure 1

1.1.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

2.  Requirements for the Congestion Exposure Signal

   a.  The Congestion Exposure Signal must be visible to the network
       layer along the entire path from the transport sender to the
       transport receiver.  Equivalently, it must be present in the IPv4
       or IPv6 header.  A corollary of this is that existing (legacy)
       networking gear must, at the very minimum, pass the Congestion
       Exposure Signal without modification.

   b.  The Congestion Exposure Signal must be useful under only partial
       deployment.  A minimal deployment must require changes only to
       the transport senders.  Furthermore, partial deployment should
       create incentives for additional deployment, both in terms of
       enabling Congestion Exposure on more devices and adding richer
       features to existing devices.
       It is anticipated that ConEx deployment will be asymptotic, and
       some residual class of hosts and network equipment will never
       fully support the Congestion Exposure protocol.

   c.  The Congestion Exposure Signal must be timely and accurate.  It
       must not be delayed by significantly more than one RTT from the
       congestion event that triggered the signal.  There must be
       techniques to audit the Congestion Exposure Signal by comparing
       it to the actual congestion signals on the forward data path.
       The auditing mechanism must be able to provide strong
       disincentives for misreporting congestion, such as by throttling
       traffic that reports less congestion than it is actually
       experiencing.

       It is important to note that the auditing requirement implies a
       number of additional constraints:

       The basic auditing technique is to count both congestion signals
       and Congestion Exposure Signals somewhere along the data path.
       For congestion signaled by ECN, this is most accurate when done
       near the transport receiver.  The total number of ECN marks seen
       near the receiver should always be equal to or less than the
       number of Congestion Exposure Signals seen one RTT later.

       Auditing loss-based Congestion Exposure can most easily be
       implemented near the sender, since losses downstream of the
       auditor appear as duplicate data for all reliable protocols (and
       as duplicate sequence numbers for TCP).  The auditor can detect
       losses by observing both the original transmission and the
       retransmission after the loss.  (This method assumes that IPsec
       is not in use, so that transport sequence numbers are visible to
       the auditor.)

       Given that loss-based and ECN-based Congestion Exposure are best
       audited at different locations, it is likely that they will need
       to have distinct encodings.
       In addition, the simplest mechanism to address the one-RTT delay
       between the congestion event and the Congestion Exposure Signal
       is to pre-mark some packets with a special Congestion Exposure
       credit prior to any true congestion marks.  This technique is
       described in more detail in Section 4.2.1.

3.  Representing Congestion Exposure

   Most protocol specifications start with a description of packet
   formats and code points with their associated meanings.  This
   document does not.  It is already known that choosing the encoding
   for the Congestion Exposure Signal is likely to entail some
   engineering compromises that have the potential to reduce the
   protocol's usefulness in some settings.  Rather than making these
   engineering choices prematurely, this document sidesteps the
   encoding problem by describing an abstract representation of the
   Congestion Exposure Signal.  All of the elements of the protocol can
   be defined in terms of this abstract representation.

   Most importantly, the preliminary use cases for the protocol are
   described in terms of the abstract representation in companion
   documents.  Once we have some example use cases, we can evaluate
   different encoding schemes.  Since these schemes are likely to
   include some conflated code points, some information will be lost,
   weakening or disabling some of the algorithms and eliminating some
   use cases.  The goal of this approach is to be as complete as
   possible in discovering the potential uses and capabilities of the
   Congestion Exposure protocol, so that we have some hope of making
   optimal design decisions when choosing the encoding.

3.1.  One Simple Encoding

   As an aid to the reader, it might be helpful to describe one simple
   encoding of the Congestion Exposure protocol: set IPv4 header bit 48
   (aka the "evil bit" [RFC3514]) on all retransmissions or once per
   ECN-signaled window reduction.
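
   The following Python fragment is an illustration only, not part of
   any specification: it renders the simple encoding above, assuming
   the sender and an observing device can manipulate raw IPv4 header
   bytes.  All function and class names (set_exposure, SimpleSender,
   PathObserver, and so on) are hypothetical.  Bit 48, counting from
   zero in network bit order, is the most significant bit of header
   byte 6; because setting it alters the header, the sketch also
   recomputes the RFC 791 header checksum.

```python
import struct

# Hypothetical sketch of the Section 3.1 "evil bit" encoding.  Bit 48 of
# the IPv4 header (counting from 0) is the most significant bit of byte 6,
# the reserved flag described in RFC 3514.
FLAG_BYTE = 6
EXPOSURE_MASK = 0x80


def ipv4_checksum(header: bytes) -> int:
    """RFC 791 header checksum, computed with the checksum field zeroed."""
    data = bytes(header[:10]) + b"\x00\x00" + bytes(header[12:])
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def set_exposure(header: bytearray) -> None:
    """Set the exposure bit and rewrite the header checksum in place."""
    header[FLAG_BYTE] |= EXPOSURE_MASK
    struct.pack_into("!H", header, 10, ipv4_checksum(header))


def has_exposure(header: bytes) -> bool:
    return bool(header[FLAG_BYTE] & EXPOSURE_MASK)


class SimpleSender:
    """Marks every retransmission, plus one packet per ECN window reduction."""

    def __init__(self) -> None:
        self.owed_ecn_marks = 0

    def on_ecn_window_reduction(self) -> None:
        self.owed_ecn_marks += 1

    def prepare(self, header: bytearray, is_retransmission: bool) -> None:
        if is_retransmission:
            set_exposure(header)
        elif self.owed_ecn_marks:
            # The single bit carries one mark, so pending ECN marks wait
            # for the next packet that is not already marked.
            self.owed_ecn_marks -= 1
            set_exposure(header)


class PathObserver:
    """A forward-path device estimating congestion as the marked fraction."""

    def __init__(self) -> None:
        self.marked = 0
        self.total = 0

    def observe(self, header: bytes) -> None:
        self.total += 1
        if has_exposure(header):
            self.marked += 1

    def congestion_estimate(self) -> float:
        return self.marked / self.total if self.total else 0.0


if __name__ == "__main__":
    # 20-byte header: DF set, TTL 64, TCP, 10.0.0.1 -> 10.0.0.2.
    template = bytes.fromhex("450000541c46400040060000" "0a0000010a000002")
    sender, observer = SimpleSender(), PathObserver()
    sender.on_ecn_window_reduction()
    for retransmit in (False, False, True, False):
        header = bytearray(template)
        sender.prepare(header, retransmit)
        observer.observe(header)
    print(observer.congestion_estimate())  # 0.5: two of four packets marked
```

   This sketch marks each retransmission itself, which is the most
   literal reading of this section; Section 4.2.1 observes that marking
   the packet after a retransmission may be more robust.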

   Clearly, network devices along the forward path can see this bit and
   act on it.  For example, they can count marked and unmarked packets
   to estimate the congestion levels along the path.  However, this
   encoding has been forbidden by RFC xxxx, which seeks to preserve the
   last unallocated bit in the IPv4 header for some unspecified future
   use.  Furthermore, this encoding, by itself, does not sufficiently
   support partial deployment or strong auditing, and might motivate
   users and/or applications to misrepresent the congestion that they
   are causing.  However, this simple encoding does present a clear
   mental model of how the Congestion Exposure protocol functions and
   is very useful for conducting thought experiments about how the
   protocol might function under various uses.

3.2.  ECN Based Encoding

   Bob Briscoe's PhD thesis [Refb-dis] and many derivative works,
   including re-ECN [I-D.briscoe-tsvwg-re-ecn-tcp], present an
   ECN-based implementation of ConEx.  The central theme of this work
   includes strong disincentives for misrepresenting congestion
   [I-D.briscoe-tsvwg-re-ecn-motiv].  However, it also presupposes the
   full deployment of ECN, and does not adequately signal congestion
   indicated by packet loss.  Furthermore, given that after 10 years ECN
   still has not been widely deployed, it does not seem prudent to
   require its deployment as a prerequisite for deploying a Congestion
   Exposure protocol.  As it currently stands, this work fails to meet
   the "partial deployment" requirement described in Section 2, item b.

   For a tutorial background on Re-Feedback techniques, see [,,] {Bob:
   Matt, What did you have in mind here?  SIGCOMM'05 paper?  IEEE
   Spectrum article?  Re-ECN Web page?}.

3.2.1.  ECN Changes

   It is important to note that Briscoe's work proposes some relatively
   minor modifications to the ECN protocol specified in RFC 3168.
   They include:

   o  redefining the ECT(0) and ECT(1) code points (this is consistent
      with RFC 3168 but requires deprecating [RFC3540]);

   o  permitting routers to send ECN signals at a different threshold
      than packet loss;

   o  modifications to the ECN negotiation carried on the SYN and
      SYN-ACK; and

   o  using a different state machine to carry ECN signals in the
      transport acknowledgments from the receiver to the sender.  This
      latter change permits the transport protocol to carry multiple
      congestion signals per round trip, and greatly simplifies accurate
      auditing.

   All of these adjustments to RFC 3168 may also be needed in a future
   standardized Congestion Exposure protocol.  Any proposed changes to
   ECN or other existing protocols will require very careful
   consideration, because any such changes increase the cost of
   deployment.

3.3.  Abstract Encoding

   {ToDo: Not really done, extra terse}

   The abstract signal can be modeled with two different encodings: as
   individual bits or as an enumerated set.  The enumerated encoding is
   probably good enough for most purposes, but it must not be forgotten
   that it loses a small amount of information, because some marks must
   be delayed (Section 3.3.2).

3.3.1.  Separate Bits

   One bit each for:

   o  Not supported (implicit signal from legacy transport senders)

   o  Congestion indicated by packet losses

   o  ECN-signaled congestion

   o  Pre-congestion credit (AKA green).  See Section 4.2.1 below.

3.3.2.  Enumerated Encoding

   For the enumerated encoding, some marks must be delayed such that
   each packet carries at most one mark.

      ENUM {Not_Supported, No_Mark, Black_ECN, Black_Loss, Green}

4.  Congestion Exposure Components

4.1.  Modified Senders

   Modified senders send Congestion Exposure Signals according to the
   congestion signals they receive.

4.2.  Policy Devices

4.2.1.  Audit

   For loss: detect retransmissions by monitoring sequence numbers.
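
   As an illustration of the counting involved, the following Python
   sketch keeps per-flow audit counters for both checks described in
   this section.  It is hypothetical and not a specified algorithm: the
   Mark names mirror the enumeration in Section 3.3.2, a segment is
   treated as a retransmission when it ends at or below the highest
   sequence number already seen, sequence-number wraparound and
   reordering are ignored, and loss_slack is an assumed stand-in for the
   fudge factor discussed below.  For brevity one object performs both
   checks, although Section 2 notes that loss is best audited near the
   sender and ECN near the receiver.

```python
from enum import Enum


class Mark(Enum):
    """Enumerated abstract marks from Section 3.3.2."""
    NOT_SUPPORTED = "Not_Supported"
    NO_MARK = "No_Mark"
    BLACK_ECN = "Black_ECN"
    BLACK_LOSS = "Black_Loss"
    GREEN = "Green"


class FlowAuditor:
    """Hypothetical per-flow audit counters (no wraparound handling)."""

    def __init__(self, loss_slack: int = 0) -> None:
        self.highest_end = None        # one past the highest sequence number seen
        self.retransmissions = 0       # segments carrying already-seen data
        self.ecn_ce = 0                # packets seen with the ECN CE code point
        self.marks = {m: 0 for m in Mark}
        self.loss_slack = loss_slack   # assumed tolerance for lagging Black_Loss

    def observe(self, seq: int, length: int, mark: Mark, ce: bool = False) -> None:
        end = seq + length
        if self.highest_end is None or end > self.highest_end:
            self.highest_end = end     # new data extends the flow
        else:
            self.retransmissions += 1  # data already seen: a retransmission
        self.marks[mark] += 1
        if ce:
            self.ecn_ce += 1

    def loss_ok(self) -> bool:
        # #retransmissions <= #Black_Loss (plus the assumed slack)
        return self.retransmissions <= self.marks[Mark.BLACK_LOSS] + self.loss_slack

    def ecn_ok(self) -> bool:
        # #ECN <= #Black_ECN + #Green; Green credit covers the one-RTT lag
        return self.ecn_ce <= self.marks[Mark.BLACK_ECN] + self.marks[Mark.GREEN]


if __name__ == "__main__":
    auditor = FlowAuditor()
    auditor.observe(0, 100, Mark.GREEN)               # pre-credit, no congestion yet
    auditor.observe(100, 100, Mark.NO_MARK, ce=True)  # CE seen; Black_ECN not yet sent
    print(auditor.ecn_ok())   # True: the Green credit covers the lag
    auditor.observe(100, 100, Mark.NO_MARK)           # unmarked retransmission
    print(auditor.loss_ok())  # False until a Black_Loss mark arrives
    auditor.observe(200, 100, Mark.BLACK_LOSS)
    print(auditor.loss_ok())  # True
```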
   Assure that #retransmissions <= #Black_Loss.  (This check may need to
   include a fudge factor, because it would be more robust to mark the
   packet after a retransmission rather than the retransmission itself.
   Otherwise, network devices that discard marked packets will cause
   connectivity failures rather than poor performance.)

   For ECN: count Congestion Exposure Signals and ECN marks.  This would
   normally require delaying the ECN count by one RTT to avoid false
   positives.  Alternative: use Green (pre-credits) to assure that
   #ECN <= #Black_ECN + #Green, even though #Black_ECN is delayed by one
   RTT.

4.2.2.  Policers and Shapers

   {ToDo: Beware these terms are defined differently than the
   conventional usage.}

   {ToDo: Abridge from existing doc?}

5.  IANA Considerations

   This memo includes no request to IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

6.  Security Considerations

   {ToDo:}

7.  Conclusions

   {ToDo:}

8.  Acknowledgements

   {ToDo:}

9.  Comments Solicited

   Comments and questions are encouraged and very welcome.  They can be
   addressed to the IETF Congestion Exposure (ConEx) working group
   mailing list , and/or to the authors.

10.  References

10.1.  Normative References

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2.  Informative References

   [I-D.briscoe-tsvwg-re-ecn-motiv]
              Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
              "Re-ECN: A Framework for adding Congestion Accountability
              to TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-motivation-01
              (work in progress), September 2009.

   [I-D.briscoe-tsvwg-re-ecn-tcp]
              Briscoe, B., Jacquet, A., Moncaster, T., and A. Smith,
              "Re-ECN: Adding Accountability for Causing Congestion to
              TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-08 (work in
              progress), September 2009.

   [I-D.ietf-ledbat-congestion]
              Shalunov, S. and G. Hazel, "Low Extra Delay Background
              Transport (LEDBAT)", draft-ietf-ledbat-congestion-02 (work
              in progress), July 2010.

   [RFC2309]  Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering,
              S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G.,
              Partridge, C., Peterson, L., Ramakrishnan, K., Shenker,
              S., Wroclawski, J., and L. Zhang, "Recommendations on Queue
              Management and Congestion Avoidance in the Internet",
              RFC 2309, April 1998.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3514]  Bellovin, S., "The Security Flag in the IPv4 Header",
              RFC 3514, April 2003.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
              Congestion Notification (ECN) Signaling with Nonces",
              RFC 3540, June 2003.

   [RFC5681]  Allman, M., Paxson, V., and E. Blanton, "TCP Congestion
              Control", RFC 5681, September 2009.

   [Refb-dis] Briscoe, B., "Re-feedback: Freedom with Accountability for
              Causing Congestion in a Connectionless Internetwork", UCL
              PhD Dissertation, 2009, .

   [Vegas]    Brakmo, L. and L. Peterson, "TCP Vegas: End-to-End
              Congestion Avoidance on a Global Internet", IEEE Journal
              on Selected Areas in Communications 13(8):1465-1480,
              October 1995, .

Author's Address

   Matt Mathis
   Google

   Phone:
   Fax:
   EMail: mattmathis at google.com
   URI: