Congestion Exposure (ConEx)                                    M. Mathis
Working Group                                                     Google
Internet-Draft                                                B. Briscoe
Intended status: Informational                                        BT
Expires: April 17, 2011                                 October 14, 2010


      Congestion Exposure (ConEx) Concepts and Abstract Mechanism
                  draft-mathis-conex-abstract-mech-00b

Abstract

   This document describes an abstract mechanism by which senders inform
   the network about the congestion encountered by packets earlier in
   the same flow.  Today, the network may signal congestion to the
   receiver by ECN markings or by dropping packets, and the receiver may
   pass this information back to the sender in transport-layer feedback.
   The mechanism to be developed by the ConEx WG will enable the sender
   to also relay this congestion information back into the network in-
   band at the IP layer, such that the total level of congestion is
   visible to all IP devices along the path, from where it could, for
   example, be provided as input to traffic management.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on April 17, 2011.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of



Mathis & Briscoe         Expires April 17, 2011                 [Page 1]

Internet-Draft    ConEx Concepts and Abstract Mechanism     October 2010


   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
     1.1.  Terminology  . . . . . . . . . . . . . . . . . . . . . . .  4
   2.  Requirements for the Congestion Exposure Signal  . . . . . . .  5
   3.  Representing Congestion Exposure . . . . . . . . . . . . . . .  7
     3.1.  One Simple Encoding  . . . . . . . . . . . . . . . . . . .  7
     3.2.  ECN Based Encoding . . . . . . . . . . . . . . . . . . . .  8
       3.2.1.  ECN Changes  . . . . . . . . . . . . . . . . . . . . .  8
     3.3.  Abstract Encoding  . . . . . . . . . . . . . . . . . . . .  9
       3.3.1.  Separate Bits  . . . . . . . . . . . . . . . . . . . .  9
       3.3.2.  Enumerated Encoding  . . . . . . . . . . . . . . . . .  9
   4.  Congestion Exposure Components . . . . . . . . . . . . . . . .  9
     4.1.  Modified Senders . . . . . . . . . . . . . . . . . . . . .  9
     4.2.  Policy Devices . . . . . . . . . . . . . . . . . . . . . .  9
       4.2.1.  Audit  . . . . . . . . . . . . . . . . . . . . . . . .  9
       4.2.2.  Policers and Shapers . . . . . . . . . . . . . . . . . 10
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 10
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 10
   7.  Conclusions  . . . . . . . . . . . . . . . . . . . . . . . . . 10
   8.  Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 10
   9.  Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 10
   10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 10
     10.1. Normative References . . . . . . . . . . . . . . . . . . . 10
     10.2. Informative References . . . . . . . . . . . . . . . . . . 11

1.  Introduction

   One of the required functions of a transport protocol is controlling
   congestion in the network.  There are three techniques in use today
   for the network to signal congestion to a transport:

   o  The most common congestion signal is packet loss.  When congested,
      the network simply discards some packets either as part of an
      explicit control function [RFC2309] or as the consequence of a
      queue overflow or other resource starvation.  The transport
      receiver detects that some data is missing and signals such
      through transport acknowledgments to the transport sender (e.g.
      TCP SACK options).  The sender performs the appropriate congestion
      control rate reduction (e.g.  [RFC5681] for TCP) and, if it is a
      reliable transport, it retransmits the missing data.

   o  If the transport supports explicit congestion notification (ECN)
      [RFC3168] or pre-congestion notification (PCN) [RFC5670], the
      transport sender indicates this by setting an ECN-capable
      transport (ECT) codepoint in every packet.  Network devices can
      then explicitly signal congestion to the receiver by setting ECN
      bits in the IP header of such packets.  The transport receiver
      communicates these ECN signals back to the sender, which then
      performs the appropriate congestion control rate reduction.

   o  Some experimental transport protocols and TCP variants [Vegas]
      sense queuing delays in the network and reduce their rate before
      the network has to signal congestion using loss or ECN.  A purely
      delay-sensing transport will tend to be pushed out by other
      competing transports that do not back off until they have driven
      the queue into loss.  Therefore, modern delay-sensing algorithms
      use delay in some combination with loss to signal congestion (e.g.
      LEDBAT [I-D.ietf-ledbat-congestion], Compound
      [I-D.sridharan-tcpm-ctcp]).  In the rest of this document, we will
      confine the discussion to concrete signals of congestion such as
      loss and ECN.  We will not discuss delay-sensing further, because
      it can only avoid these more concrete signals of congestion in
      some circumstances.

   In all cases the congestion signals follow the route indicated in
   Figure 1.  A congested network device sends a signal in the data
   stream on the forward path to the transport receiver, the receiver
   passes it back to the sender through transport level feedback, and
   the sender makes some congestion control adjustment.

   This document proposes to extend the capabilities of the Internet
   protocol suite with the addition of a Congestion Exposure Signal
   that, to a first approximation, relays the congestion information
   from the transport sender back through the internetwork layer.  That
   signal is shown in Figure 1.  It would be visible to all internetwork
   layer devices along the forward (data) path and is intended to
   support a number of new policy-controlled mechanisms that might be
   used to manage traffic.
   +---------+                                               +---------+
   |         |<==Feedback Path==============================<|         |
   |         |<--Transport Layer returned Congestion Signal-<|         |
   |         |                                               |         |
   |Transport|                                               |Transport|
   | Sender  |>-(new)-IP layer Congestion Exposure Signal--->| Receiver|
   |         |        (Carried in Data Packet Headers)       |         |
   |         |             +-----------+                     |         |
   |         |>=Data=Path=>|(Congested)|>=====Data=Path=====>|         |
   |         |             |  Network  |>-Congestion-Signal->|         |
   |         |             |   Device  |                     |         |
   +---------+             +-----------+                     +---------+


   Not shown are policy devices along the data path that observe the
   Congestion Exposure Signal, and use the information to monitor or
   manage traffic.  These are discussed in Section 4.2.

                                 Figure 1

1.1.  Terminology

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

   ConEx signals in IP packet headers from the sender to the network
   {ToDo: These are placeholders for whatever words we decide to use}:

   Re-Echo Loss  (aka Black-Loss) The transport has experienced a loss.

   Re-Echo ECN  (aka Black-ECN) The transport has experienced an ECN
      mark.

   Pre-Echo  (aka Green) The transport is building up credit to allow
      for any future delay in expected ConEx signals.

   Neutral  (aka Grey) The transport is ConEx-capable.

   Not-ConEx  (aka White) The transport is not ConEx-capable.

2.  Requirements for the Congestion Exposure Signal

   a.  The Congestion Exposure Signal SHOULD be visible to internetwork
       layer devices along the entire path from the transport sender to
       the transport receiver.  Equivalently, it SHOULD be present in
       the IPv4 or IPv6 header, and in the outermost IP header if using
       IP in IP tunnelling.  The Congestion Exposure Signal SHOULD be
       immutable once set by the transport sender.  A corollary of these
       requirements is that existing (legacy) networking gear SHOULD
       pass the Congestion Exposure Signal silently without
       modification.

   b.  The Congestion Exposure Signal SHOULD be useful under only
       partial deployment.  A minimal deployment SHOULD only require
       changes to transport senders.  Furthermore, partial deployment
       SHOULD create incentives for additional deployment, both in terms
       of enabling Congestion Exposure on more devices and adding richer
       features to existing devices.  Nonetheless, ConEx deployment need
       never be universal, and it is anticipated that some hosts and
       some transports may never support the Congestion Exposure
       Protocol and some networks may never use the Congestion Exposure
       Signals.

   c.  The Congestion Exposure Signal SHOULD be accurate.  In
       potentially hostile environments such as the public Internet, it
       SHOULD be possible for techniques to be deployed to audit the
       Congestion Exposure Signal by comparing it to the actual
       congestion signals on the forward data path.  The auditing
       mechanism must have a capability for providing sufficient
       disincentives against misreported congestion, such as by
       throttling traffic that reports less congestion than it is
       actually experiencing.

   d.  The Congestion Exposure Signal SHOULD be timely.  There will be a
       delay between the time when an auditing device sees an actual
       congestion signal and when it sees the subsequent Congestion
       Exposure Signal from the sender.  The minimum delay will be one
       round trip, but it may be much longer depending on the
       transport's choice of feedback delay (consider RTCP [RFC3550] for
       example).  It is not practical to expect auditing devices in the
       network to make allowance for such feedback delays.  Instead, the
       sender MUST be able to send Congestion Exposure signals in
       advance, as 'credit' for any audit device to hold as a balance
       against the risk of congestion during the feedback delay.  This
       design choice simplifies auditing devices and correctly makes the
       transport responsible for both minimising feedback delay and
       minimising sharp increases in packets in flight that would risk
       causing excessive congestion to others.  This issue is discussed
       in more detail in Section 4.2.1.

   It is important to note that the auditing requirement implies a
   number of additional constraints: The basic auditing technique is to
   count both actual congestion signals and Congestion Exposure Signals
   someplace along the data path:

   o  For congestion signaled by ECN, auditing is most accurate when
      located near the transport receiver.  Within any flow or aggregate
      of flows, the total volume of ECN marked data seen near the
      receiver should always be equal to or less than the volume of data
      tagged with Congestion Exposure Signals.

   o  For congestion signaled by loss, totally accurate auditing is not
      believed to be possible in the general case, because it involves a
      network node detecting the absence of some packets, when it cannot
      necessarily see the transport protocol sequence numbers and when
      the missing packets might simply be taking a different route.  But
      there are common cases where sufficient audit accuracy should be
      possible:

      *  For non-IPsec traffic conforming to standard TCP sequence
         numbering on a single path, the auditor could detect losses by
         observing both the original transmission and the retransmission
         after the loss.  Such auditing would be most accurate near the
         sender.

      *  For networks designed so that losses predominantly occur under
         the management of one IP-aware node on the path, the auditor
         could be located at this bottleneck.  It could simply compare
         Congestion Exposure Signals with actual local losses.  Most
         consumer access networks are designed to this model, e.g. the
         radio network controller (RNC) in a cellular network or the
         broadband remote access server (BRAS) in a digital subscriber
         line (DSL) network.  Unlike the above TCP-specific solution,
         this would work for IP packets carrying any transport layer
         protocol, and whether encrypted or not.

         The accuracy of an auditor at one predominant bottleneck might
         still be sufficient, even if losses occasionally occurred at
         other nodes in the network (e.g. border gateways).  Although
         the auditor at the predominant bottleneck would not always be
         able to detect losses at other nodes, transports would not know
         where losses were occurring either.  Therefore any transport
         would not know which losses it could cheat on without getting
         caught, and which ones it couldn't.

   Given that loss-based and ECN-based Congestion Exposure might
   sometimes be best audited at different locations, having distinct
   encodings would widen the design space for the auditing function.

   {Bob: Got to here making suggested changes.}

3.  Representing Congestion Exposure

   Most protocol specifications start with a description of packet
   formats and code points with their associated meanings.  This
   document does not: It is already known that choosing the encoding for
   the Congestion Exposure Signal is likely to entail some engineering
   compromises that have the potential to reduce the protocol's
   usefulness in some settings.  Rather than making these engineering
   choices prematurely, this document sidesteps the encoding problem by
   describing an abstract representation of the Congestion Exposure
   Signal.
   All of the elements of the protocol can be defined in terms of this
   abstract representation.  Most important, the preliminary use cases
   for the protocol are described in terms of the abstract
   representation in companion documents.

   Once we have some example use cases we can evaluate different
   encoding schemes.  Since these schemes are likely to include some
   conflated code points, some information will be lost, weakening or
   disabling some of the algorithms and eliminating some use cases.

   The goal of this approach is to be as complete as possible for
   discovering the potential usage and capabilities of the Congestion
   Exposure protocol, so we have some hope of making optimal design
   decisions when choosing the encoding.

3.1.  One Simple Encoding

   As an aid to the reader, it might be helpful to describe one simple
   encoding of the Congestion Exposure protocol: set IPv4 header bit 48
   (aka the "evil bit" [RFC3514]) on all retransmissions or once per ECN
   signaled window reduction.  Clearly network devices along the forward
   path can see this bit and act on it.  For example they can count
   marked and unmarked packets to estimate the congestion levels along
   the path.

   However, this encoding has been forbidden by RFC xxxx, which seeks to
   preserve the last unallocated bit in the IPv4 header for some
   unspecified future use.

   Furthermore, this encoding, by itself, does not sufficiently support
   partial deployment or strong auditing and might motivate users and/or
   applications to misrepresent the congestion that they are causing.

   However, this simple encoding does present a clear mental model of
   how the Congestion Exposure protocol functions and is very useful for
   conducting thought experiments about how the protocol might function
   under various uses.
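
   Purely as an aid to such thought experiments, the following Python
   sketch shows how a sender might set the bit and how a device on the
   forward path might count marked and unmarked packets.  The helper
   names and the sample header are illustrative assumptions, not part of
   any specification, and the header checksum is not recomputed.

```python
import struct

# Bit 48 of the IPv4 header is the top bit of the 16-bit
# flags/fragment-offset word that starts at byte offset 6.
RESERVED_FLAG = 0x8000
FLAGS_OFFSET = 6


def set_mark(header: bytes) -> bytes:
    """Return a copy of an IPv4 header with bit 48 set (hypothetical helper)."""
    (word,) = struct.unpack_from("!H", header, FLAGS_OFFSET)
    out = bytearray(header)
    struct.pack_into("!H", out, FLAGS_OFFSET, word | RESERVED_FLAG)
    return bytes(out)


def is_marked(header: bytes) -> bool:
    """True if bit 48 is set in the given IPv4 header."""
    (word,) = struct.unpack_from("!H", header, FLAGS_OFFSET)
    return bool(word & RESERVED_FLAG)


def congestion_estimate(headers) -> float:
    """Fraction of observed packets that carry the mark."""
    headers = list(headers)
    if not headers:
        return 0.0
    return sum(is_marked(h) for h in headers) / len(headers)


# A minimal 20-byte IPv4 header; the checksum is left as zero for brevity.
plain = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                    bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
```

   A device that sees one marked packet among four would estimate a
   congestion level of one quarter for that flow under this model.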

3.2.  ECN Based Encoding

   Bob Briscoe's PhD thesis [Refb-dis], and many derivative works
   including RE-ECN [I-D.briscoe-tsvwg-re-ecn-tcp] present an ECN based
   implementation of ConEx.  The central theme of this work includes
   strong disincentives for misrepresenting congestion
   [I-D.briscoe-tsvwg-re-ecn-motiv].  However, it also pre-supposes the
   full deployment of ECN, and does not adequately signal congestion
   indicated by packet loss.  Furthermore, given that after 10 years ECN
   still has not been widely deployed, it does not seem prudent to
   require its deployment as a prerequisite for deploying a Congestion
   Exposure protocol.

   As it currently stands, this work fails to meet the "partial
   deployment" requirement described above in Section 2.

   For a tutorial background on Re-Feedback techniques, see [,,] {Bob:
   Matt, What did you have in mind here?  SIGCOMM'05 paper?  IEEE
   Spectrum article?  Re-ECN Web page?}.

3.2.1.  ECN Changes

   It is important to note that Briscoe's work proposes some relatively
   minor modifications to the ECN protocol specified in RFC 3168.  They
   include: redefining the ECT(0) and ECT(1) code points (this is
   consistent with RFC 3168 but requires deprecating [RFC3540]);
   permitting routers to send ECN signals at a different threshold than
   packet loss; modifications to the ECN negotiations carried on the SYN
   and SYN-ACK; and using a different state machine to carry ECN signals
   in the transport acknowledgments from the Receiver to the Sender.
   This latter change permits the transport protocol to carry multiple
   congestion signals per round trip, and greatly simplifies accurate
   auditing.

   All of these adjustments to RFC 3168 may also be needed in a future
   standardized Congestion Exposure protocol.  Any proposed changes to
   ECN or other existing protocols will need very careful consideration,
   because any such changes increase the cost of deployment.

3.3.  Abstract Encoding

   {ToDo: Not really done, extra terse}

   The abstract signal can be modelled with two different encodings: as
   individual bits or as an enumerated set.  Enumerated encoding is
   probably good enough for most purposes, but it must not be forgotten
   that it does lose some small amount of information.

3.3.1.  Separate Bits

   One bit each for

   o  Not supported (implicit signal from legacy transport senders)

   o  Congestion indicated by packet losses

   o  ECN signaled congestion

   o  Pre-congestion credit (aka Green).  See the audit devices in
      Section 4.2.1 below.
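
   As a minimal sketch of the separate-bits model, the Python fragment
   below gives each condition its own flag.  The class name, member
   names, and bit values are placeholders chosen for illustration; no
   wire format is implied.

```python
from enum import IntFlag


class ConExBits(IntFlag):
    """One bit per condition (hypothetical names and bit positions)."""
    NOT_SUPPORTED = 0x1   # implicit signal from legacy transport senders
    LOSS = 0x2            # congestion indicated by packet losses
    ECN = 0x4             # ECN signaled congestion
    CREDIT = 0x8          # pre-congestion credit (aka Green)


def signals_carried(bits: ConExBits) -> int:
    """Count how many congestion-related marks one packet carries."""
    marks = (ConExBits.LOSS, ConExBits.ECN, ConExBits.CREDIT)
    return sum(1 for m in marks if m in bits)


# With separate bits a single packet can report loss and ECN together.
both = ConExBits.LOSS | ConExBits.ECN
```

   The combined value above is exactly the kind of information that an
   encoding with a single mark per packet cannot express directly.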

3.3.2.  Enumerated Encoding

   For enumerated encoding, some marks must be delayed so that each
   packet carries at most one mark.

   ENUM {Not_Supported, No_Mark, Black_ECN, Black_Loss, Green}
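
   The delaying of marks can be sketched as follows.  This Python
   fragment assumes a hypothetical sender-side queue (MarkQueue) that
   holds pending marks and releases at most one per outgoing packet.
   The first-in, first-out order is an illustrative choice, not
   something this document prescribes.

```python
from collections import deque
from enum import Enum


class Mark(Enum):
    """The enumerated values listed above."""
    NOT_SUPPORTED = 0
    NO_MARK = 1
    BLACK_ECN = 2
    BLACK_LOSS = 3
    GREEN = 4


class MarkQueue:
    """Hypothetical sender-side queue: at most one mark per packet."""

    def __init__(self) -> None:
        self._pending = deque()

    def owe(self, mark: Mark) -> None:
        """Record a mark that still has to be sent."""
        self._pending.append(mark)

    def next_packet_mark(self) -> Mark:
        """Mark for the next outgoing packet; extra marks wait their turn."""
        return self._pending.popleft() if self._pending else Mark.NO_MARK
```

   If a loss and an ECN mark are reported in the same round trip, the
   second mark is simply carried on the following packet.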

4.  Congestion Exposure Components

4.1.  Modified Senders

   Senders are modified to send Congestion Exposure Signals in response
   to the congestion signals fed back to them.

4.2.  Policy Devices

4.2.1.  Audit

   For loss: detect retransmissions by monitoring sequence numbers.
   Ensure that #retransmissions <= #Black_Loss.

   (May need to include a fudge factor, because it would be more robust
   to mark the packet after a retransmission.  Otherwise network devices
   that discard marked packets will cause connectivity failures, rather
   than poor performance).
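
   A minimal sketch of this check for a single flow, in Python.  It
   assumes the auditor can see per-packet sequence numbers (for example,
   unencrypted TCP on a single path) and treats any repeated sequence
   number as a retransmission.  The class name and the fudge parameter
   are hypothetical.

```python
class LossAuditor:
    """Per-flow check that #retransmissions <= #Black_Loss + fudge."""

    def __init__(self, fudge: int = 0) -> None:
        self.fudge = fudge        # allowance for marks sent after the retransmission
        self.seen = set()         # sequence numbers already observed
        self.retransmissions = 0
        self.black_loss = 0

    def observe(self, seq: int, black_loss: bool) -> None:
        """Account for one packet with the given sequence number and mark."""
        if seq in self.seen:
            self.retransmissions += 1
        else:
            self.seen.add(seq)
        if black_loss:
            self.black_loss += 1

    def compliant(self) -> bool:
        """True while the flow has declared enough Black_Loss marks."""
        return self.retransmissions <= self.black_loss + self.fudge
```

   A practical auditor would also need to bound the state it keeps per
   flow; that is omitted here for brevity.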

   For ECN: count Congestion Exposure Signals and ECN marks.  The
   auditor would normally need to delay the ECN count by one RTT to
   avoid false positives.  Alternative: use Green (pre-credits) to
   ensure that #ECN <= #Black_ECN + #Green, even though the #Black_ECN
   is delayed by one RTT.
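
   The inequality can be checked with running counters.  The Python
   sketch below is illustrative only: it counts packets rather than
   bytes, and the class, method, and string names are assumptions.

```python
class EcnAuditor:
    """Running check that #ECN <= #Black_ECN + #Green for one flow."""

    def __init__(self) -> None:
        self.ecn = 0
        self.black_ecn = 0
        self.green = 0

    def observe(self, ecn_marked: bool, conex: str) -> None:
        """conex is "Black_ECN", "Green", or any other value (ignored)."""
        if ecn_marked:
            self.ecn += 1
        if conex == "Black_ECN":
            self.black_ecn += 1
        elif conex == "Green":
            self.green += 1

    def balance(self) -> int:
        """Credit remaining after subtracting the ECN marks seen so far."""
        return self.black_ecn + self.green - self.ecn

    def compliant(self) -> bool:
        return self.balance() >= 0
```

   Green marks sent in advance keep the balance non-negative while the
   matching Black_ECN marks are still one round trip away.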

4.2.2.  Policers and Shapers

   {ToDo: Beware these terms are defined differently than the
   conventional usage.}

   {ToDo: Abridge from existing doc?}

5.  IANA Considerations

   This memo includes no request to IANA.

   Note to RFC Editor: this section may be removed on publication as an
   RFC.

6.  Security Considerations

   {ToDo:}

7.  Conclusions

   {ToDo:}

8.  Acknowledgements

   This document was improved by review comments from Toby Moncaster.

9.  Comments Solicited

   Comments and questions are encouraged and very welcome.  They can be
   addressed to the IETF Congestion Exposure (ConEx) working group
   mailing list <conex@ietf.org>, and/or to the authors.

10.  References

10.1.  Normative References

   [RFC2119]                         Bradner, S., "Key words for use in
                                     RFCs to Indicate Requirement
                                     Levels", BCP 14, RFC 2119,
                                     March 1997.

10.2.  Informative References

   [I-D.briscoe-tsvwg-re-ecn-motiv]  Briscoe, B., Jacquet, A.,
                                     Moncaster, T., and A. Smith, "Re-
                                     ECN: A Framework for adding
                                     Congestion Accountability to
                                     TCP/IP", draft-briscoe-tsvwg-re-
                                     ecn-tcp-motivation-01 (work in
                                     progress), September 2009.

   [I-D.briscoe-tsvwg-re-ecn-tcp]    Briscoe, B., Jacquet, A.,
                                     Moncaster, T., and A. Smith, "Re-
                                     ECN: Adding Accountability for
                                     Causing Congestion to TCP/IP",
                                     draft-briscoe-tsvwg-re-ecn-tcp-08
                                     (work in progress), September 2009.

   [I-D.ietf-ledbat-congestion]      Shalunov, S. and G. Hazel, "Low
                                     Extra Delay Background Transport
                                     (LEDBAT)",
                                     draft-ietf-ledbat-congestion-02
                                     (work in progress), July 2010.

   [I-D.sridharan-tcpm-ctcp]         Sridharan, M., Tan, K., Bansal, D.,
                                     and D. Thaler, "Compound TCP: A New
                                     TCP Congestion Control for High-
                                     Speed and Long Distance  Networks",
                                     draft-sridharan-tcpm-ctcp-02 (work
                                     in progress), November 2008.

   [RFC2309]                         Braden, B., Clark, D., Crowcroft,
                                     J., Davie, B., Deering, S., Estrin,
                                     D., Floyd, S., Jacobson, V.,
                                     Minshall, G., Partridge, C.,
                                     Peterson, L., Ramakrishnan, K.,
                                     Shenker, S., Wroclawski, J., and L.
                                     Zhang, "Recommendations on Queue
                                     Management and Congestion Avoidance
                                     in the Internet", RFC 2309,
                                     April 1998.

   [RFC3168]                         Ramakrishnan, K., Floyd, S., and D.
                                     Black, "The Addition of Explicit
                                     Congestion Notification (ECN) to
                                     IP", RFC 3168, September 2001.

   [RFC3514]                         Bellovin, S., "The Security Flag in
                                     the IPv4 Header", RFC 3514,
                                     April 2003.

   [RFC3540]                         Spring, N., Wetherall, D., and D.
                                     Ely, "Robust Explicit Congestion
                                     Notification (ECN) Signaling with
                                     Nonces", RFC 3540, June 2003.

   [RFC3550]                         Schulzrinne, H., Casner, S.,
                                     Frederick, R., and V. Jacobson,
                                     "RTP: A Transport Protocol for
                                     Real-Time Applications", STD 64,
                                     RFC 3550, July 2003.

   [RFC5670]                         Eardley, P., "Metering and Marking
                                     Behaviour of PCN-Nodes", RFC 5670,
                                     November 2009.

   [RFC5681]                         Allman, M., Paxson, V., and E.
                                     Blanton, "TCP Congestion Control",
                                     RFC 5681, September 2009.

   [Refb-dis]                        Briscoe, B., "Re-feedback: Freedom
                                     with Accountability for Causing
                                     Congestion in a Connectionless
                                     Internetwork", UCL PhD
                                     Dissertation , 2009, <http://
                                     bobbriscoe.net/projects/refb/
                                     index.html#refb-dis>.

   [Vegas]                           Brakmo, L. and L. Peterson, "TCP
                                     Vegas: End-to-End Congestion
                                     Avoidance on a Global Internet",
                                     IEEE Journal on Selected Areas in
                                     Communications 13(8)1465--80,
                                     October 1995, <http://
                                     ieeexplore.ieee.org/iel1/49/9740/
                                     00464716.pdf?arnumber=464716>.

Authors' Addresses

   Matt Mathis
   Google


   EMail: mattmathis at google.com


   Bob Briscoe
   BT
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   EMail: bob.briscoe@bt.com
   URI:   http://bobbriscoe.net/