<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<!--
I trimmed the clutter, since I think we got everything we agreed on.
# Build using:
mv $DLPATH/$NAME.txt $NAME.tmp
iconv -c -t ASCII//TRANSLIT $NAME.tmp | sed 's/\[[a-z]]//' | awk '{ print $0; }; /<\/rfc>/{ exit; }' > $NAME.xml
xml2rfc $NAME.xml  # BEWARE: overwrites $NAME.txt
# (verbose switch removed)
-->
<!-- Alterations to I-D/RFC boilerplate -->
<?rfc strict="no" ?>
<!-- Default strict="no" Don't check I-D nits -->
<?rfc rfcedstyle="yes" ?>
<!-- IETF process -->
<?rfc ipr="yes" ?> <!-- Matt: Not a problem, as long as all IPR leads to a free licence. -->
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<!-- Default symrefs="no" Don't use anchors, but use numbers for refs -->
<?rfc sortrefs="yes"?>
<!-- Default sortrefs="no" Don't sort references into order -->
<?rfc comments="yes" ?>
<!-- Default comments="no" Don't render comments -->
<?rfc inline="no" ?>
<!-- Default inline="no" if comments is "yes", then render comments inline; otherwise render them in an `Editorial Comments' section -->
<?rfc compact="yes"?>
<?rfc subcompact="yes"?>
<?rfc emoticonic="yes" ?>
<!-- Default emoticonic="no" Doesn't prettify HTML format -->
<rfc ipr="trust200902" category="info" docName="draft-ietf-conex-abstract-mech-01">
 <front>
   <title abbrev="ConEx Concepts and Abstract Mechanism">Congestion Exposure
   (ConEx) Concepts and Abstract Mechanism</title>
   <author fullname="Matt Mathis" initials="M." surname="Mathis">
     <organization>Google, Inc</organization>
     <address>
       <postal>
         <street>1600 Amphitheater Parkway</street>
         <city>Mountain View</city>
         <region>California</region>
         <code>93117</code>
         <country>USA</country>
       </postal>
       <email>mattmathis at google.com</email>
      </address>
   </author>
   <author fullname="Bob Briscoe" initials="B." surname="Briscoe">
     <organization>BT</organization>
     <address>
       <postal>
         <street>B54/77, Adastral Park</street>
         <street>Martlesham Heath</street>
         <city>Ipswich</city>
         <code>IP5 3RE</code>
         <country>UK</country>
       </postal>
       <phone>+44 1473 645196</phone>
       <email>bob.briscoe@bt.com</email>
       <uri>http://bobbriscoe.net/</uri>
     </address>
   </author>
   <date day="14" month="March" year="2011" />
   <area>Transport</area>
   <workgroup>Congestion Exposure (ConEx) Working Group</workgroup>
   <keyword>Quality of Service</keyword>
   <keyword>QoS</keyword>
   <keyword>Congestion Control</keyword>
   <keyword>Signaling</keyword>
   <keyword>Protocol</keyword>
   <keyword>Encoding</keyword>
   <keyword>Audit</keyword>
   <keyword>Policing</keyword>
   <abstract>
     <t>This document describes an abstract mechanism by which senders inform
     the network about the congestion encountered by packets earlier in the
     same flow. Today, the network may signal congestion to the receiver by
     ECN markings or by dropping packets, and the receiver passes this
     information back to the sender in transport-layer feedback. The
     mechanism to be developed by the ConEx WG will enable the sender to also
     relay this congestion information back into the network in-band at the
     IP layer, such that the total level of congestion is visible to all IP
     devices along the path, from where it could, for example, provide 
     input to traffic management.</t>
   </abstract>
 </front>
 <middle>
   <!-- ================================================================ -->
   <section anchor="abstrmech_Introduction" title="Introduction">
     <t>One of the required functions of a transport protocol is controlling
     congestion in the network. There are three techniques in use today for
     the network to signal congestion to a transport:<list style="symbols">
         <t>The most common congestion signal is packet loss. When congested,
         the network simply discards some packets either as part of an
         active queue management function <xref target="RFC2309"></xref> or as the
         consequence of a queue overflow or other resource starvation. The
          transport receiver detects that some data is missing and signals
          this to the transport sender through transport acknowledgments (e.g.
         TCP SACK options). The sender performs the appropriate congestion
         control rate reduction (e.g. <xref target="RFC5681"></xref> for TCP)
         and, if it is a reliable transport, it retransmits the missing
         data.</t>
         <t>If the transport supports explicit congestion notification (ECN)
         <xref target="RFC3168"></xref> or pre-congestion notification (PCN)
          <xref target="RFC5670"></xref>, the transport sender indicates this
         by setting an ECN-capable transport (ECT) codepoint in every packet.
         Network devices can then explicitly signal congestion to the
         receiver by setting ECN bits in the IP header of such packets. The
         transport receiver communicates these ECN signals back to the
         sender, which then performs the appropriate congestion control rate
         reduction.</t>
         <t>Some experimental transport protocols and TCP variants <xref
         format="default" target="Vegas"></xref> sense queuing delays in the
         network and reduce their rate before the network has to signal
         congestion using loss or ECN. A purely delay-sensing transport will
         tend to be pushed out by other competing transports that do not back
         off until they have driven the queue into loss. Therefore, modern
         delay-sensing algorithms use delay in some combination with loss to
         signal congestion (e.g. LEDBAT <xref format="default"
         target="I-D.ietf-ledbat-congestion"></xref>, Compound <xref
         target="I-D.sridharan-tcpm-ctcp"></xref>). In the rest of this
         document, we will confine the discussion to concrete signals of
         congestion such as loss and ECN. We will not discuss delay-sensing
         further, because it can only avoid these more concrete signals of
         congestion in some circumstances.</t>
       </list></t>
     <t>In all cases the congestion signals follow the route indicated in
     <xref target="abstrmech_Fig_ConEx_Placement"></xref>. A congested
     network device sends a signal in the data stream on the forward path to
     the transport receiver, the receiver passes it back to the sender
     through transport level feedback, and the sender makes some congestion
     control adjustment.</t>
     <t>This document proposes to extend the capabilities of the Internet
     protocol suite with the addition of a ConEx Signal that, to a first
     approximation, relays the congestion information from the transport
     sender back through the internetwork layer. That signal is shown in
     <xref target="abstrmech_Fig_ConEx_Placement"></xref>. It would be
     visible to all internetwork layer devices along the forward (data) path
     and is intended to support a number of new policy-controlled mechanisms
     that might be used to manage traffic.</t>
     <t>There is no expectation that internetwork layer devices will do fine-grained congestion control using ConEx information. That is still probably best done at the transport sender. Rather, the network will be able to use ConEx information to do better bulk traffic management, which in turn should incentivize end-system transports to be more careful about congesting others <xref
     target="I-D.conex-concepts-uses"></xref>. </t>
     <figure anchor="abstrmech_Fig_ConEx_Placement">
<!--
123456789012345678901234567890123456789012345678901234567890123456789 -->
       <artwork><![CDATA[
+---------+                                               +---------+
|Transport|             +-----------+                     |Transport|
| Sender  |>=Data=Path=>|(Congested)|>=====Data=Path=====>| Receiver|
|         |             |  Network  |>-Congestion-Signal->|---.     |
|         |             |   Device  |                     |   |     |
|         |             +-----------+                     |   |     |
|         |                                               |   |     |
|         |<==Feedback=Path==============================<|   |     |
|     ,---|<--Transport Layer returned Congestion Signal-<|<--'     |
|     |   |                                               |         |
|     |   |>==============Data=Path======================>|         |
|     `-->|>---------(new)-IP layer ConEx Signal--------->|         |
|         |        (Carried in Data Packet Headers)       |         |
+---------+                                               +---------+
]]></artwork>
       <postamble>Not shown are policy devices along the data path that
       observe the ConEx Signal, and use the information to monitor or manage
       traffic. These are discussed in <xref
       target="abstrmech_Policy_Devices"></xref>.</postamble>
     </figure>
     <section anchor="abstrmech_Terminology" title="Terminology">
       <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
       document are to be interpreted as described in RFC 2119 <xref
       target="RFC2119"></xref>.</t>
       <t>ConEx signals in IP packet headers from the sender to the network
       {ToDo: These are placeholders for whatever words we decide to
       use}:<list style="hanging">
           <t hangText="Not-ConEx:">The transport is not ConEx-capable</t>
           <t hangText="ConEx-Capable:">The transport is ConEx-Capable. This
           is the opposite of Not-ConEx and implies one of the following
           signals<list style="hanging">
               <t hangText="Re-Echo-Loss:">(aka Purple) The transport has
               experienced a loss</t>
               <t hangText="Re-Echo-ECN:">(aka Black) The transport has
               experienced an ECN mark</t>
               <t hangText="Credit:">(aka Green) The transport is building up
               credit to allow for any future delay in expected ConEx
               signals (see <xref
         target="abstrmech_Credit_Simple_Audit"></xref>)</t>
               <t hangText="ConEx-Not-Marked:">The transport is ConEx-capable
               but is signaling none of Re-Echo-Loss, Re-Echo-ECN or
               Credit</t>
               <t hangText="ConEx-Marked:">At least one of Re-Echo-Loss,
               Re-Echo-ECN or Credit.</t>
             </list></t>
         </list></t>
     </section>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_Requirements"
            title="Requirements for the ConEx Signal">
     <t>Ideally, all the following requirements would be met by a Congestion
      Exposure Signal. However, it is already known that some compromises will
      be necessary; therefore, all the requirements are expressed with the
     keyword 'SHOULD' rather than 'MUST'. The only mandatory requirement is
     that a concrete protocol description MUST give sound reasoning if it
     chooses not to meet any of these requirements:<list style="letters">
         <t>The ConEx Signal SHOULD be visible to internetwork layer devices
         along the entire path from the transport sender to the transport
         receiver. Equivalently, it SHOULD be present in the IPv4 or IPv6
         header, and in the outermost IP header if using IP in IP tunneling.
         The ConEx Signal SHOULD be immutable once set by the transport
         sender. A corollary of these requirements is that the chosen ConEx 
         encoding SHOULD pass silently without modification through pre-existing 
         networking gear.</t>
         <t>The ConEx Signal SHOULD be useful under only partial deployment.
         A minimal deployment SHOULD only require changes to transport
         senders. Furthermore, partial deployment SHOULD create incentives
         for additional deployment, both in terms of enabling ConEx on more
         devices and adding richer features to existing devices. Nonetheless,
         ConEx deployment need never be universal, and it is anticipated that
         some hosts and some transports may never support the ConEx Protocol
         and some networks may never use the ConEx Signals.</t>
         <t>The ConEx Signal SHOULD be accurate. In potentially hostile
         environments such as the public Internet, it SHOULD be possible for
         techniques to be deployed to audit the Congestion Exposure Signal by
         comparing it to the actual congestion signals on the forward data
         path. The auditing mechanism must have a capability for providing
         sufficient disincentives against misreported congestion, such as by
         throttling traffic that reports less congestion than it is actually
         experiencing.</t>
         <t>The ConEx Signal SHOULD be timely. There will be a delay between
         the time when an auditing device sees an actual congestion signal
         and when it sees the subsequent Congestion Exposure Signal from the
         sender. The minimum delay will be one round trip, but it may be much
         longer depending on the transport's choice of feedback delay
         (consider RTCP <xref target="RFC3550"></xref> for example). It is
         not practical to expect auditing devices in the network to make
         allowance for such feedback delays. Instead, the sender SHOULD be
         able to send ConEx signals in advance, as 'credit' for any audit
         function to hold as a balance against the risk of congestion during
         the feedback delay. This design choice greatly simplifies auditing (see <xref
         target="abstrmech_Credit_Simple_Audit"></xref>).</t>
       </list></t>
     <t>It is important to note that the auditing requirement implies a
     number of additional constraints: The basic auditing technique is to
     count both actual congestion signals and ConEx Signals someplace along
     the data path:<list style="symbols">
         <t>For congestion signaled by ECN, auditing is most accurate when
         located near the transport receiver. Within any flow or aggregate of
         flows, the volume of data tagged with ConEx Signals should never be 
         less than the total volume of ECN marked data seen near the receiver.</t>
         <t>For congestion signaled by loss, totally accurate auditing is not
         believed to be possible in the general case, because it involves a
         network node detecting the absence of some packets, when it cannot
         necessarily see the transport protocol sequence numbers and when the
         missing packets might simply be taking a different route. But there
         are common cases where sufficient audit accuracy should be
         possible:<list style="symbols">
             <t>For non-IPsec traffic conforming to standard TCP sequence
             numbering on a single path, an auditor could detect losses by
             observing both the original transmission and the retransmission
             after the loss. Such auditing would be most accurate near the
             sender.</t>
             <t>For networks designed so that losses predominantly occur
             under the management of one IP-aware node on the path, the
             auditor could be located at this bottleneck. It could simply
             compare ConEx Signals with actual local losses. This is a good
             model for most consumer access networks where audit accuracy could
             well be sufficient even if losses occasionally occur at other
             nodes in the network, such as border gateways (see <xref
             target="abstrmech_Audit"></xref> for details).</t>
           </list></t>
       </list></t>
     <t>Given that loss-based and ECN-based ConEx might sometimes be best
     audited at different locations, having distinct encodings would widen
     the design space for the auditing function.</t>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_Representing_ConEx"
            title="Representing Congestion Exposure">
     <t>Most protocol specifications start with a description of packet
     formats and codepoints with their associated meanings. This document
     does not: It is already known that choosing the encoding for the ConEx
     Signal is likely to entail some engineering compromises that have the
     potential to reduce the protocol's usefulness in some settings. Rather
      than making these engineering choices prematurely, this document
      sidesteps the encoding problem by describing an abstract representation of
     ConEx Signals. All of the elements of the protocol can be defined in
     terms of this abstract representation. Most important, the preliminary
     use cases for the protocol are described in terms of the abstract
     representation in companion documents <xref
     target="I-D.conex-concepts-uses"></xref>.</t>
     <t>Once we have some example use cases we can evaluate different
     encoding schemes. Since these schemes are likely to include some
      conflated code points, some information will be lost, resulting in
     weakening or disabling some of the algorithms and eliminating some use
     cases.</t>
     <t>The goal of this approach is to be as complete as possible for
     discovering the potential usage and capabilities of the ConEx protocol,
     so we have some hope of making optimal design decisions when choosing
     the encoding.</t>
     <!-- ________________________________________________________________ -->
     <section anchor="abstrmech_Simple_Encoding" title="Strawman Encoding">
       <t>As an aid to the reader, it might be helpful to describe a
       na&iuml;ve strawman encoding of the ConEx protocol described solely in
       terms of TCP: set the Reserved bit in the IPv4 header (bit 48 counting
       from zero <xref target="RFC0791"></xref>&mdash;aka the "evil bit"
       <xref target="RFC3514"></xref>) on all retransmissions or once per ECN
       signaled window reduction. Clearly network devices along the forward
       path can see this bit and act on it. For example they can count marked
       and unmarked packets to estimate the congestion levels along the
       path.</t>
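        <t>As an illustration only, the strawman could be sketched in Python
        as follows. The function names are hypothetical, the header is a bare
        20-byte IPv4 header held as a byte string, and the sketch assumes the
        Reserved bit is the most significant bit of header byte 6 (bit 48
        counting from zero). The header checksum is deliberately left
        untouched, which a real sender could not do.</t>
        <figure>
          <artwork><![CDATA[
```python
RESERVED_FLAG_BYTE = 6     # byte offset of the Flags field in the IPv4 header
RESERVED_FLAG_MASK = 0x80  # most significant bit of that byte (header bit 48)


def set_strawman_mark(header: bytes) -> bytes:
    """Return a copy of an IPv4 header with the Reserved flag bit set.

    The header checksum is NOT recomputed here; a real sender would
    have to update it.
    """
    marked = bytearray(header)
    marked[RESERVED_FLAG_BYTE] |= RESERVED_FLAG_MASK
    return bytes(marked)


def is_strawman_marked(header: bytes) -> bool:
    """True if the Reserved flag bit is set in this IPv4 header."""
    return bool(header[RESERVED_FLAG_BYTE] & RESERVED_FLAG_MASK)


def marked_fraction(headers) -> float:
    """Fraction of packets carrying the strawman mark (0.0 if none seen)."""
    headers = list(headers)
    if not headers:
        return 0.0
    marked = sum(1 for h in headers if is_strawman_marked(h))
    return marked / len(headers)


plain = bytes([0x45]) + bytes(19)   # minimal 20-byte header, flags clear
retransmission = set_strawman_mark(plain)
print(marked_fraction([plain, plain, plain, retransmission]))  # 0.25
```
]]></artwork>
        </figure>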
       <t>However, the IESG has chartered the ConEx working group to
       establish that there is sufficient demand for an IPv6 ConEx protocol
       before using the last available bit in the IPv4 header. Furthermore
       this encoding, by itself, does not sufficiently support partial
       deployment or strong auditing and might motivate users and/or
       applications to misrepresent the congestion that they are causing.</t>
       <t>Nonetheless, this strawman encoding does present a clear mental
       model of how the ConEx protocol might function under various uses.</t>
     </section>
     <!-- ________________________________________________________________ -->
     <section anchor="abstrmech_ECN_Encoding" title="ECN Based Encoding">
       <t>Ideally ConEx and ECN are orthogonal signals and SHOULD be entirely
        independent. However, given the limited number of header bits and/or
       code points, these signals may have to share code points, at least
       partially.</t>
       <t>The re-ECN specification <xref
       target="I-D.briscoe-tsvwg-re-ecn-tcp"></xref> presents an
       implementation of ConEx that had to be tightly integrated with the encoding
       of ECN in order to fit into the IP header. The central theme of the re-ECN work is an audit
       mechanism that can provide sufficient disincentives against
       misrepresenting congestion <xref
       target="I-D.briscoe-tsvwg-re-ecn-motiv"></xref>, which is analyzed
       extensively in Briscoe's PhD dissertation <xref
       target="Refb-dis"></xref>.</t>
       <t>Re-ECN is a good example of one chosen set of compromises
       attempting to meet the requirements of <xref
       target="abstrmech_Requirements"></xref>. However, the present document
       takes a step back, aiming to state the ideal requirements in order to
       allow the Internet community to assess whether other compromises are
       possible.</t>
       <t>In particular, different incremental deployment choices may be
       desirable to meet the partial deployment requirement of <xref
       target="abstrmech_Requirements"></xref>. Re-ECN requires the receiver
       to be at least ECN-capable as well as requiring an update to the
       sender. Although ConEx will inherently require change at the sender,
       it would be preferable if it could work, even partially, with any
       receiver.</t>
       <t>The chosen ConEx protocol certainly must not require ECN to be
       deployed in any network. In this respect re-ECN is already a good
        example: it acts perfectly well as a loss-based ConEx protocol if
       the loss-based audit techniques in <xref
       target="abstrmech_Audit"></xref> are used. However, it would still be
       desirable to avoid the dependence on an ECN receiver.</t>
       <!--Although re-ECN does not require networks to support ECN, it still embodies a major incremental deployment 
challenge; a sender cannot use re-ECN unless the receiver at least supports ECN. Most operating systems 
currently being supplied (late 2010) implement ECN, but it is turned off by default at the client end, 
even though it is on by default at the server end. This is primarily because of a small number of popular 
security appliances and home gateways had bugs that cause serious interoperability problems for the first 
deployers, relative to the small benefit.-->
       <!--Currently (late 2010) ECN is implemented but off by default in the TCP client code of the operating 
systems being most widely supplied (Windows 7 and the Linux mainline). Although ECN is on by default 
in the majority of TCP servers (Linux and Windows server 2007)-->
       <t>For a tutorial background on re-ECN techniques, see [<xref
       format="counter" target="Re-fb"></xref>, <xref format="counter"
       target="FairerFaster"></xref>].</t>
       <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  -->
       <section anchor="abstrmech_ECN_Changes" title="ECN Changes">
         <t>Although the re-ECN protocol requires no changes to the network
         part of the ECN protocol, it is important to note that it does
         propose some relatively minor modifications to the host-to-host
         aspects of the ECN protocol specified in RFC 3168. They include:
         redefining the ECT(1) code point (the change is consistent with
          RFC 3168 but requires deprecating the experimental ECN nonce <xref
         target="RFC3540"></xref>); modifications to the ECN negotiations
         carried on the SYN and SYN-ACK; and using a different state machine
         to carry ECN signals in the transport acknowledgments from a modified
         Receiver to the Sender. This last change is optional, but it permits the transport
         protocol to carry multiple congestion signals per round trip. It
         greatly simplifies accurate auditing, and is likely to be useful in other 
         transports, e.g. DCTCP <xref target="DCTCP" />.</t>
         <t>All of these adjustments to RFC 3168 may also be needed in a
         future standardized ConEx protocol. There will need to be very
         careful consideration of any proposed changes to ECN or other
         existing protocols, because any such changes increase the cost of
         deployment.</t>
       </section>
     </section>
     <!-- ________________________________________________________________ -->
     <section anchor="abstrmech_Abstract_Encoding" title="Abstract Encoding">
       <t>The ConEx protocol could take one of two different encodings:
       independently settable bits or an enumerated set of mutually exclusive
       codepoints.</t>
       <!-- Matt, you might want to say something about congestion events here.
         Bob, it doesn't bother me here because the logic works even if the reader
         assumes some alternate definition of volume.   -->
       <t>In both cases, the amount of congestion is signaled by the volume
       of marked data&mdash;just as the volume of lost data or ECN marked
       data signals the amount of congestion experienced. Thus the size of
       each packet carrying a ConEx Signal is significant.</t>
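        <t>A minimal sketch of why packet size matters is given below. It
        assumes a hypothetical observer that sees each packet as a pair of
        size in bytes and a flag saying whether it carries any ConEx Signal;
        the function name and the sample flow are invented for illustration.
        In the sample, two of the six packets are marked, but they account
        for only 1540 of the 7700 bytes.</t>
        <figure>
          <artwork><![CDATA[
```python
def congestion_by_volume(packets):
    """Summarise a flow given (size_in_bytes, conex_marked) tuples.

    Returns (marked_bytes, total_bytes, marked_bytes / total_bytes).
    """
    marked_bytes = 0
    total_bytes = 0
    for size, conex_marked in packets:
        total_bytes += size
        if conex_marked:
            marked_bytes += size
    fraction = marked_bytes / total_bytes if total_bytes else 0.0
    return marked_bytes, total_bytes, fraction


# Hypothetical flow: one full-sized and one small packet are marked.
flow = [
    (1500, True),
    (40, True),
    (1500, False),
    (1500, False),
    (1500, False),
    (1500, False),
    (160, False),
]
marked_bytes, total_bytes, fraction = congestion_by_volume(flow)
```
]]></artwork>
        </figure>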
       <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  -->
       <section anchor="abstrmech_Separate" title="Independent Bits">
         <t>This encoding involves flag bits, each of which the sender can
         set independently to indicate to the network one of the following
         four signals:<list style="hanging">
             <t hangText="ConEx (Not-ConEx)">The transport is (or is not)
             using ConEx with this packet (the protocol MUST be arranged so
             that legacy transport senders implicitly send Not-ConEx)</t>
             <t hangText="Re-Echo-Loss (Not-Re-Echo-Loss)">The transport has
             (or has not) experienced a loss</t>
             <t hangText="Re-Echo-ECN (Not-Re-Echo-ECN)">The transport has
             (or has not) experienced ECN-signaled congestion</t>
             <t hangText="Credit (Not-Credit)">The transport is (or is not)
             building up congestion credit (see <xref
             target="abstrmech_Audit"></xref> on the audit function)</t>
           </list></t>
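          <t>The independent-bit representation might be modelled as in the
          following sketch. The class name and the bit positions are
          assumptions made purely for illustration; this document does not
          assign any bit positions. All bits clear is taken to mean
          Not-ConEx, which is one way of arranging that legacy senders
          implicitly send Not-ConEx.</t>
          <figure>
            <artwork><![CDATA[
```python
from enum import IntFlag


class ConExBits(IntFlag):
    """Hypothetical flag layout; bit positions are illustrative only."""
    CONEX = 0x1
    RE_ECHO_LOSS = 0x2
    RE_ECHO_ECN = 0x4
    CREDIT = 0x8


# All bits clear is Not-ConEx, so a legacy sender that never sets these
# bits implicitly signals Not-ConEx.
NOT_CONEX = ConExBits(0)


def is_conex_marked(bits: ConExBits) -> bool:
    """ConEx-Marked: ConEx-capable and at least one of the three signals."""
    signals = (ConExBits.RE_ECHO_LOSS | ConExBits.RE_ECHO_ECN
               | ConExBits.CREDIT)
    return bool(bits & ConExBits.CONEX) and bool(bits & signals)


# Independent bits can express combinations, e.g. Credit with Re-Echo-ECN.
combined = ConExBits.CONEX | ConExBits.RE_ECHO_ECN | ConExBits.CREDIT
```
]]></artwork>
          </figure>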
       </section>
       <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  -->
       <section anchor="abstrmech_Enumerated" title="Codepoint Encoding">
         <t>This encoding involves signaling one of the following five
         codepoints:</t>
         <t>ENUM {Not-ConEx, ConEx-Not-Marked, Re-Echo-Loss, Re-Echo-ECN, Credit}</t>
         <t>Each named codepoint has the same meaning as in the encoding
         using independent bits (<xref target="abstrmech_Separate"></xref>).
         The use of any one codepoint implies the negative of all the others.</t>
         <t>Inherently, the semantics of most of the enumerated codepoints
         are mutually exclusive. 'Credit' is the only one that might need to
         be used in combination with either Re-Echo-Loss or Re-Echo-ECN, but
         even that requirement is questionable. It must not be forgotten that
         the enumerated encoding loses the flexibility to signal these two
         combinations, whereas the encoding with four independent bits is not
         so limited. Alternatively two extra codepoints could be assigned to
         these two combinations of semantics.</t>
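          <t>The loss of flexibility can be made concrete with a small
          sketch. The numeric values and the helper name to_codepoint are
          hypothetical. The helper maps the four independent-bit semantics
          onto the five codepoints and rejects the combinations that a single
          codepoint cannot carry.</t>
          <figure>
            <artwork><![CDATA[
```python
from enum import Enum


class ConExCodepoint(Enum):
    """The five abstract codepoints; numeric values are illustrative."""
    NOT_CONEX = 0
    CONEX_NOT_MARKED = 1
    RE_ECHO_LOSS = 2
    RE_ECHO_ECN = 3
    CREDIT = 4


def to_codepoint(conex, re_echo_loss=False, re_echo_ecn=False,
                 credit=False):
    """Map independent-bit semantics onto one codepoint.

    Raises ValueError for combinations the five-codepoint set cannot
    carry.
    """
    signals = [
        (re_echo_loss, ConExCodepoint.RE_ECHO_LOSS),
        (re_echo_ecn, ConExCodepoint.RE_ECHO_ECN),
        (credit, ConExCodepoint.CREDIT),
    ]
    chosen = [cp for is_set, cp in signals if is_set]
    if not conex:
        if chosen:
            raise ValueError("signals set on a Not-ConEx packet")
        return ConExCodepoint.NOT_CONEX
    if len(chosen) > 1:
        raise ValueError("combination needs more than one codepoint")
    return chosen[0] if chosen else ConExCodepoint.CONEX_NOT_MARKED
```
]]></artwork>
          </figure>
          <t>In this sketch, Credit combined with Re-Echo-ECN raises an
          error, whereas an encoding with two extra codepoints would simply
          add two more members to the enumeration.</t>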
        <!--{ToDo: Signal from Policer to Receiver to distinguish policy-induced drop from congestion-induced drop. 
Bob NIX this, it is not in scope. -MM}-->
         <!--Some might prefer to use the following colours respectively for each codepoint. 
The same colours as follows (with the omission of Purple) were used to describe re-ECN codepoints:
{Hmmm, I changed them above, I strongly prefer white to be unmarked ConEx enabled, and a non-color (blank?) to be non-conex.}
ENUM {White, Grey, Purple, Black, Green}.
-->
       </section>
     </section>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_ConEx_Components"
            title="Congestion Exposure Components">
     <t>{ToDo: Picture of the components, similar to that in the last
     slideset about conex-concepts-uses?}</t>
     <!-- ________________________________________________________________ -->
     <section anchor="abstrmech_Senders" title="Modified Senders">
       <t>The sending transport needs to be modified to send Congestion
       Exposure Signals in response to congestion feedback signals.</t>
     </section>
     <!-- ________________________________________________________________ -->
     <section anchor="abstrmech_Receivers"
              title="Receivers (Optionally Modified)">
        <t>The receiving transport may already feed back sufficiently useful
       signals to the sender so that it does not need to be altered.</t>
       <t>However, a TCP receiver feeds back ECN congestion signals no more
        than once within a round trip. The sender may require more precise
        feedback from the receiver; otherwise it will appear to be understating
       its ConEx Signals (see <xref
       target="abstrmech_ECN_Changes"></xref>).</t>
       <t>Ideally, ConEx should be added to a transport like TCP without
       mandatory modifications to the receiver. But an optional modification
       to the receiver could be recommended for precision. This was the
       approach taken when adding re-ECN to TCP <xref
       target="I-D.briscoe-tsvwg-re-ecn-tcp"></xref>.</t>
     </section>
     <!-- ________________________________________________________________ -->
     <section anchor="abstrmech_Audit" title="Audit">
       <t>To audit ConEx Signals against actual losses (as opposed to ECN) an auditor could use
       one of the following techniques:<list style="hanging">
           <t hangText="TCP-specific approach:">The auditor could monitor TCP
           flows or aggregates of flows, only holding state on a flow if it
           first sends a Credit or a Re-Echo-Loss marking. The auditor could
            detect retransmissions by monitoring sequence numbers. It would
            ensure that (volume of retransmitted data) &lt;= (volume of data
           marked Re-Echo-Loss). Traffic would only be auditable in this way
           if it conformed to the standard TCP protocol and the IP payload
           was not encrypted (e.g. with IPsec).</t>
          <!--Matt: (May need to include a fudge factor, because it would be more robust to mark the packet after a 
retransmission. Otherwise network devices that discard marked packets will cause connectivity 
failures, rather than poor performance).
Bob: This presupposes that network devices will bias discard to marked packets. In my 
dissertation I found that all cases required the reverse. 
I said "we have made sure that the dropper doesn't drop Positive or Cautious packets, and the 
policer only drops Positive or Cautious packets as a last resort". 
And in the re-ECN spec, preferential discard based on re-ECN markings only drops black or green 
as a last resort.
Also see footnote 18 on p101 of my PhD dissertation, which refers to the paragraphs on 
"Biased Congestion Marking" in S.12.1.2.
Matt: I was more worried about robustness in the presence of bugs than proper behavior of fully compliant network devices.
-->
           <t hangText="Predominant bottleneck approach:">Unlike the above
           TCP-specific solution, this technique would work for IP packets
           carrying any transport layer protocol, and whether encrypted or
           not. But it only works well for networks designed so that losses
           predominantly occur under the management of one IP-aware node on
           the path. The auditor could then be located at this bottleneck. It
           could simply compare ConEx Signals with actual local losses. Most
            consumer access networks are designed to this model, e.g. the radio
           network controller (RNC) in a cellular network or the broadband
           remote access server (BRAS) in a digital subscriber line (DSL)
           network. 
           <vspace blankLines="1" />
           The accuracy of an auditor at
           one predominant bottleneck might still be sufficient, even if
           losses occasionally occurred at other nodes in the network (e.g.
           border gateways). Although the auditor at the predominant
           bottleneck would not always be able to detect losses at other
           nodes, transports would not know where losses were occurring
           either. Therefore a transport would not know which losses it
           could cheat on without getting caught, and which ones it
           couldn't.</t>
         </list></t>
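       <t>As an illustration only, the TCP-specific check described above
       could be sketched as follows. The class name, its fields and the packet
       model (sequence number, payload length, Re-Echo-Loss flag) are
       assumptions of this sketch rather than part of any ConEx
       specification. It ignores sequence number wrap-around and treats
       reordered segments as retransmissions, so a practical auditor would
       probably need some allowance before sanctioning a flow.</t>
       <figure><artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class TcpLossAudit:
    """Per-flow state for a hypothetical TCP-specific loss auditor."""
    next_seq: int = 0       # one past the highest byte seen so far
    retransmitted: int = 0  # bytes seen again below next_seq
    re_echo_loss: int = 0   # bytes carried in Re-Echo-Loss packets
    started: bool = False

    def observe(self, seq: int, length: int, re_echo_loss: bool) -> bool:
        """Account for one data segment.

        Returns True while (retransmitted bytes) <= (Re-Echo-Loss bytes).
        """
        end = seq + length
        if not self.started:
            self.started = True
            self.next_seq = seq
        # Any part of the segment below next_seq has been sent before.
        overlap = max(0, min(end, self.next_seq) - seq)
        self.retransmitted += overlap
        self.next_seq = max(self.next_seq, end)
        if re_echo_loss:
            self.re_echo_loss += length
        return self.retransmitted <= self.re_echo_loss
```
]]></artwork></figure>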
       <t>To audit ConEx Signals against actual ECN markings or losses, the
       auditor could work as follows: monitor flows or aggregates of flows,
       only holding state on a flow if it first sends a ConEx-Marked packet
       (Credit or either Re-Echo marking). Count the number of bytes marked
       with Credit or Re-Echo-ECN. Separately count the number of bytes marked
       with ECN. Use Credits to ensure that {#ECN} &lt;= {#Re-Echo-ECN} +
       {#Credit}, even though the Re-Echo-ECN markings are delayed by at least
       one RTT.</t>
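       <t>The byte counting just described might look like the following
       minimal sketch. The names ConEx and EcnAuditor, the flow identifier and
       the packet model are hypothetical. In particular, the sketch simply
       passes packets that arrive before a flow's first ConEx marking, which
       is an assumption and not something this document specifies.</t>
       <figure><artwork><![CDATA[
```python
from enum import Enum


class ConEx(Enum):
    NOT_MARKED = 0
    CREDIT = 1
    RE_ECHO_ECN = 2
    RE_ECHO_LOSS = 3


class EcnAuditor:
    """Hypothetical auditor checking {#ECN} <= {#Re-Echo-ECN} + {#Credit}."""

    def __init__(self):
        self.flows = {}  # flow_id -> [ecn_bytes, conex_bytes]

    def observe(self, flow_id, size, conex, ecn_marked):
        """Count one packet; return True while the flow is in balance."""
        state = self.flows.get(flow_id)
        if state is None:
            if conex is ConEx.NOT_MARKED:
                return True  # assumption: no state until first marking
            state = self.flows[flow_id] = [0, 0]
        if ecn_marked:
            state[0] += size
        if conex in (ConEx.CREDIT, ConEx.RE_ECHO_ECN):
            state[1] += size
        return state[0] <= state[1]
```
]]></artwork></figure>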
        <section anchor="abstrmech_Credit_Simple_Audit" title="Using Credit to Simplify Audit">
           <t>At the audit function, there will be an inherent delay of at least one round trip between a congestion signal and the subsequent ConEx signal it triggers, as the signal makes the two passes of the feedback loop in 
<xref target="abstrmech_Fig_ConEx_Placement"></xref>. However, the audit function cannot be expected to wait for a round trip to check that one signal balances the other, because it is hard for a network device to know the RTT of each transport. </t>
           <t>Instead, it considerably simplifies the audit function if the source transport is made responsible for removing the round trip delay in ConEx signals. The transport SHOULD signal sufficient credit in advance to cover any reasonably expected congestion during its feedback delay. Then, the audit function does not need to make allowance for round trip delays that it cannot quantify. This design choice correctly makes the transport responsible both for minimizing feedback delay and for the risk that packets in flight will cause congestion to others before the source can react.</t>
           <t>For example, imagine the audit function keeps a running account of the balance between actual congestion signals (loss or ECN), which it counts as negative, and ConEx signals, which it counts as positive. Because the transport has been made responsible for round trip delays, it will be expected to have pre-loaded the audit function with some credit at the start. Therefore, if ever the balance does go negative, the audit function can immediately start punishing a flow, without any grace period.</t>
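           <t>A small worked example may make the effect of the round trip delay concrete. The helper lowest_balance, the RTT value and the byte counts below are all hypothetical. The sketch replays two ECN marks and their Re-Echo signals one RTT later, once without credit and once with enough credit to cover the marks in flight.</t>
           <figure><artwork><![CDATA[
```python
def lowest_balance(events, initial_credit=0):
    """Replay (time, signed_bytes) events in time order.

    Congestion signals are negative and ConEx signals positive.
    Returns the lowest running balance the auditor would see.
    """
    balance = lowest = initial_credit
    for _, delta in sorted(events):
        balance += delta
        lowest = min(lowest, balance)
    return lowest


RTT = 0.1  # seconds; hypothetical value
# Two 1500-byte ECN marks, each re-echoed one RTT later.
marks = [(0.00, -1500), (0.02, -1500)]
re_echoes = [(t + RTT, -delta) for t, delta in marks]
events = marks + re_echoes

print(lowest_balance(events))                       # -3000
print(lowest_balance(events, initial_credit=3000))  # 0
```
]]></artwork></figure>
           <t>Without credit, the balance dips negative during the first round trip even though the transport is behaving honestly. With 3000 bytes of credit sent in advance, it never goes negative, so the audit function has no need to know the RTT.</t>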
           <t>The one-way nature of packet forwarding probably makes per-flow state unavoidable for the audit function. This was a necessary sacrifice to avoid per-flow state elsewhere in the wider ConEx architecture. Nonetheless, care was taken to ensure that packets could bring soft-state to the audit function, so that it would continue to work if a flow shifted to a different audit device, perhaps after a reroute or an audit device failure. Therefore, although the audit function is likely to need flow state memory, at least it complies with the 'fate-sharing' design principle of the Internet <xref target="IntDesPrinciples"></xref>, and at least per-flow audit is only required at the outer edges of the internetwork, where it is less of a scalability concern.</t>
           <t>Note also that ConEx does not intend to embed rules in the network on how individual flows <spanx style="emph">behave</spanx>. The audit function only does per-flow processing to check the integrity of ConEx <spanx style="emph">information</spanx>.</t>
        </section>
        <section anchor="abstrmech_Audit_Behave_Constraints" title="Behaviour Constraints for the Audit Function">
           <t>There is no intention to standardise how to design or implement the audit function. However, it is necessary to lay down the following normative constraints on audit behaviour so that transport designers will know what to design against and implementers of audit devices will know what pitfalls to avoid:
        <list style="hanging">
             <t hangText="Minimal False Hits:"> Audit SHOULD introduce minimal false hits for
             honest flows;</t>
             <t hangText="Minimal False Misses:"> Audit SHOULD quickly detect and sanction dishonest
             flows, preferably at the first dishonest packet;</t>
             <t hangText="Transport Oblivious:"> Audit MUST NOT be designed around one particular rate
             response, such as any particular TCP congestion control algorithm
           or one particular resource sharing regime such as
             TCP-friendliness <xref target="RFC3448"></xref>. An important goal is to give ingress
             networks the freedom to unilaterally allow different rate responses to congestion and different 
        resource sharing regimes <xref target="Evol_cc" />, without having to coordinate with downstream 
        networks;</t>
             <t hangText="Sufficient Sanction:"> Audit MUST introduce sufficient sanction (e.g. loss in goodput)
        so that sources cannot understate congestion and play off losses at the audit function against 
        higher allowed throughput at a congestion policer <xref target="Salvatori05" />;</t>
             <t hangText="Manage Memory Exhaustion:"> Audit SHOULD be able to counter state exhaustion
             attacks. For instance, if the audit function uses flow-state, it should not be possible for
             sources to exhaust its memory capacity by gratuitously sending numerous packets, each
             with a different flow ID;</t>
             <t hangText="Identifier Accountability:"> Audit MUST NOT be vulnerable to 'identity
             whitewashing', where a transport can label a flow with a new ID more cheaply than
             paying the cost of continuing to use its current ID <xref target="CheapPseud" />.</t>
           </list>
           </t>
        </section>
     </section>
     <!-- ________________________________________________________________ -->
     <section anchor="abstrmech_Policy_Devices" title="Policy Devices">
       <t>Policy devices are characterised by a need to be configured with a
       policy related to the users or neighboring networks being served. In
       contrast, the auditing devices referred to in the previous section
       primarily enforce compliance with the ConEx protocol and do not need
       to be configured with any client-specific policy. </t>
       <!-- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  -->
    <section anchor="abstrmech_Other_Policy" title="Policy Monitoring Devices">
         <t>Policy devices can typically be decomposed into two functions: i) monitoring the ConEx signal to 
         compare it with a policy, then ii) acting in some way on the result. Various actions might be invoked 
         against 'out of contract' traffic, such as policing (see next section), re-routing, or downgrading the 
         class of service.</t>
         <t> Alternatively, a policy device might not act directly on the traffic, but instead report
         to management systems that are designed to control congestion indirectly.  For instance, 
         the reports might trigger capacity upgrades or penalty clauses in contracts, lead to charges
         being levied between networks based on congestion, or merely cause warnings to be sent to
         clients who are causing excessive congestion.</t>
         <t>Nonetheless, whatever action is invoked, the policy monitoring function will always be a 
         necessary part of any policy device.</t>
</section>
<section anchor="abstrmech_Policers" title="Congestion Policers">
         <t>A congestion policer can be implemented in a very
         similar way to a bit-rate policer, but its effect can be focused solely
         on traffic causing congestion downstream, which ConEx signals make visible. Without ConEx 
         signals, the only way to mitigate congestion is to blindly limit traffic bit-rate, on the assumption that 
         high bit-rate is more likely to cause congestion.</t>
         <t>A congestion policer monitors all ConEx traffic entering a network, or some
         identifiable subset. Using ConEx signals, it measures the amount of
         congestion that this traffic is contributing to somewhere downstream. If this exceeds a
         policy-configured 'congestion-bit-rate', the congestion policer will
         limit all the monitored ConEx traffic.</t>
<!-- Should we give this example here, or rely on the definition of congestion-bit-rate in conex-concepts-uses? -->
<!--          <t>Downstream congestion-bit-rate is the bit-rate of only those packets that are ConEx marked. For instance an allowed congestion-bit-rate of 100kb/s would allow traffic to flow at 10Mb/s into 1% congestion or 100Mb/s into 0.1% congestion.</t> -->
         <t>A congestion policer can be
         implemented by a simple token bucket. But unlike a bit-rate policer,
         it removes a token only when it forwards a packet that is ConEx-Marked, effectively treating 
         Not-ConEx-Marked packets as invisible. Consequently, because tokens give the right to send 
         congested bits, the fill-rate of the token bucket will represent the allowed congestion-bit-rate, which 
         should be sufficient traffic management without having to additionally constrain the straight bit-rate. 
         See <xref target="CongPol"></xref> for details.</t>
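         <t>The token bucket described above can be sketched as follows. This is not the design in 
         <xref target="CongPol"></xref>; the class name, the byte-based units and the choice to 
         refuse every monitored packet while the bucket is empty are assumptions made purely 
         for illustration.</t>
         <figure><artwork><![CDATA[
```python
class CongestionPolicer:
    """Hypothetical token-bucket congestion policer (units: bytes)."""

    def __init__(self, fill_rate, depth):
        self.fill_rate = fill_rate  # allowed congestion rate, bytes/s
        self.depth = depth          # maximum tokens the bucket can hold
        self.tokens = depth
        self.last = 0.0

    def forward(self, now, size, conex_marked):
        """Return True to forward the packet, False to limit it."""
        elapsed = now - self.last
        self.tokens = min(self.depth,
                          self.tokens + elapsed * self.fill_rate)
        self.last = now
        if self.tokens <= 0:
            return False  # allowance used up: limit all monitored traffic
        if conex_marked:
            self.tokens -= size  # only ConEx-Marked packets cost tokens
        return True
```
]]></artwork></figure>
         <t>Because Not-ConEx-Marked packets never remove tokens, a source sending into an uncongested 
         path is not constrained by this sketch whatever its bit-rate.</t>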
       </section>
     </section>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_IANA" title="IANA Considerations">
     <t>This memo includes no request to IANA.</t>
     <t>Note to RFC Editor: this section may be removed on publication as an
     RFC.</t>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_Sec_Consider" title="Security Considerations">
     <t>Significant parts of this whole document are about auditability
     of ConEx Signals, in particular <xref
     target="abstrmech_Audit"></xref>.</t>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_Conclusions" title="Conclusions">
     <t>{ToDo:}</t>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_Acknowledgements" title="Acknowledgements">
     <t>This document was improved by review comments from Toby
     Moncaster, Nandita Dukkipati, Mirja Kuehlewind and Caitlin Bestler.</t>
   </section>
   <!-- ================================================================ -->
   <section anchor="abstrmech_Comments_Solicited" title="Comments Solicited">
     <t>Comments and questions are encouraged and very welcome. They can be
     addressed to the IETF Congestion Exposure (ConEx) working group mailing
     list &lt;conex@ietf.org&gt;, and/or to the authors.</t>
   </section>
 </middle>
 <back>
   <!-- ================================================================ -->
   <references title="Normative References">
     <?rfc include='reference.RFC.2119'?>
   </references>
   <references title="Informative References">
     <?rfc include='reference.RFC.0791'?>
     <?rfc include='reference.RFC.2309'?>
     <?rfc include='reference.RFC.3168'?>
     <?rfc include='reference.RFC.3448'?>
     <?rfc include='reference.RFC.3514'?>
     <?rfc include='reference.RFC.3540'?>
     <?rfc include='reference.RFC.3550'?>
     <?rfc include='reference.RFC.5670'?>
     <?rfc include='reference.RFC.5681'?>
     <?rfc include='reference.I-D.ietf-ledbat-congestion'?>
     <?rfc include='reference.I-D.briscoe-tsvwg-re-ecn-tcp'?>
     <?rfc include='reference.I-D.sridharan-tcpm-ctcp'?>
     <reference anchor="I-D.briscoe-tsvwg-re-ecn-motiv">
       <front>
         <title>Re-ECN: A Framework for adding Congestion Accountability to
         TCP/IP</title>
         <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
           <organization></organization>
         </author>
         <author fullname="Arnaud Jacquet" initials="A" surname="Jacquet">
           <organization></organization>
         </author>
         <author fullname="T Moncaster" initials="T" surname="Moncaster">
           <organization></organization>
         </author>
         <author fullname="Alan Smith" initials="A" surname="Smith">
           <organization></organization>
         </author>
         <date month="October" day="25" year="2010"/>
         <abstract>
           <t>This document describes the framework to support a new protocol
           for explicit congestion notification (ECN), termed re-ECN, which
           can be deployed incrementally around unmodified routers. Re-ECN
           allows accurate congestion monitoring throughout the network thus
           enabling the upstream party at any trust boundary in the
           internetwork to be held responsible for the congestion they cause,
           or allow to be caused. So, networks can introduce straightforward
           accountability for congestion and policing mechanisms for incoming
           traffic from end-customers or from neighbouring network domains.
           As well as giving the motivation for re-ECN this document also
           gives examples of mechanisms that can use the protocol to ensure
           data sources respond correctly to congestion. And it describes
           example mechanisms that ensure the dominant selfish strategy of
           both network domains and end-points will be to use the protocol
           honestly. Authors' Statement: Status (to be removed by the RFC
           Editor) Although the re-ECN protocol is intended to make a simple
           but far-reaching change to the Internet architecture, the most
           immediate priority for the authors is to delay any move of the ECN
           nonce to Proposed Standard status. The argument for this position
           is developed in Appendix E.</t>
         </abstract>
       </front>
       <seriesInfo name="Internet-Draft"
                   value="draft-briscoe-tsvwg-re-ecn-tcp-motivation-02" />
       <format target="http://www.ietf.org/internet-drafts/draft-briscoe-tsvwg-re-ecn-tcp-motivation-02.txt"
               type="TXT" />
     </reference>
<reference anchor='I-D.conex-concepts-uses'>
<front>
<title>ConEx Concepts and Use Cases</title>
<author initials='B' surname='Briscoe' fullname='Bob Briscoe'>
        <organization />
</author>
<author initials='R' surname='Woundy' fullname='Richard Woundy'>
        <organization />
</author>
<author initials='T' surname='Moncaster' fullname='T Moncaster'>
        <organization />
</author>
<author initials='J' surname='Leslie' fullname='John Leslie'>
        <organization />
</author>
<date month='March' day='14' year='2011' />
<abstract><t>Internet Service Providers (operators) are facing problems where localized congestion prevents full utilization of the path between sender and receiver at today's "broadband" speeds.  Operators desire to control this congestion, which often appears to be caused by a small number of users consuming a large amount of bandwidth. Building out more capacity along all of the path to handle this congestion can be expensive and may not result in improvements for all users so network operators have sought other ways to manage congestion.  The current mechanisms all suffer from difficulty measuring the congestion (as distinguished from the total traffic).  The ConEx Working Group is designing a mechanism to make congestion along any path visible at the Internet Layer.  This document describes example cases where this mechanism would be useful.</t></abstract>
</front>
<seriesInfo name='Internet-Draft' value='draft-ietf-conex-concepts-uses-01' />
<format type='TXT'
           target='http://www.ietf.org/internet-drafts/draft-ietf-conex-concepts-uses-01.txt' />
</reference>
        <reference anchor="DCTCP" target="http://portal.acm.org/citation.cfm?id=1851192">
           <front>
               <title>
                   Data Center TCP (DCTCP)
               </title>
               <author initials="M" surname="Alizadeh" fullname="Mohammad Alizadeh">
                   <organization></organization>
               </author>
               <author initials="A" surname="Greenberg" fullname="Albert Greenberg">
                   <organization></organization>
               </author>
               <author initials="D.A." surname="Maltz" fullname="David A. Maltz">
                   <organization></organization>
               </author>
               <author initials="J" surname="Padhye" fullname="Jitendra Padhye">
                   <organization></organization>
               </author>
               <author initials="P" surname="Patel" fullname="Parveen Patel">
                   <organization></organization>
               </author>
               <author initials="B" surname="Prabhakar" fullname="Balaji Prabhakar">
                   <organization></organization>
               </author>
               <author initials="S" surname="Sengupta" fullname="Sudipta Sengupta">
                   <organization></organization>
               </author>
               <author initials="M" surname="Sridharan" fullname="Murari Sridharan">
                   <organization></organization>
               </author>
               <date month="October" year="2010" />
           </front>
           <seriesInfo name="ACM SIGCOMM CCR" value="40(4)63--74" />
           <format type='PDF'
                       target='http://ccr.sigcomm.org/drupal/files/p63_0.pdf' />
        </reference>
     <reference anchor="Refb-dis"
                target="http://bobbriscoe.net/projects/refb/#refb-dis">
       <front>
         <title>Re-feedback: Freedom with Accountability for Causing
         Congestion in a Connectionless Internetwork</title>
         <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
           <organization>BT &amp; UCL</organization>
         </author>
         <date month="" year="2009" />
       </front>
       <seriesInfo name="UCL PhD Dissertation" value="" />
       <format target="http://www.bobbriscoe.net/pubs.html#refb-dis"
               type="PDF" />
     </reference>
     <reference anchor="Re-fb"
                target="http://www.acm.org/sigs/sigcomm/sigcomm2005/techprog.html#session8">
       <front>
         <title>Policing Congestion Response in an Internetwork Using
         Re-Feedback</title>
         <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
           <organization>BT &amp; UCL</organization>
         </author>
         <author fullname="Arnaud Jacquet" initials="A" surname="Jacquet">
           <organization>BT</organization>
         </author>
         <author fullname="Carla Di Cairano-Gilfedder" initials="C"
                 surname="Di Cairano-Gilfedder">
           <organization>BT</organization>
         </author>
         <author fullname="Alessandro Salvatori" initials="A"
                 surname="Salvatori">
           <organization>Eur&eacute;com &amp; BT</organization>
         </author>
         <author fullname="Andrea Soppera" initials="A" surname="Soppera">
           <organization>BT</organization>
         </author>
         <author fullname="Martin Koyabe" initials="M" surname="Koyabe">
           <organization>BT</organization>
         </author>
         <date month="August" year="2005" />
       </front>
       <seriesInfo name="ACM SIGCOMM CCR" value="35(4)277--288" />
       <format target="http://www.cs.ucl.ac.uk/staff/B.Briscoe/projects/2020comms/refb/refb_sigcomm05.pdf"
               type="PDF" />
     </reference>
     <reference anchor="FairerFaster"
                target="http://bobbriscoe.net/projects/refb/#fairfastip">
       <front>
         <title>A Fairer, Faster Internet Protocol</title>
         <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
           <organization>BT &amp; UCL</organization>
         </author>
         <date month="December" year="2008" />
       </front>
       <seriesInfo name="IEEE Spectrum" value="Dec 2008:38--43" />
       <format target="http://www.spectrum.ieee.org/print/7027" type="HTML" />
     </reference>
        <reference anchor="IntDesPrinciples" target="http://www.acm.org/sigcomm/ccr/archive/1995/jan95/ccr-9501-clark.pdf">
           <front>
               <title>
                   The Design Philosophy of the DARPA Internet Protocols
               </title>
               <author initials="D" surname="Clark" fullname="David Clark">
                   <organization></organization>
               </author>
               <date month="August" year="1988" />
           </front>
           <seriesInfo name="ACM SIGCOMM CCR" value="18(4)106--114" />
           <format type='PDF'
                       target='http://www.acm.org/sigcomm/ccr/archive/1995/jan95/ccr-9501-clark.pdf' />
        </reference>
        <reference anchor="CheapPseud">
           <front>
               <title>
                   The Social Cost of Cheap Pseudonyms
               </title>
               <author initials="E" surname="Friedman" fullname="E. Friedman">
                   <organization></organization>
               </author>
               <author initials="P" surname="Resnick" fullname="P. Resnick">
                   <organization></organization>
               </author>
               <date month="" year="1998" />
           </front>
           <seriesInfo name="Journal of Economics and Management Strategy" value="10(2)173--199" />
        </reference>
        <reference anchor="Evol_cc" target="http://www.statslab.cam.ac.uk/~frank/evol.html">
           <front>
               <title>
                   Resource pricing and the evolution of congestion control
               </title>
               <author initials="R" surname="Gibbens" fullname="Richard J. Gibbens">
                   <organization>Cam Uni</organization>
               </author>
               <author initials="F" surname="Kelly" fullname="Frank P. Kelly">
                   <organization>Cam Uni</organization>
               </author>
               <date month="December" year="1999" />
           </front>
           <seriesInfo name="Automatica" value="35(12)1969--1985" />
           <format type='PDF'
                       target='http://www.statslab.cam.ac.uk/~frank/evol.html' />
        </reference>
     <reference anchor="Vegas"
                target="http://ieeexplore.ieee.org/iel1/49/9740/00464716.pdf?arnumber=464716">
       <front>
         <title>TCP Vegas: End-to-End Congestion Avoidance on a Global
         Internet</title>
         <author fullname="Lawrence S. Brakmo" initials="L." surname="Brakmo">
           <organization></organization>
         </author>
         <author fullname="Larry L. Peterson" initials="L."
                 surname="Peterson">
           <organization></organization>
         </author>
         <date month="October" year="1995" />
       </front>
       <seriesInfo name="IEEE Journal on Selected Areas in Communications"
                   value="13(8)1465--80" />
       <format target="http://ieeexplore.ieee.org/iel1/49/9740/00464716.pdf?arnumber=464716"
               type="PDF" />
     </reference>
     <reference anchor="CongPol"
                target="http://bobbriscoe.net/projects/refb/#polfree">
       <front>
         <title>Policing Freedom to Use the Internet Resource Pool</title>
         <author fullname="Arnaud Jacquet" initials="A" surname="Jacquet">
           <organization>BT</organization>
         </author>
         <author fullname="Bob Briscoe" initials="B" surname="Briscoe">
           <organization>BT &amp; UCL</organization>
         </author>
         <author fullname="Toby Moncaster" initials="T" surname="Moncaster">
           <organization>BT</organization>
         </author>
         <date month="December" year="2008" />
       </front>
       <seriesInfo name="Proc ACM Workshop on Re-Architecting the Internet (ReArch'08)"
                   value="" />
       <format target="http://www.bobbriscoe.net/projects/2020comms/refb/policer_rearch08.pdf"
               type="PDF" />
     </reference>
        <reference anchor="Salvatori05">
           <front>
               <title>
                   Closed Loop Traffic Policing
               </title>
               <author initials="A" surname="Salvatori" fullname="Alessandro Salvatori">
                   <organization>Eur&eacute;com &amp; BT</organization>
               </author>
               <date month="September" year="2005" />
           </front>
           <seriesInfo name="Politecnico Torino and Institut Eurecom Masters Thesis" value="" />
        </reference>
   </references>
 </back>
</rfc>
