Transport Area Working Group                                  B. Briscoe
Internet-Draft                                                  BT & UCL
Expires: December 28, 2006                                 June 26, 2006

        Emulating Border Flow Policing using Re-ECN on Bulk Data
               draft-briscoe-tsvwg-re-ecn-border-cheat-01

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on December 28, 2006.

Copyright Notice

   Copyright (C) The Internet Society (2006).

Abstract

   Scaling per flow admission control to the Internet is a hard problem.
   A recently proposed approach combines Diffserv and pre-congestion
   notification (PCN) to provide a service slightly better than Intserv
   controlled load.  It scales to networks of any size, but only if
   domains trust each other to comply with admission control and rate
   policing.  This memo claims to solve this trust problem without
   losing scalability.  It describes bulk border policing that provides
   a sufficient emulation of per-flow policing with the help of another
   recently proposed extension to ECN, involving re-echoing ECN feedback
   (re-ECN).  With only passive bulk measurements at borders, sanctions
   can be applied against cheating networks.

Status (to be removed by the RFC Editor)

   This memo is posted as an Internet-Draft with the intent to
   eventually progress to informational status.  It is envisaged that
   the necessary standards actions to realise the system described would
   sit in three other documents currently being discussed (but not on
   the standards track) in the IETF Transport Area [Re-TCP], [RSVP-ECN]
   & [PCN].  The authors seek comments from the Internet community on
   whether combining PCN and re-ECN is a sufficient solution to the
   admission control problem.

Changes from previous drafts (to be removed by the RFC Editor)

   From -00 to -01:

      Added subsection on Border Accounting Mechanisms (Section 5.6.1)

      Section 4.2 on the re-ECN wire protocol clarified and re-organised
      to separately discuss re-ECN for default ECN marking and for pre-
      congestion marking (PCN).

      Router Forwarding Behaviour subsection added to re-organised
      section on Protocol Operation (Section 4.3).  Extensions section
      moved within Protocol Operations.

      Emulating Border Policing (Section 5) reorganised, starting with a
      new Terminology subsection heading, and a simplified overview
      section.  Added a large new subsection on Border Accounting
      Mechanisms within a new section bringing together other
      subsections on Border Mechanisms generally (Section 5.6).  Some
      text moved from old subsections into these new ones.

      Added section on Incremental Deployment (Section 7), drawing
      together relevant points about deployment made throughout.

      Sections on Design Rationale (Section 8) and Security
      Considerations (Section 9) expanded with some new material,
      including new attacks and their defences.

      Suggested Border Metering Algorithms improved (Appendix A.2) for
      resilience to newly identified attacks.

Table of Contents

   1.  Introduction
   2.  Requirements Notation
   3.  The Problem
     3.1.  The Traditional Per-flow Policing Problem
     3.2.  Generic Scenario
   4.  Re-ECN Protocol for an RSVP (or similar) Transport
     4.1.  Protocol Overview
     4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or
           v6)
       4.2.1.  Re-ECN Recap
       4.2.2.  Re-ECN Combined with Pre-Congestion Notification
               (re-PCN)
     4.3.  Protocol Operation
       4.3.1.  Protocol Operation for an Established Flow
       4.3.2.  Aggregate Bootstrap
       4.3.3.  Flow Bootstrap
       4.3.4.  Router Forwarding Behaviour
       4.3.5.  Extensions
   5.  Emulating Border Policing with Re-ECN
     5.1.  Informal Terminology
     5.2.  Policing Overview
     5.3.  Pre-requisite Contractual Arrangements
     5.4.  Emulation of Per-Flow Rate Policing: Rationale and
           Limits
     5.5.  Sanctioning Dishonest Marking
     5.6.  Border Mechanisms
       5.6.1.  Border Accounting Mechanisms
       5.6.2.  Competitive Routing
       5.6.3.  Fail-safes
   6.  Analysis
   7.  Incremental Deployment
   8.  Design Choices and Rationale
   9.  Security Considerations
   10. IANA Considerations
   11. Conclusions
   12. Acknowledgements
   13. Comments Solicited
   14. References
     14.1. Normative References
     14.2. Informative References
   Appendix A.  Implementation
     A.1.  Ingress Gateway Algorithm for Blanking the RE flag
     A.2.  Downstream Congestion Metering Algorithms
       A.2.1.  Bulk Downstream Congestion Metering Algorithm
       A.2.2.  Inflation Factor for Persistently Negative Flows
     A.3.  Algorithm for Sanctioning Negative Traffic
   Author's Address
   Intellectual Property and Copyright Statements

1.  Introduction

   The Internet community largely lost interest in the Intserv
   architecture after it was clarified that it would be unlikely to
   scale to the whole Internet [RFC2208].  Although Intserv mechanisms
   proved impractical, the bandwidth reservation service it aimed to
   offer is still very much required.

   A recently proposed approach [CL-deploy] combines Diffserv and pre-
   congestion notification (PCN) to provide a service slightly better
   than Intserv controlled load [RFC2211].  It scales to any size
   network, but only if domains trust their neighbours to have checked
   that upstream customers aren't taking more bandwidth than they
   reserved, either accidentally or deliberately.  This memo describes
   border policing measures so that one network can protect its
   interests, even if networks around it are deliberately trying to
   cheat.  The approach provides a sufficient emulation of flow rate
   policing at trust boundaries but without per-flow processing.  The
   emulation is not perfect, but it is sufficient to ensure that the
   punishment is at least proportionate to the severity of the cheat.

   The aim is to be able to scale controlled load service to any number
   of endpoints, even though such scaling must take account of the
   increasing numbers of networks and users who may all have conflicting
   interests.  To achieve such scaling, this memo combines two recent
   proposals, both of which it briefly recaps:

   o  A deployment model for admission control over Diffserv using pre-
      congestion notification [CL-deploy] describes how bulk pre-
      congestion notification on routers within an edge-to-edge Diffserv
      region can emulate the precision of per-flow admission control to
      provide controlled load service without unscalable per-flow
      processing;

   o  Re-ECN: Adding Accountability to TCP/IP [Re-TCP].  The trick that
      addresses cheating at borders is to recognise that border policing
      is mainly necessary because cheating upstream networks will admit
      traffic when they shouldn't only as long as they don't directly
      experience the downstream congestion their misbehaviour can cause.
      The re-ECN protocol requires upstream nodes to declare expected
      downstream congestion in all forwarded packets and it makes it in
      their interests to declare it honestly.  Operators can then
      monitor downstream congestion in bulk at borders to emulate border
      policing.

   Rather than the end-to-end arrangement used when re-ECN was specified
   for the TCP transport [Re-TCP], this memo specifies re-ECN in an
   edge-to-edge arrangement, making it applicable to the above
   deployment model for admission control over Diffserv.  Also, rather
   than using a TCP transport for regular congestion feedback, this memo
   specifies re-ECN using RSVP as the transport for feedback [RSVP-ECN].
   A similar deployment model, but with a different transport for
   signalling congestion feedback, could be used (e.g. RMD [NSIS-RMD]
   uses NSIS).

   This memo aims to do two things: i) define how to apply the re-ECN
   protocol to the admission control over Diffserv scenario; and ii)
   explain why re-ECN sufficiently emulates border policing in that
   scenario.  Most of the memo is taken up with the second aim:
   explaining why it works.  Applying re-ECN to the scenario actually
   involves quite a trivial modification to the ingress gateway.  Our
   immediate goal is to convince everyone to build that modification in
   to ingress gateways from the start, whether first deployments require
   policing or not.  Otherwise, when we want to add policing, we will
   have built ourselves a legacy problem.  In other words, we aim to
   convince people to "Build in security from the start."

   The body of this memo is structured as follows:

      Section 3 describes the border policing problem.  We recap the
      traditional, unscalable view of how to solve the problem, and we
      recap the admission control solution which has the scalability we
      do not want to lose when we add border policing;

      Section 4 specifies the re-ECN protocol solution in detail;

      Section 5 explains how to use the protocol to emulate border
      policing, and why it works;

      Section 6 analyses the security of the proposed solution;

      Section 8 explains the sometimes subtle rationale behind our
      design decisions;

      Section 9 comments on the overall robustness of the security
      assumptions and lists specific security issues.

   It must be emphasised that we are not evangelical about removing per-
   flow processing from borders.  Network operators may choose to do
   per-flow processing at their borders for their own reasons, such as
   to support business models that require per-flow accounting.  Our aim
   is to show that per-flow processing at borders is no longer
   /necessary/ in order to provide end-to-end QoS using flow admission
   control.  Indeed, we are absolutely opposed to standardisation of
   technology that embeds particular business models into the Internet.
   Our aim here is merely to provide a new useful metric (downstream
   congestion) at trust boundaries.  Given the well-known significance
   of congestion in economics, operators can then use this new metric in
   their interconnection contracts if they choose.  This will enable
   competitive evolution of new business models (for examples
   see [IXQoS]), alongside more traditional models that depend on more
   costly per-flow processing at borders.

2.  Requirements Notation

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

3.  The Problem

3.1.  The Traditional Per-flow Policing Problem

   If we claim to be able to emulate per-flow policing with bulk
   policing at trust boundaries, we need to know exactly what we are
   emulating.  So, even though we expect it to become a historic
   practice, we will start from the traditional scenario with per-flow
   policing at trust boundaries to explain why it has always been
   considered necessary.

   To be able to take advantage of a reservation-based service such as
   controlled load, a source must reserve resources using a signalling
   protocol such as RSVP [RFC2205].  An RSVP signalling request refers
   to a flow of packets by its flow ID tuple (filter spec [RFC2205]) (or
   its security parameter index (SPI) [RFC2207] if port numbers are
   hidden by IPsec encryption).  Other signalling protocols use similar
   flow identifiers.  But it is insufficient merely to authorise and
   admit a flow based on its identifiers, for instance merely opening a
   pin-hole for packets with identifiers that match an admitted flow ID.
   Once a flow is admitted, it cannot necessarily be trusted to send
   packets within the rate profile it requested.

   The packet rate must also be policed to keep the flow within the
   requested flow spec [RFC2205].  For instance, without data rate
   policing, a source could reserve resources for an 8kbps audio flow
   but transmit a 6Mbps video (theft of service).  More subtly, the
   sender could generate bursts that were outside the profile it had
   requested.
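
   As an illustration only, the kind of per-flow rate policing
   described above is commonly realised with a token bucket.  The
   sketch below is not taken from any specification; the class name,
   the bucket depth and the packet sizes are all assumptions chosen to
   mirror the 8kbps audio versus 6Mbps video example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Hypothetical per-flow policer: rate in bytes/s, depth in bytes."""
    rate: float
    depth: float
    tokens: float = 0.0
    last: float = 0.0

    def __post_init__(self):
        self.tokens = self.depth  # assumption: the bucket starts full

    def conforms(self, now: float, size: int) -> bool:
        """Return True if a packet of `size` bytes at time `now` is in profile."""
        elapsed = now - self.last
        self.last = now
        # Refill at the reserved rate, never beyond the bucket depth.
        self.tokens = min(self.depth, self.tokens + elapsed * self.rate)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


def police(bucket, packets):
    """Count (in-profile, out-of-profile) packets from (time, size) pairs."""
    passed = dropped = 0
    for now, size in packets:
        if bucket.conforms(now, size):
            passed += 1
        else:
            dropped += 1
    return passed, dropped


# Reservation of 8 kbps = 1000 bytes/s, with an assumed 2000 byte depth.
audio = [(i * 0.02, 20) for i in range(100)]     # about 8 kbps
video = [(i * 0.002, 1500) for i in range(100)]  # about 6 Mbps
audio_result = police(TokenBucket(rate=1000.0, depth=2000.0), audio)
video_result = police(TokenBucket(rate=1000.0, depth=2000.0), video)
```

   Under these assumed numbers the audio flow stays in profile, while
   almost all of the video packets are out of profile.  Keeping one
   such bucket per flow is the per-flow state that is costly at
   borders.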

   In traditional architectures, per-flow packet rate-policing is
   expensive and unscalable but, without it, a network is vulnerable to
   such theft of service (whether malicious or accidental).  Perhaps
   more importantly, if flows are allowed to send more data than they
   were permitted, the ability of admission control to give assurances
   to other flows will break.

   Just as sources need not be trusted to keep within their requested
   flow spec, whole networks might also try to cheat.  We will now set
   up a concrete scenario to illustrate such cheats.  Imagine
   reservations for unidirectional flows from senders, through at least
   two networks, an edge network and its downstream transit provider.
   Imagine the edge network charges its retail customers per reservation
   but also has to pay its transit provider a charge per reservation.
   Typically, both its selling and buying charges might depend on the
   duration and rate of each reservation.  The level of the actual
   selling and buying prices are irrelevant to our discussion (most
   likely the network will sell at a higher price than it buys, of
   course).

   A cheating ingress network could systematically reduce the size of
   its retail customers' reservation signalling requests before
   forwarding them to its transit provider (and systematically reinstate
   the responses on the way back).  It would then receive an honest
   income from its upstream retail customer but only pay for
   fraudulently smaller reservations downstream.  Equivalently, a
   cheating ingress network may feed the traffic from a number of flows
   into an aggregate reservation over the transit that is smaller than
   the total of all the flows.  Because of these fraud possibilities, in
   traditional QoS reservation architectures the downstream network
   polices at each border.  The policer checks that the actual sent data
   rate of each flow is within the signalled reservation.

   Reservation signalling could be authenticated end to end, but this
   wouldn't prevent the aggregation cheat just described.  For this
   reason, and to avoid the need for a global PKI, signalling integrity
   is typically only protected on a hop-by-hop basis [RFC2747].

   A variant of the above cheat is where a router in an honest
   downstream network denies admission to a new reservation, but a
   cheating upstream network still admits the flow.  For instance, the
   networks may be using Diffserv internally, but Intserv admission
   control at their borders [RFC2998].  The cheat would only work if
   they were using bulk Diffserv traffic policing at their borders,
   perhaps to avoid the cost/complexity of Intserv border policing.  As
   far as the cheating upstream network is concerned, it gets the
   revenue from the reservation, but it doesn't have to pay any
   downstream wholesale charges and the congestion is in someone else's
   network.  The cheating network may calculate that most of the flows
   affected by congestion in the downstream network aren't likely to be
   its own.  It may also calculate that the downstream router has been
   configured to deny admission to new flows in order to protect
   bandwidth assigned to other network services (e.g. enterprise VPNs).
   So the cheating network can steal capacity from the downstream
   operator's VPNs that are probably not actually congested.

   To summarise, in traditional reservation signalling architectures, if
   a network cannot trust a neighbouring upstream network to rate-police
   each reservation, it has to check for itself that the data rate fits
   within each of the reservations it has admitted.

3.2.  Generic Scenario

   We will now describe a generic internetworking scenario that we will
   use to describe and to test our bulk policing proposal.  It consists
   of a number of networks and endpoints that do not fully trust each
   other to behave.  In Section 6 we will tie down exactly what we mean
   by partial trust, and we will consider the various combinations where
   some networks do not trust each other and others are colluding
   together.

    _    ___      _____________________________________       ___    _
   | |  |   |   _|__    ______    ______    ______    _|__   |   |  | |
   | |  |   |  |    |  |      |  |      |  |      |  |    |  |   |  | |
   | |  |   |  |    |  |Inter-|  |Inter-|  |Inter-|  |    |  |   |  | |
   | |  |   |  |    |  | ior  |  | ior  |  | ior  |  |    |  |   |  | |
   | |  |   |  |    |  |Domain|  |Domain|  |Domain|  |    |  |   |  | |
   | |  |   |  |    |  |  A   |  |  B   |  |  C   |  |    |  |   |  | |
   | |  |   |  |    |  |      |  |      |  |      |  |    |  |   |  | |
   | |  |   |  +----+  +-+  +-+  +-+  +-+  +-+  +-+  +----+  |   |  | |
   | |  |   |  |    |  |B|  |B|  |B|  |B|  |B|  |B|  |    |  |   |\ | |
   | |==|   |==|Ingr|==|R|  |R|==|R|  |R|==|R|  |R|==|Egr |==|   |=>| |
   | |  |   |  |G/W |  | |  | |  | |  | |  | |  | |  |G/W |  |   |/ | |
   | |  |   |  +----+  +-+  +-+  +-+  +-+  +-+  +-+  +----+  |   |  | |
   | |  |   |  |    |  |      |  |      |  |      |  |    |  |   |  | |
   | |  |   |  |____|  |______|  |______|  |______|  |____|  |   |  | |
   |_|  |___|    |_____________________________________|     |___|  |_|

   Sx   Ingress               Diffserv region               Egress   Rx
   End  Access                                              Access  End
   Host Network                                            Network Host
                <-------- edge-to-edge signalling ------->
                          (for admission control)

   <-------------------end-to-end QoS signalling protocol------------->

   Figure 1: Generic Scenario (see text for explanation of terms)

   An ingress and egress gateway (Ingr G/W and Egr G/W in Figure 1)
   connect the interior Diffserv region to the edge access networks
   where routers (not shown) use per-flow reservation processing.
   Within the Diffserv region are three interior domains, A, B and C, as
   well as the inward facing interfaces of the ingress and egress
   gateways.  An ingress and egress border router (BR) is shown
   interconnecting each interior domain with the next.  There may be
   other interior routers (not shown) within each interior domain.

   In two paragraphs we now briefly recap how pre-congestion
   notification is intended to be used to control flow admission to a
   large Diffserv region.  The first paragraph describes data plane
   functions and the second describes signalling in the control plane.
   We omit many details from [CL-deploy] including behaviour during
   routing changes.  For brevity here we assume other flows are already
   in progress across a path through the Diffserv region before a new
   one arrives, but how bootstrap works is described in Section 4.3.2.

   Figure 1 shows a single simplex reserved flow from the sending (Sx)
   end host to the receiving (Rx) end host.  The ingress gateway polices
   incoming traffic within its admitted reservation and remarks it to
   turn on an ECN-capable codepoint [RFC3168] and the controlled load
   (CL) Diffserv codepoint.  Together, these codepoints define which
   traffic is entitled to the enhanced scheduling of the CL behaviour
   aggregate on routers within the Diffserv region.  The CL PHB of
   interior routers consists of a scheduling behaviour and a new ECN
   marking behaviour that we call `pre-congestion notification' [PCN].
   The CL PHB simply re-uses the definition of expedited forwarding
   (EF) [RFC3246] for its scheduling behaviour.  But it incorporates a
   new ECN marking behaviour, which sets the ECN field of an increasing
   number of CL packets to the admission marked (AM) codepoint as they
   approach a threshold rate that is lower than the line rate.  The use
   of virtual queues ensures real queues have hardly built up any
   congestion delay.  The level of marking detected at the egress of the
   Diffserv region is then used by the signalling system in order to
   determine admission control as follows.

   The end-to-end QoS signalling (e.g.  RSVP) for a new reservation
   takes one giant hop from ingress to egress gateway, because interior
   routers within the Diffserv region are configured to ignore RSVP.
   The egress gateway holds flow state because it takes part in the end-
   to-end reservation.  So it can classify all packets by flow and it
   can identify all flows that have the same previous RSVP hop (a CL-
   region-aggregate).  For each CL-region-aggregate of flows in
   progress, the egress gateway maintains a per-packet moving average of
   the fraction of pre-congestion-marked traffic.  Once an RSVP PATH
   message for a new reservation has hopped across the Diffserv region
   and reached the destination, an RSVP RESV message is returned.  As
   the RESV message passes, the egress gateway piggy-backs the relevant
   pre-congestion level onto it [RSVP-ECN].  Again, interior routers
   ignore the RSVP message, but the ingress gateway strips off the pre-
   congestion level.  If the pre-congestion level is above a threshold,
   the ingress gateway denies admission to the new reservation,
   otherwise it returns the original RESV signal back towards the data
   sender.
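
   The control-plane steps just recapped can be sketched as follows.
   This is a minimal illustration, not the algorithm of [CL-deploy] or
   [RSVP-ECN].  The exponentially weighted form of the moving average,
   the smoothing weight and the 5% threshold are assumptions made only
   for the example.

```python
class EgressMeter:
    """Per-aggregate moving average of the pre-congestion-marked fraction."""

    def __init__(self, weight=0.01):
        self.weight = weight  # assumed smoothing constant
        self.level = 0.0      # current pre-congestion level, 0.0 to 1.0

    def on_packet(self, marked: bool) -> None:
        """Update the average for one packet of the CL-region-aggregate."""
        sample = 1.0 if marked else 0.0
        self.level += self.weight * (sample - self.level)


def admit(pre_congestion_level: float, threshold: float) -> bool:
    """Ingress decision: deny admission if the level is above the threshold."""
    return pre_congestion_level <= threshold


# An idle aggregate versus one where every packet is marked.
idle, busy = EgressMeter(), EgressMeter()
for _ in range(1000):
    idle.on_packet(False)
    busy.on_packet(True)

idle_admitted = admit(idle.level, threshold=0.05)
busy_admitted = admit(busy.level, threshold=0.05)
```

   The egress gateway would copy the level of the relevant meter into
   the RESV message.  Only the ingress gateway applies the threshold.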

   Once a reservation is admitted, its traffic will always receive low
   delay service for the duration of the reservation.  This is because
   ingress gateways ensure that traffic not under a reservation cannot
   pass into the Diffserv region with the CL DSCP set.  So non-reserved
   traffic will always be treated with a lower priority PHB at each
   interior router.  And even if some disaster re-routes traffic after
   it has been admitted, if the traffic through any resource tips over a
   fail-safe threshold, pre-congestion notification will trigger flow-
   pre-emption to very quickly bring every router within the whole
   Diffserv region back below its operating point.

   The whole admission control system just described deliberately
   confines per-flow processing to the access edges of the network,
   where it will not limit the system's scalability.  But ideally we
   want to extend this approach to multiple networks, to take even more
   advantage of its scaling potential.  We would still need per-flow
   processing at the access edges of each network, but not at the high
   speed interfaces where they interconnect.  Even though such an
   admission control system would work technically, it would gain us no
   scaling advantage if each network also wanted to police the rate of
   each admitted flow for itself---border routers would still have to do
   complex packet operations per-flow anyway, given they don't trust
   upstream networks to do their policing for them.

   This memo describes how to emulate per-flow rate policing using bulk
   mechanisms at border routers, so the full scalability potential of
   pre-congestion notification is not limited by the need for per-flow
   policing mechanisms at borders, which would make borders the most
   cost-critical pinch-points.  Then we can achieve the long sought-for
   vision of secure Internet-wide bandwidth reservations without needing
   per-flow processing at all in core and border routers---where
   scalability is most critical.

4.  Re-ECN Protocol for an RSVP (or similar) Transport

4.1.  Protocol Overview

   First we need to recap the way routers accumulate congestion marking
   along a path.  Each ECN-capable router marks some packets with CE,
   the marking probability increasing with the length of the queue at
   its egress link.  The only difference with pre-congestion
   marking [PCN] is that marking is based on the length of a virtual
   queue, so that real queue occupancy can remain very low.  We will
   use the terms congestion and pre-congestion interchangeably in the
   following unless it is important to distinguish between them.

   With multiple ECN-capable routers on a path, the ECN field
   accumulates the fraction of CE marking that each router adds.  The
   combined effect of the packet marking of all the routers along the
   path signals congestion of the whole path to the receiver.  So, for
   example, if one router early in a path is marking 1% of packets and
   another later in a path is marking 2%, flows that pass through both
   routers will experience approximately 3% marking.
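
   The arithmetic behind "approximately 3%" can be checked with a short
   sketch.  It assumes each router marks packets independently of the
   others, which is an assumption of this example rather than a
   statement from [PCN].  The function name is illustrative.

```python
def path_marking(fractions):
    """Fraction of packets marked after all routers, assuming independence."""
    unmarked = 1.0
    for f in fractions:
        unmarked *= 1.0 - f  # a packet survives unmarked with probability 1 - f
    return 1.0 - unmarked


combined = path_marking([0.01, 0.02])  # 1 - 0.99 * 0.98 = 0.0298
simple_sum = 0.01 + 0.02               # 0.03, the approximation used in the text
```

   For small marking fractions the product term is negligible, which is
   why the per-router fractions can be treated as additive along a
   path.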

   The packets crossing an inter-domain trust boundary within the
   Diffserv region will all have come from different ingress gateways
   and will all be destined for different egress gateways.  We will show
   that the key to policing against theft of service is for a border
   router to be able to directly measure the congestion that is about to
   be caused by the traffic it forwards.  That is, it can measure
   locally the congestion on each of the downstream paths between itself
   and the egress gateways that its traffic is destined for.

   With the original ECN protocol, if CE markings crossing the border
   had been counted over a period, they would have represented the
   accumulated upstream congestion that had already been experienced by
   those packets.  The general idea of re-ECN is for the ingress gateway
   to continuously encode path congestion into the IP header where, in
   this case, `path' means from ingress to egress gateway.  Then at any
   point on that path (e.g. between domains A & B in Figure 2 below), IP
   headers can be monitored to subtract upstream congestion from
   expected path congestion in order to give the expected downstream
   congestion still to be experienced until the egress gateway.

   Importantly, it turns out that there is no need to monitor downstream
   congestion on a per-flow basis.  We will show that accounting for it
   in bulk across all flows will be sufficient.

                  _____________________________________
                _|__    ______    ______    ______    _|__
               |    |  |  A   |  |  B   |  |  C   |  |    |
               +----+  +-+  +-+  +-+  +-+  +-+  +-+  +----+
               |    |  |B|  |B|  |B|  |B|  |B|  |B|  |    |
               |Ingr|==|R|  |R|==|R|  |R|==|R|  |R|==|Egr |
               |G/W |  | |  | |: | |  | |  | |  | |  |G/W |
               +----+  +-+  +-+: +-+  +-+  +-+  +-+  +----+
               |    |  |      |: |      |  |      |  |    |
               |____|  |______|: |______|  |______|  |____|
                 |_____________:_______________________|
                               :
                 |             :                       |
                 |<-upstream-->:<-expected downstream->|
                 | congestion  :      congestion       |
                 |     u               v ~= p - u      |
                 |                                     |
                 |<--- expected path congestion, p --->|

   Figure 2: Re-ECN concept
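
   The relation v ~= p - u in Figure 2 can be illustrated with a bulk
   counter that keeps no per-flow state.  The sketch abstracts away the
   actual codepoints.  Each packet is represented by a hypothetical
   tuple saying whether it carries a declaration of path congestion and
   whether it was congestion marked upstream.  The tuple layout and the
   function name are assumptions for illustration only.

```python
def downstream_congestion(packets):
    """Bulk estimate of v ~= p - u as a fraction of forwarded bytes.

    `packets` yields (size_bytes, declares_path, marked_upstream) tuples,
    an abstract stand-in for what a border meter reads from IP headers.
    """
    total = path_bytes = upstream_bytes = 0
    for size, declares_path, marked_upstream in packets:
        total += size
        if declares_path:
            path_bytes += size      # contributes to p
        if marked_upstream:
            upstream_bytes += size  # contributes to u
    if total == 0:
        return 0.0
    return (path_bytes - upstream_bytes) / total


# 100 packets of 1000 bytes from any mix of flows:
# 3 declare path congestion and 1 was marked upstream.
sample = [(1000, i < 3, i == 50) for i in range(100)]
v = downstream_congestion(sample)  # (3000 - 1000) / 100000 = 0.02
```

   The meter never looks at flow identifiers.  Two running byte
   counters per border interface are enough, which is the point of
   accounting in bulk.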

4.2.  Re-ECN Abstracted Network Layer Wire Protocol (IPv4 or v6)

   In this section we define the names of the various codepoints of the
   re-ECN protocol when used with pre-congestion notification, deferring
   description of their semantics to the following sections.  But first
   we recap the re-ECN wire protocol proposed in [Re-TCP].

4.2.1.  Re-ECN Recap

   Re-ECN uses the two bit ECN field broadly as in RFC3168 [RFC3168].
   It also uses a new re-ECN extension (RE) flag.  The actual position
   of the RE flag is different between IPv4 & v6 headers so we will use
   an abstraction of the IPv4 and v6 wire protocols by just calling it
   the RE flag.  [Re-TCP] proposes using bit 48 (currently unused) in
   the IPv4 header for the RE flag, while for IPv6 it proposes an ECN
   extension header.
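
   For IPv4, the two fields could be read from a raw header as in the
   sketch below.  It assumes bits are numbered from 0 at the most
   significant bit of the first header octet, so that bit 48 is the
   most significant bit of the seventh octet.  The function name is
   illustrative and the sketch covers the IPv4 proposal only.

```python
def ecn_and_re(ipv4_header: bytes):
    """Return (ecn_field, re_flag) read from the start of an IPv4 header."""
    if len(ipv4_header) < 7:
        raise ValueError("need at least 7 octets of IPv4 header")
    # ECN field: two least-significant bits of the second octet [RFC3168].
    ecn = ipv4_header[1] & 0x03
    # RE flag: bit 48, assumed to be the top bit of the seventh octet.
    re_flag = (ipv4_header[6] >> 7) & 1
    return ecn, re_flag


# ECT(1) (binary 01) with the RE flag set.
with_re = ecn_and_re(bytes([0x45, 0x01, 0x00, 0x14, 0x00, 0x00, 0x80, 0x00]))
# Not-ECT with only the Don't Fragment bit set, so the RE flag reads 0.
without_re = ecn_and_re(bytes([0x45, 0x00, 0x00, 0x14, 0x00, 0x00, 0x40, 0x00]))
```

   Only border policing functions and the gateways would need such a
   reader.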

   Unlike the ECN field, the RE flag is intended to be set by the sender
   and remain unchanged along the path, although it can be read by
   network elements that understand the re-ECN protocol.  In the
   scenario used in this memo, the ingress gateway acts as a proxy for
   the sender, setting the RE flag as permitted in the specification of
   re-ECN.

   Note that general-purpose routers do not have to read the RE flag,
   only special policing elements at borders do.  And no general-purpose
   routers have to change the RE flag, although the ingress and egress
   gateways do because in the edge-to-edge deployment model we are
   using, they act as proxies for the endpoints.  Therefore the RE flag
   does not even have to be visible to interior routers.  So the RE flag
   has no implications on protocols like MPLS.  Congested label
   switching routers (LSRs) would have to be able to notify their
   congestion with an ECN/PCN codepoint in the MPLS shim [ECN-MPLS], but
   like any interior IP router, they can be oblivious to the RE flag,
   which need only be read by border policing functions.

   Although the RE flag is a separate, single bit field, it can be read
   as an extension to the two-bit ECN field; the three concatenated bits
   in what we will call the extended ECN field (EECN) make eight
   codepoints available.  When the RE flag setting is "don't care", we
   use the RFC3168 names of the ECN codepoints, but [Re-TCP] proposes
   the following six codepoint names for when there is a need to be more
   specific.

   +-------+------------+------+---------------+-----------------------+
   |  ECN  | RFC3168    |  RE  | Extended ECN  |     Re-ECN meaning    |
   | field | codepoint  | flag | codepoint     |                       |
   +-------+------------+------+---------------+-----------------------+
   |   00  | Not-ECT    |   0  | Not-RECT      |   Not re-ECN-capable  |
   |       |            |      |               |       transport       |
   |   00  | Not-ECT    |   1  | FNE           |      Feedback not     |
   |       |            |      |               |      established      |
   |   01  | ECT(1)     |   0  | Re-Echo       |  Re-echoed congestion |
   |       |            |      |               |        and RECT       |
   |   01  | ECT(1)     |   1  | RECT          |     Re-ECN capable    |
   |       |            |      |               |       transport       |
   |   10  | ECT(0)     |   0  | ---           |     Legacy ECN use    |
   |       |            |      |               |          only         |
   |   10  | ECT(0)     |   1  | --CU--        |    Currently unused   |
   |       |            |      |               |                       |
   |   11  | CE         |   0  | CE(0)         |       Congestion      |
   |       |            |      |               |    experienced with   |
   |       |            |      |               |        Re-Echo        |
   |   11  | CE         |   1  | CE(-1)        |       Congestion      |
   |       |            |      |               |      experienced      |
   +-------+------------+------+---------------+-----------------------+

   Table 1: Recap of Default Extended ECN Codepoints Proposed for Re-ECN

4.2.2.  Re-ECN Combined with Pre-Congestion Notification (re-PCN)

   As permitted by the ECN specification [RFC3168], a proposal is
   currently being advanced in the IETF to define different semantics
   for how routers might mark the ECN field of certain packets.  The
   idea is to be able to notify congestion when the router's load
   approaches a logical limit, rather than the physical limit of the
   line.  This new marking is called pre-congestion notification [PCN]
   and we will use the term PCN-enabled router for a router that can
   apply pre-congestion notification marking to the ECN fields of
   packets.

   [RFC3168] recommends that a packet's Diffserv codepoint should
   determine which type of ECN marking it receives.  A Diffserv per-hop
   behaviour (PHB) can specify that routers should apply pre-congestion
   notification marking to PCN-capable packets.  We will call this a
   PCN-enhanced PHB.  A PCN-capable packet must meet two conditions: it
   must carry a DSCP that maps to a PCN-enhanced PHB and it must carry
   an ECN field that turns on PCN marking.

   As an example, the controlled load (CL) PHB might specify expedited
   forwarding as its scheduling behaviour and PCN marking as its
   congestion marking behaviour.  Then we would say the CL PHB is a PCN-
   enhanced PHB, and that packets with a DSCP that maps to the CL PHB
   and with ECN turned on are PCN-capable packets.

   [PCN] actually proposes that two logical limits should be used for
   pre-congestion notification, with the higher limit as a back-stop for
   dealing with anomalous events.  It envisages PCN will be used for
   admission control of inelastic real-time traffic, so marking at the
   lower limit will trigger admission control, while at the higher limit
   it will trigger flow pre-emption.

   Because it needs two types of congestion marking, PCN seems to need
   five states: Not-ECT, ECT (ECN-capable transport), the ECN Nonce,
   Admission Marking (AM) and Flow Pre-emption Marking (PM).  [PCN]
   proposes various alternative encodings of the ECN field, attempting
   various compromises to fit these five states into the four available
   ECN codepoints.

   One of the five states to make room for is the ECN Nonce [RFC3540],
   but the capability we describe in this memo supersedes any need for
   the Nonce.  The ECN Nonce is an elegant scheme, but it only allows a
   sending node (or its proxy) to detect suppression of congestion
   marking in the feedback loop.  Thus the Nonce requires the sender or
   its proxy to be trusted to respond correctly to congestion.  But this
   is precisely the main cheat we want to protect against (as well as
   many others).

   One of the compromise protocol encodings that [PCN] explores
   ("Alternative 5") leaves out support for the ECN Nonce.  Therefore we
   use that one.  This encoding of PCN markings is shown on the left of
   Table 2.  Note that these codepoints only take on the semantics of
   pre-congestion notification if they are combined with a Diffserv
   codepoint that the operator has configured to cause PCN marking, by
   mapping it to a PCN-enhanced PHB.

   For the rest of this memo, we will not distinguish between Admission
   Marking and Pre-emption Marking unless we need to be specific.  We
   will call both "congestion marking".  With the above encoding,
   congestion marking can be read to mean any packet with the left-most
   bit of the ECN field set.

   The re-ECN protocol can be used to control misbehaving sources
   whether congestion is with respect to a logical threshold (PCN) or
   the physical line rate (ECN).  In either case the RE flag can be used
   to create an extended ECN field.  For PCN-capable packets, the 8
   possible encodings of this 3-bit extended ECN (EECN) field are
   defined on the right of Table 2 below.  The purposes of these
   different codepoints will be introduced in subsequent sections.

   +-------+-----------------+------+-------------+--------------------+
   |  ECN  | PCN codepoint   |  RE  | Extended    |   Re-ECN meaning   |
   | field | (Alternative 5) | flag | ECN         |                    |
   |       |                 |      | codepoint   |                    |
   +-------+-----------------+------+-------------+--------------------+
   |   00  | Not-ECT         |   0  | Not-RECT    | Not re-ECN-capable |
   |       |                 |      |             |      transport     |
   |   00  | Not-ECT         |   1  | FNE         |    Feedback not    |
   |       |                 |      |             |     established    |
   |   01  | ECT(1)          |   0  | Re-Echo     |      Re-echoed     |
   |       |                 |      |             |   congestion and   |
   |       |                 |      |             |        RECT        |
   |   01  | ECT(1)          |   1  | RECT        |   Re-ECN capable   |
   |       |                 |      |             |      transport     |
   |   10  | AM              |   0  | AM(0)       |  Admission Marking |
   |       |                 |      |             |    with Re-Echo    |
   |   10  | AM              |   1  | AM(-1)      |  Admission Marking |
   |       |                 |      |             |                    |
   |   11  | PM              |   0  | PM(0)       |     Pre-emption    |
   |       |                 |      |             |    Marking with    |
   |       |                 |      |             |       Re-Echo      |
   |   11  | PM              |   1  | PM(-1)      |     Pre-emption    |
   |       |                 |      |             |       Marking      |
   +-------+-----------------+------+-------------+--------------------+

   Table 2: Extended ECN Codepoints if the Diffserv codepoint uses Pre-
                       congestion Notification (PCN)
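
   As an illustration only, the mapping in Table 2 could be held as a
   simple lookup keyed on the ECN field and the RE flag.  The Python
   sketch below is not part of any specification; the names EECN_PCN,
   eecn_codepoint and is_congestion_marked are assumptions introduced
   here.  It also shows the rule that "congestion marking" means any
   packet with the left-most bit of the ECN field set.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# defined by the re-ECN or PCN specifications.

# (ECN field, RE flag) -> extended ECN codepoint name, following Table 2.
EECN_PCN = {
    (0b00, 0): "Not-RECT",
    (0b00, 1): "FNE",
    (0b01, 0): "Re-Echo",
    (0b01, 1): "RECT",
    (0b10, 0): "AM(0)",
    (0b10, 1): "AM(-1)",
    (0b11, 0): "PM(0)",
    (0b11, 1): "PM(-1)",
}


def eecn_codepoint(ecn_field: int, re_flag: int) -> str:
    """Return the Table 2 codepoint name for a PCN-capable packet."""
    return EECN_PCN[(ecn_field & 0b11, re_flag & 1)]


def is_congestion_marked(ecn_field: int) -> bool:
    """True for AM or PM, i.e. the left-most bit of the ECN field is set."""
    return bool(ecn_field & 0b10)
```

   For example, eecn_codepoint(0b00, 1) yields "FNE", and both AM
   (ECN field 10) and PM (ECN field 11) count as congestion marked.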

4.3.  Protocol Operation

   In this section we give an overview of the operation of the re-ECN
   protocol for an RSVP transport, deferring a detailed specification
   to the following sections.

4.3.1.  Protocol Operation for an Established Flow

   The re-ECN protocol involves a simple tweak to the action of the
   gateway at the ingress edge of the CL region.  In the deployment
   model just described [CL-deploy], for each active traffic aggregate
   across the CL region (CL-region-aggregate) the ingress gateway will
   hold a fairly recent Congestion-Level-Estimate that the egress
   gateway will have fed back to it, piggybacked on the signalling that
   sets up each flow.  For instance, one aggregate might have been
   experiencing 3% pre-congestion (that is, congestion marked octets
   whether Admission Marked or Pre-emption Marked).  In this case, the
   ingress gateway MUST clear the RE flag to "0" for the same percentage
   of octets of CL-packets (3%) and set it to "1" in the rest (97%).
   Appendix A.1 gives a simple pseudo-code algorithm that the ingress
   gateway may use to do this.
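
   To make the octet-based proportion concrete, the following Python
   sketch shows one possible way an ingress could choose the RE flag
   for each packet.  It is not the algorithm of Appendix A.1; it is a
   simple byte-credit scheme assumed here for illustration, and the
   class name ReBlanker and its members are hypothetical.  Packet sizes
   are assumed to be positive.

```python
# Illustrative sketch only, NOT the Appendix A.1 pseudo-code.
# ReBlanker and its attribute names are assumptions made for this example.


class ReBlanker:
    """Blank the RE flag on a target fraction of octets in an aggregate."""

    def __init__(self, congestion_level_estimate: float) -> None:
        self.fraction = congestion_level_estimate  # e.g. 0.03 for 3%
        self.debt_octets = 0.0  # octets still owed as blanked

    def update(self, congestion_level_estimate: float) -> None:
        """Apply fresh feedback from the egress gateway."""
        self.fraction = congestion_level_estimate

    def re_flag_for(self, packet_octets: int) -> int:
        """Return 0 (blanked) or 1 (set) for a packet of the given size."""
        self.debt_octets += self.fraction * packet_octets
        if self.debt_octets >= packet_octets:
            self.debt_octets -= packet_octets
            return 0  # blank the RE flag on this packet
        return 1  # leave the RE flag set
```

   Because the outstanding debt always stays below one packet size, the
   blanked fraction of octets converges on the Congestion-Level-Estimate
   as the aggregate carries more traffic.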

   The RE flag is set and cleared this way round for incremental
   deployment reasons (see [Re-TCP]).  To avoid confusion we will use
   the term `blanking' (rather than marking) when the RE flag is cleared
   to "0", so we will talk of the `RE blanking fraction' as the fraction
   of octets with the RE flag cleared to "0".

       ^
       |
       |         RE blanking fraction
    3% |    +----------------------------+====+
       |    |                            |    |
    2% |    |                            |    |
       |    | congestion marking fraction|    |
    1% |    |     +----------------------+    |
       |    |     |                           |
    0% +----+=====+---------------------------+------>
            ^   <--A---> <---B---> <---C--->  ^        domain
            |     ^                      ^    |
        ingress   |                      |    egress
                1.00%                 2.00%          marking fraction

   Figure 3: Example Extended ECN Codepoint Marking Fractions
   (Imprecise)

   Figure 3 illustrates our example.  The horizontal axis represents the
   index of each congestible resource (typically queues) along a path
   through the Internet.  The two superimposed plots show the fraction
   of each ECN codepoint observed along this path, assuming there are
   two congested routers somewhere within domains A and C.  Table 3
   below shows the downstream pre-congestion measured at various border
   observation points along the path.  Figure 4 (later) shows the same
   results of these subtractions, but in graphical form like the above
   figure.  The tabulated figures are actually reasonable approximations
   derived from more precise formulae given in Appendix A of [Re-TCP].
   The RE flag is not changed by interior routers, so it can be seen
   that it acts as a reference against which the congestion marking
   fraction can be compared along the path.

   +--------------------------+---------------------------------------+
   | Border observation point | Approximate Downstream pre-congestion |
   +--------------------------+---------------------------------------+
   |       ingress -- A       |              3% - 0% = 3%             |
   |          A -- B          |              3% - 1% = 2%             |
   |          B -- C          |              3% - 1% = 2%             |
   |        C -- egress       |              3% - 3% = 0%             |
   +--------------------------+---------------------------------------+

   Table 3: Downstream Congestion Measured at Example Observation Points
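
   The subtractions in Table 3 can be reproduced with a few lines of
   Python.  This is only a sketch of the approximate arithmetic used in
   the example, not the more precise formulae of [Re-TCP]; the function
   and variable names are assumptions made for illustration.

```python
# Illustrative sketch only: approximate downstream pre-congestion as the
# RE blanking fraction minus the congestion marking fraction observed
# at each border. All names are hypothetical.


def downstream_pre_congestion(re_blanking: float, marking: float) -> float:
    """Approximate downstream pre-congestion at an observation point."""
    return re_blanking - marking


RE_BLANKING_FRACTION = 0.03  # set by the ingress, unchanged along the path

# Congestion marking fraction seen at each border in the example.
MARKING_AT_BORDER = {
    "ingress -- A": 0.00,
    "A -- B": 0.01,
    "B -- C": 0.01,
    "C -- egress": 0.03,
}

table3 = {
    border: round(downstream_pre_congestion(RE_BLANKING_FRACTION, m), 4)
    for border, m in MARKING_AT_BORDER.items()
}
```

   With these inputs, table3 holds 3%, 2%, 2% and 0% for the four
   observation points, matching the table above.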

   Note that the ingress determines the RE blanking fraction for each
   aggregate using the most recent feedback from the relevant egress,
   arriving with each new reservation, or each refresh.  These updates
   arrive relatively infrequently compared to the speed with which
   congestion changes.  Although this feedback will always be out of
   date, on average positive errors should cancel out negative errors
   over a sufficiently long duration.

   In summary, the network adds pre-congestion marking in the forward
   data path, the egress feeds its level back to the ingress in RSVP (or
   similar signalling), then the ingress gateway re-echoes it into the
   forward data path by blanking the RE flag.  Hence the name re-ECN.
   Then at any border within the Diffserv region, the pre-congestion
   marking that every passing packet will be expected to experience
   downstream can be measured to be the RE blanking fraction minus the
   congestion marking fraction.

4.3.2.  Aggregate Bootstrap

   When a new reservation PATH message arrives at the egress, if there
   are currently no flows in progress from the same ingress, there will
   be no state maintaining the current level of pre-congestion marking
   for the aggregate.  While the reservation signalling continues onward
   towards the receiving host, the egress gateway returns an RSVP
   message to the ingress with a flag [RSVP-ECN] asking the ingress to
   send a specified number of data probes between them.  This bootstrap
   behaviour is all described in the deployment model [CL-deploy].

   However, with our new re-ECN scheme, the ingress does not know what
   proportion of the data probes should have the RE flag blanked,
   because it has no estimate yet of pre-congestion for the path across
   the Diffserv region.

   To be conservative, following the guidance for specifying other
   re-ECN transports in [Re-TCP], the ingress SHOULD set the FNE
   codepoint of the extended ECN header in all probe packets (Table 2).
   As per the deployment model, the egress gateway measures the fraction
   of congestion-marked probe octets and feeds back the resulting
   pre-congestion level to the ingress, piggy-backed on the returning
   reservation response (RESV) for the new flow.  Probe packets are
   identifiable by the egress because they have the ingress as the
   source and the egress as the destination in the IP header.

   It may seem inadvisable to expect the FNE codepoint to be set on
   probes, given legacy firewalls etc. might discard such packets
   (because this flag had no previous legitimate use).  However, in the
   deployment scenarios envisaged, each domain in the Diffserv region
   has to be explicitly configured to support the controlled load
   service.  So, before deploying the service, the operator MUST
   reconfigure such a misbehaving middlebox to allow through packets
   with the RE flag set.

   Note that we have said SHOULD rather than MUST for the FNE setting
   behaviour of the ingress for probe packets.  This entertains the
   possibility of an ingress implementation having the benefit of other
   knowledge of the path, which it re-uses for a newly starting
   aggregate.  For instance, it may hold cached information from a
   recent use of the aggregate that is still sufficiently current to be
   useful.

   It might seem pedantic worrying about these few probe packets, but
   this behaviour ensures the system is safe, even if the proportion of
   probe packets becomes large.

4.3.3.  Flow Bootstrap

   It might be expected that a new flow within an active aggregate would
   need no special bootstrap behaviour.  If there was an aggregate
   already in progress between the gateways the new flow was about to
   use, it would inherit the prevailing RE blanking fraction.  And if
   there were no active aggregate, the bootstrap behaviour for an
   aggregate would be appropriate and sufficient for the new flow.

   However, for a number of reasons, at least the first packet of each
   new flow SHOULD be set to the FNE codepoint, irrespective of whether
   it is joining an active aggregate or not.  If the first packet is
   unlikely to be reliably delivered, a number of FNE packets MAY be
   sent to increase the probability that at least one is delivered to
   the egress gateway.

   If each flow does not start with an FNE packet, it will be seen later
   that sanctions may be too strict at the interface before the egress
   gateway.  It will often be possible to apply sanctions at the
   granularity of aggregates rather than flows, but in an internetworked
   environment it cannot be guaranteed that aggregates will be
   identifiable in remote networks.  So setting FNE at the start of each
   flow is a safe strategy.  For instance, a remote network may have
   equal cost multi-path (ECMP) routing enabled, causing different flows
   between the same gateways to traverse different paths.

   After an idle period of more than 1 second, the ingress gateway
   SHOULD set the EECN field of the next packet it sends to FNE.  This
   allows the design of network policers to be deterministic (see
   [Re-TCP]).

   However, if the ingress gateway can guarantee that the network(s)
   that will carry the flow to its egress gateway all use a common
   identifier for the aggregate (e.g. a single MPLS network without ECMP
   routing), it MAY omit FNE when it adds a new flow to an active
   aggregate.  In that case an FNE packet need only be sent if a whole
   aggregate has been idle for more than 1 second.
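
   The flow bootstrap rules above might be sketched as follows.  This
   Python fragment is illustrative only; the class IngressFneState, its
   fields and the way flows and aggregates are keyed are assumptions,
   not part of the protocol.  Time is passed in explicitly so that the
   logic does not depend on a particular clock.

```python
# Illustrative sketch only. IngressFneState and its members are
# hypothetical names introduced for this example.
from dataclasses import dataclass, field

IDLE_LIMIT_S = 1.0  # idle period after which FNE is sent again


@dataclass
class IngressFneState:
    # True only if every network on the path uses a common identifier
    # for the aggregate (e.g. a single MPLS network without ECMP).
    common_aggregate_id: bool = False
    last_sent: dict = field(default_factory=dict)  # key -> last send time

    def needs_fne(self, flow_id: str, aggregate_id: str, now: float) -> bool:
        """Decide whether the next packet should carry the FNE codepoint."""
        if self.common_aggregate_id:
            key = aggregate_id  # track idleness per aggregate
        else:
            key = (aggregate_id, flow_id)  # track idleness per flow
        previous = self.last_sent.get(key)
        self.last_sent[key] = now
        # FNE on the first packet, or after more than 1 second idle.
        return previous is None or (now - previous) > IDLE_LIMIT_S
```

   In the default (safe) mode each new flow starts with FNE even when it
   joins an active aggregate; with a common aggregate identifier, FNE is
   only needed when the whole aggregate starts up or has been idle.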

4.3.4.  Router Forwarding Behaviour

   Adding re-ECN works well without modifying the forwarding behaviour
   of any routers.  However, below, two changes are proposed when
   forwarding packets with a per-hop-behaviour that requires
   pre-congestion notification:

   Preferential drop: When a router cannot avoid dropping ECN-capable
      packets, preferential dropping of packets with different extended
      ECN codepoints SHOULD be implemented between packets within a PHB
      that uses PCN marking.  The drop preference order to use is
      defined in Table 4.  Note that to reduce configuration complexity,
      Re-Echo and FNE MAY be given the same drop preference, but if
      feasible, FNE should be dropped in preference to Re-Echo.

   +--------+------+----------------+---------+------------------------+
   |   ECN  |  RE  | Extended ECN   | Drop    |     Re-ECN meaning     |
   |  field | flag | codepoint      | Pref    |                        |
   +--------+------+----------------+---------+------------------------+
   |   01   |   0  | Re-Echo        | 5/4     |  Re-echoed congestion  |
   |        |      |                |         |        and RECT        |
   |   00   |   1  | FNE            | 4       |      Feedback not      |
   |        |      |                |         |       established      |
   |   01   |   1  | RECT           | 3       |     Re-ECN capable     |
   |        |      |                |         |        transport       |
   |   10   |   0  | AM(0)          | 3       | Admission Marking with |
   |        |      |                |         |         Re-Echo        |
   |   10   |   1  | AM(-1)         | 3       |    Admission Marking   |
   |        |      |                |         |                        |
   |   11   |   0  | PM(0)          | 2       |   Pre-emption Marking  |
   |        |      |                |         |      with Re-Echo      |
   |   11   |   1  | PM(-1)         | 2       |   Pre-emption Marking  |
   |        |      |                |         |                        |
   |   00   |   0  | Not-RECT       | 1       |   Not re-ECN-capable   |
   |        |      |                |         |        transport       |
   +--------+------+----------------+---------+------------------------+

      Table 4: Drop Preference of Extended ECN Codepoints
      (1 = drop 1st)

      Given this proposal is being advanced at the same time as PCN
      itself, we strongly RECOMMEND that preferential drop based on
      extended ECN codepoint is added to router forwarding at the same
      time as PCN marking.  Preferential dropping can be difficult to
      implement, but we strongly RECOMMEND this security-related re-ECN
      improvement where feasible as it is an effective defence against
      flooding attacks.

   Marking vs. Drop: We propose that PCN-routers SHOULD inspect the RE
      flag as well as the ECN field to decide whether to drop or mark
      PCN DSCPs.  They MUST choose drop if the codepoint of this
      extended ECN field is Not-RECT.  Otherwise they SHOULD mark
      (unless, of course, buffer space is exhausted).

      A PCN-capable router MUST NOT ever congestion mark a packet
      carrying the Not-RECT codepoint because the transport will only
      understand drop, not congestion marking.  But a PCN-capable router
      can mark rather than drop an FNE packet, even though its ECN field
      when looked at in isolation is '00' which appears to be a legacy
      Not-ECT packet.  Therefore, if a packet's RE flag is '1', even if
      its ECN field is '00', a PCN-enabled router SHOULD use congestion
      marking.  This allows the `feedback not established' (FNE)
      codepoint to be used for probe packets, in order to pick up PCN
      marking when bootstrapping an aggregate.

      ECN marking rather than dropping of FNE packets MUST only be
      deployed in controlled environments, such as that in [CL-deploy],
      where the presence of an egress node that understands ECN marking
      is assured.  Congestion events might otherwise be ignored if the
      receiver only understands drop, rather than ECN marking.  This is
      because there is no guarantee that ECN capability has been
      negotiated if feedback is not established (FNE).  Also, [Re-TCP]
      places the strong condition that a router MUST apply drop rather
      than marking to FNE packets unless it can guarantee that FNE
      packets are rate limited either locally or upstream.
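
   The two proposed forwarding changes might be sketched as below.
   This Python fragment is a simplified illustration, not a router
   implementation; the names DROP_PREFERENCE, congestion_action and
   choose_victim, and the boolean parameters, are assumptions.  Re-Echo
   is given preference 5 here, although Table 4 also permits it to
   share preference 4 with FNE.

```python
# Illustrative sketch only. Function and parameter names are hypothetical.

# Drop preference per extended ECN codepoint (1 = drop first), as in
# Table 4, with Re-Echo taking the optional higher value of 5.
DROP_PREFERENCE = {
    "Not-RECT": 1,
    "PM(0)": 2,
    "PM(-1)": 2,
    "RECT": 3,
    "AM(0)": 3,
    "AM(-1)": 3,
    "FNE": 4,
    "Re-Echo": 5,
}


def congestion_action(codepoint: str,
                      buffer_exhausted: bool = False,
                      fne_rate_limited: bool = True) -> str:
    """Return "drop" or "mark" for a packet in a PCN-enhanced PHB."""
    if codepoint == "Not-RECT" or buffer_exhausted:
        return "drop"  # the transport would only understand drop
    if codepoint == "FNE" and not fne_rate_limited:
        return "drop"  # marking FNE needs a rate-limit guarantee
    return "mark"


def choose_victim(queued_codepoints: list) -> int:
    """Index of the queued packet to drop first when drop is unavoidable."""
    return min(range(len(queued_codepoints)),
               key=lambda i: DROP_PREFERENCE[queued_codepoints[i]])
```

   For instance, with a queue holding Re-Echo, RECT, PM(0) and FNE
   packets in that order, choose_victim selects the PM(0) packet.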

4.3.5.  Extensions

   If a different signalling system, such as NSIS, were used, but it
   provided admission control in a similar way, using pre-congestion
   notification (e.g. with RMD [NSIS-RMD]) we believe re-ECN could be
   used to protect against misbehaving networks in the same way as
   proposed above.

5.  Emulating Border Policing with Re-ECN

5.1.  Informal Terminology

   In the rest of this memo, where the context makes it clear, we will
   sometimes loosely use the term `congestion' rather than using the
   stricter `downstream pre-congestion'.  Also we will loosely talk of
   positive or negative flows, meaning flows where the moving average of
   the downstream pre-congestion metric is persistently positive or
   negative.  The notion of a negative metric arises because it is
   derived by subtracting one metric from another.  Of course actual
   downstream congestion cannot be negative, only the metric can
   (whether due to time lags or deliberate malice).

   Just as we will loosely talk of positive and negative flows, we will
   also talk of positive or negative packets, meaning packets that
   contribute positively or negatively to downstream pre-congestion.

   Therefore packets can be considered to the egress gateway
   should have reached zero here, so if domain C agreed a `worth' of +1, 0 or -1,
   which, when multiplied by their size, indicates their contribution to pay for any
   downstream pre-congestion, it would give congestion.  Packets will usually be sent with a worth of
   0.  Blanking the RE flag increments the worth of a packet to +1.
   Congestion marking a packet decrements its worth (whether admission
   marking or pre-emption marking).  Congestion marking a previously
   blanked packet cancels out the positive and negative worth of each
   marking (a worth of 0).  The FNE codepoint is an exception.  It has
   the same positive worth as a packet with the Re-Echo codepoint.  The
   table below specifies unambiguously the worth of each extended ECN
   codepoint.  Note the order is different from the previous table to
   emphasise how congestion marking processes decrement the worth.

   +--------+------+------------------+-------+------------------------+
   |   ECN  |  RE  | Extended ECN     | Worth |     Re-ECN meaning     |
   |  field | flag | codepoint        |       |                        |
   +--------+------+------------------+-------+------------------------+
   |   00   |   0  | Not-RECT         | n/a   |   Not re-ECN-capable   |
   |        |      |                  |       |        transport       |
   |   01   |   0  | Re-Echo          | +1    |  Re-echoed congestion  |
   |        |      |                  |       |        and RECT        |
   |   10   |   0  | AM(0)            | 0     | Admission Marking with |
   |        |      |                  |       |         Re-Echo        |
   |   11   |   0  | PM(0)            | 0     |   Pre-emption Marking  |
   |        |      |                  |       |      with Re-Echo      |
   |   00   |   1  | FNE              | +1    |      Feedback not      |
   |        |      |                  |       |       established      |
   |   01   |   1  | RECT             | 0     |     Re-ECN capable     |
   |        |      |                  |       |        transport       |
   |   10   |   1  | AM(-1)           | -1    |    Admission Marking   |
   |        |      |                  |       |                        |
   |   11   |   1  | PM(-1)           | -1    |   Pre-emption Marking  |
   +--------+------+------------------+-------+------------------------+

                Table 5: 'Worth' of Extended ECN Codepoints
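
   As a non-normative illustration, the worths in Table 5 can be held in
   a simple lookup keyed on the ECN field and the RE flag.  The names
   WORTH, worth_of and net_worth below are hypothetical and are not part
   of the re-ECN protocol; this is only a sketch of how a meter might
   sum the worth of the packets it sees.

```python
# Worth of each extended ECN codepoint, keyed by (ECN field, RE flag).
# Values follow Table 5; None stands for "n/a" (Not-RECT packets).
# All names here are illustrative assumptions, not protocol elements.
WORTH = {
    ("00", 0): ("Not-RECT", None),
    ("01", 0): ("Re-Echo", +1),
    ("10", 0): ("AM(0)", 0),
    ("11", 0): ("PM(0)", 0),
    ("00", 1): ("FNE", +1),
    ("01", 1): ("RECT", 0),
    ("10", 1): ("AM(-1)", -1),
    ("11", 1): ("PM(-1)", -1),
}


def worth_of(ecn_field, re_flag):
    """Return the worth of one packet, or None if it is Not-RECT."""
    return WORTH[(ecn_field, re_flag)][1]


def net_worth(packets):
    """Sum the worth of (ecn_field, re_flag) pairs, skipping Not-RECT."""
    total = 0
    for ecn_field, re_flag in packets:
        worth = worth_of(ecn_field, re_flag)
        if worth is not None:
            total += worth
    return total
```

   For example, one Re-Echo packet and one AM(-1) packet sum to a net
   worth of zero, which is the balance expected of honest traffic when
   it reaches the egress.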

5.2.  Policing Overview

   It will be recalled that downstream congestion can be found by
   subtracting upstream congestion from path congestion.  Figure 4
   displays the difference between the two plots in Figure 3 to show
   downstream pre-congestion across the same path through the Internet.

   To emulate border policing, the general idea is for each domain to
   apply penalties to its upstream neighbour in proportion to the amount
   of downstream pre-congestion that the upstream network sends across
   the border.  That is, the penalties should be in proportion to the
   height of the plot.  Downward arrows in the figure show the resulting
   pressure for each domain to under-declare downstream pre-congestion
   in traffic they pass to the next domain, because of the penalties.

               p e n a l t i e s
              /        |        \
       A     :         :         :
       |     |  <--A---> <---B---> <---C--->           domain
       |     V         :         :         :
    3% |    +-----+    |         |         :
       |    |     |    V         V         :
    2% |    |     +----------------------+ :
       |    |  downstream pre-congestion | :
    1% |    |     :                      | :
       |    |     :                      | :
    0% +----+----------------------------+====+------>
            :     :                      : A  :
            :     :                      : |  :
        ingress   :                      : :  egress
                1.00%                 2.00%:         pre-congestion
                                           |
                                       sanctions

   Figure 4: Policing Framework, showing creation of opposing pressures
   to under-declare and over-declare downstream pre-congestion, using
   penalties and sanctions

   These penalties seem to encourage everyone to understate downstream
   congestion in order to reduce the penalties they incur.  But a
   balancing pressure is introduced by the last domain, which applies
   sanctions to flows if downstream congestion goes negative before the
   egress gateway.  The upward arrow at Domain C's border with the
   egress gateway represents the incentive the sanctions would create to
   prevent negative traffic.  The same upward pressure can be applied at
   any domain border (arrows not shown).

   Any flow that persistently goes negative by the time it leaves a
   domain must not have been marked correctly in the first place.  A
   domain that discovers such a flow can adopt a range of strategies to
   protect itself.  Which strategy it uses will depend on policy,
   because it cannot immediately assume malice; there may be an
   innocent configuration error somewhere in the system.

   This memo does not propose to standardise any particular mechanism to
   detect persistently negative flows, but Section 5.5 does give
   examples.  Note that we have used the term flow, but there will be no
   need to dig into the transport layer for port numbers; identifiers
   visible in the network layer will be sufficient (IP address pair,
   DSCP, protocol ID).  The appendix also gives a mechanism to bound the
   required flow state, preventing state exhaustion attacks.

   Of course, some domains may trust other domains to comply with
   admission control without applying sanctions or penalties.  In these
   cases, the protocol should still be used but no penalties need be
   applied.  The re-ECN protocol ensures downstream pre-congestion
   marking is passed on correctly whether or not penalties are applied
   to it, so the system works just as well with a mixture of some
   domains trusting each other and others not.

   Providers should be free to agree the contractual terms they wish
   between themselves, so this memo does not propose to standardise how
   these penalties would be applied.  It is sufficient to standardise
   the re-ECN protocol so the downstream pre-congestion metric is
   available if providers choose to use it.  However, the next section
   (Section 5.3) gives some examples of how these penalties might be
   implemented.

5.3.  Pre-requisite Contractual Arrangements

   The re-ECN protocol has been chosen to solve the policing problem
   because it embeds a downstream pre-congestion metric in passing CL
   traffic that is difficult to lie about and can be measured in bulk.
   The ability to emulate border policing depends on network operators
   choosing to use this metric as one of the elements in their contracts
   with each other.

   Already many inter-domain agreements involve a capacity and a usage
   element.  The usage element may be based on volume or various
   measures of peak demand.  We expect that those network operators who
   choose to use pre-congestion notification for admission control would
   also be willing to consider using this downstream pre-congestion
   metric as a usage element in their interconnection contracts for
   admission controlled (CL) traffic.

   Congestion (or pre-congestion) has the dimension of [octet], being
   the product of volume transferred [octet] and the congestion fraction
   [dimensionless], which is the fraction of the offered load that the
   network isn't able to serve (or would rather not serve in the case of
   pre-congestion).  Measuring downstream congestion gives a measure of
   the volume transferred but modulated by congestion expected
   downstream.  So volume transferred during off-peak periods counts as
   nearly nothing, while volume transferred at peak times counts very
   highly.  The re-ECN protocol allows one network to measure how much
   pre-congestion has been `dumped' into it by another network.  And
   then in turn how much of that pre-congestion it dumped into the next
   downstream network.

   Section 5.6 describes mechanisms for calculating border penalties,
   referring to Appendix A.2 for suggested metering algorithms for
   downstream congestion at a border router.  Conceptually, it could
   hardly be simpler.  It broadly involves accumulating the volume of
   packets with the RE flag blanked and the volume of those with
   congestion marking, then subtracting the two.  In order to discard a
   persistent negative balance, time is slotted into periods of, say,
   10 seconds (or a time sufficient for a few rounds of feedback,
   depending on the level of aggregation).  Every timeslot, a positive
   balance between the two counters is accumulated into a long-term
   counter and the two counters are reset.  If instead the balance
   during any timeslot is negative, it is discarded and a management
   alarm SHOULD also be raised.  Over an accounting period (say a month)
   the single metric in the long-term counter represents all the
   downstream congestion caused by traffic passing the border meter.
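
   The metering steps just described can be sketched as follows.  This
   is a minimal illustration only, not the algorithm of Appendix A.2.
   The class name BorderMeter, its method names and the alarm counter
   are all assumptions made for this example.

```python
class BorderMeter:
    """Illustrative bulk meter for downstream congestion at a border.

    Hypothetical sketch: names and structure are assumptions, not the
    suggested algorithm of Appendix A.2.
    """

    def __init__(self):
        self.blanked_octets = 0   # volume of packets with RE flag blanked
        self.marked_octets = 0    # volume of congestion marked packets
        self.long_term = 0        # accumulated downstream congestion
        self.alarms = 0           # timeslots that ended negative

    def count_packet(self, size, re_blanked, congestion_marked):
        """Add one packet of `size` octets to the per-timeslot counters."""
        if re_blanked:
            self.blanked_octets += size
        if congestion_marked:
            self.marked_octets += size

    def end_timeslot(self):
        """Close the current timeslot and return its balance in octets."""
        balance = self.blanked_octets - self.marked_octets
        if balance >= 0:
            # Positive balance is accumulated into the long-term counter.
            self.long_term += balance
        else:
            # Negative balance is discarded; an operator would raise a
            # management alarm here. This sketch only counts the events.
            self.alarms += 1
        self.blanked_octets = 0
        self.marked_octets = 0
        return balance
```

   A caller would invoke end_timeslot() once per slot (for instance
   every 10 seconds) and read long_term at the end of the accounting
   period.  A packet that is both blanked and congestion marked adds to
   both counters, so it contributes nothing to the balance.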

   Once this downstream pre-congestion metric is available, operators
   are free to choose how they incorporate it into their interconnection
   contracts [IXQoS].  Some may include a threshold volume of
   pre-congestion as a quality measure in their service level agreement,
   perhaps with a penalty clause if the upstream network exceeds this
   threshold over, say, a month.  Others may agree a set of tiered
   monthly thresholds, with increasing penalties as each threshold is
   exceeded.  But it would be just as easy, and more resistant to
   gaming, to do away with discrete thresholds, and instead make the
   penalty rise smoothly with the volume of pre-congestion by applying a
   price to pre-congestion itself.  Then the usage element of the
   interconnection contract would directly relate to the volume of
   pre-congestion caused by the upstream network.

   The direction of penalties and charges relative to the direction of
   traffic flow is a constant source of confusion.  Typically, where
   capacity charges are concerned, lower tier customer networks pay
   higher tier provider networks.  So money flows from the edges to the
   middle of the internetwork, towards greater connectivity,
   irrespective of the flow of data.  But we advise that penalties or
   charges for usage should follow the same direction as the data flow,
   which is the direction of control at the network layer.  Otherwise a
   network lays itself open to 'denial of funds' attacks.  So, where a
   tier 2 provider sends data into a tier 3 customer network, we would
   expect the penalty clauses for sending too much pre-congestion to be
   against the tier 2 network, even though it is the provider.

   It may help to remember that data will be flowing in the other
   direction too.  So the provider network has as much opportunity to
   levy usage penalties as its customer, and it can set the price or
   strength of its own penalties higher if it chooses.  Usage charges in
   both directions tend to cancel each other out, which confirms that
   usage-charging is less to do with revenue raising and more to do with
   encouraging load control discipline in order to smooth peaks and
   troughs, improving utilisation and quality.

   Further, when operators agree penalties in their interconnection
   contracts for sending downstream congestion, they should make sure
   that any level of negative marking only equates to zero penalty.  In
   other words, penalties are always paid in the same direction as the
   data, and never against the data flow, even if downstream congestion
   seems to be negative.  This is consistent with the definition of
   physical congestion; when a resource is underutilised, it is not
   negatively congested.  Its congestion is just zero.  So, although
   short periods of negative marking can be tolerated to correct
   temporary over-declarations due to lags in the feedback system,
   persistent downstream negative congestion can have no physical
   meaning and therefore must signify a problem.  The incentive for
   domains not to tolerate persistently negative traffic depends on this
   principle that penalties must never be paid against the data flow.

   Also note that at the last egress of the Diffserv region, domain C
   should not agree to pay any penalties to the egress gateway for
   pre-congestion passed to the egress gateway.  Downstream
   pre-congestion to the egress gateway should have reached zero here.
   If domain C were to agree to pay for any remaining downstream
   pre-congestion, it would give the egress gateway an incentive to
   over-declare pre-congestion feedback and take the resulting profit
   from domain C.

   To focus the discussion, from now on, unless otherwise stated, we
   will assume a downstream network charges its upstream neighbour in
   proportion to the amount of pre-congestion it sends (V_b in the
   notation of Appendix A.2).  Effectively tiered thresholds would be
   just more coarse-grained approximations of the fine-grained case we
   choose to examine.  If these neighbours had previously agreed that
   the (fixed) price per octet of pre-congestion would be L, then the
   bill at the end of the month would simply be the product L*V_b, plus
   any fixed charges they may also have agreed.
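
   As a small worked illustration of this charging model, the monthly
   bill can be computed as below.  The function and parameter names are
   hypothetical.  The clamp to zero reflects the principle stated
   earlier in this section that any level of negative marking only
   equates to zero penalty.

```python
def monthly_bill(price_per_octet, congestion_volume, fixed_charges=0):
    """Return L * V_b plus any fixed charges (illustrative sketch).

    price_per_octet   -- the agreed price L per octet of pre-congestion
    congestion_volume -- V_b, octets of downstream pre-congestion sent
    fixed_charges     -- any other charges the neighbours have agreed
    """
    # Penalties are never paid against the data flow, so a negative
    # congestion volume is treated as zero rather than as a credit.
    billable = max(congestion_volume, 0)
    return price_per_octet * billable + fixed_charges
```

   For instance, with L = 0.5 per octet, V_b = 1000 octets and fixed
   charges of 200, the bill is 0.5 * 1000 + 200 = 700.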

   We are well aware that the IETF tries to avoid standardising
   technology that depends on a particular business model.  Indeed, this
   principle is at the heart of all our own work.  Our aim here is to
   make a new metric available that we believe is superior to all
   existing metrics.  Then, our aim is to show that border policing can
   at least work with the model we have just outlined.  We assume that
   operators might then experiment with the metric in other models.
   Of course, operators are free to complement this pre-congestion-based
   usage element of their charges with traditional capacity charging,
   and we expect they will.

   Also note well that everything we discuss in this memo only concerns
   interconnection within the Diffserv region.  ISPs are free to sell or
   give away reservations however they want on the retail market.  But
   of course, interconnection charges will have a bearing on that.
   Indeed, in the present scenario, the ingress gateway effectively
   sells reservations on one side and buys congestion penalties on the
   other.  As congestion rises, one can imagine the gateway discovering
   that congestion penalties have risen higher than the (probably fixed)
   revenue it will earn from selling the next flow reservation.  This
   encourages the gateway to cut its losses by blocking new calls, which
   is why we believe downstream congestion penalties can emulate
   per-flow rate policing at borders, as the next section explains.

5.4.  Emulation of Per-Flow Rate Policing: Rationale and Limits

   The important feature of charging in proportion to congestion volume
   is that the penalty aggregates and disaggregates correctly along with
   packet flows.  This is because the penalty rises linearly with bit
   rate (unless congestion is absolutely zero) and linearly with
   congestion, because it is the product of them both.  So if the
   packets crossing a border belong to a thousand flows, and one of
   those flows doubles its rate, the ingress gateway forwarding that
   flow will have to put twice as much congestion marking into the
   packets of that flow.  And this extra congestion marking will add
   proportionately to the penalties levied at every border the flow
   crosses in proportion to the amount of pre-congestion remaining on
   the path.

   Effectively, usage charges will continuously flow from ingress
   gateways to the places generating pre-congestion marking, in
   proportion to the pre-congestion marking introduced and the data
   rates from those gateways.
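
   The linearity argument above can be illustrated numerically.  The
   sketch below assumes each flow is described by a hypothetical pair
   (octets sent, downstream pre-congestion fraction); the function names
   and the example figures are illustrative only.

```python
def congestion_volume(octets, congestion_fraction):
    """Congestion volume of one flow: volume times congestion fraction."""
    return octets * congestion_fraction


def border_penalty(flows, price_per_octet):
    """Penalty on an aggregate of (octets, congestion_fraction) flows.

    Because the penalty is a plain sum of per-flow products, it
    aggregates and disaggregates along with the flows themselves.
    """
    total = sum(congestion_volume(o, f) for o, f in flows)
    return price_per_octet * total


# A thousand identical flows cross a border (figures are made up).
flows = [(1_000_000, 0.01)] * 1000
before = border_penalty(flows, price_per_octet=1.0)

# One flow doubles its rate; only its own contribution doubles.
flows[0] = (2_000_000, 0.01)
after = border_penalty(flows, price_per_octet=1.0)
extra = after - before
```

   Here the extra penalty equals the original contribution of the one
   flow that doubled, up to floating point rounding, while the
   contribution of the other 999 flows is unchanged.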

   As importantly, pre-congestion itself rises super-linearly with
   utilisation of a particular resource.  So if someone tries to push
   another flow into a path that is already signalling enough
   pre-congestion to warrant admission control, the penalty will be a
   lot greater than it would have been to add the same flow to a less
   congested path.  This makes the incentive system fairly insensitive
   to the actual level of pre-congestion for triggering admission
   control that each ingress chooses.  The deterrent against exceeding
   whatever threshold is chosen rises very quickly with a small amount
   of cheating.

   These are the properties that allow re-ECN to emulate per-flow border
   policing of both rate and admission control.  It is not a perfect
   emulation of per-flow border policing, but we claim it is sufficient
   to at least ensure the cost to others of a cheat is borne by the
   cheater, because the penalties are at least proportionate to the
   level of the cheat.  If an edge network operator is selling
   reservations at a large profit over the congestion cost, these
   pre-congestion penalties will not be sufficient to ensure networks in
   the middle get a share of those profits, but at least they can cover
   their costs.

   We will now explain with an example.  When a whole inter-network is
   operating at normal (typically very low) congestion, the
   pre-congestion marking from virtual queues will be a little higher
   than if the real queues had been used; still low, but more
   noticeable.  But low congestion levels do not imply that usage
   /charges/ must also be low.  Usage charges will depend on the /price/
   L as well.

   If the metric of the usage element of an interconnection agreement
   was changed from pure volume to pre-congested volume, one would
   expect the price of pre-congestion to be arranged so that the total
   usage charge remained about the same.  So, if an average
   pre-congestion fraction turned out to be 1/1000, one would expect
   that the price L (per octet) of pre-congestion would be about 1000
   times the previously used (per octet) price for volume.  We should
   add that a switch to pre-congestion is unlikely to exactly maintain
   the same overall level of usage charges, but this argument will be
   approximately true, because the usage charge will rise to at least
   the level the market finds necessary to push back against usage.
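
   The arithmetic behind this example can be sketched as follows.  The
   function name is hypothetical, and the calculation assumes, as the
   text does, that the total usage charge is held about constant when
   the metric changes from volume to pre-congested volume.

```python
def equivalent_congestion_price(volume_price_per_octet,
                                avg_congestion_fraction):
    """Price L per octet of pre-congestion that keeps usage revenue
    the same as a pure volume price (illustrative assumption only).

    For a volume V: volume_price * V == L * (fraction * V),
    hence L == volume_price / fraction.
    """
    if avg_congestion_fraction <= 0:
        raise ValueError("average congestion fraction must be positive")
    return volume_price_per_octet / avg_congestion_fraction


# With an average pre-congestion fraction of 1/1000, L comes out at
# about 1000 times the old per-octet volume price.
ratio = equivalent_congestion_price(1.0, 1 / 1000)
```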

   From the above example it can be seen why a 1000x higher price will
   make operators become acutely sensitive to the congestion they cause
   in other networks, which is of course the desired effect: to
   encourage networks to /control/ the congestion they allow their users
   to cause to others.

   If any network sends even one flow at a higher rate, it will
   immediately have to pay proportionately more usage charges.  Because
   there is no knowledge of reservations within the Diffserv region, no
   interior router can police whether the rate of each flow is greater
   than its reservation.  So the system doesn't truly emulate
   rate-policing of each flow.  But there is no incentive to pack a
   higher rate into a reservation, because the charges are directly
   proportional to rate, irrespective of the reservations.

   However, if virtual queues start to fill on any path, even though
   real queues will still be able to provide low latency service,
   pre-congestion marking will rise fairly quickly.  It may eventually
   reach the threshold where the ingress gateway would deny admission to
   new flows.  If the ingress gateway cheats and continues to admit new
   flows, the affected virtual queues will rapidly fill, even though the
   real queues will still be little worse than they were when admission
   control should have been invoked.  The ingress gateway will have to
   pay the penalty for such an extremely high pre-congestion level, so
   the pressure to invoke admission control should become unbearable.

   The above mechanisms protect against benefits.  It is usually
   reasonable to assume that other network operators behave rationally
   (policy routing rational operators.  In
   Section 5.6.3 we discuss how networks can avoid those that might not).  But this approach
   does not protect against themselves from
   accidental or deliberate misconfiguration in neighbouring networks.

5.5.  Sanctioning Dishonest Marking

   As CL traffic leaves the last network before the egress gateway
   (domain C) the RE blanking fraction should match the congestion
   marking fraction, when averaged over a sufficiently long duration
   (perhaps ~10s to allow a few rounds of feedback through regular
   signalling of new and refreshed reservations).

   To protect itself, domain C should install a monitor at its egress.
   It aims to detect flows of CL packets that are persistently negative.
   If flows are positive, domain C need take no action---this simply
   means an upstream network must be paying more penalties than it needs
   to.  Appendix A.3 gives a suggested algorithm for the monitor,
   meeting the criteria below.

   o  It SHOULD introduce minimal false positives for honest flows;

   o  It SHOULD quickly detect and sanction dishonest flows (minimal
      false negatives);

   o  It MUST be invulnerable to state exhaustion attacks from malicious
      sources.  For instance, if the dropper uses flow-state, it should
      not be possible for a source to send numerous packets, each with a
      different flow ID, to force the dropper to exhaust its memory
      capacity;

   o  It MUST introduce sufficient loss in goodput so that malicious
      sources cannot play off losses in the egress dropper against
      higher allowed throughput.  Salvatori [CLoop_pol] describes this
      attack, which involves the source understating path congestion
      then inserting forward error correction (FEC) packets to
      compensate expected losses.

   Note that the monitor operates on flows but with careful design we
   can avoid per-flow state.  This is why we have been careful to ensure
   that flows MUST start with a packet marked with the FNE codepoint.
   If a flow does not start with the FNE codepoint, a monitor is likely
   to treat it unfavourably.  This risk makes it worth setting the FNE
   codepoint at the start of a flow, even though there is a cost to
   setting FNE (positive `worth').

   Starting flows with an FNE packet also means that a monitor will be
   resistant to state exhaustion attacks from other networks, as the
   monitor can then be designed to never create state unless an FNE
   packet arrives.  And an FNE packet counts positive, so it will cost a
   lot for a network to send many of them.

   Monitor algorithms will often maintain a moving average across flows
   of the fraction of RE blanked packets.  When maintaining an average
   across flows, a monitor MUST ignore packets with the FNE codepoint
   set.  An ingress gateway sets the FNE codepoint when it does not have
   the benefit of feedback from the egress.  So counting packets with
   FNE cleared would be likely to make the average unnecessarily
   positive, providing headroom (or should we say footroom?) for
   dishonest (negative) traffic.

   If the monitor detects a persistently negative flow, it could drop
   sufficient negative and neutral packets to force the flow to not be
   negative.  This is the approach taken for the `egress dropper' in
   [Re-TCP], but for the scenario in this memo, where everyone would
   expect everyone else to keep to the protocol, a management alarm
   SHOULD be raised on detecting persistently negative traffic and any
   automatic sanctions taken SHOULD be logged.  Even if the chosen
   policy is to take no automatic action, the cause can then be
   investigated manually.

   Then all ingresses cannot understate downstream pre-congestion
   without their action being logged.  So network operators can deal
   with offending networks at the human level, out of band.  As a last
   resort, perhaps where the ingress gateway address seems to have been
   spoofed in the signalling, packets can be dropped.  Drops could be
   focused on just sufficient packets in misbehaving flows to remove the
   negative bias while doing minimal harm.

   A future version of this memo may define a control message that could
   be used to notify an offending ingress gateway (possibly via the
   egress gateway) that it is sending persistently negative flows.
   However, we are aware that such messages could be used to test the
   sensitivity of the detection system, so currently we prefer silent
   sanctions.

   An extreme scenario would be where an ingress gateway (or set of
   gateways) mounted a DoS attack against another network.  If their
   traffic caused sufficient congestion to lead to drop but they
   understated path congestion to avoid penalties for causing high
   congestion, the preferential drop recommendations in Section 4.3.4
   would at least ensure that these flows would always be dropped before
   honest flows.

5.6.  Border Mechanisms

5.6.1.  Border Accounting Mechanisms

   One of the design goals of re-ECN was for border security mechanisms
   to be as simple as possible, otherwise they would become the pinch-
   points that limit scalability of the whole internetwork.  As the
   title of this memo suggests, we want to avoid per-flow processing at
   borders.  We also want to keep to passive mechanisms that can monitor
   traffic in parallel to forwarding, rather than having to filter
   traffic inline---in series with forwarding.  As data rates continue
   to rise, we suspect that all-optical interconnection between networks
   will soon be a requirement.  So we want to avoid any new need for
   buffering (even though border filtering is current practice for other
   reasons, we don't want to make it even less likely that we will ever
   get rid of it).

   So far, we have been able to keep the border mechanisms simple,
   despite having had to harden them against some subtle attacks on the
   re-ECN design.  The mechanisms are still passive and avoid per-flow
   processing, although we do use filtering as a fail-safe to
   temporarily shield against extreme events in other networks, such as
   accidental misconfigurations (Section 5.6.3).

   The basic accounting mechanism at each border interface simply
   involves accumulating the volume of packets with positive worth (Re-
   Echo and FNE), and subtracting the volume of those with negative
   worth: AM(-1) and PM(-1).  Even though this mechanism takes no regard
   of flows, over an accounting period (say a month) this subtraction
   will account for the downstream congestion caused by all the flows
   traversing the interface, wherever they come from, and wherever they
   go to.  The two networks can agree to use this metric however they
   wish to determine some congestion-related penalty against the
   upstream network (see Section 5.3 for examples).  Although the
   algorithm could hardly be simpler, it is spelled out using pseudo-
   code in Appendix A.2.1.
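
   As an informal illustration of the accounting rule above (not a
   reproduction of the pseudo-code in Appendix A.2.1), the following
   sketch accumulates a single counter per border interface.  The
   function name, the (marking, size) tuple layout and the "Neutral"
   label used in the usage note are assumptions made for this example
   only.

```python
# Markings with positive and negative worth, as named in the text.
POSITIVE = {"Re-Echo", "FNE"}
NEGATIVE = {"AM(-1)", "PM(-1)"}


def downstream_congestion(packets):
    """Return positive volume minus negative volume, in bytes.

    packets is an iterable of (marking, size_in_bytes) pairs seen at
    one border interface during an accounting period.  No per-flow
    state is kept: a single running total is enough.
    """
    total = 0
    for marking, size in packets:
        if marking in POSITIVE:
            total += size
        elif marking in NEGATIVE:
            total -= size
        # Packets with any other marking carry no worth and are
        # ignored by the accounting.
    return total
```

   For instance, a 1500 byte Re-Echo packet, a 1500 byte packet labelled
   "Neutral", a 500 byte AM(-1) packet and a 40 byte FNE packet would
   together account for 1500 - 500 + 40 = 1040 bytes of downstream
   congestion.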

   Various attempts to subvert the re-ECN design have been made.  In all
   cases their root cause is persistently negative flows.  But, after
   describing these attacks we will show that we don't actually have to
   get rid of all persistently negative flows in order to thwart the
   attacks.

   In honest flows, downstream congestion is measured as positive minus
   negative volume.  So if all flows are honest (i.e. not persistently
   negative), adding all positive volume and all negative volume without
   regard to flows will give an aggregate measure of downstream
   congestion.  But such simple aggregation is only possible if no flows
   are persistently negative.  Unless persistently negative flows are
   completely removed, they will reduce the aggregate measure of
   congestion.  The aggregate may still be positive overall, but not as
   positive as it would have been had the negative flows been removed.
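
   A small worked example may make the polluting effect concrete.  The
   byte counts below are invented purely for illustration, and the
   function name aggregate is hypothetical.

```python
def aggregate(flows):
    """Sum positive minus negative volume over all flows.

    flows is a list of (positive_bytes, negative_bytes) pairs.  The
    border itself never needs this per-flow breakdown; it is shown
    here only to expose what each flow contributes to the total.
    """
    return sum(pos - neg for pos, neg in flows)


# Two honest flows: each declares at least as much positive volume
# as the negative volume it picks up along the path.
honest = [(3000, 1000), (2000, 500)]

# One persistently negative flow: no positive volume declared at all.
cheat = (0, 800)

clean_measure = aggregate(honest)               # 3500
polluted_measure = aggregate(honest + [cheat])  # 2700
```

   The polluted aggregate (2700) is still positive overall, but it
   understates the honest flows' downstream congestion (3500) by
   exactly the 800 bytes contributed by the negative flow.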

   In Section 5.5 we discussed how to sanction traffic to remove, or at
   least to identify, persistently negative flows.  But, even if the
   sanction for negative traffic is to discard it, unless it is
   discarded at the exact point it goes negative, it will wrongly
   subtract from the aggregate downstream congestion, at least at any
   borders it crosses after it has gone negative but before it is
   discarded.

   We rely on sanctions to deter dishonest understatement of congestion.
   But even the ultimate sanction of discard can only be effective if
   the sender is bothered about the data getting through to its
   destination.  A number of attacks have been identified where a sender
   gains from sending dummy traffic or it can attack someone or
   something using dummy traffic even though it isn't communicating any
   information to anyone:

   o  A network can simply create its own dummy traffic to congest
      another network, perhaps causing it to lose business at no cost to
      the attacking network.  This is a form of denial of service
      perpetrated by one network on another.  The preferential drop
      measures in Section 4.3.4 provide crude protection against such
      attacks, but we are not overly worried about more accurate
      prevention measures, because it is already possible for networks
      to DoS other networks on the general Internet, but they generally
      don't because of the grave consequences of being found out.  We
      are only concerned if re-ECN increases the motivation for such an
      attack, as in the next example.

   o  A network can just generate negative traffic and send it over its
      border with a neighbour to reduce the overall penalties that it
      should pay to that neighbour.  It could even initialise the TTL so
      it expired shortly after entering the neighbouring network,
      reducing the chance of detection further downstream.  This attack
      need not be motivated by a desire to deny service and indeed need
      not cause denial of service.  A network's main motivator would
      most likely be to reduce the penalties it pays to a neighbour.
      But, the prospect of financial gain might tempt the network into
      mounting a DoS attack on the other network as well, given the gain
      would offset some of the risk of being detected.

   Note that we have not included DoS by Internet hosts in the above
   list of attacks, because we have restricted ourselves to a scenario
   with edge-to-edge admission control across a Diffserv region.  In
   this case, the edge ingress gateways insulate the Diffserv region
   from DoS by Internet hosts.  Re-ECN resists more general DoS attacks,
   but this is discussed in [Re-TCP].

   The first step towards a solution to all these problems with negative
   flows is to be able to estimate the contribution they make to
   downstream congestion at a border and to correct the measure
   accordingly.  Although ideally we want to remove negative flows
   themselves, perhaps surprisingly, the most effective first step is to
   cancel out the polluting effect negative flows have on the measure of
   downstream congestion at a border.  It is more important to get an
   unbiased estimate of their effect, than to try to remove them all.  A
   suggested algorithm to give an unbiased estimate of the contribution
   from negative flows to the downstream congestion measure is given in
   Appendix A.2.2.

   Although making an accurate assessment of the contribution from
   negative flows may not be easy, just the single step of neutralising
   their polluting effect on congestion metrics removes all the gains
   networks could otherwise make from mounting dummy traffic attacks on
   each other.  This puts all networks on the same side (only with
   respect to negative flows of course), rather than being pitched
   against each other.  The network where this flow goes negative as
   well as all the networks downstream lose out from not being
   reimbursed for any congestion this flow causes.  So they all have an
   interest in getting rid of these negative flows.  Networks forwarding
   a flow before it goes negative aren't strictly on the same side, but
   they are disinterested bystanders---they don't care that the flow
   goes negative downstream, but at least they can't actively gain from
   making it go negative.  The problem becomes localised so that once a
   flow goes negative, all the networks from where it happens and beyond
   downstream each have a small problem, each can detect it has a
   problem and each can get rid of the problem if it chooses to.  But
   negative flows can no longer be used for any new attacks.

   Once an unbiased estimate of the effect of negative flows can be
   made, the problem reduces to detecting and preferably removing flows
   that have gone negative as soon as possible.  But importantly,
   complete eradication of negative flows is no longer critical---best
   endeavours will be sufficient.

   Note that the guiding principle behind all the above discussion is
   that any gain from subverting the protocol should be precisely
   neutralised, rather than punished.  If a gain is punished to a
   greater extent than is sufficient to neutralise it, it will most
   likely open up a new vulnerability, where the amplifying effect of
   the punishment mechanism can be turned on others.

   For instance, if possible, flows should be removed as soon as they go
   negative, but we do NOT RECOMMEND any attempts to discard such flows
   further upstream while they are still positive.  Such over-zealous
   push-back is unnecessary and potentially dangerous.  These flows have
   paid their `fare' up to the point they go negative, so there is no
   harm in delivering them that far.  If someone downstream asks for a
   flow to be dropped as near to the source as possible, because they
   say it is going to become negative later, an upstream node cannot
   test the truth of this assertion.  Rather than have to authenticate
   such messages, re-ECN has been designed so that flows can be dropped
   solely based on locally measurable evidence.  A message hinting that
   a flow should be watched closely to test for negativity is fine.  But
   not a message that claims that a positive flow will go negative
   later, so it should be dropped.

5.6.2.  Competitive Routing

   With the above penalty system, each domain seems to have a perverse
   incentive to fake pre-congestion.  For instance domain B profits from
   the difference between penalties it receives at its ingress (its
   revenue) and those it pays at its egress (its cost).  So if B
   overstates internal pre-congestion it seems to increase its profit.
   However, we can assume that domain A could bypass B, routing through
   other domains to reach the egress.  So the competitive discipline of
   least-cost routing can ensure that any domain tempted to fake pre-
   congestion for profit risks losing /all/ its incoming traffic.  The
   least congested route would eventually be able to win this
   competitive game, only as long as it didn't declare more fake pre-
   congestion than the next most competitive route.

   This memo does not need to standardise any particular mechanism for
   routing based on re-ECN.  Goldenberg et al [Smart_rtg] refer to
   various commercial products and present their own algorithms for
   moving traffic between multi-homed routes based on usage charges.
   None of these systems require any changes to standard protocols
   because the choice between the available border gateway protocol
   (BGP) routes is based on a combination of local knowledge of the
   charging regime and local measurement of traffic levels.  If, as we
   propose, charges or penalties were based on the level of re-ECN
   measured in passing traffic, a similar optimisation could be achieved
   without requiring any changes to standard routing protocols.

   We must be clear that applying pre-congestion-based routing to this
   admission control system remains an open research issue.  Traffic
   engineering based on congestion requires careful damping to avoid
   oscillations, and should not be attempted without adult supervision
   :).  Mortier & Pratt [ECN-BGP] have analysed traffic engineering
   based on congestion.  But without the benefit of re-ECN, they had to
   add a path attribute to BGP to advertise a route's downstream
   congestion (actually they proposed that BGP should advertise the
   charge for congestion, which we believe wrongly embeds an assumption
   into BGP that the only thing to do with congestion is charge for it).

5.6.3.  Fail-safes

   The mechanisms described so far create incentives for rational
   operators to behave.  That is, one operator aims to make another
   behave responsibly by applying penalties and expects a rational
   response (i.e. one that trades off costs against benefits).  It is
   usually reasonable to assume that other network operators will behave
   rationally (policy routing can avoid those that might not).  But this
   approach does not protect against the misconfigurations and accidents
   of other operators.

   Therefore, we propose the following two mechanisms at a network's
   borders to provide "defence in depth".  Both are similar:

   Highly positive flows: A small sample of positive packets should be
      picked randomly as they cross a border interface.  Then subsequent
      packets matching the same source and destination address and DSCP
      should be monitored.  If the fraction of positive marking is well
      above a threshold (to be determined by operational practice), a
      management alarm SHOULD be raised, and the flow MAY be
      automatically subject to focused drop.

   Persistently negative flows: A small sample of congestion marked
      packets should be picked randomly as they cross a border
      interface.  Then subsequent packets matching the same source and
      destination address and DSCP should be monitored.  If the RE
      blanking fraction minus the congestion marking fraction is
      persistently negative, a management alarm SHOULD be raised, and
      the flow MAY be automatically subject to focused drop.

   Both these mechanisms rely on the fact that highly positive (or
   negative) flows will appear in the sample more quickly if the sample
   is drawn randomly from positive (or negative) packets alone.
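
   As an illustration only, the highly positive flow fail-safe might be
   sketched in Python as follows.  The packet fields, the "worth"
   encoding (+1 positive, 0 neutral, -1 negative), the class name and
   the parameter values are all assumptions made for this sketch; the
   memo leaves the threshold and the sampling rate to operational
   practice.  The persistently negative case would be symmetric,
   sampling from negative packets instead.

```python
import random
from collections import defaultdict


class PositiveFlowMonitor:
    """Hypothetical border monitor for highly positive flows."""

    def __init__(self, sample_prob, threshold, rng=None):
        self.sample_prob = sample_prob      # chance of sampling a positive packet
        self.threshold = threshold          # alarm if positive fraction exceeds this
        self.rng = rng or random.Random()
        self.monitored = set()              # flow keys picked by sampling
        self.total = defaultdict(int)       # octets seen per monitored flow
        self.positive = defaultdict(int)    # positive octets per monitored flow

    def observe(self, src, dst, dscp, length, worth):
        """Process one packet crossing the border interface."""
        key = (src, dst, dscp)
        if key in self.monitored:
            # Subsequent packets of a sampled flow are metered.
            self.total[key] += length
            if worth > 0:
                self.positive[key] += length
        elif worth > 0 and self.rng.random() < self.sample_prob:
            # Sample only from positive packets, so highly positive
            # flows are the most likely to be picked.
            self.monitored.add(key)

    def alarms(self):
        """Return monitored flows whose positive fraction exceeds the threshold."""
        return sorted(
            key for key in self.monitored
            if self.total[key] > 0
            and self.positive[key] / self.total[key] > self.threshold
        )
```

   A flow that is raised by alarms() would be a candidate for the
   management alarm and, optionally, for focused drop.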

   Note that there is no assumption that /users/ behave rationally.  The
   system is protected from the vagaries of irrational user behaviour by
   the ingress gateways, which transform internal penalties into a
   deterministic, admission control mechanism that prevents users from
   misbehaving, by directly engineered means.

6.  Analysis

   The domains in Figure 1 are not expected to be completely malicious
   towards each other.  After all, we can assume that they are all co-
   operating to provide an internetworking service to the benefit of
   each of them and their customers.  Otherwise their routing policies
   would not interconnect them in the first place.  However, we assume
   that they are also competitors of each other.  So a network may try
   to contravene our proposed protocol if it would gain or make a
   competitor lose, or both, but only if it can do so without being
   caught.  Therefore we do not have to consider every possible random
   attack one network could launch on the traffic of another, given
   that one network can always drop or corrupt packets that it forwards
   on behalf of another.

   Therefore, we only consider new opportunities for /gainful/ attack
   that our proposal introduces.  But to a certain extent we can also
   rely on the in-depth defences we have described (Section 5.6.3),
   which are intended to mitigate the potential impact of one network
   accidentally misconfiguring the workings of this protocol.

   The ingress and egress gateways are shown in the most generic
   arrangement possible in Figure 1, without any surrounding network.
   This allows us to consider more specific cases where these gateways
   and a neighbouring network are operated by the same player.  As well
   as cases where the same player operates neighbouring networks, we
   will also consider cases where the two gateways collude as one player
   and where the sender and receiver collude as one.  Collusion of other
   sets of domains is less likely, but we will consider such cases.  In
   the general case, we will assume none of the nine trust domains
   across the figure fully trust any of the others.

   As we only propose to change routers within the Diffserv region, we
   assume the operators of networks outside the region will be doing
   per-flow policing.  That is, we assume the networks outside the
   Diffserv region and the gateways around its edges can protect
   themselves.  So given we are proposing to remove flow policing from
   some networks, our primary concern must be to protect networks that
   don't do per-flow policing (the potential `victims') from those that
   do (the `enemy').  The ingress and egress gateways are the only way
   the outer enemy can get at the middle victim, so we can consider the
   gateways as the representatives of the enemy as far as domains A, B
   and C are concerned.  We will call this trust scenario `edges against
   middles'.

   Earlier in this memo, we outlined the classic border rate policing
   problem (Section 3).  It will now be useful to reiterate the
   motivations that are the root cause of the problem.  The more
   reservations a gateway can allow, the more revenue it receives.  The
   middle networks want the edges to comply with the admission control
   protocol when they become so congested that their service to others
   might suffer.  The middle networks also want to ensure the edges
   cannot steal more service from them than they are entitled to.

   In the context of this `edges against middles' scenario, the re-ECN
   protocol has two main effects:

   o  The more pre-congestion there is on a path across the Diffserv
      region, the higher the ingress gateway must declare downstream
      pre-congestion.

   o  If the ingress gateway does not declare downstream pre-congestion
      high enough on average, it will `hit the ground before the
      runway', going negative and triggering sanctions, either directly
      against the traffic or against the ingress gateway at a management
      level.
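
   The second effect can be made concrete with a little arithmetic.
   The Python sketch below assumes, purely for illustration, that the
   ingress blanks RE on a fixed fraction of octets independently of
   which octets the path later congestion marks, and that a packet that
   is both blanked and marked cancels to neutral.  The function name
   and parameters are hypothetical.

```python
def downstream_balance(volume, declared_fraction, marked_fraction):
    """Positive minus negative octets seen at the end of the path.

    declared_fraction: fraction of octets sent positive by the ingress.
    marked_fraction:   fraction of octets congestion marked on the path.
    Assumes the two are statistically independent (sketch only).
    """
    positive = volume * declared_fraction * (1 - marked_fraction)
    negative = volume * marked_fraction * (1 - declared_fraction)
    return positive - negative   # equals volume * (declared - marked)
```

   Under these assumptions the balance is proportional to the declared
   fraction minus the marked fraction, so an ingress gateway that
   under-declares goes negative before the traffic reaches the egress.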

   An executive summary of our security analysis can be stated in three
   parts, distinguished by the type of collusion considered.

   Neighbour-only Middle-Middle Collusion: Here there is no collusion or
      collusion is limited to neighbours in the feedback loop.  In other
      words, two neighbouring networks can be assumed to act as one.  Or
      the egress gateway might collude with domain C. Or the ingress
      gateway might collude with domain A. Or ingress and egress
      gateways might collude with each other.

      In these cases where only neighbours in the feedback loop collude,
      we conclude that all parties have a positive incentive to declare
      downstream pre-congestion truthfully, and the ingress gateway has
      a positive incentive to invoke admission control when congestion
      rises above the admission threshold in any network in the region
      (including its own).  No party has an incentive to send more
      traffic than declared in reservation signalling (even though only
      the gateways read this signalling).  In short, no party can gain
      at the expense of another.

   Non-neighbour Middle-Middle Collusion: In the case of other forms of
      collusion between middle networks (e.g. between domain A and C) it
      would be possible for say A & C to create a tunnel between
      themselves so that A would gain at the expense of B. But C would
      then lose the gain that A had made.  Therefore the value to A & C
      of colluding to mount this attack seems questionable.  It is made
      more questionable, because the attack can be statistically
      detected by B using the second `defence in depth' mechanism
      mentioned already.  Note that C can defend itself from being
      attacked through a tunnel by treating the tunnel end point as a
      direct link to a neighbouring network (e.g. as if A were a
      neighbour of C, via the tunnel), which falls back to the safety of
      the neighbour-only scenario.

   Middle-Edge Collusion: Collusion between networks or gateways within
      the Diffserv region and networks or users outside the region has
      not yet been fully analysed.  The presence of full per-flow
      policing at the ingress gateway seems to make this a less likely
      source of a successful attack.

   {ToDo: Due to lack of time, the full write up of the security
   analysis is deferred to the next version of this memo.}

   Finally, it is well known that the best person to analyse the
   security of a system is not the designer.  Therefore, our confident
   claims must be hedged with doubt until others with perhaps a greater
   incentive to break it have mounted a full analysis.

7.  Incremental Deployment

   We believe ECN has so far not been widely deployed because it
   requires widespread end system and network deployment just to achieve
   a marginal improvement in performance.  The ability to offer a new
   service (admission control) would be a much stronger driver for ECN
   deployment.

   As stated in the introduction, the aim of this memo is to "build in
   security from the start" when admission control is based on pre-
   congestion notification.  However, the proposal has been designed so
   that security can be added some time after first deployment.  Given
   admission control based on pre-congestion notification requires few
   changes to standards, it should be deployable fairly soon.  However,
   re-ECN requires a change to IP, which may take a little longer.

   We expect that initial deployments of PCN-based admission control
   will be confined to single networks, or to clubs of networks that
   trust each other.  The proposal in this memo will only become
   relevant once networks with conflicting interests wish to
   interconnect their admission controlled services, but without the
   scalability constraints of per-flow border policing.  It will not be
   possible to use re-ECN, even in a controlled environment between
   consenting operators, unless it is standardised into IP.  Given the
   IPv4 header has limited space for further changes, current IESG
   policy [{ToDo: ref?}] is not to allow experimental use of codepoints
   in the IPv4 header, as whenever an experiment isn't taken up, the
   space it used tends to be impossible to reclaim.

   If PCN-based admission control is deployed before re-ECN is
   standardised into IP, wherever a network (or club of networks)
   connects to another network (or club of networks) with conflicting
   interests, they will place a gateway between the two regions that
   does per-flow rate policing and admission control.  If re-ECN is
   eventually standardised into IP, it will be possible for these
   separate regions to upgrade all their gateways to use re-ECN before
   removing the per-flow policing gateways between them.  Given the
   edge-to-edge deployment model of PCN-based admission control, it is
   reasonable to imagine this incremental deployment model without
   needing to cater for partial deployment of re-ECN in just some of the
   gateways around one Diffserv region.

   Only the edge gateways around a Diffserv region have to be upgraded
   to add re-ECN support, not interior routers.  It is also necessary to
   add the mechanisms that use re-ECN to secure a network against
   misbehaving gateways and networks.  Specifically, these are the
   border mechanisms (Section 5.6) and the mechanisms to sanction
   dishonest marking (Section 5.5).

   We also RECOMMEND adding improvements to forwarding on interior
   routers (Section 4.3.4).  But the system works whether all, some or
   none are upgraded, so interior routers may be upgraded in a piecemeal
   fashion at any time.

8.  Design Choices and Rationale

   The primary insight of this work is that downstream congestion is the
   metric that would be most useful to control an internetwork, and
   particularly to police how one network responds to the congestion it
   causes in a remote network.  This is the problem that has previously
   made it so hard to provide scalable admission control.

   The case for using re-feedback (a generalisation of re-ECN) to police
   congestion response and provide QoS is made in [Re-fb].  Essentially,
   the insight is that congestion is a factor that crosses layers from
   the physical upwards.  Therefore re-feedback polices congestion where
   it emerges from a physical interface between networks.  This is
   achieved by bringing the congestion information to the interface,
   rather than examining packet addressing where there is congestion.
   Then congestion crossing the physical interface at a border can be
   policed at the interface, rather than policing the congestion on
   packets that claim to come from an address (which may be spoofed).
   Also, re-feedback works in the network layer independently of other
   layers---despite its name re-feedback does not actually require
   feedback.  It requires a source to act conservatively before it gets
   feedback.

   On the subject of lack of feedback, the feedback not established
   (FNE) codepoint is motivated by arguments for a state set-up bit in
   IP to prevent state exhaustion attacks.  This idea was first put
   forward informally by David Clark and documented by Handley and
   Greenhalgh in [Steps_DoS].  The idea is that network layer datagrams
   should signal explicitly when they require state to be created in the
   network layer or the layer above (e.g. at flow start).  Then a node
   can refuse to create any state unless a datagram declares this
   intent.  We believe the proposed FNE codepoint serves the same
   purpose as the proposed state-set-up bit, but it has been overloaded
   with a more specific purpose, using it on more packets than just the
   first in a flow, but never less (i.e. it is idempotent).  In effect
   the FNE codepoint serves the purpose of a `soft-state set-up
   codepoint'.

   The re-feedback paper [Re-fb] also makes the case for converting the
   economic interpretation of congestion into hard engineering
   mechanisms, which is the basis of the approach used in this memo.
   The admission control gateways around the Diffserv region use hard
   engineering, not incentives, to prevent end users from sending more
   traffic than they have reserved.  Incentive-based mechanisms are only
   used between networks, because they are expected to respond to
   incentives more rationally than end-users can be expected to.
   However, even then, a network can use fail-safes to protect itself
   from excessively unusual behaviour by neighbouring networks, whether
   due to an accidental misconfiguration or malicious intent.

   The guiding principle behind the incentive-based approach used
   between networks is that any gain from subverting the protocol should
   be precisely neutralised, rather than punished.  If a gain is
   punished to a greater extent than is sufficient to neutralise it, it
   will most likely open up a new vulnerability, where the amplifying
   effect of the punishment mechanism can be turned on others.

   The re-feedback paper also makes the case against the use of
   congestion charging to police congestion if it is based on classic
   feedback (where only upstream congestion is visible to network
   elements).  It argues this would open up receiving networks to
   `denial of funds' attacks and would require end users to accept
   dynamic pricing (which few would).

   Re-ECN has been deliberately designed to simplify policing at the
   borders between networks.  These trust boundaries are the critical
   pinch-points that will limit the scalability of the whole
   internetwork unless the overall design minimises the complexity of
   security functions at these borders.  The border mechanisms described
   in this memo run passively in parallel to data forwarding and they do
   not require per-flow processing.

9.  Security Considerations

   This whole memo concerns the security of a scalable admission control
   system, and the analysis section (Section 6) in particular.  Below,
   some specific security issues are mentioned that did not belong
   elsewhere or that comment on the overall robustness of the security
   provided by the design.

   Firstly, we must repeat the statement of applicability in the
   analysis: that we only consider new opportunities for /gainful/
   attack that our proposal introduces, particularly if the attacker can
   avoid being identified.  Despite only involving a few bits, there is
   sufficient complexity in the whole system that there are probably
   numerous possibilities for other attacks.  However, as far as we are
   aware, none reap any benefit to the attacker.  For instance, it would
   be possible for a downstream network to remove the congestion
   markings introduced by an upstream network, but it would only lose
   out on the penalties it could apply to a downstream network.

   When one network forwards a neighbouring network's traffic it will
   always be possible to cause damage by dropping or corrupting it.
   Therefore we do not believe networks would set their routing policies
   to interconnect in the first place if they didn't trust the other
   networks not to arbitrarily damage their traffic.

   Having said this, we do want to highlight some of the weaker parts of
   our argument.  We have argued that networks will be dissuaded from
   faking congestion marking by the possibility that upstream networks
   will route round them.  As we have said, these arguments are based on
   fairly delicate assumptions and will remain fairly tenuous until
   proved in practice, particularly close to the egress where less
   competitive routing is likely.

   We should also point out that the approach in this memo was only
   designed to be robust for admission control.  We do not claim the
   incentives will always be strong enough to force correct flow pre-
   emption behaviour.  This is because a user will tend to perceive much
   greater loss in value if a flow is pre-empted than if admission is
   denied at the start.  However, in general the incentives for correct
   flow pre-emption are similar to those for admission control.

   Finally, it may seem that the 8 codepoints that have been made
   available by extending the ECN field with the RE flag have been used
   rather wastefully.  In effect the RE flag has been used as an
   orthogonal single bit in nearly all cases, the only exception being
   when the ECN field is cleared to "00".  The mapping of the codepoints
   in an earlier version of this proposal used the codepoint space more
   efficiently, but the scheme became vulnerable to a network operator
   focusing its congestion marking to mark more positive than neutral
   packets in order to reduce its penalties.

   With the scheme as now proposed, once the RE flag is set or cleared
   by the sender or its proxy, it should not be written by the network,
   only read.  So the gateways can detect if any network maliciously
   alters the RE flag.  IPSec AH integrity checking does not cover the
   IPv4 option flags (they were considered mutable---even the one we
   propose using for the RE flag that was `currently unused' when IPSec
   was defined).  But it would be sufficient for a pair of gateways to
   make random checks on whether the RE flag was the same when it
   reached the egress gateway as when it left the ingress.  Indeed, if
   IPSec AH had covered the RE flag, any network intending to alter
   sufficient RE flags to make a gain would have focused its alterations
   on packets without authenticating headers (AHs).

   No cryptographic algorithms have been harmed in the making of this
   proposal.

10.  IANA Considerations

   This memo includes no request to IANA.

11.  Conclusions

   This memo builds on a promising technique to solve the classic
   problem of making flow admission control scale to any size network.
   It involves the use of Diffserv in a deployment model that uses pre-
   congestion notification feedback to control admission into a network
   path [CL-deploy].  However as it stands, that deployment model
   depends on all network domains trusting each other to comply with the
   protocols, invoking admission control and flow pre-emption when
   requested.

   We propose that the congestion feedback used in that deployment model
   should be re-echoed into the forward data path, by making a trivial
   modification to the ingress gateway.  We then explain how the
   resulting downstream pre-congestion metric in packets can be
   monitored in bulk at borders to sufficiently emulate flow rate
   policing.

   We claim the result of combining these two approaches is an admission
   control system that scales to any size network /and/ any number of
   interconnected networks, even if they all act in their own interests.

   This proposal aims to convince its readers to "Design in Security
   from the start," by building modified ingress gateways from day one,
   even if border policing is not needed at first.  This way, we will
   not build ourselves tomorrow's legacy problem.

   Re-echoing congestion feedback is based on a principled technique
   called Re-ECN [Re-TCP], designed to add accountability for causing
   congestion to the general-purpose IP datagram service.  Re-ECN
   proposes to consume the last completely unused bit in the basic IPv4
   header.

12.  Acknowledgements

   All the following have given helpful comments and some may become co-
   authors of later drafts: Arnaud Jacquet, Alessandro Salvatori, Steve
   Rudkin, David Songhurst, John Davey, Ian Self, Anthony Sheppard,
   Carla Di Cairano-Gilfedder (BT), Mark Handley (who identified the
   excess canceled packets attack), Stephen Hailes, Adam Greenhalgh
   (UCL), Francois Le Faucheur, Anna Charny (Cisco), Jozef Babiarz,
   Kwok-Ho Chan, Corey Alexander (Nortel), David Clark, Bill Lehr,
   Sharon Gillett, Steve Bauer (MIT) (who publicised various dummy
   traffic attacks), Sally Floyd (ICIR) and comments from participants
   in the CFP/CRN inter-provider QoS and broadband working groups.

13.  Comments Solicited

   Comments and questions are encouraged and very welcome.  They can be
   addressed to the IETF Transport Area working group's mailing list
   <tsvwg@ietf.org>, and/or to the authors.

14.  References

14.1.  Normative References

   [PCN]      Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,
              Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley,
              S., Westberg, L., Bader, A., and G. Karagiannis, "Pre-
              Congestion Notification Marking",
              draft-briscoe-tsvwg-cl-phb-02 (work in progress),
              June 2006.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2211]  Wroclawski, J., "Specification of the Controlled-Load
              Network Element Service", RFC 2211, September 1997.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3246]  Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec,
              J., Courtney, W., Davari, S., Firoiu, V., and D.
              Stiliadis, "An Expedited Forwarding PHB (Per-Hop
              Behavior)", RFC 3246, March 2002.

   [RSVP-ECN]
              Le Faucheur, F., Charny, A., Briscoe, B., Eardley, P.,
              Babiarz, J., and K. Chan, "RSVP Extensions for Admission
              Control over Diffserv using Pre-congestion Notification",
              draft-lefaucheur-rsvp-ecn-01 (work in progress),
              June 2006.

   [Re-TCP]   Briscoe, B., Jacquet, A., and A. Salvatori, "Re-ECN:
              Adding Accountability for Causing Congestion to TCP/IP",
              draft-briscoe-tsvwg-re-ecn-tcp-02 (work in progress),
              June 2006.

14.2.  Informative References

   [CL-deploy]
              Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F.,
              Charny, A., Babiarz, J., Chan, K., Westberg, L., Bader,
              A., and G. Karagiannis, "A Deployment Model for Admission
              Control over DiffServ using Pre-Congestion Notification",
              draft-briscoe-tsvwg-cl-architecture-03 (work in progress),
              June 2006.

   [CLoop_pol]
              Salvatori, A., "Closed Loop Traffic Policing", Politecnico
              Torino and Institut Eurecom Masters Thesis ,
              September 2005.

   [ECN-BGP]  Mortier, R. and I. Pratt, "Incentive Based Inter-Domain
              Routeing", Proc Internet Charging and QoS Technology
              Workshop (ICQT'03) pp308--317, September 2003, <http://
              research.microsoft.com/users/mort/publications.aspx>.

   [ECN-MPLS]
              Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion
              Marking in MPLS", draft-davie-ecn-mpls-00 (work in
              progress), June 2006.

   [IXQoS]    Briscoe, B. and S. Rudkin, "Commercial Models for IP
              Quality of Service Interconnect", BT Technology Journal
              (BTTJ) 23(2)171--195, April 2005,
              <http://www.cs.ucl.ac.uk/staff/B.Briscoe/pubs.html#ixqos>.

   [NSIS-RMD]
              Bader, A., Westberg, L., Karagiannis, G., Kappler, C., and
              T. Phelan, "RMD-QOSM - The Resource Management in Diffserv
              QOS Model", draft-ietf-nsis-rmd-06 (work in progress),
              February 2006.

   [RFC2205]  Braden, B., Zhang, L., Berson, S., Herzog, S., and S.
              Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1
              Functional Specification", RFC 2205, September 1997.

   [RFC2207]  Berger, L. and T. O'Malley, "RSVP Extensions for IPSEC
              Data Flows", RFC 2207, September 1997.

   [RFC2208]  Mankin, A., Baker, F., Braden, B., Bradner, S., O'Dell,
              M., Romanow, A., Weinrib, A., and L. Zhang, "Resource
              ReSerVation Protocol (RSVP) Version 1 Applicability
              Statement Some Guidelines on Deployment", RFC 2208,
              September 1997.

   [RFC2747]  Baker, F., Lindell, B., and M. Talwar, "RSVP Cryptographic
              Authentication", RFC 2747, January 2000.

   [RFC2998]  Bernet, Y., Ford, P., Yavatkar, R., Baker, F., Zhang, L.,
              Speer, M., Braden, R., Davie, B., Wroclawski, J., and E.
              Felstaine, "A Framework for Integrated Services Operation
              over Diffserv Networks", RFC 2998, November 2000.

   [RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit
              Congestion Notification (ECN) Signaling with Nonces",
              RFC 3540, June 2003.

   [Re-fb]    Briscoe, B., Jacquet, A., Di Cairano-Gilfedder, C.,
              Salvatori, A., Soppera, A., and M. Koyabe, "Policing
              Congestion Response in an Internetwork Using Re-Feedback",
              ACM SIGCOMM CCR 35(4)277--288, August 2005, <http://
              www.acm.org/sigs/sigcomm/sigcomm2005/
              techprog.html#session8>.

   [Smart_rtg]
              Goldenberg, D., Qiu, L., Xie, H., Yang, Y., and Y. Zhang,
              "Optimizing Cost and Performance for Multihoming", ACM
              SIGCOMM CCR 34(4)79--92, October 2004,
              <http://citeseer.ist.psu.edu/698472.html>.

   [Steps_DoS]
              Handley, M. and A. Greenhalgh, "Steps towards a DoS-
              resistant Internet Architecture", Proc. ACM SIGCOMM
              workshop on Future directions in network architecture
              (FDNA'04) pp 49--56, August 2004.

Appendix A.  Implementation

A.1.  Ingress Gateway Algorithm for Blanking the RE bit

   The ingress gateway receives regular feedback reporting the fraction
   of congestion marked octets for each aggregate arriving at the
   egress.  So for each aggregate it should blank the RE flag on the
   same fraction of octets.  It is more efficient to calculate the
   reciprocal of this fraction when the signalling arrives, Z_0 = (1 /
   Congestion-Level-Estimate).  Z_0 will be the number of octets of
   packets the ingress should send with the RE flag set between those it
   sends with the RE flag blanked.  Z_0 will also take account of the
   sustainable rate reported during the flow pre-emption process, if
   necessary.

   A suitable pseudo-code algorithm for the ingress gateway is as
   follows:

   ====================================================================
   B_i = 0                 /* interblank volume                     */
   for each PCN-capable packet {
       b = readLength()    /* set b to packet size                  */
       B_i += b            /* accumulate interblank volume          */
       if B_i < b * Z_0 {  /* test whether interblank volume...     */
           writeRE(1)
       } else {            /* ...exceeds blank RE spacing * pkt size*/
           writeRE(0)      /* ...and if so, clear RE                */
           B_i = 0         /* ...and re-set interblank volume       */
       }
   }
   ====================================================================
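
   The pseudo-code above translates almost line for line into the
   following Python sketch, given here only to make the arithmetic easy
   to check.  The function name and the list-of-lengths interface are
   assumptions of this sketch, as is the treatment of a zero
   Congestion-Level-Estimate (no blanking at all).

```python
def blank_re_flags(packet_lengths, congestion_level_estimate):
    """Return the RE flag (1 = set, 0 = blanked) chosen for each packet."""
    lengths = list(packet_lengths)
    if congestion_level_estimate <= 0:
        # Assumption: with no reported pre-congestion, never blank RE.
        return [1] * len(lengths)

    z0 = 1.0 / congestion_level_estimate  # blank-RE spacing, Z_0
    interblank = 0                        # B_i, octets since last blank
    flags = []
    for b in lengths:
        interblank += b                   # accumulate interblank volume
        if interblank < b * z0:
            flags.append(1)               # keep RE set
        else:
            flags.append(0)               # blank RE...
            interblank = 0                # ...and reset the interblank volume
    return flags
```

   With equal-sized packets and a Congestion-Level-Estimate of 0.25
   (Z_0 = 4), every fourth packet is sent with the RE flag blanked, so
   the blanked fraction of octets matches the reported fraction.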

A.2.  Downstream Congestion Metering Algorithms

A.2.1.  Bulk Downstream Congestion Metering Algorithm

   To meter the bulk amount of downstream pre-congestion in traffic
   crossing an inter-domain border, an algorithm is needed that
   accumulates the size of positive packets and subtracts the size of
   negative packets.  We maintain two counters:

      V_b: accumulated pre-congestion volume

      B: total data volume (in case it is needed)

   A suitable pseudo-code algorithm for a border router is as follows:

   ====================================================================
   V_b = 0
   B   = 0
   for each PCN-capable packet {
       b = readLength(packet)      /* set b to packet size          */
       B += b                      /* accumulate total volume       */
       if readEECN(packet) == (Re-Echo || FNE) {
           V_b += b                /* increment...                  */
       } elseif readEECN(packet) == ( AM(-1) || PM(-1) ) {
           V_b -= b                /* ...or decrement V_b...        */
       }                           /*...depending on EECN field     */
   }
   ====================================================================

   At the end of an accounting period this counter V_b represents the
   pre-congestion volume that penalties could be applied to, as
   described in Section 5.3.

   For instance, the accumulated volume of pre-congestion through a
   border interface over a month might be V_b = 5PB (1 petabyte = 10^15
   bytes).  This might have resulted from an average downstream
   pre-congestion level of 1% on an accumulated total data volume of
   B = 500PB.
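
   A minimal Python sketch of the same bulk meter is given below so that
   the counters and the worked figures can be checked.  Representing
   packets as (length, codepoint name) pairs is an assumption of this
   sketch; the codepoint names follow the pseudo-code above, and any
   other name is treated as neutral.

```python
POSITIVE = {"Re-Echo", "FNE"}       # codepoints that increment V_b
NEGATIVE = {"AM(-1)", "PM(-1)"}     # codepoints that decrement V_b


def meter_bulk(packets):
    """Return (V_b, B) for an iterable of (length, eecn_name) pairs."""
    v_b = 0       # accumulated pre-congestion volume
    b_total = 0   # total data volume
    for length, eecn in packets:
        b_total += length
        if eecn in POSITIVE:
            v_b += length
        elif eecn in NEGATIVE:
            v_b -= length
        # any other codepoint is neutral and leaves V_b unchanged
    return v_b, b_total


def average_level(v_b, b_total):
    """Average downstream pre-congestion level over the period."""
    return v_b / b_total
```

   For the figures in the example, average_level(5 * 10**15,
   500 * 10**15) gives 0.01, that is 1%.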

A.2.2.  Inflation Factor for Persistently Negative Flows

   The following process is suggested to complement the simple algorithm
   above in order to protect against the various attacks from
   persistently negative flows described in Section 5.6.1.  As explained
   in that section, the most important and first step is to estimate the
   contribution of persistently negative flows to the bulk volume of
   downstream pre-congestion and to inflate this bulk volume as if these
   flows weren't there.  The process below has been designed to give an
   unbiased estimate, but it may be possible to define other processes
   that achieve similar ends.

   While the above simple metering algorithm is counting the bulk of
   traffic over an accounting period, the meter should also select a
   subset of the whole flow ID space that is small enough to be able to
   realistically measure but large enough to give a realistic sample.
   Many different samples of different subsets of the ID space should be
   taken at different times during the accounting period, preferably
   covering the whole ID space.  During each sample, the meter should
   count the volume of positive packets and subtract the volume of
   negative, maintaining a separate account for each flow in the sample.
   It should run a lot longer than the large majority of flows, to avoid
   a bias from missing the starts and ends of flows, which tend to be
   positive and negative respectively.

   Once the accounting period finishes, the meter should calculate the
   total of the accounts V_{bI} for the subset of flows I in the sample,
   and the total of the accounts V_{fI} excluding flows with a negative
   account from the subset I.  Then the weighted mean of all these
   samples should be taken a_S = sum_{forall I} V_{fI} / sum_{forall I}
   V_{bI}.

   If V_b is the temp counter...    */
           t += T                  /* ...for result of the next timeslot      */
       }
   }
   ====================================================================

   At bulk accounting algorithm over the end of an
   accounting period (Appendix A.2.1) it can be inflated by this counter B_v represents factor
   a_S to get a good unbiased estimate of the
   pre-congestion volume that penalties could be applied to, as
   described in Section 5.2.

   For instance, accumulated volume of pre-congestion through a border
   interface over a month might be B_v = 5PB (petabyte = 10^15 byte).
   This might have resulted from an average downstream pre-congestion
   level of 1% on an accumulated total data volume
   congestion over the accounting period a_S.V_b, without being polluted
   by the effect of B_t = 500PB. persistently negative flows.
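
   The calculation of the inflation factor a_S described above might be
   sketched as follows.  This is only an illustration under stated
   assumptions, not part of the specification: the Sample class, the
   inflation_factor() function and the per-flow dictionary layout are
   hypothetical names introduced here, and the byte values in the usage
   note are invented purely to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One sampled subset I of the flow ID space (hypothetical layout).

    balances maps a flow ID to its signed account in bytes: the volume
    of positive packets minus the volume of negative packets metered
    for that flow while the sample was running.
    """
    balances: dict


def inflation_factor(samples):
    """Return a_S = sum_I V_fI / sum_I V_bI over all samples.

    V_bI is the total of all per-flow accounts in sample I.
    V_fI is the same total excluding flows with a negative balance.
    """
    v_f = 0
    v_b = 0
    for sample in samples:
        accounts = sample.balances.values()
        v_b += sum(accounts)
        v_f += sum(v for v in accounts if v > 0)
    if v_b <= 0:
        # Assumption: with no positive bulk balance the ratio is not
        # meaningful, so the sketch refuses to return a factor.
        raise ValueError("sampled bulk balance is not positive")
    return v_f / v_b


if __name__ == "__main__":
    samples = [
        Sample({"f1": 300, "f2": -100, "f3": 200}),
        Sample({"f4": 500, "f5": -100}),
    ]
    a_s = inflation_factor(samples)
    bulk_volume = 4_000_000  # V_b from the bulk meter (made-up value)
    print(a_s, a_s * bulk_volume)
```

   With these invented figures each sample has V_{bI} = 400 and
   V_{fI} = 500, so a_S = 1000 / 800 = 1.25, and a bulk result of
   V_b = 4,000,000 bytes would be inflated to 5,000,000 bytes.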

A.3.  Algorithm for Sanctioning Negative Traffic

   {ToDo: Write up dropper algorithms similar to Appendix D of [Re-TCP]
   for the negative flow monitor with flow management algorithm and the
   variant with bounded flow state.}

Author's Address

   Bob Briscoe
   BT & UCL
   B54/77, Adastral Park
   Martlesham Heath
   Ipswich  IP5 3RE
   UK

   Phone: +44 1473 645196
   Email: bob.briscoe@bt.com
   URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/

Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2006).  This document is subject
   to the rights, licenses and restrictions contained in BCP 78, and
   except as set forth therein, the authors retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.