draft-briscoe-tsvwg-cl-architecture-03.txt   draft-briscoe-tsvwg-cl-architecture-04.txt 
TSVWG B. Briscoe TSVWG B. Briscoe
Internet Draft P. Eardley Internet Draft P. Eardley
draft-briscoe-tsvwg-cl-architecture-03.txt D. Songhurst draft-briscoe-tsvwg-cl-architecture-04.txt D. Songhurst
Expires: December 2006 BT Expires: April 2007 BT
F. Le Faucheur F. Le Faucheur
A. Charny A. Charny
Cisco Systems, Inc Cisco Systems, Inc
J. Babiarz J. Babiarz
K. Chan K. Chan
S. Dudley S. Dudley
Nortel Nortel
G. Karagiannis G. Karagiannis
University of Twente / Ericsson University of Twente / Ericsson
A. Bader A. Bader
L. Westberg L. Westberg
Ericsson Ericsson
26 June, 2006 25 October, 2006
An edge-to-edge Deployment Model for Pre-Congestion Notification: An edge-to-edge Deployment Model for Pre-Congestion Notification:
Admission Control over a DiffServ Region Admission Control over a DiffServ Region
draft-briscoe-tsvwg-cl-architecture-03.txt draft-briscoe-tsvwg-cl-architecture-04.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 2, line 16 skipping to change at page 2, line 16
This Internet-Draft will expire on September 6, 2006. This Internet-Draft will expire on September 6, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2006). All Rights Reserved. Copyright (C) The Internet Society (2006). All Rights Reserved.
Abstract Abstract
This document describes a deployment model for pre-congestion This document describes a deployment model for pre-congestion
notification (PCN). PCN-based flow admission control and if necessary notification (PCN) operating in a large DiffServ-based region of the
flow pre-emption preserve the Controlled Load service to admitted Internet. PCN-based admission control protects the quality of service
flows. Routers in a large DiffServ-based region of the Internet use of existing flows in normal circumstances, whilst if necessary (e.g.
new pre-congestion notification marking to give early warning of after a large failure) pre-emption of some flows preserves the quality
their own congestion. Gateways around the edges of the region convert of service of the remaining flows. Each link has a configured-
measurements of this packet granularity marking into admission admission-rate and a configured-pre-emption-rate, and a router marks
control and pre-emption functions at flow granularity. Note that packets that exceed these rates. Hence routers give an early warning of
interior routers of the DiffServ-based region do not require flow their own potential congestion, before packets need to be dropped.
state or signalling - they only have to do the bulk packet marking of Gateways around the edges of the PCN-region convert measurements of
PCN. Hence an end-to-end Controlled Load service can be achieved packet rates and their markings into decisions about whether to admit
without any scalability impact on interior routers. new flows, and (if necessary) into the rate of excess traffic that
should be pre-empted. Per-flow admission states are kept at the
gateways only, while the PCN markers that are required for all routers
operate on the aggregate traffic - hence there is no scalability impact
on interior routers.
Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION) Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION)
This document is posted as an Internet-Draft with the intention of This document is posted as an Internet-Draft with the intention of
eventually becoming an INFORMATIONAL RFC. eventually becoming an INFORMATIONAL RFC.
Table of Contents Table of Contents
1. Introduction......................................... 5 1. Introduction................................................5
1.1. Summary......................................... 5 1.1. Summary................................................5
1.1.1. Flow admission control........................ 7 1.2. Key benefits...........................................8
1.1.2. Flow pre-emption............................. 9 1.3. Terminology............................................9
1.1.3. Both admission control and pre-emption.......... 10 1.4. Existing terminology...................................11
1.2. Terminology.................................... 10 1.5. Standardisation requirements...........................11
1.3. Existing terminology............................. 12 1.6. Structure of rest of the document......................12
1.4. Standardisation requirements...................... 12 2. Key aspects of the deployment model.........................13
1.5. Structure of rest of the document.................. 13 2.1. Key goals.............................................13
2. Key aspects of the deployment model..................... 14 2.2. Key assumptions........................................14
2.1. Key goals...................................... 14 3. Deployment model...........................................17
2.2. Key assumptions................................. 15 3.1. Admission control......................................17
2.3. Key benefits ................................... 17 3.1.1. Pre-Congestion Notification for Admission Marking..17
3. Deployment model.................................... 19 3.1.2. Measurements to support admission control..........17
3.1. Admission control ............................... 19
3.1.1. Pre-Congestion Notification for Admission Marking. 19
3.1.2. Measurements to support admission control........ 19
3.1.3. How edge-to-edge admission control supports end-to-end 3.1.3. How edge-to-edge admission control supports end-to-end
QoS signalling ................................... 20 QoS signalling..........................................18
3.1.4. Use case................................... 20 3.1.4. Use case.........................................18
3.2. Flow pre-emption................................ 22 3.2. Flow pre-emption.......................................20
3.2.1. Alerting an ingress gateway that flow pre-emption may be 3.2.1. Alerting an ingress gateway that flow pre-emption may be
needed.......................................... 22 needed..................................................20
3.2.2. Determining the right amount of CL traffic to drop 24 3.2.2. Determining the right amount of CL traffic to drop.23
3.2.3. Use case for flow pre-emption ................. 25 3.2.3. Use case for flow pre-emption.....................24
4. Summary of Functionality.............................. 27 3.3. Both admission control and pre-emption.................25
4.1. Ingress gateways................................ 27 4. Summary of Functionality....................................27
4.2. Interior routers................................ 28 4.1. Ingress gateways.......................................27
4.3. Egress gateways................................. 28 4.2. Interior routers.......................................28
4.4. Failures....................................... 29 4.3. Egress gateways........................................28
5. Limitations and some potential solutions................. 31 4.4. Failures..............................................29
5.1. ECMP.......................................... 31 5. Limitations and some potential solutions....................31
5.2. Beat down effect................................ 33 5.1. ECMP..................................................31
5.3. Bi-directional sessions .......................... 35 5.2. Beat down effect.......................................33
5.4. Global fairness................................. 37 5.3. Bi-directional sessions................................35
5.5. Flash crowds ................................... 39 5.4. Global fairness........................................37
5.6. Pre-empting too fast............................. 40 5.5. Flash crowds..........................................39
5.7. Other potential extensions........................ 42 5.6. Pre-empting too fast...................................41
5.7.1. Tunnelling................................. 42 5.7. Other potential extensions.............................42
5.7.2. Multi-domain and multi-operator usage........... 43 5.7.1. Tunnelling........................................42
5.7.3. Preferential dropping of pre-emption marked packets 43 5.7.2. Multi-domain and multi-operator usage.............43
5.7.4. Adaptive bandwidth for the Controlled Load service 44 5.7.3. Preferential dropping of pre-emption marked packets 44
5.7.4. Adaptive bandwidth for the Controlled Load service.44
5.7.5. Controlled Load service with end-to-end Pre-Congestion 5.7.5. Controlled Load service with end-to-end Pre-Congestion
Notification..................................... 44 Notification............................................45
5.7.6. MPLS-TE ................................... 45 5.7.6. MPLS-TE..........................................45
6. Relationship to other QoS mechanisms.................... 46 6. Relationship to other QoS mechanisms........................46
6.1. IntServ Controlled Load .......................... 46 6.1. IntServ Controlled Load................................46
6.2. Integrated services operation over DiffServ.......... 46 6.2. Integrated services operation over DiffServ............46
6.3. Differentiated Services .......................... 46 6.3. Differentiated Services................................46
6.4. ECN........................................... 47 6.4. ECN...................................................47
6.5. RTECN......................................... 47 6.5. RTECN.................................................47
6.6. RMD........................................... 47 6.6. RMD...................................................48
6.7. RSVP Aggregation over MPLS-TE...................... 48 6.7. RSVP Aggregation over MPLS-TE..........................48
7. Security Considerations............................... 49 6.8. Other Network Admission Control Approaches.............48
8. Acknowledgements.................................... 49 7. Security Considerations.....................................49
9. Comments solicited................................... 49 8. Acknowledgements...........................................49
10. Changes from earlier versions of the draft.............. 50 9. Comments solicited.........................................50
11. Appendices ........................................ 51 10. Changes from earlier versions of the draft.................50
11.1. Appendix A: Explicit Congestion Notification ........ 51 11. Appendices................................................52
11.1. Appendix A: Explicit Congestion Notification..........52
11.2. Appendix B: What is distributed measurement-based admission 11.2. Appendix B: What is distributed measurement-based admission
control?........................................... 52 control?...................................................53
11.3. Appendix C: Calculating the Exponentially weighted moving 11.3. Appendix C: Calculating the Exponentially weighted moving
average (EWMA)...................................... 53 average (EWMA).............................................54
12. References ........................................ 55 12. References................................................56
Authors' Addresses..................................... 60 Authors' Addresses............................................61
Intellectual Property Statement .......................... 62 Intellectual Property Statement................................63
Disclaimer of Validity.................................. 62 Disclaimer of Validity........................................63
Copyright Statement.................................... 62 Copyright Statement...........................................63
1. Introduction 1. Introduction
1.1. Summary 1.1. Summary
This document describes a deployment model to achieve an end-to-end This document describes a deployment model to achieve an end-to-end
Controlled Load service by using (within a large region of the Controlled Load service by using (within a large region of the
Internet) DiffServ and edge-to-edge distributed measurement-based Internet) DiffServ and edge-to-edge distributed measurement-based
admission control and flow pre-emption. Controlled load service is a admission control and flow pre-emption. Controlled load service is a
quality of service (QoS) closely approximating the QoS that the same quality of service (QoS) closely approximating the QoS that the same
flow would receive from a lightly loaded network element [RFC2211]. flow would receive from a lightly loaded network element [RFC2211].
Controlled Load (CL) is useful for inelastic flows such as those for Controlled Load (CL) is useful for inelastic flows such as those for
real-time media. real-time media.
In line with the "IntServ over DiffServ" framework defined in In line with the "IntServ over DiffServ" framework defined in
[RFC2998], the CL service is supported end-to-end and RSVP signalling [RFC2998], the CL service is supported end-to-end and RSVP signalling
[RFC2205] is used end-to-end, over an edge-to-edge DiffServ region. [RFC2205] is used end-to-end, over an edge-to-edge DiffServ region.
We call the DiffServ region the "CL-region".
___ ___ _______________________________________ ____ ___ ___ ___ _______________________________________ ____ ___
| | | | | | | | | |
| | | | |Ingress Interior Egress| | | | | | | | | |Ingress Interior Egress| | | | |
| | | | |gateway routers gateway| | | | | | | | | |gateway routers gateway| | | | |
| | | | |-------+ +-------+ +-------+ +------| | | | | | | | | |-------+ +-------+ +-------+ +------| | | | |
| | | | | PCN- | | PCN- | | PCN- | | | | | | | | | | | | PCN- | | PCN- | | PCN- | | | | | | |
| |..| |..|marking|..|marking|..|marking|..| Meter|..| |..| | | |..| |..|marking|..|marking|..|marking|..| Meter|..| |..| |
| | | | |-------+ +-------+ +-------+ +------| | | | | | | | | |-------+ +-------+ +-------+ +------| | | | |
| | | | | \ / | | | | | | | | | | \ / | | | | |
| | | | | \ / | | | | | | | | | | \ / | | | | |
| | | | | \ Congestion-Level-Estimate / | | | | | | | | | | \ Congestion-Level-Estimate / | | | | |
| | | | | \ (for admission control) / | | | | | | | | | | \ (for admission control) / | | | | |
skipping to change at page 6, line 8 skipping to change at page 6, line 8
<------ edge-to-edge signalling -----> <------ edge-to-edge signalling ----->
(for admission control & flow pre-emption) (for admission control & flow pre-emption)
<-------------------end-to-end QoS signalling protocol---------------> <-------------------end-to-end QoS signalling protocol--------------->
Figure 1: Overall QoS architecture (NB terminology explained later) Figure 1: Overall QoS architecture (NB terminology explained later)
Figure 1 shows an example of an overall QoS architecture, where the Figure 1 shows an example of an overall QoS architecture, where the
two access networks are connected by a CL-region. Another possibility two access networks are connected by a CL-region. Another possibility
is that there are several CL-regions between the access networks - is that there are several CL-regions between the access networks -
each would operate the Pre-Congestion Notification mechanisms each would operate the Pre-Congestion Notification mechanisms
separately. separately. The document assumes RSVP as the end-to-end QoS
signalling protocol. However, the RSVP signalling may itself be
In Section 1.1.1 we summarise how admission of new CL microflows is originated or terminated by proxies still closer to the edge of the
controlled so as to deliver the required QoS. In abnormal network, such as home hubs or the like, triggered in turn by
circumstances, for instance a disaster affecting multiple interior application layer signalling. [RFC2998] and our approach are compared
routers, then the QoS on existing CL microflows may degrade even if further in Section 6.2.
care was exercised when admitting those microflows before those
circumstances. Therefore we also propose a mechanism (summarised in
Section 1.1.2) to pre-empt some of the existing microflows. Then
remaining microflows retain their expected QoS, while improved QoS is
quickly restored to lower priority traffic.
As a fundamental building block to support these two mechanisms, we
introduce "Pre-Congestion Notification". Pre-Congestion Notification
(PCN) builds on the concepts of RFC 3168, "The addition of Explicit
Congestion Notification to IP". The [PCN] document defines the
respective algorithms that determine when a PCN-enabled router marks
a packet with Admission Marking or Pre-emption Marking, depending on
the traffic level.
In order to support CL traffic we would expect PCN to supplement the
existing Expedited Forwarding (EF). Within the controlled edge-to-
edge region, a particular packet receives the Pre-Congestion
Notification (PCN) behaviour if the packet's differentiated services
codepoint (DSCP) is set to EF and also the ECN field indicates ECN
Capable Transport. However, PCN is not only intended to supplement
EF. PCN is specified (in [PCN]) as a building block which can
supplement the scheduling behaviour of other PHBs.
There are various possible ways to encode the markings into a packet,
using the ECN field and perhaps other DSCPs, which are discussed in
[PCN]. In this draft we use the abstract names Admission Marking and
Pre-emption Marking.
This framework assumes that the Pre-Congestion Notification behaviour Flows must enter and leave the CL-region through its ingress and
is used in a controlled environment, i.e. within the controlled edge- egress gateways, and they need traffic descriptors that are policed
to-edge region. by the ingress gateway (NB the policing function is out of this
document's scope). The overall CL-traffic between two border routers
is called a "CL-region-aggregate".
1.1.1. Flow admission control The document introduces a mechanism for flow admission control:
should a new flow be admitted into a specific CL-region-aggregate?
Admission control protects the QoS of existing CL-flows in normal
circumstances. In abnormal circumstances, for instance a disaster
affecting multiple interior routers, then the QoS on existing CL
microflows may degrade even if care was exercised when admitting
those microflows before those circumstances. Therefore we also
propose a mechanism for flow pre-emption: how much traffic, in a
specific CL-region-aggregate, should be pre-empted in order to
preserve the QoS of the remaining CL-flows? Flow pre-emption also
restores QoS to lower priority traffic.
This document describes a new admission control procedure for an As a fundamental building block to enable these two mechanisms, each
edge-to-edge region, which uses new per-hop Pre-Congestion link of the CL-region is associated with a configured-admission-rate
Notification 'admission marking' as a fundamental building block. In and configured-pre-emption-rate; the former is usually significantly
turn, an end-to-end CL service would use this as a building block smaller than the latter. If traffic in a specific DiffServ class
within a broader QoS architecture. ("CL-traffic") on the link exceeds these rates then packets are marked
with "Admission Marking" or "Pre-emption Marking". The algorithms
that determine the number of packets marked are outlined in Section 3
and detailed in [PCN]. PCN marking (Pre-Congestion Notification)
builds on the concepts of RFC 3168, "The addition of Explicit
Congestion Notification to IP" (which is briefly summarised in
Appendix A).
The per-hop, edge-to-edge and end-to-end aspects are now briefly Traffic rate on link ^
introduced in turn. |
| Drop packets
link bandwidth -|---------------------------
|
| Pre-emption Mark packets
configured-pre-emption-rate -|---------------------------
|
| Admission Mark packets
configured-admission-rate -|---------------------------
|
| No marking of packets
|
+---------------------------
Appendix A provides a brief summary of Explicit Congestion Figure 2: Packet Marking by Routers
Notification (ECN) [RFC3168]. It specifies that a router sets the ECN
field to the Congestion Experienced (CE) value as a warning of
incipient congestion. RFC3168 doesn't specify a particular algorithm
for setting the CE codepoint, although Random Early Detection (RED)
is expected to be used.
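
As an illustration only, the thresholds of Figure 2 can be sketched in
Python as below. The function name, the string labels and the use of a
single measured rate per link are assumptions made for this sketch; the
actual marking algorithms, which determine how many packets are marked,
are those detailed in [PCN].

```python
def classify_cl_rate(rate, admission_rate, preemption_rate, link_bandwidth):
    """Return the Figure 2 region that a measured CL traffic rate falls in.

    All four arguments use the same (assumed) unit, e.g. bit/s.
    This is a coarse illustration, not the per-packet algorithm of [PCN].
    """
    # Figure 2 orders the thresholds from bottom to top as:
    # configured-admission-rate, configured-pre-emption-rate, link bandwidth.
    if not (0 <= admission_rate <= preemption_rate <= link_bandwidth):
        raise ValueError("expected admission <= pre-emption <= link bandwidth")

    if rate > link_bandwidth:
        return "drop"
    if rate > preemption_rate:
        return "pre-emption-mark"
    if rate > admission_rate:
        return "admission-mark"
    return "no-marking"
```

For example, with a hypothetical link configured as (60, 80, 100), a
measured rate of 70 falls in the Admission Marking region and a rate of
90 in the Pre-emption Marking region.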
Pre-Congestion Notification (PCN) builds on the concepts of ECN. PCN Gateways of the CL-region make measurements of packet rates and their
introduces a new algorithm that Admission Marks packets before there PCN markings and convert them into decisions about whether to admit
is any significant build-up of CL packets in the queue. Admission new flows, and (if necessary) into the rate of excess traffic that
marked packets therefore act as an "early warning" when the amount of should be pre-empted. These mechanisms are detailed in Section 3 and
packets flowing is getting close to the engineered capacity. Hence it briefly outlined in the next few paragraphs.
can be used with per-hop behaviours (PHBs) designed to operate with
very low queue occupancy, such as Expedited Forwarding (EF). Note
that our use of the ECN field operates across the CL-region, i.e.
edge-to-edge, and not host-to-host as in [RFC3168].
Turning next to the edge-to-edge aspect. All routers within a region The admission control mechanism for a new flow entering the network
of the Internet, which we call the CL-region, apply the PHB used for at ingress gateway G0 and leaving it at egress gateway G1 relies on
CL traffic and the Pre-Congestion Notification behaviour. Traffic feedback from the egress gateway G1 about the existing CL-region-
must enter/leave the CL-region through ingress/egress gateways, which aggregate between G0 and G1. This feedback is generated as follows.
have special functionality. Typically the CL-region is the core or All routers meter the rate of the CL-traffic on their outgoing links
backbone of an operator. The CL service is achieved "edge-to-edge" and mark the packets with the Admission Mark if the configured-
across the CL-region, by using distributed measurement-based admission-rate is exceeded. Egress gateway G1 measures the Admission
admission control: the decision whether to admit a new microflow Marks for each of its CL-region-aggregates separately. If the
depends on a measurement of the existing traffic between the same fraction of traffic on a CL-region-aggregate that is Admission Marked
pair of ingress and egress gateways (i.e. the same pair as the exceeds some threshold, no further flows should be admitted into this
prospective new microflow). (See Appendix B for further discussion on CL-region-aggregate. Because sources vary their data rates (amongst
"What is distributed measurement-based admission control?") other reasons) the rate of the CL-traffic on a link may fluctuate
above and below the configured-admission-rate. Hence to get more
stable information, the egress gateway measures the fraction as a
moving average, called the Congestion-Level-Estimate. This is
signalled from the egress G1 to the ingress G0, to enable the ingress
to block new flows.
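
A minimal sketch of the egress measurement and the ingress decision
follows, assuming an exponentially weighted moving average updated once
per measurement interval. The class name, the weight value and the
per-interval update are assumptions made for this illustration and are
not taken from the draft; the admission test (admit only while the
Congestion-Level-Estimate is below the CLE-threshold) follows the text.

```python
class CongestionLevelEstimate:
    """EWMA of the Admission Marked fraction for one CL-region-aggregate."""

    def __init__(self, weight=0.1):
        # 'weight' is a hypothetical smoothing parameter in (0, 1].
        if not 0 < weight <= 1:
            raise ValueError("weight must be in (0, 1]")
        self.weight = weight
        self.value = 0.0

    def update(self, marked_bits, total_bits):
        """Fold one measurement interval into the moving average."""
        if total_bits <= 0:
            # No CL traffic seen in this interval: keep the old estimate.
            return self.value
        fraction = marked_bits / total_bits
        self.value = (1 - self.weight) * self.value + self.weight * fraction
        return self.value


def admit_new_flow(cle, cle_threshold):
    """Ingress decision: admit only if the CLE is below the threshold."""
    return cle < cle_threshold
```

The egress gateway would hold one such estimate per ingress gateway and
signal its current value to that ingress; the ingress then applies
admit_new_flow() when a new flow requests admission.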
As CL packets travel across the CL-region, routers will admission Admission control seems most useful for the Controlled Load
mark packets (according to the Pre-Congestion Notification algorithm) service. In order to support CL traffic we would expect PCN to
as an "early warning" of potential congestion, i.e. before there is supplement the existing scheduling behaviour Expedited Forwarding
any significant build-up of CL packets in the queue. For traffic from (EF). Since PCN gives an "early warning" of potential congestion
each remote ingress gateway, the CL-region's egress gateway measures (hence "pre-congestion notification"), admission control can kick in
the fraction of CL traffic that is admission marked. The egress before there is any significant build up of packets in routers -
gateway calculates the value on a per bit basis as a moving average which is exactly the performance required for CL. However, PCN is not
(exponentially weighted is suggested), and which we term Congestion- only intended to supplement EF. PCN is specified (in [PCN]) as a
Level-Estimate (CLE). Then it reports it to the CL-region's ingress building block which can supplement the scheduling behaviour of other
gateway, piggy-backed on the signalling for a new flow. The ingress PHBs.
gateway only admits the new CL microflow if the Congestion-Level-
Estimate is less than the value of the CLE-threshold. Hence
previously accepted CL microflows will suffer minimal queuing delay,
jitter and loss.
In turn, the edge-to-edge architecture is a building block in The function to pre-empt flows (or allow the potential to pre-empt
delivering an end-to-end CL service. The approach is similar to that them) relies on feedback from the egress gateway about the CL-region-
described in [RFC2998] for Integrated services operation over aggregates. This feedback is generated as follows. All routers meter
DiffServ networks. Like [RFC2998], an IntServ class (CL in our case) the rate of the CL-traffic on their outgoing links, and if the rate
is achieved end-to-end, with a CL-region viewed as a single is in excess of the configured-pre-emption-rate then packets
reservation hop in the total end-to-end path. Interior routers of the amounting to the excess rate are Pre-emption Marked. If the egress
CL-region do not process flow signalling nor do they hold per flow gateway G1 sees a Pre-emption Marked packet then it measures, for
state. We assume that the end-to-end signalling mechanism is RSVP this CL-region-aggregate, the rate of all received packets that
(Section 2.2). However, the RSVP signalling may itself be originated aren't Pre-emption Marked. This is the rate of CL-traffic that the
or terminated by proxies still closer to the edge of the network, network can actually support from G0 to G1, and we thus call it the
such as home hubs or the like, triggered in turn by application layer Sustainable-Aggregate-Rate. The ingress gateway G0 compares the
signalling. [RFC2998] and our approach are compared further in Sustainable-Aggregate-Rate with the rate that it is sending towards
Section 6.2. G1, and hence determines the required traffic rate reduction. The
document assumes flow pre-emption as the way of reacting to this
information, i.e. stopping sufficient flows to reduce the rate to the
Sustainable-Aggregate-Rate. However, this isn't mandated; for
instance, policy or regulation may prevent pre-emption of some flows.
Such considerations are out of scope of this document.
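
The arithmetic behind the rate reduction can be sketched as follows.
The function names, the measurement interval and the units are
assumptions made for this illustration. Which flows are then stopped
is deliberately not modelled here.

```python
def sustainable_aggregate_rate(unmarked_bits, interval_seconds):
    """Egress side: rate of received CL traffic that is NOT Pre-emption Marked.

    'unmarked_bits' is the volume of such traffic counted over a
    (hypothetical) measurement interval for one CL-region-aggregate.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return unmarked_bits / interval_seconds


def required_rate_reduction(ingress_sending_rate, sustainable_rate):
    """Ingress side: excess of the sending rate over the sustainable rate.

    Returns 0.0 when the aggregate is already within the
    Sustainable-Aggregate-Rate, i.e. nothing needs to be pre-empted.
    """
    return max(0.0, ingress_sending_rate - sustainable_rate)
```

For example, if G1 counts 8,000,000 unmarked bits over 2 seconds, the
Sustainable-Aggregate-Rate is 4 Mbit/s; an ingress G0 sending 5 Mbit/s
towards G1 would then need to shed 1 Mbit/s.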
An important benefit compared with the IntServ over DiffServ model 1.2. Key benefits
[RFC2998] arises from the fact that the load is controlled
dynamically rather than with traffic conditioning agreements (TCAs).
TCAs were originally introduced in the (informational) DiffServ
architecture [RFC2475] as an alternative to reservation processing in
the interior region in order to reduce the burden on interior
routers. With TCAs, in practice service providers rely on
subscription-time Service Level Agreements that statically define the
parameters of the traffic that will be accepted from a customer. The
problem arises because the TCA at the ingress must allow any
destination address, if it is to remain scalable. But for longer
topologies, the chances increase that traffic will focus on an
interior resource, even though it is within contract at the ingress
[Reid], e.g. all flows converge on the same egress gateway. Even
though networks can be engineered to make such failures rare, when
they occur all inelastic flows through the congested resource fail
catastrophically.
Distributed measurement-based admission control avoids reservation We believe that the mechanisms described in this document are simple,
processing (whether per flow or aggregated) on interior routers but scalable, and robust because:
flows are still blocked dynamically in response to actual congestion
on any interior router. Hence there is no need for accurate or
conservative prediction of the traffic matrix.
1.1.2. Flow pre-emption o Per flow state is only required at the ingress gateways to prevent
non-admitted CL traffic from entering the PCN-region. Other
network entities are not aware of individual flows.
An essential QoS issue in core and backbone networks is being able to o For each of its links a router has Admission Marking and Pre-
cope with failures of routers and links. The consequent re-routing emption Marking behaviours. These markers operate on the overall
can cause severe congestion on some links and hence degrade the QoS CL traffic of the respective link. Therefore, there are no
experienced by on-going microflows and other, lower priority traffic. scalability concerns.
Even when the network is engineered to sustain a single link failure,
multiple link failures (e.g. due to a fibre cut, router failure or a
natural disaster) can cause violation of capacity constraints and
resulting QoS failures. Our solution uses rate-based flow pre-
emption, so that sufficient of the previously admitted CL microflows
are dropped to ensure that the remaining ones again receive QoS
commensurate with the CL service and at least some QoS is quickly
restored to other traffic classes.
The solution involves four steps. First, triggering the ingress o The information of these measurements is implicitly signalled to
gateway to test whether pre-emption may be needed. A router enhanced the egress gateways by the marks in the packet headers. No
with Pre-Congestion Notification may optionally include an algorithm protocol actions (explicit messages) are required.
that Pre-emption Marks packets. Reception of a packet with such a
marking alerts the egress gateway that pre-emption may be needed,
which in turn sends a Pre-emption Alert message to the ingress
gateway. Secondly, calculating the right amount of traffic to drop.
This involves the egress gateway measuring, and reporting to the
ingress gateway, the current rate of CL traffic received from that
particular ingress gateway. This is the CL rate which the network can
actually support from that ingress gateway to that egress gateway,
and we thus call it the Sustainable-Aggregate-Rate. The ingress
gateway compares the Sustainable-Aggregate-Rate with the rate that
it is sending and hence determines how much traffic needs to be pre-
empted. Thirdly, choosing which flows to shed in order to drop the
traffic calculated in the second step. Information on the priority of
flows may be held by the ingress gateway, or by some out of band
policy decision point. How these systems co-ordinate to determine
which flows to drop is outside the scope of this document, but
between them they have all the information necessary to make the
decision. Fourthly, tearing down reservations for the chosen flows.
The ingress gateway triggers standard tear-down messages for the
reservation protocol in use. In turn, this is expected to result in
end-systems tearing down the corresponding sessions (e.g. voice
calls) using the corresponding session control protocols.
The focus of this document is on the first two steps, i.e. o The egress gateways make separate measurements of packets for each
determining that pre-emption may be needed and estimating how much ingress gateway. Each meter operates on the overall CL traffic
traffic needs to be pre-empted. We provide some hints about the of a particular CL-region-aggregate. Therefore, there are no
latter two steps in Section 3.2.3, but don't try to provide full scalability concerns as long as the number of ingress gateways is
guidance as it greatly depends on the particular detailed operational not overwhelmingly large.
situation.
The solution operates within a little over one round trip time - the o Feedback signalling is required between all pairs of ingress and
time required for microflow packets that have experienced Pre-emption egress gateways and the signalled information is on the basis of
Marking to travel downstream through the CL-region and arrive at the the corresponding CL-region-aggregate, i.e. it is also unaware of
egress gateway, plus some additional time for the egress gateway to individual flows.
measure the rate seen after it has been alerted that pre-emption may
be needed, and the time for the egress gateway to report this
information to the ingress gateway.
1.1.3. Both admission control and pre-emption o The configured-admission-rates can be chosen small enough that
admitted traffic can still be carried after a rerouting in most
failure cases. This is an important feature as QoS violations in
core networks due to link failures are more likely than QoS
violations due to increased traffic volume.
This document describes both the admission control and pre-emption o The admitted load is controlled dynamically. Therefore it adapts
mechanisms, and we suggest that an operator uses both. However, we do as the traffic matrix changes, and also if the network topology
not require this and some operators may want to implement only one. changes (e.g. after a link failure). Hence an operator can be less
conservative when deploying network capacity, and less accurate in
their prediction of the traffic matrix. Also, controlling the load
using statically provisioned capacity per ingress (regardless of
the egress of a flow), as is typical in the DiffServ architecture
[RFC2475], can lead to focussed overload: many flows happen to
focus on a particular link and then all flows through the
congested link fail catastrophically (Section 6.2).
For example, an operator could use just admission control, solving o The pre-emption function complements admission control. It allows
heavy congestion (caused by re-routing) by 'just waiting' - as the network to recover from sudden unexpected surges of CL-traffic
sessions end, existing microflows naturally depart from the system on some links, thus restoring QoS to the remaining flows. Such
over time, and the admission control mechanism will prevent admission scenarios are very unlikely but not impossible. They can be caused
of new microflows that use the affected links. So the CL-region will by large network failures that redirect lots of admitted CL-
naturally return to normal controlled load service, but with reduced traffic to other links, or by malfunction of the measurement-based
capacity. The drawback of this approach would be that until flows admission control in the presence of admitted flows that send for
naturally depart to relieve the congestion, all flows and lower a while with an atypically low rate and increase their rates in a
priority services will be adversely affected. As another example, an correlated way.
operator could use just admission control, avoiding heavy congestion
(caused by re-routing) by 'capacity planning' - by configuring
admission control thresholds to lower levels than the network could
accept in normal situations such that the load after failure is
expected to stay below acceptable levels even with reduced network
resources.
On the other hand, an operator could just rely for admission control 1.3. Terminology
on the traffic conditioning agreements of the DiffServ architecture
[RFC2475]. The pre-emption mechanism described in this document would
be used to counteract the problem described at the end of Section
1.1.1.
1.2. Terminology EDITOR'S NOTE: Terminology in this document is (hopefully) consistent
with that in [PCN]. However, it may not be consistent with the
terminology in other PCN-related documents. The PCN Working Group (if
formed) will need to agree a single set of terminology.
This terminology is copied from the pre-congestion notification This terminology is copied from the pre-congestion notification
marking draft [PCN]: marking draft [PCN]:
o Pre-Congestion Notification (PCN): two new algorithms that o Pre-Congestion Notification (PCN): two new algorithms that
determine when a PCN-enabled router Admission Marks and Pre- determine when a PCN-enabled router Admission Marks and Pre-
emption Marks a packet, depending on the traffic level. emption Marks a packet, depending on the traffic level.
o Admission Marking condition: the traffic level is such that the o Admission Marking condition: the traffic level is such that the
router Admission Marks packets. The router provides an "early router Admission Marks packets. The router provides an "early
skipping to change at page 12, line 23 skipping to change at page 11, line 27
o Sustainable-Aggregate-Rate: the rate of traffic that the network o Sustainable-Aggregate-Rate: the rate of traffic that the network
can actually support for a specific CL-region-aggregate. So it is can actually support for a specific CL-region-aggregate. So it is
measured by an egress gateway for the CL packets from a particular measured by an egress gateway for the CL packets from a particular
ingress gateway. ingress gateway.
o Ingress-Aggregate-Rate: the rate of traffic that is being sent on o Ingress-Aggregate-Rate: the rate of traffic that is being sent on
a specific CL-region-aggregate. So it is measured by an ingress a specific CL-region-aggregate. So it is measured by an ingress
gateway for the CL packets sent towards a particular egress gateway for the CL packets sent towards a particular egress
gateway. gateway.
1.3. Existing terminology 1.4. Existing terminology
This is a placeholder for useful terminology that is defined This is a placeholder for useful terminology that is defined
elsewhere. elsewhere.
1.4. Standardisation requirements 1.5. Standardisation requirements
The framework described in this document has two new standardisation The framework described in this document has two new standardisation
requirements: requirements:
o new Pre-Congestion Notification for Admission Marking and Pre- o new Pre-Congestion Notification for Admission Marking and Pre-
emption Marking are required, as detailed in [PCN]. emption Marking are required, as detailed in [PCN].
o the end-to-end signalling protocol needs to be modified to carry o the end-to-end signalling protocol needs to be modified to carry
the Congestion-Level-Estimate report (for admission control) and the Congestion-Level-Estimate report (for admission control) and
the Sustainable-Aggregate-Rate (for flow pre-emption). With our the Sustainable-Aggregate-Rate (for flow pre-emption). With our
skipping to change at page 13, line 8 skipping to change at page 12, line 11
detailed in [RSVP-PCN], for example to carry the Congestion-Level- detailed in [RSVP-PCN], for example to carry the Congestion-Level-
Estimate and Sustainable-Aggregate-Rate information from egress Estimate and Sustainable-Aggregate-Rate information from egress
gateway to ingress gateway. gateway to ingress gateway.
o We are discussing what to standardise about the gateway's o We are discussing what to standardise about the gateway's
behaviour. behaviour.
Other than these things, the arrangement uses existing IETF protocols Other than these things, the arrangement uses existing IETF protocols
throughout, although not in their usual architecture. throughout, although not in their usual architecture.
1.5. Structure of rest of the document 1.6. Structure of rest of the document
Section 2 describes some key aspects of the deployment model: our Section 2 describes some key aspects of the deployment model: our
goals, assumptions and the benefits we believe it has. Section 3 goals and assumptions. Section 3 describes the deployment model,
describes the deployment model, whilst Section 4 summarises the whilst Section 4 summarises the required changes to the various
required changes to the various routers in the CL-region. Section 5 routers in the CL-region. Section 5 outlines some limitations of PCN
outlines some limitations of PCN that we've identified in this that we've identified in this deployment model; it also discusses
deployment model; it also discusses some potential solutions, and some potential solutions, and other possible extensions. Section 6
other possible extensions. Section 6 provides some comparison with provides some comparison with existing QoS mechanisms.
existing QoS mechanisms.
2. Key aspects of the deployment model 2. Key aspects of the deployment model
EDITOR'S NOTE: The material in Section 2 will eventually disappear,
as it will be covered by the problem statement of the PCN Working
Group (if formed).
In this section we discuss the key aspects of the deployment model: In this section we discuss the key aspects of the deployment model:
o At a high level, our key goals, i.e. the functionality that we o At a high level, our key goals, i.e. the functionality that we
want to achieve want to achieve
o The assumptions that we're prepared to make o The assumptions that we're prepared to make
o The consequent benefits they bring
2.1. Key goals 2.1. Key goals
The deployment model achieves an end-to-end controlled load (CL) The deployment model achieves an end-to-end controlled load (CL)
service where a segment of the end-to-end path is an edge-to-edge service where a segment of the end-to-end path is an edge-to-edge
Pre-Congestion Notification region. CL is a quality of service (QoS) Pre-Congestion Notification region. CL is a quality of service (QoS)
closely approximating the QoS that the same flow would receive from a closely approximating the QoS that the same flow would receive from a
lightly loaded network element [RFC2211]. It is useful for inelastic lightly loaded network element [RFC2211]. It is useful for inelastic
flows such as those for real-time media. flows such as those for real-time media.
o The CL service should be achieved despite varying load levels of o The CL service should be achieved despite varying load levels of
skipping to change at page 17, line 18 skipping to change at page 17, line 5
Expedited Forwarding's PHB, but supplemented with Pre-Congestion Expedited Forwarding's PHB, but supplemented with Pre-Congestion
Notification. If this is possible, other PHBs (like Assured Notification. If this is possible, other PHBs (like Assured
Forwarding) could be supplemented with the same new behaviours. Forwarding) could be supplemented with the same new behaviours.
This is similar to how RFC3168 ECN was defined to supplement any This is similar to how RFC3168 ECN was defined to supplement any
PHB. PHB.
o Routing: we are looking in greater detail at the solution in the o Routing: we are looking in greater detail at the solution in the
presence of Equal Cost Multi-Path routing and at suitable presence of Equal Cost Multi-Path routing and at suitable
enhancements. See also the 'ECMP' section 5.1 later. enhancements. See also the 'ECMP' section 5.1 later.
2.3. Key benefits
We believe that the mechanism described in this document has several
advantages:
o It achieves statistical guarantees of quality of service for
microflows, delivering a very low delay, jitter and packet loss
service suitable for applications like voice and video calls that
generate real time inelastic traffic. This is because of its per
microflow admission control scheme, combined with its dynamic on-
path "early warning" of potential congestion. The guarantee is at
least as strong as with IntServ Controlled Load (Section 6.1
mentions why the guarantee may be somewhat better), but without
the scalability problems of per-microflow IntServ.
o It can support "Emergency" and military Multi-Level Pre-emption
and Priority (MLPP) services, even in times of heavy congestion
(perhaps caused by failure of a router within the CL-region), by
pre-empting on-going "ordinary CL microflows". See also Section
4.5.
o It scales well, because there is no signal processing or per flow
state held by the interior routers of the CL-region. Note that
interior routers only hold state per outgoing interface - they do
not hold state per CL-region-aggregate nor per flow.
o It is resilient, again because no per flow state is held by the
interior routers of the CL-region. Hence during an interior
routing change caused by a router failure, no microflow state has
to be relocated. The flow pre-emption mechanism further helps
resilience because it rapidly reduces the load to one that the CL-
region can support.
o It helps preserve, through the flow pre-emption mechanism, QoS to
as many microflows as possible and to lower priority traffic in
times of heavy congestion (e.g. caused by failure of an interior
router). Otherwise long-lived microflows could cause loss on all
CL microflows for a long time.
o It avoids the potential catastrophic failure problem when the
DiffServ architecture is used in large networks using statically
provisioned capacity. This is achieved by controlling the load
dynamically, based on edge-to-edge-path real-time measurement of
Pre-Congestion Notification, as discussed in Section 1.1.1.
o It requires minimal new standardisation, because it reuses
existing QoS protocols and algorithms.
o It can be deployed incrementally, region by region or network by
network. Not all the regions or networks on the end-to-end path
need to have it deployed. Two CL-regions can even be separated by
a network that uses another QoS mechanism (e.g. MPLS-TE).
o It provides a deployment path for use of ECN for real-time
applications. Operators can gain experience of ECN before its
applicability to end-systems is understood and end terminals are
ECN capable.
3. Deployment model 3. Deployment model
3.1. Admission control 3.1. Admission control
In this section we describe the admission control mechanism. We In this section we describe the admission control mechanism. We
discuss the three pieces of the solution and then give an example of discuss the three pieces of the solution and then give an example of
how they fit together in a use case: how they fit together in a use case:
o the new Pre-Congestion Notification for Admission Marking used by o the new Pre-Congestion Notification for Admission Marking used by
all routers in the CL-region all routers in the CL-region
skipping to change at page 22, line 19 skipping to change at page 20, line 19
they fit together in a use case: they fit together in a use case:
o How an ingress gateway is triggered to test whether flow pre- o How an ingress gateway is triggered to test whether flow pre-
emption may be needed emption may be needed
o How an ingress gateway determines the right amount of CL traffic o How an ingress gateway determines the right amount of CL traffic
to drop to drop
The mechanism is defined in [PCN] and [RSVP-PCN]. The mechanism is defined in [PCN] and [RSVP-PCN].
Two subsequent steps could be:
o Choose which flows to shed, influenced by their priority and other
policy information
o Tear down the reservations for the chosen flows
We provide some hints about these latter two steps in Section 3.2.3,
but don't try to provide full guidance as it greatly depends on the
particular detailed operational situation.
An essential QoS issue in core and backbone networks is being able to
cope with failures of routers and links. The consequent re-routing
can cause severe congestion on some links and hence degrade the QoS
experienced by on-going microflows and other, lower priority traffic.
Even when the network is engineered to sustain a single link failure,
multiple link failures (e.g. due to a fibre cut, router failure or a
natural disaster) can cause violation of capacity constraints and
resulting QoS failures. Our solution uses rate-based flow
pre-emption, so that enough of the previously admitted CL microflows
are dropped to ensure that the remaining ones again receive QoS
commensurate with the CL service and at least some QoS is quickly
restored to other traffic classes.
3.2.1. Alerting an ingress gateway that flow pre-emption may be needed 3.2.1. Alerting an ingress gateway that flow pre-emption may be needed
Alerting an ingress gateway that flow pre-emption may be needed is a Alerting an ingress gateway that flow pre-emption may be needed is a
two stage process: a router in the CL-region alerts an egress gateway two stage process: a router in the CL-region alerts an egress gateway
that flow pre-emption may be needed; in turn the egress gateway that flow pre-emption may be needed; in turn the egress gateway
alerts the relevant ingress gateway. Every router in the CL-region alerts the relevant ingress gateway. Every router in the CL-region
has the ability to alert egress gateways, which may be done either has the ability to alert egress gateways, which may be done either
explicitly or implicitly: explicitly or implicitly:
o Explicit - the router per-hop behaviour is supplemented with a new o Explicit - the router per-hop behaviour is supplemented with a new
skipping to change at page 23, line 5 skipping to change at page 21, line 30
that packets are pre-emption marked before the actual queue builds that packets are pre-emption marked before the actual queue builds
up. The algorithm's main parameter is the configured-pre-emption- up. The algorithm's main parameter is the configured-pre-emption-
rate, which is set lower than the link speed (but higher than the rate, which is set lower than the link speed (but higher than the
configured-admission-rate). Thus pre-emption marked packets indicate configured-admission-rate). Thus pre-emption marked packets indicate
that the CL traffic rate is reaching the configured-pre-emption-rate that the CL traffic rate is reaching the configured-pre-emption-rate
and so act as an "early warning" that the engineered capacity is and so act as an "early warning" that the engineered capacity is
nearly reached. Therefore they indicate that it may be advisable to nearly reached. Therefore they indicate that it may be advisable to
pre-empt some of the existing CL flows in order to preserve the QoS pre-empt some of the existing CL flows in order to preserve the QoS
of the others. of the others.
Note that the pre-emption marking algorithm doesn't measure the
packets that are already Pre-emption Marked. This ensures that, when
several links are above their configured-pre-emption-rate, the rate
of packets at the egress gateway, excluding Pre-emption Marked ones,
truly does represent the Sustainable-Aggregate-Rate (see below for
explanation).
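As a rough illustration of the measurement described above, the
following Python sketch shows how an egress gateway might estimate the
Sustainable-Aggregate-Rate for one CL-region-aggregate by counting only
packets that are not Pre-emption Marked. The function name, the
(size, marked) packet representation and the fixed measurement interval
are assumptions made for this example; they are not defined in [PCN] or
[RSVP-PCN].

```python
def sustainable_aggregate_rate(packets, interval_seconds):
    """Estimate the Sustainable-Aggregate-Rate in bit/s.

    packets: iterable of (size_bytes, preemption_marked) tuples observed
             for one CL-region-aggregate during the interval
             (hypothetical representation, for illustration only).
    interval_seconds: length of the measurement interval in seconds.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Pre-emption Marked packets are excluded from the measurement.
    unmarked_bytes = sum(size for size, marked in packets if not marked)
    return unmarked_bytes * 8 / interval_seconds


# Example: three packets seen in one second, one of them marked.
observed = [(1000, False), (1000, True), (500, False)]
rate = sustainable_aggregate_rate(observed, 1.0)
```

Here only the two unmarked packets (1500 bytes in total) contribute to
the estimate.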
Note that the explicit mechanism only makes sense if all the routers Note that the explicit mechanism only makes sense if all the routers
in the CL-region have the functionality so that the egress gateways in the CL-region have the functionality so that the egress gateways
can rely on the explicit mechanism. Otherwise there is the danger can rely on the explicit mechanism. Otherwise there is the danger
that the traffic happens to focus on a router without it, and egress that the traffic happens to focus on a router without it, and egress
gateways then have also to watch for implicit pre-emption alerts. gateways then have also to watch for implicit pre-emption alerts.
When one or more packets in a CL-region-aggregate alert the egress When one or more packets in a CL-region-aggregate alert the egress
gateway of the need for flow pre-emption, whether explicitly or gateway of the need for flow pre-emption, whether explicitly or
implicitly, the egress puts that CL-region-aggregate into the Pre- implicitly, the egress puts that CL-region-aggregate into the Pre-
emption Alert state. For each CL-region-aggregate in alert state it emption Alert state. For each CL-region-aggregate in alert state it
skipping to change at page 26, line 8 skipping to change at page 24, line 40
this packet is part of (by using a five-tuple filter and comparing it this packet is part of (by using a five-tuple filter and comparing it
with state installed at admission) and hence which ingress gateway with state installed at admission) and hence which ingress gateway
the packet came from. It sets up a meter to measure the traffic rate the packet came from. It sets up a meter to measure the traffic rate
from this ingress gateway, and as soon as possible sends a message to from this ingress gateway, and as soon as possible sends a message to
the ingress gateway. This message alerts the ingress gateway that the ingress gateway. This message alerts the ingress gateway that
pre-emption may be needed and contains the traffic rate measured by pre-emption may be needed and contains the traffic rate measured by
the egress gateway. Then the ingress gateway determines the traffic the egress gateway. Then the ingress gateway determines the traffic
rate that it is sending towards this egress gateway and hence it can rate that it is sending towards this egress gateway and hence it can
calculate the amount of traffic that needs to be pre-empted. calculate the amount of traffic that needs to be pre-empted.
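The calculation at the ingress gateway can be sketched as follows. This
is a minimal Python illustration only: the function name, the use of
bit/s as the unit and the clamping of the result at zero are
assumptions made for the example, not behaviour specified in [PCN] or
[RSVP-PCN].

```python
def traffic_to_preempt(ingress_aggregate_rate, sustainable_aggregate_rate):
    """Return the rate of CL traffic (bit/s) that needs to be pre-empted.

    ingress_aggregate_rate: rate the ingress gateway is sending towards
        a particular egress gateway.
    sustainable_aggregate_rate: rate reported by that egress gateway.
    """
    excess = ingress_aggregate_rate - sustainable_aggregate_rate
    # Assumption: if the ingress is already sending at or below the
    # sustainable rate, nothing needs to be pre-empted.
    return max(0, excess)


# Example: sending 100 Mbit/s while the egress reports 80 Mbit/s.
to_shed = traffic_to_preempt(100_000_000, 80_000_000)
```

In this example the ingress gateway would need to shed flows totalling
about 20 Mbit/s; which flows it chooses is a separate decision.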
The solution operates within a little over one round trip time - the
time required for microflow packets that have experienced Pre-emption
Marking to travel downstream through the CL-region and arrive at the
egress gateway, plus some additional time for the egress gateway to
measure the rate seen after it has been alerted that pre-emption may
be needed, and the time for the egress gateway to report this
information to the ingress gateway.
The ingress gateway could now just shed random microflows, but it is The ingress gateway could now just shed random microflows, but it is
better if the least important ones are dropped. The ingress gateway better if the least important ones are dropped. The ingress gateway
could use information stored locally in each reservation's state could use information stored locally in each reservation's state
(such as for example the RSVP pre-emption priority of [RSVP- (such as for example the RSVP pre-emption priority of [RSVP-
PREEMPTION] or the RSVP admission priority of [RSVP-EMERGENCY]) as PREEMPTION] or the RSVP admission priority of [RSVP-EMERGENCY]) as
well as information provided by a policy decision point in order to well as information provided by a policy decision point in order to
decide which of the flows to shed (or perhaps which ones not to decide which of the flows to shed (or perhaps which ones not to
shed). This way, flow pre-emption can also help emergency/military shed). This way, flow pre-emption can also help emergency/military
calls by taking into account the corresponding priorities (as calls by taking into account the corresponding priorities (as
conveyed in RSVP policy elements) when selecting calls to be pre- conveyed in RSVP policy elements) when selecting calls to be pre-
skipping to change at page 27, line 5 skipping to change at page 25, line 36
significantly less than the physical line capacity, flow pre-emption significantly less than the physical line capacity, flow pre-emption
may be triggered before any congestion has actually occurred and may be triggered before any congestion has actually occurred and
before any packet is dropped. before any packet is dropped.
We extend the scenario further by imagining that (due to a disaster We extend the scenario further by imagining that (due to a disaster
of some kind) further routers in the CL-region fail during the time of some kind) further routers in the CL-region fail during the time
taken by the pre-emption process described above. This is handled taken by the pre-emption process described above. This is handled
naturally, as packets will continue to be pre-emption marked and so naturally, as packets will continue to be pre-emption marked and so
the pre-emption process will happen for a second time. the pre-emption process will happen for a second time.
3.3. Both admission control and pre-emption
This document describes both the admission control and pre-emption
mechanisms, and we suggest that an operator uses both. However, we do
not require this and some operators may want to implement only one.
For example, an operator could use just admission control, solving
heavy congestion (caused by re-routing) by 'just waiting' - as
sessions end, existing microflows naturally depart from the system
over time, and the admission control mechanism will prevent admission
of new microflows that use the affected links. So the CL-region will
naturally return to normal controlled load service, but with reduced
capacity. The drawback of this approach would be that until flows
naturally depart to relieve the congestion, all flows and lower
priority services will be adversely affected. As another example, an
operator could use just admission control, avoiding heavy congestion
(caused by re-routing) by 'capacity planning' - by configuring
admission control thresholds to lower levels than the network could
accept in normal situations such that the load after failure is
expected to stay below acceptable levels even with reduced network
resources.
On the other hand, an operator could just rely for admission control
on the traffic conditioning agreements of the DiffServ architecture
[RFC2475]. The pre-emption mechanism described in this document would
be used to counteract the problem described at the end of Section
1.1.1.
4. Summary of Functionality 4. Summary of Functionality
This section is intended to provide a systematic summary of the new This section is intended to provide a systematic summary of the new
functionality required by the routers in the CL-region. functionality required by the routers in the CL-region.
A network operator upgrades normal IP routers by: A network operator upgrades normal IP routers by:
o Adding functionality related to admission control and flow pre- o Adding functionality related to admission control and flow pre-
emption to all its ingress and egress gateways emption to all its ingress and egress gateways
skipping to change at page 31, line 13 skipping to change at page 31, line 13
(and, if needed, the pre-emption mechanism) to sort things out. (and, if needed, the pre-emption mechanism) to sort things out.
5. Limitations and some potential solutions 5. Limitations and some potential solutions
In this section we describe various limitations of the deployment In this section we describe various limitations of the deployment
model, and some suggestions about potential ways of alleviating them. model, and some suggestions about potential ways of alleviating them.
The limitations fall into three broad categories: The limitations fall into three broad categories:
o ECMP (Section 5.1): the assumption about routing (Section 2.2) is o ECMP (Section 5.1): the assumption about routing (Section 2.2) is
that all packets between a pair of ingress and egress gateways that all packets between a pair of ingress and egress gateways
follow the same path; ECMP breaks this assumption follow the same path; ECMP breaks this assumption. A study
regarding the accuracy of load balancing schemes can be found in
[LoadBalancing-a] and [LoadBalancing-b].
o The lack of global coordination (Sections 5.2, 5.3 and 5.4): a o The lack of global coordination (Sections 5.2, 5.3 and 5.4): a
decision about admission control or flow pre-emption is made for decision about admission control or flow pre-emption is made for
one aggregate independently of other aggregates one aggregate independently of other aggregates
o Timing and accuracy of measurements (Sections 5.5 and 5.6): the o Timing and accuracy of measurements (Sections 5.5 and 5.6): the
assumption (Section 2.2) that additional load, offered within the assumption (Section 2.2) that additional load, offered within the
reaction time of the measurement-based admission control reaction time of the measurement-based admission control
mechanism, doesn't move the system directly from no congestion to mechanism, doesn't move the system directly from no congestion to
overload (dropping packets). A 'flash crowd' may break this overload (dropping packets). A 'flash crowd' may break this
skipping to change at page 32, line 42 skipping to change at page 32, line 43
or are pre-empted), and there is still the danger that for some or are pre-empted), and there is still the danger that for some
traffic mixes the operator hasn't been cautious enough. traffic mixes the operator hasn't been cautious enough.
o for admission control, probe to obtain a flow-specific congestion- o for admission control, probe to obtain a flow-specific congestion-
level-estimate. Earlier this document suggests continuously level-estimate. Earlier this document suggests continuously
monitoring the congestion-level-estimate. Instead, probe packets monitoring the congestion-level-estimate. Instead, probe packets
could be sent for each prospective new flow. The probe packets could be sent for each prospective new flow. The probe packets
have the same IP address etc. as the data packets would have, and have the same IP address etc. as the data packets would have, and
hence follow the same ECMP path. However, probing is an extra hence follow the same ECMP path. However, probing is an extra
overhead, depending on how many probe packets need to be sent to overhead, depending on how many probe packets need to be sent to
get a sufficiently accurate congestion-level-estimate. get a sufficiently accurate congestion-level-estimate. Probes also
cause a processing overhead, either for the machine at the
destination address or for the egress gateway to identify and
remove the probe packets.
o for flow pre-emption, only select flows for pre-emption from o for flow pre-emption, only select flows for pre-emption from
amongst those that have actually received a Pre-emption Marked amongst those that have actually received a Pre-emption Marked
packet, because these flows must have followed an ECMP path that packet, because these flows must have followed an ECMP path that
goes through an overloaded router. However, it needs some extra goes through an overloaded router. However, it needs some extra
work by the egress gateway, to record this information and report work by the egress gateway, to record this information and report
it to the ingress gateway. it to the ingress gateway.
o for flow pre-emption, a variant of this idea involves introducing o for flow pre-emption, a variant of this idea involves introducing
a new marking behaviour, 'Router Marking'. A router that is pre- a new marking behaviour, 'Router Marking'. A router that is pre-
skipping to change at page 43, line 36 skipping to change at page 43, line 47
(Section 2.2), so that the CL-region could consist of multiple (Section 2.2), so that the CL-region could consist of multiple
domains run by different operators that did not trust each other. domains run by different operators that did not trust each other.
Then only the ingress and egress gateways of the CL-region would take Then only the ingress and egress gateways of the CL-region would take
part in the admission control procedure, i.e. at the ingress to the part in the admission control procedure, i.e. at the ingress to the
first domain and the egress from the final domain. The border routers first domain and the egress from the final domain. The border routers
between operators within the CL-region would only have to do bulk between operators within the CL-region would only have to do bulk
accounting - they wouldn't do per microflow metering and policing, accounting - they wouldn't do per microflow metering and policing,
and they wouldn't take part in signal processing or hold per flow and they wouldn't take part in signal processing or hold per flow
state [Briscoe]. [Re-feedback] explains how a downstream domain can state [Briscoe]. [Re-feedback] explains how a downstream domain can
police that its upstream domain does not 'cheat' by admitting traffic police that its upstream domain does not 'cheat' by admitting traffic
when the downstream path is over-congested. [Re-PCN] proposes how to when the downstream path is congested. [Re-PCN] proposes how to
achieve this with the help of another recently proposed extension to achieve this with the help of another recently proposed extension to
ECN, involving re-echoing ECN feedback [Re-ECN]. ECN, involving re-echoing ECN feedback [Re-ECN].
5.7.3. Preferential dropping of pre-emption marked packets 5.7.3. Preferential dropping of pre-emption marked packets
When the rate of real-time traffic in the specified class exceeds the When the rate of real-time traffic in the specified class exceeds the
maximum configured rate, then a router has to drop some packet(s) maximum configured rate, then a router has to drop some packet(s)
instead of forwarding them on the out-going link. Now when the egress instead of forwarding them on the out-going link. Now when the egress
gateway measures the Sustainable-Aggregate-Rate, neither dropped gateway measures the Sustainable-Aggregate-Rate, neither dropped
packets nor pre-emption marked packets contribute to it. Dropping packets nor pre-emption marked packets contribute to it. Dropping
skipping to change at page 45, line 9 skipping to change at page 45, line 20
aggregation assumption (Section 2.2) doesn't hold. In the extreme it aggregation assumption (Section 2.2) doesn't hold. In the extreme it
may be possible to operate the framework end-to-end, i.e. between end may be possible to operate the framework end-to-end, i.e. between end
hosts. One potential method is to send probe packets to test whether hosts. One potential method is to send probe packets to test whether
the network can support a prospective new CL microflow. The probe the network can support a prospective new CL microflow. The probe
packets would be sent at the same traffic rate as expected for the packets would be sent at the same traffic rate as expected for the
actual microflow, but in order not to disturb existing CL traffic a actual microflow, but in order not to disturb existing CL traffic a
router would always schedule probe packets behind CL ones (compare router would always schedule probe packets behind CL ones (compare
[Breslau00]); this implies they have a new DSCP. Otherwise the [Breslau00]); this implies they have a new DSCP. Otherwise the
routers would treat probe packets identically to CL packets. In order routers would treat probe packets identically to CL packets. In order
to perform admission control quickly, in parts of the network where to perform admission control quickly, in parts of the network where
there are only a few CL microflows, the Pre-Congestion marking there are only a few CL microflows, the algorithm for Admission
behaviour for probe packets would switch from admission marking no Marking described in [PCN] would need to "switch on" very rapidly, i.e.
packets to admission marking them all for only a minimal increase in go from marking no packets to marking them all for only a minimal
load. increase in the size of the virtual queue.
5.7.6. MPLS-TE 5.7.6. MPLS-TE
[ECN-MPLS] discusses how to extend the deployment model to MPLS, i.e. [ECN-MPLS] discusses how to extend the deployment model to MPLS, i.e.
for admission control of microflows into a set of MPLS-TE aggregates for admission control of microflows into a set of MPLS-TE aggregates
(Multi-protocol label switching traffic engineering). It would (Multi-protocol label switching traffic engineering). It would
require that the MPLS header could include the ECN field, which is require that the MPLS header could include the ECN field, which is
not precluded by RFC3270. See [ECN-MPLS]. not precluded by RFC3270. See [ECN-MPLS].
6. Relationship to other QoS mechanisms 6. Relationship to other QoS mechanisms
skipping to change at page 46, line 50 skipping to change at page 46, line 50
indications of network resource availability. In practice, service indications of network resource availability. In practice, service
providers rely on subscription-time Service Level Agreements (SLAs) providers rely on subscription-time Service Level Agreements (SLAs)
that statically define the parameters of the traffic that will be that statically define the parameters of the traffic that will be
accepted from a customer. The CL mechanism allows dynamic reservation accepted from a customer. The CL mechanism allows dynamic reservation
of resources through the DiffServ domain and, with the potential of resources through the DiffServ domain and, with the potential
extension mentioned in Section 5.7.2, it can span multiple domains extension mentioned in Section 5.7.2, it can span multiple domains
without active policing mechanisms at the borders (unlike DiffServ). without active policing mechanisms at the borders (unlike DiffServ).
Therefore we do not use the traffic conditioning agreements (TCAs) of Therefore we do not use the traffic conditioning agreements (TCAs) of
the (informational) DiffServ architecture [RFC2475]. the (informational) DiffServ architecture [RFC2475].
An important benefit arises from the fact that the load is controlled
dynamically rather than with traffic conditioning agreements (TCAs).
TCAs were originally introduced in the (informational) DiffServ
architecture [RFC2475] as an alternative to reservation processing in
the interior region in order to reduce the burden on interior
routers. With TCAs, in practice service providers rely on
subscription-time Service Level Agreements that statically define the
parameters of the traffic that will be accepted from a customer. A
problem arises because the TCA at the ingress must allow any
destination address if it is to remain scalable. For longer
topologies, the chances increase that traffic will concentrate on an
interior resource even though it is within contract at the ingress
[Reid], e.g. when all flows converge on the same egress gateway. Even
though networks can be engineered to make such failures rare, when
they occur all inelastic flows through the congested resource fail
catastrophically.
[Johnson] compares admission control with a 'generously dimensioned' [Johnson] compares admission control with a 'generously dimensioned'
DiffServ network as ways to achieve QoS. The former is recommended. DiffServ network as ways to achieve QoS. The former is recommended.
6.4. ECN 6.4. ECN
The marking behaviour described in this document complies with the The marking behaviour described in this document complies with the
ECN aspects of the IP wire protocol RFC3168, but provides its own ECN aspects of the IP wire protocol RFC3168, but provides its own
edge-to-edge feedback instead of the TCP aspects of RFC3168. All edge-to-edge feedback instead of the TCP aspects of RFC3168. All
routers within the CL-region are upgraded with the admission marking routers within the CL-region are upgraded with the admission marking
and pre-emption marking of Pre-Congestion Notification, so the and pre-emption marking of Pre-Congestion Notification, so the
skipping to change at page 49, line 5 skipping to change at page 48, line 38
Multi-protocol label switching traffic engineering (MPLS-TE) allows Multi-protocol label switching traffic engineering (MPLS-TE) allows
scalable reservation of resources in the core for an aggregate of scalable reservation of resources in the core for an aggregate of
many microflows. To achieve end-to-end reservations, admission many microflows. To achieve end-to-end reservations, admission
control and policing of microflows into the aggregate can be achieved control and policing of microflows into the aggregate can be achieved
using techniques such as RSVP Aggregation over MPLS TE Tunnels as per using techniques such as RSVP Aggregation over MPLS TE Tunnels as per
[AGGRE-TE]. However, in the case of inter-provider environments, [AGGRE-TE]. However, in the case of inter-provider environments,
these techniques require that admission control and policing be these techniques require that admission control and policing be
repeated at each trust boundary or that MPLS TE tunnels span multiple repeated at each trust boundary or that MPLS TE tunnels span multiple
domains. domains.
6.8. Other Network Admission Control Approaches
Link admission control (LAC) describes how admission control (AC) can
be done on a single link. It comprises, e.g., the calculation of
effective bandwidths, which may be the basis for a parameter-based
AC. In contrast, network AC (NAC) describes how AC can be done for a
network and focuses on the locations from which data is gathered for
the admission decision. Most approaches implement a link budget based
NAC (LB NAC), where each link has a certain AC budget. RSVP works
according to that principle, and the new concept likewise admits
additional flows as long as each link on the new flow's path still
has resources available. The border-to-border budget based NAC (BBB
NAC) pre-configures an AC budget for every border-to-border
relationship (= CL-region-aggregate); if this capacity budget is
exhausted, new flows are rejected. The TCA-based admission control
associated with the DiffServ architecture implements an ingress
budget based NAC (IB NAC). These fundamentally different concepts
differ in flexibility and in how efficiently they use link
bandwidths [NAC-a, NAC-b]. They can be made resilient by choosing
the budgets in such a way that the network will not be congested
after rerouting due to a failure. The efficiency of the approaches
differs with and without such resilience requirements.
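The three budget styles named above can be contrasted with a minimal
sketch. It is illustrative only: the function names, the dictionaries
holding loads and budgets, and the use of a single "rate" number per
flow are all assumptions made for the example, not part of any
specification cited here.

```python
# Hypothetical sketch of the three NAC budget styles described above.
# All names and data structures are assumptions for illustration.

def lb_nac_admit(path_links, link_load, link_budget, rate):
    """Link budget NAC: every link on the flow's path must have room."""
    return all(link_load[l] + rate <= link_budget[l] for l in path_links)


def bbb_nac_admit(pair, b2b_load, b2b_budget, rate):
    """Border-to-border budget NAC: one budget per (ingress, egress) pair."""
    return b2b_load.get(pair, 0) + rate <= b2b_budget[pair]


def ib_nac_admit(ingress, ingress_load, ingress_budget, rate):
    """Ingress budget NAC: one budget per ingress, whatever the destination."""
    return ingress_load.get(ingress, 0) + rate <= ingress_budget[ingress]
```

In this sketch the only difference between the approaches is the key
under which the budget is held (link, border-to-border pair, or
ingress), which is what determines where the data for the admission
decision has to be gathered.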
7. Security Considerations 7. Security Considerations
To protect against denial of service attacks, the ingress gateway of To protect against denial of service attacks, the ingress gateway of
the CL-region needs to police all CL packets and drop packets in the CL-region needs to police all CL packets and drop packets in
excess of the reservation. This is similar to operations with excess of the reservation. This is similar to operations with
existing IntServ behaviour. existing IntServ behaviour.
For pre-emption, it is considered acceptable from a security For pre-emption, it is considered acceptable from a security
perspective that the ingress gateway can treat "emergency/military" perspective that the ingress gateway can treat "emergency/military"
CL flows preferentially compared with "ordinary" CL flows. However, CL flows preferentially compared with "ordinary" CL flows. However,
skipping to change at page 49, line 39 skipping to change at page 50, line 5
The admission control mechanism evolved from the work led by Martin The admission control mechanism evolved from the work led by Martin
Karsten on the Guaranteed Stream Provider developed in the M3I Karsten on the Guaranteed Stream Provider developed in the M3I
project [GSPa, GSP-TR], which in turn was based on the theoretical project [GSPa, GSP-TR], which in turn was based on the theoretical
work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano,
Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet
and June Tay (BT) helped develop and evaluate this approach. and June Tay (BT) helped develop and evaluate this approach.
Many thanks to those who have commented on this work at Transport Many thanks to those who have commented on this work at Transport
Area Working Group meetings and on the mailing list, including: Ken Area Working Group meetings and on the mailing list, including: Ken
Carlberg, Ruediger Geib, Lars Westberg, David Black, Robert Hancock, Carlberg, Ruediger Geib, Lars Westberg, David Black, Robert Hancock,
Cornelia Kappler. Cornelia Kappler, Michael Menth.
9. Comments solicited 9. Comments solicited
Comments and questions are encouraged and very welcome. They can be Comments and questions are encouraged and very welcome. They can be
sent to the Transport Area Working Group's mailing list, sent to the Transport Area Working Group's mailing list,
tsvwg@ietf.org, and/or to the authors. tsvwg@ietf.org, and/or to the authors.
10. Changes from earlier versions of the draft 10. Changes from earlier versions of the draft
The main changes are: The main changes are:
skipping to change at page 51, line 5 skipping to change at page 51, line 5
Section 5 has been updated and expanded. It is now about the Section 5 has been updated and expanded. It is now about the
'limitations' of the PCN mechanism, as described in the earlier 'limitations' of the PCN mechanism, as described in the earlier
sections, plus discussion of 'possible solutions' to those sections, plus discussion of 'possible solutions' to those
limitations. limitations.
The measurement of the Congestion-Level-Estimate now includes pre- The measurement of the Congestion-Level-Estimate now includes pre-
emption marked packets as well as admission marked ones. Section emption marked packets as well as admission marked ones. Section
3.1.2 explains. 3.1.2 explains.
From -03 to -04
Detailed review by Michael Menth. In response, Abstract, Summary and
Key benefits sections re-written. Numerous detailed comments on
Section 5 and the following sections.
11. Appendices 11. Appendices
11.1. Appendix A: Explicit Congestion Notification 11.1. Appendix A: Explicit Congestion Notification
This Appendix provides a brief summary of Explicit Congestion This Appendix provides a brief summary of Explicit Congestion
Notification (ECN). Notification (ECN).
[RFC3168] specifies the incorporation of ECN to TCP and IP, including [RFC3168] specifies the incorporation of ECN to TCP and IP, including
ECN's use of two bits in the IP header. It specifies a method for ECN's use of two bits in the IP header. It specifies a method for
indicating incipient congestion to end-hosts (e.g. as in RED, Random indicating incipient congestion to end-hosts (e.g. as in RED, Random
skipping to change at page 52, line 5 skipping to change at page 53, line 7
The CE codepoint '11' is set by a router to indicate congestion to The CE codepoint '11' is set by a router to indicate congestion to
the end hosts. The term 'CE packet' denotes a packet that has the CE the end hosts. The term 'CE packet' denotes a packet that has the CE
codepoint set. codepoint set.
The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and
ECT(1) respectively) are set by the data sender to indicate that the ECT(1) respectively) are set by the data sender to indicate that the
end-points of the transport protocol are ECN-capable. Routers treat end-points of the transport protocol are ECN-capable. Routers treat
the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to
use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a
packet-by-packet basis. The use of both the two codepoints for ECT is packet-by-packet basis. The motivation for having two codepoints (the
motivated primarily by the desire to allow mechanisms for the data 'ECN nonce') is the desire to check two things: for the data sender
sender to verify that network elements are not erasing the CE to verify that network elements are not erasing the CE codepoint; and
codepoint, and that data receivers are properly reporting to the for the data sender to verify that data receivers are properly
sender the receipt of packets with the CE codepoint set. reporting to the sender the receipt of packets with the CE codepoint
set.
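The codepoint values above can be illustrated with a short sketch. It
assumes, following RFC 3168, that the ECN field occupies the two
least-significant bits of the IPv4 TOS (or IPv6 Traffic Class) octet
and that '00' means Not-ECT. The function and table names are
illustrative only.

```python
# Sketch of reading the 2-bit ECN field from the TOS / Traffic Class octet.
# Names are hypothetical; codepoint values are those given in RFC 3168.

ECN_NAMES = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_octet):
    """Return the ECN field (low two bits) of the octet."""
    return tos_octet & 0b11


def is_ce(tos_octet):
    """True if the packet carries the CE codepoint '11'."""
    return ecn_codepoint(tos_octet) == 0b11


def is_ect(tos_octet):
    """True for either ECT codepoint; routers treat them as equivalent."""
    return ecn_codepoint(tos_octet) in (0b01, 0b10)
```

For example, an octet whose upper six bits carry a DSCP and whose
lower two bits are '10' would be reported as ECT(0), independently of
the DSCP value.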
ECN requires support from the transport protocol, in addition to the ECN requires support from the transport protocol, in addition to the
functionality given by the ECN field in the IP packet header. functionality given by the ECN field in the IP packet header.
[RFC3168] addresses the addition of ECN Capability to TCP, specifying [RFC3168] addresses the addition of ECN Capability to TCP, specifying
three new pieces of functionality: negotiation between the endpoints three new pieces of functionality: negotiation between the endpoints
during connection setup to determine if they are both ECN-capable; an during connection setup to determine if they are both ECN-capable; an
ECN-Echo (ECE) flag in the TCP header so that the data receiver can ECN-Echo (ECE) flag in the TCP header so that the data receiver can
inform the data sender when a CE packet has been received; and a inform the data sender when a CE packet has been received; and a
Congestion Window Reduced (CWR) flag in the TCP header so that the Congestion Window Reduced (CWR) flag in the TCP header so that the
data sender can inform the data receiver that the congestion window data sender can inform the data receiver that the congestion window
skipping to change at page 55, line 5 skipping to change at page 56, line 5
bits]n ) bits]n )
[EWMA-AM-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-AM-bits]n [EWMA-AM-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-AM-bits]n
) )
where w' = (1-w)/w. where w' = (1-w)/w.
If w' is arranged to be a power of 2, these per packet algorithms can If w' is arranged to be a power of 2, these per packet algorithms can
be implemented solely with a shift and an add. be implemented solely with a shift and an add.
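The shift-and-add remark can be sketched for a single packet as
follows. This is a sketch under stated assumptions, not the draft's
implementation: B is taken to be 1 for an admission marked packet and
0 otherwise, w' is written as 2**k, and the unscaled update is assumed
to have the usual EWMA form (w * B * bits) + ((1-w) * previous), which
lies outside the text shown here. How the scaled value is carried
forward to the next packet is likewise not covered.

```python
# One-packet sketch of the scaled update with w' = 2**k.
# Function names and the unscaled EWMA form are assumptions.
from fractions import Fraction


def scaled_am_update(prev_am_bits, bits_in_packet, marked, k):
    """(B * bits-in-packet) + (w' * previous), using one shift and one add."""
    b = 1 if marked else 0
    return b * bits_in_packet + (prev_am_bits << k)  # << k multiplies by w'


def unscaled_am_update(prev_am_bits, bits_in_packet, marked, k):
    """Assumed conventional EWMA step with w derived from w' = (1-w)/w."""
    w = Fraction(1, 1 + (1 << k))  # solving 2**k = (1-w)/w for w
    b = 1 if marked else 0
    return w * b * bits_in_packet + (1 - w) * prev_am_bits
```

Under these assumptions the scaled result is the unscaled result
divided by w, so a constant factor of this kind would cancel in any
ratio of two values that are both updated in the scaled form.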
There are alternative possibilities for smoothing out the congestion-
level-estimate. For example [TEWMA] deals better with the issue of
stale information when the traffic rate for
12. References 12. References
A later version will distinguish normative and informative A later version will distinguish normative and informative
references. references.
[AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie, [AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie,
Michael Davenport, Chris Christou, Jerry Ash, Bur Michael Davenport, Chris Christou, Jerry Ash, Bur
Goode, 'Aggregation of RSVP Reservations over MPLS Goode, 'Aggregation of RSVP Reservations over MPLS
TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-03 (work TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-03 (work
[ANSI.MLPP.Spec] American National Standards Institute, [ANSI.MLPP.Spec] American National Standards Institute,
skipping to change at page 56, line 37 skipping to change at page 58, line 4
http://www.kom.e-technik.tu- http://www.kom.e-technik.tu-
darmstadt.de/publications/abstracts/KS02-5.html (May, darmstadt.de/publications/abstracts/KS02-5.html (May,
2002) 2002)
[ITU.MLPP.1990] International Telecommunications Union, "Multilevel [ITU.MLPP.1990] International Telecommunications Union, "Multilevel
Precedence and Pre-emption Service (MLPP)", ITU-T Precedence and Pre-emption Service (MLPP)", ITU-T
Recommendation I.255.3, 1990. Recommendation I.255.3, 1990.
[Johnson] DM Johnson, 'QoS control versus generous [Johnson] DM Johnson, 'QoS control versus generous
dimensioning', BT Technology Journal, Vol 23 No 2, dimensioning', BT Technology Journal, Vol 23 No 2,
[LoadBalancing-a] Ruediger Martin, Michael Menth, and Michael
Hemmkeppler: "Accuracy and Dynamics of Hash-Based Load
Balancing Algorithms for Multipath Internet Routing",
IEEE Broadnets, San Jose, CA, USA, October 2006
http://www3.informatik.uni-
wuerzburg.de/~menth/Publications/Menth06p.pdf
[LoadBalancing-b] Ruediger Martin, Michael Menth, and Michael
Hemmkeppler: "Accuracy and Dynamics of Multi-Stage
Load Balancing for Multipath Internet Routing",
currently under submission http://www3.informatik.uni-
wuerzburg.de/~menth/Publications/Menth07-Sub-6.pdf
[Low] S. Low, L. Andrew, B. Wydrowski, 'Understanding XCP: [Low] S. Low, L. Andrew, B. Wydrowski, 'Understanding XCP:
equilibrium and fairness', IEEE InfoCom 2005 equilibrium and fairness', IEEE InfoCom 2005
[NAC-a] Michael Menth: "Efficient Admission Control and
Routing in Resilient Communication Networks", PhD
thesis, July 2004, http://opus.bibliothek.uni-
wuerzburg.de/opus/volltexte/2004/994/pdf/Menth04.pdf
[NAC-b] Michael Menth, Stefan Kopf, Joachim Charzinski, and
Karl Schrodi: "Resilient Network Admission Control",
currently under submission.
http://www3.informatik.uni-
wuerzburg.de/~menth/Publications/Menth07-Sub-3.pdf
[PCN] B. Briscoe, P. Eardley, D. Songhurst, F. Le Faucheur, [PCN] B. Briscoe, P. Eardley, D. Songhurst, F. Le Faucheur,
A. Charny, V. Liatsos, S. Dudley, J. Babiarz, K. Chan, A. Charny, V. Liatsos, S. Dudley, J. Babiarz, K. Chan,
G. Karagiannis, A. Bader, L. Westberg. 'Pre-Congestion G. Karagiannis, A. Bader, L. Westberg. 'Pre-Congestion
Notification marking', draft-briscoe-tsvwg-cl-phb-02 Notification marking', draft-briscoe-tsvwg-cl-phb-02
(work in progress), June 2006. (work in progress), June 2006.
[Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, [Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori,
'Re-ECN: Adding Accountability for Causing Congestion 'Re-ECN: Adding Accountability for Causing Congestion
to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-01 (work in to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-01 (work in
progress), March 2006. progress), March 2006.
 End of changes. 57 change blocks. 
358 lines changed or deleted 405 lines changed or added