| draft-briscoe-tsvwg-cl-architecture-03.txt | draft-briscoe-tsvwg-cl-architecture-04.txt | |||
|---|---|---|---|---|
| TSVWG B. Briscoe | TSVWG B. Briscoe | |||
| Internet Draft P. Eardley | Internet Draft P. Eardley | |||
| draft-briscoe-tsvwg-cl-architecture-03.txt D. Songhurst | draft-briscoe-tsvwg-cl-architecture-04.txt D. Songhurst | |||
| Expires: December 2006 BT | Expires: April 2007 BT | |||
| F. Le Faucheur | F. Le Faucheur | |||
| A. Charny | A. Charny | |||
| Cisco Systems, Inc | Cisco Systems, Inc | |||
| J. Babiarz | J. Babiarz | |||
| K. Chan | K. Chan | |||
| S. Dudley | S. Dudley | |||
| Nortel | Nortel | |||
| G. Karagiannis | G. Karagiannis | |||
| University of Twente / Ericsson | University of Twente / Ericsson | |||
| A. Bader | A. Bader | |||
| L. Westberg | L. Westberg | |||
| Ericsson | Ericsson | |||
| 26 June, 2006 | 25 October, 2006 | |||
| An edge-to-edge Deployment Model for Pre-Congestion Notification: | An edge-to-edge Deployment Model for Pre-Congestion Notification: | |||
| Admission Control over a DiffServ Region | Admission Control over a DiffServ Region | |||
| draft-briscoe-tsvwg-cl-architecture-03.txt | draft-briscoe-tsvwg-cl-architecture-04.txt | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 2, line 16 | skipping to change at page 2, line 16 | |||
| This Internet-Draft will expire on September 6, 2006. | This Internet-Draft will expire on September 6, 2006. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The Internet Society (2006). All Rights Reserved. | Copyright (C) The Internet Society (2006). All Rights Reserved. | |||
| Abstract | Abstract | |||
| This document describes a deployment model for pre-congestion | This document describes a deployment model for pre-congestion | |||
| notification (PCN). PCN-based flow admission control and if necessary | notification (PCN) operating in a large DiffServ-based region of the | |||
| flow pre-emption preserve the Controlled Load service to admitted | Internet. PCN-based admission control protects the quality of service | |||
| flows. Routers in a large DiffServ-based region of the Internet use | of existing flows in normal circumstances, whilst if necessary (eg | |||
| new pre-congestion notification marking to give early warning of | after a large failure) pre-emption of some flows preserves the quality | |||
| their own congestion. Gateways around the edges of the region convert | of service of the remaining flows. Each link has a configured- | |||
| measurements of this packet granularity marking into admission | admission-rate and a configured-pre-emption-rate, and a router marks | |||
| control and pre-emption functions at flow granularity. Note that | packets that exceed these rates. Hence routers give an early warning of | |||
| interior routers of the DiffServ-based region do not require flow | their own potential congestion, before packets need to be dropped. | |||
| state or signalling - they only have to do the bulk packet marking of | Gateways around the edges of the PCN-region convert measurements of | |||
| PCN. Hence an end-to-end Controlled Load service can be achieved | packet rates and their markings into decisions about whether to admit | |||
| without any scalability impact on interior routers. | new flows, and (if necessary) into the rate of excess traffic that | |||
| | should be pre-empted. Per-flow admission states are kept at the | |||
| | gateways only, while the PCN markers that are required for all routers | |||
| | operate on the aggregate traffic - hence there is no scalability impact | |||
| | on interior routers. | |||
| Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION) | Authors' Note (TO BE DELETED BY THE RFC EDITOR UPON PUBLICATION) | |||
| This document is posted as an Internet-Draft with the intention of | This document is posted as an Internet-Draft with the intention of | |||
| eventually becoming an INFORMATIONAL RFC. | eventually becoming an INFORMATIONAL RFC. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction......................................... 5 | 1. Introduction................................................5 | |||
| 1.1. Summary......................................... 5 | 1.1. Summary................................................5 | |||
| 1.1.1. Flow admission control........................ 7 | 1.2. Key benefits...........................................8 | |||
| 1.1.2. Flow pre-emption............................. 9 | 1.3. Terminology............................................9 | |||
| 1.1.3. Both admission control and pre-emption.......... 10 | 1.4. Existing terminology...................................11 | |||
| 1.2. Terminology.................................... 10 | 1.5. Standardisation requirements...........................11 | |||
| 1.3. Existing terminology............................. 12 | 1.6. Structure of rest of the document......................12 | |||
| 1.4. Standardisation requirements...................... 12 | 2. Key aspects of the deployment model.........................13 | |||
| 1.5. Structure of rest of the document.................. 13 | 2.1. Key goals.............................................13 | |||
| 2. Key aspects of the deployment model..................... 14 | 2.2. Key assumptions........................................14 | |||
| 2.1. Key goals...................................... 14 | 3. Deployment model...........................................17 | |||
| 2.2. Key assumptions................................. 15 | 3.1. Admission control......................................17 | |||
| 2.3. Key benefits ................................... 17 | 3.1.1. Pre-Congestion Notification for Admission Marking..17 | |||
| 3. Deployment model.................................... 19 | 3.1.2. Measurements to support admission control..........17 | |||
| 3.1. Admission control ............................... 19 | ||||
| 3.1.1. Pre-Congestion Notification for Admission Marking. 19 | ||||
| 3.1.2. Measurements to support admission control........ 19 | ||||
| 3.1.3. How edge-to-edge admission control supports end-to-end | 3.1.3. How edge-to-edge admission control supports end-to-end | |||
| QoS signalling ................................... 20 | QoS signalling..........................................18 | |||
| 3.1.4. Use case................................... 20 | 3.1.4. Use case.........................................18 | |||
| 3.2. Flow pre-emption................................ 22 | 3.2. Flow pre-emption.......................................20 | |||
| 3.2.1. Alerting an ingress gateway that flow pre-emption may be | 3.2.1. Alerting an ingress gateway that flow pre-emption may be | |||
| needed.......................................... 22 | needed..................................................20 | |||
| 3.2.2. Determining the right amount of CL traffic to drop 24 | 3.2.2. Determining the right amount of CL traffic to drop.23 | |||
| 3.2.3. Use case for flow pre-emption ................. 25 | 3.2.3. Use case for flow pre-emption.....................24 | |||
| 4. Summary of Functionality.............................. 27 | 3.3. Both admission control and pre-emption.................25 | |||
| 4.1. Ingress gateways................................ 27 | 4. Summary of Functionality....................................27 | |||
| 4.2. Interior routers................................ 28 | 4.1. Ingress gateways.......................................27 | |||
| 4.3. Egress gateways................................. 28 | 4.2. Interior routers.......................................28 | |||
| 4.4. Failures....................................... 29 | 4.3. Egress gateways........................................28 | |||
| 5. Limitations and some potential solutions................. 31 | 4.4. Failures..............................................29 | |||
| 5.1. ECMP.......................................... 31 | 5. Limitations and some potential solutions....................31 | |||
| 5.2. Beat down effect................................ 33 | 5.1. ECMP..................................................31 | |||
| 5.3. Bi-directional sessions .......................... 35 | 5.2. Beat down effect.......................................33 | |||
| 5.4. Global fairness................................. 37 | 5.3. Bi-directional sessions................................35 | |||
| 5.5. Flash crowds ................................... 39 | 5.4. Global fairness........................................37 | |||
| 5.6. Pre-empting too fast............................. 40 | 5.5. Flash crowds..........................................39 | |||
| 5.7. Other potential extensions........................ 42 | 5.6. Pre-empting too fast...................................41 | |||
| 5.7.1. Tunnelling................................. 42 | 5.7. Other potential extensions.............................42 | |||
| 5.7.2. Multi-domain and multi-operator usage........... 43 | 5.7.1. Tunnelling........................................42 | |||
| 5.7.3. Preferential dropping of pre-emption marked packets43 | 5.7.2. Multi-domain and multi-operator usage.............43 | |||
| 5.7.4. Adaptive bandwidth for the Controlled Load service 44 | 5.7.3. Preferential dropping of pre-emption marked packets44 | |||
| | 5.7.4. Adaptive bandwidth for the Controlled Load service.44 | |||
| 5.7.5. Controlled Load service with end-to-end Pre-Congestion | 5.7.5. Controlled Load service with end-to-end Pre-Congestion | |||
| Notification..................................... 44 | Notification............................................45 | |||
| 5.7.6. MPLS-TE ................................... 45 | 5.7.6. MPLS-TE..........................................45 | |||
| 6. Relationship to other QoS mechanisms.................... 46 | 6. Relationship to other QoS mechanisms........................46 | |||
| 6.1. IntServ Controlled Load .......................... 46 | 6.1. IntServ Controlled Load................................46 | |||
| 6.2. Integrated services operation over DiffServ.......... 46 | 6.2. Integrated services operation over DiffServ............46 | |||
| 6.3. Differentiated Services .......................... 46 | 6.3. Differentiated Services................................46 | |||
| 6.4. ECN........................................... 47 | 6.4. ECN...................................................47 | |||
| 6.5. RTECN......................................... 47 | 6.5. RTECN.................................................47 | |||
| 6.6. RMD........................................... 47 | 6.6. RMD...................................................48 | |||
| 6.7. RSVP Aggregation over MPLS-TE...................... 48 | 6.7. RSVP Aggregation over MPLS-TE..........................48 | |||
| 7. Security Considerations............................... 49 | 6.8. Other Network Admission Control Approaches.............48 | |||
| 8. Acknowledgements.................................... 49 | 7. Security Considerations.....................................49 | |||
| 9. Comments solicited................................... 49 | 8. Acknowledgements...........................................49 | |||
| 10. Changes from earlier versions of the draft.............. 50 | 9. Comments solicited.........................................50 | |||
| 11. Appendices ........................................ 51 | 10. Changes from earlier versions of the draft.................50 | |||
| 11.1. Appendix A: Explicit Congestion Notification ........ 51 | 11. Appendices................................................52 | |||
| | 11.1. Appendix A: Explicit Congestion Notification..........52 | |||
| 11.2. Appendix B: What is distributed measurement-based admission | 11.2. Appendix B: What is distributed measurement-based admission | |||
| control?........................................... 52 | control?...................................................53 | |||
| 11.3. Appendix C: Calculating the Exponentially weighted moving | 11.3. Appendix C: Calculating the Exponentially weighted moving | |||
| average (EWMA)...................................... 53 | average (EWMA).............................................54 | |||
| 12. References ........................................ 55 | 12. References................................................56 | |||
| Authors' Addresses..................................... 60 | Authors' Addresses............................................61 | |||
| Intellectual Property Statement .......................... 62 | Intellectual Property Statement................................63 | |||
| Disclaimer of Validity.................................. 62 | Disclaimer of Validity........................................63 | |||
| Copyright Statement.................................... 62 | Copyright Statement...........................................63 | |||
| 1. Introduction | 1. Introduction | |||
| 1.1. Summary | 1.1. Summary | |||
| This document describes a deployment model to achieve an end-to-end | This document describes a deployment model to achieve an end-to-end | |||
| Controlled Load service by using (within a large region of the | Controlled Load service by using (within a large region of the | |||
| Internet) DiffServ and edge-to-edge distributed measurement-based | Internet) DiffServ and edge-to-edge distributed measurement-based | |||
| admission control and flow pre-emption. Controlled load service is a | admission control and flow pre-emption. Controlled load service is a | |||
| quality of service (QoS) closely approximating the QoS that the same | quality of service (QoS) closely approximating the QoS that the same | |||
| flow would receive from a lightly loaded network element [RFC2211]. | flow would receive from a lightly loaded network element [RFC2211]. | |||
| Controlled Load (CL) is useful for inelastic flows such as those for | Controlled Load (CL) is useful for inelastic flows such as those for | |||
| real-time media. | real-time media. | |||
| In line with the "IntServ over DiffServ" framework defined in | In line with the "IntServ over DiffServ" framework defined in | |||
| [RFC2998], the CL service is supported end-to-end and RSVP signalling | [RFC2998], the CL service is supported end-to-end and RSVP signalling | |||
| [RFC2205] is used end-to-end, over an edge-to-edge DiffServ region. | [RFC2205] is used end-to-end, over an edge-to-edge DiffServ region. | |||
| | We call the DiffServ region the "CL-region". | |||
| ___ ___ _______________________________________ ____ ___ | ___ ___ _______________________________________ ____ ___ | |||
| | | | | | | | | | | | ||||
| | | | | |Ingress Interior Egress| | | | | | | | | | |Ingress Interior Egress| | | | | | |||
| | | | | |gateway routers gateway| | | | | | | | | | |gateway routers gateway| | | | | | |||
| | | | | |-------+ +-------+ +-------+ +------| | | | | | | | | | |-------+ +-------+ +-------+ +------| | | | | | |||
| | | | | | PCN- | | PCN- | | PCN- | | | | | | | | | | | | | PCN- | | PCN- | | PCN- | | | | | | | | |||
| | |..| |..|marking|..|marking|..|marking|..| Meter|..| |..| | | | |..| |..|marking|..|marking|..|marking|..| Meter|..| |..| | | |||
| | | | | |-------+ +-------+ +-------+ +------| | | | | | | | | | |-------+ +-------+ +-------+ +------| | | | | | |||
| | | | | | \ / | | | | | | | | | | | \ / | | | | | | |||
| | | | | | \ / | | | | | | | | | | | \ / | | | | | | |||
| | | | | | \ Congestion-Level-Estimate / | | | | | | | | | | | \ Congestion-Level-Estimate / | | | | | | |||
| | | | | | \ (for admission control) / | | | | | | | | | | | \ (for admission control) / | | | | | | |||
| skipping to change at page 6, line 8 | skipping to change at page 6, line 8 | |||
| <------ edge-to-edge signalling -----> | <------ edge-to-edge signalling -----> | |||
| (for admission control & flow pre-emption) | (for admission control & flow pre-emption) | |||
| <-------------------end-to-end QoS signalling protocol---------------> | <-------------------end-to-end QoS signalling protocol---------------> | |||
| Figure 1: Overall QoS architecture (NB terminology explained later) | Figure 1: Overall QoS architecture (NB terminology explained later) | |||
| Figure 1 shows an example of an overall QoS architecture, where the | Figure 1 shows an example of an overall QoS architecture, where the | |||
| two access networks are connected by a CL-region. Another possibility | two access networks are connected by a CL-region. Another possibility | |||
| is that there are several CL-regions between the access networks - | is that there are several CL-regions between the access networks - | |||
| each would operate the Pre-Congestion Notification mechanisms | each would operate the Pre-Congestion Notification mechanisms | |||
| separately. | separately. The document assumes RSVP as the end-to-end QoS | |||
| | signalling protocol. However, the RSVP signalling may itself be | |||
| In Section 1.1.1 we summarise how admission of new CL microflows is | originated or terminated by proxies still closer to the edge of the | |||
| controlled so as to deliver the required QoS. In abnormal | network, such as home hubs or the like, triggered in turn by | |||
| circumstances, for instance a disaster affecting multiple interior | application layer signalling. [RFC2998] and our approach are compared | |||
| routers, then the QoS on existing CL microflows may degrade even if | further in Section 6.2. | |||
| care was exercised when admitting those microflows before those | ||||
| circumstances. Therefore we also propose a mechanism (summarised in | ||||
| Section 1.1.2) to pre-empt some of the existing microflows. Then | ||||
| remaining microflows retain their expected QoS, while improved QoS is | ||||
| quickly restored to lower priority traffic. | ||||
| As a fundamental building block to support these two mechanisms, we | ||||
| introduce "Pre-Congestion Notification". Pre-Congestion Notification | ||||
| (PCN) builds on the concepts of RFC 3168, "The addition of Explicit | ||||
| Congestion Notification to IP". The [PCN] document defines the | ||||
| respective algorithms that determine when a PCN-enabled router marks | ||||
| a packet with Admission Marking or Pre-emption Marking, depending on | ||||
| the traffic level. | ||||
| In order to support CL traffic we would expect PCN to supplement the | ||||
| existing Expedited Forwarding (EF). Within the controlled edge-to- | ||||
| edge region, a particular packet receives the Pre-Congestion | ||||
| Notification (PCN) behaviour if the packet's differentiated services | ||||
| codepoint (DSCP) is set to EF and also the ECN field indicates ECN | ||||
| Capable Transport. However, PCN is not only intended to supplement | ||||
| EF. PCN is specified (in [PCN]) as a building block which can | ||||
| supplement the scheduling behaviour of other PHBs. | ||||
| There are various possible ways to encode the markings into a packet, | ||||
| using the ECN field and perhaps other DSCPs, which are discussed in | ||||
| [PCN]. In this draft we use the abstract names Admission Marking and | ||||
| Pre-emption Marking. | ||||
| This framework assumes that the Pre-Congestion Notification behaviour | Flows must enter and leave the CL-region through its ingress and | |||
| is used in a controlled environment, i.e. within the controlled edge- | egress gateways, and they need traffic descriptors that are policed | |||
| to-edge region. | by the ingress gateway (NB the policing function is out of this | |||
| | document's scope). The overall CL-traffic between two border routers | |||
| | is called a "CL-region-aggregate". | |||
| 1.1.1. Flow admission control | The document introduces a mechanism for flow admission control: | |||
| | should a new flow be admitted into a specific CL-region-aggregate? | |||
| | Admission control protects the QoS of existing CL-flows in normal | |||
| | circumstances. In abnormal circumstances, for instance a disaster | |||
| | affecting multiple interior routers, then the QoS on existing CL | |||
| | microflows may degrade even if care was exercised when admitting | |||
| | those microflows before those circumstances. Therefore we also | |||
| | propose a mechanism for flow pre-emption: how much traffic, in a | |||
| | specific CL-region-aggregate, should be pre-empted in order to | |||
| | preserve the QoS of the remaining CL-flows? Flow pre-emption also | |||
| | restores QoS to lower priority traffic. | |||
| This document describes a new admission control procedure for an | As a fundamental building block to enable these two mechanisms, each | |||
| edge-to-edge region, which uses new per-hop Pre-Congestion | link of the CL-region is associated with a configured-admission-rate | |||
| Notification 'admission marking' as a fundamental building block. In | and configured-pre-emption-rate; the former is usually significantly | |||
| turn, an end-to-end CL service would use this as a building block | smaller than the latter. If traffic in a specific DiffServ class ("CL- | |||
| within a broader QoS architecture. | traffic") on the link exceeds these rates then packets are marked | |||
| | with "Admission Marking" or "Pre-emption Marking". The algorithms | |||
| | that determine the number of packets marked are outlined in Section 3 | |||
| | and detailed in [PCN]. PCN marking (Pre-Congestion Notification) | |||
| | builds on the concepts of RFC 3168, "The addition of Explicit | |||
| | Congestion Notification to IP" (which is briefly summarised in | |||
| | Appendix A). | |||
| The per-hop, edge-to-edge and end-to-end aspects are now briefly | Traffic rate on link ^ | |||
| introduced in turn. | \| | |||
| | \| Drop packets | |||
| | link bandwidth -\|--------------------------- | |||
| | \| | |||
| | \| Pre-emption Mark packets | |||
| | configured-pre-emption-rate -\|--------------------------- | |||
| | \| | |||
| | \| Admission Mark packets | |||
| | configured-admission-rate -\|--------------------------- | |||
| | \| | |||
| | \| No marking of packets | |||
| | \| | |||
| | +--------------------------- | |||
| Appendix A provides a brief summary of Explicit Congestion | Figure 2: Packet Marking by Routers | |||
| Notification (ECN) [RFC3168]. It specifies that a router sets the ECN | ||||
| field to the Congestion Experienced (CE) value as a warning of | ||||
| incipient congestion. RFC3168 doesn't specify a particular algorithm | ||||
| for setting the CE codepoint, although Random Early Detection (RED) | ||||
| is expected to be used. | ||||
| Pre-Congestion Notification (PCN) builds on the concepts of ECN. PCN | Gateways of the CL-region make measurements of packet rates and their | |||
| introduces a new algorithm that Admission Marks packets before there | PCN markings and convert them into decisions about whether to admit | |||
| is any significant build-up of CL packets in the queue. Admission | new flows, and (if necessary) into the rate of excess traffic that | |||
| marked packets therefore act as an "early warning" when the amount of | should be pre-empted. These mechanisms are detailed in Section 3 and | |||
| packets flowing is getting close to the engineered capacity. Hence it | briefly outlined in the next few paragraphs. | |||
| can be used with per-hop behaviours (PHBs) designed to operate with | ||||
| very low queue occupancy, such as Expedited Forwarding (EF). Note | ||||
| that our use of the ECN field operates across the CL-region, i.e. | ||||
| edge-to-edge, and not host-to-host as in [RFC3168]. | ||||
| Turning next to the edge-to-edge aspect. All routers within a region | The admission control mechanism for a new flow entering the network | |||
| of the Internet, which we call the CL-region, apply the PHB used for | at ingress gateway G0 and leaving it at egress gateway G1 relies on | |||
| CL traffic and the Pre-Congestion Notification behaviour. Traffic | feedback from the egress gateway G1 about the existing CL-region- | |||
| must enter/leave the CL-region through ingress/egress gateways, which | aggregate between G0 and G1. This feedback is generated as follows. | |||
| have special functionality. Typically the CL-region is the core or | All routers meter the rate of the CL-traffic on their outgoing links | |||
| backbone of an operator. The CL service is achieved "edge-to-edge" | and mark the packets with the Admission Mark if the configured- | |||
| across the CL-region, by using distributed measurement-based | admission-rate is exceeded. Egress gateway G1 measures the Admission | |||
| admission control: the decision whether to admit a new microflow | Marks for each of its CL-region-aggregates separately. If the | |||
| depends on a measurement of the existing traffic between the same | fraction of traffic on a CL-region-aggregate that is Admission Marked | |||
| pair of ingress and egress gateways (i.e. the same pair as the | exceeds some threshold, no further flows should be admitted into this | |||
| prospective new microflow). (See Appendix B for further discussion on | CL-region-aggregate. Because sources vary their data rates (amongst | |||
| "What is distributed measurement-based admission control?") | other reasons) the rate of the CL-traffic on a link may fluctuate | |||
| | above and below the configured-admission-rate. Hence to get more | |||
| | stable information, the egress gateway measures the fraction as a | |||
| | moving average, called the Congestion-Level-Estimate. This is | |||
| | signalled from the egress G1 to the ingress G0, to enable the ingress | |||
| | to block new flows. | |||
| As CL packets travel across the CL-region, routers will admission | Admission control seems most useful for IntServ's Controlled Load | |||
| mark packets (according to the Pre-Congestion Notification algorithm) | service. In order to support CL traffic we would expect PCN to | |||
| as an "early warning" of potential congestion, i.e. before there is | supplement the existing scheduling behaviour Expedited Forwarding | |||
| any significant build-up of CL packets in the queue. For traffic from | (EF). Since PCN gives an "early warning" of potential congestion | |||
| each remote ingress gateway, the CL-region's egress gateway measures | (hence "pre-congestion notification"), admission control can kick in | |||
| the fraction of CL traffic that is admission marked. The egress | before there is any significant build up of packets in routers - | |||
| gateway calculates the value on a per bit basis as a moving average | which is exactly the performance required for CL. However, PCN is not | |||
| (exponentially weighted is suggested), and which we term Congestion- | only intended to supplement EF. PCN is specified (in [PCN]) as a | |||
| Level-Estimate (CLE). Then it reports it to the CL-region's ingress | building block which can supplement the scheduling behaviour of other | |||
| gateway, piggy-backed on the signalling for a new flow. The ingress | PHBs. | |||
| gateway only admits the new CL microflow if the Congestion-Level- | ||||
| Estimate is less than the value of the CLE-threshold. Hence | ||||
| previously accepted CL microflows will suffer minimal queuing delay, | ||||
| jitter and loss. | ||||
| In turn, the edge-to-edge architecture is a building block in | The function to pre-empt flows (or allow the potential to pre-empt | |||
| delivering an end-to-end CL service. The approach is similar to that | them) relies on feedback from the egress gateway about the CL-region- | |||
| described in [RFC2998] for Integrated services operation over | aggregates. This feedback is generated as follows. All routers meter | |||
| DiffServ networks. Like [RFC2998], an IntServ class (CL in our case) | the rate of the CL-traffic on their outgoing links, and if the rate | |||
| is achieved end-to-end, with a CL-region viewed as a single | is in excess of the configured-pre-emption-rate then packets | |||
| reservation hop in the total end-to-end path. Interior routers of the | amounting to the excess rate are Pre-emption Marked. If the egress | |||
| CL-region do not process flow signalling nor do they hold per flow | gateway G1 sees a Pre-emption Marked packet then it measures, for | |||
| state. We assume that the end-to-end signalling mechanism is RSVP | this CL-region-aggregate, the rate of all received packets that | |||
| (Section 2.2). However, the RSVP signalling may itself be originated | aren't Pre-emption Marked. This is the rate of CL-traffic that the | |||
| or terminated by proxies still closer to the edge of the network, | network can actually support from G0 to G1, and we thus call it the | |||
| such as home hubs or the like, triggered in turn by application layer | Sustainable-Aggregate-Rate. The ingress gateway G0 compares the | |||
| signalling. [RFC2998] and our approach are compared further in | Sustainable-Aggregate-Rate with the rate that it is sending towards | |||
| Section 6.2. | G1, and hence determines the required traffic rate reduction. The | |||
| | document assumes flow pre-emption as the way of reacting to this | |||
| | information, ie stopping sufficient flows to reduce the rate to the | |||
| | Sustainable-Aggregate-Rate. However, this isn't mandated, for | |||
| | instance policy or regulation may prevent pre-emption of some flows - | |||
| | such considerations are out of scope of this document. | |||
| An important benefit compared with the IntServ over DiffServ model | 1.2. Key benefits | |||
| [RFC2998] arises from the fact that the load is controlled | ||||
| dynamically rather than with traffic conditioning agreements (TCAs). | ||||
| TCAs were originally introduced in the (informational) DiffServ | ||||
| architecture [RFC2475] as an alternative to reservation processing in | ||||
| the interior region in order to reduce the burden on interior | ||||
| routers. With TCAs, in practice service providers rely on | ||||
| subscription-time Service Level Agreements that statically define the | ||||
| parameters of the traffic that will be accepted from a customer. The | ||||
| problem arises because the TCA at the ingress must allow any | ||||
| destination address, if it is to remain scalable. But for longer | ||||
| topologies, the chances increase that traffic will focus on an | ||||
| interior resource, even though it is within contract at the ingress | ||||
| [Reid], e.g. all flows converge on the same egress gateway. Even | ||||
| though networks can be engineered to make such failures rare, when | ||||
| they occur all inelastic flows through the congested resource fail | ||||
| catastrophically. | ||||
| Distributed measurement-based admission control avoids reservation | We believe that the mechanisms described in this document are simple, | |||
| processing (whether per flow or aggregated) on interior routers but | scalable, and robust because: | |||
| flows are still blocked dynamically in response to actual congestion | ||||
| on any interior router. Hence there is no need for accurate or | ||||
| conservative prediction of the traffic matrix. | ||||
| 1.1.2. Flow pre-emption | o Per flow state is only required at the ingress gateways to prevent | |||
| non-admitted CL traffic from entering the PCN-region. Other | ||||
| network entities are not aware of individual flows. | ||||
| An essential QoS issue in core and backbone networks is being able to | o For each of its links a router has Admission Marking and Pre- | |||
| cope with failures of routers and links. The consequent re-routing | emption Marking behaviours. These markers operate on the overall | |||
| can cause severe congestion on some links and hence degrade the QoS | CL traffic of the respective link. Therefore, there are no | |||
| experienced by on-going microflows and other, lower priority traffic. | scalability concerns. | |||
| Even when the network is engineered to sustain a single link failure, | ||||
| multiple link failures (e.g. due to a fibre cut, router failure or a | ||||
| natural disaster) can cause violation of capacity constraints and | ||||
| resulting QoS failures. Our solution uses rate-based flow pre- | ||||
| emption, so that sufficient of the previously admitted CL microflows | ||||
| are dropped to ensure that the remaining ones again receive QoS | ||||
| commensurate with the CL service and at least some QoS is quickly | ||||
| restored to other traffic classes. | ||||
| The solution involves four steps. First, triggering the ingress | o The information from these measurements is implicitly signalled to | |||
| gateway to test whether pre-emption may be needed. A router enhanced | the egress gateways by the marks in the packet headers. No | |||
| with Pre-Congestion Notification may optionally include an algorithm | protocol actions (explicit messages) are required. | |||
| that Pre-emption Marks packets. Reception of a packet with such a | ||||
| marking alerts the egress gateway that pre-emption may be needed, | ||||
| which in turn sends a Pre-emption Alert message to the ingress | ||||
| gateway. Secondly, calculating the right amount of traffic to drop. | ||||
| This involves the egress gateway measuring, and reporting to the | ||||
| ingress gateway, the current rate of CL traffic received from that | ||||
| particular ingress gateway. This is the CL rate which the network can | ||||
| actually support from that ingress gateway to that egress gateway, | ||||
| and we thus call it the Sustainable-Aggregate-Rate. The ingress | ||||
| gateway compares the Sustainable-Aggregate-Rate with the rate that | ||||
| it is sending and hence determines how much traffic needs to be pre- | ||||
| empted. Thirdly, choosing which flows to shed in order to drop the | ||||
| traffic calculated in the second step. Information on the priority of | ||||
| flows may be held by the ingress gateway, or by some out of band | ||||
| policy decision point. How these systems co-ordinate to determine | ||||
| which flows to drop is outside the scope of this document, but | ||||
| between them they have all the information necessary to make the | ||||
| decision. Fourthly, tearing down reservations for the chosen flows. | ||||
| The ingress gateway triggers standard tear-down messages for the | ||||
| reservation protocol in use. In turn, this is expected to result in | ||||
| end-systems tearing down the corresponding sessions (e.g. voice | ||||
| calls) using the corresponding session control protocols. | ||||
| The focus of this document is on the first two steps, i.e. | o The egress gateways make separate measurements of packets for | |||
| determining that pre-emption may be needed and estimating how much | each ingress gateway. Each meter operates on the overall CL traffic | |||
| traffic needs to be pre-empted. We provide some hints about the | of a particular CL-region-aggregate. Therefore, there are no | |||
| latter two steps in Section 3.2.3, but don't try to provide full | scalability concerns as long as the number of ingress gateways is | |||
| guidance as it greatly depends on the particular detailed operational | not overwhelmingly large. | |||
| situation. | ||||
| The solution operates within a little over one round trip time - the | o Feedback signalling is required between all pairs of ingress and | |||
| time required for microflow packets that have experienced Pre-emption | egress gateways and the signalled information is on the basis of | |||
| Marking to travel downstream through the CL-region and arrive at the | the corresponding CL-region-aggregate, i.e. it is also unaware of | |||
| egress gateway, plus some additional time for the egress gateway to | individual flows. | |||
| measure the rate seen after it has been alerted that pre-emption may | ||||
| be needed, and the time for the egress gateway to report this | ||||
| information to the ingress gateway. | ||||
| 1.1.3. Both admission control and pre-emption | o The configured-admission-rates can be chosen small enough that | |||
| admitted traffic can still be carried after a rerouting in most | ||||
| failure cases. This is an important feature as QoS violations in | ||||
| core networks due to link failures are more likely than QoS | ||||
| violations due to increased traffic volume. | ||||
| This document describes both the admission control and pre-emption | o The admitted load is controlled dynamically. Therefore it adapts | |||
| mechanisms, and we suggest that an operator uses both. However, we do | as the traffic matrix changes, and also if the network topology | |||
| not require this and some operators may want to implement only one. | changes (e.g. after a link failure). Hence an operator can be less | |||
| conservative when deploying network capacity, and less accurate in | ||||
| their prediction of the traffic matrix. Also, controlling the load | ||||
| using statically provisioned capacity per ingress (regardless of | ||||
| the egress of a flow), as is typical in the DiffServ architecture | ||||
| [RFC2475], can lead to focussed overload: many flows happen to | ||||
| focus on a particular link and then all flows through the | ||||
| congested link fail catastrophically (Section 6.2). | ||||
| For example, an operator could use just admission control, solving | o The pre-emption function complements admission control. It allows | |||
| heavy congestion (caused by re-routing) by 'just waiting' - as | the network to recover from sudden unexpected surges of CL-traffic | |||
| sessions end, existing microflows naturally depart from the system | on some links, thus restoring QoS to the remaining flows. Such | |||
| over time, and the admission control mechanism will prevent admission | scenarios are very unlikely but not impossible. They can be caused | |||
| of new microflows that use the affected links. So the CL-region will | by large network failures that redirect lots of admitted CL- | |||
| naturally return to normal controlled load service, but with reduced | traffic to other links, or by malfunction of the measurement-based | |||
| capacity. The drawback of this approach would be that until flows | admission control in the presence of admitted flows that send for | |||
| naturally depart to relieve the congestion, all flows and lower | a while with an atypically low rate and increase their rates in a | |||
| priority services will be adversely affected. As another example, an | correlated way. | |||
| operator could use just admission control, avoiding heavy congestion | ||||
| (caused by re-routing) by 'capacity planning' - by configuring | ||||
| admission control thresholds to lower levels than the network could | ||||
| accept in normal situations such that the load after failure is | ||||
| expected to stay below acceptable levels even with reduced network | ||||
| resources. | ||||
| On the other hand, an operator could just rely for admission control | 1.3. Terminology | |||
| on the traffic conditioning agreements of the DiffServ architecture | ||||
| [RFC2475]. The pre-emption mechanism described in this document would | ||||
| be used to counteract the problem described at the end of Section | ||||
| 1.1.1. | ||||
| 1.2. Terminology | EDITOR'S NOTE: Terminology in this document is (hopefully) consistent | |||
| with that in [PCN]. However, it may not be consistent with the | ||||
| terminology in other PCN-related documents. The PCN Working Group (if | ||||
| formed) will need to agree a single set of terminology. | ||||
| This terminology is copied from the pre-congestion notification | This terminology is copied from the pre-congestion notification | |||
| marking draft [PCN]: | marking draft [PCN]: | |||
| o Pre-Congestion Notification (PCN): two new algorithms that | o Pre-Congestion Notification (PCN): two new algorithms that | |||
| determine when a PCN-enabled router Admission Marks and Pre- | determine when a PCN-enabled router Admission Marks and Pre- | |||
| emption Marks a packet, depending on the traffic level. | emption Marks a packet, depending on the traffic level. | |||
| o Admission Marking condition: the traffic level is such that the | o Admission Marking condition: the traffic level is such that the | |||
| router Admission Marks packets. The router provides an "early | router Admission Marks packets. The router provides an "early | |||
| skipping to change at page 12, line 23 | skipping to change at page 11, line 27 | |||
| o Sustainable-Aggregate-Rate: the rate of traffic that the network | o Sustainable-Aggregate-Rate: the rate of traffic that the network | |||
| can actually support for a specific CL-region-aggregate. So it is | can actually support for a specific CL-region-aggregate. So it is | |||
| measured by an egress gateway for the CL packets from a particular | measured by an egress gateway for the CL packets from a particular | |||
| ingress gateway. | ingress gateway. | |||
| o Ingress-Aggregate-Rate: the rate of traffic that is being sent on | o Ingress-Aggregate-Rate: the rate of traffic that is being sent on | |||
| a specific CL-region-aggregate. So it is measured by an ingress | a specific CL-region-aggregate. So it is measured by an ingress | |||
| gateway for the CL packets sent towards a particular egress | gateway for the CL packets sent towards a particular egress | |||
| gateway. | gateway. | |||
| 1.3. Existing terminology | 1.4. Existing terminology | |||
| This is a placeholder for useful terminology that is defined | This is a placeholder for useful terminology that is defined | |||
| elsewhere. | elsewhere. | |||
| 1.4. Standardisation requirements | 1.5. Standardisation requirements | |||
| The framework described in this document has two new standardisation | The framework described in this document has two new standardisation | |||
| requirements: | requirements: | |||
| o new Pre-Congestion Notification for Admission Marking and Pre- | o new Pre-Congestion Notification for Admission Marking and Pre- | |||
| emption Marking are required, as detailed in [PCN]. | emption Marking are required, as detailed in [PCN]. | |||
| o the end-to-end signalling protocol needs to be modified to carry | o the end-to-end signalling protocol needs to be modified to carry | |||
| the Congestion-Level-Estimate report (for admission control) and | the Congestion-Level-Estimate report (for admission control) and | |||
| the Sustainable-Aggregate-Rate (for flow pre-emption). With our | the Sustainable-Aggregate-Rate (for flow pre-emption). With our | |||
| skipping to change at page 13, line 8 | skipping to change at page 12, line 11 | |||
| detailed in [RSVP-PCN], for example to carry the Congestion-Level- | detailed in [RSVP-PCN], for example to carry the Congestion-Level- | |||
| Estimate and Sustainable-Aggregate-Rate information from egress | Estimate and Sustainable-Aggregate-Rate information from egress | |||
| gateway to ingress gateway. | gateway to ingress gateway. | |||
| o We are discussing what to standardise about the gateway's | o We are discussing what to standardise about the gateway's | |||
| behaviour. | behaviour. | |||
| Other than these things, the arrangement uses existing IETF protocols | Other than these things, the arrangement uses existing IETF protocols | |||
| throughout, although not in their usual architecture. | throughout, although not in their usual architecture. | |||
| 1.5. Structure of rest of the document | 1.6. Structure of rest of the document | |||
| Section 2 describes some key aspects of the deployment model: our | Section 2 describes some key aspects of the deployment model: our | |||
| goals, assumptions and the benefits we believe it has. Section 3 | goals and assumptions. Section 3 describes the deployment model, | |||
| describes the deployment model, whilst Section 4 summarises the | whilst Section 4 summarises the required changes to the various | |||
| required changes to the various routers in the CL-region. Section 5 | routers in the CL-region. Section 5 outlines some limitations of PCN | |||
| outlines some limitations of PCN that we've identified in this | that we've identified in this deployment model; it also discusses | |||
| deployment model; it also discusses some potential solutions, and | some potential solutions, and other possible extensions. Section 6 | |||
| other possible extensions. Section 6 provides some comparison with | provides some comparison with existing QoS mechanisms. | |||
| existing QoS mechanisms. | ||||
| 2. Key aspects of the deployment model | 2. Key aspects of the deployment model | |||
| EDITOR'S NOTE: The material in Section 2 will eventually disappear, | ||||
| as it will be covered by the problem statement of the PCN Working | ||||
| Group (if formed). | ||||
| In this section we discuss the key aspects of the deployment model: | In this section we discuss the key aspects of the deployment model: | |||
| o At a high level, our key goals, i.e. the functionality that we | o At a high level, our key goals, i.e. the functionality that we | |||
| want to achieve | want to achieve | |||
| o The assumptions that we're prepared to make | o The assumptions that we're prepared to make | |||
| o The consequent benefits they bring | ||||
| 2.1. Key goals | 2.1. Key goals | |||
| The deployment model achieves an end-to-end controlled load (CL) | The deployment model achieves an end-to-end controlled load (CL) | |||
| service where a segment of the end-to-end path is an edge-to-edge | service where a segment of the end-to-end path is an edge-to-edge | |||
| Pre-Congestion Notification region. CL is a quality of service (QoS) | Pre-Congestion Notification region. CL is a quality of service (QoS) | |||
| closely approximating the QoS that the same flow would receive from a | closely approximating the QoS that the same flow would receive from a | |||
| lightly loaded network element [RFC2211]. It is useful for inelastic | lightly loaded network element [RFC2211]. It is useful for inelastic | |||
| flows such as those for real-time media. | flows such as those for real-time media. | |||
| o The CL service should be achieved despite varying load levels of | o The CL service should be achieved despite varying load levels of | |||
| skipping to change at page 17, line 18 | skipping to change at page 17, line 5 | |||
| Expedited Forwarding's PHB, but supplemented with Pre-Congestion | Expedited Forwarding's PHB, but supplemented with Pre-Congestion | |||
| Notification. If this is possible, other PHBs (like Assured | Notification. If this is possible, other PHBs (like Assured | |||
| Forwarding) could be supplemented with the same new behaviours. | Forwarding) could be supplemented with the same new behaviours. | |||
| This is similar to how RFC3168 ECN was defined to supplement any | This is similar to how RFC3168 ECN was defined to supplement any | |||
| PHB. | PHB. | |||
| o Routing: we are looking in greater detail at the solution in the | o Routing: we are looking in greater detail at the solution in the | |||
| presence of Equal Cost Multi-Path routing and at suitable | presence of Equal Cost Multi-Path routing and at suitable | |||
| enhancements. See also the 'ECMP' section 5.1 later. | enhancements. See also the 'ECMP' section 5.1 later. | |||
| 2.3. Key benefits | ||||
| We believe that the mechanism described in this document has several | ||||
| advantages: | ||||
| o It achieves statistical guarantees of quality of service for | ||||
| microflows, delivering a very low delay, jitter and packet loss | ||||
| service suitable for applications like voice and video calls that | ||||
| generate real time inelastic traffic. This is because of its per | ||||
| microflow admission control scheme, combined with its dynamic on- | ||||
| path "early warning" of potential congestion. The guarantee is at | ||||
| least as strong as with IntServ Controlled Load (Section 6.1 | ||||
| mentions why the guarantee may be somewhat better), but without | ||||
| the scalability problems of per-microflow IntServ. | ||||
| o It can support "Emergency" and military Multi-Level Pre-emption | ||||
| and Priority (MLPP) services, even in times of heavy congestion | ||||
| (perhaps caused by failure of a router within the CL-region), by | ||||
| pre-empting on-going "ordinary CL microflows". See also Section | ||||
| 4.5. | ||||
| o It scales well, because there is no signal processing or per flow | ||||
| state held by the interior routers of the CL-region. Note that | ||||
| interior routers only hold state per outgoing interface - they do | ||||
| not hold state per CL-region-aggregate nor per flow. | ||||
| o It is resilient, again because no per flow state is held by the | ||||
| interior routers of the CL-region. Hence during an interior | ||||
| routing change caused by a router failure, no microflow state has | ||||
| to be relocated. The flow pre-emption mechanism further helps | ||||
| resilience because it rapidly reduces the load to one that the CL- | ||||
| region can support. | ||||
| o It helps preserve, through the flow pre-emption mechanism, QoS to | ||||
| as many microflows as possible and to lower priority traffic in | ||||
| times of heavy congestion (e.g. caused by failure of an interior | ||||
| router). Otherwise long-lived microflows could cause loss on all | ||||
| CL microflows for a long time. | ||||
| o It avoids the potential catastrophic failure problem when the | ||||
| DiffServ architecture is used in large networks using statically | ||||
| provisioned capacity. This is achieved by controlling the load | ||||
| dynamically, based on edge-to-edge-path real-time measurement of | ||||
| Pre-Congestion Notification, as discussed in Section 1.1.1. | ||||
| o It requires minimal new standardisation, because it reuses | ||||
| existing QoS protocols and algorithms. | ||||
| o It can be deployed incrementally, region by region or network by | ||||
| network. Not all the regions or networks on the end-to-end path | ||||
| need to have it deployed. Two CL-regions can even be separated by | ||||
| a network that uses another QoS mechanism (e.g. MPLS-TE). | ||||
| o It provides a deployment path for use of ECN for real-time | ||||
| applications. Operators can gain experience of ECN before its | ||||
| applicability to end-systems is understood and end terminals are | ||||
| ECN capable. | ||||
| 3. Deployment model | 3. Deployment model | |||
| 3.1. Admission control | 3.1. Admission control | |||
| In this section we describe the admission control mechanism. We | In this section we describe the admission control mechanism. We | |||
| discuss the three pieces of the solution and then give an example of | discuss the three pieces of the solution and then give an example of | |||
| how they fit together in a use case: | how they fit together in a use case: | |||
| o the new Pre-Congestion Notification for Admission Marking used by | o the new Pre-Congestion Notification for Admission Marking used by | |||
| all routers in the CL-region | all routers in the CL-region | |||
| skipping to change at page 22, line 19 | skipping to change at page 20, line 19 | |||
| they fit together in a use case: | they fit together in a use case: | |||
| o How an ingress gateway is triggered to test whether flow pre- | o How an ingress gateway is triggered to test whether flow pre- | |||
| emption may be needed | emption may be needed | |||
| o How an ingress gateway determines the right amount of CL traffic | o How an ingress gateway determines the right amount of CL traffic | |||
| to drop | to drop | |||
| The mechanism is defined in [PCN] and [RSVP-PCN]. | The mechanism is defined in [PCN] and [RSVP-PCN]. | |||
| Two subsequent steps could be: | ||||
| o Choose which flows to shed, influenced by their priority and other | ||||
| policy information | ||||
| o Tear down the reservations for the chosen flows | ||||
| We provide some hints about these latter two steps in Section 3.2.3, | ||||
| but don't try to provide full guidance as it greatly depends on the | ||||
| particular detailed operational situation. | ||||
| An essential QoS issue in core and backbone networks is being able to | ||||
| cope with failures of routers and links. The consequent re-routing | ||||
| can cause severe congestion on some links and hence degrade the QoS | ||||
| experienced by on-going microflows and other, lower priority traffic. | ||||
| Even when the network is engineered to sustain a single link failure, | ||||
| multiple link failures (e.g. due to a fibre cut, router failure or a | ||||
| natural disaster) can cause violation of capacity constraints and | ||||
| resulting QoS failures. Our solution uses rate-based flow pre- | ||||
| emption, so that sufficient of the previously admitted CL microflows | ||||
| are dropped to ensure that the remaining ones again receive QoS | ||||
| commensurate with the CL service and at least some QoS is quickly | ||||
| restored to other traffic classes. | ||||
| 3.2.1. Alerting an ingress gateway that flow pre-emption may be needed | 3.2.1. Alerting an ingress gateway that flow pre-emption may be needed | |||
| Alerting an ingress gateway that flow pre-emption may be needed is a | Alerting an ingress gateway that flow pre-emption may be needed is a | |||
| two stage process: a router in the CL-region alerts an egress gateway | two stage process: a router in the CL-region alerts an egress gateway | |||
| that flow pre-emption may be needed; in turn the egress gateway | that flow pre-emption may be needed; in turn the egress gateway | |||
| alerts the relevant ingress gateway. Every router in the CL-region | alerts the relevant ingress gateway. Every router in the CL-region | |||
| has the ability to alert egress gateways, which may be done either | has the ability to alert egress gateways, which may be done either | |||
| explicitly or implicitly: | explicitly or implicitly: | |||
| o Explicit - the router per-hop behaviour is supplemented with a new | o Explicit - the router per-hop behaviour is supplemented with a new | |||
| skipping to change at page 23, line 5 | skipping to change at page 21, line 30 | |||
| that packets are pre-emption marked before the actual queue builds | that packets are pre-emption marked before the actual queue builds | |||
| up. The algorithm's main parameter is the configured-pre-emption- | up. The algorithm's main parameter is the configured-pre-emption- | |||
| rate, which is set lower than the link speed (but higher than the | rate, which is set lower than the link speed (but higher than the | |||
| configured-admission-rate). Thus pre-emption marked packets indicate | configured-admission-rate). Thus pre-emption marked packets indicate | |||
| that the CL traffic rate is reaching the configured-pre-emption-rate | that the CL traffic rate is reaching the configured-pre-emption-rate | |||
| and so act as an "early warning" that the engineered capacity is | and so act as an "early warning" that the engineered capacity is | |||
| nearly reached. Therefore they indicate that it may be advisable to | nearly reached. Therefore they indicate that it may be advisable to | |||
| pre-empt some of the existing CL flows in order to preserve the QoS | pre-empt some of the existing CL flows in order to preserve the QoS | |||
| of the others. | of the others. | |||
| Note that the pre-emption marking algorithm doesn't measure the | ||||
| packets that are already Pre-emption Marked. This ensures that in a | ||||
| scenario with several links that are above their configured-pre- | ||||
| emption-rate, at the egress gateway the rate of packets | ||||
| excluding Pre-emption Marked ones truly does represent the | ||||
| Sustainable-Aggregate-Rate (see below for explanation). | ||||
| Note that the explicit mechanism only makes sense if all the routers | Note that the explicit mechanism only makes sense if all the routers | |||
| in the CL-region have the functionality so that the egress gateways | in the CL-region have the functionality so that the egress gateways | |||
| can rely on the explicit mechanism. Otherwise there is the danger | can rely on the explicit mechanism. Otherwise there is the danger | |||
| that the traffic happens to focus on a router without it, and egress | that the traffic happens to focus on a router without it, and egress | |||
| gateways then also have to watch for implicit pre-emption alerts. | gateways then also have to watch for implicit pre-emption alerts. | |||
| When one or more packets in a CL-region-aggregate alert the egress | When one or more packets in a CL-region-aggregate alert the egress | |||
| gateway of the need for flow pre-emption, whether explicitly or | gateway of the need for flow pre-emption, whether explicitly or | |||
| implicitly, the egress puts that CL-region-aggregate into the Pre- | implicitly, the egress puts that CL-region-aggregate into the Pre- | |||
| emption Alert state. For each CL-region-aggregate in alert state it | emption Alert state. For each CL-region-aggregate in alert state it | |||
| skipping to change at page 26, line 8 | skipping to change at page 24, line 40 | |||
| this packet is part of (by using a five-tuple filter and comparing it | this packet is part of (by using a five-tuple filter and comparing it | |||
| with state installed at admission) and hence which ingress gateway | with state installed at admission) and hence which ingress gateway | |||
| the packet came from. It sets up a meter to measure the traffic rate | the packet came from. It sets up a meter to measure the traffic rate | |||
| from this ingress gateway, and as soon as possible sends a message to | from this ingress gateway, and as soon as possible sends a message to | |||
| the ingress gateway. This message alerts the ingress gateway that | the ingress gateway. This message alerts the ingress gateway that | |||
| pre-emption may be needed and contains the traffic rate measured by | pre-emption may be needed and contains the traffic rate measured by | |||
| the egress gateway. Then the ingress gateway determines the traffic | the egress gateway. Then the ingress gateway determines the traffic | |||
| rate that it is sending towards this egress gateway and hence it can | rate that it is sending towards this egress gateway and hence it can | |||
| calculate the amount of traffic that needs to be pre-empted. | calculate the amount of traffic that needs to be pre-empted. | |||
| The solution operates within a little over one round trip time - the | ||||
| time required for microflow packets that have experienced Pre-emption | ||||
| Marking to travel downstream through the CL-region and arrive at the | ||||
| egress gateway, plus some additional time for the egress gateway to | ||||
| measure the rate seen after it has been alerted that pre-emption may | ||||
| be needed, and the time for the egress gateway to report this | ||||
| information to the ingress gateway. | ||||
| The ingress gateway could now just shed random microflows, but it is | The ingress gateway could now just shed random microflows, but it is | |||
| better if the least important ones are dropped. The ingress gateway | better if the least important ones are dropped. The ingress gateway | |||
| could use information stored locally in each reservation's state | could use information stored locally in each reservation's state | |||
| (such as for example the RSVP pre-emption priority of [RSVP- | (such as for example the RSVP pre-emption priority of [RSVP- | |||
| PREEMPTION] or the RSVP admission priority of [RSVP-EMERGENCY]) as | PREEMPTION] or the RSVP admission priority of [RSVP-EMERGENCY]) as | |||
| well as information provided by a policy decision point in order to | well as information provided by a policy decision point in order to | |||
| decide which of the flows to shed (or perhaps which ones not to | decide which of the flows to shed (or perhaps which ones not to | |||
| shed). This way, flow pre-emption can also help emergency/military | shed). This way, flow pre-emption can also help emergency/military | |||
| calls by taking into account the corresponding priorities (as | calls by taking into account the corresponding priorities (as | |||
| conveyed in RSVP policy elements) when selecting calls to be pre- | conveyed in RSVP policy elements) when selecting calls to be pre- | |||
| skipping to change at page 27, line 5 | skipping to change at page 25, line 36 | |||
| significantly less than the physical line capacity, flow pre-emption | significantly less than the physical line capacity, flow pre-emption | |||
| may be triggered before any congestion has actually occurred and | may be triggered before any congestion has actually occurred and | |||
| before any packet is dropped. | before any packet is dropped. | |||
| We extend the scenario further by imagining that (due to a disaster | We extend the scenario further by imagining that (due to a disaster | |||
| of some kind) further routers in the CL-region fail during the time | of some kind) further routers in the CL-region fail during the time | |||
| taken by the pre-emption process described above. This is handled | taken by the pre-emption process described above. This is handled | |||
| naturally, as packets will continue to be pre-emption marked and so | naturally, as packets will continue to be pre-emption marked and so | |||
| the pre-emption process will happen for a second time. | the pre-emption process will happen for a second time. | |||
| 3.3. Both admission control and pre-emption | ||||
| This document describes both the admission control and pre-emption | ||||
| mechanisms, and we suggest that an operator uses both. However, we do | ||||
| not require this and some operators may want to implement only one. | ||||
| For example, an operator could use just admission control, solving | ||||
| heavy congestion (caused by re-routing) by 'just waiting' - as | ||||
| sessions end, existing microflows naturally depart from the system | ||||
| over time, and the admission control mechanism will prevent admission | ||||
| of new microflows that use the affected links. So the CL-region will | ||||
| naturally return to normal controlled load service, but with reduced | ||||
| capacity. The drawback of this approach would be that until flows | ||||
| naturally depart to relieve the congestion, all flows and lower | ||||
| priority services will be adversely affected. As another example, an | ||||
| operator could use just admission control, avoiding heavy congestion | ||||
| (caused by re-routing) by 'capacity planning' - by configuring | ||||
| admission control thresholds to lower levels than the network could | ||||
| accept in normal situations such that the load after failure is | ||||
| expected to stay below acceptable levels even with reduced network | ||||
| resources. | ||||
| On the other hand, an operator could rely solely on the traffic | ||||
| conditioning agreements of the DiffServ architecture [RFC2475] for | ||||
| admission control. The pre-emption mechanism described in this | ||||
| document would then be used to counteract the problem described at | ||||
| the end of Section 1.1.1. | ||||
| 4. Summary of Functionality | 4. Summary of Functionality | |||
| This section is intended to provide a systematic summary of the new | This section is intended to provide a systematic summary of the new | |||
| functionality required by the routers in the CL-region. | functionality required by the routers in the CL-region. | |||
| A network operator upgrades normal IP routers by: | A network operator upgrades normal IP routers by: | |||
| o Adding functionality related to admission control and flow pre- | o Adding functionality related to admission control and flow pre- | |||
| emption to all its ingress and egress gateways | emption to all its ingress and egress gateways | |||
| skipping to change at page 31, line 13 | skipping to change at page 31, line 13 | |||
| (and, if needed, the pre-emption mechanism) to sort things out. | (and, if needed, the pre-emption mechanism) to sort things out. | |||
| 5. Limitations and some potential solutions | 5. Limitations and some potential solutions | |||
| In this section we describe various limitations of the deployment | In this section we describe various limitations of the deployment | |||
| model, and some suggestions about potential ways of alleviating them. | model, and some suggestions about potential ways of alleviating them. | |||
| The limitations fall into three broad categories: | The limitations fall into three broad categories: | |||
| o ECMP (Section 5.1): the assumption about routing (Section 2.2) is | o ECMP (Section 5.1): the assumption about routing (Section 2.2) is | |||
| that all packets between a pair of ingress and egress gateways | that all packets between a pair of ingress and egress gateways | |||
| follow the same path; ECMP breaks this assumption | follow the same path; ECMP breaks this assumption. A study | |||
| regarding the accuracy of load balancing schemes can be found in | ||||
| [LoadBalancing-a] and [LoadBalancing-b]. | ||||
| o The lack of global coordination (Sections 5.2, 5.3 and 5.4): a | o The lack of global coordination (Sections 5.2, 5.3 and 5.4): a | |||
| decision about admission control or flow pre-emption is made for | decision about admission control or flow pre-emption is made for | |||
| one aggregate independently of other aggregates | one aggregate independently of other aggregates | |||
| o Timing and accuracy of measurements (Sections 5.5 and 5.6): the | o Timing and accuracy of measurements (Sections 5.5 and 5.6): the | |||
| assumption (Section 2.2) that additional load, offered within the | assumption (Section 2.2) that additional load, offered within the | |||
| reaction time of the measurement-based admission control | reaction time of the measurement-based admission control | |||
| mechanism, doesn't move the system directly from no congestion to | mechanism, doesn't move the system directly from no congestion to | |||
| overload (dropping packets). A 'flash crowd' may break this | overload (dropping packets). A 'flash crowd' may break this | |||
| skipping to change at page 32, line 42 | skipping to change at page 32, line 43 | |||
| or are pre-empted), and there is still the danger that for some | or are pre-empted), and there is still the danger that for some | |||
| traffic mixes the operator hasn't been cautious enough. | traffic mixes the operator hasn't been cautious enough. | |||
| o for admission control, probe to obtain a flow-specific congestion- | o for admission control, probe to obtain a flow-specific congestion- | |||
| level-estimate. Earlier this document suggests continuously | level-estimate. Earlier this document suggests continuously | |||
| monitoring the congestion-level-estimate. Instead, probe packets | monitoring the congestion-level-estimate. Instead, probe packets | |||
| could be sent for each prospective new flow. The probe packets | could be sent for each prospective new flow. The probe packets | |||
| have the same IP address etc as the data packets would have, and | have the same IP address etc as the data packets would have, and | |||
| hence follow the same ECMP path. However, probing is an extra | hence follow the same ECMP path. However, probing is an extra | |||
| overhead, depending on how many probe packets need to be sent to | overhead, depending on how many probe packets need to be sent to | |||
| get a sufficiently accurate congestion-level-estimate. | get a sufficiently accurate congestion-level-estimate. Probes also | |||
| cause a processing overhead, either for the machine at the | ||||
| destination address or for the egress gateway to identify and | ||||
| remove the probe packets. | ||||
| o for flow pre-emption, only select flows for pre-emption from | o for flow pre-emption, only select flows for pre-emption from | |||
| amongst those that have actually received a Pre-emption Marked | amongst those that have actually received a Pre-emption Marked | |||
| packet, because these flows must have followed an ECMP path that | packet, because these flows must have followed an ECMP path that | |||
| goes through an overloaded router. However, it needs some extra | goes through an overloaded router. However, it needs some extra | |||
| work by the egress gateway, to record this information and report | work by the egress gateway, to record this information and report | |||
| it to the ingress gateway. | it to the ingress gateway. | |||
| o for flow pre-emption, a variant of this idea involves introducing | o for flow pre-emption, a variant of this idea involves introducing | |||
| a new marking behaviour, 'Router Marking'. A router that is pre- | a new marking behaviour, 'Router Marking'. A router that is pre- | |||
| skipping to change at page 43, line 36 | skipping to change at page 43, line 47 | |||
| (Section 2.2), so that the CL-region could consist of multiple | (Section 2.2), so that the CL-region could consist of multiple | |||
| domains run by different operators that did not trust each other. | domains run by different operators that did not trust each other. | |||
| Then only the ingress and egress gateways of the CL-region would take | Then only the ingress and egress gateways of the CL-region would take | |||
| part in the admission control procedure, i.e. at the ingress to the | part in the admission control procedure, i.e. at the ingress to the | |||
| first domain and the egress from the final domain. The border routers | first domain and the egress from the final domain. The border routers | |||
| between operators within the CL-region would only have to do bulk | between operators within the CL-region would only have to do bulk | |||
| accounting - they wouldn't do per microflow metering and policing, | accounting - they wouldn't do per microflow metering and policing, | |||
| and they wouldn't take part in signal processing or hold per flow | and they wouldn't take part in signal processing or hold per flow | |||
| state [Briscoe]. [Re-feedback] explains how a downstream domain can | state [Briscoe]. [Re-feedback] explains how a downstream domain can | |||
| police that its upstream domain does not 'cheat' by admitting traffic | police that its upstream domain does not 'cheat' by admitting traffic | |||
| when the downstream path is over-congested. [Re-PCN] proposes how to | when the downstream path is congested. [Re-PCN] proposes how to | |||
| achieve this with the help of another recently proposed extension to | achieve this with the help of another recently proposed extension to | |||
| ECN, involving re-echoing ECN feedback [Re-ECN]. | ECN, involving re-echoing ECN feedback [Re-ECN]. | |||
| 5.7.3. Preferential dropping of pre-emption marked packets | 5.7.3. Preferential dropping of pre-emption marked packets | |||
| When the rate of real-time traffic in the specified class exceeds the | When the rate of real-time traffic in the specified class exceeds the | |||
| maximum configured rate, then a router has to drop some packet(s) | maximum configured rate, then a router has to drop some packet(s) | |||
| instead of forwarding them on the out-going link. Now when the egress | instead of forwarding them on the out-going link. Now when the egress | |||
| gateway measures the Sustainable-Aggregate-Rate, neither dropped | gateway measures the Sustainable-Aggregate-Rate, neither dropped | |||
| packets nor pre-emption marked packets contribute to it. Dropping | packets nor pre-emption marked packets contribute to it. Dropping | |||
| skipping to change at page 45, line 9 | skipping to change at page 45, line 20 | |||
| aggregation assumption (Section 2.2) doesn't hold. In the extreme it | aggregation assumption (Section 2.2) doesn't hold. In the extreme it | |||
| may be possible to operate the framework end-to-end, i.e. between end | may be possible to operate the framework end-to-end, i.e. between end | |||
| hosts. One potential method is to send probe packets to test whether | hosts. One potential method is to send probe packets to test whether | |||
| the network can support a prospective new CL microflow. The probe | the network can support a prospective new CL microflow. The probe | |||
| packets would be sent at the same traffic rate as expected for the | packets would be sent at the same traffic rate as expected for the | |||
| actual microflow, but in order not to disturb existing CL traffic a | actual microflow, but in order not to disturb existing CL traffic a | |||
| router would always schedule probe packets behind CL ones (compare | router would always schedule probe packets behind CL ones (compare | |||
| [Breslau00]); this implies they have a new DSCP. Otherwise the | [Breslau00]); this implies they have a new DSCP. Otherwise the | |||
| routers would treat probe packets identically to CL packets. In order | routers would treat probe packets identically to CL packets. In order | |||
| to perform admission control quickly, in parts of the network where | to perform admission control quickly, in parts of the network where | |||
| there are only a few CL microflows, the Pre-Congestion marking | there are only a few CL microflows, the algorithm for Admission | |||
| behaviour for probe packets would switch from admission marking no | Marking described in [PCN] would need to "switch on" very rapidly, i.e. | |||
| packets to admission marking them all for only a minimal increase in | go from marking no packets to marking them all for only a minimal | |||
| load. | increase in the size of the virtual queue. | |||
| 5.7.6. MPLS-TE | 5.7.6. MPLS-TE | |||
| [ECN-MPLS] discusses how to extend the deployment model to MPLS, i.e. | [ECN-MPLS] discusses how to extend the deployment model to MPLS, i.e. | |||
| for admission control of microflows into a set of MPLS-TE aggregates | for admission control of microflows into a set of MPLS-TE aggregates | |||
| (Multi-protocol label switching traffic engineering). It would | (Multi-protocol label switching traffic engineering). It would | |||
| require that the MPLS header could include the ECN field, which is | require that the MPLS header could include the ECN field, which is | |||
| not precluded by RFC3270. See [ECN-MPLS]. | not precluded by RFC3270. See [ECN-MPLS]. | |||
| 6. Relationship to other QoS mechanisms | 6. Relationship to other QoS mechanisms | |||
| skipping to change at page 46, line 50 | skipping to change at page 46, line 50 | |||
| indications of network resource availability. In practice, service | indications of network resource availability. In practice, service | |||
| providers rely on subscription-time Service Level Agreements (SLAs) | providers rely on subscription-time Service Level Agreements (SLAs) | |||
| that statically define the parameters of the traffic that will be | that statically define the parameters of the traffic that will be | |||
| accepted from a customer. The CL mechanism allows dynamic reservation | accepted from a customer. The CL mechanism allows dynamic reservation | |||
| of resources through the DiffServ domain and, with the potential | of resources through the DiffServ domain and, with the potential | |||
| extension mentioned in Section 5.7.2, it can span multiple domains | extension mentioned in Section 5.7.2, it can span multiple domains | |||
| without active policing mechanisms at the borders (unlike DiffServ). | without active policing mechanisms at the borders (unlike DiffServ). | |||
| Therefore we do not use the traffic conditioning agreements (TCAs) of | Therefore we do not use the traffic conditioning agreements (TCAs) of | |||
| the (informational) DiffServ architecture [RFC2475]. | the (informational) DiffServ architecture [RFC2475]. | |||
| An important benefit arises from the fact that the load is controlled | ||||
| dynamically rather than with traffic conditioning agreements (TCAs). | ||||
| TCAs were originally introduced in the (informational) DiffServ | ||||
| architecture [RFC2475] as an alternative to reservation processing in | ||||
| the interior region in order to reduce the burden on interior | ||||
| routers. With TCAs, in practice service providers rely on | ||||
| subscription-time Service Level Agreements that statically define the | ||||
| parameters of the traffic that will be accepted from a customer. The | ||||
| problem arises because the TCA at the ingress must allow any | ||||
| destination address, if it is to remain scalable. But for longer | ||||
| topologies, the chances increase that traffic will focus on an | ||||
| interior resource, even though it is within contract at the ingress | ||||
| [Reid], e.g. all flows converge on the same egress gateway. Even | ||||
| though networks can be engineered to make such failures rare, when | ||||
| they occur all inelastic flows through the congested resource fail | ||||
| catastrophically. | ||||
| [Johnson] compares admission control with a 'generously dimensioned' | [Johnson] compares admission control with a 'generously dimensioned' | |||
| DiffServ network as ways to achieve QoS. The former is recommended. | DiffServ network as ways to achieve QoS. The former is recommended. | |||
| 6.4. ECN | 6.4. ECN | |||
| The marking behaviour described in this document complies with the | The marking behaviour described in this document complies with the | |||
| ECN aspects of the IP wire protocol RFC3168, but provides its own | ECN aspects of the IP wire protocol RFC3168, but provides its own | |||
| edge-to-edge feedback instead of the TCP aspects of RFC3168. All | edge-to-edge feedback instead of the TCP aspects of RFC3168. All | |||
| routers within the CL-region are upgraded with the admission marking | routers within the CL-region are upgraded with the admission marking | |||
| and pre-emption marking of Pre-Congestion Notification, so the | and pre-emption marking of Pre-Congestion Notification, so the | |||
| skipping to change at page 49, line 5 | skipping to change at page 48, line 38 | |||
| Multi-protocol label switching traffic engineering (MPLS-TE) allows | Multi-protocol label switching traffic engineering (MPLS-TE) allows | |||
| scalable reservation of resources in the core for an aggregate of | scalable reservation of resources in the core for an aggregate of | |||
| many microflows. To achieve end-to-end reservations, admission | many microflows. To achieve end-to-end reservations, admission | |||
| control and policing of microflows into the aggregate can be achieved | control and policing of microflows into the aggregate can be achieved | |||
| using techniques such as RSVP Aggregation over MPLS TE Tunnels as per | using techniques such as RSVP Aggregation over MPLS TE Tunnels as per | |||
| [AGGRE-TE]. However, in the case of inter-provider environments, | [AGGRE-TE]. However, in the case of inter-provider environments, | |||
| these techniques require that admission control and policing be | these techniques require that admission control and policing be | |||
| repeated at each trust boundary or that MPLS TE tunnels span multiple | repeated at each trust boundary or that MPLS TE tunnels span multiple | |||
| domains. | domains. | |||
| 6.8. Other Network Admission Control Approaches | ||||
| Link admission control (LAC) describes how admission control (AC) can | ||||
| be done on a single link and comprises, e.g., the calculation of | ||||
| effective bandwidths which may be the base for a parameter-based AC. | ||||
| In contrast, network AC (NAC) describes how AC can be done for a | ||||
| network and focuses on the locations from which data is gathered for | ||||
| the admission decision. Most approaches implement a link budget based | ||||
| NAC (LB NAC) where each link has a certain AC-budget. RSVP works | ||||
| according to that principle; the new concept likewise admits | ||||
| additional flows as long as each link on the new flow's path still | ||||
| has resources available. The border-to-border budget based NAC (BBB | ||||
| NAC) pre-configures an AC budget for all border-to-border | ||||
| relationships (= CL-region-aggregates) and if this capacity budget is | ||||
| exhausted, new flows are rejected. The TCA-based admission control | ||||
| which is associated with the DiffServ architecture implements an | ||||
| ingress budget based NAC (IB NAC). These fundamentally different | ||||
| concepts differ in flexibility and efficiency with regard to the use | ||||
| of link bandwidths [NAC-a,NAC-b]. They can be made resilient by | ||||
| choosing the budgets in such a way that the network will not be | ||||
| congested after rerouting due to a failure. The efficiency of the | ||||
| approaches differs with and without such resilience requirements. | ||||
| 7. Security Considerations | 7. Security Considerations | |||
| To protect against denial of service attacks, the ingress gateway of | To protect against denial of service attacks, the ingress gateway of | |||
| the CL-region needs to police all CL packets and drop packets in | the CL-region needs to police all CL packets and drop packets in | |||
| excess of the reservation. This is similar to operations with | excess of the reservation. This is similar to operations with | |||
| existing IntServ behaviour. | existing IntServ behaviour. | |||
| For pre-emption, it is considered acceptable from a security | For pre-emption, it is considered acceptable from a security | |||
| perspective that the ingress gateway can treat "emergency/military" | perspective that the ingress gateway can treat "emergency/military" | |||
| CL flows preferentially compared with "ordinary" CL flows. However, | CL flows preferentially compared with "ordinary" CL flows. However, | |||
| skipping to change at page 49, line 39 | skipping to change at page 50, line 5 | |||
| The admission control mechanism evolved from the work led by Martin | The admission control mechanism evolved from the work led by Martin | |||
| Karsten on the Guaranteed Stream Provider developed in the M3I | Karsten on the Guaranteed Stream Provider developed in the M3I | |||
| project [GSPa, GSP-TR], which in turn was based on the theoretical | project [GSPa, GSP-TR], which in turn was based on the theoretical | |||
| work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, | work of Gibbens and Kelly [DCAC]. Kennedy Cheng, Gabriele Corliano, | |||
| Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet | Carla Di Cairano-Gilfedder, Kashaf Khan, Peter Hovell, Arnaud Jacquet | |||
| and June Tay (BT) helped develop and evaluate this approach. | and June Tay (BT) helped develop and evaluate this approach. | |||
| Many thanks to those who have commented on this work at Transport | Many thanks to those who have commented on this work at Transport | |||
| Area Working Group meetings and on the mailing list, including: Ken | Area Working Group meetings and on the mailing list, including: Ken | |||
| Carlberg, Ruediger Geib, Lars Westberg, David Black, Robert Hancock, | Carlberg, Ruediger Geib, Lars Westberg, David Black, Robert Hancock, | |||
| Cornelia Kappler. | Cornelia Kappler, Michael Menth. | |||
| 9. Comments solicited | 9. Comments solicited | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| sent to the Transport Area Working Group's mailing list, | sent to the Transport Area Working Group's mailing list, | |||
| tsvwg@ietf.org, and/or to the authors. | tsvwg@ietf.org, and/or to the authors. | |||
| 10. Changes from earlier versions of the draft | 10. Changes from earlier versions of the draft | |||
| The main changes are: | The main changes are: | |||
| skipping to change at page 51, line 5 | skipping to change at page 51, line 5 | |||
| Section 5 has been updated and expanded. It is now about the | Section 5 has been updated and expanded. It is now about the | |||
| 'limitations' of the PCN mechanism, as described in the earlier | 'limitations' of the PCN mechanism, as described in the earlier | |||
| sections, plus discussion of 'possible solutions' to those | sections, plus discussion of 'possible solutions' to those | |||
| limitations. | limitations. | |||
| The measurement of the Congestion-Level-Estimate now includes pre- | The measurement of the Congestion-Level-Estimate now includes pre- | |||
| emption marked packets as well as admission marked ones. Section | emption marked packets as well as admission marked ones. Section | |||
| 3.1.2 explains. | 3.1.2 explains. | |||
| From -03 to -04 | ||||
| Detailed review by Michael Menth. In response, Abstract, Summary and | ||||
| Key benefits sections re-written. Numerous detailed comments on | ||||
| Section 5 and the following sections. | ||||
| 11. Appendices | 11. Appendices | |||
| 11.1. Appendix A: Explicit Congestion Notification | 11.1. Appendix A: Explicit Congestion Notification | |||
| This Appendix provides a brief summary of Explicit Congestion | This Appendix provides a brief summary of Explicit Congestion | |||
| Notification (ECN). | Notification (ECN). | |||
| [RFC3168] specifies the incorporation of ECN to TCP and IP, including | [RFC3168] specifies the incorporation of ECN to TCP and IP, including | |||
| ECN's use of two bits in the IP header. It specifies a method for | ECN's use of two bits in the IP header. It specifies a method for | |||
| indicating incipient congestion to end-hosts (e.g. as in RED, Random | indicating incipient congestion to end-hosts (e.g. as in RED, Random | |||
| skipping to change at page 52, line 5 | skipping to change at page 53, line 7 | |||
| The CE codepoint '11' is set by a router to indicate congestion to | The CE codepoint '11' is set by a router to indicate congestion to | |||
| the end hosts. The term 'CE packet' denotes a packet that has the CE | the end hosts. The term 'CE packet' denotes a packet that has the CE | |||
| codepoint set. | codepoint set. | |||
| The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and | The ECN-Capable Transport (ECT) codepoints '10' and '01' (ECT(0) and | |||
| ECT(1) respectively) are set by the data sender to indicate that the | ECT(1) respectively) are set by the data sender to indicate that the | |||
| end-points of the transport protocol are ECN-capable. Routers treat | end-points of the transport protocol are ECN-capable. Routers treat | |||
| the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to | the ECT(0) and ECT(1) codepoints as equivalent. Senders are free to | |||
| use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a | use either the ECT(0) or the ECT(1) codepoint to indicate ECT, on a | |||
| packet-by-packet basis. The use of both the two codepoints for ECT is | packet-by-packet basis. The motivation for having two codepoints (the | |||
| motivated primarily by the desire to allow mechanisms for the data | 'ECN nonce') is the desire to check two things: for the data sender | |||
| sender to verify that network elements are not erasing the CE | to verify that network elements are not erasing the CE codepoint; and | |||
| codepoint, and that data receivers are properly reporting to the | for the data sender to verify that data receivers are properly | |||
| sender the receipt of packets with the CE codepoint set. | reporting to the sender the receipt of packets with the CE codepoint | |||
| set. | ||||
| ECN requires support from the transport protocol, in addition to the | ECN requires support from the transport protocol, in addition to the | |||
| functionality given by the ECN field in the IP packet header. | functionality given by the ECN field in the IP packet header. | |||
| [RFC3168] addresses the addition of ECN Capability to TCP, specifying | [RFC3168] addresses the addition of ECN Capability to TCP, specifying | |||
| three new pieces of functionality: negotiation between the endpoints | three new pieces of functionality: negotiation between the endpoints | |||
| during connection setup to determine if they are both ECN-capable; an | during connection setup to determine if they are both ECN-capable; an | |||
| ECN-Echo (ECE) flag in the TCP header so that the data receiver can | ECN-Echo (ECE) flag in the TCP header so that the data receiver can | |||
| inform the data sender when a CE packet has been received; and a | inform the data sender when a CE packet has been received; and a | |||
| Congestion Window Reduced (CWR) flag in the TCP header so that the | Congestion Window Reduced (CWR) flag in the TCP header so that the | |||
| data sender can inform the data receiver that the congestion window | data sender can inform the data receiver that the congestion window | |||
| skipping to change at page 55, line 5 | skipping to change at page 56, line 5 | |||
| bits]n ) | bits]n ) | |||
| [EWMA-AM-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-AM-bits]n | [EWMA-AM-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-AM-bits]n | |||
| ) | ) | |||
| where w' = (1-w)/w. | where w' = (1-w)/w. | |||
| If w' is arranged to be a power of 2, these per packet algorithms can | If w' is arranged to be a power of 2, these per packet algorithms can | |||
| be implemented solely with a shift and an add. | be implemented solely with a shift and an add. | |||
| There are alternative possibilities for smoothing out the congestion- | ||||
| level-estimate. For example [TEWMA] deals better with the issue of | ||||
| stale information when the traffic rate for | ||||
| 12. References | 12. References | |||
| A later version will distinguish normative and informative | A later version will distinguish normative and informative | |||
| references. | references. | |||
| [AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie, | [AGGRE-TE] Francois Le Faucheur, Michael Dibiasio, Bruce Davie, | |||
| Michael Davenport, Chris Christou, Jerry Ash, Bur | Michael Davenport, Chris Christou, Jerry Ash, Bur | |||
| Goode, 'Aggregation of RSVP Reservations over MPLS | Goode, 'Aggregation of RSVP Reservations over MPLS | |||
| TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-03 (work | TE/DS-TE Tunnels', draft-ietf-tsvwg-rsvp-dste-03 (work | |||
| [ANSI.MLPP.Spec] American National Standards Institute, | [ANSI.MLPP.Spec] American National Standards Institute, | |||
| skipping to change at page 56, line 37 | skipping to change at page 58, line 4 | |||
| http://www.kom.e-technik.tu- | http://www.kom.e-technik.tu- | |||
| darmstadt.de/publications/abstracts/KS02-5.html (May, | darmstadt.de/publications/abstracts/KS02-5.html (May, | |||
| 2002) | 2002) | |||
| [ITU.MLPP.1990] International Telecommunications Union, "Multilevel | [ITU.MLPP.1990] International Telecommunications Union, "Multilevel | |||
| Precedence and Pre-emption Service (MLPP)", ITU-T | Precedence and Pre-emption Service (MLPP)", ITU-T | |||
| Recommendation I.255.3, 1990. | Recommendation I.255.3, 1990. | |||
| [Johnson] DM Johnson, 'QoS control versus generous | [Johnson] DM Johnson, 'QoS control versus generous | |||
| dimensioning', BT Technology Journal, Vol 23 No 2, | dimensioning', BT Technology Journal, Vol 23 No 2, | |||
| [LoadBalancing-a] Ruediger Martin, Michael Menth, and Michael | ||||
| Hemmkeppler: "Accuracy and Dynamics of Hash-Based Load | ||||
| Balancing Algorithms for Multipath Internet Routing", | ||||
| IEEE Broadnets, San Jose, CA, USA, October 2006 | ||||
| http://www3.informatik.uni- | ||||
| wuerzburg.de/~menth/Publications/Menth06p.pdf | ||||
| [LoadBalancing-b] Ruediger Martin, Michael Menth, and Michael | ||||
| Hemmkeppler: "Accuracy and Dynamics of Multi-Stage | ||||
| Load Balancing for Multipath Internet Routing", | ||||
| currently under submission http://www3.informatik.uni- | ||||
| wuerzburg.de/~menth/Publications/Menth07-Sub-6.pdf | ||||
| [Low] S. Low, L. Andrew, B. Wydrowski, 'Understanding XCP: | [Low] S. Low, L. Andrew, B. Wydrowski, 'Understanding XCP: | |||
| equilibrium and fairness', IEEE InfoCom 2005 | equilibrium and fairness', IEEE InfoCom 2005 | |||
| [NAC-a] Michael Menth: "Efficient Admission Control and | ||||
| Routing in Resilient Communication Networks", PhD | ||||
| thesis, July 2004, http://opus.bibliothek.uni- | ||||
| wuerzburg.de/opus/volltexte/2004/994/pdf/Menth04.pdf | ||||
| [NAC-b] Michael Menth, Stefan Kopf, Joachim Charzinski, and | ||||
| Karl Schrodi: "Resilient Network Admission Control", | ||||
| currently under submission. | ||||
| http://www3.informatik.uni- | ||||
| wuerzburg.de/~menth/Publications/Menth07-Sub-3.pdf | ||||
| [PCN] B. Briscoe, P. Eardley, D. Songhurst, F. Le Faucheur, | [PCN] B. Briscoe, P. Eardley, D. Songhurst, F. Le Faucheur, | |||
| A. Charny, V. Liatsos, S. Dudley, J. Babiarz, K. Chan, | A. Charny, V. Liatsos, S. Dudley, J. Babiarz, K. Chan, | |||
| G. Karagiannis, A. Bader, L. Westberg. 'Pre-Congestion | G. Karagiannis, A. Bader, L. Westberg. 'Pre-Congestion | |||
| Notification marking', draft-briscoe-tsvwg-cl-phb-02 | Notification marking', draft-briscoe-tsvwg-cl-phb-02 | |||
| (work in progress), June 2006. | (work in progress), June 2006. | |||
| [Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, | [Re-ECN] Bob Briscoe, Arnaud Jacquet, Alessandro Salvatori, | |||
| 'Re-ECN: Adding Accountability for Causing Congestion | 'Re-ECN: Adding Accountability for Causing Congestion | |||
| to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-01 (work in | to TCP/IP', draft-briscoe-tsvwg-re-ecn-tcp-01 (work in | |||
| progress), March 2006. | progress), March 2006. | |||
| End of changes. 57 change blocks. | ||||
| 358 lines changed or deleted | 405 lines changed or added | |||
This html diff was produced by rfcdiff 1.33. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
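
The per-packet recurrence shown in the appendix, [EWMA-AM-bits]'n+1 = (B * bits-in-packet) + (w' * [EWMA-AM-bits]n) with w' = (1-w)/w, can be illustrated with a minimal sketch. It assumes that B is a 0/1 flag indicating whether the packet is admission marked (its definition falls outside the lines shown in this diff) and that w' = 2**shift. The function and parameter names are hypothetical and are not taken from the draft.

```python
def scaled_ewma_step(ewma_bits: int, bits_in_packet: int,
                     marked: bool, shift: int) -> int:
    """One per-packet update of the scaled average.

    Computes (B * bits-in-packet) + (w' * ewma_bits) with w' = 2**shift,
    so the multiplication by w' becomes a left shift followed by an add.
    """
    # Assumption: B is 1 for an admission marked packet, 0 otherwise.
    b = 1 if marked else 0
    return b * bits_in_packet + (ewma_bits << shift)


def weight_from_shift(shift: int) -> float:
    """Return the weight w for which w' = (1 - w) / w equals 2**shift."""
    return 1.0 / (1 + (1 << shift))


if __name__ == "__main__":
    # With shift = 2 (w' = 4): 500 + (1000 << 2) = 4500
    print(scaled_ewma_step(1000, 500, True, 2))
```

Multiplying the scaled result by w gives the conventional weighted average w * x + (1 - w) * [EWMA]n, so the scaled form only avoids a general multiplication on the per-packet path; how the scaling is removed afterwards is left open here.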