| draft-briscoe-tsvwg-ecn-tunnel-01.txt | draft-ietf-tsvwg-ecn-tunnel-00.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT | Internet-Draft BT | |||
| Intended status: Standards Track July 14, 2008 | Intended status: Standards Track Oct 16, 2008 | |||
| Expires: January 15, 2009 | Expires: April 19, 2009 | |||
| Layered Encapsulation of Congestion Notification | Layered Encapsulation of Congestion Notification | |||
| draft-briscoe-tsvwg-ecn-tunnel-01 | draft-ietf-tsvwg-ecn-tunnel-00 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on January 15, 2009. | This Internet-Draft will expire on April 19, 2009. | |||
| Abstract | Abstract | |||
| This document redefines how the explicit congestion notification | This document redefines how the explicit congestion notification | |||
| (ECN) field of the outer IP header of a tunnel should be constructed. | (ECN) field of the outer IP header of a tunnel should be constructed. | |||
| It brings all IP in IP tunnels (v4 or v6) into line with the way | It brings all IP in IP tunnels (v4 or v6) into line with the way | |||
| IPsec tunnels now construct the ECN field. It includes a thorough | IPsec tunnels now construct the ECN field. It includes a thorough | |||
| analysis of the reasoning for this change and the implications. It | analysis of the reasoning for this change and the implications. It | |||
| also gives guidelines on the encapsulation of IP congestion | also gives guidelines on the encapsulation of IP congestion | |||
| notification by any outer header, whether encapsulated in an IP | notification by any outer header, whether encapsulated in an IP | |||
| tunnel or in a lower layer header. Following these guidelines should | tunnel or in a lower layer header. Following these guidelines should | |||
| help interworking, if the IETF or other standards bodies specify any | help interworking, if the IETF or other standards bodies specify any | |||
| new encapsulation of congestion notification. | new encapsulation of congestion notification. | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 4 | 1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 5 | |||
| 1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 5 | 1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8 | 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8 | |||
| 3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8 | 3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8 | 3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10 | 3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10 | |||
| 3.3. Management Constraints . . . . . . . . . . . . . . . . . . 11 | 3.3. Management Constraints . . . . . . . . . . . . . . . . . . 12 | |||
| 4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12 | 4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 4.1. Design Guidelines for New Encapsulations of Congestion | 4.1. Design Guidelines for New Encapsulations of Congestion | |||
| Notification . . . . . . . . . . . . . . . . . . . . . . . 13 | Notification . . . . . . . . . . . . . . . . . . . . . . . 14 | |||
| 5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15 | 5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15 | |||
| 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16 | 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16 | |||
| 7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18 | 7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18 | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 | |||
| 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21 | 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 22 | 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . . 22 | 13.1. Normative References . . . . . . . . . . . . . . . . . . . 23 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . . 23 | 13.2. Informative References . . . . . . . . . . . . . . . . . . 23 | |||
| Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25 | Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25 | |||
| Appendix B. Contribution to Congestion across a Tunnel . . . . . 25 | Appendix B. Contribution to Congestion across a Tunnel . . . . . 26 | |||
| Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27 | Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27 | |||
| Appendix D. Non-Dependence of Tunnelling on In-path Load | Appendix D. Non-Dependence of Tunnelling on In-path Load | |||
| Regulation . . . . . . . . . . . . . . . . . . . . . 28 | Regulation . . . . . . . . . . . . . . . . . . . . . 29 | |||
| D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 29 | D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 30 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 32 | Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 33 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 34 | Intellectual Property and Copyright Statements . . . . . . . . . . 34 | |||
| Changes from previous drafts (to be removed by the RFC Editor) | Changes from previous drafts (to be removed by the RFC Editor) | |||
| From briscoe-01 to ietf-00 (current): | ||||
| * Re-wrote Appendix B giving much simpler technique to measure | ||||
| contribution to congestion across a tunnel. | ||||
| * Added discussion of backward compatibility of the ideal | ||||
| decapsulation scheme in Appendix C | ||||
| * Updated references. Minor corrections & clarifications | ||||
| throughout. | ||||
| From -00 to -01: | From -00 to -01: | |||
| * Related everything conceptually to the uniform and pipe models | * Related everything conceptually to the uniform and pipe models | |||
| of RFC2983 on Diffserv Tunnels, and completely removed the | of RFC2983 on Diffserv Tunnels, and completely removed the | |||
| dependence of tunnelling behaviour on the presence of any in- | dependence of tunnelling behaviour on the presence of any in- | |||
| path load regulation by using the [1 - Before] [2 - Outer] | path load regulation by using the [1 - Before] [2 - Outer] | |||
| function placement concepts from RFC2983. | function placement concepts from RFC2983; | |||
| * Added specifc cases where the existing standards limit new | * Added specific cases where the existing standards limit new | |||
| proposals. | proposals, particularly Appendix A; | |||
| * Added sub-structure to Introduction (Need for Rationalisation, | * Added sub-structure to Introduction (Need for Rationalisation, | |||
| Roadmap), added new Introductory subsection on "Scope" and | Roadmap), added new Introductory subsection on "Scope" and | |||
| improved clarity | improved clarity; | |||
| * Added Design Guidelines for New Encapsulations of Congestion | * Added Design Guidelines for New Encapsulations of Congestion | |||
| Notification | Notification (Section 4.1); | |||
| * Considerably clarified the Backward Compatibility section | * Considerably clarified the Backward Compatibility section | |||
| (Section 6); | ||||
| * Considerably extended the Security Considerations section | * Considerably extended the Security Considerations section | |||
| (Section 9); | ||||
| * Summarised the primary rationale much better in the conclusions | * Summarised the primary rationale much better in the | |||
| conclusions; | ||||
| * Added numerous extra acknowledgements | * Added numerous extra acknowledgements; | |||
| * Added Appendix A. "Why resetting CE on encapsulation harms | * Added Appendix A. "Why resetting CE on encapsulation harms | |||
| PCN", Appendix B. "Contribution to Congestion across a Tunnel" | PCN", Appendix B. "Contribution to Congestion across a Tunnel" | |||
| and Appendix C. "Ideal Decapsulation Rules" | and Appendix C. "Ideal Decapsulation Rules"; | |||
| * Changed Appendix A "In-path Load Regulation" to "Non-Dependence | * Re-wrote Appendix D, explaining how tunnel encapsulation no | |||
| of Tunnelling on In-path Load Regulation" and added sub-section | longer depends on in-path load-regulation (changed title from | |||
| on "Dependence of In-Path Load Regulation on Tunnelling" | "In-path Load Regulation" to "Non-Dependence of Tunnelling on | |||
| In-path Load Regulation"), but explained how an in-path load | ||||
| regulation function must be carefully placed with respect to | ||||
| tunnel encapsulation (in a new sub-section entitled "Dependence | ||||
| of In-Path Load Regulation on Tunnelling"). | ||||
| 1. Introduction | 1. Introduction | |||
| This document redefines how the explicit congestion notification | This document redefines how the explicit congestion notification | |||
| (ECN) field [RFC3168] of the outer IP header of a tunnel should be | (ECN) field [RFC3168] of the outer IP header of a tunnel should be | |||
| constructed. It brings all IP in IP tunnels (v4 or v6) into line | constructed. It brings all IP in IP tunnels (v4 or v6) into line | |||
| with the way IPsec tunnels [RFC4301] now construct the ECN field, | with the way IPsec tunnels [RFC4301] now construct the ECN field, | |||
| ensuring that the outer header reveals any congestion experienced so | ensuring that the outer header reveals any congestion experienced so | |||
| far on the whole path, not just since the last tunnel ingress. | far on the whole path, not just since the last tunnel ingress. | |||
| skipping to change at page 5, line 38 | skipping to change at page 6, line 9 | |||
| makes it harder to design networks and new protocols that work | makes it harder to design networks and new protocols that work | |||
| predictably. | predictably. | |||
| Already complicated constraints have had to be added to a standards | Already complicated constraints have had to be added to a standards | |||
| track congestion marking proposal. The section of the pre-congestion | track congestion marking proposal. The section of the pre-congestion | |||
| notification (PCN) architecture [I-D.ietf-pcn-architecture] on | notification (PCN) architecture [I-D.ietf-pcn-architecture] on | |||
| tunnelling says PCN works correctly in the presence of RFC4301 IPsec | tunnelling says PCN works correctly in the presence of RFC4301 IPsec | |||
| encapsulation (and RFC5129 MPLS encapsulation). However it doesn't | encapsulation (and RFC5129 MPLS encapsulation). However it doesn't | |||
| work with RFC3168 IP in IP encapsulation (Appendix A explains why). | work with RFC3168 IP in IP encapsulation (Appendix A explains why). | |||
| Section 3 assesses further security, control and management functions | To ensure we do not cause any unintended side-effects, Section 3 | |||
| that cannot be achieved in each case (resetting vs copying CE | assesses whether copying or resetting CE would harm any security, | |||
| markings). It finds that resetting CE makes life difficult in a | control or management functions. It finds that resetting CE makes | |||
| number of directions, while copying CE harms nothing (other than | life difficult in a number of directions, while copying CE harms | |||
| opening a low bit-rate covert channel vulnerability which the | nothing (other than opening a low bit-rate covert channel | |||
| Security Area deems is manageable). | vulnerability which the IETF Security Area deems is manageable). | |||
| 1.2. Document Roadmap | 1.2. Document Roadmap | |||
| Most of the document gives a thorough analysis of the knock-on | Most of the document gives a thorough analysis of the knock-on | |||
| effects of the apparently minor change to tunnel encapsulation. The | effects of the apparently minor change to tunnel encapsulation. The | |||
| reader may jump to Section 5 if only interested in standards actions | reader may jump to Section 5 if only interested in standards actions | |||
| impacting implementation. The whole document is organised as | impacting implementation. The whole document is organised as | |||
| follows: | follows: | |||
| o S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to | o S.5 of RFC3168 permits the Diffserv codepoint (DSCP) [RFC2474] to | |||
| 'switch in' different behaviours for marking the ECN field, just | 'switch in' different behaviours for marking the ECN field, just | |||
| as it switches in different per-hop behaviours (PHBs) for | as it switches in different per-hop behaviours (PHBs) for | |||
| scheduling. Therefore we cannot only discuss the ECN protocol | scheduling. Therefore we cannot only discuss the ECN protocol | |||
| that RFC3168 gives as a default. We need to also give guidance | that RFC3168 gives as a default. Instead, Section 3 lays out the | |||
| for possible different marking schemes. Therefore in Section 3 we | design constraints when tunnelling congestion notification without | |||
| lay out the design constraints when tunnelling congestion | assuming a particular congestion marking scheme. | |||
| notification. | ||||
| o Then in Section 4 we resolve the tensions between these | o Then in Section 4 we resolve the tensions between these | |||
| constraints to give general design principles and guidelines on | constraints to give general design principles and guidelines on | |||
| how a tunnel should process congestion notification; principles | how a tunnel should process congestion notification; principles | |||
| that could apply to any marking behaviour for any PHB, not just | that could apply to any marking behaviour for any PHB, not just | |||
| the default in RFC3168. In particular, we examine the underlying | the default in RFC3168. In particular, we examine the underlying | |||
| principles behind whether CE should be reset or copied into the | principles behind whether CE should be reset or copied into the | |||
| outer header at the ingress to a tunnel--or indeed at the ingress | outer header at the ingress to a tunnel--or indeed at the ingress | |||
| of any layered encapsulation of headers with congestion | of any layered encapsulation of headers with congestion | |||
| notification fields. We end this section with a bulleted list of | notification fields. We end this section with a bulleted list of | |||
| more design guidelines for new encapsulations of congestion | design guidelines for new encapsulations of congestion | |||
| notification. | notification. | |||
| o Section 5 then uses precise standards terminology to confirm the | o Section 5 then uses precise standards terminology to confirm the | |||
| rules for the default ECN tunnelling behaviour based on the above | rules for the default ECN tunnelling behaviour based on the above | |||
| design principles. | design principles. | |||
| o Extending the new IPsec tunnel ingress behaviour to all IP in IP | o Extending the new IPsec tunnel ingress behaviour to all IP in IP | |||
| tunnels requires consideration of backwards compatibility, which | tunnels requires consideration of backwards compatibility, which | |||
| is covered in Section 6 and changes from earlier RFCs are brought | is covered in Section 6 and changes from earlier RFCs are brought | |||
| together in Section 7. | together in Section 7. | |||
| skipping to change at page 7, line 34 | skipping to change at page 8, line 4 | |||
| As well as guiding alternate IP in IP tunnelling schemes, the design | As well as guiding alternate IP in IP tunnelling schemes, the design | |||
| guidelines of Section 4 are intended to be followed when IP packets | guidelines of Section 4 are intended to be followed when IP packets | |||
| are encapsulated by any connectionless datagram/packet/frame where | are encapsulated by any connectionless datagram/packet/frame where | |||
| the outer header is designed to support a congestion notification | the outer header is designed to support a congestion notification | |||
| capability. [RFC5129] already deals with handling ECN for IP in MPLS | capability. [RFC5129] already deals with handling ECN for IP in MPLS | |||
| and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in | and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in | |||
| L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples | L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples | |||
| where ECN may be added in future. | where ECN may be added in future. | |||
| Of course, the IETF does not have standards authority over every link | Of course, the IETF does not have standards authority over every link | |||
| or tunnel protocol, so this document merely aims to define the | or tunnel protocol, so this document merely aims to guide the | |||
| interface between IP ECN and lower layer congestion notification. | interface between IP ECN and lower layer congestion notification. | |||
| Then the IETF or the relevant standards body can be free to define | Then the IETF or the relevant standards body can be free to define | |||
| the specifics of each lower layer scheme, but a common interface | the specifics of each lower layer scheme, but a common interface | |||
| should ensure interworking across all technologies. | should ensure interworking across all technologies. | |||
| Note that just because there is forward congestion notification in a | Note that just because there is forward congestion notification in a | |||
| lower layer protocol, if the lower layer has its own feedback and | lower layer protocol, if the lower layer has its own feedback and | |||
| load regulation, there is no need to propagate it up the layers. For | load regulation, there is no need to propagate it up the layers. For | |||
| instance, FECN (forward ECN) has been present in Frame Relay and EFCI | instance, FECN (forward ECN) has been present in Frame Relay and EFCI | |||
| (explicit forward congestion indication) in ATM [ITU-T.I.371] for a | (explicit forward congestion indication) in ATM [ITU-T.I.371] for a | |||
| long time, but they have been used for internal management rather | long time. But so far they have been used for internal management | |||
| than being propagated to endpoint transports for them to control end- | rather than being propagated to endpoint transports for them to | |||
| to-end congestion. | control end-to-end congestion. | |||
| [RFC2983] is a comprehensive primer on differentiated services and | [RFC2983] is a comprehensive primer on differentiated services and | |||
| tunnels. Given ECN raises similar issues to differentiated services | tunnels. Given ECN raises similar issues to differentiated services | |||
| when interacting with tunnels, useful concepts introduced in RFC2983 | when interacting with tunnels, useful concepts introduced in RFC2983 | |||
| are used throughout, with brief recaps of the explanations where | are used throughout, with brief recaps of the explanations where | |||
| necessary. | necessary. | |||
| 2. Requirements Language | 2. Requirements Language | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| skipping to change at page 8, line 31 | skipping to change at page 8, line 49 | |||
| Information security can be assured by using various end to end | Information security can be assured by using various end to end | |||
| security solutions (including IPsec in transport mode [RFC4301]), but | security solutions (including IPsec in transport mode [RFC4301]), but | |||
| a commonly used scenario involves the need to communicate between two | a commonly used scenario involves the need to communicate between two | |||
| physically protected domains across the public Internet. In this | physically protected domains across the public Internet. In this | |||
| case there are certain management advantages to using IPsec in tunnel | case there are certain management advantages to using IPsec in tunnel | |||
| mode solely across the publicly accessible part of the path. The | mode solely across the publicly accessible part of the path. The | |||
| path followed by a packet then crosses security 'domains'; the ones | path followed by a packet then crosses security 'domains'; the ones | |||
| protected by physical or other means before and after the tunnel and | protected by physical or other means before and after the tunnel and | |||
| the one protected by an IPsec tunnel across the otherwise unprotected | the one protected by an IPsec tunnel across the otherwise unprotected | |||
| domain. We will use the scenario in Figure 1 where endpoints 'A' and | domain. We will use the scenario in Figure 1 where endpoints 'A' and | |||
| 'B' communicate through a tunnel with ingress 'I' and egress 'E' | 'B' communicate through a tunnel. The tunnel ingress 'I' and egress | |||
| within physically protected edge domains across an unprotected | 'E' are within physically protected edge domains, while the tunnel | |||
| internetwork where there may be 'men in the middle', M. | spans an unprotected internetwork where there may be 'men in the | |||
| middle', M. | ||||
| physically unprotected physically | physically unprotected physically | |||
| <-protected domain-><--domain--><-protected domain-> | <-protected domain-><--domain--><-protected domain-> | |||
| +------------------+ +------------------+ | +------------------+ +------------------+ | |||
| | | M | | | | | M | | | |||
| | A-------->I=========>==========>E-------->B | | | A-------->I=========>==========>E-------->B | | |||
| | | | | | | | | | | |||
| +------------------+ +------------------+ | +------------------+ +------------------+ | |||
| <----IPsec secured----> | <----IPsec secured----> | |||
| tunnel | tunnel | |||
| skipping to change at page 9, line 23 | skipping to change at page 9, line 42 | |||
| of the inner header. And if 'E' copies these fields from the outer | of the inner header. And if 'E' copies these fields from the outer | |||
| header to the inner, even if it validates authentication from 'I', it | header to the inner, even if it validates authentication from 'I', it | |||
| will have allowed a covert channel from 'M' to 'B'. | will have allowed a covert channel from 'M' to 'B'. | |||
| ECN at the IP layer is designed to carry information about congestion | ECN at the IP layer is designed to carry information about congestion | |||
| from a congested resource towards downstream nodes. Typically a | from a congested resource towards downstream nodes. Typically a | |||
| downstream transport might feed the information back somehow to the | downstream transport might feed the information back somehow to the | |||
| point upstream of the congestion that can regulate the load on the | point upstream of the congestion that can regulate the load on the | |||
| congested resource, but other actions are possible (see [RFC3168] | congested resource, but other actions are possible (see [RFC3168] | |||
| S.6). In terms of the above unicast scenario, ECN is typically | S.6). In terms of the above unicast scenario, ECN is typically | |||
| intended to create an information channel from 'M' to 'B', for 'B' to | intended to create an information channel from 'M' to 'B' (for 'B' to | |||
| forward to 'A'. Therefore the goals of IPsec and ECN are mutually | feed back to 'A'). Therefore the goals of IPsec and ECN are mutually | |||
| incompatible. | incompatible. | |||
| With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | |||
| "controls are provided to manage the bandwidth of this [covert] | "controls are provided to manage the bandwidth of this [covert] | |||
| channel". Using the ECN processing rules of RFC4301, the channel | channel". Using the ECN processing rules of RFC4301, the channel | |||
| bandwidth is two bits per datagram from 'A' to 'M' and one bit per | bandwidth is two bits per datagram from 'A' to 'M' and one bit per | |||
| datagram from 'M' to 'A' (because 'E' limits the combinations of the | datagram from 'M' to 'A' (because 'E' limits the combinations of the | |||
| 2-bit ECN field that it will copy). In both cases the covert channel | 2-bit ECN field that it will copy). In both cases the covert channel | |||
| bandwidth is further reduced by noise from any real congestion | bandwidth is further reduced by noise from any real congestion | |||
| marking. RFC4301 therefore implies that these covert channels are | marking. RFC4301 therefore implies that these covert channels are | |||
| skipping to change at page 10, line 12 | skipping to change at page 10, line 30 | |||
| by copying into the outer header on encapsulation and copying from | by copying into the outer header on encapsulation and copying from | |||
| the outer header on decapsulation. | the outer header on decapsulation. | |||
| The pipe model: where the outer header is independent of that in the | The pipe model: where the outer header is independent of that in the | |||
| inner header so it hides the Diffserv field of the inner header | inner header so it hides the Diffserv field of the inner header | |||
| from any interaction with nodes along the tunnel. | from any interaction with nodes along the tunnel. | |||
| However, for ECN, the new IPsec security architecture in RFC4301 only | However, for ECN, the new IPsec security architecture in RFC4301 only | |||
| standardised one tunnelling model equivalent to the uniform model. | standardised one tunnelling model equivalent to the uniform model. | |||
| It deemed that simplicity was more important than allowing | It deemed that simplicity was more important than allowing | |||
| administrators the option of a tiny increment in security especially | administrators the option of a tiny increment in security, especially | |||
| given that not copying congestion indications could seriously harm | given that not copying congestion indications could seriously harm | |||
| everyone's network service. | everyone's network service. | |||
| 3.2. Control Constraints | 3.2. Control Constraints | |||
| Congestion control requires that any congestion notification marked | Congestion control requires that any congestion notification marked | |||
| into packets by a resource will be able to traverse a feedback loop | into packets by a resource will be able to traverse a feedback loop | |||
| back to a node capable of controlling the load on that resource. To | back to a function capable of controlling the load on that resource. | |||
| be precise, rather than calling this node the data source, we will | To be precise, rather than calling this function the data source, we | |||
| call it the Load Regulator. This will allow us to deal with | will call it the Load Regulator. This will allow us to deal with | |||
| exceptional cases where load is not regulated by the data source, but | exceptional cases where load is not regulated by the data source, but | |||
| usually the two terms will be synonymous. Note the term "a node | usually the two terms will be synonymous. Note the term "a function | |||
| _capable of_ controlling the load" deliberately includes a source | _capable of_ controlling the load" deliberately includes a source | |||
| application that doesn't actually control the load but ought to (e.g. | application that doesn't actually control the load but ought to (e.g. | |||
| an application without congestion control that uses UDP). | an application without congestion control that uses UDP). | |||
| A--->R--->I=========>M=========>E-------->B | A--->R--->I=========>M=========>E-------->B | |||
| Figure 2: Simple Tunnel Scenario | Figure 2: Simple Tunnel Scenario | |||
| We now consider a similar tunnelling scenario to the IPsec one just | We now consider a similar tunnelling scenario to the IPsec one just | |||
| described, but without the different security domains so we can just | described, but without the different security domains so we can just | |||
| skipping to change at page 11, line 14 | skipping to change at page 11, line 37 | |||
| congestion occurred across a tunnel or upstream of it. If outer | congestion occurred across a tunnel or upstream of it. If outer | |||
| header congestion marking was reset by the tunnel ingress ('I'), at | header congestion marking was reset by the tunnel ingress ('I'), at | |||
| the end of a tunnel ('E') the outer headers would indicate congestion | the end of a tunnel ('E') the outer headers would indicate congestion | |||
| experienced across the tunnel ('I' to 'E'), while the inner header | experienced across the tunnel ('I' to 'E'), while the inner header | |||
| would indicate congestion upstream of 'I'. But similar information | would indicate congestion upstream of 'I'. But similar information | |||
| can be gleaned even if the tunnel ingress copies the inner to the | can be gleaned even if the tunnel ingress copies the inner to the | |||
| outer headers. At the end of the tunnel ('E'), any packet with an | outer headers. At the end of the tunnel ('E'), any packet with an | |||
| _extra_ mark in the outer header relative to the inner header | _extra_ mark in the outer header relative to the inner header | |||
| indicates congestion across the tunnel ('I' to 'E'), while the inner | indicates congestion across the tunnel ('I' to 'E'), while the inner | |||
| header would still indicate congestion upstream of ('I'). Appendix B | header would still indicate congestion upstream of ('I'). Appendix B | |||
| gives a more precise method for inferring the congestion level | gives a simple and precise method for a tunnel egress to infer the | |||
| introduced across a tunnel. | congestion level introduced across a tunnel. | |||
| All this shows that 'E' can preserve the control loop irrespective of | All this shows that 'E' can preserve the control loop irrespective of | |||
| whether 'I' copies congestion notification into the outer header or | whether 'I' copies congestion notification into the outer header or | |||
| resets it. | resets it. | |||
| That is the situation for existing control arrangements but, because | That is the situation for existing control arrangements but, because | |||
| copying reveals more information, it would open up possibilities for | copying reveals more information, it would open up possibilities for | |||
| better control system designs. For instance, Appendix A describes | better control system designs. For instance, Appendix A describes | |||
| how resetting CE marking at a tunnel ingress confuses a proposed | how resetting CE marking at a tunnel ingress confuses a proposed | |||
| congestion marking scheme on the standards track. It ends up | congestion marking scheme on the standards track. It ends up | |||
| removing excessive amounts of traffic unnecessarily, whereas copying | removing excessive amounts of traffic unnecessarily, whereas copying | |||
| CE markings at ingress leads to the correct control behaviour. | CE markings at ingress leads to the correct control behaviour. | |||
| 3.3. Management Constraints | 3.3. Management Constraints | |||
| As well as control, there are also management constraints. | As well as control, there are also management constraints. | |||
| Specifically, a management system may monitor congestion markings in | Specifically, a management system may monitor congestion markings in | |||
| passing packets, perhaps at the border between networks as part of a | passing packets, perhaps at the border between networks as part of a | |||
| service level agreement. For instance, monitors at the borders of | service level agreement. For instance, monitors at the borders of | |||
| autonomous systems may need to measure how much congestion has | autonomous systems may need to measure how much congestion has | |||
| accumulated since the original source to determine between them how | accumulated since the original source, perhaps to determine between | |||
| much of the congestion is contributed by each domain. | them how much of the congestion is contributed by each domain. | |||
| Therefore, when monitoring the middle of a path, it should be | Therefore, when monitoring the middle of a path, it should be | |||
| possible to establish how far back in the path congestion markings | possible to establish how far back in the path congestion markings | |||
| have accumulated from. In this document we term this the baseline of | have accumulated from. In this document we term this the baseline of | |||
| congestion marking (or the Congestion Baseline), i.e. the source of | congestion marking (or the Congestion Baseline), i.e. the source of | |||
| the layer that last reset (or created) the congestion notification | the layer that last reset (or created) the congestion notification | |||
| field. Given that some tunnels cross domain borders (e.g. if M in | field. Given that some tunnels cross domain borders (e.g. if M in | |||
| Figure 2 is monitoring a border), it would therefore be desirable for | Figure 2 is monitoring a border), it would therefore be desirable for | |||
| 'I' to copy congestion accumulated so far into the outer headers | 'I' to copy congestion accumulated so far into the outer headers | |||
| exposed across the tunnel. | exposed across the tunnel. | |||
| Appendix D discusses various scenarios where the Load Regulator lies | Appendix D discusses various scenarios where the Load Regulator lies | |||
| in-path, not at the source host as we would typically expect. It | in-path, not at the source host as we would typically expect. It | |||
| concludes that a Congestion Baseline is determined by where the Load | concludes that a Congestion Baseline is determined by where the Load | |||
| Regulator function is, which should be identified in the transport | Regulator function is, which should be identified in the transport | |||
| layer, not by addresses in network layer headers. This applies | layer, not by addresses in network layer headers. This applies | |||
| whether the Load Regulator is at the source host or within the path. | whether the Load Regulator is at the source host or within the path. | |||
| The appendix also discusses where a Load Regulator function should be | The appendix also discusses where a Load Regulator function should be | |||
| located relative to a local encapsulation function. | located relative to a local tunnel encapsulation function. | |||
| 4. Design Principles | 4. Design Principles | |||
| The constraints from the three perspectives of security, control and | The constraints from the three perspectives of security, control and | |||
| management in Section 3 are somewhat in tension as to whether a | management in Section 3 are somewhat in tension as to whether a | |||
| tunnel ingress should copy congestion markings into the outer header | tunnel ingress should copy congestion markings into the outer header | |||
| it creates or reset them. From the control perspective either | it creates or reset them. From the control perspective either | |||
| copying or resetting works for existing arrangements, but copying has | copying or resetting works for existing arrangements, but copying has | |||
| more potential for simplifying control. From the management | more potential for simplifying control. From the management | |||
| perspective copying is preferable. From the security perspective | perspective copying is preferable. From the security perspective | |||
| skipping to change at page 15, line 45 | skipping to change at page 16, line 20 | |||
| 2-bit ECN field of the arriving IP header into the outer | 2-bit ECN field of the arriving IP header into the outer | |||
| encapsulating IP header, for all types of IP in IP tunnel. This | encapsulating IP header, for all types of IP in IP tunnel. This | |||
| encapsulation behaviour MUST only be used if the tunnel ingress is in | encapsulation behaviour MUST only be used if the tunnel ingress is in | |||
| `normal state'. A `compatibility state' with a different | `normal state'. A `compatibility state' with a different | |||
| encapsulation behaviour is also specified in Section 6 for backward | encapsulation behaviour is also specified in Section 6 for backward | |||
| compatibility with legacy tunnel egresses that do not understand ECN. | compatibility with legacy tunnel egresses that do not understand ECN. | |||
| To decapsulate the inner header at the tunnel egress, a compliant | To decapsulate the inner header at the tunnel egress, a compliant | |||
| tunnel egress MUST set the outgoing ECN field to the codepoint at the | tunnel egress MUST set the outgoing ECN field to the codepoint at the | |||
| intersection of the appropriate incoming inner header (row) and outer | intersection of the appropriate incoming inner header (row) and outer | |||
| header (column) in Table 1. | header (column) in Figure 3. | |||
| +--Incoming Outer Header--- | ||||
| +---------------------------------------------+ | ||||
| | Incoming Outer Header | | ||||
| +---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | |||
| | Header | | | | | | | Header | | | | | | |||
| +---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | |||
| | ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE | | | ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE | | |||
| | ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE | | | ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE | | |||
| | CE | CE | CE | CE (!!!) | CE | | | CE | CE | CE | CE (!!!) | CE | | |||
| +---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Outgoing Header | | ||||
| +---------------------------------------------+ | ||||
| +-----Outgoing Header------ | Figure 3: IP in IP Decapsulation | |||
| Table 1: IP in IP Decapsulation | ||||
| The exclamation marks '(!!!)' in Table 1 indicate that this | The exclamation marks '(!!!)' in Figure 3 indicate that this | |||
| combination of inner and outer headers should not be possible if only | combination of inner and outer headers should not be possible if only | |||
| legal transitions have taken place. So, the decapsulator should drop | legal transitions have taken place. So, the decapsulator should drop | |||
| or mark the ECN field as the table specifies, but it MAY also raise | or mark the ECN field as the table specifies, but it MAY also raise | |||
| an appropriate alarm. It MUST NOT raise an alarm so often that the | an appropriate alarm. It MUST NOT raise an alarm so often that the | |||
| illegal combinations would amplify into a flood of alarm messages. | illegal combinations would amplify into a flood of alarm messages. | |||
| 6. Backward Compatibility | 6. Backward Compatibility | |||
| Note: in RFC3168, a tunnel was in one of two modes: limited | Note: in RFC3168, a tunnel was in one of two modes: limited | |||
| functionality or full functionality. Rather than working with modes | functionality or full functionality. Rather than working with modes | |||
| skipping to change at page 22, line 24 | skipping to change at page 22, line 42 | |||
| design of alternate forms of tunnel processing of congestion | design of alternate forms of tunnel processing of congestion | |||
| notification, if required for specific Diffserv PHBs or for other | notification, if required for specific Diffserv PHBs or for other | |||
| lower layer encapsulating protocols that might support congestion | lower layer encapsulating protocols that might support congestion | |||
| notification in the future. | notification in the future. | |||
| 11. Acknowledgements | 11. Acknowledgements | |||
| Thanks to David Black for explaining a better way to think about | Thanks to David Black for explaining a better way to think about | |||
| function placement and to Louise Burness for a better way to think | function placement and to Louise Burness for a better way to think | |||
| about multilayer transports and networks, having read | about multilayer transports and networks, having read | |||
| [Patterns_Arch]. Also thanks to Arnaud Jacquet for ideas behind the | [Patterns_Arch]. Also thanks to Arnaud Jacquet for the idea for | |||
| algorithms in Appendix B. Thanks to Bruce Davie, Toby Moncaster, | Appendix B. Thanks to Bruce Davie, Toby Moncaster, Gorry Fairhurst, | |||
| Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for | Sally Floyd, Alfred Hoenes and Gabriele Corliano for their thoughts | |||
| their thoughts and careful review comments. | and careful review comments. | |||
| Bob Briscoe is partly funded by Trilogy, a research project (ICT- | ||||
| 216372) supported by the European Community under its Seventh | ||||
| Framework Programme. The views expressed here are those of the | ||||
| author only. | ||||
| 12. Comments Solicited | 12. Comments Solicited | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
| <tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
| 13. References | 13. References | |||
| 13.1. Normative References | 13.1. Normative References | |||
| skipping to change at page 23, line 14 | skipping to change at page 23, line 35 | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, September 2001. | RFC 3168, September 2001. | |||
| [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | |||
| Internet Protocol", RFC 4301, December 2005. | Internet Protocol", RFC 4301, December 2005. | |||
| 13.2. Informative References | 13.2. Informative References | |||
| [I-D.eardley-pcn-marking-behaviour] | ||||
| Eardley, P., "Marking behaviour of PCN-nodes", | ||||
| draft-eardley-pcn-marking-behaviour-01 (work in progress), | ||||
| June 2008. | ||||
| [I-D.ietf-pcn-architecture] | [I-D.ietf-pcn-architecture] | |||
| Eardley, P., "Pre-Congestion Notification Architecture", | Eardley, P., "Pre-Congestion Notification (PCN) | |||
| draft-ietf-pcn-architecture-03 (work in progress), | Architecture", draft-ietf-pcn-architecture-07 (work in | |||
| February 2008. | progress), September 2008. | |||
| [I-D.ietf-pcn-marking-behaviour] | ||||
| Eardley, P., "Marking behaviour of PCN-nodes", | ||||
| draft-ietf-pcn-marking-behaviour-00 (work in progress), | ||||
| October 2008. | ||||
| [I-D.ietf-pwe3-congestion-frmwk] | [I-D.ietf-pwe3-congestion-frmwk] | |||
| Bryant, S., Davie, B., Martini, L., and E. Rosen, | Bryant, S., Davie, B., Martini, L., and E. Rosen, | |||
| "Pseudowire Congestion Control Framework", | "Pseudowire Congestion Control Framework", | |||
| draft-ietf-pwe3-congestion-frmwk-01 (work in progress), | draft-ietf-pwe3-congestion-frmwk-01 (work in progress), | |||
| May 2008. | May 2008. | |||
| [I-D.moncaster-pcn-3-state-encoding] | [I-D.moncaster-pcn-3-state-encoding] | |||
| Moncaster, T., Briscoe, B., and M. Menth, "A three state | Moncaster, T., Briscoe, B., and M. Menth, "A three state | |||
| extended PCN encoding scheme", | extended PCN encoding scheme", | |||
| skipping to change at page 25, line 16 | skipping to change at page 25, line 39 | |||
| (Expired) | (Expired) | |||
| Appendix A. Why resetting CE on encapsulation harms PCN | Appendix A. Why resetting CE on encapsulation harms PCN | |||
| Regarding encapsulation, the section of the PCN architecture | Regarding encapsulation, the section of the PCN architecture | |||
| [I-D.ietf-pcn-architecture] on tunnelling says that header copying | [I-D.ietf-pcn-architecture] on tunnelling says that header copying | |||
| (RFC4301) allows PCN to work correctly. However, resetting CE | (RFC4301) allows PCN to work correctly. However, resetting CE | |||
| markings confuses PCN marking. | markings confuses PCN marking. | |||
| The specific issue here concerns PCN excess rate marking | The specific issue here concerns PCN excess rate marking | |||
| [I-D.eardley-pcn-marking-behaviour], i.e. the bulk marking of traffic | [I-D.ietf-pcn-marking-behaviour], i.e. the bulk marking of traffic | |||
| that exceeds a configured threshold rate. One of the goals of excess | that exceeds a configured threshold rate. One of the goals of excess | |||
| rate marking is to enable the speedy removal of excess admission | rate marking is to enable the speedy removal of excess admission | |||
| controlled traffic following re-routes caused by link failures or | controlled traffic following re-routes caused by link failures or | |||
| other disasters. This maintains a share of the capacity for | other disasters. This maintains a share of the capacity for | |||
| competing admission controlled traffic and for traffic in lower | competing admission controlled traffic and for traffic in lower | |||
| priority classes. After failures, traffic re-routed onto remaining | priority classes. After failures, traffic re-routed onto remaining | |||
| links can often stress multiple links along a path. Therefore, | links can often stress multiple links along a path. Therefore, | |||
| traffic can arrive at a link under stress with some proportion | traffic can arrive at a link under stress with some proportion | |||
| already marked for removal by a previous link. By design, marked | already marked for removal by a previous link. By design, marked | |||
| traffic will be removed by the overall system in subsequent round | traffic will be removed by the overall system in subsequent round | |||
| trips. So when the excess rate marking algorithm decides how much | trips. So when the excess rate marking algorithm decides how much | |||
| traffic to mark for removal, it doesn't include traffic already | traffic to mark for removal, it doesn't include traffic already | |||
| marked for removal by another node upstream (the `Excess traffic | marked for removal by another node upstream (the `Excess traffic | |||
| meter function' of [I-D.eardley-pcn-marking-behaviour]). | meter function' of [I-D.ietf-pcn-marking-behaviour]). | |||
| However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | |||
| field in all the outer headers, hiding all the evidence of problems | field in all the outer headers, hiding all the evidence of problems | |||
| upstream. Thus, although excess rate marking works fine with RFC4301 | upstream. Thus, although excess rate marking works fine with RFC4301 | |||
| IPsec tunnels, with RFC3168 tunnels it typically removes large | IPsec tunnels, with RFC3168 tunnels it typically removes large | |||
| volumes of traffic that it didn't need to remove at all. | volumes of traffic that it didn't need to remove at all. | |||
| Appendix B. Contribution to Congestion across a Tunnel | Appendix B. Contribution to Congestion across a Tunnel | |||
| This specification mandates that a tunnel ingress determines the ECN | This specification mandates that a tunnel ingress determines the ECN | |||
| field of each new outer tunnel header by copying the arriving header. | field of each new outer tunnel header by copying the arriving header. | |||
| If instead the outer ECN field were reset at a tunnel ingress (as it | Concern has been expressed that this will make it difficult for the | |||
| was for the full functionality mode of RFC3168), it would be possible | tunnel egress to monitor congestion introduced along a tunnel, which | |||
| for the tunnel egress to measure: | is easy if the outer ECN field is reset at a tunnel ingress (RFC3168 | |||
| full functionality mode). However, in fact copying CE marks at | ||||
| o congestion marking before the tunnel ingress (fraction of inner | ingress will still make it easy for the egress to measure congestion | |||
| header markings, p_i); | introduced across a tunnel, as illustrated below. | |||
| o congestion marking across the tunnel (fraction of outer header | ||||
| markings, p_t); | ||||
| o congestion marking after the tunnel egress (fraction of departing | Consider 100 packets measured at the egress. It measures that 30 are | |||
| header markings, p_o). | CE marked in the inner and outer headers and 12 have additional CE | |||
| marks in the outer but not the inner. This means packets arriving at | ||||
| the ingress had already experienced 30% congestion. However, it does | ||||
| not mean there was 12% congestion across the tunnel. The correct | ||||
| calculation of congestion across the tunnel is p_t = 12/(100-30) = | ||||
| 12/70 = 17%. This is easy for the egress to measure. It is the | |||
| packets with additional CE marking in the outer header (12) as a | ||||
| proportion of packets not marked in the inner header (70). | ||||
| Although the newly mandated copying behaviour at ingress gains the | Figure 4 illustrates this in a combinatorial probability diagram. | |||
| advantages described in the body of this specification, this one | The square represents 100 packets. The 30% division along the bottom | |||
| advantage of the resetting behaviour of RFC3168 seems to have been | represents marking before the ingress, and the p_t division up the | |||
| lost: on first impressions, it seems that the egress can no longer | side represents marking along the tunnel. | |||
| accurately measure congestion contributed along the tunnel (p_t). | ||||
| The egress could _estimate_ the contribution along the tunnel by | |||
| measuring which packets carry only a mark in the outer header (not the | |||
| inner). But this is not precisely the same as the congestion | ||||
| contributed along the tunnel; tunnel nodes may have tried to mark | ||||
| some packets that already had a marking in both the inner and outer | ||||
| header. Measuring only additional outer markings will miss these. | ||||
| Nonetheless, with the newly proposed scheme, a tunnel egress can | ||||
| derive a precise estimate of marking introduced across a tunnel (p_t) | ||||
| as follows. | ||||
| The combined fraction of markings at the tunnel egress will be p_o = | +-----+---------+100% | |||
| 1 - (1 - p_i)(1 - p_t). Explanation: this is (1 - the probability a | | | | | |||
| departing packet is not marked), which is (1 - (prob not marked | | 30 | | | |||
| before tunnel)(prob not marked along tunnel)). Therefore, | | | | The large square | |||
| rearranging, the egress can infer the fraction of marks introduced | | +---------+p_t represents 100 packets | |||
| across the tunnel as p_t = (p_o - p_i)/(1 - p_i). If arriving | | | 12 | | |||
| congestion is low (p_i <<1), then the approximation p_t ~ (p_o - p_i) | +-----+---------+0 | |||
| should be good enough. This is the estimate we advised originally; | 0 30% 100% | |||
| i.e. measuring only the extra markings in the outer header that are | inner header marking | |||
| not present in the inner header. If a better approximation is needed | ||||
| p_t ~ (p_o - p_i)(1 + p_i), which removes the division, but still | ||||
| assumes p_i<<1. | ||||
| Using any of these formulae (including the precise one), it would be | Figure 4: Tunnel Marking of Packets Already Marked at Ingress | |||
| possible for a tunnel egress to calculate a moving average of the | ||||
| fraction of packets being marked by tunnel nodes, including those | ||||
| already marked in the inner header. Alternatively, it should even be | ||||
| possible for a tunnel egress to reverse engineer which packets would | ||||
| have been marked across the tunnel if CE was reset on ingress even if | ||||
| CE was actually copied on ingress.[[anchor3: Note from Bob: I've | ||||
| worked out an algorithm so the tunnel egress can reverse engineer | ||||
| marking as if CE was reset at the ingress even though CE was copied | ||||
| at the ingress. It typically consumes 2 cycles / pkt, occasionally 4 | ||||
| and very occasionally 8. {ToDo: On testing an implementation just now | ||||
| it still has a wrinkle in it, but with a little more development I | ||||
| believe it would work well. I'll write it into the next revision if | ||||
| I get it working.}]] | ||||
| Appendix C. Ideal Decapsulation Rules | Appendix C. Ideal Decapsulation Rules | |||
| Compliance with this appendix is NOT REQUIRED for compliance with the | This appendix is not normative. Compliance with this appendix is NOT | |||
| present specification. | REQUIRED for compliance with the present specification. | |||
| If the default ECN encapsulation behaviour does not offer suitable | If the default ECN encapsulation behaviour does not offer suitable | |||
| trade-offs, procedures exist for associating a new behaviour with a | trade-offs, procedures exist for associating a new behaviour with a | |||
| new Diffserv PHB. However, it is unrealistic to expect vendors of | new Diffserv PHB. However, it is unrealistic to expect vendors of | |||
| all IPsec and all IP in IP tunnel endpoints to cater for the | all IPsec and all IP in IP tunnel endpoints to cater for the | |||
| exceptional behaviour of PHB XXX. If all tunnels did require XXX- | exceptional behaviour of PHB XXX. If all tunnels did require XXX- | |||
| specific behaviour, the resulting patchy and error-prone deployment | specific behaviour, the resulting patchy and error-prone deployment | |||
| would probably cause XXX to suffer byzantine feature interactions | would probably cause XXX to suffer byzantine feature interactions | |||
| with poorly implemented tunnels. The default rules for tunnel | with poorly implemented tunnels. The default rules for tunnel | |||
| endpoints to handle both the Diffserv field and the ECN field should | endpoints to handle both the Diffserv field and the ECN field should | |||
| skipping to change at page 27, line 42 | skipping to change at page 28, line 7 | |||
| marking) [I-D.ietf-pcn-architecture]. The aim is for the first level | marking) [I-D.ietf-pcn-architecture]. The aim is for the first level | |||
| of marking to stop admitting new traffic and the second level to | of marking to stop admitting new traffic and the second level to | |||
| terminate sufficient existing flows to bring a network back to its | terminate sufficient existing flows to bring a network back to its | |||
| operating point after a serious failure. | operating point after a serious failure. | |||
| Although the ECN field gives sufficient codepoints for these three | Although the ECN field gives sufficient codepoints for these three | |||
| states, the PCN working group cannot use them in case any tunnel | states, the PCN working group cannot use them in case any tunnel | |||
| decapsulations occur within a PCN region. If a node in a tunnel sets | decapsulations occur within a PCN region. If a node in a tunnel sets | |||
| the ECN field to ECT(0) or ECT(1), this change will be discarded by a | the ECN field to ECT(0) or ECT(1), this change will be discarded by a | |||
| tunnel egress compliant with RFC4301 and RFC3168. This can be seen | tunnel egress compliant with RFC4301 and RFC3168. This can be seen | |||
| in Table 1, where the ECT values in the outer header are ignored | in Figure 3, where the ECT values in the outer header are ignored | |||
| unless the inner header is the same. Effectively the ECT(0) and | unless the inner header is the same. Effectively the ECT(0) and | |||
| ECT(1) codepoints have to be treated as just one codepoint when they | ECT(1) codepoints have to be treated as just one codepoint when they | |||
| could otherwise have been used for their intended purpose of | could otherwise have been used for their intended purpose of | |||
| congestion notification. Instead, the PCN w-g has had to propose | congestion notification. Instead, the PCN w-g has had to propose | |||
| using extra Diffserv codepoint(s) to encode the extra states | using extra Diffserv codepoint(s) to encode the extra states | |||
| [I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting | [I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting | |||
| DSCP space while leaving ECN codepoints unused. | DSCP space while leaving ECN codepoints unused. | |||
| Although this is currently most pressing for the PCN working group, | Although this is currently most pressing for the PCN working group, | |||
| the issue is more general. Under Security Considerations (Section 9) | the issue is more general. Under Security Considerations (Section 9) | |||
| skipping to change at page 28, line 17 | skipping to change at page 28, line 31 | |||
| More generally, the currently standardised tunnel decapsulation | More generally, the currently standardised tunnel decapsulation | |||
| behaviour unnecessarily wastes a quarter of two bits (i.e. half a | behaviour unnecessarily wastes a quarter of two bits (i.e. half a | |||
| bit) in the IP (v4 & v6) header. As explained in Section 3.1, the | bit) in the IP (v4 & v6) header. As explained in Section 3.1, the | |||
| original reason for not copying down outer ECT codepoints for onward | original reason for not copying down outer ECT codepoints for onward | |||
| forwarding was to limit the covert channel across a decapsulator to 1 | forwarding was to limit the covert channel across a decapsulator to 1 | |||
| bit per packet. However, now that the IETF Security Area has deemed | bit per packet. However, now that the IETF Security Area has deemed | |||
| that a 2-bit covert channel through an encapsulator is a manageable | that a 2-bit covert channel through an encapsulator is a manageable | |||
| risk, the same should be true for a decapsulator. | risk, the same should be true for a decapsulator. | |||
| Table 2 proposes a more ideal layered decapsulation behaviour. Note: | Figure 5 proposes a more ideal layered decapsulation behaviour. | |||
| this table is only to support discussion. It is not currently | Note: this table is only to support discussion. It is not currently | |||
| proposed for standards action. The only difference from Table 1 | proposed for standards action. The only difference from Figure 3 | |||
| (which is proposed for standards action) is the swapping of the cells | (which is proposed for standards action) is the swapping of the cells | |||
| highlighted as *ECT(X)*. | highlighted as *ECT(X)*. | |||
| +--Incoming Outer Header--- | +---------------------------------------------+ | |||
| | Incoming Outer Header | | ||||
| +---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | |||
| | Header | | | | | | | Header | | | | | | |||
| +---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | |||
| | ECT(0) | ECT(0) | ECT(0) | *ECT(1)* | CE | | | ECT(0) | ECT(0) | ECT(0) | *ECT(1)* | CE | | |||
| | ECT(1) | ECT(1) | *ECT(0)* | ECT(1) | CE | | | ECT(1) | ECT(1) | *ECT(0)* | ECT(1) | CE | | |||
| | CE | CE | CE | CE (!!!) | CE | | | CE | CE | CE | CE (!!!) | CE | | |||
| +---------------------+---------+-----------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Outgoing Header | | ||||
| +---------------------------------------------+ | ||||
| +-----Outgoing Header------ | Figure 5: Ideal IP in IP Decapsulation (currently informative, not | |||
| normative) | ||||
| Table 2: Ideal IP in IP Decapsulation (currently NOT REQUIRED) | ||||
| Note that, if this ideal proposal were taken up, extra backwards | Note that, if this ideal proposal were taken up, a tunnel egress | |||
| compatibility issues would have to be resolved. | complying with it would be backwards compatible with all previous | |||
| specifications for encapsulation of ECN at the ingress (RFC4301, both | ||||
| modes of RFC3168, both modes of RFC2481 and RFC2003). In comparison | ||||
| with an RFC3168 or RFC4301 tunnel egress, it would require no | ||||
| additional configuration at the ingress nor any additional | ||||
| negotiation with the ingress. The only new issue would be the burden | ||||
| of an extra standard to be compliant with, adding to the already | ||||
| complex history of ECN tunnelling RFCs. | ||||
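The ideal decapsulation table above can be read as a plain lookup keyed on the incoming inner and outer ECN fields. The following is a minimal sketch of that reading, not part of either draft: the names `IDEAL_DECAP` and `ideal_decapsulate` are hypothetical, and `None` is an assumed way to represent the "drop" cells. The "(!!!)" annotations in the table are not modelled.

```python
from typing import Optional

NOT_ECT, ECT0, ECT1, CE = "Not-ECT", "ECT(0)", "ECT(1)", "CE"

# (incoming inner, incoming outer) -> outgoing header.
# None stands for the "drop" cells (an assumption of this sketch).
IDEAL_DECAP = {
    (NOT_ECT, NOT_ECT): NOT_ECT, (NOT_ECT, ECT0): None,
    (NOT_ECT, ECT1): None,       (NOT_ECT, CE): None,
    (ECT0, NOT_ECT): ECT0,       (ECT0, ECT0): ECT0,
    (ECT0, ECT1): ECT1,          (ECT0, CE): CE,   # *ECT(1)* cell
    (ECT1, NOT_ECT): ECT1,       (ECT1, ECT0): ECT0,  # *ECT(0)* cell
    (ECT1, ECT1): ECT1,          (ECT1, CE): CE,
    (CE, NOT_ECT): CE,           (CE, ECT0): CE,
    (CE, ECT1): CE,              (CE, CE): CE,
}


def ideal_decapsulate(inner: str, outer: str) -> Optional[str]:
    """Return the outgoing ECN field, or None if the packet is dropped."""
    return IDEAL_DECAP[(inner, outer)]
```

In this reading, the two highlighted cells are the only places where an ECT outer codepoint overrides a different ECT inner codepoint.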
| Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation | Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation | |||
| We have said that at any point in a network, the Congestion Baseline | We have said that at any point in a network, the Congestion Baseline | |||
| (where congestion notification starts from zero) should be the | (where congestion notification starts from zero) should be the | |||
| previous upstream Load Regulator. We have also said that the ingress | previous upstream Load Regulator. We have also said that the ingress | |||
| of an IP in IP tunnel must copy congestion indications to the | of an IP in IP tunnel must copy congestion indications to the | |||
| encapsulating outer headers it creates. If the Load Regulator is in- | encapsulating outer headers it creates. If the Load Regulator is in- | |||
| path rather than at the source, and also a tunnel ingress, these two | path rather than at the source, and also a tunnel ingress, these two | |||
| requirements seem to be contradictory. A tunnel ingress must not | requirements seem to be contradictory. A tunnel ingress must not | |||
| reset incoming congestion, but a Load Regulator must be the | reset incoming congestion, but a Load Regulator must be the | |||
| Congestion Baseline, implying it needs to reset incoming congestion. | Congestion Baseline, implying it needs to reset incoming congestion. | |||
| In fact, the two requirements are not contradictory, because a Load | In fact, the two requirements are not contradictory, because a Load | |||
| Regulator and a tunnel ingress are functions within a node that occur | Regulator and a tunnel ingress are functions within a node that | |||
| in sequence on a stream of packets, not at the same point. Figure 3 | typically occur in sequence on a stream of packets, not at the same | |||
| is borrowed from [RFC2983] (which was making a similar point about | point. Figure 6 is borrowed from [RFC2983] (which was making a | |||
| the location of Diffserv traffic conditioning relative to the | similar point about the location of Diffserv traffic conditioning | |||
| encapsulation function of a tunnel). An in-path Load Regulator can | relative to the encapsulation function of a tunnel). An in-path Load | |||
| act on packets either at [1 - Before] encapsulation or at [2 - Outer] | Regulator can act on packets either at [1 - Before] encapsulation or | |||
| after encapsulation. Load Regulation does not ever need to be | at [2 - Outer] after encapsulation. Load Regulation does not ever | |||
| integrated with the [Encapsulate] function (but it can be for | need to be integrated with the [Encapsulate] function (but it can be | |||
| efficiency). Therefore we can still maintain that the [Encapsulate] | for efficiency). Therefore we can still mandate that the | |||
| function always copies CE into the outer header. | [Encapsulate] function always copies CE into the outer header. | |||
| >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]------------>> | >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]---------->> | |||
| \ | \ | |||
| \ | \ | |||
| +--------[2 - Outer]--------->> | +--------[2 - Outer]------->> | |||
| Figure 3: Placement of In-Path Load Regulator Relative to Tunnel | Figure 6: Placement of In-Path Load Regulator Relative to Tunnel | |||
| Ingress | Ingress | |||
| Then separately, if there is a Load Regulator at location [2 - | Then separately, if there is a Load Regulator at location [2 - | |||
| Outer], it might reset CE to ECT(0), say. Then the Congestion | Outer], it might reset CE to ECT(0), say. Then the Congestion | |||
| Baseline for the lower layer (outer) will be [2 - Outer], while the | Baseline for the lower layer (outer) will be [2 - Outer], while the | |||
| Congestion Baseline of the inner layer will be unchanged. But how | Congestion Baseline of the inner layer will be unchanged. But how | |||
| encapsulation works has nothing to do with whether a Load Regulator | encapsulation works has nothing to do with whether a Load Regulator | |||
| is present or where it is. | is present or where it is. | |||
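The separation of functions described above can be sketched in a few lines. This is an illustrative model only: the `Packet` type and the function names are hypothetical, and resetting CE to ECT(0) follows the "say" example in the text rather than any mandated behaviour. The point it shows is that `encapsulate` always copies the ECN field, whichever location a Load Regulator occupies.

```python
from dataclasses import dataclass, replace
from typing import Optional

CE, ECT0 = "CE", "ECT(0)"


@dataclass(frozen=True)
class Packet:
    inner_ecn: str
    outer_ecn: Optional[str] = None  # None until the packet is encapsulated


def encapsulate(pkt: Packet) -> Packet:
    """[Encapsulate]: always copy the arriving ECN field into the outer header."""
    return replace(pkt, outer_ecn=pkt.inner_ecn)


def regulate_before(pkt: Packet) -> Packet:
    """Load Regulator at [1 - Before]: reset CE prior to encapsulation."""
    return replace(pkt, inner_ecn=ECT0) if pkt.inner_ecn == CE else pkt


def regulate_outer(pkt: Packet) -> Packet:
    """Load Regulator at [2 - Outer]: reset CE on the outer header only."""
    return replace(pkt, outer_ecn=ECT0) if pkt.outer_ecn == CE else pkt


arriving = Packet(inner_ecn=CE)
# Regulator at [2 - Outer]: the inner baseline is unchanged, the outer is reset.
outer_case = regulate_outer(encapsulate(arriving))
# Regulator at [1 - Before]: the copy happens to carry an already-reset field.
before_case = encapsulate(regulate_before(arriving))
```

Under these assumptions `outer_case` keeps CE in the inner header while the outer header carries ECT(0), whereas `before_case` carries ECT(0) in both headers.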
| If on the other hand a Load Regulator resets CE at [1 - Before], the | If on the other hand a Load Regulator resets CE at [1 - Before], the | |||
| skipping to change at page 30, line 12 | skipping to change at page 30, line 47 | |||
| desirable or practical for a node part way along the path to regulate | desirable or practical for a node part way along the path to regulate | |||
| the load. However, various reasonable proposals for in-path load | the load. However, various reasonable proposals for in-path load | |||
| regulation have been made from time to time (e.g. fair queuing, | regulation have been made from time to time (e.g. fair queuing, | |||
| traffic engineering, flow admission control). The IETF has recently | traffic engineering, flow admission control). The IETF has recently | |||
| chartered a working group to standardise admission control across a | chartered a working group to standardise admission control across a | |||
| part of a path using pre-congestion notification (PCN) [PCNcharter]. | part of a path using pre-congestion notification (PCN) [PCNcharter]. | |||
| This is of particular relevance here because it involves congestion | This is of particular relevance here because it involves congestion | |||
| notification with an in-path Load Regulator, it can involve | notification with an in-path Load Regulator, it can involve | |||
| tunnelling and it certainly involves encapsulation more generally. | tunnelling and it certainly involves encapsulation more generally. | |||
| We will use the more complex scenario in Figure 4 to tease out all | We will use the more complex scenario in Figure 7 to tease out all | |||
| the issues that arise when combining congestion notification and | the issues that arise when combining congestion notification and | |||
| tunnelling with various possible in-path load regulation schemes. In | tunnelling with various possible in-path load regulation schemes. In | |||
| this case 'I1' and 'E2' break up the path into three separate | this case 'I1' and 'E2' break up the path into three separate | |||
| congestion control loops. The feedback for these loops is shown | congestion control loops. The feedback for these loops is shown | |||
| going right to left across the top of the figure. The 'V's are arrow | going right to left across the top of the figure. The 'V's are arrow | |||
| heads representing the direction of feedback, not letters. But there | heads representing the direction of feedback, not letters. But there | |||
| are also two tunnels within the middle control loop: 'I1' to 'E1' and | are also two tunnels within the middle control loop: 'I1' to 'E1' and | |||
| 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS | 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS | |||
| core networks. M is a congestion monitoring point, perhaps between | core networks. M is a congestion monitoring point, perhaps between | |||
| two border routers where the same tunnel continues unbroken across | two border routers where the same tunnel continues unbroken across | |||
| the border. | the border. | |||
| ______ _______________________________________ _____ | ______ _______________________________________ _____ | |||
| / \ / \ / \ | / \ / \ / \ | |||
| V \ V M \ V \ | V \ V M \ V \ | |||
| A--->R--->I1===========>E1----->I2=========>==========>E2------->B | A--->R--->I1===========>E1----->I2=========>==========>E2------->B | |||
| Figure 4: Complex Tunnel Scenario | Figure 7: Complex Tunnel Scenario | |||
| The question is, should the congestion markings in the outer exposed | The question is, should the congestion markings in the outer exposed | |||
| headers of a tunnel represent congestion only since the tunnel | headers of a tunnel represent congestion only since the tunnel | |||
| ingress or over the whole upstream path from the source of the inner | ingress or over the whole upstream path from the source of the inner | |||
| header (whatever that may mean)? Or put another way, should 'I1' and | header (whatever that may mean)? Or put another way, should 'I1' and | |||
| 'I2' copy or reset CE markings? | 'I2' copy or reset CE markings? | |||
| Based on the design principles in Section 4, the answer is that the | Based on the design principles in Section 4, the answer is that the | |||
| Congestion Baseline should be the nearest upstream interface designed | Congestion Baseline should be the nearest upstream interface designed | |||
| to regulate traffic load--the Load Regulator. In Figure 4 'A', 'I1' | to regulate traffic load--the Load Regulator. In Figure 7 'A', 'I1' | |||
| and 'E2' are all Load Regulators. We have shown the feedback loops | and 'E2' are all Load Regulators. We have shown the feedback loops | |||
| returning to each of these nodes so that they can regulate the load | returning to each of these nodes so that they can regulate the load | |||
| causing the congestion notification. So the Congestion Baseline | causing the congestion notification. So the Congestion Baseline | |||
| exposed to M should be 'I1' (the Load Regulator), not 'I2'. | exposed to M should be 'I1' (the Load Regulator), not 'I2'. | |||
| Therefore I1 should reset any arriving CE markings. In this case, | Therefore I1 should reset any arriving CE markings. In this case, | |||
| 'I1' knows the tunnel to 'E1' is unrelated to its load regulation | 'I1' knows the tunnel to 'E1' is unrelated to its load regulation | |||
| function. So the load regulation function within 'I1' should be | function. So the load regulation function within 'I1' should be | |||
| placed at [1 - Before] tunnel encapsulation within 'I1' (using the | placed at [1 - Before] tunnel encapsulation within 'I1' (using the | |||
| terminology of Figure 3). Then the Congestion Baseline all across | terminology of Figure 6). Then the Congestion Baseline all across | |||
| the networks from 'I1' to 'E2' in both inner and outer headers will | the networks from 'I1' to 'E2' in both inner and outer headers will | |||
| be 'I1'. | be 'I1'. | |||
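The rule that the Congestion Baseline is the nearest upstream Load Regulator can be checked against the complex tunnel scenario with a short sketch. The node order below is an assumption read off the figure (in particular, 'M' is taken to sit between 'I2' and 'E2'), and `congestion_baseline` is a hypothetical helper, not something defined by the draft.

```python
from typing import Optional

# Assumed left-to-right order of nodes in the complex tunnel scenario.
PATH = ["A", "R", "I1", "E1", "I2", "M", "E2", "B"]
# Nodes the text identifies as Load Regulators.
LOAD_REGULATORS = {"A", "I1", "E2"}


def congestion_baseline(node: str) -> Optional[str]:
    """Return the nearest Load Regulator strictly upstream of `node`."""
    upstream = PATH[:PATH.index(node)]
    for candidate in reversed(upstream):
        if candidate in LOAD_REGULATORS:
            return candidate
    return None  # no Load Regulator upstream (e.g. the source itself)
```

With these assumptions, `congestion_baseline("M")` gives 'I1' rather than 'I2', matching the conclusion in the text.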
| The following further examples illustrate how this answer might be | The following further examples illustrate how this answer might be | |||
| applied: | applied: | |||
| o We argued in Appendix A that resetting CE on encapsulation could | o We argued in Appendix A that resetting CE on encapsulation could | |||
| harm PCN excess rate marking, which marks excess traffic for | harm PCN excess rate marking, which marks excess traffic for | |||
| removal in subsequent round trips. This marking relies on not | removal in subsequent round trips. This marking relies on not | |||
| marking packets if another node upstream has already marked them | marking packets if another node upstream has already marked them | |||
| for removal. If there were a tunnel ingress between the two which | for removal. If there were a tunnel ingress between the two which | |||
| reset CE markings, it would confuse the downstream node into | reset CE markings, it would confuse the downstream node into | |||
| marking far too much traffic for removal. So why do we say that | marking far too much traffic for removal. So why do we say that | |||
| 'I1' should reset CE, while a tunnel ingress shouldn't? The | 'I1' should reset CE, while a tunnel ingress shouldn't? The | |||
| answer is that it is the Load Regulator function at 'I1' that is | answer is that it is the Load Regulator function at 'I1' that is | |||
| resetting CE, not the tunnel encapsulator. The Load Regulator | resetting CE, not the tunnel encapsulator. The Load Regulator | |||
| needs to set itself as the Congestion Baseline, so the feedback it | needs to set itself as the Congestion Baseline, so the feedback it | |||
| gets will only be about congestion on links it can relieve itself | gets will only be about congestion on links it can relieve itself | |||
| by regulating the load into them. When it resets CE markings, it | (by regulating the load into them). When it resets CE markings, | |||
| knows that something else upstream will have dealt with the | it knows that something else upstream will have dealt with the | |||
| congestion notifications it removes, given it is part of an end- | congestion notifications it removes, given it is part of an end- | |||
| to-end admission control signalling loop. It therefore knows that | to-end admission control signalling loop. It therefore knows that | |||
| previous hops will be covered by other Load Regulators. | previous hops will be covered by other Load Regulators. | |||
| Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should | Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should | |||
| follow the new rule for any tunnel ingress and copy congestion | follow the new rule for any tunnel ingress and copy congestion | |||
| marking into the outer tunnel header. The ingress at 'I1' will | marking into the outer tunnel header. The ingress at 'I1' will | |||
| happen to copy headers that have already been reset just | happen to copy headers that have already been reset just | |||
| beforehand. But it doesn't need to know that. | beforehand. But it doesn't need to know that. | |||
| o [Shayman] suggested feedback of ECN accumulated across an MPLS | o [Shayman] suggested feedback of ECN accumulated across an MPLS | |||
| skipping to change at page 32, line 4 | skipping to change at page 32, line 41 | |||
| headers. Again, the tunnel encapsulation function at 'I' simply | headers. Again, the tunnel encapsulation function at 'I' simply | |||
| copies incoming headers, unaware that the load regulator will | copies incoming headers, unaware that the load regulator will | |||
| subsequently reset its outer headers. | subsequently reset its outer headers. | |||
| o The PWE3 working group of the IETF is considering the problem of | o The PWE3 working group of the IETF is considering the problem of | |||
| how and whether an aggregate edge-to-edge pseudo-wire emulation | how and whether an aggregate edge-to-edge pseudo-wire emulation | |||
| should respond to congestion [I-D.ietf-pwe3-congestion-frmwk]. | should respond to congestion [I-D.ietf-pwe3-congestion-frmwk]. | |||
| Although the study is still at the requirements stage, some | Although the study is still at the requirements stage, some | |||
| (controversial) solution proposals include in-path load regulation | (controversial) solution proposals include in-path load regulation | |||
| at the ingress to the tunnel that could lead to tunnel | at the ingress to the tunnel that could lead to tunnel | |||
| arrangements with similar complexity to that of Figure 4. | arrangements with similar complexity to that of Figure 7. | |||
| These are not contrived scenarios--they could be a lot worse. For | These are not contrived scenarios--they could be a lot worse. For | |||
| instance, a host may create a tunnel for IPsec which is placed inside | instance, a host may create a tunnel for IPsec which is placed inside | |||
| a tunnel for Mobile IP over a remote part of its path. And around | a tunnel for Mobile IP over a remote part of its path. And around | |||
| this all we may have MPLS labels being pushed and popped as packets | this all we may have MPLS labels being pushed and popped as packets | |||
| pass across different core networks. Similarly, it is possible that | pass across different core networks. Similarly, it is possible that | |||
| subnets could be built from link technology (e.g. future Ethernet | subnets could be built from link technology (e.g. future Ethernet | |||
| switches) so that link headers being added and removed could involve | switches) so that link headers being added and removed could involve | |||
| congestion notification in future Ethernet link headers with all the | congestion notification in future Ethernet link headers with all the | |||
| same issues as with IP in IP tunnels. | same issues as with IP in IP tunnels. | |||
| End of changes. 65 change blocks. | ||||
| 162 lines changed or deleted | 172 lines changed or added | |||