| draft-briscoe-tsvwg-ecn-tunnel-00.txt | draft-briscoe-tsvwg-ecn-tunnel-01.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT | Internet-Draft BT | |||
| Intended status: Standards Track June 30, 2007 | Intended status: Standards Track July 14, 2008 | |||
| Expires: January 1, 2008 | Expires: January 15, 2009 | |||
| Layered Encapsulation of Congestion Notification | Layered Encapsulation of Congestion Notification | |||
| draft-briscoe-tsvwg-ecn-tunnel-00 | draft-briscoe-tsvwg-ecn-tunnel-01 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on January 1, 2008. | This Internet-Draft will expire on January 15, 2009. | |||
| Copyright Notice | ||||
| Copyright (C) The IETF Trust (2007). | ||||
| Abstract | Abstract | |||
| This document redefines how the explicit congestion notification | This document redefines how the explicit congestion notification | |||
| (ECN) field of the outer IP header of a tunnel should be constructed. | (ECN) field of the outer IP header of a tunnel should be constructed. | |||
| It brings all IP in IP tunnels (v4 or v6) into line with the way | It brings all IP in IP tunnels (v4 or v6) into line with the way | |||
| IPsec tunnels now construct the ECN field, ensuring that the outer | IPsec tunnels now construct the ECN field. It includes a thorough | |||
| header reveals any congestion experienced so far on the path. It | analysis of the reasoning for this change and the implications. It | |||
| specifies the default ECN tunneling behaviour for any Diffserv per- | also gives guidelines on the encapsulation of IP congestion | |||
| hop behaviour (PHB), but also gives general principles to guide the | notification by any outer header, whether encapsulated in an IP | |||
| design of alternate congestion marking behaviours for specific PHBs | tunnel or in a lower layer header. Following these guidelines should | |||
| and for lower layer congestion notification schemes. | help interworking, if the IETF or other standards bodies specify any | |||
| new encapsulation of congestion notification. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 5 | 1.1. The Need for Rationalisation . . . . . . . . . . . . . . . 4 | |||
| 3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 6 | 1.2. Document Roadmap . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 6 | 1.3. Scope . . . . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 7 | 2. Requirements Language . . . . . . . . . . . . . . . . . . . . 8 | |||
| 3.3. Management Constraints . . . . . . . . . . . . . . . . . . 8 | 3. Design Constraints . . . . . . . . . . . . . . . . . . . . . . 8 | |||
| 4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 9 | 3.1. Security Constraints . . . . . . . . . . . . . . . . . . . 8 | |||
| 5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 11 | 3.2. Control Constraints . . . . . . . . . . . . . . . . . . . 10 | |||
| 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 12 | 3.3. Management Constraints . . . . . . . . . . . . . . . . . . 11 | |||
| 7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 13 | 4. Design Principles . . . . . . . . . . . . . . . . . . . . . . 12 | |||
| 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 | 4.1. Design Guidelines for New Encapsulations of Congestion | |||
| 9. Security Considerations . . . . . . . . . . . . . . . . . . . 14 | Notification . . . . . . . . . . . . . . . . . . . . . . . 13 | |||
| 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 14 | 5. Default ECN Tunnelling Rules . . . . . . . . . . . . . . . . . 15 | |||
| 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 15 | 6. Backward Compatibility . . . . . . . . . . . . . . . . . . . . 16 | |||
| 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 15 | 7. Changes from Earlier RFCs . . . . . . . . . . . . . . . . . . 18 | |||
| 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . . 15 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 19 | |||
| 13.2. Informative References . . . . . . . . . . . . . . . . . . 16 | 10. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 21 | |||
| Appendix A. In-path Load Regulation . . . . . . . . . . . . . . . 17 | 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 20 | 12. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 21 | 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
| 13.1. Normative References . . . . . . . . . . . . . . . . . . . 22 | ||||
| 13.2. Informative References . . . . . . . . . . . . . . . . . . 23 | ||||
| Appendix A. Why resetting CE on encapsulation harms PCN . . . . . 25 | ||||
| Appendix B. Contribution to Congestion across a Tunnel . . . . . 25 | ||||
| Appendix C. Ideal Decapsulation Rules . . . . . . . . . . . . . . 27 | ||||
| Appendix D. Non-Dependence of Tunnelling on In-path Load | ||||
| Regulation . . . . . . . . . . . . . . . . . . . . . 28 | ||||
| D.1. Dependence of In-Path Load Regulation on Tunnelling . . . 29 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 32 | ||||
| Intellectual Property and Copyright Statements . . . . . . . . . . 34 | ||||
| Changes from previous drafts (to be removed by the RFC Editor) | ||||
| From -00 to -01: | ||||
| * Related everything conceptually to the uniform and pipe models | ||||
| of RFC2983 on Diffserv Tunnels, and completely removed the | ||||
| dependence of tunnelling behaviour on the presence of any in- | ||||
| path load regulation by using the [1 - Before] [2 - Outer] | ||||
| function placement concepts from RFC2983. | ||||
| * Added specifc cases where the existing standards limit new | ||||
| proposals. | ||||
| * Added sub-structure to Introduction (Need for Rationalisation, | ||||
| Roadmap), added new Introductory subsection on "Scope" and | ||||
| improved clarity | ||||
| * Added Design Guidelines for New Encapsulations of Congestion | ||||
| Notification | ||||
| * Considerably clarified the Backward Compatibility section | ||||
| * Considerably extended the Security Considerations section | ||||
| * Summarised the primary rationale much better in the conclusions | ||||
| * Added numerous extra acknowledgements | ||||
| * Added Appendix A. "Why resetting CE on encapsulation harms | ||||
| PCN", Appendix B. "Contribution to Congestion across a Tunnel" | ||||
| and Appendix C. "Ideal Decapsulation Rules" | ||||
| * Changed Appendix A "In-path Load Regulation" to "Non-Dependence | ||||
| of Tunnelling on In-path Load Regulation" and added sub-section | ||||
| on "Dependence of In-Path Load Regulation on Tunnelling" | ||||
| 1. Introduction | 1. Introduction | |||
| This document redefines how the explicit congestion notification | This document redefines how the explicit congestion notification | |||
| (ECN) field [RFC3168] of the outer IP header of a tunnel should be | (ECN) field [RFC3168] of the outer IP header of a tunnel should be | |||
| constructed. It brings all IP in IP tunnels (v4 or v6) into line | constructed. It brings all IP in IP tunnels (v4 or v6) into line | |||
| with the way IPsec tunnels [RFC4301] now construct the ECN field, | with the way IPsec tunnels [RFC4301] now construct the ECN field, | |||
| ensuring that the outer header reveals any congestion experienced so | ensuring that the outer header reveals any congestion experienced so | |||
| far on the path. Although this memo focuses on IP in IP tunnelling | far on the whole path, not just since the last tunnel ingress. | |||
| it also gives generalised advice for any encapsulation by lower layer | ||||
| headers. | ||||
| ECN allows a congested resource to notify the onset of congestion | ECN allows a congested resource to notify the onset of congestion | |||
| without having to drop packets, by explicitly marking a proportion of | without having to drop packets, by explicitly marking a proportion of | |||
| packets with the congestion experienced (CE) codepoint. Congestion | packets with the congestion experienced (CE) codepoint. Because | |||
| notification is unusual in that it propagates from the physical layer | congestion is exhaustion of a physical resource, if the transport | |||
| upwards to the transport layer, because congestion is exhaustion of a | layer is to deal with congestion, congestion notification must | |||
| physical resource. The transport layer can directly detect loss of a | propagate upwards; from the physical layer to the transport layer. | |||
| packet (or frame) by a lower layer. But if a lower layer marks a | The transport layer can directly detect loss of a packet (or frame) | |||
| packet (or frame) to notify incipient congestion, this marking has to | by a lower layer. But if a lower layer marks rather than drops a | |||
| be explicitly copied up the layers at every header decapsulation. | forward-travelling data packet (or frame) in order to notify | |||
| So, at each decapsulation of an outer (lower layer) header a | incipient congestion, this marking has to be explicitly copied up the | |||
| congestion marking has to be arranged to propagate into the forwarded | layers at every header decapsulation. So, at each decapsulation of | |||
| (upper layer) header. It must continue upwards until it reaches the | an outer (lower layer) header a congestion marking has to be arranged | |||
| destination transport, which should feed congestion notification back | to propagate into the forwarded (upper layer) header. It must | |||
| to the source transport. | continue upwards until it reaches the destination transport. Then | |||
| typically the destination feeds this congestion notification back to | ||||
| the source transport. Given encapsulation by lower layer headers is | ||||
| functionally similar to tunnelling, it is necessary to arrange | ||||
| similar propagation of congestion notification up the layers. For | ||||
| instance, ECN and its propagation up the layers has recently been | ||||
| specified for MPLS [RFC5129]. | ||||
| Note that often lower layer resources are arranged to be protected by | As packets pass up the layers, current specifications of | |||
| higher layer buffers, so instead of blocking occurring at the lower | decapsulation behaviours are largely all consistent and correct. | |||
| layer, it occurs when the higher layer queue overflows. Thus, non- | However, as packets pass down the layers, specifications of | |||
| blocking link and physical layer technologies do not have to | encapsulation behaviours are not consistent. This document is | |||
| implement congestion notification, which can be introduced solely in | primarily aimed at rationalising encapsulation. (Nevertheless, | |||
| IP layer active queue management (AQM). However, if we want to use | Appendix C explains why the consistency of decapsulation solutions | |||
| congestion notification, we have to arrange for it to be explicitly | will not last for long and proposes a fix to decapsulation rules as | |||
| copied up the layers when IP is tunnelled in IP (and if a particular | well. The IETF can then discuss whether to rationalise decapsulation | |||
| link layer technology isn't protected from blocking by network layer | at the same time as encapsulation.) | |||
| queues). | ||||
| 1.1. The Need for Rationalisation | ||||
| IPsec tunnel mode is a specific form of tunnelling that can hide the | IPsec tunnel mode is a specific form of tunnelling that can hide the | |||
| inner headers. Because the ECN field has to be mutable, it cannot be | inner headers. Because the ECN field has to be mutable, it cannot be | |||
| covered by IPsec encryption or authentication calculations. | covered by IPsec encryption or authentication calculations. | |||
| Therefore concern has been raised in the past that the ECN field | Therefore concern has been raised in the past that the ECN field | |||
| could be used as a low bandwidth covert channel to communicate with | could be used as a low bandwidth covert channel to communicate with | |||
| someone on the unprotected public Internet even if an end-host is | someone on the unprotected public Internet even if an end-host is | |||
| restricted to only communicate with the public Internet through an | restricted to only communicate with the public Internet through an | |||
| IPsec gateway. However, the recently updated version of IPsec | IPsec gateway. However, the updated version of IPsec [RFC4301] chose | |||
| [RFC4301] chose not to block this covert channel, deciding that the | not to block this covert channel, deciding that the threat could be | |||
| threat could be managed given the channel bandwidth is so limited | managed given the channel bandwidth is so limited (ECN is a 2-bit | |||
| (ECN is a 2-bit field). | field). | |||
| An unfortunate sequence of standards actions leading up to this | An unfortunate sequence of standards actions leading up to this | |||
| latest change in IPsec has left us with nearly the worst of all | latest change in IPsec has left us with nearly the worst of all | |||
| possible combinations of outcomes, despite the best endeavours of | possible combinations of outcomes, despite the best endeavours of | |||
| everyone concerned. Even though information about congestion | everyone concerned. The controversy has been over whether to reveal | |||
| experienced on the upstream path has various uses if it is revealed | information about congestion experienced on the path upstream of the | |||
| tunnel ingress. Even though this has various uses if it is revealed | ||||
| in the outer header of a tunnel, when ECN was standardised[RFC3168] | in the outer header of a tunnel, when ECN was standardised[RFC3168] | |||
| it was decided that all IP in IP tunnels should hide upstream | it was decided that all IP in IP tunnels should hide this upstream | |||
| congestion information simply to avoid the extra complexity of two | congestion simply to avoid the extra complexity of two different | |||
| different mechanisms for IPsec and non-IPsec tunnels. However, now | mechanisms for IPsec and non-IPsec tunnels. However, now that | |||
| that [RFC4301] IPsec tunnels deliberately no longer hide this | [RFC4301] IPsec tunnels deliberately no longer hide this information, | |||
| information, we are left in the perverse position where non-IPsec | we are left in the perverse position where non-IPsec tunnels still | |||
| tunnels still hide congestion information unnecessarily. This | hide congestion information unnecessarily. This document is designed | |||
| document is designed to correct that anomaly. | to correct that anomaly. | |||
| Specifically, RFC3168 says that, if a tunnel supports ECN (termed a | Specifically, RFC3168 says that, if a tunnel fully supports ECN | |||
| 'full-functionality' ECN tunnel), the tunnel ingress must not copy a | (termed a 'full-functionality' ECN tunnel in [RFC3168]), the tunnel | |||
| CE marking from the inner header into the outer header that it | ingress must not copy a CE marking from the inner header into the | |||
| creates. Instead the tunnel ingress has to set the ECN field of the | outer header that it creates. Instead the tunnel ingress has to set | |||
| outer header to ECT(0) (i.e. codepoint 10). We term this 'resetting' | the ECN field of the outer header to ECT(0) (i.e. codepoint 10). We | |||
| a CE codepoint. However, RFC4301 reverses this, stating that the | term this 'resetting' a CE codepoint. However, RFC4301 reverses | |||
| tunnel ingress must simply copy the ECN field from the inner to the | this, stating that the tunnel ingress must simply copy the ECN field | |||
| outer header. The main purpose of this document is to carry over | from the inner to the outer header. The main purpose of this | |||
| this new relaxed attitude to covert channels from IPsec to all IP in | document is to carry the new behaviour of IPsec over to all IP in IP | |||
| IP tunnels, so all tunnel ingress nodes consistently copy the ECN | tunnels, so all tunnel ingress nodes consistently copy the ECN field. | |||
| field. | ||||
| The rest of the document deals with the knock-on effects of this | Why does it matter if we have different ECN encapsulation behaviours | |||
| apparently minor change. It is organised as follows: | for IPsec and non-IPsec tunnels? The general argument is that | |||
| gratuitous inconsistency constrains the available design space and | ||||
| makes it harder to design networks and new protocols that work | ||||
| predictably. | ||||
| Already complicated constraints have had to be added to a standards | ||||
| track congestion marking proposal. The section of the pre-congestion | ||||
| notification (PCN) architecture [I-D.ietf-pcn-architecture] on | ||||
| tunnelling says PCN works correctly in the presence of RFC4301 IPsec | ||||
| encapsulation (and RFC5129 MPLS encapsulation). However it doesn't | ||||
| work with RFC3168 IP in IP encapsulation (Appendix A explains why). | ||||
| Section 3 assesses further security, control and management functions | ||||
| that cannot be achieved in each case (resetting vs copying CE | ||||
| markings). It finds that resetting CE makes life difficult in a | ||||
| number of directions, while copying CE harms nothing (other than | ||||
| opening a low bit-rate covert channel vulnerability which the | ||||
| Security Area deems is manageable). | ||||
| 1.2. Document Roadmap | ||||
| Most of the document gives a thorough analysis of the knock-on | ||||
| effects of the apparently minor change to tunnel encapsulation. The | ||||
| reader may jump to Section 5 if only interested in standards actions | ||||
| impacting implementation. The whole document is organised as | ||||
| follows: | ||||
| o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to | o S.5 of RFC3168 permits the Diffserv codepoint (DSCP)[RFC2474] to | |||
| 'switch in' different behaviours for marking the ECN field, just | 'switch in' different behaviours for marking the ECN field, just | |||
| as it switches in different per-hop behaviours (PHBs) for | as it switches in different per-hop behaviours (PHBs) for | |||
| scheduling. Therefore we cannot only discuss the ECN protocol | scheduling. Therefore we cannot only discuss the ECN protocol | |||
| that RFC3168 gives as a default. We need to also give guidance | that RFC3168 gives as a default. We need to also give guidance | |||
| for possible different marking schemes. Therefore in Section 3 we | for possible different marking schemes. Therefore in Section 3 we | |||
| lay out the design constraints when tunneling congestion | lay out the design constraints when tunnelling congestion | |||
| notification. | notification. | |||
| o Then in Section 4 we resolve the tensions between these | o Then in Section 4 we resolve the tensions between these | |||
| constraints to give general design principles on how a tunnel | constraints to give general design principles and guidelines on | |||
| should process congestion notification; principles that could | how a tunnel should process congestion notification; principles | |||
| apply to any marking behaviour for any PHB, not just the default | that could apply to any marking behaviour for any PHB, not just | |||
| in RFC3168. In particular, we examine the underlying principles | the default in RFC3168. In particular, we examine the underlying | |||
| behind whether CE should be reset or copied into the outer header | principles behind whether CE should be reset or copied into the | |||
| at the ingress to a tunnel--or indeed at the ingress of any | outer header at the ingress to a tunnel--or indeed at the ingress | |||
| layered encapsulation of headers with congestion notification | of any layered encapsulation of headers with congestion | |||
| fields. | notification fields. We end this section with a bulleted list of | |||
| more design guidelines for new encapsulations of congestion | ||||
| notification. | ||||
| o Section 5 then confirms the precise rules for the default ECN | o Section 5 then uses precise standards terminology to confirm the | |||
| tunnelling behaviour based on the above design principles. These | rules for the default ECN tunnelling behaviour based on the above | |||
| rules apply to all PHBs, unless stated otherwise in the | design principles. | |||
| specification of a PHB. There is no requirement for a PHB to | ||||
| state anything about ECN behaviour if the default behaviour is | ||||
| sufficient. | ||||
| o Extending the new IPsec tunnel ingress behaviour to all IP in IP | o Extending the new IPsec tunnel ingress behaviour to all IP in IP | |||
| tunnels causes one further knock-on effect that is dealt with in | tunnels requires consideration of backwards compatibility, which | |||
| Section 6 on Backward Compatibility. If one end of an IPsec | is covered in Section 6 and changes from earlier RFCs are brought | |||
| tunnel is compliant with [RFC4301], assuming IKEv2 key management | together in Section 7. | |||
| is used, the other end can be guaranteed to also be [RFC4301] | ||||
| compliant. So there is no backward compatibility problem with | ||||
| IKEv2 RFC4301 IPsec tunnels. But once we extend our scope to any | ||||
| IP in IP tunnel, we have to cater for the possibility that a | ||||
| tunnel ingress compliant with this specification is sending to an | ||||
| egress that doesn't even understand ECN (e.g. a legacy [RFC2003] | ||||
| tunnel egress). If a tunnel ingress copied incoming ECN-capable | ||||
| headers into outer headers, then a legacy tunnel egress would | ||||
| discard any congestion markings added to the outer header within | ||||
| the tunnel. ECN-capable traffic sources would not see any | ||||
| congestion feedback and instead continually ratchet up their share | ||||
| of the bandwidth without realising that cross-flows from other ECN | ||||
| sources were continually having to ratchet down. | ||||
| The scope of this document is all IP in IP tunnelling, irrespective | o Finally, a number of security considerations are discussed and | |||
| of whether IPv4 or IPv6 is used for either of the inner and outer | conclusions are drawn. | |||
| headers. The document only concerns wire protocol processing at | ||||
| tunnel endpoints and makes no changes or recommendations concerning | ||||
| algorithms for congestion marking or congestion response. The | ||||
| general design principles of Section 4 may also be useful when any | ||||
| datagram/packet/frame with a congestion notification capability is | ||||
| encapsulated by a connectionless outer header [BBnet] that might also | ||||
| support a congestion notification capability in the future as | ||||
| discussed in S.9.3 of [RFC3168] (e.g. IP encapsulated in L2TP | ||||
| [RFC2661], GRE [RFC1701] or PPTP [RFC2637]). However, of course, the | ||||
| IETF does not have standards authority over every link or tunnel | ||||
| protocol, so this document focuses only on IP in IP. | ||||
| [I-D.ietf-tsvwg-ecn-mpls] applies these principles to IP in MPLS and | ||||
| to MPLS in MPLS. | ||||
| 2. Requirements notation | 1.3. Scope | |||
| This document only concerns wire protocol processing at tunnel | ||||
| endpoints and makes no changes or recommendations concerning | ||||
| algorithms for congestion marking or congestion response. | ||||
| This document specifies a common, default congestion encapsulation | ||||
| for any IP in IP tunnelling, based on that now specified for IPsec. | ||||
| It applies irrespective of whether IPv4 or IPv6 is used for either of | ||||
| the inner and outer headers. It applies to all PHBs, unless stated | ||||
| otherwise in the specification of a PHB. It is intended to be a good | ||||
| trade off between somewhat conflicting security, control and | ||||
| management requirements. | ||||
| Nonetheless, if necessary, an alternate congestion encapsulation | ||||
| behaviour can be introduced as part of the definition of an alternate | ||||
| congestion marking scheme used by a specific Diffserv PHB (see S.5 of | ||||
| [RFC3168] and [RFC4774]). When designing such new encapsulation | ||||
| schemes, the principles in Section 4 should be followed as closely as | ||||
| possible. There is no requirement for a PHB to state anything about | ||||
| ECN tunnelling behaviour if the default behaviour is sufficient. | ||||
| Often lower layer resources (e.g. a point-to-point Ethernet link) are | ||||
| arranged to be protected by higher layer buffers, so instead of | ||||
| congestion occurring at the lower layer, it merely causes the queue | ||||
| from the higher layer to overflow. Such non-blocking link and | ||||
| physical layer technologies do not have to implement congestion | ||||
| notification, which can be introduced solely in the active queue | ||||
| management (AQM) from the IP layer. However, not all link layer | ||||
| technologies are always protected from congestion by buffers at | ||||
| higher layers (e.g. a subnetwork of Ethernet links and switches can | ||||
| congest internally). In these cases, when adding congestion | ||||
| notification to lower layers, we have to arrange for it to be | ||||
| explicitly copied up the layers, just as when IP is tunnelled in IP. | ||||
| As well as guiding alternate IP in IP tunnelling schemes, the design | ||||
| guidelines of Section 4 are intended to be followed when IP packets | ||||
| are encapsulated by any connectionless datagram/packet/frame where | ||||
| the outer header is designed to support a congestion notification | ||||
| capability. [RFC5129] already deals with handling ECN for IP in MPLS | ||||
| and MPLS in MPLS, and S.9.3 of [RFC3168] lists IP encapsulated in | ||||
| L2TP [RFC2661], GRE [RFC1701] or PPTP [RFC2637] as possible examples | ||||
| where ECN may be added in future. | ||||
| Of course, the IETF does not have standards authority over every link | ||||
| or tunnel protocol, so this document merely aims to define the | ||||
| interface between IP ECN and lower layer congestion notification. | ||||
| Then the IETF or the relevant standards body can be free to define | ||||
| the specifics of each lower layer scheme, but a common interface | ||||
| should ensure interworking across all technologies. | ||||
| Note that just because there is forward congestion notification in a | ||||
| lower layer protocol, if the lower layer has its own feedback and | ||||
| load regulation, there is no need to propagate it up the layers. For | ||||
| instance, FECN (forward ECN) has been present in Frame Relay and EFCI | ||||
| (explicit forward congestion indication) in ATM [ITU-T.I.371] for a | ||||
| long time, but they have been used for internal management rather | ||||
| than being propagated to endpoint transports for them to control end- | ||||
| to-end congestion. | ||||
| [RFC2983] is a comprehensive primer on differentiated services and | ||||
| tunnels. Given ECN raises similar issues to differentiated services | ||||
| when interacting with tunnels, useful concepts introduced in RFC2983 | ||||
| are used throughout, with brief recaps of the explanations where | ||||
| necessary. | ||||
| 2. Requirements Language | ||||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in RFC 2119 [RFC2119]. | |||
| 3. Design Constraints | 3. Design Constraints | |||
| Tunnel processing of a congestion notification field has to meet | Tunnel processing of a congestion notification field has to meet | |||
| congestion control needs without creating new information security | congestion control needs without creating new information security | |||
| vulnerabilities (if information security is required). | vulnerabilities (if information security is required). | |||
| 3.1. Security Constraints | 3.1. Security Constraints | |||
| Information security can be assured by using various end to end | Information security can be assured by using various end to end | |||
| skipping to change at page 6, line 48 | skipping to change at page 9, line 10 | |||
| IPsec encryption is typically used to prevent 'M' seeing messages | IPsec encryption is typically used to prevent 'M' seeing messages | |||
| from 'A' to 'B'. IPsec authentication is used to prevent 'M' | from 'A' to 'B'. IPsec authentication is used to prevent 'M' | |||
| masquerading as the sender of messages from 'A' to 'B' or altering | masquerading as the sender of messages from 'A' to 'B' or altering | |||
| their contents. But 'I' can also use IPsec tunnel mode to allow 'A' | their contents. But 'I' can also use IPsec tunnel mode to allow 'A' | |||
| to communicate with 'B', but impose encryption to prevent 'A' leaking | to communicate with 'B', but impose encryption to prevent 'A' leaking | |||
| information to 'M'. Or 'E' can insist that 'I' uses tunnel mode | information to 'M'. Or 'E' can insist that 'I' uses tunnel mode | |||
| authentication to prevent 'M' communicating information to 'B'. | authentication to prevent 'M' communicating information to 'B'. | |||
| Mutable IP header fields such as the ECN field (as well as the TTL/ | Mutable IP header fields such as the ECN field (as well as the TTL/ | |||
| Hop Limit and DS fields) cannot be included in the cryptographic | Hop Limit and DS fields) cannot be included in the cryptographic | |||
| calculations of IPsec. Therefore, if 'I' encrypts but copies these | calculations of IPsec. Therefore, if 'I' copies these mutable fields | |||
| mutable fields into the outer header that is exposed across the | into the outer header that is exposed across the tunnel it will have | |||
| tunnel it will have allowed a covert channel from 'A' to M. And if | allowed a covert channel from 'A' to M that bypasses its encryption | |||
| 'E' copies these fields from the outer header to the inner, even if | of the inner header. And if 'E' copies these fields from the outer | |||
| it validates authentication from 'I', it will have allowed a covert | header to the inner, even if it validates authentication from 'I', it | |||
| channel from 'M' to 'B'. | will have allowed a covert channel from 'M' to 'B'. | |||
| ECN at the IP layer is designed to carry information about congestion | ECN at the IP layer is designed to carry information about congestion | |||
| from a congested resource to some downstream node that will feed the | from a congested resource towards downstream nodes. Typically a | |||
| information back somehow to the point upstream of the congestion that | downstream transport might feed the information back somehow to the | |||
| can regulate the load on the congested resource. In terms of the | point upstream of the congestion that can regulate the load on the | |||
| above scenario, ECN is effectively intended to create an information | congested resource, but other actions are possible (see [RFC3168] | |||
| channel from 'M' to 'B', for 'B' to forward to 'A'. Therefore the | S.6). In terms of the above unicast scenario, ECN is typically | |||
| goals of IPsec and ECN are mutually incompatible. | intended to create an information channel from 'M' to 'B', for 'B' to | |||
| forward to 'A'. Therefore the goals of IPsec and ECN are mutually | ||||
| incompatible. | ||||
| With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | With respect to the DS or ECN fields, S.5.1.2 of RFC4301 says, | |||
| "controls are provided to manage the bandwidth of this [covert] | "controls are provided to manage the bandwidth of this [covert] | |||
| channel". Using the ECN processing rules of RFC4301, the channel | channel". Using the ECN processing rules of RFC4301, the channel | |||
| bandwidth is two bits per datagram from 'A' to 'M' and one bit per | bandwidth is two bits per datagram from 'A' to 'M' and one bit per | |||
| datagram from 'M' to 'A' because 'E' limits the combinations it will | datagram from 'M' to 'A' (because 'E' limits the combinations of the | |||
| copy. In both cases the covert channel bandwidth is further reduced | 2-bit ECN field that it will copy). In both cases the covert channel | |||
| by noise from any real congestion marking. RFC4301 therefore implies | bandwidth is further reduced by noise from any real congestion | |||
| that these covert channels are sufficiently limited to be considered | marking. RFC4301 therefore implies that these covert channels are | |||
| a manageable threat. However, with respect to the larger (6b) DS | sufficiently limited to be considered a manageable threat. However, | |||
| field, the same section of RFC4301 says not copying is the default, | with respect to the larger (6b) DS field, the same section of RFC4301 | |||
| but a configuration option can allow copying "to allow a local | says not copying is the default, but a configuration option can allow | |||
| administrator to decide whether the covert channel provided by | copying "to allow a local administrator to decide whether the covert | |||
| copying these bits outweighs the benefits of copying". Of course, an | channel provided by copying these bits outweighs the benefits of | |||
| administrator considering copying of the DS field has to take into | copying". Of course, an administrator considering copying of the DS | |||
| account that it could be concatenated with the ECN field giving an 8b | field has to take into account that it could be concatenated with the | |||
| per datagram channel. | ECN field giving an 8b per datagram covert channel. | |||
| Thus, for tunnelling the 6b Diffserv field two conceptual models have | ||||
| had to be defined so that administrators can trade off security | ||||
| against the needs of traffic conditioning [RFC2983]: | ||||
| The uniform model: where the DIffserv field is preserved end-to-end | ||||
| by copying into the outer header on encapsulation and copying from | ||||
| the outer header on decapsulation. | ||||
| The pipe model: where the outer header is independent of that in the | ||||
| inner header so it hides the Diffserv field of the inner header | ||||
| from any interaction with nodes along the tunnel. | ||||
| However, for ECN, the new IPsec security architecture in RFC4301 only | ||||
| standardised one tunnelling model equivalent to the uniform model. | ||||
| It deemed that simplicity was more important than allowing | ||||
| administrators the option of a tiny increment in security especially | ||||
| given not copying congestion indications could seriously harm | ||||
| everyone's network service. | ||||
| 3.2. Control Constraints | 3.2. Control Constraints | |||
| Congestion control requires that any congestion notification marked | Congestion control requires that any congestion notification marked | |||
| into packets by a resource will be able to traverse a feedback loop | into packets by a resource will be able to traverse a feedback loop | |||
| back to a node capable of controlling the load on that resource. To | back to a node capable of controlling the load on that resource. To | |||
| avoid ambiguity later rather than calling this node the data source | be precise, rather than calling this node the data source, we will | |||
| we will call it the Load Regulator. This will allow us to deal with | call it the Load Regulator. This will allow us to deal with | |||
| exceptional cases where load is not regulated by the data source, but | exceptional cases where load is not regulated by the data source, but | |||
| usually the two will be synonymous. Note the term "a node _capable | usually the two terms will be synonymous. Note the term "a node | |||
| of_ controlling the load" deliberately includes a source application | _capable of_ controlling the load" deliberately includes a source | |||
| that doesn't actually control the load but ought to (e.g. an | application that doesn't actually control the load but ought to (e.g. | |||
| application without congestion control that uses UDP). | an application without congestion control that uses UDP). | |||
| A--->R--->I=========>M=========>E-------->B | A--->R--->I=========>M=========>E-------->B | |||
| Figure 2: Simple Tunnel Scenario | Figure 2: Simple Tunnel Scenario | |||
| We now consider a similar tunneling scenario to the IPsec one just | We now consider a similar tunnelling scenario to the IPsec one just | |||
| described, but without the different security domains so we can just | described, but without the different security domains so we can just | |||
| focus on ensuring the control loop and management monitoring can work | focus on ensuring the control loop and management monitoring can work | |||
| (Figure 2). If we want resources in the tunnel to be able to | (Figure 2). If we want resources in the tunnel to be able to | |||
| explicitly notify congestion and the feedback loop is from 'B' to | explicitly notify congestion and the feedback path is from 'B' to | |||
| 'A', it will certainly be necessary for 'E' to copy any CE marking | 'A', it will certainly be necessary for 'E' to copy any CE marking | |||
| from the outer header to the inner header for onward transmission to | from the outer header to the inner header for onward transmission to | |||
| 'B', otherwise congestion notification from resources like 'M' cannot | 'B', otherwise congestion notification from resources like 'M' cannot | |||
| be fed back to the Load Regulator ('A'). But it doesn't seem | be fed back to the Load Regulator ('A'). But it doesn't seem | |||
| necessary for 'I' to copy CE markings from the inner to the outer | necessary for 'I' to copy CE markings from the inner to the outer | |||
| header. For instance, if resource 'R' is congested, it can send | header. For instance, if resource 'R' is congested, it can send | |||
| congestion information to 'B' using the congestion field in the inner | congestion information to 'B' using the congestion field in the inner | |||
| header without 'I' copying the congestion field into the outer header | header without 'I' copying the congestion field into the outer header | |||
| and 'E' copying it back to the inner header. 'E' can then write any | and 'E' copying it back to the inner header. 'E' can still write any | |||
| additional congestion marking introduced across the tunnel into the | additional congestion marking introduced across the tunnel into the | |||
| congestion field of the inner header. | congestion field of the inner header. | |||
| Indeed, this arrangement can be extended to multi-level congestion | ||||
| marking (such as that proposed for PCN [PCN-arch]) as long as all the | ||||
| marks have unambiguously ranked values. For instance, if a | ||||
| hypothetical multi-level marking scheme for PCN had PCN-capable | ||||
| codepoints ranked 1, 2 and 3, then, if 'I' reset the outer congestion | ||||
| field to the lowest ranked value that is PCN-capable (1), 'E' would | ||||
| simply write the highest ranked of the inner and outer congestion | ||||
| markings into the forwarded header. For instance, if the inner | ||||
| marking on arrival at 'I' was 3 and 'I' reset the outer to 1, but 'M' | ||||
| subsequently set it to 2, then the header forwarded by 'E' would be | ||||
| max(3,2) = 3. | ||||
| It might be useful for the tunnel egress to be able to tell whether | It might be useful for the tunnel egress to be able to tell whether | |||
| congestion occurred across a tunnel or upstream of it. If outer | congestion occurred across a tunnel or upstream of it. If outer | |||
| header congestion marking was reset at the tunnel ingress ('I'), by | header congestion marking was reset by the tunnel ingress ('I'), at | |||
| the end of a tunnel ('E') the outer headers would indicate congestion | the end of a tunnel ('E') the outer headers would indicate congestion | |||
| experienced across the tunnel ('I' to 'E'), while the inner header | experienced across the tunnel ('I' to 'E'), while the inner header | |||
| would indicate congestion upstream of 'I'. But the same information | would indicate congestion upstream of 'I'. But similar information | |||
| could be gleaned even if the tunnel ingress copied the inner to the | can be gleaned even if the tunnel ingress copies the inner to the | |||
| outer headers. By the end of the tunnel ('E'), any packet with an | outer headers. At the end of the tunnel ('E'), any packet with an | |||
| _extra_ mark in the outer header relative to the inner header would | _extra_ mark in the outer header relative to the inner header | |||
| indicate congestion across the tunnel ('I' to 'E'), while the inner | indicates congestion across the tunnel ('I' to 'E'), while the inner | |||
| header would still indicate congestion upstream of ('I'). | header would still indicate congestion upstream of ('I'). Appendix B | |||
| gives a more precise method for inferring the congestion level | ||||
| introduced across a tunnel. | ||||
| All this shows that 'E' can preserve the control loop irrespective of | All this shows that 'E' can preserve the control loop irrespective of | |||
| whether 'I' copies congestion notification into the outer header or | whether 'I' copies congestion notification into the outer header or | |||
| resets it. | resets it. | |||
| That is the situation for existing control arrangements but, because | ||||
| copying reveals more information, it would open up possibilities for | ||||
| better control system designs. For instance, Appendix A describes | ||||
| how resetting CE marking at a tunnel ingress confuses a proposed | ||||
| congestion marking scheme on the standards track. It ends up | ||||
| removing excessive amounts of traffic unnecessarily. Whereas copying | ||||
| CE markings at ingress leads to the correct control behaviour. | ||||
| 3.3. Management Constraints | 3.3. Management Constraints | |||
| As well as control, there are also management constraints. | As well as control, there are also management constraints. | |||
| Specifically, a management system may monitor congestion markings in | Specifically, a management system may monitor congestion markings in | |||
| passing packets, perhaps at the border between networks as part of a | passing packets, perhaps at the border between networks as part of a | |||
| service level agreement. For instance, monitors at the borders of | service level agreement. For instance, monitors at the borders of | |||
| autonomous systems may need to measure how much congestion has | autonomous systems may need to measure how much congestion has | |||
| accumulated since the original source to determine between them how | accumulated since the original source to determine between them how | |||
| much of the congestion is contributed by each domain. | much of the congestion is contributed by each domain. | |||
| Therefore it should be clear how far back in the path the congestion | Therefore, when monitoring the middle of a path, it should be | |||
| markings have accumulated from. In this document we term this the | possible to establish how far back in the path congestion markings | |||
| baseline of the congestion marking, i.e. the source of the layer that | have accumulated from. In this document we term this the baseline of | |||
| last reset rather than copied the congestion notification field when | congestion marking (or the Congestion Baseline), i.e. the source of | |||
| creating an outer header. Given some tunnels cross domain borders | the layer that last reset (or created) the congestion notification | |||
| (e.g. consider M in Figure 2 is monitoring a border), it is therefore | field. Given some tunnels cross domain borders (e.g. consider M in | |||
| desirable for 'I' to copy congestion accumulated so far into the | Figure 2 is monitoring a border), it would therefore be desirable for | |||
| outer headers exposed across the tunnel. | 'I' to copy congestion accumulated so far into the outer headers | |||
| exposed across the tunnel. | ||||
| Appendix A discusses various scenarios where the Load Regulator lies | Appendix D discusses various scenarios where the Load Regulator lies | |||
| in-path, not at the source host as we would typically expect. It | in-path, not at the source host as we would typically expect. It | |||
| concludes that the baseline for congestion notification should be | concludes that a Congestion Baseline is determined by where the Load | |||
| determined by where the Load Regulator function is, whether it is at | Regulator function is, which should be identified in the transport | |||
| the source host or within the path. Therefore every tunnel ingress | layer, not by addresses in network layer headers. This applies | |||
| should copy the ECN field into the outer header it creates unless it | whether the Load Regulator is at the source host or within the path. | |||
| is also a Load Regulator, in which case it should reset any CE | The appendix also discusses where a Load Regulator function should be | |||
| markings, which is an exception to the normal copying rule for a | located relative to a local encapsulation function. | |||
| tunnel ingress. | ||||
| 4. Design Principles | 4. Design Principles | |||
| The constraints from the three perspectives of security, control and | The constraints from the three perspectives of security, control and | |||
| management in Section 3 are somewhat in tension as to whether a | management in Section 3 are somewhat in tension as to whether a | |||
| tunnel ingress should copy congestion markings into the outer header | tunnel ingress should copy congestion markings into the outer header | |||
| it creates or reset them. From the control perspective either | it creates or reset them. From the control perspective either | |||
| copying or resetting works. From the management perspective copying | copying or resetting works for existing arrangements, but copying has | |||
| is preferable (with the exception of an in-path load regulator). | more potential for simplifying control. From the management | |||
| From the security perspective resetting is preferable but copying is | perspective copying is preferable. From the security perspective | |||
| now considered acceptable given the bandwidth of a 2-bit covert | resetting is preferable but copying is now considered acceptable | |||
| channel can be managed. | given the bandwidth of a 2-bit covert channel can be managed. | |||
| Therefore an outer encapsulating header capable of carrying | Therefore an outer encapsulating header capable of carrying | |||
| congestion markings SHOULD reflect accumulated congestion since the | congestion markings SHOULD reflect accumulated congestion since the | |||
| last interface designed to regulate load (the Load Regulator). This | last interface designed to regulate load (the Load Regulator). This | |||
| implies congestion notification SHOULD be copied into the outer | implies congestion notification SHOULD be copied into the outer | |||
| header of each new encapsulating header that supports it--except at | header of each new encapsulating header that supports it. | |||
| an in-path Load Regulator. An in-path Load Regulator knows its | ||||
| function is to regulate load, so if it also acts as the ingress to a | We have said that a tunnel ingress SHOULD (as opposed to MUST) copy | |||
| tunnel, in every new outer header it creates it MUST reset any | incoming congestion notification into an outer encapsulating header | |||
| congestion marking. | that supports it. In the case of 2-bit ECN, the IETF security area | |||
| has deemed the benefit always outweighs the risk. Therefore for | ||||
| 2-bit ECN we can and we will say 'MUST' (Section 5). But in this | ||||
| section where we are setting down general design principles, we leave | ||||
| it as a 'SHOULD'. This allows for future multi-bit congestion | ||||
| notification fields where the risk from the covert channel created by | ||||
| copying congestion notification might outweigh the congestion control | ||||
| benefit of copying. | ||||
| The Load Regulator is the node to which congestion feedback should be | The Load Regulator is the node to which congestion feedback should be | |||
| returned by the next downstream node with a transport layer function | returned by the next downstream node with a transport layer feedback | |||
| (typically but not always the data receiver). The Load Regulator is | function (typically but not always the data receiver). The Load | |||
| not always (or even typically) the same thing as the node identified | Regulator is not always (or even typically) the same thing as the | |||
| by the source address of the outermost exposed header. In general | node identified by the source address of the outermost exposed | |||
| the addressing of the outermost encapsulation header says nothing | header. In general the addressing of the outermost encapsulation | |||
| about the identifiers of either the upstream or the downstream | header says nothing about the identifiers of either the upstream or | |||
| transport layer functions. As long as the transport functions know | the downstream transport layer functions. As long as the transport | |||
| each other's addresses, they don't have to be identified in the | functions know each other's addresses, they don't have to be | |||
| network layer or in any link layer. It was only a convenience that a | identified in the network layer or in any link layer. It was only a | |||
| TCP receiver assumed that the address of the source transport is the | convenience that a TCP receiver assumed that the address of the | |||
| same as the network layer source address of a packet it receives. | source transport is the same as the network layer source address of | |||
| an IP packet it receives. | ||||
| More generally, the return transport address could be identified | More generally, the return transport address for feedback could be | |||
| solely in the transport layer protocol. For instance, a signalling | identified solely in the transport layer protocol. For instance, a | |||
| protocol like RSVP [RFC2205] breaks up a path into transport layer | signalling protocol like RSVP [RFC2205] breaks up a path into | |||
| hops and informs each hop of the address of its transport layer | transport layer hops and informs each hop of the address of its | |||
| neighbour without any need to identify these hops in the network | transport layer neighbour without any need to identify these hops in | |||
| layer. RSVP can be arranged so that these transport layer hops are | the network layer. RSVP can be arranged so that these transport | |||
| bigger than the underlying network layer hops. The host identity | layer hops are bigger than the underlying network layer hops. The | |||
| protocol (HIP) architecture [RFC4423] also supports the same | host identity protocol (HIP) architecture [RFC4423] also supports the | |||
| principled separation (for mobility amongst other things), where the | same principled separation (for mobility amongst other things), where | |||
| transport layer receiver identifies the transport layer sender using | the transport layer sender identifies its transport address for | |||
| an identifier provided by the transport layer, which gets mapped to a | feedback to be sent to, using an identifier provided by a shim below | |||
| network layer address below the transport layer. | the transport layer. | |||
| Note that this principle deliberately doesn't require a packet header | Keeping to this layering principle deliberately doesn't require a | |||
| to reveal the origin address of the baseline that congestion | network layer packet header to reveal the origin address from where | |||
| notification has accumulated from. It is not necessary for the | congestion notification accumulates (its Congestion Baseline). It is | |||
| network and lower layers to know the address of the Load Regulator. | not necessary for the network and lower layers to know the address of | |||
| Only the destination transport needs to know that. With congestion | the Load Regulator. Only the destination transport needs to know | |||
| notification, the network and link layers only notify congestion | that. With forward congestion notification, the network and link | |||
| forwards, they aren't involved in feeding it backwards. If they are, | layers only notify congestion forwards; they aren't involved in | |||
| e.g. backward congestion notification (BCN) in Ethernet [802.1au], | feeding it backwards. If they are (e.g. backward congestion | |||
| that should be considered as a transport function added to the lower | notification (BCN) in Ethernet [IEEE802.1au] or EFCI in ATM | |||
| layer, which must sort out its own addressing. Indeed, this is one | [ITU-T.I.371]), that should be considered as a transport function | |||
| reason why ICMP source quench is now deprecated [RFC1254]; when | added to the lower layer, which must sort out its own addressing. | |||
| congestion occurs within a tunnel it is complex (particularly in the | Indeed, this is one reason why ICMP source quench is now deprecated | |||
| case of IPsec tunnels) to return the ICMP messages beyond the tunnel | [RFC1254]; when congestion occurs within a tunnel it is complex | |||
| ingress back to the Load Regulator . | (particularly in the case of IPsec tunnels) to return the ICMP | |||
| messages beyond the tunnel ingress back to the Load Regulator. | ||||
| Similarly, if a management system is monitoring congestion and needs | Similarly, if a management system is monitoring congestion and needs | |||
| to know the baseline of congestion notification, the management | to know the Congestion Baseline, the management system has to find | |||
| system has to find this out from the transport; in general it cannot | this out from the transport; in general it cannot tell solely by | |||
| tell solely by looking at the network or link layer headers. | looking at the network or link layer headers. | |||
| We have said that a tunnel ingress that is not a Load Regulator | 4.1. Design Guidelines for New Encapsulations of Congestion | |||
| SHOULD (as opposed to MUST) copy incoming congestion notification | Notification | |||
| into an outer encapsulating header that supports it. In the case of | ||||
| 2-bit ECN, the IETF security area have deemed the benefit always | The following guidelines are for specifications of new schemes for | |||
| outweighs the risk. Therefore for 2-bit ECN we can and we will say | encapsulating congestion notification (e.g. for specialised Diffserv | |||
| 'MUST' (Section 5). But in this section where we are setting down | PHBs in IP, or for lower layer technologies): | |||
| general design principles, we leave it as a 'SHOULD'. This allows | ||||
| for future multi-bit congestion notification fields where the risk | 1. Congestion notification in outer headers SHOULD be relative to a | |||
| from the covert channel created by copying congestion notification | Congestion Baseline at the node expected to regulate the load on | |||
| might outweigh the congestion control benefit of copying. | the link in question (the Load Regulator). This implies incoming | |||
| congestion notifications from the higher layer SHOULD be copied | ||||
| into encapsulating headers. This guideline is particularly | ||||
| important where outer headers might cross trust boundaries, but | ||||
| less important otherwise. | ||||
| 2. Congestion notification MUST NOT simply be copied from outer | ||||
| headers to the forwarded header on decapsulation. The forwarded | ||||
| congestion notification field SHOULD be calculated from the inner | ||||
| and outer headers, taking account of the following, in the order | ||||
| given: | ||||
| 1. If the inner header does not support congestion notification, | ||||
| or indicates that the transport does not support congestion | ||||
| notification, any explicit congestion notifications in the | ||||
| outer header will not be understood if propagated further, so | ||||
| if the only way to indicate congestion to onward nodes is to | ||||
| drop the packet, it MUST be dropped. | ||||
| 2. If the outer header does not support explicit congestion | ||||
| notification, but the inner header does, the inner header | ||||
| SHOULD be forwarded unchanged. | ||||
| 3. Congestion indications may be ranked by strength. For | ||||
| instance no congestion would be the weakest indication, with | ||||
| possibly increasing levels of congestion given increasingly | ||||
| stronger indications. | ||||
| 4. Where the inner and outer headers carry indications of | ||||
| congestion of different strengths, the stronger indication | ||||
| SHOULD be forwarded in preference to the weaker. Obviously, | ||||
| if the strengths in both inner and outer are the same, the | ||||
| same strength should be forwarded. | ||||
| 5. If the outer header carries a weaker indication of congestion | ||||
| than the inner, it MAY be appropriate to raise a warning, as | ||||
| this would be in illegal combination if Guideline Paragraph 1 | ||||
| had been followed. | ||||
| 3. Where framing boundaries are different between the two layers, | ||||
| congestion indications SHOULD be propagated on the basis that a | ||||
| congestion indication in a packet or frame applies to all the | ||||
| octets in the frame/packet. On average, a tunnel endpoint SHOULD | ||||
| approximately preserve the number of marked octets arriving and | ||||
| leaving. An algorithm for spreading congestion indications over | ||||
| multiple smaller `fragments' SHOULD propagate congestion | ||||
| indications as soon as they arrive, and SHOULD NOT hold them back | ||||
| for later frames. | ||||
| 4. Assumptions on incremental deployment MUST be stated. | ||||
| Regarding incremental deployment, the Per-Domain ECT Checking | ||||
| of[RFC5129] is a good example to follow. In this example, header | ||||
| space in the lower layer protocol (MPLS) was extremely limited, so no | ||||
| ECN-capable transport codepoint was added to the MPLS header. | ||||
| Interior nodes in a domain were allowed to set explicit congestion | ||||
| indications without checking whether the frame was destined for a | ||||
| transport that would understand them. This was made safe by | ||||
| emphasising repeatedly that all the decapsulating edges of a whole | ||||
| domain had to be upgraded at once, so there would always be a check | ||||
| that the higher layer transport was ECN-capable on decapsulation. If | ||||
| the decapsulator discovered that the higher layer showed the | ||||
| transport would not understand ECN, it dropped the packet on behalf | ||||
| of the earlier congestion node (see Guideline Paragraph 2.1). | ||||
| Note that such a deployment strategy that assumes a savvy operator | ||||
| was only appropriate because MPLS is targeted solely at professional | ||||
| operators. This strategy would not be appropriate for other link | ||||
| technologies (e.g. Ethernet) targeted at deployment by the general | ||||
| public. | ||||
| 5. Default ECN Tunnelling Rules | 5. Default ECN Tunnelling Rules | |||
| The following ECN tunnel processing rules are the default for a | The following ECN tunnel processing rules are the default for a | |||
| packet with any DSCP. If required, different ECN processing rules | packet with any DSCP. If required, different ECN encapsulation rules | |||
| MAY be defined for the appropriate Diffserv PHB using the guidelines | MAY be defined as part of the definition of an appropriate Diffserv | |||
| in Section 4. | PHB using the guidelines in Section 4. However, the burden of | |||
| handling exceptional PHBs in implementations of all affected tunnels | ||||
| and lower layer link protocols should not be underestimated. | ||||
| When a tunnel ingress creates an encapsulating IP header, the 2-bit | A tunnel ingress compliant with this specification MUST copy the | |||
| ECN field of the inner IP header MUST be copied into the outer IP | 2-bit ECN field of the arriving IP header into the outer | |||
| header, for all types of IP in IP tunnel (except if the tunnel | encapsulating IP header, for all types of IP in IP tunnel. This | |||
| ingress is in compatibility mode--see Section 6). If the tunnel | encapsulation behaviour MUST only be used if the tunnel ingress is in | |||
| ingress is also a Load Regulator, it MUST instead reset the outer | `normal state'. A `compatibility state' with a different | |||
| header to ECT(0). | encapsulation behaviour is also specified in Section 6 for backward | |||
| compatibility with legacy tunnel egresses that do not understand ECN. | ||||
| To decapsulate the inner header at the tunnel egress, the outgoing | To decapsulate the inner header at the tunnel egress, a compliant | |||
| inner header MUST be calculated from the combination of the incoming | tunnel egress MUST set the outgoing ECN field to the codepoint at the | |||
| inner and outer headers setting the outgoing ECN field to the | intersection of the appropriate incoming inner header (row) and outer | |||
| codepoints displayed in the body of Table 1. | header (column) in Table 1. | |||
| +--Incoming Outer Header--- | +--Incoming Outer Header--- | |||
| +--------------------+---------+------------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | |||
| | Header | | | | | | | Header | | | | | | |||
| +--------------------+---------+------------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| | Not-ECT | Not-ECT | drop (!!!) | drop(!!!) | drop(!!!) | | | Not-ECT | Not-ECT | drop (!!!) | drop(!!!) | drop(!!!) | | |||
| | ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE | | | ECT(0) | ECT(0) | ECT(0) | ECT(0) | CE | | |||
| | ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE | | | ECT(1) | ECT(1) | ECT(1) | ECT(1) | CE | | |||
| | CE | CE | CE (!!!) | CE (!!!) | CE | | | CE | CE | CE | CE (!!!) | CE | | |||
| +--------------------+---------+------------+-----------+-----------+ | +---------------------+---------+-----------+-----------+-----------+ | |||
| +-----Outgoing Header------ | +-----Outgoing Header------ | |||
| Table 1: IP in IP Decapsulation | Table 1: IP in IP Decapsulation | |||
| The exclamation marks '(!!!)' in Table 1 indicate that this | The exclamation marks '(!!!)' in Table 1 indicate that this | |||
| combination of inner and outer headers should not be possible if only | combination of inner and outer headers should not be possible if only | |||
| legal transitions have taken place. So, the decapsulator should drop | legal transitions have taken place. So, the decapsulator should drop | |||
| or mark the ECN field as the table specifies, but it MAY also raise | or mark the ECN field as the table specifies, but it MAY also raise | |||
| an appropriate alarm. It MUST NOT raise an alarm so often that the | an appropriate alarm. It MUST NOT raise an alarm so often that the | |||
| illegal combinations would amplify into a flood of alarm messages. | illegal combinations would amplify into a flood of alarm messages. | |||
| 6. Backward Compatibility | 6. Backward Compatibility | |||
| A legacy tunnel egress may not know how to process an ECN field, so | Note: in RFC3168, a tunnel was in one of two modes: limited | |||
| it will most likely simply disregard all outer headers. Therefore, | functionality or full functionality. Rather than working with modes | |||
| unless a compliant tunnel ingress has established that the tunnel | of the tunnel as a whole, this specification uses the term `state' to | |||
| egress understands ECN processing, it MUST only send packets with the | refer separately to the state of each tunnel end point, which is how | |||
| ECN field set to Not-ECT in the outer header. Otherwise, if ECN | implementations have to work. | |||
| capable outer headers were sent towards a legacy egress, it would | ||||
| dangerously remove information about congestion experienced within | ||||
| the tunnel. | ||||
| A tunnel ingress may establish whether its tunnel egress will | If one end of an IPsec tunnel is compliant with [RFC4301], the other | |||
| understand ECN processing by configuration or by negotiation. Note | end can be guaranteed to also be [RFC4301] compliant (there could be | |||
| that a [RFC4301] tunnel ingress that has used IKEv2 key management | corner cases where manual keying is used, but they will be ignored | |||
| [RFC4306] can guarantee that the tunnel egress is also RFC4301- | here). So there is no backward compatibility problem with IKEv2 | |||
| compliant and therefore need not negotiate ECN capabilities. | RFC4301 IPsec tunnels. But once we extend our scope to any IP in IP | |||
| tunnel, we have to cater for the possibility that a legacy tunnel | ||||
| egress may not know how to process an ECN field, so if ECN capable | ||||
| outer headers were sent towards a legacy (e.g. [RFC2003]) egress, it | ||||
| would most likely simply disregard the outer headers, dangerously | ||||
| discarding information about congestion experienced within the | ||||
| tunnel. ECN-capable traffic sources would not see any congestion | ||||
| feedback and instead continually ratchet up their share of the | ||||
| bandwidth without realising that cross-flows from other ECN sources | ||||
| were continually having to ratchet down. | ||||
| To be compliant with this specification a tunnel ingress that does | To be compliant with this specification a tunnel ingress that does | |||
| not know the egress ECN capability (e.g. by configuration) MUST | not always know the ECN capability of its tunnel egress MUST | |||
| implement a 'normal' mode and a 'compatibility' mode, and it MUST | implement a 'normal' state and a 'compatibility' state, and it MUST | |||
| initiate each negotiated tunnel in compatibility mode. On the other | initiate each negotiated tunnel in the compatibility state. | |||
| hand, a compliant tunnel egress MUST merely implement the one | ||||
| behaviour in Section 5, which we term 'full-functionality' mode. | ||||
| Before switching to normal mode, a compliant tunnel ingress that does | However, a tunnel ingress can be compliant even if it only implements | |||
| not know the egress ECN capability (e.g. by configuration) MUST | the 'normal state' of encapsulation behaviour, but only as long as it | |||
| negotiate with the tunnel egress to establish whether the egress is | is designed or configured so that all possible tunnel egress nodes it | |||
| in full functionality mode. If the egress is in full functionality | will ever talk to will have full ECN functionality (RFC3168 full | |||
| mode, the ingress puts itself into normal mode. In normal mode the | functionality mode, RFC4301 and this present specification). The | |||
| ingress follows the encapsulation rule in Section 5 (i.e. it copies | `normal state' is that defined in Section 5 (i.e. header copying). | |||
| the inner ECN field into the outer header). If the egress is not in | Note that a [RFC4301] tunnel ingress that has used IKEv2 key | |||
| full-functionality mode or doesn't understand the question, the | management [RFC4306] can guarantee that its tunnel egress is also | |||
| tunnel ingress MUST remain in compatibility mode. | RFC4301-compliant and therefore need not further negotiate ECN | |||
| capabilities. | ||||
| A tunnel ingress in compatibility mode MUST set all outer headers to | Before switching to normal state, a compliant tunnel ingress that | |||
| Not-ECT. | does not know the egress ECN capability MUST negotiate with the | |||
| tunnel egress. If the egress says it is in full functionality state | ||||
| (or mode), the ingress puts itself into normal state. In normal | ||||
| state the ingress follows the encapsulation rule in Section 5 (i.e. | ||||
| header copying). If the egress says it is not in full-functionality | ||||
| state/mode or doesn't understand the question, the tunnel ingress | ||||
| MUST remain in compatibility state. | ||||
| A tunnel ingress in compatibility state MUST set all outer headers to | ||||
| Not-ECT. This is the same per packet behaviour as the ingress end of | ||||
| RFC3168's limited functionality mode. | ||||
| A tunnel ingress that only implements compatibility state is at least | ||||
| safe with the ECN behaviour of any egress it may encounter (any of | ||||
| RFC2003, RFC2401, either mode of RFC2481 and RFC3168's limited | ||||
| functionality mode). But an ingress cannot claim compliance with | ||||
| this specification simply by disabling ECN processing across the | ||||
| tunnel. A compliant tunnel ingress MUST at least implement `normal | ||||
| state' and, if it might be used with arbitrary tunnel egress nodes, | ||||
| it MUST also implement `compatibility state'. | ||||
| A compliant tunnel egress on the other hand merely needs to implement | ||||
| the one behaviour in Section 5, which we term 'full-functionality' | ||||
| state, as it is the same as the egress end of the full-functionality | ||||
| mode of [RFC3168]. It is also the same as the [RFC4301] egress | ||||
| behaviour. | ||||
| The decapsulation rules for the egress of the tunnel in Section 5 | The decapsulation rules for the egress of the tunnel in Section 5 | |||
| have been defined in such a way that congestion control will still | have been defined in such a way that congestion control will still | |||
| work safely if any of the earlier versions of ECN processing are used | work safely if any of the earlier versions of ECN processing are used | |||
| unilaterally at the encapsulating ingress of the tunnel. If a tunnel | unilaterally at the encapsulating ingress of the tunnel (any of | |||
| ingress tries to negotiate to use limited functionality mode or full | RFC2003, RFC2401, either mode of RFC2481, either mode of RFC3168, | |||
| functionality mode, a decapsulating tunnel egress compliant with this | RFC4301 and this present specification). If a tunnel ingress tries | |||
| specification MUST agree to the request, even though its behaviour | to negotiate to use limited functionality mode or full functionality | |||
| will be the same in both cases. For 'forward compatibility', a | mode [RFC3168], a decapsulating tunnel egress compliant with this | |||
| compliant tunnel egress MUST raise a warning about any requests to | specification MUST agree to either request, as its behaviour will be | |||
| enter modes it doesn't recognise, but it can continue operating. If | the same in both cases. | |||
| no ECN-related mode is requested, no error or warning need be raised | ||||
| as the egress behaviour is compatible with all the legacy ingress | ||||
| behaviours that don't negotiate capabilities. | ||||
| Note that if a compliant node is the ingress for multiple tunnels, a | For 'forward compatibility', a compliant tunnel egress SHOULD raise a | |||
| mode setting will need to be stored for each tunnel ingress. | warning about any requests to enter states or modes it doesn't | |||
| However, if a node is the egress for multiple tunnels, none of the | recognise, but it can continue operating. If no ECN-related state or | |||
| tunnels will need to store a mode setting, because a compliant egress | mode is requested, a compliant tunnel egress need not raise an error | |||
| can only be in one mode. | or warning as its egress behaviour is compatible with all the legacy | |||
| ingress behaviours that don't negotiate capabilities. | ||||
| Implementation note: if a compliant node is the ingress for multiple | ||||
| tunnels, a state setting will need to be stored for each tunnel | ||||
| ingress. However, if a node is the egress for multiple tunnels, none | ||||
| of the tunnels will need to store a state setting, because a | ||||
| compliant egress can only be in one state. | ||||
| 7. Changes from Earlier RFCs | 7. Changes from Earlier RFCs | |||
| The rule that a tunnel ingress MUST copy any ECN field into the outer | The rule that a normal state tunnel ingress MUST copy any ECN field | |||
| header is a change to RFC3168 (unless it is a Load Regulator as well, | into the outer header is a change to the ingress behaviour of | |||
| in which case there is no change). | RFC3168, but it is the same as the rules for IPsec tunnels in | |||
| RFC4301. | ||||
| The rules for calculating the outgoing ECN field on decapsulation at | The rules for calculating the outgoing ECN field on decapsulation at | |||
| a tunnel egress are in line with the full functionality mode of ECN | a tunnel egress are in line with the full functionality mode of ECN | |||
| in RFC3168 and with RFC4301, except that neither identified the need | in RFC3168 and with RFC4301, except that neither identified that an | |||
| to raise an alarm if the inner header was CE but the outer header was | outer header of ECT(1) combined with an inner header of CE was an | |||
| ECT. | illegal combination. | |||
| The rules for how a tunnel establishes whether the egress has full | The rules for how a tunnel establishes whether the egress has full | |||
| functionality ECN capabilities are an update to RFC3168. For all the | functionality ECN capabilities are an update to RFC3168. For all the | |||
| typical cases, RFC4301 is not updated by the ECN capability check in | typical cases, RFC4301 is not updated by the ECN capability check in | |||
| this specification, because a typical RFC4301 tunnel ingress will | this specification, because a typical RFC4301 tunnel ingress will | |||
| have already established that it is talking to an RFC4301 tunnel | have already established that it is talking to an RFC4301 tunnel | |||
| egress (e.g. if it uses IKEv2). However, there may be some corner | egress (e.g. if it uses IKEv2). However, there may be some corner | |||
| cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with | cases (e.g. manual keying) where an RFC4301 tunnel ingress talks with | |||
| an egress with limited functionality ECN handling. For such corner | an egress with limited functionality ECN handling. Strictly, for | |||
| cases, the requirement to use compatibility mode in this | such corner cases, the requirement to use compatibility mode in this | |||
| specification updates RFC4301. | specification updates RFC4301. | |||
| The optional ECN Tunnel field in the IPsec security association | The optional ECN Tunnel field in the IPsec security association | |||
| database (SAD) and the optional ECN Tunnel Security Association | database (SAD) and the optional ECN Tunnel Security Association | |||
| Attribute defined in RFC3168 are no longer needed. The security | Attribute defined in RFC3168 are no longer needed. The security | |||
| association (SA) has no policy on ECN usage, because all RFC4301 | association (SA) has no policy on ECN usage, because all RFC4301 | |||
| tunnels now support ECN without any policy choice. | tunnels now support ECN without any policy choice. | |||
| RFC3168 defines a (required) limited functionality mode and an | RFC3168 defines a (required) limited functionality mode and an | |||
| (optional) full functionality mode for a tunnel, but RFC4301 doesn't | (optional) full functionality mode for a tunnel, but RFC4301 doesn't | |||
| need modes. In this specification only the ingress might need two | need modes. In this specification only the ingress might need two | |||
| modes, unlike the modes of RFC3168 that were properties of the pair | states: a normal state (required) and a compatibility state (required | |||
| of tunnel endpoints after negotiation. | in some scenarios, optional in others). The egress needs only full- | |||
| functionality state which handles ECN the same as either mode of | ||||
| RFC3168 or RFC4301. | ||||
| All these ECN processing rules update RFC2003 on IP in IP tunnelling. | Additional changes to the RFC Index (to be removed by the RFC Editor): | |||
| In the RFC index, RFC3168 should be identified as an update to | ||||
| RFC2003 and RFC4301 should be identified as an update to RFC3168. | ||||
| This specification updates RFC3168. It also suggests a minor | ||||
| optional warning and a corner-case change to RFC4301, but these don't | ||||
| really count as an update. | ||||
| 8. IANA Considerations | 8. IANA Considerations | |||
| This memo includes no request to IANA. | This memo includes no request to IANA. | |||
| 9. Security Considerations | 9. Security Considerations | |||
| Section 3.1 discusses the security constraints imposed on ECN tunnel | Section 3.1 discusses the security constraints imposed on ECN tunnel | |||
| processing. The Design Principles of Section 4 trade-off between | processing. The Design Principles of Section 4 trade-off between | |||
| security (covert channels) and congestion monitoring & control. In | security (covert channels) and congestion monitoring & control. In | |||
| fact, ensuring congestion markings are not lost is itself another | fact, ensuring congestion markings are not lost is itself another | |||
| aspect of security, because if we allowed congestion notification to | aspect of security, because if we allowed congestion notification to | |||
| be lost, any attempt to enforce a response to congestion would be | be lost, any attempt to enforce a response to congestion would be | |||
| much harder. | much harder. | |||
| We keep the behaviour defined in both RFC3168 and RFC4301 where, if | If alternate congestion notification semantics are defined for a | |||
| the inner and outer headers carry contradictory ECT values the inner | certain PHB (e.g. the pre-congestion notification architecture | |||
| header is preserved for onward forwarding. However, in writing this | [I-D.ietf-pcn-architecture]), the scope of the alternate semantics | |||
| document we noticed this behaviour would hide illegal suppression of | might typically be bounded by the limits of a Diffserv region or | |||
| congestion notification from the detection mechanism designed for | regions, as envisaged in [RFC4774]. The inner headers in tunnels | |||
| this attack. One reason two ECT codepoints were defined was to | crossing the boundary of such a Diffserv region but ending within the | |||
| enable the source to detect if a CE marking had been applied then | region can potentially leak the external congestion notification | |||
| subsequently removed. The source could detect this by weaving a | semantics into the region, or leak the internal semantics out of the | |||
| pseudo-random sequence of ECT(0) and ECT(1) values into a stream of | region. [RFC2983] discusses the need for Diffserv traffic | |||
| packets [RFC3540]. With the rules as they stand in RFC3168 and | conditioning to be applied at these tunnel endpoints as if they are | |||
| RFC4301, within a tunnel a CE marking could be added and subsequently | at the edge of the Diffserv region. Similar concerns apply to any | |||
| removed by a non-compliant node without detection, because the | processing or propagation of the ECN field at the edges of a Diffserv | |||
| evidence of such misbehaviour is removed by the decapsulator. | region with alternate ECN semantics. Such edge processing must also | |||
| be applied at the endpoints of tunnels with ends both inside and | ||||
| outside the domain. [I-D.ietf-pcn-architecture] gives specific | ||||
| advice on this for the PCN case, but other definitions of alternate | ||||
| semantics will need to discuss the specific security implications in | ||||
| their case. | ||||
| We could have specified that an outer header value of ECT should | With the rules as they stand in RFC3168 and RFC4301, a small part of | |||
| overwrite a contradictory ECT value in the inner header to close this | the protection of the ECN nonce [RFC3540] is compromised. One reason | |||
| loophole. But we chose not to for two reasons: i) we wanted to avoid | two ECT codepoints were defined was to enable the data source to | |||
| any changes to IPsec tunnelling behaviour; ii) allowing ECT values in | detect if a CE marking had been applied then subsequently removed. | |||
| the outer header to override the inner header would have increased | The source could detect this by weaving a pseudo-random sequence of | |||
| the bandwidth of the covert channel through the egress gateway from 1 | ECT(0) and ECT(1) values into a stream of packets, which is termed an | |||
| to 1.5 bit per datagram, potentially threatening to upset the | ECN nonce. By the decapsulation rules in RFC3168 and RFC4301, if the | |||
| consensus established in the security area that says that the | inner and outer headers carry contradictory ECT values only the inner | |||
| bandwidth of this covert channel can now be safely managed. | header is preserved for onward forwarding. So if a CE marking added | |||
| to the outer ECN field has been illegally (or accidentally) | ||||
| suppressed by a subsequent node in the tunnel, the decapsulator will | ||||
| revert the ECN field to its value before tampering, hiding all | ||||
| evidence of the crime from the onward feedback loop. To close this | ||||
| loophole, we could have specified that an outer header value of ECT | ||||
| should overwrite a contradictory ECT value in the inner header (for | ||||
| how, see the ideal decapsulation rules proposed in Appendix C). But | ||||
| currently we choose to keep the 'broken' behaviour defined in RFC3168 | ||||
| & RFC4301 for all the following reasons: | ||||
| 1. We wanted to avoid any changes to IPsec tunnelling behaviour; | ||||
| 2. Allowing ECT values in the outer header to override the inner | ||||
| header would have increased the bandwidth of the covert channel | ||||
| through the egress gateway from 1 to 1.5 bit per datagram, | ||||
| potentially threatening to upset the consensus established in the | ||||
| security area that says that the bandwidth of this covert channel | ||||
| can now be safely managed; | ||||
| 3. This loophole is only applicable in the corner case where the | ||||
| attacker is a network node downstream of a congested node in the | ||||
| same tunnel; | ||||
| 4. In tunnelling scenarios, the ECN nonce is already vulnerable to | ||||
| suppression by nodes downstream of a congested node in the same | ||||
| tunnel, if they can copy the ECT value in the inner header to the | ||||
| outer header (any node in the tunnel can do this if the inner | ||||
| header is not encrypted, and an IPsec tunnel egress can do it | ||||
| whether or not the tunnel is encrypted); | ||||
| 5. Although the 'broken' decapsulation behaviour removes evidence of | ||||
| congestion suppression from the onward feedback loop, the | ||||
| decapsulator itself can at least detect that congestion within | ||||
| the tunnel has been suppressed; | ||||
| 6. The ECN nonce [RFC3540] currently has experimental status and | ||||
| there has been no evidence that anyone has implemented it beyond | ||||
| the author's prototype. | ||||
| If a legacy security policy configures a legacy tunnel ingress to | ||||
| negotiate to turn off ECN processing, a compliant tunnel egress will | ||||
| agree to a request to turn off ECN processing but it will actually | ||||
| still copy CE markings from the outer to the forwarded header. | ||||
| Although the tunnel ingress 'I' in Figure 1 will set all ECN fields | ||||
| in outer headers to Not-ECT, 'M' could still toggle CE on and off to | ||||
| communicate covertly with 'B', because we have specified that 'E' | ||||
| only has one mode regardless of what mode it says it has negotiated. | ||||
| We could have specified that 'E' should have a limited functionality | ||||
| mode and check for such behaviour. But we decided not to add the | ||||
| extra complexity of two modes on a compliant tunnel egress merely to | ||||
| cater for a legacy security concern that is now considered | ||||
| manageable. | ||||
| 10. Conclusions | 10. Conclusions | |||
| This document updates the tunnelling treatment of RFC3168 ECN for all | This document updates the ingress tunnelling encapsulation of RFC3168 | |||
| IP in IP tunnels to bring it into line with the new behaviour in the | ECN for all IP in IP tunnels to bring it into line with the new | |||
| IPsec architecture of RFC4301. | behaviour in the IPsec architecture of RFC4301. | |||
| At the tunnel egress, header decapsulation for the default ECN | At a tunnel egress, header decapsulation for the default ECN marking | |||
| marking behaviour is broadly unchanged except that one exceptional | behaviour is broadly unchanged except that one exceptional case has | |||
| case has been catered for. At the ingress, for all forms of IP in IP | been catered for. At the ingress, for all forms of IP in IP tunnel, | |||
| tunnel, encapsulation has been brought into line with the new IPsec | encapsulation has been brought into line with the new IPsec rules in | |||
| rules in RFC4301 which copy rather than reset CE markings when | RFC4301 which copy rather than reset CE markings when creating outer | |||
| creating outer headers. Previously, upstream congestion information | headers. | |||
| was not revealed in the outer header, which limited the scope of some | ||||
| management monitoring techniques and prevented certain active queue | This change to encapsulation has been motivated by analysis from the | |||
| management algorithms from taking account of upstream congestion | three perspectives of security, control and management. They are | |||
| markings. The change ensures all IP in IP tunnels reflect the more | somewhat in tension as to whether a tunnel ingress should copy | |||
| relaxed attitude to revealing congestion information in the new IPsec | congestion markings into the outer header it creates or reset them. | |||
| architecture, which now deems that the threat from 2-bit covert | From the control perspective either copying or resetting works for | |||
| channels can be managed without disabling ECN. | existing arrangements, but copying has more potential for simplifying | |||
| control and resetting breaks at least one proposal already on the | ||||
| standards track. From the management and monitoring perspective | ||||
| copying is preferable. From the network security perspective (theft | ||||
| of service etc) copying is preferable. From the information security | ||||
| perspective resetting is preferable, but the IETF Security Area now | ||||
| considers copying acceptable given the bandwidth of a 2-bit covert | ||||
| channel can be managed. Therefore there are no points against | ||||
| copying and a number against resetting CE on ingress. | ||||
| The change ensures ECN processing in all IP in IP tunnels reflects | ||||
| this slightly more permissive attitude to revealing congestion | ||||
| information in the new IPsec architecture. Once all tunnelling of | ||||
| ECN works the same, ECN markings will have a defined meaning when | ||||
| measured at any point in a network. This new certainty will enable | ||||
| new uses of the ECN field that would otherwise be confounded by | ||||
| ambiguity. | ||||
| Also, this document defines more generic principles to guide the | Also, this document defines more generic principles to guide the | |||
| design of alternate forms of tunnel processing of congestion | design of alternate forms of tunnel processing of congestion | |||
| notification, if required for specific Diffserv PHBs (such as will be | notification, if required for specific Diffserv PHBs or for other | |||
| required for the PCN working group) or for other lower layer | lower layer encapsulating protocols that might support congestion | |||
| encapsulating protocols that might support congestion notification in | notification in the future. | |||
| the future (e.g. MPLS). | ||||
| 11. Acknowledgements | 11. Acknowledgements | |||
| Thanks to David Black, Bruce Davie, Toby Moncaster and Gabriele | Thanks to David Black for explaining a better way to think about | |||
| Corliano for their careful review comments. | function placement and to Louise Burness for a better way to think | |||
| about multilayer transports and networks, having read | ||||
| [Patterns_Arch]. Also thanks to Arnaud Jacquet for ideas behind the | ||||
| algorithms in Appendix B. Thanks to Bruce Davie, Toby Moncaster, | ||||
| Gorry Fairhurst, Sally Floyd, Alfred Hoenes and Gabriele Corliano for | ||||
| their thoughts and careful review comments. | ||||
| 12. Comments Solicited | 12. Comments Solicited | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
| <tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
| 13. References | 13. References | |||
| 13.1. Normative References | 13.1. Normative References | |||
| skipping to change at page 16, line 14 | skipping to change at page 23, line 14 | |||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", | of Explicit Congestion Notification (ECN) to IP", | |||
| RFC 3168, September 2001. | RFC 3168, September 2001. | |||
| [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | [RFC4301] Kent, S. and K. Seo, "Security Architecture for the | |||
| Internet Protocol", RFC 4301, December 2005. | Internet Protocol", RFC 4301, December 2005. | |||
| 13.2. Informative References | 13.2. Informative References | |||
| [802.1au] "IEEE Standard for Local and Metropolitan Area Networks-- | [I-D.eardley-pcn-marking-behaviour] | |||
| Virtual Bridged Local Area Networks - Amendment 10: | Eardley, P., "Marking behaviour of PCN-nodes", | |||
| Congestion Notification", 2006, | draft-eardley-pcn-marking-behaviour-01 (work in progress), | |||
| <http://www.ieee802.org/1/pages/802.1au.html>. | June 2008. | |||
| (Work in Progress; Access Controlled link within page) | [I-D.ietf-pcn-architecture] | |||
| Eardley, P., "Pre-Congestion Notification Architecture", | ||||
| draft-ietf-pcn-architecture-03 (work in progress), | ||||
| February 2008. | ||||
| [BBnet] Sexton, M. and A. Reid, "Broadband Networking: {ATM}, | [I-D.ietf-pwe3-congestion-frmwk] | |||
| {SDH} and {SONET}", Artech House telecommunications | Bryant, S., Davie, B., Martini, L., and E. Rosen, | |||
| library ISBN: 0-89006-578-0, 1997. | "Pseudowire Congestion Control Framework", | |||
| draft-ietf-pwe3-congestion-frmwk-01 (work in progress), | ||||
| May 2008. | ||||
| [I-D.ietf-tsvwg-ecn-mpls] | [I-D.moncaster-pcn-3-state-encoding] | |||
| Davie, B., "Explicit Congestion Marking in MPLS", | Moncaster, T., Briscoe, B., and M. Menth, "A three state | |||
| draft-ietf-tsvwg-ecn-mpls-00 (work in progress), | extended PCN encoding scheme", | |||
| March 2007. | draft-moncaster-pcn-3-state-encoding-00 (work in | |||
| progress), June 2008. | ||||
| [I-D.rosen-pwe3-congestion] | [IEEE802.1au] | |||
| Rosen, E., "Pseudowire Congestion Control Framework", | IEEE, "IEEE Standard for Local and Metropolitan Area | |||
| draft-rosen-pwe3-congestion-04 (work in progress), | Networks--Virtual Bridged Local Area Networks - Amendment | |||
| October 2006. | 10: Congestion Notification", 2008, | |||
| <http://www.ieee802.org/1/pages/802.1au.html>. | ||||
| [PCN-arch] | (Work in Progress; Access Controlled link within page) | |||
| Eardley, P., Babiarz, J., Chan, K., Charny, A., Geib, R., | ||||
| Karagiannis, G., Menth, M., and T. Tsou, "Pre-Congestion | [ITU-T.I.371] | |||
| Notification Architecture", | ITU-T, "Traffic Control and Congestion Control in | |||
| draft-eardley-pcn-architecture-00 (work in progress), | {B-ISDN}", ITU-T Rec. I.371 (03/04), March 2004. | |||
| June 2007. | ||||
| [PCNcharter] | [PCNcharter] | |||
| IETF, "Congestion and Pre-Congestion Notification (pcn)", | IETF, "Congestion and Pre-Congestion Notification (pcn)", | |||
| IETF w-g charter , Feb 2007, | IETF w-g charter , Feb 2007, | |||
| <http://www.ietf.org/html.charters/pcn-charter.html>. | <http://www.ietf.org/html.charters/pcn-charter.html>. | |||
| [Patterns_Arch] | ||||
| Day, J., "Patterns in Network Architecture: A Return to | ||||
| Fundamentals", Pub: Prentice Hall ISBN-13: 9780132252423, | ||||
| Jan 2008. | ||||
| [RFC1254] Mankin, A. and K. Ramakrishnan, "Gateway Congestion | [RFC1254] Mankin, A. and K. Ramakrishnan, "Gateway Congestion | |||
| Control Survey", RFC 1254, August 1991. | Control Survey", RFC 1254, August 1991. | |||
| [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic | [RFC1701] Hanks, S., Li, T., Farinacci, D., and P. Traina, "Generic | |||
| Routing Encapsulation (GRE)", RFC 1701, October 1994. | Routing Encapsulation (GRE)", RFC 1701, October 1994. | |||
| [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. | [RFC2205] Braden, B., Zhang, L., Berson, S., Herzog, S., and S. | |||
| Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 | Jamin, "Resource ReSerVation Protocol (RSVP) -- Version 1 | |||
| Functional Specification", RFC 2205, September 1997. | Functional Specification", RFC 2205, September 1997. | |||
| [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, | [RFC2637] Hamzeh, K., Pall, G., Verthein, W., Taarud, J., Little, | |||
| W., and G. Zorn, "Point-to-Point Tunneling Protocol", | W., and G. Zorn, "Point-to-Point Tunneling Protocol", | |||
| RFC 2637, July 1999. | RFC 2637, July 1999. | |||
| [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, | [RFC2661] Townsley, W., Valencia, A., Rubens, A., Pall, G., Zorn, | |||
| G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", | G., and B. Palter, "Layer Two Tunneling Protocol "L2TP"", | |||
| RFC 2661, August 1999. | RFC 2661, August 1999. | |||
| [RFC2983] Black, D., "Differentiated Services and Tunnels", | ||||
| RFC 2983, October 2000. | ||||
| [RFC3426] Floyd, S., "General Architectural and Policy | [RFC3426] Floyd, S., "General Architectural and Policy | |||
| Considerations", RFC 3426, November 2002. | Considerations", RFC 3426, November 2002. | |||
| [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | [RFC3540] Spring, N., Wetherall, D., and D. Ely, "Robust Explicit | |||
| Congestion Notification (ECN) Signaling with Nonces", | Congestion Notification (ECN) Signaling with Nonces", | |||
| RFC 3540, June 2003. | RFC 3540, June 2003. | |||
| [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", | [RFC4306] Kaufman, C., "Internet Key Exchange (IKEv2) Protocol", | |||
| RFC 4306, December 2005. | RFC 4306, December 2005. | |||
| [RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol | [RFC4423] Moskowitz, R. and P. Nikander, "Host Identity Protocol | |||
| (HIP) Architecture", RFC 4423, May 2006. | (HIP) Architecture", RFC 4423, May 2006. | |||
| [RFC4774] Floyd, S., "Specifying Alternate Semantics for the | ||||
| Explicit Congestion Notification (ECN) Field", BCP 124, | ||||
| RFC 4774, November 2006. | ||||
| [RFC5129] Davie, B., Briscoe, B., and J. Tay, "Explicit Congestion | ||||
| Marking in MPLS", RFC 5129, January 2008. | ||||
| [Shayman] "Using ECN to Signal Congestion Within an MPLS Domain", | [Shayman] "Using ECN to Signal Congestion Within an MPLS Domain", | |||
| 2000, <http://www.ee.umd.edu/~shayman/papers.d/ | 2000, <http://www.ee.umd.edu/~shayman/papers.d/ | |||
| draft-shayman-mpls-ecn-00.txt>. | draft-shayman-mpls-ecn-00.txt>. | |||
| (Expired) | (Expired) | |||
| Appendix A. In-path Load Regulation | Appendix A. Why resetting CE on encapsulation harms PCN | |||
| Regarding encapsulation, the section of the PCN architecture | ||||
| [I-D.ietf-pcn-architecture] on tunnelling says that header copying | ||||
| (RFC4301) allows PCN to work correctly. However, resetting CE | ||||
| markings confuses PCN marking. | ||||
| The specific issue here concerns PCN excess rate marking | ||||
| [I-D.eardley-pcn-marking-behaviour], i.e. the bulk marking of traffic | ||||
| that exceeds a configured threshold rate. One of the goals of excess | ||||
| rate marking is to enable the speedy removal of excess admission | ||||
| controlled traffic following re-routes caused by link failures or | ||||
| other disasters. This maintains a share of the capacity for | ||||
| competing admission controlled traffic and for traffic in lower | ||||
| priority classes. After failures, traffic re-routed onto remaining | ||||
| links can often stress multiple links along a path. Therefore, | ||||
| traffic can arrive at a link under stress with some proportion | ||||
| already marked for removal by a previous link. By design, marked | ||||
| traffic will be removed by the overall system in subsequent round | ||||
| trips. So when the excess rate marking algorithm decides how much | ||||
| traffic to mark for removal, it doesn't include traffic already | ||||
| marked for removal by another node upstream (the `Excess traffic | ||||
| meter function' of [I-D.eardley-pcn-marking-behaviour]). | ||||
| However, if an RFC3168 tunnel ingress intervenes, it resets the ECN | ||||
| field in all the outer headers, hiding all the evidence of problems | ||||
| upstream. Thus, although excess rate marking works fine with RFC4301 | ||||
| IPsec tunnels, with RFC3168 tunnels it typically removes large | ||||
| volumes of traffic that it didn't need to remove at all. | ||||
| Appendix B. Contribution to Congestion across a Tunnel | ||||
| This specification mandates that a tunnel ingress determines the ECN | ||||
| field of each new outer tunnel header by copying the arriving header. | ||||
| If instead the outer ECN field were reset at a tunnel ingress (as it | ||||
| was for the full functionality mode of RFC3168), it would be possible | ||||
| for the tunnel egress to measure: | ||||
| o congestion marking before the tunnel ingress (fraction of inner | ||||
| header markings, p_i); | ||||
| o congestion marking across the tunnel (fraction of outer header | ||||
| markings, p_t); | ||||
| o congestion marking after the tunnel egress (fraction of departing | ||||
| header markings, p_o). | ||||
| Although the newly mandated copying behaviour at ingress gains the | ||||
| advantages described in the body of this specification, this one | ||||
| advantage of the resetting behaviour of RFC3168 seems to have been | ||||
| lost: on first impressions, it seems that the egress can no longer | ||||
| accurately measure congestion contributed along the tunnel (p_t). | ||||
| The egress could _estimate _the contribution along the tunnel by | ||||
| measure which packets carry only a mark in the outer header (not the | ||||
| inner). But this is not precisely the same as the congestion | ||||
| contributed along the tunnel; tunnel nodes may have tried to mark | ||||
| some packets that already had a marking in both the inner and outer | ||||
| header. Measuring only additional outer markings will miss these. | ||||
| Nonetheless, with the newly proposed scheme, a tunnel egress can | ||||
| derive a precise estimate of marking introduced across a tunnel (p_t) | ||||
| as follows. | ||||
| The combined fraction of markings at the tunnel egress will be p_o = | ||||
| 1 - (1 - p_i)(1 - p_t). Explanation: this is (1 - the probability a | ||||
| departing packet is not marked), which is (1 - (prob not marked | ||||
| before tunnel)(prob not marked along tunnel)). Therefore, | ||||
| rearranging, the egress can infer the fraction of marks introduced | ||||
| across the tunnel as p_t = (p_o - p_i)/(1 - p_i). If arriving | ||||
| congestion is low (p_i <<1), then the approximation p_t ~ (p_o - p_i) | ||||
| should be good enough. This is the estimate we advised originally; | ||||
| i.e. measuring only the extra markings in the outer header that are | ||||
| not present in the inner header. If a better approximation is needed | ||||
| p_t ~ (p_o - p_i)(1 + p_i), which removes the division, but still | ||||
| assumes p_i<<1. | ||||
| Using any of these formulae (including the precise one), it would be | ||||
| possible for a tunnel egress to calculate a moving average of the | ||||
| fraction of packets being marked by tunnel nodes, including those | ||||
| already marked in the inner header. Alternatively, it should even be | ||||
| possible for a tunnel egress to reverse engineer which packets would | ||||
| have been marked across the tunnel if CE was reset on ingress even if | ||||
| CE was actually copied on ingress.[[anchor3: Note from Bob: I've | ||||
| worked out an algorithm so the tunnel egress can reverse engineer | ||||
| marking as if CE was reset at the ingress even though CE was copied | ||||
| at the ingress. It typically consumes 2 cycles / pkt, occasionally 4 | ||||
| and very occasionally 8. {ToDo: On testing an implementation just now | ||||
| it still has a wrinkle in it, but with a little more development I | ||||
| believe it would work well. I'll write it into the next revision if | ||||
| I get it working.}]] | ||||
| Appendix C. Ideal Decapsulation Rules | ||||
| Compliance with this appendix is NOT REQUIRED for compliance with the | ||||
| present specification. | ||||
| If the default ECN encapsulation behaviour does not offer suitable | ||||
| trade offs, procedures exist for associating a new behaviour with a | ||||
| new Diffserv PHB. However, it is unrealistic to expect vendors of | ||||
| all IPSec and all IP in IP tunnel endpoints to cater for the | ||||
| exceptional behaviour of PHB XXX. If all tunnels did require XXX- | ||||
| specific behaviour, the resulting patchy and error-prone deployment | ||||
| would probably cause XXX to suffer byzantine feature interactions | ||||
| with poorly implemented tunnels. The default rules for tunnel | ||||
| endpoints to handle both the Diffserv field and the ECN field should | ||||
| 'just work' when handling packets with an XXX Diffserv codepoint. | ||||
| Given this specification requests a standards action to update the | ||||
| RFC3168 encapsulation behaviour, this appendix explores a further | ||||
| change to decapsulation that we ought to specify at the same time. | ||||
| If instead this further change is added later, it will add another | ||||
| set of backward compatibility combinations to the already complicated | ||||
| change history of ECN tunnelling. | ||||
| Multi-level congestion notification is currently on the IETF's | ||||
| standards track agenda in the Congestion and Pre-Congestion | ||||
| Notification (PCN) working group. The PCN working group requires | ||||
| three congestion states (not marked and two levels of congestion | ||||
| marking) [I-D.ietf-pcn-architecture]. The aim is for the first level | ||||
| of marking to stop admitting new traffic and the second level to | ||||
| terminate sufficient existing flows to bring a network back to its | ||||
| operating point after a serious failure. | ||||
| Although the ECN field gives sufficient codepoints for these three | ||||
| states, the PCN working group cannot use them in case any tunnel | ||||
| decapsulations occur within a PCN region. If a node in a tunnel sets | ||||
| the ECN field to ECT(0) or ECT(1), this change will be discarded by a | ||||
| tunnel egress compliant with RFC4301 and RFC3168. This can be seen | ||||
| in Table 1, where the ECT values in the outer header are ignored | ||||
| unless the inner header is the same. Effectively the ECT(0) and | ||||
| ECT(1) codepoints have to be treated as just one codepoint when they | ||||
| could otherwise have been used for their intended purpose of | ||||
| congestion notification. Instead, the PCN w-g has had to propose | ||||
| using extra Diffserv codepoint(s) to encode the extra states | ||||
| [I-D.moncaster-pcn-3-state-encoding], using up the rapidly exhausting | ||||
| DSCP space while leaving ECN codepoints unused. | ||||
| Although this is currently most pressing for the PCN working group, | ||||
| the issue is more general. Under Security Considerations (Section 9) | ||||
| it has already been explained that a data sender cannot use the | ||||
| experimental ECN nonce [RFC3540] to detect suppression of congestion | ||||
| notification along a tunnel. | ||||
| More generally, the currently standardised tunnel decapsulation | ||||
| behaviour unnecessarily wastes a quarter of two bits (i.e. half a | ||||
| bit) in the IP (v4 & v6) header. As explained in Section 3.1, the | ||||
| original reason for not copying down outer ECT codepoints for onward | ||||
| forwarding was to limit the covert channel across a decapsulator to 1 | ||||
| bit per packet. However, now that the IETF Security Area has deemed | ||||
| that a 2-bit covert channel through an encapsulator is a manageable | ||||
| risk, the same should be true for a decapsulator. | ||||
| Table 2 proposes a more ideal layered decapsulation behaviour. Note: | ||||
| this table is only to support discussion. It is not currently | ||||
| proposed for standards action. The only difference from Table 1 | ||||
| (that is proposed for standards action), is the swapping of the cells | ||||
| highlighted as *ECT(X)*. | ||||
| +--Incoming Outer Header--- | ||||
| +---------------------+---------+-----------+-----------+-----------+ | ||||
| | Incoming Inner | Not-ECT | ECT(0) | ECT(1) | CE | | ||||
| | Header | | | | | | ||||
| +---------------------+---------+-----------+-----------+-----------+ | ||||
| | Not-ECT | Not-ECT | drop(!!!) | drop(!!!) | drop(!!!) | | ||||
| | ECT(0) | ECT(0) | ECT(0) | *ECT(1)* | CE | | ||||
| | ECT(1) | ECT(1) | *ECT(0)* | ECT(1) | CE | | ||||
| | CE | CE | CE | CE (!!!) | CE | | ||||
| +---------------------+---------+-----------+-----------+-----------+ | ||||
| +-----Outgoing Header------ | ||||
| Table 2: Ideal IP in IP Decapsulation (currently NOT REQUIRED) | ||||
| Note that, if this ideal proposal were taken up, extra backwards | ||||
| compatibility issues would have to be resolved. | ||||
| Appendix D. Non-Dependence of Tunnelling on In-path Load Regulation | ||||
| We have said that at any point in a network, the Congestion Baseline | ||||
| (where congestion notification starts from zero) should be the | ||||
| previous upstream Load Regulator. We have also said that the ingress | ||||
| of an IP in IP tunnel must copy congestion indications to the | ||||
| encapsulating outer headers it creates. If the Load Regulator is in- | ||||
| path rather than at the source, and also a tunnel ingress, these two | ||||
| requirements seem to be contradictory. A tunnel ingress must not | ||||
| reset incoming congestion, but a Load Regulator must be the | ||||
| Congestion Baseline, implying it needs to reset incoming congestion. | ||||
| In fact, the two requirements are not contradictory, because a Load | ||||
| Regulator and a tunnel ingress are functions within a node that occur | ||||
| in sequence on a stream of packets, not at the same point. Figure 3 | ||||
| is borrowed from [RFC2983] (which was making a similar point about | ||||
| the location of Diffserv traffic conditioning relative to the | ||||
| encapsulation function of a tunnel). An in-path Load Regulator can | ||||
| act on packets either at [1 - Before] encapsulation or at [2 - Outer] | ||||
| after encapsulation. Load Regulation does not ever need to be | ||||
| integrated with the [Encapsulate] function (but it can be for | ||||
| efficiency). Therefore we can still maintain that the [Encapsulate] | ||||
| function always copies CE into the outer header. | ||||
| >>-----[1 - Before]--------[Encapsulate]----[3 - Inner]------------>> | ||||
| \ | ||||
| \ | ||||
| +--------[2 - Outer]--------->> | ||||
| Figure 3: Placement of In-Path Load Regulator Relative to Tunnel | ||||
| Ingress | ||||
| Then separately, if there is a Load Regulator at location [2 - | ||||
| Outer], it might reset CE to ECT(0), say. Then the Congestion | ||||
| Baseline for the lower layer (outer) will be [2 - Outer], while the | ||||
| Congestion Baseline of the inner layer will be unchanged. But how | ||||
| encapsulation works has nothing to do with whether a Load Regulator | ||||
| is present or where it is. | ||||
| If on the other hand a Load Regulator resets CE at [1 - Before], the | ||||
| Congestion Baseline of both the inner and outer headers will be [1 - | ||||
| Before]. But again, encapsulation is independent of load regulation. | ||||
| D.1. Dependence of In-Path Load Regulation on Tunnelling | ||||
| Although encapsulation doesn't need to depend on in-path load | ||||
| regulation, the reverse is not true. The placement of an in-path | ||||
| Load Regulator must be carefully considered relative to | ||||
| encapsulation. Some examples are given in the following for | ||||
| guidance. | ||||
| In the traditional Internet architecture one tends to think of the | In the traditional Internet architecture one tends to think of the | |||
| source host as the Load Regulator for a path. It is generally not | source host as the Load Regulator for a path. It is generally not | |||
| desirable or practical for a node part way along the path to regulate | desirable or practical for a node part way along the path to regulate | |||
| the load. However, various reasonable proposals for in-path load | the load. However, various reasonable proposals for in-path load | |||
| regulation have been made from time to time (e.g. fair queuing, | regulation have been made from time to time (e.g. fair queuing, | |||
| traffic engineering). Also the IETF has recently chartered a working | traffic engineering, flow admission control). The IETF has recently | |||
| group to standardise admission control across a part of a path using | chartered a working group to standardise admission control across a | |||
| pre-congestion notification (PCN) [PCNcharter], which involves in- | part of a path using pre-congestion notification (PCN) [PCNcharter]. | |||
| path load regulation. This is of particular relevance here because | This is of particular relevance here because it involves congestion | |||
| it involves congestion notification with an in-path Load Regulator | notification with an in-path Load Regulator, it can involve | |||
| and it can involve tunnelling. | tunnelling and it certainly involves encapsulation more generally. | |||
| We will use the more complex scenario in Figure 3 to tease out all | We will use the more complex scenario in Figure 4 to tease out all | |||
| the issues that arise when combining congestion notification and | the issues that arise when combining congestion notification and | |||
| tunnelling with various possible in-path load regulation schemes. In | tunnelling with various possible in-path load regulation schemes. In | |||
| this case 'I1' and 'E2' break up the path into three separate | this case 'I1' and 'E2' break up the path into three separate | |||
| congestion control loops. The feedback for these loops is shown | congestion control loops. The feedback for these loops is shown | |||
| going right to left across the top of the figure. The 'V's are arrow | going right to left across the top of the figure. The 'V's are arrow | |||
| heads representing the direction of feedback, not letters. But there | heads representing the direction of feedback, not letters. But there | |||
| are also two tunnels within the middle control loop: 'I1' to 'E1' and | are also two tunnels within the middle control loop: 'I1' to 'E1' and | |||
| 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS | 'I2' to 'E2'. The two tunnels might be VPNs, perhaps over two MPLS | |||
| core networks. M is a congestion monitoring point, perhaps between | core networks. M is a congestion monitoring point, perhaps between | |||
| two border routers where the same tunnel continues unbroken across | two border routers where the same tunnel continues unbroken across | |||
| the border. | the border. | |||
| ______ _______________________________________ _____ | ______ _______________________________________ _____ | |||
| / \ / \ / \ | / \ / \ / \ | |||
| V \ V M \ V \ | V \ V M \ V \ | |||
| A--->R--->I1===========>E1----->I2=========>==========>E2------->B | A--->R--->I1===========>E1----->I2=========>==========>E2------->B | |||
| Figure 3: complex Tunnel Scenario | Figure 4: complex Tunnel Scenario | |||
| The question is, should the congestion markings in the outer exposed | The question is, should the congestion markings in the outer exposed | |||
| headers of a tunnel represent congestion only since the tunnel | headers of a tunnel represent congestion only since the tunnel | |||
| ingress or over the whole upstream path from the source of the inner | ingress or over the whole upstream path from the source of the inner | |||
| header (whatever that may mean)? Or put another way, should 'I1' and | header (whatever that may mean)? Or put another way, should 'I1' and | |||
| 'I2' copy or reset CE markings? | 'I2' copy or reset CE markings? | |||
| The answer is that the baseline of congestion marking should be the | Based on the design principles in Section 4, the answer is that the | |||
| nearest upstream interface designed to regulate traffic load--the | Congestion Baseline should be the nearest upstream interface designed | |||
| Load Regulator. In Figure 3 'A', 'I1' or 'E2' are all Load | to regulate traffic load--the Load Regulator. In Figure 4 'A', 'I1' | |||
| Regulators. We have shown the feedback loops returning to each of | or 'E2' are all Load Regulators. We have shown the feedback loops | |||
| these nodes so that they can regulate the load causing the congestion | returning to each of these nodes so that they can regulate the load | |||
| notification. So the baseline for congestion markings exposed to M | causing the congestion notification. So the Congestion Baseline | |||
| should be 'I1' (the Load Regulator), not 'I2'. That is, 'I2' SHOULD | exposed to M should be 'I1' (the Load Regulator), not 'I2'. | |||
| copy any CE marking into the outer header it creates, while 'I1' is | Therefore I1 should reset any arriving CE markings. In this case, | |||
| an exception because it is an in-path load regulator, so it should | 'I1' knows the tunnel to 'E1' is unrelated to its load regulation | |||
| reset the ECN field in the outer header it creates. | function. So the load regulation function within 'I1' should be | |||
| placed at [1 - Before] tunnel encapsulation within 'I1' (using the | ||||
| terminology of Figure 3). Then the Congestion Baseline all across | ||||
| the networks from 'I1' to 'E2' in both inner and outer headers will | ||||
| be 'I1'. | ||||
| The following further examples illustrate how this answer might be | The following further examples illustrate how this answer might be | |||
| applied: | applied: | |||
| o Preemption marking is currently defined for PCN [PCN-arch] so that | o We argued in Appendix A that resetting CE on encapsulation could | |||
| the rate of unmarked packets at the end of a path of multiple | harm PCN excess rate marking, which marks excess traffic for | |||
| bottlenecks determines the maximum sustainable aggregate bit rate | removal in subsequent round trips. This marking relies on not | |||
| over that path. To produce the correct marking by the end, each | marking packets if another node upstream has already marked them | |||
| congested node must only consider packets to be eligible for | for removal. If there were a tunnel ingress between the two which | |||
| marking if they have not already been marked by any previous | reset CE markings, it would confuse the downstream node into | |||
| bottleneck along a path that may span multiple tunnels (including | marking far too much traffic for removal. So why do we say that | |||
| MPLS encapsulations etc.). This scheme only results in the | 'I1' should reset CE, while a tunnel ingress shouldn't? The | |||
| correct marking rate if the markings accumulated so far along the | answer is that it is the Load Regulator function at 'I1' that is | |||
| path are copied into the outer exposed header of each tunnel or | resetting CE, not the tunnel encapsulator. The Load Regulator | |||
| encapsulation. Consider that 'I1' and 'E2' in the complex | needs to set itself as the Congestion Baseline, so the feedback it | |||
| scenario of Figure 3 are edge gateways of a PCN region. Admission | gets will only be about congestion on links it can relieve itself | |||
| control based on PCN measurements is a form of load regulation, so | by regulating the load into them. When it resets CE markings, it | |||
| 'I1' regulates the load on the PCN region. Therefore 'I1' should | knows that something else upstream will have dealt with the | |||
| be the baseline of congestion marking for _both_ tunnels within | congestion notifications it removes, given it is part of an end- | |||
| the scope of its feedback loop. Therefore 'I2' should follow the | to-end admission control signalling loop. It therefore knows that | |||
| normal rules and copy congestion marking into the outer tunnel | previous hops will be covered by other Load Regulators. | |||
| header, while 'I1' is an exception because it is also a load | Meanwhile, the tunnel ingresses at both 'I1' and 'I2' should | |||
| regulator, so it should reset CE markings in the outer header. | follow the new rule for any tunnel ingress and copy congestion | |||
| marking into the outer tunnel header. The ingress at 'I1' will | ||||
| happen to copy headers that have already been reset just | ||||
| beforehand. But it doesn't need to know that. | ||||
| o [Shayman] suggested feedback of ECN accumulated across an MPLS | o [Shayman] suggested feedback of ECN accumulated across an MPLS | |||
| domain could cause the ingress to trigger re-routing to mitigate | domain could cause the ingress to trigger re-routing to mitigate | |||
| congestion. This case is more like the simple scenario of | congestion. This case is more like the simple scenario of | |||
| Figure 2, with a feedback loop across the MPLS domain ('E' back to | Figure 2, with a feedback loop across the MPLS domain ('E' back to | |||
| 'I'). The baseline for congestion exposed in outer headers in | 'I'). I is a Load Regulator because re-routing around congestion | |||
| this case will be the tunnel ingress, which should therefore reset | is a load regulation function. But in this case 'I' should only | |||
| the ECN field in the outer headers it creates. But the reason it | reset itself as the Congestion Baseline in outer headers, as it is | |||
| should act as the baseline is because it is an in-path load | not handling congestion outside its domain, so it must preserve | |||
| regulator (re-routing around congestion is a load regulation | the end-to-end congestion feedback loop for something else to | |||
| function), not just because it is a tunnel ingress. | handle (probably the data source). Therefore the Load Regulator | |||
| within 'I' should be placed at [2 - Outer] to reset CE markings | ||||
| just after the tunnel ingress has copied them from arriving | ||||
| headers. Again, the tunnel encapsulation function at 'I' simply | ||||
| copies incoming headers, unaware that the load regulator will | ||||
| subsequently reset its outer headers. | ||||
| o The PWE3 working group of the IETF is considering the problem of | o The PWE3 working group of the IETF is considering the problem of | |||
| how and whether an aggregate private wire emulation should respond | how and whether an aggregate edge-to-edge pseudo-wire emulation | |||
| to congestion [I-D.rosen-pwe3-congestion]. Although the study is | should respond to congestion [I-D.ietf-pwe3-congestion-frmwk]. | |||
| still at the requirements stage, some (controversial) solution | Although the study is still at the requirements stage, some | |||
| proposals include in-path load regulation at the ingress to the | (controversial) solution proposals include in-path load regulation | |||
| tunnel that could lead to tunnel arrangements with similar | at the ingress to the tunnel that could lead to tunnel | |||
| complexity to that of Figure 3. | arrangements with similar complexity to that of Figure 4. | |||
| These are not contrived scenarios--they could be a lot worse. For | These are not contrived scenarios--they could be a lot worse. For | |||
| instance, a host may create a tunnel for IPsec which is placed inside | instance, a host may create a tunnel for IPsec which is placed inside | |||
| a tunnel for Mobile IP over a remote part of its path. And around | a tunnel for Mobile IP over a remote part of its path. And around | |||
| this all we may have MPLS labels being pushed and popped as packets | this all we may have MPLS labels being pushed and popped as packets | |||
| pass across different core networks. Similarly, it is possible that | pass across different core networks. Similarly, it is possible that | |||
| subnets could be built from link technology (e.g. ethernet switches) | subnets could be built from link technology (e.g. future Ethernet | |||
| so that link headers being added and removed could involve congestion | switches) so that link headers being added and removed could involve | |||
| notification in future link headers with all the same issues as with | congestion notification in future Ethernet link headers with all the | |||
| IP in IP tunnels. | same issues as with IP in IP tunnels. | |||
| The reason we introduced the concept of a Load Regulator was to allow | One reason we introduced the concept of a Load Regulator was to allow | |||
| for in-path load regulation. In the traditional Internet | for in-path load regulation. In the traditional Internet | |||
| architecture one tends to think of a host and a Load Regulator as | architecture one tends to think of a host and a Load Regulator as | |||
| synonymous, but when considering tunnelling, even the definition of a | synonymous, but when considering tunnelling, even the definition of a | |||
| host is too fuzzy, whereas a Load Regulator is a clearly defined | host is too fuzzy, whereas a Load Regulator is a clearly defined | |||
| function. Similarly, the concept of innermost header is too fuzzy to | function. Similarly, the concept of innermost header is too fuzzy to | |||
| be able to (wrongly) say that the source address of the innermost | be able to (wrongly) say that the source address of the innermost | |||
| header should be the baseline. Which is the innermost header when | header should be the Congestion Baseline. Which is the innermost | |||
| multiple encapsulations may be in use? Where do we stop? If we say | header when multiple encapsulations may be in use? Where do we stop? | |||
| the original source in the above IPsec-Mobile IP case is the host, | If we say the original source in the above IPsec-Mobile IP case is | |||
| how do we know it isn't tunnelling an encrypted packet stream on | the host, how do we know it isn't tunnelling an encrypted packet | |||
| behalf of another host in a p2p network? | stream on behalf of another host in a p2p network? | |||
| The reason there has been so much confusion over the question of | We have become used to thinking that only hosts regulate load. The | |||
| whether a tunnel ingress should copy or reset CE markings is that we | end to end design principle advises that this is a good idea | |||
| have become used to thinking that only hosts regulate load. The end | [RFC3426], but it also advises that it is solely a guiding principle | |||
| to end design principle advises that this is a good idea [RFC3426], | intended to make the designer think very carefully before breaking | |||
| but it also advises that it is only a guiding principle intended to | it. We do have proposals where load regulation functions sit within | |||
| make the designer think very carefully before breaking it. We do | a network path for good, if sometimes controversial, reasons, e.g. | |||
| have proposals where load regulation functions sit within a network | PCN edge admission control gateways [I-D.ietf-pcn-architecture] or | |||
| path for good, if sometimes controversial, reasons, e.g. PCN edge | traffic engineering functions at domain borders to re-route around | |||
| admission control gateways [PCN-arch] or traffic engineering | congestion [Shayman]. Whether or not we want in-path load | |||
| functions at domain borders to re-route around congestion [Shayman]. | regulation, we have to work round the fact that it will not go away. | |||
| Author's Address | Author's Address | |||
| Bob Briscoe | Bob Briscoe | |||
| BT | BT | |||
| B54/77, Adastral Park | B54/77, Adastral Park | |||
| Martlesham Heath | Martlesham Heath | |||
| Ipswich IP5 3RE | Ipswich IP5 3RE | |||
| UK | UK | |||
| Phone: +44 1473 645196 | Phone: +44 1473 645196 | |||
| Email: bob.briscoe@bt.com | Email: bob.briscoe@bt.com | |||
| URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ | URI: http://www.cs.ucl.ac.uk/staff/B.Briscoe/ | |||
| Full Copyright Statement | Full Copyright Statement | |||
| Copyright (C) The IETF Trust (2007). | Copyright (C) The IETF Trust (2008). | |||
| This document is subject to the rights, licenses and restrictions | This document is subject to the rights, licenses and restrictions | |||
| contained in BCP 78, and except as set forth therein, the authors | contained in BCP 78, and except as set forth therein, the authors | |||
| retain all their rights. | retain all their rights. | |||
| This document and the information contained herein are provided on an | This document and the information contained herein are provided on an | |||
| "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS | |||
| OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND | |||
| THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS | |||
| OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF | |||
| skipping to change at page 21, line 45 | skipping to change at page 34, line 45 | |||
| such proprietary rights by implementers or users of this | such proprietary rights by implementers or users of this | |||
| specification can be obtained from the IETF on-line IPR repository at | specification can be obtained from the IETF on-line IPR repository at | |||
| http://www.ietf.org/ipr. | http://www.ietf.org/ipr. | |||
| The IETF invites any interested party to bring to its attention any | The IETF invites any interested party to bring to its attention any | |||
| copyrights, patents or patent applications, or other proprietary | copyrights, patents or patent applications, or other proprietary | |||
| rights that may cover technology that may be required to implement | rights that may cover technology that may be required to implement | |||
| this standard. Please address the information to the IETF at | this standard. Please address the information to the IETF at | |||
| ietf-ipr@ietf.org. | ietf-ipr@ietf.org. | |||
| Acknowledgments | Acknowledgment | |||
| Funding for the RFC Editor function is provided by the IETF | This document was produced using xml2rfc v1.33 (of | |||
| Administrative Support Activity (IASA). This document was produced | http://xml.resource.org/) from a source in RFC-2629 XML format. | |||
| using xml2rfc v1.32 (of http://xml.resource.org/) from a source in | ||||
| RFC-2629 XML format. | ||||
| End of changes. 90 change blocks. | ||||
| 471 lines changed or deleted | 1061 lines changed or added | |||
This html diff was produced by rfcdiff 1.35. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||