Congestion and Pre Congestion                               T. Moncaster
Internet-Draft                                                        BT
Intended status: Experimental                                 B. Briscoe
Expires: December 25, 2008                                      BT & UCL
                                                                M. Menth
                                                 University of Wuerzburg
                                                           June 23, 2008

               A three state extended PCN encoding scheme
                draft-moncaster-pcn-3-state-encoding-00

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on December 25, 2008.

Copyright Notice

Copyright (C) The IETF Trust (2008).

Abstract

Pre-congestion notification (PCN) is a mechanism designed to protect the Quality of Service of inelastic flows. It does this by marking packets when traffic load on a link is approaching or has exceeded a threshold below the physical link rate. The baseline encoding specifies how two encoding states can be encoded into the IP header. This document specifies an extension to the baseline encoding that enables three encoding states to be carried in the IP header and adds limited support for end-to-end ECN.
Status

This memo is posted as an Internet-Draft with the intent that it eventually be published as an experimental RFC.

Table of Contents

1.  Introduction
2.  Requirements notation
3.  Terminology
4.  The Requirement for Three PCN Encoding States
5.  Adding Limited End-to-End ECN Support to PCN
6.  Encoding Three PCN States in IP
  6.1.  Forwarding Traffic Out of the PCN-domain
7.  PCN domain support for the PCN extension encoding
  7.1.  End-to-End transport behaviour compliant with the PCN extension encoding
  7.2.  PCN-boundary-node behaviour compliant with the PCN extension encoding
    7.2.1.  Behaviour for packets belonging to a PCN-flow
    7.2.2.  Behaviour for packets belonging to a PCN-enabled-ECN-flow
  7.3.  PCN-interior-node behaviour compliant with the PCN extension encoding
  7.4.  Behaviour of any PCN node compliant with the PCN extension encoding
8.  IANA Considerations
9.  Security Considerations
10. Conclusions
11. Acknowledgements
12. Comments Solicited
13. References
  13.1.  Normative References
  13.2.  Informative References
Appendix A.  Tunnelling Constraints
Authors' Addresses
Intellectual Property and Copyright Statements

1.  Introduction

Pre-congestion notification provides information to support admission control and flow termination at the boundary nodes of a Diffserv region in order to protect the quality of service (QoS) of inelastic flows [PCN-arch]. This is achieved by marking packets on interior nodes according to a metering function implemented at each node. Excess traffic marking marks PCN packets that exceed a certain reference rate on a link, while threshold marking marks all PCN packets on a link when the PCN traffic rate exceeds a lower reference rate. These marks are monitored by the egress nodes of the PCN domain.

The baseline encoding described in [PCN-base-encode] provides for deployment scenarios that only require two PCN encoding states. This document describes an experimental extension to the baseline encoding in the IP header that adds two capabilities:

o  the encoding of a third PCN encoding state in the IP header

o  preservation of the end-to-end semantics of the ECN field even though PCN uses the field within a PCN-region that interrupts the end-to-end path

2.  Requirements notation

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

3.  Terminology

Most of the terminology used in this document is defined either in [PCN-arch] or in [PCN-base-encode]. The following additional terms are defined in this document:

o  PCN-flow - a flow that is covered by a reservation but has not signalled that it requires end-to-end ECN support.

o  PCN-enabled-ECN-flow - a flow that is covered by a reservation and for which the end-to-end transport has explicitly negotiated ECN support from the PCN-boundary-nodes.
o  Not-Marked (xxx), where xxx represents a standard ECN codepoint - packets that are PCN capable but carry no PCN mark. Also written NM(xxx). The (xxx) represents the ECN codepoint that the packet arrived with at the PCN-ingress-node, e.g. NM(CE) represents a PCN capable packet that has no PCN marking but which arrived with the ECN bits set to congestion experienced.

4.  The Requirement for Three PCN Encoding States

The PCN architecture [PCN-arch] describes proposed PCN schemes that require traffic to be metered and marked using both Threshold and Excess Traffic schemes. In order to achieve this it is necessary to allow for three PCN encoding states. The constraints imposed by the way tunnels process the ECN field severely limit how these states can be encoded, as explained in [PCN-base-encode]. The obvious way to provide one more encoding state than the baseline encoding is through the use of an additional PCN-enabled DiffServ codepoint.

One aim of this document is to allow for experiments to show whether such schemes are better than those that only employ two PCN encoding states. As such, the additional DSCP will be taken from the EXP/LU pools defined in [RFC2474]. If the experiments demonstrate that PCN schemes employing three encoding states are significantly better than those employing only two, then at a later date IANA might be asked to assign a new PCN-enabled DSCP from pool 1.

5.  Adding Limited End-to-End ECN Support to PCN

[ecn-pcn-usecases] suggests a number of use-cases where explicit preservation of end-to-end ECN semantics might be needed across a PCN domain. One of the use-cases suggests that the end-nodes might be running rate-adaptive codecs that would respond to ECN marks by reducing their transmission rate.
If the sending transport sets the ECT codepoint, the value of the ECN field when the packet arrives at the PCN ingress node will need to be reinstated when it leaves the PCN egress node.

If a PCN region is starting to suffer pre-congestion then it may make sense to expose marks generated within the PCN region by forwarding CE marks from the PCN egress to such a rate-adaptive endpoint. They would be in addition to any CE marks generated elsewhere on the end-to-end path. This would allow the endpoints to reduce the traffic rate, which would in turn help to alleviate the pre-congestion, potentially averting any need for call blocking or termination.

However, the 'leaking' of CE marks out of the PCN region is potentially dangerous and could violate [RFC4774] if the end hosts don't understand ECN (see section 18.1.4 of [RFC3168]). Therefore, a PCN region can only support end-to-end ECN if the PCN edge nodes are sure that the end-to-end transport is ECN-capable. That way the PCN egress nodes can ensure that they only expose CE marks to those receivers that will correctly interpret them as a notification of congestion.

The end-points may indicate they are ECN-capable through some signalling process that sets up their reservation with the PCN boundary nodes. The exact process of negotiation is beyond the scope of this document but is likely to involve explicit two-way signalling between the end-host and the PCN-domain. In the absence of such signalling the default behaviour of the PCN egress node will be to clear the ECN field to 00 as in the baseline PCN encoding [PCN-base-encode].

6.  Encoding Three PCN States in IP

The three state PCN encoding scheme is based closely on that defined in [PCN-base-encode] so that there will be no compatibility issues if a PCN-domain changes from using the baseline encoding scheme to the experimental scheme described here.
The exact manner in which the PCN encoding states are carried in the IP header is shown in Table 1. In the following table ThM refers to packets that have been metered and marked according to a Threshold Marking scheme and ETM refers to packets that have been metered and marked according to an Excess Traffic Marking scheme.

+--------+--------------+-------------+-------------+---------+
| DSCP   | Not-ECT (00) | ECT(0) (10) | ECT(1) (01) | CE (11) |
+--------+--------------+-------------+-------------+---------+
| DSCP 1 | Not-PCN      | NM(Not-ECT) | NM(CE)      | ThM     |
| DSCP 2 | Not-PCN      | NM(ECT(0))  | NM(ECT(1))  | ETM     |
+--------+--------------+-------------+-------------+---------+

Where DSCP 1 is a PCN-enabled DiffServ codepoint (see [PCN-base-encode]) and DSCP 2 is a PCN-enabled DSCP from the EXP/LU pools as defined in [RFC2474].

            Table 1: Encoding three PCN states in IP

The four different Not Marked (NM) states allow for the addition of limited end-to-end ECN support as explained in the previous section.

Warning

6.1.  Forwarding Traffic Out of the PCN-domain

As each packet exits the PCN-domain, the PCN-egress-node MUST check whether it belongs to a PCN-enabled-ECN-flow. If it belongs to such a flow then Table 2 shows how the ECN field should be reset. In addition, all packets should have their DSCP reset to the appropriate DSCP for the next hop. If the next hop is not another PCN region this will not be a PCN-enabled DSCP, and by default will be the best-effort DSCP. Alternatively, higher layer signalling mechanisms may allow the DSCP with which packets entered the PCN-domain to be reinstated.
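The Table 1 lookup and the egress reset specified by Table 2 can be sketched as a pair of dictionary lookups. This is only an illustrative sketch, not part of the encoding: the numeric values of DSCP_1 and DSCP_2, the table names and the helper functions pcn_state() and egress_ecn() are all hypothetical, and the ecn_flow argument stands in for whatever signalling state tells the PCN-egress-node that a packet belongs to a PCN-enabled-ECN-flow.

```python
# ECN field values as two-bit integers (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11

# Hypothetical DSCP values, chosen only for illustration. The draft leaves
# the actual codepoints to each PCN-domain.
DSCP_1 = 0b101100  # stands in for the PCN-enabled DSCP of the baseline encoding
DSCP_2 = 0b101111  # stands in for an EXP/LU DSCP (pool 2 pattern xxxx11)

# Table 1: (DSCP, ECN field) -> PCN encoding state.
TABLE_1 = {
    (DSCP_1, NOT_ECT): "Not-PCN",
    (DSCP_1, ECT0): "NM(Not-ECT)",
    (DSCP_1, ECT1): "NM(CE)",
    (DSCP_1, CE): "ThM",
    (DSCP_2, NOT_ECT): "Not-PCN",
    (DSCP_2, ECT0): "NM(ECT(0))",
    (DSCP_2, ECT1): "NM(ECT(1))",
    (DSCP_2, CE): "ETM",
}

# Table 2: PCN state -> ECN field restored at the egress for a
# PCN-enabled-ECN-flow.
TABLE_2 = {
    "Not-PCN": NOT_ECT,
    "NM(Not-ECT)": NOT_ECT,
    "NM(CE)": CE,
    "ThM": CE,
    "NM(ECT(0))": ECT0,
    "NM(ECT(1))": ECT1,
    "ETM": CE,
}


def pcn_state(dscp, ecn):
    """Return the PCN state of a packet carrying DSCP 1 or DSCP 2.

    Raises KeyError for any other DSCP, which this sketch does not model.
    """
    return TABLE_1[(dscp, ecn)]


def egress_ecn(dscp, ecn, ecn_flow):
    """Return the ECN field a packet carries when it leaves the PCN-domain."""
    if not ecn_flow:
        # Packets of a plain PCN-flow are always reset to not-ECT.
        return NOT_ECT
    return TABLE_2[pcn_state(dscp, ecn)]
```

For example, egress_ecn(DSCP_1, ECT1, True) yields CE, because ECT(1) under DSCP 1 encodes NM(CE), while the same packet in a plain PCN-flow leaves with not-ECT.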
+--------+-------------+-----------------+-----------------+---------+
| DSCP   | 00          | 10              | 01              | 11      |
+--------+-------------+-----------------+-----------------+---------+
| DSCP 1 | Not PCN --> | NM(Not-ECT) --> | NM(CE) --> CE   | ThM --> |
|        | Not ECT     | not-ECT         |                 | CE      |
| DSCP 2 | Not PCN --> | NM(ECT(0)) -->  | NM(ECT(1)) -->  | ETM --> |
|        | Not ECT     | ECT(0)          | ECT(1)          | CE      |
+--------+-------------+-----------------+-----------------+---------+

Where each cell gives the incoming PCN state and the outgoing ECN state.

 Table 2: Egress rules for resetting ECN field for PCN Enabled ECN Flows

For packets belonging to a PCN-flow the ECN field MUST be reset to not-ECT (00) as defined in [PCN-base-encode].

7.  PCN domain support for the PCN extension encoding

PCN traffic MUST be marked with a DiffServ codepoint that indicates PCN is enabled. To comply with the PCN extension encoding, this codepoint is either a PCN-enabled DSCP assigned by IANA for use with the baseline PCN encoding [PCN-base-encode] or a DSCP from pools 2 or 3 for experimental and local use [RFC2474]. The exact DSCP may vary between PCN-domains but MUST be fixed within each PCN-domain.

All nodes within a PCN-domain MUST understand and support the three PCN states of the PCN extension encoding. Therefore, if any PCN-node in a PCN-domain does not support three PCN encoding states, the nodes in that PCN-domain MUST NOT be configured to use the three PCN encoding states defined here.

7.1.  End-to-End transport behaviour compliant with the PCN extension encoding

Transports wishing to use both a reservation and end-to-end ECN MUST establish that their path supports this combination. Support of end-to-end ECN by PCN boundary nodes is OPTIONAL. Therefore transports MUST check with both the PCN-ingress-node and PCN-egress-node for each flow. The sending of such a request MUST NOT be taken to mean the request has been granted.
The PCN-boundary-nodes MAY choose to inform the end-node of a successful request. The exact mechanism for such negotiation is beyond the scope of this document. A transport that receives no response or a negative response to a request to support end-to-end ECN within a flow reservation MUST set the ECN field of all subsequent packets in that flow to Not-ECT if it wishes to guarantee that the flow will receive PCN treatment.

7.2.  PCN-boundary-node behaviour compliant with the PCN extension encoding

o  If both the PCN ingress and egress nodes support end-to-end ECN, and the transport has successfully requested end-to-end ECN, the flow becomes a PCN-enabled-ECN-flow.

o  If either of a PCN ingress-egress pair does not support end-to-end ECN, or if the end-to-end transport does not request support for end-to-end ECN, then the PCN-boundary-nodes MUST assume the packet belongs to a PCN-flow.

7.2.1.  Behaviour for packets belonging to a PCN-flow

o  If a packet belonging to a PCN-flow arrives at the PCN-ingress-node with its ECN field already set to CE or ECT, it SHOULD be dropped. Alternatively it MAY be downgraded to a lower (non-PCN) service class or MAY be tunnelled through the PCN region. It MUST NOT be admitted to the PCN region directly.

o  When a packet belonging to a PCN-flow carrying the not-ECT codepoint arrives at the PCN-ingress-node, the ECN field MUST be set to ECT(0) (10) and the DiffServ field set to DSCP 1.

o  When a packet belonging to a PCN-flow leaves the PCN-domain through the PCN-egress-node, the ECN bits MUST be set to not-ECT (00).

7.2.2.  Behaviour for packets belonging to a PCN-enabled-ECN-flow

o  When a packet belonging to a PCN-enabled-ECN-flow arrives at the PCN-ingress-node, the ECN field and DSCP MUST be set to the appropriate NM(xxx) setting as shown in Table 1.
o  When a packet belonging to a PCN-enabled-ECN-flow leaves the PCN-region through a PCN-egress-node, the ECN bits MUST be set according to Table 2 and the DSCP MUST be set to the appropriate DSCP for the next hop as discussed in Section 6.1 above.

7.3.  PCN-interior-node behaviour compliant with the PCN extension encoding

o  If a PCN interior node indicates that a packet is to be threshold marked then the ThM codepoint MUST be set by changing the ECN bits to 11 and ensuring the DiffServ field is set to DSCP 1.

o  If a PCN interior node indicates that a packet is to be excess traffic marked then the ETM codepoint MUST be set by changing the ECN bits to 11 and ensuring the DiffServ field is set to DSCP 2 as defined above.

7.4.  Behaviour of any PCN node compliant with the PCN extension encoding

o  PCN nodes MUST NOT change not-PCN to another codepoint and they MUST NOT change a PCN-Capable codepoint to not-PCN.

o  ThM MUST NOT be changed to NM.

o  ETM MUST NOT be changed to ThM or to NM.

8.  IANA Considerations

This document asks IANA to assign one DiffServ codepoint from Pool 2 or Pool 3 (for experimental/local use) [RFC2474]. Should any of the experimental PCN schemes that use three encoding states prove sufficiently successful, a later document will request that IANA assign a dedicated DiffServ codepoint from pool 1 for standards use.

9.  Security Considerations

The security concerns relating to this extended PCN encoding are essentially the same as those in [PCN-base-encode].

This extension encoding gives end-to-end support for the ECN nonce [RFC3540], which is intended to protect the sender against the receiver or against network elements concealing a congestion experienced marking or a lost packet. PCN-based reservations combined with end-to-end ECN are intended for partially inelastic traffic using rate-adaptive codecs.
Therefore the end-to-end transport is unlikely to be TCP, but at this time the nonce has only been defined for TCP transports.

10.  Conclusions

This document describes an extended encoding scheme for PCN that provides for three encoding states as well as support for end-to-end ECN. The encoding scheme builds on the baseline encoding described in [PCN-base-encode]. Using this encoding scheme it is possible for operators to conduct experiments to check whether the addition of an extra encoding state will significantly improve the performance of PCN. It will also allow experiments to determine whether there is a need for end-to-end ECN support within the PCN-domain (as opposed to end-to-end ECN support through the use of IP-in-IP tunnelling or by downgrading the traffic to a lower service class).

11.  Acknowledgements

This document builds extensively on work done in the PCN working group by Kwok Ho Chan, Georgios Karagiannis, Philip Eardley, Joe Babiarz and others. Full details of alternative schemes that were considered for adoption can be found in the document [pcn-enc-compare].

12.  Comments Solicited

Comments and questions are encouraged and very welcome. They can be addressed to the IETF Transport Area working group mailing list, and/or to the authors.

13.  References

13.1.  Normative References

[RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC4774]  Floyd, S., "Specifying Alternate Semantics for the Explicit Congestion Notification (ECN) Field", BCP 124, RFC 4774, November 2006.

13.2.  Informative References

[PCN-arch]  Eardley, P., "Pre-Congestion Notification Architecture", draft-ietf-pcn-architecture-03 (work in progress), February 2008.

[PCN-base-encode]  Moncaster, T., Briscoe, B., and M.
Menth, "Baseline Encoding and Transport of Pre-Congestion Information", draft-moncaster-pcn-baseline-encoding-01 (work in progress), June 2008.

[RFC2474]  Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, December 1998.

[RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, September 2001.

[RFC3540]  Spring, N., Wetherall, D., and D. Ely, "Robust Explicit Congestion Notification (ECN) Signaling with Nonces", RFC 3540, June 2003.

[RFC4301]  Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, December 2005.

[ecn-pcn-usecases]  Sarker, Z. and I. Johansson, "Usecases and Benefits of end to end ECN support in PCN Domains", draft-sarker-pcn-ecn-pcn-usecases-01 (work in progress), May 2008.

[pcn-enc-compare]  Chan, K., Karagiannis, G., Moncaster, T., Menth, M., Eardley, P., and B. Briscoe, "Pre-Congestion Notification Encoding Comparison", draft-chan-pcn-encoding-comparison-03 (work in progress), February 2008.

Appendix A.  Tunnelling Constraints

The rules that govern the behaviour of the ECN field for IP-in-IP tunnels were defined in [RFC3168]. This allowed for two tunnel modes to exist. The limited functionality mode sets the outer header to Not-ECT, regardless of the value of the inner header. The full functionality mode copies the inner ECN field into the outer header if the inner header is Not-ECT or either of the two ECT codepoints. If the inner header is CE then the outer header is set to ECT(0). On decapsulation, if the CE codepoint is set on the outer header then this is copied into the inner header. Otherwise the inner header is left unchanged. The reason for blocking CE from being copied to the outer header was to prevent this from being used as a covert channel through IPSec tunnels.
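The two tunnel modes just described can be sketched as three small functions. This is a minimal sketch that follows the simplified wording of this appendix rather than the full text of [RFC3168]; the function names encap_limited(), encap_full() and decap() are hypothetical and are introduced here only for illustration.

```python
# ECN field values as two-bit integers (RFC 3168).
NOT_ECT, ECT1, ECT0, CE = 0b00, 0b01, 0b10, 0b11


def encap_limited(inner):
    """Limited functionality mode: the outer header is always Not-ECT."""
    return NOT_ECT


def encap_full(inner):
    """Full functionality mode: copy the inner ECN field to the outer
    header, except that an inner CE is replaced by ECT(0)."""
    return ECT0 if inner == CE else inner


def decap(inner, outer):
    """Decapsulation as worded in this appendix: an outer CE is copied
    into the inner header, otherwise the inner header is unchanged."""
    return CE if outer == CE else inner
```

With these rules encap_full(CE) returns ECT(0), so a packet that already carries a CE mark shows only ECT(0) in its outer header for the length of the tunnel.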
The IPSec protocol [RFC4301] changed the ECN tunnelling rule to allow IPSec tunnels to simply copy the inner header into the outer header. This was because the security community had decided the available bandwidth of the covert channel offered by ECN was too low to be a significant threat. On decapsulation the outer header is discarded and the ECN field is only copied down if it is set to CE.

Because of the possible existence of tunnels, only CE (11) can be used as a PCN marking as it is the only mark that will survive decapsulation.

There is a further issue involving tunnelling. In [RFC3168], IP-in-IP tunnels are expected to set the outer ECN field to ECT(0) if the inner ECN field is set to CE. This leads to the possibility that some packets within the PCN region that have already been marked may have that mark concealed further into the region. This is undesirable for many PCN schemes and thus standard IP-in-IP tunnels SHOULD NOT be used within a PCN region.

Authors' Addresses

Toby Moncaster
BT
B54/70, Adastral Park
Martlesham Heath
Ipswich  IP5 3RE
UK

Phone: +44 1473 648734
Email: toby.moncaster@bt.com
URI:   http://www.cs.ucl.ac.uk/staff/B.Briscoe/

Bob Briscoe
BT & UCL
B54/77, Adastral Park
Martlesham Heath
Ipswich  IP5 3RE
UK

Phone: +44 1473 645196
Email: bob.briscoe@bt.com

Michael Menth
University of Wuerzburg
room B206, Institute of Computer Science
Am Hubland
Wuerzburg  D-97074
Germany

Phone: +49 931 888 6644
Email: menth@informatik.uni-wuerzburg.de

Full Copyright Statement

Copyright (C) The IETF Trust (2008).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgments

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA). This document was produced using xml2rfc v1.32 (of http://xml.resource.org/) from a source in RFC-2629 XML format.