| < draft-briscoe-aqm-dualq-coupled-00a.txt | draft-briscoe-aqm-dualq-coupled-00.txt > | |||
|---|---|---|---|---|
| Active Queue Management (aqm) K. De Schepper | Active Queue Management (aqm) K. De Schepper | |||
| Internet-Draft Bell Labs | Internet-Draft Bell Labs | |||
| Intended status: Standards Track B. Briscoe, Ed. | Intended status: Standards Track B. Briscoe, Ed. | |||
| Expires: January 11, 2016 Independent | Expires: February 8, 2016 Independent | |||
| O. Bondarenko | O. Bondarenko | |||
| Simula Research Lab | Simula Research Lab | |||
| I. Tsang | I. Tsang | |||
| Bell Labs | Bell Labs | |||
| July 10, 2015 | August 07, 2015 | |||
| DualQ Coupled AQM for Low Latency, Low Loss and Scalable Throughput | DualQ Coupled AQM for Low Latency, Low Loss and Scalable Throughput | |||
| draft-briscoe-aqm-dualq-coupled-00 | draft-briscoe-aqm-dualq-coupled-00 | |||
| Abstract | Abstract | |||
| Data Centre TCP (DCTCP) was designed to provide predictably low | Data Centre TCP (DCTCP) was designed to provide predictably low | |||
| queuing latency, near-zero loss, and throughput scalability using | queuing latency, near-zero loss, and throughput scalability using | |||
| explicit congestion notification (ECN) and an extremely simple | explicit congestion notification (ECN) and an extremely simple | |||
| marking behaviour on switches. However, DCTCP does not co-exist with | marking behaviour on switches. However, DCTCP does not co-exist with | |||
| existing TCP traffic---throughput starves. So, until now, DCTCP | existing TCP traffic---throughput starves. So, until now, DCTCP | |||
| could only be deployed where a clean-slate environment could be | could only be deployed where a clean-slate environment could be | |||
| arranged, such as in private data centres. This specification | arranged, such as in private data centres. This specification | |||
| defines `DualQ Coupled Active Queue Management (AQM)' to allow | defines `DualQ Coupled Active Queue Management (AQM)' to allow | |||
| scalable congestion controls like DCTCP to safely co-exist with | scalable congestion controls like DCTCP to safely co-exist with | |||
| classic Internet traffic. The Coupled AQM ensures that a flow runs | classic Internet traffic. The Coupled AQM ensures that a flow runs | |||
| at about the same rate whether it uses DCTCP or TCP Reno/Cubic, but | at about the same rate whether it uses DCTCP or TCP Reno/Cubic, but | |||
| without inspecting transport layer flow identifiers. When tested in | without inspecting transport layer flow identifiers. When tested in | |||
| a residential broadband setting, DCTCP achieved sub-millisecond | a residential broadband setting, DCTCP achieved sub-millisecond | |||
| average queuing delay and zero congestion loss under a wide range of | average queuing delay and zero congestion loss under a wide range of | |||
| mixes of DCTCP and `classic' broadband Internet traffic, without | mixes of DCTCP and `Classic' broadband Internet traffic, without | |||
| compromising the performance of the classic traffic. The solution | compromising the performance of the Classic traffic. The solution | |||
| also reduces network complexity and eliminates network configuration. | also reduces network complexity and eliminates network configuration. | |||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on January 11, 2016. | This Internet-Draft will expire on February 8, 2016. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2015 IETF Trust and the persons identified as the | Copyright (c) 2015 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| skipping to change at page 2, line 31 | skipping to change at page 2, line 31 | |||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
| 1.1. Problem and Scope . . . . . . . . . . . . . . . . . . . . 2 | 1.1. Problem and Scope . . . . . . . . . . . . . . . . . . . . 2 | |||
| 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | 1.2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
| 1.3. Features . . . . . . . . . . . . . . . . . . . . . . . . 5 | 1.3. Features . . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
| 2. DualQ Coupled AQM Algorithm . . . . . . . . . . . . . . . . . 6 | 2. DualQ Coupled AQM Algorithm . . . . . . . . . . . . . . . . . 6 | |||
| 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 6 | 2.1. Coupled AQM . . . . . . . . . . . . . . . . . . . . . . . 6 | |||
| 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 7 | 2.2. Dual Queue . . . . . . . . . . . . . . . . . . . . . . . 7 | |||
| 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 7 | 2.3. Traffic Classification . . . . . . . . . . . . . . . . . 7 | |||
| 2.4. Normative Requirements . . . . . . . . . . . . . . . . . 8 | 2.4. Normative Requirements . . . . . . . . . . . . . . . . . 9 | |||
| 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 9 | 3. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 4. Security Considerations . . . . . . . . . . . . . . . . . . . 9 | 4. Security Considerations . . . . . . . . . . . . . . . . . . . 10 | |||
| 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 9 | 4.1. Overload Handling . . . . . . . . . . . . . . . . . . . . 10 | |||
| 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 10 | 5. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 6.1. Normative References . . . . . . . . . . . . . . . . . . 10 | 6. References . . . . . . . . . . . . . . . . . . . . . . . . . 11 | |||
| 6.2. Informative References . . . . . . . . . . . . . . . . . 10 | 6.1. Normative References . . . . . . . . . . . . . . . . . . 11 | |||
| Appendix A. Example DualQ Coupled Algorithm . . . . . . . . . . 12 | 6.2. Informative References . . . . . . . . . . . . . . . . . 11 | |||
| Appendix B. Guidance on Controlling Throughput Equivalence . . . 17 | Appendix A. Example DualQ Coupled Algorithm . . . . . . . . . . 14 | |||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18 | Appendix B. Guidance on Controlling Throughput Equivalence . . . 20 | |||
| Appendix C. DCTCP Safety Enhancements . . . . . . . . . . . . . 21 | ||||
| Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 | ||||
| 1. Introduction | 1. Introduction | |||
| 1.1. Problem and Scope | 1.1. Problem and Scope | |||
| Latency is becoming the critical performance factor for many (most?) | Latency is becoming the critical performance factor for many (most?) | |||
| applications on the public Internet, e.g. Web, voice, conversational | applications on the public Internet, e.g. Web, voice, conversational | |||
| video, gaming and finance apps. In the developed world, further | video, gaming and finance apps. In the developed world, further | |||
| increases in access network bit-rate offer diminishing returns, | increases in access network bit-rate offer diminishing returns, | |||
| whereas latency is still a multi-facetted problem. In the last | whereas latency is still a multi-faceted problem. In the last decade | |||
| decade or so, much has been done to reduce propagation time by | or so, much has been done to reduce propagation time by placing | |||
| placing caches or servers closer to users. However, queuing remains | caches or servers closer to users. However, queuing remains a major | |||
| a major component of latency. | component of latency. | |||
| The Diffserv architecture provides Expedited Forwarding [RFC3246], so | The Diffserv architecture provides Expedited Forwarding [RFC3246], so | |||
| that low latency traffic can jump the queue of other traffic. | that low latency traffic can jump the queue of other traffic. | |||
| However, on access links dedicated to individual sites (homes, small | However, on access links dedicated to individual sites (homes, small | |||
| enterprises or mobile devices), often all traffic at any one time | enterprises or mobile devices), often all traffic at any one time | |||
| will be latency-sensitive. Then Diffserv is of little use. Instead, | will be latency-sensitive. Then Diffserv is of little use. Instead, | |||
| we need to remove the causes of any unnecessary delay. | we need to remove the causes of any unnecessary delay. | |||
| The bufferbloat project has shown that excessively-large buffering | The bufferbloat project has shown that excessively-large buffering | |||
| (`bufferbloat') has been introducing significantly more delay than | (`bufferbloat') has been introducing significantly more delay than | |||
| skipping to change at page 3, line 49 | skipping to change at page 3, line 51 | |||
| 1. the `sawtooth' varying rate of TCP itself; | 1. the `sawtooth' varying rate of TCP itself; | |||
| 2. the smoothing delay deliberately introduced into AQMs to permit | 2. the smoothing delay deliberately introduced into AQMs to permit | |||
| bursts without triggering losses. | bursts without triggering losses. | |||
| The former causes a flow's round trip time (RTT) to vary from about 1 | The former causes a flow's round trip time (RTT) to vary from about 1 | |||
| to 2 times the base RTT between the machines in question. The latter | to 2 times the base RTT between the machines in question. The latter | |||
| delays the system's response to change by a worst-case | delays the system's response to change by a worst-case | |||
| (transcontinental) RTT, which could be hundreds of times the actual | (transcontinental) RTT, which could be hundreds of times the actual | |||
| RTT of typical traffic from localised CDNs. | RTT of typical traffic from localized CDNs. | |||
| Latency is not our only concern: | Latency is not our only concern: | |||
| 3. It was known when TCP was first developed that it would not scale | 3. It was known when TCP was first developed that it would not scale | |||
| to high bandwidth-delay products. | to high bandwidth-delay products. | |||
| Given regular broadband bit-rates over WAN distances are | Given regular broadband bit-rates over WAN distances are | |||
| already [RFC3649] beyond the scaling range of `classic' TCP Reno, | already [RFC3649] beyond the scaling range of `classic' TCP Reno, | |||
| `less unscalable' Cubic [I-D.zimmermann-tcpm-cubic] and | `less unscalable' Cubic [I-D.zimmermann-tcpm-cubic] and | |||
| Compound [I-D.sridharan-tcpm-ctcp] variants of TCP have been | Compound [I-D.sridharan-tcpm-ctcp] variants of TCP have been | |||
| successfully deployed. However, these are now approaching their | successfully deployed. However, these are now approaching their | |||
| scaling limits. Unfortunately, fully scalable TCPs such as DCTCP | scaling limits. Unfortunately, fully scalable TCPs such as DCTCP | |||
| cause `classic' TCP to starve itself, which is why they have been | cause `classic' TCP to starve itself, which is why they have been | |||
| confined to private data centres or research testbeds (until now). | confined to private data centres or research testbeds (until now). | |||
| This document specifies a `DualQ Coupled AQM' that solves the problem | This document specifies a `DualQ Coupled AQM' that solves the problem | |||
| of coexistence between DCTCP and classic flows, without having to | of coexistence between DCTCP and classic flows, without having to | |||
| inspect flow identifiers. It needs fewer operations per packet than | inspect flow identifiers. The AQM is not like flow-queuing | |||
| RED uses. Also, no network configuration is needed for a wide range | approaches [I-D.ietf-aqm-fq-codel] that classify packets by flow | |||
| of scenarios. Therefore it is believed it would be applicable and | identifier into numerous separate queues in order to isolate sparse | |||
| easy to deploy in all types of buffers; buffers in cost-reduced mass- | flows from the higher latency in the queues assigned to heavier flow. | |||
| market residential equipment; buffers in end-system stacks; buffers | In contrast, the AQM exploits the behaviour of scalable congestion | |||
| in carrier-scale equipment including remote access servers, routers, | controls like DCTCP so that every packet in every flow sharing the | |||
| firewalls and ethernet switches; buffers in network interface cards, | queue for DCTCP-like traffic can be served with very low latency. | |||
| buffers in virtualised network appliances, hypervisors, and so on. | ||||
| The AQM needs fewer operations per packet than RED uses. Also, no | ||||
| network configuration is needed for a wide range of scenarios where | ||||
| the range of RTTs is typical for the public Internet. Therefore it | ||||
| is believed the Coupled AQM would be applicable and easy to deploy in | ||||
| all types of buffers; buffers in cost-reduced mass-market residential | ||||
| equipment; buffers in end-system stacks; buffers in carrier-scale | ||||
| equipment including remote access servers, routers, firewalls and | ||||
| Ethernet switches; buffers in network interface cards, buffers in | ||||
| virtualized network appliances, hypervisors, and so on. | ||||
| The supporting paper [DCttH15] gives the full rationale for the AQM's | The supporting paper [DCttH15] gives the full rationale for the AQM's | |||
| design, both discursively and in more precise mathematical form. | design, both discursively and in more precise mathematical form. | |||
| 1.2. Terminology | 1.2. Terminology | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. In this | document are to be interpreted as described in [RFC2119]. In this | |||
| document, these words will appear with that interpretation only when | document, these words will appear with that interpretation only when | |||
| skipping to change at page 5, line 20 | skipping to change at page 5, line 34 | |||
| The AQM couples marking and/or dropping across the two queues such | The AQM couples marking and/or dropping across the two queues such | |||
| that a flow will get roughly the same throughput whichever it uses. | that a flow will get roughly the same throughput whichever it uses. | |||
| Therefore both queues can feed into the full capacity of a link and | Therefore both queues can feed into the full capacity of a link and | |||
| no rates need to be configured for the queues. The L4S queue enables | no rates need to be configured for the queues. The L4S queue enables | |||
| scalable congestion controls like DCTCP to give stunningly low and | scalable congestion controls like DCTCP to give stunningly low and | |||
| predictably low latency, without compromising the performance of | predictably low latency, without compromising the performance of | |||
| competing 'Classic' Internet traffic. Thousands of tests have been | competing 'Classic' Internet traffic. Thousands of tests have been | |||
| conducted in a typical fixed residential broadband setting. Typical | conducted in a typical fixed residential broadband setting. Typical | |||
| experiments used a base round trip delay of 7ms between the data | experiments used a base round trip delay of 7ms between the data | |||
| centre and home network, and large amounts of background traffic in | centre and home network, and large amounts of background traffic in | |||
| both queues. For all L4S traffic, the AQM kept the 99th percentile | both queues. For every L4S packet, the AQM kept the 99th percentile | |||
| of queuing delay to about 1ms, and no losses at all were introduced | of queuing delay to about 1ms, and no losses at all were introduced | |||
| by the AQM. Details of the extensive experiments will be made | by the AQM. Details of the extensive experiments will be made | |||
| available [DCttH15]. | available [DCttH15]. | |||
| Subjective testing was also conducted using a demanding panoramic | Subjective testing was also conducted using a demanding panoramic | |||
| interactive video application run over a stack with DCTCP enabled and | interactive video application run over a stack with DCTCP enabled and | |||
| deployed on the testbed. Each user could pan or zoom their own high | deployed on the testbed. Each user could pan or zoom their own high | |||
| definition (HD) sub-window of a larger video scene from a football | definition (HD) sub-window of a larger video scene from a football | |||
| match. Even though the user was also downloading large amounts of | match. Even though the user was also downloading large amounts of | |||
| L4S and Classic data, latency was so low that the picture appeared to | L4S and Classic data, latency was so low that the picture appeared to | |||
| stick to their finger on the touchpad (all the L4S data achieved the | stick to their finger on the touchpad (all the L4S data achieved the | |||
| same ultra-low latency). With an alternative AQM, the video | same ultra-low latency). With an alternative AQM, the video | |||
| noticeably lagged behind the finger gestures. | noticeably lagged behind the finger gestures. | |||
| The experiments used the implementation of DCTCP that is deployed in | Unlike Diffserv Expedited Forwarding, the L4S queue does not have to | |||
| private data centres, without any modification despite its known | be limited to a small proportion of the link capacity in order to | |||
| deficiencies. Nonetheless, certain modifications will be necessary | achieve low delay. The L4S queue can be filled with a heavy load of | |||
| before DCTCP is safe to use on the Internet, In particular it needs | capacity-seeking flows like DCTCP and still achieve low delay. The | |||
| to: | L4S queue does not rely on the presence of other traffic in the | |||
| Classic queue that can be 'overtaken'. It gives low latency to L4S | ||||
| o negotiate its altered feedback semantics, which conveys the extent | traffic whether or not there is Classic traffic, and the latency of | |||
| of ECN marking, not just its existence, and this feedback needs to | Classic traffic does not suffer when a proportion of the traffic is | |||
| be robust to loss [I-D.ietf-tcpm-accecn-reqs]; | L4S. The two queues are only necessary because DCTCP-like flows | |||
| cannot keep latency predictably low and keep utilization high if they | ||||
| o fall back to Reno or Cubic behaviour on loss; | are mixed with legacy TCP flows, | |||
| o average ECN feedback over its own RTT, not the hard-coded RTT | ||||
| suitable only for data-centres, perhaps like Relentless | ||||
| TCP [Mathis09]; | ||||
| o handle a window of less than 2 when the RTT is low, rather than | ||||
| increase the queue [TCP-sub-mss-w]. | ||||
| Other, non-essential enhancements to DCTCP can be envisaged. | The experiments used the Linux implementation of DCTCP that is | |||
| However, the focus of this specification is to get the network | deployed in private data centres, without any modification despite | |||
| service in place. Then, without any management intervention, | its known deficiencies. Nonetheless, certain modifications will be | |||
| applications can exploit it by migrating to scalable controls like | necessary before DCTCP is safe to use on the Internet, which are | |||
| DCTCP, which can then evolve _while_ their benefits are being enjoyed | recorded for now in Appendix C. However, the focus of this | |||
| by everyone on the Internet. | specification is to get the network service in place. Then, without | |||
| any management intervention, applications can exploit it by migrating | ||||
| to scalable controls like DCTCP, which can then evolve _while_ their | ||||
| benefits are being enjoyed by everyone on the Internet. | ||||
| 2. DualQ Coupled AQM Algorithm | 2. DualQ Coupled AQM Algorithm | |||
| There are two main aspects to the algorithm: | There are two main aspects to the algorithm: | |||
| o the Coupled AQM that addresses throughput equivalence between | o the Coupled AQM that addresses throughput equivalence between | |||
| Classic (e.g. Reno, Cubic) flows and L4S (e.g. DCTCP) flows | Classic (e.g. Reno, Cubic) flows and L4S (e.g. DCTCP) flows | |||
| o the Dual Queue structure that provides latency separation for L4S | o the Dual Queue structure that provides latency separation for L4S | |||
| flows to isolate them from the typically large Classic queue. | flows to isolate them from the typically large Classic queue. | |||
| 2.1. Coupled AQM | 2.1. Coupled AQM | |||
| In the 1990s, the `TCP formula' was derived for the relationship | In the 1990s, the `TCP formula' was derived for the relationship | |||
| between TCP's congestion window, cwnd, and its drop probability, p. | between TCP's congestion window, cwnd, and its drop probability, p. | |||
| To a first order approximation, cwnd of TCP Reno is inversely | To a first order approximation, cwnd of TCP Reno is inversely | |||
| proportional to the square root of p. In our supporting paper | proportional to the square root of p. TCP Cubic implements a Reno- | |||
| [DCttH15], we derive the equivalent relationship for DCTCP, for which | compatibility mode, which is the only relevant mode for typical RTTs | |||
| cwnd is inversely proportional to p, where in this case p is the ECN | under 20ms, while the throughput of a single flow is less than about | |||
| marking probability. | 500Mb/s. Therefore we can assume that Cubic traffic behaves similar | |||
| to Reno (but with a slightly different constant of proportionality), | ||||
| and we shall use the term 'Classic' for the collection of Reno and | ||||
| Cubic in Reno mode. | ||||
| TCP Cubic implements a Reno-compatibility mode, which is the only | In our supporting paper [DCttH15], we derive the equivalent rate | |||
| relevant mode for typical RTTs under 20ms, while the throughput of a | equation for DCTCP, for which cwnd is inversely proportional to p | |||
| single flow is less than about 500Mb/s. Therefore we can assume that | (not the square root), where in this case p is the ECN marking | |||
| Cubic traffic behaves similar to Reno (but with a slightly different | probability. DCTCP is not the only congestion control that behaves | |||
| constant of proportionality). | like this, so we use the term 'L4S' traffic for all similar | |||
| behaviour. | ||||
| Therefore, in order to make a DCTCP flow run at roughly the same rate | In order to make a DCTCP flow run at roughly the same rate as a Reno | |||
| as a Reno or Cubic TCP flow (all other factors being equal), the drop | TCP flow (all other factors being equal), we make the drop | |||
| probability of Reno and Cubic has to be proportional to the square of | probability for Classic traffic, p_C distinct from the marking | |||
| the ECN marking probability applied to DCTCP. It turns out there is | probability for L4S traffic, p_L (in contrast to RFC3168 which | |||
| a really simple way to implement the square of a probability - by | requires them to be the same). We make the Classic drop probability | |||
| testing the queue against two random numbers not one. This is the | p_C proportional to the square of the L4S marking probability p_L. | |||
| approach adopted in Appendix A. | This is because we need to make the Reno flow rate equal the DCTCP | |||
| flow rate, so we have to square the square root of p_C in the Reno | ||||
| rate equation to make it the same as the straight p_L in the DCTCP | ||||
| rate equation. | ||||
| Stating this as a formula, the relation between L4S marking | There is a really simple way to implement the square of a probability | |||
| probability, p_L and Classic drop probability, p_C needs to take the | - by testing the queue against two random numbers not one. This is | |||
| the approach adopted in Appendix A. | ||||
| Stating this as a formula, the relation between Classic drop | ||||
| probability, p_C, and L4S marking probability, p_L needs to take the | ||||
| form: | form: | |||
| p_C = ( p_L / 2^k )^2 (1) | p_C = ( p_L / 2^k )^2 (1) | |||
| where 2^k is the constant of proportionality, which is expressed as a | where 2^k is the constant of proportionality, which is expressed as a | |||
| power of 2 so that implementations can avoid costly division by | power of 2 so that implementations can avoid costly division by | |||
| shifting p_L by k bits to the right. | shifting p_L by k bits to the right. | |||
| 2.2. Dual Queue | 2.2. Dual Queue | |||
| skipping to change at page 7, line 22 | skipping to change at page 7, line 43 | |||
| for L4S traffic, and it is scheduled with strict priority. | for L4S traffic, and it is scheduled with strict priority. | |||
| Nonetheless, coupled marking ensures that giving priority to L4S | Nonetheless, coupled marking ensures that giving priority to L4S | |||
| traffic still leaves the right amount of spare scheduling time for | traffic still leaves the right amount of spare scheduling time for | |||
| Classic flows to each get equivalent throughput to DCTCP flows (all | Classic flows to each get equivalent throughput to DCTCP flows (all | |||
| other factors such as RTT being equal). The algorithm achieves this | other factors such as RTT being equal). The algorithm achieves this | |||
| without having to inspect flow identifiers. | without having to inspect flow identifiers. | |||
| 2.3. Traffic Classification | 2.3. Traffic Classification | |||
| Both the Coupled AQM and DualQ mechanisms need an identifier to | Both the Coupled AQM and DualQ mechanisms need an identifier to | |||
| distinguish L4S and C packets, which will need to be standaridsed. | distinguish L4S and C packets, which will need to be standardized. | |||
| In our tests we used a cleared ECN field to indicate C packets and | In our tests we used a cleared ECN field to indicate C packets and | |||
| L4S otherwise. The ECN specification [RFC3168] currently defines a | L4S otherwise. The ECN specification [RFC3168] currently defines a | |||
| mark as equivalent to a drop. However, it says | mark as equivalent to a drop. However, it says | |||
| "An environment where all end nodes were ECN-Capable could allow | "An environment where all end nodes were ECN-Capable could allow | |||
| new criteria to be developed for setting the CE codepoint, and new | new criteria to be developed for setting the CE codepoint, and new | |||
| congestion control mechanisms for end-node reaction to CE packets. | congestion control mechanisms for end-node reaction to CE packets. | |||
| However, this is a research issue, and as such is not addressed in | However, this is a research issue, and as such is not addressed in | |||
| this document." | this document." | |||
| and [RFC4774]} gives valid ways to alter ECN's semantics without | and [RFC4774]} gives valid ways to alter ECN's semantics without | |||
| harming interoperability. | harming interoperability. | |||
| Since publication in 2001,deployment of RFC3168 ECN has been dogged | Since publication in 2001,deployment of RFC3168 ECN has been dogged | |||
| by bugs and misunderstandings. In recent years RFC3168 ECN has been | by bugs and misunderstandings. In recent years RFC3168 ECN has been | |||
| deployed quite successfully on servers [ECN_Deploy], and until | deployed quite successfully on servers [ECN_Deploy], and until | |||
| recently it was deployed but not enabled on a fair proportion of user | recently it was deployed but not enabled on a fair proportion of user | |||
| machines. Recently one major developer developer of client devices | machines. Recently one major developer of client devices has | |||
| has finally widely deployed ECN by remote configuration, with it | configured ECN on-by-default in its beta releases. However although | |||
| configured on-by-default. However although some network equipment | some network equipment vendors and developers have implemented ECN, | |||
| vendors and developers have implemented ECN, there is little evidence | there is little evidence that any public network operator is | |||
| that any public network operator is considering or has deployed ECN- | considering or has deployed ECN-capable AQMs on network equipment | |||
| capable AQMs on network equipment yet. | yet. | |||
| A number of private data centre operators have deployed ECN, but not | A number of private data centre operators have deployed ECN, but not | |||
| RFC3168 ECN. Instead, they are using DCTCP to get predictable ultra- | RFC3168 ECN. Instead, they are using DCTCP to get predictable ultra- | |||
| low latency, and they are either ensuring that there is no non-DCTCP | low latency, and they are either ensuring that there is no non-DCTCP | |||
| traffic [I-D.bensley-tcpm-dctcp], or they are segregating such | traffic [I-D.bensley-tcpm-dctcp], or they are segregating such | |||
| traffic from DCTCP using Diffserv [DCTCP_Pitfalls]. The RFC3168 | traffic from DCTCP using Diffserv [DCTCP_Pitfalls]. The RFC3168 | |||
| approach merely prevents drop, whereas the DCTCP approach provides | approach merely prevents drop, whereas the DCTCP approach provides | |||
| scalable throughput and ultra-low latency as well as avoiding drop. | scalable throughput and ultra-low latency as well as avoiding drop. | |||
| Consequently it has been questioned whether the RFC3168 approach | Consequently it has been questioned whether the RFC3168 approach | |||
| offers enough performance improvement for an operator to countenance | offers enough performance improvement for an operator to countenance | |||
| skipping to change at page 8, line 50 | skipping to change at page 9, line 24 | |||
| necessary to agree on the meaning of an ECN marking on L4S traffic, | necessary to agree on the meaning of an ECN marking on L4S traffic, | |||
| relative to a drop of Classic traffic. In order to prevent | relative to a drop of Classic traffic. In order to prevent | |||
| starvation of Classic traffic by scalable L4S traffic (e.g. DCTCP) | starvation of Classic traffic by scalable L4S traffic (e.g. DCTCP) | |||
| the drop probability of Classic traffic MUST be proportional to the | the drop probability of Classic traffic MUST be proportional to the | |||
| square of the marking probability of L4S traffic, In other words, the | square of the marking probability of L4S traffic, In other words, the | |||
| power to which p_L is raised in Eqn. (1) MUST be 2. | power to which p_L is raised in Eqn. (1) MUST be 2. | |||
| The constant of proportionality, k, in Eqn (1) determines the | The constant of proportionality, k, in Eqn (1) determines the | |||
| relative flow rates of Classic and L4S flows when the AQM concerned | relative flow rates of Classic and L4S flows when the AQM concerned | |||
| is the bottleneck (all other factors being equal). k does not have to | is the bottleneck (all other factors being equal). k does not have to | |||
| be standardised because differences do not prevent interoperability. | be standardized because differences do not prevent interoperability. | |||
| However, k has to take some value, and each operator can make that | However, k has to take some value, and each operator can make that | |||
| choice. | choice. | |||
| A value of k=0 is RECOMMENDED as the default for public Internet | A value of k=0 is RECOMMENDED as the default for public Internet | |||
| access networks, assuming the DCTCP algorithm remains similar to that | access networks, assuming the DCTCP algorithm remains similar to that | |||
| in [I-D.bensley-tcpm-dctcp]. Nonetheless choice of k is a matter of | in [I-D.bensley-tcpm-dctcp]. Nonetheless choice of k is a matter of | |||
| operator policy, and operators MAY choose a different value using | operator policy, and operators MAY choose a different value using | |||
| Table 1 and the guidelines in Appendix B. | Table 1 and the guidelines in Appendix B. | |||
| Typically, access network operators isolate customers from each other | Typically, access network operators isolate customers from each other | |||
| with some form of layer-2 multiplexing (TDM in DOCSIS, CDMA in 3G) or | with some form of layer-2 multiplexing (TDM in DOCSIS, CDMA in 3G) or | |||
| L3 scheduling (WRR in broadband), rather than relying on TCP to share | L3 scheduling (WRR in broadband), rather than relying on TCP to share | |||
| capacity between customers [RFC0970]}. In such cases, the choice of k | capacity between customers [RFC0970]. In such cases, the choice of k | |||
| will solely affect relative flow rates within the customer's access | will solely affect relative flow rates within the customer's access | |||
| capacity, not between customers. Also, k would not affect rates of | capacity, not between customers. Also, k would not affect rates of | |||
| small flows, nor long flows at any times when they are all Classic or | small flows, nor long flows at any times when they are all Classic or | |||
| all L4S. | all L4S. | |||
| An example DualQ Coupled AQM algorithm is given in Appendix A. The | An example DualQ Coupled AQM algorithm is given in Appendix A. | |||
| AQM in each queue is based on an AQM called Curvy RED, which is | Marking and dropping in each queue is based on an AQM called Curvy | |||
| intended to improve on RED, PIE and CoDel. We have found that Curvy | RED, which is intended to improve on RED, PIE and CoDel. We have | |||
| RED offers good performance, requires less operations per packet than | found that Curvy RED offers good performance, requires less | |||
| RED and is insensitive to configuration. Nonetheless, it would be | operations per packet than RED and is insensitive to configuration. | |||
| possible to control each queue with an alternative AQM, as long as | Nonetheless, it would be possible to control each queue with an | |||
| the above normative requirements (those expressed in capitals) are | alternative AQM, as long as the above normative requirements (those | |||
| observed, which are intended to be independent of the specific AQM. | expressed in capitals) are observed, which are intended to be | |||
| independent of the specific AQM. | ||||
| {ToDo: Add management and monitoring requirements} | ||||
| 3. IANA Considerations | 3. IANA Considerations | |||
| This specification contains no IANA considerations. | This specification contains no IANA considerations. | |||
| 4. Security Considerations | 4. Security Considerations | |||
| 4.1. Overload Handling | ||||
| Where the interests of users or flows might conflict, it could be | Where the interests of users or flows might conflict, it could be | |||
| necessary to police traffic to isolate any harm to performance. This | necessary to police traffic to isolate any harm to performance. This | |||
| is a policy issue that needs to be seperable from a basic AQM, but | is a policy issue that needs to be separable from a basic AQM, but | |||
| the scheme does need to handle overload. A tradeoff needs to be made | the scheme does need to handle overload. A trade-off needs to be | |||
| between complexity and the risk of either class harming the other. | made between complexity and the risk of either class harming the | |||
| It is an operator policy to define what must happen if the service | other. It is an operator policy to define what must happen if the | |||
| time of the classic queue becomes too big. Actions can include delay | service time of the classic queue becomes too great. In the | |||
| based scheduling, common drop, etc. {ToDo: Expand this discussion in | following subsections three optional non-exclusive overload | |||
| a future draft, with examples.} | protections are defined. Their objective is for the overload | |||
| behaviour of the DualQ AQM to be similar to a single queue AQM. | ||||
| Other overload protections can be envisaged: | ||||
| Minimum throughput service: By replacing the priority scheduler | ||||
| with a weighted round robin scheduler, a minimum throughput | ||||
| service can be guaranteed for Classic traffic. Typically the | ||||
| scheduling weight of the Classic queue will be small (e.g. 5%) to | ||||
| avoid interference with the coupling but big enough to avoid | ||||
| complete starvation of Classic traffic. | ||||
| Drop on overload: On severe overload, e.g. due to non responsive | ||||
| traffic, queues will typically overflow and packet drop will be | ||||
| unavoidable when the queues reach their limits. The drop-limit of | ||||
| each queue should be configured by specifying the maximum | ||||
| supported load and determining the expected maximum size of each | ||||
| queue when that load is separately applied to each queue. The | ||||
| Classic queue limit will typically be larger than the L4S queue | ||||
| limit. Overflow of one traffic type will automatically result in | ||||
| drop in its respective queue. Both traffic types will get a high | ||||
| congestion signal, due to the coupled marking, which will result | ||||
| in similar starvation of responsive traffic in both queues. Thus, | ||||
| the behaviour will be like a single queue AQM. To further improve | ||||
| the arrival fairness of a single queue an extra overall AQM limit | ||||
| can be applied, which is a limit to the sum of both queues. To be | ||||
| effective, it should be configured to be less than the sum of the | ||||
| limits of both queues, but greater than the maximum individual | ||||
| queue limit. It ensures that the drop probability of unresponsive | ||||
| traffic will be independent of its traffic type. | ||||
| Delay on overload: To control milder overload of responsive traffic, | ||||
| particularly when close to the maximum congestion signal, delay | ||||
| can be used as an alternative congestion control mechanism. The | ||||
| Dual Queue Coupled AQM can be made to behave like a single FIFO | ||||
| queue with differentiated service times by replacing the priority | ||||
| scheduler with a very simple "biased longest sojourn time first | ||||
| scheduler". The bias is defined as a maximum sojourn time | ||||
| difference (T_m) between the Classic and L4S packets. The | ||||
| scheduler adds T_m to the sojourn time of the next L4S packet, | ||||
| before comparing it with the timestamp of the next Classic packet, | ||||
| then it selects the packet with the greater adjusted sojourn time. | ||||
| This time shifted FIFO queue behaves just like a single FIFO queue | ||||
| under moderate and high overload. | ||||
| 5. Acknowledgements | 5. Acknowledgements | |||
| Thanks to Anil Agarwal for detailed review comments and suggestions | ||||
| on how to make our explanation clearer. | ||||
| The authors' contributions are part-funded by the European Community | The authors' contributions are part-funded by the European Community | |||
| under its Seventh Framework Programme through the Reducing Internet | under its Seventh Framework Programme through the Reducing Internet | |||
| Transport Latency (RITE) project (ICT-317700). The views expressed | Transport Latency (RITE) project (ICT-317700). The views expressed | |||
| here are solely those of the authors. | here are solely those of the authors. | |||
| 6. References | 6. References | |||
| 6.1. Normative References | 6.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, | |||
| DOI 10.17487/RFC2119, March 1997, | ||||
| <http://www.rfc-editor.org/info/rfc2119>. | ||||
| 6.2. Informative References | 6.2. Informative References | |||
| [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | [ARED01] Floyd, S., Gummadi, R., and S. Shenker, "Adaptive RED: An | |||
| Algorithm for Increasing the Robustness of RED's Active | Algorithm for Increasing the Robustness of RED's Active | |||
| Queue Management", ACIRI Technical Report , August 2001, | Queue Management", ACIRI Technical Report , August 2001, | |||
| <http://www.icir.org/floyd/red.html>. | <http://www.icir.org/floyd/red.html>. | |||
| [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | [CoDel] Nichols, K. and V. Jacobson, "Controlling Queue Delay", | |||
| ACM Queue 10(5), May 2012, | ACM Queue 10(5), May 2012, | |||
| <http://queue.acm.org/issuedetail.cfm?issue=2208917>. | <http://queue.acm.org/issuedetail.cfm?issue=2208917>. | |||
| [CRED_Insights] | [CRED_Insights] | |||
| Briscoe, B., "Insights from Curvy RED (Random Early | Briscoe, B., "Insights from Curvy RED (Random Early | |||
| Detection)", BT Technical Report TR-TUB8-2015-003, May | Detection)", BT Technical Report TR-TUB8-2015-003, July | |||
| 2015, | 2015, | |||
| <http://www.bobbriscoe.net/projects/latency/credi_tr.pdf>. | <http://www.bobbriscoe.net/projects/latency/credi_tr.pdf>. | |||
| [DCTCP_Pitfalls] | [DCTCP_Pitfalls] | |||
| Judd, G., "Attaining the Promise and Avoiding the Pitfalls | Judd, G., "Attaining the Promise and Avoiding the Pitfalls | |||
| of TCP in the Datacenter", 12th USENIX Symposium on | of TCP in the Datacenter", 12th USENIX Symposium on | |||
| Networked Systems Design and Implementation (NSDI 15) | Networked Systems Design and Implementation (NSDI 15) | |||
| 145--157, May 2015, | 145--157, May 2015, | |||
| <http://blogs.usenix.org/conference/nsdi15/technical- | <http://blogs.usenix.org/conference/nsdi15/technical- | |||
| sessions/presentation/judd>. | sessions/presentation/judd>. | |||
| skipping to change at page 11, line 9 | skipping to change at page 12, line 37 | |||
| Trammell, B., Kuehlewind, M., Boppart, D., Learmonth, I., | Trammell, B., Kuehlewind, M., Boppart, D., Learmonth, I., | |||
| Fairhurst, G., and R. Scheffenegger, "Enabling Internet- | Fairhurst, G., and R. Scheffenegger, "Enabling Internet- | |||
| Wide Deployment of Explicit Congestion Notification", Proc | Wide Deployment of Explicit Congestion Notification", Proc | |||
| Passive & Active Measurement (PAM'15) Conference , 2015, | Passive & Active Measurement (PAM'15) Conference , 2015, | |||
| <http://ecn.ethz.ch/ecn-pam15.pdf>. | <http://ecn.ethz.ch/ecn-pam15.pdf>. | |||
| [I-D.bensley-tcpm-dctcp] | [I-D.bensley-tcpm-dctcp] | |||
| Bensley, S., Eggert, L., Thaler, D., Balasubramanian, P., | Bensley, S., Eggert, L., Thaler, D., Balasubramanian, P., | |||
| and G. Judd, "Microsoft's Datacenter TCP (DCTCP): TCP | and G. Judd, "Microsoft's Datacenter TCP (DCTCP): TCP | |||
| Congestion Control for Datacenters", draft-bensley-tcpm- | Congestion Control for Datacenters", draft-bensley-tcpm- | |||
| dctcp-04 (work in progress), July 2015. | dctcp-05 (work in progress), July 2015. | |||
| [I-D.ietf-aqm-fq-codel] | [I-D.ietf-aqm-fq-codel] | |||
| Hoeiland-Joergensen, T., McKenney, P., | Hoeiland-Joergensen, T., McKenney, P., | |||
| dave.taht@gmail.com, d., Gettys, J., and E. Dumazet, | dave.taht@gmail.com, d., Gettys, J., and E. Dumazet, | |||
| "FlowQueue-Codel", draft-ietf-aqm-fq-codel-01 (work in | "FlowQueue-Codel", draft-ietf-aqm-fq-codel-01 (work in | |||
| progress), July 2015. | progress), July 2015. | |||
| [I-D.ietf-aqm-pie] | [I-D.ietf-aqm-pie] | |||
| Pan, R., Natarajan, P., Baker, F., and G. White, "PIE: A | Pan, R., Natarajan, P., Baker, F., and G. White, "PIE: A | |||
| Lightweight Control Scheme To Address the Bufferbloat | Lightweight Control Scheme To Address the Bufferbloat | |||
| skipping to change at page 11, line 46 | skipping to change at page 13, line 28 | |||
| Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | Rhee, I., Xu, L., Ha, S., Zimmermann, A., Eggert, L., and | |||
| R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | R. Scheffenegger, "CUBIC for Fast Long-Distance Networks", | |||
| draft-zimmermann-tcpm-cubic-01 (work in progress), April | draft-zimmermann-tcpm-cubic-01 (work in progress), April | |||
| 2015. | 2015. | |||
| [Mathis09] | [Mathis09] | |||
| Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , | Mathis, M., "Relentless Congestion Control", PFLDNeT'09 , | |||
| May 2009, <http://www.hpcc.jp/pfldnet2009/ | May 2009, <http://www.hpcc.jp/pfldnet2009/ | |||
| Program_files/1569198525.pdf>. | Program_files/1569198525.pdf>. | |||
| [RFC0970] Nagle, J., "On packet switches with infinite storage", RFC | [RFC0970] Nagle, J., "On Packet Switches With Infinite Storage", | |||
| 970, December 1985. | RFC 970, DOI 10.17487/RFC0970, December 1985, | |||
| <http://www.rfc-editor.org/info/rfc970>. | ||||
| [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
| S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | |||
| Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
| S., Wroclawski, J., and L. Zhang, "Recommendations on | S., Wroclawski, J., and L. Zhang, "Recommendations on | |||
| Queue Management and Congestion Avoidance in the | Queue Management and Congestion Avoidance in the | |||
| Internet", RFC 2309, April 1998. | Internet", RFC 2309, DOI 10.17487/RFC2309, April 1998, | |||
| <http://www.rfc-editor.org/info/rfc2309>. | ||||
| [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | [RFC3168] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition | |||
| of Explicit Congestion Notification (ECN) to IP", RFC | of Explicit Congestion Notification (ECN) to IP", | |||
| 3168, September 2001. | RFC 3168, DOI 10.17487/RFC3168, September 2001, | |||
| <http://www.rfc-editor.org/info/rfc3168>. | ||||
| [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | [RFC3246] Davie, B., Charny, A., Bennet, J., Benson, K., Le Boudec, | |||
| J., Courtney, W., Davari, S., Firoiu, V., and D. | J., Courtney, W., Davari, S., Firoiu, V., and D. | |||
| Stiliadis, "An Expedited Forwarding PHB (Per-Hop | Stiliadis, "An Expedited Forwarding PHB (Per-Hop | |||
| Behavior)", RFC 3246, March 2002. | Behavior)", RFC 3246, DOI 10.17487/RFC3246, March 2002, | |||
| <http://www.rfc-editor.org/info/rfc3246>. | ||||
| [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | [RFC3649] Floyd, S., "HighSpeed TCP for Large Congestion Windows", | |||
| RFC 3649, December 2003. | RFC 3649, DOI 10.17487/RFC3649, December 2003, | |||
| <http://www.rfc-editor.org/info/rfc3649>. | ||||
| [RFC4774] Floyd, S., "Specifying Alternate Semantics for the | [RFC4774] Floyd, S., "Specifying Alternate Semantics for the | |||
| Explicit Congestion Notification (ECN) Field", BCP 124, | Explicit Congestion Notification (ECN) Field", BCP 124, | |||
| RFC 4774, November 2006. | RFC 4774, DOI 10.17487/RFC4774, November 2006, | |||
| <http://www.rfc-editor.org/info/rfc4774>. | ||||
| [TCP-sub-mss-w] | [TCP-sub-mss-w] | |||
| Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion | Briscoe, B. and K. De Schepper, "Scaling TCP's Congestion | |||
| Window for Small Round Trip Times", BT Technical Report | Window for Small Round Trip Times", BT Technical Report | |||
| TR-TUB8-2015-002, May 2015, | TR-TUB8-2015-002, May 2015, | |||
| <http://www.bobbriscoe.net/projects/latency/ | <http://www.bobbriscoe.net/projects/latency/ | |||
| sub-mss-w.pdf>. | sub-mss-w.pdf>. | |||
| Appendix A. Example DualQ Coupled Algorithm | Appendix A. Example DualQ Coupled Algorithm | |||
| The pseudocode below gives the DualQ Coupled AQM algorithm we used in | As a concrete example, the pseudocode below gives the DualQ Coupled | |||
| testing, as a concrete example. | AQM algorithm we used in testing. Although we designed the AQM to be | |||
| efficient in integer arithmetic, to aid understanding it is first | ||||
| given using real-number arithmetic. Then, one possible optimization | ||||
| for integer arithmetic is given, also in pseudocode. To aid | ||||
| comparison, the line numbers are kept in step between the two by | ||||
| using letter suffixes where the longer code needs extra lines. | ||||
| 1: dualq_dequeue(lq, cq) { % Couples L4S & Classic queues, lq & cq | 1: dualq_dequeue(lq, cq) { % Couples L4S & Classic queues, lq & cq | |||
| 2: if ( lq.dequeue(pkt) ) { | 2: if ( lq.dequeue(pkt) ) { | |||
| 3: if ( (lq.len() > T) || ((cq.time() << S_L) > maxrand(U)) ) { | 3a: p_L = cq.sec() / 2^S_L | |||
| 3b: if ( lq.byt() > T ) | ||||
| 3c: mark(pkt) | ||||
| 3d: elif ( p_L > maxrand(U) ) | ||||
| 4: mark(pkt) | 4: mark(pkt) | |||
| 5: } | 5: return(pkt) % return the packet and stop here | |||
| 6: return(pkt) % return the packet and stop here | 6: } | |||
| 7: } | 7: while ( cq.dequeue(pkt) ) { | |||
| 8: while ( cq.dequeue(pkt) ) { | 8a: alpha = 2^(-f_C) | |||
| 9: Q_C += (pkt.time() - Q_C) >> f_C % Classic Q EWMA | 8b: Q_C = alpha * pkt.sec() + (1-alpha)* Q_C % Classic Q EWMA | |||
| 10: if ( (Q_C << S_C) > maxrand(2*U) ) { | 9a: sqrt_p_C = Q_C / 2^S_C | |||
| 11: drop(pkt) % Squared drop, redo loop | 9b: if ( sqrt_p_C > maxrand(2*U) ) | |||
| 12: } else { | 10: drop(pkt) % Squared drop, redo loop | |||
| 13: return(pkt) % return the packet and stop here | 11: else | |||
| 14: } | 12: return(pkt) % return the packet and stop here | |||
| 13: } | ||||
| 14: return(NULL) % no packet to dequeue | ||||
| 15: } | 15: } | |||
| 16: } | ||||
| 17: maxrand(u) { % return the max of u random numbers | 16: maxrand(u) { % return the max of u random numbers | |||
| 18: maxr=0 | 17: maxr=0 | |||
| 19: while (u-- > 0) { | 18: while (u-- > 0) | |||
| 20: maxr = max(maxr, rand()) | 19: maxr = max(maxr, rand()) % 0 <= rand() < 1 | |||
| 20: return(maxr) | ||||
| 21: } | 21: } | |||
| 22: return(maxr) | ||||
| 23: } | ||||
| Figure 1: Example Dequeue Algorithm for Coupled DualQ AQM | Figure 1: Example Dequeue Pseudocode for Coupled DualQ AQM | |||
| Packet classification code is not shown, as it is no different from | Packet classification code is not shown, as it is no different from | |||
| regular packet classification. Potential classification schemes are | regular packet classification. Potential classification schemes are | |||
| discussed in Section 2. Overload protection code will be included in | discussed in Section 2. Overload protection code will be included in | |||
| a future draft {ToDo}. | a future draft {ToDo}. | |||
| At the outer level, the structure of dualQCoupled() implements strict | At the outer level, the structure of dualq_dequeue() implements | |||
| priority scheduling. The code is written assuming the AQM is applied | strict priority scheduling. The code is written assuming the AQM is | |||
| on dequeue (Note 1) . Every time dualQCoupled() is called, the if- | applied on dequeue (Note 1) . Every time dualq_dequeue() is called, | |||
| block in lines 2-7 determines whether there is an L4S packet to | the if-block in lines 2-6 determines whether there is an L4S packet | |||
| dequeue by calling lq.dequeue(pkt), and otherwise the while-block in | to dequeue by calling lq.dequeue(pkt), and otherwise the while-block | |||
| lines 8-15 determines whether there is a Classic packet to dequeue, | in lines 7-13 determines whether there is a Classic packet to | |||
| by calling cq.dequeue(pkt). (Note 2) | dequeue, by calling cq.dequeue(pkt). (Note 2) | |||
| In the lower priority Classic queue, a while loop is used so that, if | In the lower priority Classic queue, a while loop is used so that, if | |||
| the AQM determines that a classic packet should be dropped, it | the AQM determines that a classic packet should be dropped, it | |||
| continues to test for classic packets deciding whether to drop each | continues to test for classic packets deciding whether to drop each | |||
| until it actually forwards one. Thus, every call to dualQCoupled() | until it actually forwards one. Thus, every call to dualq_dequeue() | |||
| returns one packet if at least one is present in either queue, | returns one packet if at least one is present in either queue, | |||
| otherwise it returns none. (Note 3) | otherwise it returns NULL at line 14. (Note 3) | |||
| Within each queue, the decision whether to drop or mark is taken as | Within each queue, the decision whether to drop or mark is taken as | |||
| follows (to simplify the explanation, for now it is assumed that | follows (to simplify the explanation, it is assumed that U=1): | |||
| U=1): | ||||
| L4S: If the test at line 2 determines there is an L4S packet to | L4S: If the test at line 2 determines there is an L4S packet to | |||
| dequeue, the test at line 3 determines whether to mark it. It is | dequeue, the tests at lines 3a and 3c determine whether to mark | |||
| actually the logical OR of two tests. The first is a simple test | it. The first is a simple test of whether the L4S queue (lq.byt() | |||
| of whether the L4S queue (lq.len() [bytes]) is greater than a | in bytes) is greater than a step threshold T in bytes (Note 4). | |||
| simple step threshold T [bytes]. The second test is similar to | The second test is similar to the random ECN marking in RED, but | |||
| the random ECN marking in RED, but with the following differences: | with the following differences: i) the marking function does not | |||
| i) the marking function does not start with a plateau of zero | start with a plateau of zero marking until a minimum threshold, | |||
| marking until a minimum threshold, rather the marking probability | rather the marking probability starts to increase as soon as the | |||
| starts to increase as soon as the queue is positive; ii) marking | queue is positive; ii) marking depends on queuing time, not bytes, | |||
| of the L4S queue does not depend on itself, it depends on the | in order to scale for any link rate without being reconfigured; | |||
| queuing time of the _other_ (Classic) queue, cq.time(); iii) | iii) marking of the L4S queue does not depend on itself, it | |||
| marking depends on the instantaneous queuing time (of the other | depends on the queuing time of the _other_ (Classic) queue, where | |||
| queue), not a smoothed average; iv) the queue is compared with the | cq.sec() is the queuing time of the packet at the head of the | |||
| maximum of U random numbers (but if U=1, this is the same as the | Classic queue (zero if empty); iv) marking depends on the | |||
| single random number used in RED). | instantaneous queuing time (of the other queue), not a smoothed | |||
| average; v) the queue is compared with the maximum of U random | ||||
| numbers (but if U=1, this is the same as the single random number | ||||
| used in RED). | ||||
| Specifically, the queueing time qc.time() is multipled by the L4S | Specifically, in line 3a the marking probability p_L is set to the | |||
| slope parameter S_L that determines how steeply marking | Classic queueing time qc.sec() in seconds divided by the L4S | |||
| probability rises with queuing time. The slope parameter S_L is | scaling parameter 2^S_L, which represents the queuing time (in | |||
| expressed as a power of 2, so that multiplication can be | seconds) at which marking probability would hit 100%. Then in line | |||
| implemented as a left bit-shift (<<). Then (if U=1) the result is | 3d (if U=1) the result is compared with a uniformly distributed | |||
| compared with a uniformly distributed random number, which ensures | random number between 0 and 1, which ensures that marking | |||
| that marking probability will linearly increase with queueing | probability will linearly increase with queueing time. The | |||
| time. | scaling parameter is expressed as a power of 2 so that division | |||
| can be implemented as a right bit-shift (>>) in line 3 of the | ||||
| integer variant of the pseudocode (Figure 2). | ||||
| Classic: If the test at line 8 determines that there is at least one | Classic: If the test at line 7 determines that there is at least one | |||
| packet to dequeue from the Classic queue, the test at line 10 | Classic packet to dequeue, the test at line 9b determines whether | |||
| determines whether to drop it. But before that, line 9 updates | to drop it. But before that, line 8b updates Q_C, which is an | |||
| Q_C, which is an exponentially weighted moving average (Note 4) of | exponentially weighted moving average (Note 5) of the queuing time | |||
| the queuing time in the Classic queue, pkt.time() is the | in the Classic queue, where pkt.sec() is the instantaneous | |||
| instantaneous Classic queueing time and f_C is the EWMA constant | queueing time of the current Classic packet and alpha is the EWMA | |||
| for the classic queue, represented as an integer power of 2, so | constant for the classic queue. In line 8a, alpha is represented | |||
| that the division needed to weight the moving average can be | as an integer power of 2, so that in line 8 of the integer code | |||
| implemented by a right bit-shift (>>). | the division needed to weight the moving average can be | |||
| implemented by a right bit-shift (>> f_C). | ||||
| Line 10 implements the drop function. Drop is proportional to the | Lines 9a and 9b implement the drop function. In line 9a the | |||
| square (assuming U=1) of the averaged queueing time. First the | averaged queuing time Q_C is divided by the Classic scaling | |||
| averaged queuing time Q_C is multiplied by the Classic slope | parameter 2^S_C, in the same way that queuing time was scaled for | |||
| parameter S_C that determines how steeply drop probability rises | L4S marking. This scaled queuing time is given the variable name | |||
| with queuing time. Again, the slope parameter S_C is expressed as | sqrt_p_C because it will be squared to compute Classic drop | |||
| a power of 2, so that multiplication can be implemented as a left | probability, so before it is squared it is effectively the square | |||
| bit-shift (<<). The scaled queuing time is then compared with the | root of the drop probability. The squaring is done by comparing | |||
| maximum of two random numbers (assuming U=1). Comparing it with | it with the maximum out of two random numbers (assuming U=1). | |||
| the maximum of two, is the same as the logical `AND' of two tests, | Comparing it with the maximum out of two is the same as the | |||
| which ensures drop probability rises with the square of queuing | logical `AND' of two tests, which ensures drop probability rises | |||
| time. (Note 5) | with the square of queuing time (Note 6). Again, the scaling | |||
| parameter is expressed as a power of 2 so that division can be | ||||
| implemented as a right bit-shift in line 9 of the integer | ||||
| pseudocode. | ||||
| The marking/dropping functions in each queue (lines 3 & 10) are a new | The marking/dropping functions in each queue (lines 3 & 9) are two | |||
| generalisation of RED called Curvy RED, motivated as follows. When | cases of a new generalization of RED called Curvy RED, motivated as | |||
| we compared the performance of our AQM with fq_CoDel and PIE, we came | follows. When we compared the performance of our AQM with fq_CoDel | |||
| to the conclusion that their goal of holding queuing delay to a hard | and PIE, we came to the conclusion that their goal of holding queuing | |||
| target is misguided [CRED_Insights]. As the number of flows | delay to a fixed target is misguided [CRED_Insights]. As the number | |||
| increases, if the AQM does not allow TCP to increase queuing delay, | of flows increases, if the AQM does not allow TCP to increase queuing | |||
| it has to introduce unusually high levels of loss. Then loss rather | delay, it has to introduce abnormally high levels of loss. Then loss | |||
| than queuing becomes the dominant cause of delay for short flows, due | rather than queuing becomes the dominant cause of delay for short | |||
| to timeouts and tail losses. | flows, due to timeouts and tail losses. | |||
| Curvy RED constrains delay with a softened target that allows some | Curvy RED constrains delay with a softened target that allows some | |||
| increase in delay as load increases. This is achieved by increasing | increase in delay as load increases. This is achieved by increasing | |||
| drop probability on a convex curve relative to queue growth (a square | drop probability on a convex curve relative to queue growth (the | |||
| curve in the Classic queue, if U=1). Like RED, the curve hugs the | square curve in the Classic queue, if U=1). Like RED, the curve hugs | |||
| zero axis while the queue is shallow. Then, as load increases, it | the zero axis while the queue is shallow. Then, as load increases, | |||
| introduces a growing barrier to higher delay. But, unlike RED, it | it introduces a growing barrier to higher delay. But, unlike RED, it | |||
| requires only one parameter, the slope, not three. | requires only one parameter, the scaling, not three. | |||
| There follows a summary listing of the two parameters used for each | There follows a summary listing of the two parameters used for each | |||
| of the two queues: | of the two queues: | |||
| Classic: | Classic: | |||
| S_C : The slope that scales the queuing time of the Classic | S_C : The scaling factor of the dropping function scales Classic | |||
| queue into a dropping probability in the range [0,1). To make | queuing times in the range [0, 2^(S_C)] seconds into a dropping | |||
| multiplication efficient, we use an integer power of two. So | probability in the range [0,1]. To make division efficient, it | |||
| we define the slope of the AQM's square curve as 2^(S_C); | is constrained to be an integer power of two; | |||
| f_C : To smooth the queuing time of the Classic queue and make | f_C : To smooth the queuing time of the Classic queue and make | |||
| multiplication efficient, we use an integer power of two for | multiplication efficient, we use a negative integer power of | |||
| the EWMA constant, which we define as 2^(-f_C). | two for the dimensionless EWMA constant, which we define as | |||
| 2^(-f_C). | ||||
| L4S : | L4S : | |||
| S_L (or k): As for the Classic queue, we define the slope of the | S_L (and k): As for the Classic queue, the scaling factor of the | |||
| L4S marking function as 2^(S_L). Note that S_L = S_C + k, | L4S marking function scales Classic queueing times in the range | |||
| where k is the coupling between the queues (Section 2.1). So | [0, 2^(S_L)] seconds into a probability in the range [0,1]. | |||
| S_L and k count as only one parameter; | Note that S_L = S_C + k, where k is the coupling between the | |||
| queues (Section 2.1). So S_L and k count as only one | ||||
| parameter; | ||||
| T : The queue size in packets at which step threshold marking | T : The queue size in bytes at which step threshold marking | |||
| starts in the L4S queue. | starts in the L4S queue. | |||
| {ToDo: These are the raw parameters used within the algorithm. A | ||||
| configuration front-end could accept more meaningful parameters and | ||||
| convert them into these raw parameters.} | ||||
| From our experiments so far, recommended values for these parameters | From our experiments so far, recommended values for these parameters | |||
| are: S_C = 1; f_C = 5; T = 5 packets. The setting of k depends on | are: S_C = -1; f_C = 5; T = 5 * MTU for the range of base RTTs | |||
| policy (see Section 2.4) | typical on the public Internet. [CRED_Insights] explains why these | |||
| There is also a cUrviness parameter, U, which is likely to take the | parameters are applicable whatever rate link this AQM implementation | |||
| same hard-coded value for all implementations, once experiments have | is deployed on and how the parameters would need to be adjusted for a | |||
| determined a good value. We have solely used U=1 in our experiments | scenario with a different range of RTTs (e.g. a data centre) {ToDo | |||
| so far, but results might be even better with U=2 or higher. | incorporate a summary of that report into this draft}. The setting of | |||
| k depends on policy (see Section 2.4 and Appendix B respectively for | ||||
| its recommended setting and guidance on alternatives). | ||||
| Note that the dropping function at line 10 calls maxrand(2*U), which | There is also a cUrviness parameter, U, which is a small positive | |||
| gives twice as much curviness as the the call to maxrand(U) in the | integer. It is likely to take the same hard-coded value for all | |||
| implementations, once experiments have determined a good value. We | ||||
| have solely used U=1 in our experiments so far, but results might be | ||||
| even better with U=2 or higher. | ||||
| Note that the dropping function at line 9 calls maxrand(2*U), which | ||||
| gives twice as much curviness as the call to maxrand(U) in the | ||||
| marking function at line 3. This is the trick that implements the | marking function at line 3. This is the trick that implements the | |||
| square rule in equation (1) (Section 2.1). So, when U=1, the L4S | square rule in equation (1) (Section 2.1). This is based on the fact | |||
| marking function is linear and the Classic dropping function is | that, given a number X from 1 to 6, the probability that two dice | |||
| squared. When U=2, L4S is squared and Classic is quartic. And so | throws will both be less than X is the square of the probability that | |||
| on. | one throw will be less than X. So, when U=1, the L4S marking | |||
| function is linear and the Classic dropping function is squared. If | ||||
| U=2, L4S would be a square function and Classic would be quartic. | ||||
| And so on. | ||||
| The maxrand(u) function in lines 17-22 simply generates u random | The maxrand(u) function in lines 16-21 simply generates u random | |||
| numbers and returns the maximum. Typically, maxrand(u) could be run | numbers and returns the maximum (Note 7). Typically, maxrand(u) | |||
| in parallel out of band. For instance, if U=1, the Classic queue | could be run in parallel out of band. For instance, if U=1, the | |||
| would require the maximum of two random numbers. So, the maximum of | Classic queue would require the maximum of two random numbers. So, | |||
| every pair of values from a pseudorandom number generator could be | instead of calling maxrand(2*U) in-band, the maximum of every pair of | |||
| generated out-of-band, and held in a buffer ready for the Classic | values from a pseudorandom number generator could be generated out- | |||
| queue to consume, instead of calling maxrand(2*U) in-band. | of-band, and held in a buffer ready for the Classic queue to consume. | |||
| 1: dualq_dequeue(lq, cq) { % Couples L4S & Classic queues, lq & cq | ||||
| 2: if ( lq.dequeue(pkt) ) { | ||||
| 3: if ((lq.byt() > T) || ((cq.ns() >> (S_L-2)) > maxrand(U))) | ||||
| 4: mark(pkt) | ||||
| 5: return(pkt) % return the packet and stop here | ||||
| 6: } | ||||
| 7: while ( cq.dequeue(pkt) ) { | ||||
| 8: Q_C += (pkt.ns() - Q_C) >> f_C % Classic Q EWMA | ||||
| 9: if ( (Q_C >> (S_C-2) ) > maxrand(2*U) ) | ||||
| 10: drop(pkt) % Squared drop, redo loop | ||||
| 11: else | ||||
| 12: return(pkt) % return the packet and stop here | ||||
| 13: } | ||||
| 14: return(NULL) % no packet to dequeue | ||||
| 15: } | ||||
| Figure 2: Optimised Example Dequeue Pseudocode for Coupled DualQ AQM | ||||
| using Integer Arithmetic | ||||
| Notes: | Notes: | |||
| 1. The drain rate of the queue can vary if it is scheduled relative | 1. The drain rate of the queue can vary if it is scheduled relative | |||
| to other queues, or to cater for fluctuations in a wireless | to other queues, or to cater for fluctuations in a wireless | |||
| medium. To auto-adjust to changes in drain rate, the queue must | medium. To auto-adjust to changes in drain rate, the queue must | |||
| be measured in time, not bytes or packets [CoDel]. In our Linux | be measured in time, not bytes or packets [CoDel]. In our Linux | |||
| implementation, it was easiest to measure queuing time at | implementation, it was easiest to measure queuing time at | |||
| dequeue. Queuing time can be estimated when a packet is enqueued | dequeue. Queuing time can be estimated when a packet is enqueued | |||
| by measuring the queue length in bytes and dividing by the recent | by measuring the queue length in bytes and dividing by the recent | |||
| drain rate. | drain rate. | |||
| 2. An implementation has to use priority queueing, but it need not | 2. An implementation has to use priority queueing, but it need not | |||
| implement strict priority. | implement strict priority. | |||
| 3. If packets can be enqueued while processinging dequeue code, an | 3. If packets can be enqueued while processing dequeue code, an | |||
| implementer might prefer to place the while loop around both | implementer might prefer to place the while loop around both | |||
| queues so that it goes back to test again whether any L4S packets | queues so that it goes back to test again whether any L4S packets | |||
| arrived while it was dropping a Classic packet. | arrived while it was dropping a Classic packet. | |||
| 4. An EWMA is only one possible way to filter bursts; other more | 4. In order not to change too many factors at once, for now, we keep | |||
| adaptive smoothing methods could be valid. | the marking function for DCTCP-only traffic as similar as | |||
| possible to DCTCP. However, unlike DCTCP, all processing is at | ||||
| dequeue, so we determine whether to mark a packet at the head of | ||||
| the queue by the byte-length of the queue _behind_ it. We plan | ||||
| to test whether using queuing time will work in all | ||||
| circumstances, and if we find that the step can cause | ||||
| oscillations, we will investigate replacing it with a steep | ||||
| random marking curve. | ||||
| 5. In practice at line 11 the Classic queue would probably test for | 5. An EWMA is only one possible way to filter bursts; other more | |||
| adaptive smoothing methods could be valid and it might be | ||||
| appropriate to decrease the EWMA faster than it increases. | ||||
| 6. In practice at line 10 the Classic queue would probably test for | ||||
| ECN capability on the packet to determine whether to drop or mark | ECN capability on the packet to determine whether to drop or mark | |||
| the packet. However, for brevity such detail is omitted. All | the packet. However, for brevity such detail is omitted. All | |||
| packets classified into the L4S queue have to be ECN-capable, so | packets classified into the L4S queue have to be ECN-capable, so | |||
| no dropping logic is necessary at line 3. Nonetheless, L4S | no dropping logic is necessary at line 3. Nonetheless, L4S | |||
| packets could be dropped by overload code. | packets could be dropped by overload code (see Section 4.1). | |||
| 7. In the integer variant of the pseudocode (Figure 2) real numbers | ||||
| are all represented as integers scaled up by 2^32. In lines 3 & | ||||
| 9 the function maxrand() is arranged to return an integer in the | ||||
| range 0 <= maxrand() < 2^32. Queuing times are also scaled up by | ||||
| 2^32, but in two stages: i) In lines 3 and 8 queuing times | ||||
| cq.ns() and pkt.ns() are returned in integer nanoseconds, making | ||||
| the values about 2^30 times larger than when the units were | ||||
| seconds, ii) then in lines 3 and 9 an adjustment of -2 to the | ||||
| right bit-shift multiplies the result by 2^2, to complete the | ||||
| scaling by 2^32. | ||||
| Appendix B. Guidance on Controlling Throughput Equivalence | Appendix B. Guidance on Controlling Throughput Equivalence | |||
| +---------------+------+-------+ | +---------------+------+-------+ | |||
| | RTT_C / RTT_L | Reno | Cubic | | | RTT_C / RTT_L | Reno | Cubic | | |||
| +---------------+------+-------+ | +---------------+------+-------+ | |||
| | 1 | k=1 | k=0 | | | 1 | k=1 | k=0 | | |||
| | 2 | k=2 | k=1 | | | 2 | k=2 | k=1 | | |||
| | 3 | k=2 | k=2 | | | 3 | k=2 | k=2 | | |||
| | 4 | k=3 | k=2 | | | 4 | k=3 | k=2 | | |||
| skipping to change at page 17, line 33 | skipping to change at page 21, line 7 | |||
| whether it wants DCTCP flows to have roughly equal throughput with | whether it wants DCTCP flows to have roughly equal throughput with | |||
| Reno or with Cubic (because, even in its Reno-compatibility mode, | Reno or with Cubic (because, even in its Reno-compatibility mode, | |||
| Cubic is about 1.4 times more aggressive than Reno). Then the | Cubic is about 1.4 times more aggressive than Reno). Then the | |||
| operator needs to decide at what ratio of RTTs it wants DCTCP and | operator needs to decide at what ratio of RTTs it wants DCTCP and | |||
| Classic flows to have roughly equal throughput. For example choosing | Classic flows to have roughly equal throughput. For example choosing | |||
| the recommended value of k=0 will make DCTCP throughput roughly the | the recommended value of k=0 will make DCTCP throughput roughly the | |||
| same as Cubic, _if their RTTs are the same_. | same as Cubic, _if their RTTs are the same_. | |||
| However, even if the base RTTs are the same, the actual RTTs are | However, even if the base RTTs are the same, the actual RTTs are | |||
| unlikely to be the same, because Classic (Cubic or Reno) traffic | unlikely to be the same, because Classic (Cubic or Reno) traffic | |||
| needs a large queue to avoid under-utilisation and excess drop, | needs a large queue to avoid under-utilization and excess drop, | |||
| whereas L4S (DCTCP) does not. The operator might still choose this | whereas L4S (DCTCP) does not. The operator might still choose this | |||
| policy if it judges that DCTCP throughput should be rewarded for | policy if it judges that DCTCP throughput should be rewarded for | |||
| keeping its own queue short. | keeping its own queue short. | |||
| On the other hand, the operator will choose one of the higher values | On the other hand, the operator will choose one of the higher values | |||
| for k, if it wants to slow DCTCP down to roughly the same throughput | for k, if it wants to slow DCTCP down to roughly the same throughput | |||
| as Classic flows, to compensate for Classic flows slowing themselves | as Classic flows, to compensate for Classic flows slowing themselves | |||
| down by causing themselves extra queuing delay. | down by causing themselves extra queuing delay. | |||
| The values for k in the table are derived from the formulae, which | The values for k in the table are derived from the formulae, which | |||
| was developed in [DCttH15]: | was developed in [DCttH15]: | |||
| 2^k = 1.64 (RTT_reno / RTT_dc) (2) | 2^k = 1.64 (RTT_reno / RTT_dc) (2) | |||
| 2^k = 1.19 (RTT_cubic / RTT_dc ) (3) | 2^k = 1.19 (RTT_cubic / RTT_dc ) (3) | |||
| For localised traffic from a particular ISP's data centre, we used | For localized traffic from a particular ISP's data centre, we used | |||
| the measured RTTs to calculate that a value of k=3 would achieve | the measured RTTs to calculate that a value of k=3 would achieve | |||
| throughput equivalence, and our experiments verified the formula very | throughput equivalence, and our experiments verified the formula very | |||
| closely. | closely. | |||
| Appendix C. DCTCP Safety Enhancements | ||||
| This Appendix is informational not normative. It records changes | ||||
| needed to DCTCP implementations so they can co-exist safely alongside | ||||
| other traffic sources. They are recorded here until a more | ||||
| appropriate draft is available to hold them. | ||||
| Proposed changes are listed in rough order of criticality. Therefore | ||||
| those later in the list may not be necessary: | ||||
| o Negotiate its altered feedback semantics, which conveys the extent | ||||
| of ECN marking, not just its existence, and this feedback needs to | ||||
| be robust to loss [I-D.ietf-tcpm-accecn-reqs]; | ||||
| o fall back to Reno or Cubic behaviour on loss; | ||||
| o use a packet identifier associated with the L4S service; | ||||
| o average ECN feedback over its own RTT, not the hard-coded RTT | ||||
| suitable only for data-centres, perhaps like Relentless | ||||
| TCP [Mathis09]; | ||||
| o handle a window of less than 2 when the RTT is low, rather than | ||||
| increase the queue [TCP-sub-mss-w]. | ||||
| o test heuristically whether ECN marking is emanating from an | ||||
| RFC3168 AQM. | ||||
| Other, non-essential enhancements to DCTCP can be envisaged. | ||||
| Authors' Addresses | Authors' Addresses | |||
| Koen De Schepper | Koen De Schepper | |||
| Bell Labs | Bell Labs | |||
| Antwerp | Antwerp | |||
| Belgium | Belgium | |||
| Email: koen.de_schepper@alcatel-lucent.com | Email: koen.de_schepper@alcatel-lucent.com | |||
| URI: https://www.bell-labs.com/researchers/638/ | URI: https://www.bell-labs.com/usr/koen.de_schepper | |||
| Bob Briscoe (editor) | Bob Briscoe (editor) | |||
| Independent | Independent | |||
| Email: ietf@bobbriscoe.net | Email: ietf@bobbriscoe.net | |||
| URI: http://bobbriscoe.net/ | URI: http://bobbriscoe.net/ | |||
| Olga Bondarenko | Olga Bondarenko | |||
| Simula Research Lab | Simula Research Lab | |||
| Lysaker | Lysaker | |||
| End of changes. 66 change blocks. | ||||
| 233 lines changed or deleted | 414 lines changed or added | |||
This html diff was produced by rfcdiff 1.42. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||