| draft-briscoe-tsvwg-byte-pkt-mark-00.txt | draft-briscoe-tsvwg-byte-pkt-mark-01.txt | |||
|---|---|---|---|---|
| Transport Area Working Group B. Briscoe | Transport Area Working Group B. Briscoe | |||
| Internet-Draft BT & UCL | Internet-Draft BT & UCL | |||
| Intended status: Informational June 17, 2007 | Intended status: Informational November 19, 2007 | |||
| Expires: December 19, 2007 | Expires: May 22, 2008 | |||
| Byte and Packet Congestion Notification | Byte and Packet Congestion Notification | |||
| draft-briscoe-tsvwg-byte-pkt-mark-00 | draft-briscoe-tsvwg-byte-pkt-mark-01 | |||
| Status of this Memo | Status of this Memo | |||
| By submitting this Internet-Draft, each author represents that any | By submitting this Internet-Draft, each author represents that any | |||
| applicable patent or other IPR claims of which he or she is aware | applicable patent or other IPR claims of which he or she is aware | |||
| have been or will be disclosed, and any of which he or she becomes | have been or will be disclosed, and any of which he or she becomes | |||
| aware will be disclosed, in accordance with Section 6 of BCP 79. | aware will be disclosed, in accordance with Section 6 of BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF), its areas, and its working groups. Note that | Task Force (IETF), its areas, and its working groups. Note that | |||
| skipping to change at page 1, line 34 | skipping to change at page 1, line 34 | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| The list of current Internet-Drafts can be accessed at | The list of current Internet-Drafts can be accessed at | |||
| http://www.ietf.org/ietf/1id-abstracts.txt. | http://www.ietf.org/ietf/1id-abstracts.txt. | |||
| The list of Internet-Draft Shadow Directories can be accessed at | The list of Internet-Draft Shadow Directories can be accessed at | |||
| http://www.ietf.org/shadow.html. | http://www.ietf.org/shadow.html. | |||
| This Internet-Draft will expire on December 19, 2007. | This Internet-Draft will expire on May 22, 2008. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (C) The IETF Trust (2007). | Copyright (C) The IETF Trust (2007). | |||
| Abstract | Abstract | |||
| This memo was written to clarify how (and whether) to take packet | This memo concerns dropping or marking packets using active queue | |||
| size into account when notifying congestion using active queue | management (AQM) such as random early detection (RED) or pre- | |||
| management (AQM) such as random early detection (RED). The scope | congestion notification (PCN). It answers the question of whether to | |||
| includes resource congestion by bytes and by packet processing, even | take packet size into account when network equipment writes | |||
| though the latter is less common. It answers the question of whether | congestion notification, or when transports read it. The primary | |||
| packet size should be taken into account when network equipment | conclusion is that the variant of RED that gives lower drop | |||
| writes congestion notification, or when transports read it. The | probability to smaller packets (byte-mode packet drop) should not be | |||
| primary conclusion is that RED's byte-mode packet drop should not be | ||||
| used because it creates a perverse incentive for transports to use | used because it creates a perverse incentive for transports to use | |||
| tiny segments. TCP's lack of attention to packet size should be | tiny segments, consequently also opening up a DoS vulnerability. | |||
| fixed in TCP, not by reverse engineering network forwarding to fix | TCP's lack of attention to packet size and its sensitivity to loss of | |||
| transport protocols. | SYNs and ACKs should be fixed in TCP, not by reverse engineering | |||
| network forwarding to fix transport protocols. Nonetheless raw drop- | ||||
| tail is just as vulnerable to gaming by small packets, so AQM itself | ||||
| should not be turned off. | ||||
| Table of Contents | Table of Contents | |||
| 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
| 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 6 | 2. Requirements notation . . . . . . . . . . . . . . . . . . . . 6 | |||
| 3. Working Definition of Congestion Notification . . . . . . . . 6 | 3. Working Definition of Congestion Notification . . . . . . . . 7 | |||
| 4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 7 | 4. Congestion Measurement . . . . . . . . . . . . . . . . . . . . 7 | |||
| 5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 8 | 5. Idealised Wire Protocol Coding . . . . . . . . . . . . . . . . 8 | |||
| 6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 10 | 6. The State of the Art . . . . . . . . . . . . . . . . . . . . . 10 | |||
| 6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 10 | 6.1. Congestion Measurement: Status . . . . . . . . . . . . . . 10 | |||
| 6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 11 | 6.2. Congestion Coding: Status . . . . . . . . . . . . . . . . 11 | |||
| 6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 11 | 6.2.1. Network Bias when Encoding . . . . . . . . . . . . . . 11 | |||
| 6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 12 | 6.2.2. Transport Bias when Decoding . . . . . . . . . . . . . 13 | |||
| 6.2.3. Congestion Coding: Summary of Status . . . . . . . . . 14 | 6.2.3. Congestion Coding: Summary of Status . . . . . . . . . 14 | |||
| 7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 15 | 7. Outstanding Issues and Next Steps . . . . . . . . . . . . . . 15 | |||
| 7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 15 | 7.1. Bit-congestible World . . . . . . . . . . . . . . . . . . 15 | |||
| 7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 16 | 7.2. Bit- & Packet-congestible World . . . . . . . . . . . . . 16 | |||
| 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 17 | |||
| 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 9. Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . 17 | |||
| 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 20 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 20 | 11. Comments Solicited . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 20 | Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . . | |||
| A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 20 | Appendix A. Example Scenarios . . . . . . . . . . . . . . . . . . 19 | |||
| A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 21 | A.1. Notation . . . . . . . . . . . . . . . . . . . . . . . . . 19 | |||
| A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 22 | A.2. Bit-congestible resource, equal bit rates (Ai) . . . . . . 20 | |||
| A.3. Bit-congestible resource, equal packet rates (Bi) . . . . 21 | ||||
| A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 22 | A.4. Pkt-congestible resource, equal bit rates (Aii) . . . . . 22 | |||
| A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 23 | A.5. Pkt-congestible resource, equal packet rates (Bii) . . . . 22 | |||
| 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23 | Appendix B. Congestion Notification Definition: Further | |||
| 12.1. Normative References . . . . . . . . . . . . . . . . . . . 23 | Justification . . . . . . . . . . . . . . . . . . . . 23 | |||
| 12.2. Informative References . . . . . . . . . . . . . . . . . . 24 | Appendix C. Byte-mode Drop Complicates Policing Congestion | |||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 26 | Response . . . . . . . . . . . . . . . . . . . . . . 23 | |||
| Intellectual Property and Copyright Statements . . . . . . . . . . 27 | 12. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 | |||
| 12.1. Normative References . . . . . . . . . . . . . . . . . . . 25 | ||||
| 12.2. Informative References . . . . . . . . . . . . . . . . . . 26 | ||||
| Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 28 | ||||
| Intellectual Property and Copyright Statements . . . . . . . . . . 29 | ||||
| 1. Introduction | 1. Introduction | |||
| When notifying congestion, the problem of how (and whether) to take | When notifying congestion, the problem of how (and whether) to take | |||
| packet sizes into account has exercised the minds of researchers and | packet sizes into account has exercised the minds of researchers and | |||
| practitioners for as long as active queue management (AQM) has been | practitioners for as long as active queue management (AQM) has been | |||
| discussed. This memo aims to state the principles we should be using | discussed. Indeed, AQM was originally introduced largely to remove | |||
| and to come to conclusions on what these principles will mean for | the advantage that small packets get from drop-tail queues. This | |||
| future protocol design, taking into account the deployments we have | memo aims to state the principles we should be using and to come to | |||
| already. | conclusions on what these principles will mean for future protocol | |||
| design, taking into account the deployments we have already. | ||||
| Note that the byte vs. packet dilemma concerns congestion | ||||
| notification irrespective of whether it is signalled implicitly by | ||||
| drop or using explicit congestion notification (ECN [RFC3168]). | ||||
| Throughout this document, unless clear from the context, the term | ||||
| congestion marking, or just marking, will be used to mean either drop | ||||
| or explicit congestion notification. | ||||
| If the load on a resource depends on the rate at which packets | If the load on a resource depends on the rate at which packets | |||
| arrive, it is called packet-congestible. If the load depends on the | arrive, it is called packet-congestible. If the load depends on the | |||
| rate at which bits arrive it is called bit-congestible. | rate at which bits arrive it is called bit-congestible. | |||
| Examples of packet-congestible resources are route look-up engines | Examples of packet-congestible resources are route look-up engines | |||
| and firewalls, because load depends on how many packet headers they | and firewalls, because load depends on how many packet headers they | |||
| have to process. Examples of bit-congestible resources are | have to process. Examples of bit-congestible resources are | |||
| transmission links, and buffer memory, because the load depends on | transmission links, and buffer memory, because the load depends on | |||
| how many bits they have to transmit or store. Note that information | how many bits they have to transmit or store. Note that information | |||
| is generally processed or transmitted with a minimum granularity | is generally processed or transmitted with a minimum granularity | |||
| greater than a bit. The appropriate granularity for the resource in | greater than a bit (e.g. octets). The appropriate granularity for | |||
| question SHOULD be used, but for the sake of brevity we will talk in | the resource in question SHOULD be used, but for the sake of brevity | |||
| terms of bytes in this memo. | we will talk in terms of bytes in this memo. | |||
| Resources may be congestible at higher levels of granularity than | Resources may be congestible at higher levels of granularity than | |||
| packets, for instance stateful firewalls are flow-congestible and | packets, for instance stateful firewalls are flow-congestible and | |||
| call-servers are session-congestible. This memo focuses on | call-servers are session-congestible. This memo focuses on | |||
| congestion of connectionless resources, but the same principles may | congestion of connectionless resources, but the same principles may | |||
| be applied for congestion notification protocols controlling per-flow | be applied for congestion notification protocols controlling per-flow | |||
| and per-session processing or state. | and per-session processing or state. | |||
| The byte vs. packet dilemma arises at three stages in the congestion | The byte vs. packet dilemma arises at three stages in the congestion | |||
| notification process: | notification process: | |||
| Measuring congestion: When the congested resource decides locally how | Measuring congestion: When the congested resource decides locally how | |||
| to measure how congested it is (should the queue be measured in | to measure how congested it is. (Should the queue be measured in | |||
| bytes or packets?); | bytes or packets?); | |||
| Coding congestion notification into the wire protocol: When the | Coding congestion notification into the wire protocol: When the | |||
| congested resource decides how to notify the level of congestion | congested resource decides how to notify the level of congestion. | |||
| (should the level of notification depend on the byte-size of each | (Should the level of notification depend on the byte-size of each | |||
| particular packet carrying the notification?); | particular packet carrying the notification?); | |||
| Decoding congestion notification from the wire protocol: When the | Decoding congestion notification from the wire protocol: When the | |||
| transport interprets the notification (should the byte-size of a | transport interprets the notification. (Should the byte-size of a | |||
| missing or marked packet be taken into account?). | missing or marked packet be taken into account?). | |||
| In RED, whether to use packets or bytes when measuring queues is | In RED, whether to use packets or bytes when measuring queues is | |||
| called packet-mode or byte-mode queue measurement. This choice is | called packet-mode or byte-mode queue measurement. This choice is | |||
| now fairly well understood but is included in Section 4 to document | now fairly well understood but is included in Section 4 to document | |||
| it in the RFC series. | it in the RFC series. | |||
| The controversy is mainly around the other two stages: whether to | The controversy is mainly around the other two stages: whether to | |||
| allow for packet size when the network codes or when the transport | allow for packet size when the network codes or when the transport | |||
| decodes congestion notification. In RED, this choice is termed | decodes congestion notification. In RED, the variant that reduces | |||
| packet-mode or byte-mode drop as opposed to queue measurement, which | drop probability for packets based on their size in bytes is called | |||
| is an orthogonal choice. Note that this issue concerns how much each | byte-mode drop, while the variant that doesn't is called packet mode | |||
| congestion notification on a packet should be taken to mean, | drop. Whether queues are measured in bytes or packets is an | |||
| irrespective of whether it is signalled implicitly by drop or | orthogonal choice, termed byte-mode queue measurement or packet-mode | |||
| explicitly using ECN [RFC3168]. | queue measurement. | |||
| Currently, the paper trail of advice referenced from the RFC series | ||||
| conditionally recommends byte-mode (packet-size dependent) drop, | ||||
| although all the implementers who responded to our survey have | ||||
| ignored this advice. The primary purpose of this memo is to build a | ||||
| definitive consensus against allowing for packet size in AQM | ||||
| algorithms and record this advice within the RFC series. | ||||
| Increasingly, it is being recognised that a protocol design must take | Increasingly, it is being recognised that a protocol design must take | |||
| care not to cause unintended consequences by giving the parties in | care not to cause unintended consequences by giving the parties in | |||
| the protocol exchange perverse incentives [Evol_cc][RFC3426]. For | the protocol exchange perverse incentives [Evol_cc][RFC3426]. For | |||
| instance, imagine a scenario where the same bit rate of packets will | instance, imagine a scenario where the same bit rate of packets will | |||
| contribute the same to congestion of a link irrespective of whether | contribute the same to congestion of a link irrespective of whether | |||
| it is sent as fewer larger packets or more smaller packets. A | it is sent as fewer larger packets or more smaller packets. A | |||
| protocol design that caused larger packets to be more likely to be | protocol design that caused larger packets to be more likely to be | |||
| dropped than smaller ones would be dangerous in this case. | dropped than smaller ones would be dangerous in this case. | |||
| Transports would tend to act in their own interests by breaking their | Transports would tend to act in their own interests by breaking their | |||
| data stream down into tiny segments, reducing their drop rate without | data stream down into tiny segments, reducing their drop rate without | |||
| reducing their bit rate. Encouraging a high volume of tiny packets | reducing their bit rate. Further, encouraging a high volume of tiny | |||
| might in turn unnecessarily overload a completely unrelated part of | packets might in turn unnecessarily overload a completely unrelated | |||
| the system. | part of the system, perhaps more limited by header-processing than | |||
| bandwidth. | ||||
| Currently, the paper trail of advice referenced from the RFC series | ||||
| (sort of) recommends exactly such packet-size dependent drop, | ||||
| although we believe implementers may have ignored the advice. The | ||||
| primary purpose of this memo is to explain why that advice should be | ||||
| reversed and eventually to record a definitive consensus within the | ||||
| RFC series. | ||||
| Imagine two flows arrive at a bit-congestible transmission link each | Imagine two flows arrive at a bit-congestible transmission link each | |||
| with the same bit rate, say 1Mbps, but one consists of 1500B and the | with the same bit rate, say 1Mbps, but one consists of 1500B and the | |||
| other 60B packets. For bit-congestible resources, it is currently | other 60B packets, which are 25x smaller. If the advice referred to | |||
| recommended that RED should be configured to adjust the drop | from RFC2309 is followed, gentle RED [gentle_RED] would be used, | |||
| probability of packets in proportion to each packet's size (byte mode | configured to adjust the drop probability of packets in proportion to | |||
| packet drop). So in this case, if RED drops 25% of the larger | each packet's size (byte mode packet drop). So in this case, if RED | |||
| packets, it will drop 1% of the smaller packets. The bit rate passed | drops 25% of the larger packets, it will aim to drop 1% of the | |||
| to the line by the RED queue will therefore be 750kbps for the flow of | smaller packets (but in practice it may drop more as congestion | |||
| larger packets but 990kbps for the flow of smaller packets, even though they | increases [RFC4828](S.B.4)[Note_Variation]). Even though both flows | |||
| both arrived with the same bit rate. | arrive with the same bit rate, the bit rate the RED queue aims to | |||
| pass to the line will be 750kbps for the flow of larger packets but 990kbps | ||||
| for the smaller packets (but because of rate variation it will be | ||||
| less than this target). It can be seen that this behaviour reopens | ||||
| the same denial of service vulnerability that drop tail queues offer | ||||
| to floods of small packets, though not necessarily as strongly (see | ||||
| Section 8). | ||||
| The reason it was recommended that RED should work like this is that | The above advice (that referred to by RFC2309) says the question of | |||
| TCP has always been the predominant transport used in the Internet, | whether a packet's own size should affect its drop probability | |||
| and TCP congestion control ensures that flows competing for the same | "depends on the dominant end-to-end congestion control mechanisms". | |||
| resource each maintain the same number of segments in flight, | But we argue the network layer should not be optimised for whatever | |||
| irrespective of segment size. Rather than discuss the possibility of | transport is predominant. For instance, TCP congestion control | |||
| fixing the problem in TCP, it was recommended that routers should be | ensures that flows competing for the same resource each maintain the | |||
| altered to reverse engineer the network layer around TCP, contrary to | same number of segments in flight, irrespective of segment size. | |||
| the excellent advice in [RFC3426], which asks designers to question | Even though reducing the drop probability of small packets helps | |||
| "Why are you proposing a solution at this layer of the protocol | correct this feature of TCP, we argue it should be corrected in TCP | |||
| stack, rather than at another layer?" The implicit plan seems to | itself, not in the network. Favouring small packets also reduces the | |||
| have been to use gradual RED deployment in the network as a way to | chance of dropping SYNs and pure ACKs, which has a disproportionate | |||
| make the fairness that the TCP algorithm achieves gradually change | effect on TCP performance. But again, rather than fix these problems | |||
| from equalising segment-rate to equalising bit-rate between flows. | in the network, we argue that TCP should be altered. Effectively, | |||
| This seems to be how we ended up recommending RED should use byte- | favouring small packets is reverse engineering of the network layer | |||
| mode packet drop to discard equal numbers of packets, not bits, from | around TCP, contrary to the excellent advice in [RFC3426], which asks | |||
| equal bit-rate flows. | designers to question "Why are you proposing a solution at this layer | |||
| of the protocol stack, rather than at another layer?" | ||||
| Now is a good time to discuss whether fairness between different | Now is a good time to discuss whether fairness between different | |||
| sized packets would best be implemented in the network layer, or at | sized packets would best be implemented in the network layer, or at | |||
| the transport, for a number of reasons: | the transport, for a number of reasons: | |||
| 1. The packet vs. byte issue requires speedy resolution because the | 1. The packet vs. byte issue requires speedy resolution because the | |||
| IETF pre-congestion notification (PCN) working group is in the | IETF pre-congestion notification (PCN) working group is in the | |||
| process of being chartered to produce a standards track | process of being chartered to produce a standards track | |||
| specification of its congestion marking (AQM) algorithm | specification of its congestion marking (AQM) algorithm | |||
| [PCNcharter]; | [PCNcharter]; | |||
| 2. [RFC2309] says RED may either take account of packet size or not | 2. [RFC2309] says RED may either take account of packet size or not | |||
| when dropping, but gives no recommendation between the two, | when dropping, but gives no recommendation between the two, | |||
| referring instead to advice on the performance implications in an | referring instead to advice on the performance implications in an | |||
| email [pktByteEmail], which recommends byte-mode drop, but | email [pktByteEmail], which recommends byte-mode drop. Further, | |||
| without really discussing performance. Further, just before | just before RFC2309 was issued, an addendum was added to the | |||
| RFC2309 was issued, an addendum was added to the archived email | archived email that revisited the issue of packet vs. byte-mode | |||
| that revisited the issue of packet vs. byte-mode drop in its last | drop in its last para, making the recommendation less clear-cut; | |||
| para, making the recommendation less clear-cut; | ||||
| 3. Currently, no active queue management behaviour like RED has been | 3. Without this memo, the only advice in the RFC series on packet | |||
| standardised, so implementers have no other standards guidance | size bias in AQM algorithms would be a reference to an archived | |||
| than [RFC2309], which is informational; | email in [RFC2309] (including an addendum at the end of the email | |||
| to correct the original). | ||||
| 4. The IRTF Internet Congestion Control Research Group (ICCRG) | 4. The IRTF Internet Congestion Control Research Group (ICCRG) | |||
| recently took on the challenge of building consensus on what | recently took on the challenge of building consensus on what | |||
| common congestion control support should be required from | common congestion control support should be required from | |||
| forwarding engines on routers in the future; | forwarding engines on routers in the future | |||
| [I-D.irtf-iccrg-welzl-congestion-control-open-research]. The | ||||
| 5. The Internet community needs to discuss widely whether the | wider Internet community needs to discuss whether the complexity | |||
| complexity of adjusting for packet size should be on routers or | of adjusting for packet size should be on routers or in | |||
| in transports; | transports; | |||
| 6. Given there are many good reasons why larger path max | 5. Given there are many good reasons why larger path max | |||
| transmission units (PMTUs) would help solve a number of scaling | transmission units (PMTUs) would help solve a number of scaling | |||
| issues, we don't want to create any bias against large packets | issues, we don't want to create any bias against large packets | |||
| that is greater than their true cost; | that is greater than their true cost; | |||
| 7. And finally, given it has recently been shown that TCP doesn't | 6. And finally, given it has recently been pointed out that TCP | |||
| achieve any meaningful fairness anyway | doesn't achieve any meaningful fairness anyway [Rate_fair_Dis], | |||
| [I-D.briscoe-tsvarea-fair], because it doesn't consider fairness | because it doesn't consider fairness over all the flows a user | |||
| over all the flows a user transmits nor over time, modifying the | transmits nor over time, modifying the network rather than | |||
| network so as not to have to modify TCP still won't achieve | modifying TCP still won't achieve fairness. It seems more likely | |||
| fairness. It seems more likely we have to face up to changing | we have to face up to evolving beyond TCP anyway. | |||
| TCP anyway. | ||||
| This memo starts from first principles, defining congestion | This memo starts from first principles, defining congestion | |||
| notification in Section 3 then determining the correct way to measure | notification in Section 3 then determining the correct way to measure | |||
| congestion (Section 4) and to design an idealised congestion | congestion (Section 4) and to design an idealised congestion | |||
| notification protocol (Section 5). It then surveys the advice given | notification protocol (Section 5). It then surveys the advice given | |||
| previously in the RFC series, the research literature and the | previously in the RFC series, the research literature and the | |||
| deployed legacy (Section 6) before summarising the recommended way | deployed legacy (Section 6) before listing outstanding issues | |||
| forward and listing outstanding issues (Section 7) that will need | (Section 7) that will need resolution both to achieve the ideal | |||
| resolution both to achieve the ideal protocol and to handle legacy. | protocol and to handle legacy. After discussing security | |||
| considerations (Section 8), strong recommendations for the way forward | ||||
| are given in the conclusions (Section 9). | ||||
| 2. Requirements notation | 2. Requirements notation | |||
| The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this | |||
| document are to be interpreted as described in [RFC2119]. | document are to be interpreted as described in [RFC2119]. | |||
| 3. Working Definition of Congestion Notification | 3. Working Definition of Congestion Notification | |||
| Rather than attempt what many have tried and failed to achieve, this memo | Rather than attempt what many have tried and failed to achieve, this memo | |||
| will not try to define congestion. It will give a working definition | will not try to define congestion. It will give a working definition | |||
| of what congestion notification should be taken to mean for this | of what congestion notification should be taken to mean for this | |||
| document. Congestion notification is a changing signal that aims to | document. Congestion notification is a changing signal that aims to | |||
| communicate the ratio E/L, where E is the instantaneous excess load | communicate the ratio E/L, where E is the instantaneous excess load | |||
| offered to a resource that it cannot (or would not) serve and L is | offered to a resource that it cannot (or would not) serve and L is | |||
| the instantaneous offered load. | the instantaneous offered load. | |||
| The phrase 'would not serve' is added, because AQM systems (e.g. | The phrase 'would not serve' is added, because AQM systems (e.g. | |||
| RED, PCN [PCN]) use a virtual capacity smaller than actual capacity, | RED, PCN [I-D.ietf-pcn-architecture]) use a virtual capacity smaller | |||
| then notify congestion of this virtual capacity in order to avoid | than actual capacity, then notify congestion of this virtual capacity | |||
| congestion of the actual capacity. | in order to avoid congestion of the actual capacity. | |||
| Note that the denominator is offered load, not capacity. Therefore | Note that the denominator is offered load, not capacity. Therefore | |||
| congestion notification is a real number bounded by the range [0,1]. | congestion notification is a real number bounded by the range [0,1]. | |||
| This ties in with the most well-understood form of congestion | This ties in with the most well-understood form of congestion | |||
| notification: drop rate. It also means that congestion has a natural | notification: drop rate. It also means that congestion has a natural | |||
| interpretation as a probability; the probability of offered traffic | interpretation as a probability; the probability of offered traffic | |||
| not being served (or being marked as at risk of not being served). | not being served (or being marked as at risk of not being served). | |||
| Appendix B describes a further incidental benefit that arises from | ||||
| Incidentally, load being the denominator also has a subtle | using load as the denominator of congestion notification. | |||
| significance in the related debate over whether desired flow rates | ||||
| should be communicated between transport and network and whether | ||||
| achievable flow rates should then be communicated back again (e.g. in | ||||
| XCP [I-D.falk-xcp-spec] & Quickstart [RFC4782]). Even though | ||||
| congestion notification doesn't communicate a rate explicitly, from | ||||
| each source's point of view congestion notification represents the | ||||
| fraction of the rate it was sending a round trip ago that couldn't | ||||
| (or wouldn't) be served by available resources. After they were | ||||
| sent, all these fractions of each source's offered load added up to | ||||
| the aggregate fraction of offered load seen by the congested | ||||
| resource. Therefore the instantaneous excess flow rate an RTT ago is | ||||
| implicitly communicated within this one scale-free dimensionless | ||||
| fraction (and a lot more). | ||||
| 4. Congestion Measurement | 4. Congestion Measurement | |||
| Queue length is usually the most correct and simplest way to measure | Queue length is usually the most correct and simplest way to measure | |||
| congestion of a resource. To avoid the pathological effects of drop | congestion of a resource. To avoid the pathological effects of drop | |||
| tail, an AQM function can then be used to transform queue length into | tail, an AQM function can then be used to transform queue length into | |||
| the probability of dropping or marking a packet (e.g. RED's | the probability of dropping or marking a packet (e.g. RED's | |||
| piecewise linear function between thresholds). If the resource is | piecewise linear function between thresholds). If the resource is | |||
| bit-congestible, the length of the queue SHOULD be measured in bytes. | bit-congestible, the length of the queue SHOULD be measured in bytes. | |||
| If the resource is packet-congestible, the length of the queue SHOULD | If the resource is packet-congestible, the length of the queue SHOULD | |||
| skipping to change at page 8, line 49 | skipping to change at page 9, line 10 | |||
| We are not saying two ECN fields will be needed (and we are not | We are not saying two ECN fields will be needed (and we are not | |||
| saying that somehow a resource should be able to drop a packet in one | saying that somehow a resource should be able to drop a packet in one | |||
| of two different ways so that the transport can distinguish which | of two different ways so that the transport can distinguish which | |||
| sort of drop it was!). These two congestion notification channels | sort of drop it was!). These two congestion notification channels | |||
| are just a conceptual device. They allow us to defer having to | are just a conceptual device. They allow us to defer having to | |||
| decide whether to distinguish between byte and packet congestion when | decide whether to distinguish between byte and packet congestion when | |||
| the network resource codes the signal or when the transport decodes | the network resource codes the signal or when the transport decodes | |||
| it. | it. | |||
| However, although this idealised mechanism isn't intended for | However, although this idealised mechanism isn't intended for | |||
| implementation, we do want to emphasise that we must find a way to | implementation, we do want to emphasise that we may need to find a | |||
| implement it, because it could become necessary to somehow | way to implement it, because it could become necessary to somehow | |||
| distinguish between bit and packet congestion [RFC3714]. Currently a | distinguish between bit and packet congestion [RFC3714]. Currently a | |||
| design goal of network processing equipment such as routers and | design goal of network processing equipment such as routers and | |||
| firewalls is to keep packet processing uncongested even under worst | firewalls is to keep packet processing uncongested even under worst | |||
| case bit rates with minimum packet sizes. Therefore, packet- | case bit rates with minimum packet sizes. Therefore, packet- | |||
| congestion is currently rare, but there is no guarantee that it will | congestion is currently rare, but there is no guarantee that it will | |||
| not become common with future technology trends. | not become common with future technology trends. | |||
| The idealised wire protocol is given below. It allows for packet | The idealised wire protocol is given below. It accounts for packet | |||
| size at the transport layer, not in the network, and then only in the | sizes at the transport layer, not in the network, and then only in | |||
| case of bit-congestible resources. This avoids the perverse | the case of bit-congestible resources. This avoids the perverse | |||
| incentive to send smaller packets that would otherwise result if the | incentive to send smaller packets and the DoS vulnerability that | |||
| network were to bias towards them (see Introduction). Incidentally, | would otherwise result if the network were to bias towards them (see | |||
| it also ensures neither the network nor the transport needs to do a | Introduction). Incidentally, it also ensures neither the network nor | |||
| multiply--multiplication by packet size is effectively achieved as a | the transport needs to do a multiply--multiplication by packet size | |||
| repeated add when the transport adds to its count of marked bytes as | is effectively achieved as a repeated add when the transport adds to | |||
| each congestion event is fed to it: | its count of marked bytes as each congestion event is fed to it: | |||
| o A packet-congestible resource trying to code congestion level p_p | o A packet-congestible resource trying to code congestion level p_p | |||
| into a packet stream should mark the `packet congestion' field in | into a packet stream should mark the idealised `packet congestion' | |||
| each packet with probability p_p irrespective of the packet's | field in each packet with probability p_p irrespective of the | |||
| size. The transport should then take a packet with the packet | packet's size. The transport should then take a packet with the | |||
| congestion field marked to mean just one mark, irrespective of the | packet congestion field marked to mean just one mark, irrespective | |||
| packet size. | of the packet size. | |||
| o A bit-congestible resource trying to code time-varying byte- | o A bit-congestible resource trying to code time-varying byte- | |||
| congestion level p_b into a packet stream should mark the `byte | congestion level p_b into a packet stream should mark the `byte | |||
| congestion' field in each packet with probability p_b, again | congestion' field in each packet with probability p_b, again | |||
| irrespective of the packet's size. Unlike before, the transport | irrespective of the packet's size. Unlike before, the transport | |||
| should take a packet with the byte congestion field marked to | should take a packet with the byte congestion field marked to | |||
| count as a mark on each byte in the packet. | count as a mark on each byte in the packet. | |||
| The worked examples in Appendix A show that transports can extract | The worked examples in Appendix A show that transports can extract | |||
| sufficient and correct congestion notification from these protocols | sufficient and correct congestion notification from these protocols | |||
| for cases when two flows with different packet sizes have matching | for cases when two flows with different packet sizes have matching | |||
| bit rates or matching packet rates. Examples are also given that mix | bit rates or matching packet rates. Examples are also given that mix | |||
| these two flows into one to show that a flow with mixed packet sizes | these two flows into one to show that a flow with mixed packet sizes | |||
| would still be able to extract sufficient and correct information. | would still be able to extract sufficient and correct information. | |||
| Sufficient and correct congestion information means that there is | Sufficient and correct congestion information means that there is | |||
| sufficient information for the two different types of transport | sufficient information for the two different types of transport | |||
| requirements: | requirements: | |||
| o Established transport congestion controls like TCP's [RFC2581] aim | Ratio-based: Established transport congestion controls like TCP's | |||
| to achieve equal segment rates per RTT through the same | [RFC2581] aim to achieve equal segment rates per RTT through the | |||
| bottleneck--TCP `fairness' [RFC3448]. They work with the ratio of | same bottleneck--TCP friendliness [RFC3448]. They work with the | |||
| marked to unmarked segments. The example scenarios show that | ratio of marked to unmarked segments. The example scenarios show | |||
| these ratio-based transports are effectively the same whether | that these ratio-based transports are effectively the same whether | |||
| counting in bytes or marks, because the units cancel out. | counting in bytes or marks, because the units cancel out. | |||
| (Incidentally, this is why TCP's bit rate is still proportional to | (Incidentally, this is why TCP's bit rate is still proportional to | |||
| packet size even when byte-counting is used, as recommended for | packet size even when byte-counting is used, as recommended for | |||
| TCP in [I-D.ietf-tcpm-rfc2581bis], mainly for orthogonal security | TCP in [I-D.ietf-tcpm-rfc2581bis], mainly for orthogonal security | |||
| reasons.) | reasons.) | |||
| o Other congestion controls proposed in the research community aim | Absolute-target-based: Other congestion controls proposed in the | |||
| to limit the volume of congestion caused to a constant weight | research community aim to limit the volume of congestion caused to | |||
| parameter. [MulTCP][WindowPropFair] are examples of weighted | a constant weight parameter. [MulTCP][WindowPropFair] are | |||
| proportionally fair transports designed for cost-fair environments | examples of weighted proportionally fair transports designed for | |||
| [I-D.briscoe-tsvarea-fair]. In this case, the transport requires | cost-fair environments [Rate_fair_Dis]. In this case, the | |||
| a count (not a ratio) of marked bytes in the bit-congestible case | transport requires a count (not a ratio) of dropped/marked bytes | |||
| and of marked packets in the packet congestible case. | in the bit-congestible case and of dropped/marked packets in the | |||
| packet congestible case. | ||||
| 6. The State of the Art | 6. The State of the Art | |||
| The original 1993 paper on RED [RED93] proposed two options for the | The original 1993 paper on RED [RED93] proposed two options for the | |||
| RED active queue management algorithm: packet mode and byte mode. | RED active queue management algorithm: packet mode and byte mode. | |||
| Packet mode measured the queue length in packets and marked (or | Packet mode measured the queue length in packets and marked (or | |||
| dropped) individual packets with a probability independent of their | dropped) individual packets with a probability independent of their | |||
| size. Byte mode measured the queue length in bytes and marked an | size. Byte mode measured the queue length in bytes and marked an | |||
| individual packet with probability in proportion to its size | individual packet with probability in proportion to its size | |||
| (relative to the maximum packet size). In the paper's outline of | (relative to the maximum packet size). In the paper's outline of | |||
| further work, it was stated that no recommendation had been made on | further work, it was stated that no recommendation had been made on | |||
| whether the queue size should be measured in bytes or packets, but | whether the queue size should be measured in bytes or packets, but | |||
| noted that the difference could be significant. | noted that the difference could be significant. | |||
| When RED was recommended for general deployment in 1998 [RFC2309], | When RED was recommended for general deployment in 1998 [RFC2309], | |||
| the two modes were mentioned, implying the choice between them was a | the two modes were mentioned, implying the choice between them was a | |||
| question of performance, referring to a 1997 email [pktByteEmail] for | question of performance, referring to a 1997 email [pktByteEmail] for | |||
| advice on tuning. This email clarified that there were in fact two | advice on tuning. This email clarified that there were in fact two | |||
| orthogonal choices: whether to measure queue length in bytes or | orthogonal choices: whether to measure queue length in bytes or | |||
| packets (Section 6.1) and whether the drop probability of an | packets (Section 6.1 below) and whether the drop probability of an | |||
| individual packet should depend on its own size (Section 6.2). | individual packet should depend on its own size (Section 6.2 below). | |||
| 6.1. Congestion Measurement: Status | 6.1. Congestion Measurement: Status | |||
| The choice of which metric to use to measure queue length was left | The choice of which metric to use to measure queue length was left | |||
| open in RFC2309. It is now well understood that queues for bit- | open in RFC2309. It is now well understood that queues for bit- | |||
| congestible resources should be measured in bytes, and queues for | congestible resources should be measured in bytes, and queues for | |||
| packet-congestible resources should be measured in packets (see | packet-congestible resources should be measured in packets (see | |||
| Section 4). | Section 4). | |||
| Where buffers are not configured or legacy buffers cannot be | Where buffers are not configured or legacy buffers cannot be | |||
| configured to the above guideline, we need not make allowances | configured to the above guideline, we need not make allowances | |||
| for such legacy in future protocol design. If a bit-congestible | for such legacy in future protocol design. If a bit-congestible | |||
| buffer is measured in packets, the operator will have set the | buffer is measured in packets, the operator will have set the | |||
| thresholds mindful of a typical mix of packet sizes. Any AQM | thresholds mindful of a typical mix of packet sizes. Any AQM | |||
| algorithm on such a buffer will be oversensitive to high proportions | algorithm on such a buffer will be oversensitive to high proportions | |||
| of small packets, and undersensitive to high proportions of large | of small packets, e.g. a DoS attack, and undersensitive to high | |||
| packets. But an operator can safely keep such a legacy buffer | proportions of large packets. But an operator can safely keep such a | |||
| because any undersensitivity during unusual traffic mixes cannot lead | legacy buffer because any undersensitivity during unusual traffic | |||
| to congestion collapse given the buffer will eventually revert to | mixes cannot lead to congestion collapse given the buffer will | |||
| tail drop. | eventually revert to tail drop, discarding proportionately more large | |||
| packets. | ||||
| Some modern router implementations give a choice for setting RED's | Some modern router implementations give a choice for setting RED's | |||
| thresholds in byte-mode or packet-mode. This may merely be an | thresholds in byte-mode or packet-mode. This may merely be an | |||
| administrator-interface preference, not altering how the queue itself | administrator-interface preference, not altering how the queue itself | |||
| is measured, but on some hardware it does actually change the way it | is measured, but on some hardware it does actually change the way it | |||
| measures its queue. Whether a resource is bit-congestible or packet- | measures its queue. Whether a resource is bit-congestible or packet- | |||
| congestible is a property of the resource, so an admin SHOULD NOT | congestible is a property of the resource, so an admin SHOULD NOT | |||
| ever need to, or be able to, configure the way it measures itself. | ever need to, or be able to, configure the way a queue measures | |||
| itself. | ||||
| We believe the question of whether to measure queues in bytes or | We believe the question of whether to measure queues in bytes or | |||
| packets is fairly well understood these days. The only outstanding | packets is fairly well understood these days. The only outstanding | |||
| issues concern how to measure congestion when the queue is bit | issues concern how to measure congestion when the queue is bit | |||
| congestible but the resource is packet congestible or vice versa (see | congestible but the resource is packet congestible or vice versa (see | |||
| Section 4). | Section 4). | |||
| 6.2. Congestion Coding: Status | 6.2. Congestion Coding: Status | |||
| 6.2.1. Network Bias when Encoding | 6.2.1. Network Bias when Encoding | |||
| The previously mentioned email [pktByteEmail] referred to by | The previously mentioned email [pktByteEmail] referred to by | |||
| [RFC2309] said that the choice over whether a packet's own size | [RFC2309] said that the choice over whether a packet's own size | |||
| should affect its drop probability "depends on the dominant end-to- | should affect its drop probability "depends on the dominant end-to- | |||
| end congestion control mechanisms". [This assumes the network should | end congestion control mechanisms". [Section 1 argues against this | |||
| be changed to accommodate the predominant transport, without | approach, citing the excellent advice in RFC3246.] The referenced | |||
| questioning whether the transport should be fixed instead.] The line | email went on to argue that drop probability should depend on the | |||
| of reasoning went on to say that congestion control in protocols such | size of the packet being considered for drop if the resource is bit- | |||
| as TCP doesn't depend on the fraction of bytes or packets that are | congestible, but not if it is packet-congestible, but advised that | |||
| dropped from a flow, but merely on whether or not one or more drops | most scarce resources in the Internet were currently bit-congestible. | |||
| were present in the most recent window [this is incorrect]. It | The argument continued that if packet drops were inflated by packet | |||
| argued that drop probability should depend on the size of the packet | size (byte-mode dropping), "a flow's fraction of the packet drops is | |||
| being considered for drop if the resource is bit-congestible, but not | then a good indication of that flow's fraction of the link bandwidth | |||
| if it is packet-congestible, but advised that most scarce resources | in bits per second". This was consistent with a referenced policing | |||
| in the Internet were currently bit-congestible. The argument | mechanism being worked on at the time for detecting unusually high | |||
| continued that if packet drops were inflated by packet size (byte- | bandwidth flows, eventually published in 1999 [pBox]. [The problem | |||
| mode dropping), "a flow's fraction of the packet drops is then a good | could have been solved by making the policing mechanism count the | |||
| indication of that flow's fraction of the link bandwidth in bits per | volume of bytes randomly dropped, not the number of packets.] | |||
| second". This was consistent with a referenced policing mechanism | ||||
| being worked on at the time for detecting unusually high bandwidth | ||||
| flows, eventually published in 1999 [pBox]. [The problem could have | ||||
| been solved by making the policing mechanism count the volume of | ||||
| bytes randomly dropped, not the number of packets.] | ||||
| A few months before RFC2309 was published, an addendum was added to | A few months before RFC2309 was published, an addendum was added to | |||
| the above archived email referenced from the RFC, in which the final | the above archived email referenced from the RFC, in which the final | |||
| paragraph seemed to partially retract what had previously been said. | paragraph seemed to partially retract what had previously been said. | |||
| It clarified that the question of whether the probability of marking | It clarified that the question of whether the probability of marking | |||
| a packet should depend on its size was not related to whether the | a packet should depend on its size was not related to whether the | |||
| resource itself was bit congestible, but a completely orthogonal | resource itself was bit congestible, but a completely orthogonal | |||
| question. However, the only example given had the queue measured in | question. However, the only example given had the queue measured in | |||
| packets but packet drop depended on the byte-size of the packet in | packets but packet drop depended on the byte-size of the packet in | |||
| question. No example was given the other way round. [One can only | question. No example was given the other way round. | |||
| assume that the reasoning for byte-mode drop in this case was still | ||||
| to try to reverse engineer the network to allow for TCP not | ||||
| accounting for packet size.] | ||||
| In 2000, Cnodder et al [REDbyte] pointed out that there was an error | In 2000, Cnodder et al [REDbyte] pointed out that there was an error | |||
| in the part of the original 1993 RED algorithm that aimed to | in the part of the original 1993 RED algorithm that aimed to | |||
| distribute drops uniformly, because it didn't correctly take into | distribute drops uniformly, because it didn't correctly take into | |||
| account the adjustment for packet size. They recommended an | account the adjustment for packet size. They recommended an | |||
| algorithm called RED_4 to fix this. But they also recommended a | algorithm called RED_4 to fix this. But they also recommended a | |||
| further change, RED_5, to adjust drop rate dependent on the square of | further change, RED_5, to adjust drop rate dependent on the square of | |||
| relative packet size. This was indeed correct,... but only if one | relative packet size. This was indeed consistent with the stated | |||
| agrees with the original principle behind RED's byte mode drop--that | motivation behind RED's byte mode drop--that we should reverse | |||
| we should reverse engineer the network in order to arrange for TCP | engineer the network to improve the performance of dominant end-to- | |||
| flows with different packet sizes to achieve equal rates through the | end congestion control mechanisms. | |||
| same bottleneck. | ||||
| By 2003, a further change had been made to the adjustment for packet | By 2003, a further change had been made to the adjustment for packet | |||
| size, this time in the RED algorithm of the ns2 simulator. Instead | size, this time in the RED algorithm of the ns2 simulator. Instead | |||
| of taking each packet's size relative to a `maximum packet size' it | of taking each packet's size relative to a `maximum packet size' it | |||
| was taken relative to a `mean packet size', intended to be a static | was taken relative to a `mean packet size', intended to be a static | |||
| value representative of the `typical' packet size on the link. We | value representative of the `typical' packet size on the link. We | |||
| have not been able to find a justification for this change in the | have not been able to find a justification for this change in the | |||
| literature; however, Eddy and Allman conducted experiments [REDbias] | literature; however, Eddy and Allman conducted experiments [REDbias] | |||
| that assessed how sensitive RED was to this parameter, amongst other | that assessed how sensitive RED was to this parameter, amongst other | |||
| things. No-one seems to have pointed out that this changed algorithm | things. No-one seems to have pointed out that this changed algorithm | |||
| can often lead to drop probabilities of greater than 1 [which should | can often lead to drop probabilities of greater than 1 [which should | |||
| ring alarm bells hinting that there's a mistake in the theory | ring alarm bells hinting that there's a mistake in the theory | |||
| somewhere]. | somewhere]. On 10-Nov-2004, this variant of byte-mode packet drop | |||
| was made the default in the ns2 simulator. | ||||
| More recently, two drafts have proposed changes to TCP that make it | ||||
| more robust against losing small control packets | ||||
| [I-D.ietf-tcpm-ecnsyn] [I-D.floyd-tcpm-ackcc]. In both cases they | ||||
| note that the case for these TCP changes would be weaker if RED were | ||||
| biased against dropping small packets. We argue here that these two | ||||
| proposals are a safer and more principled way to achieve TCP | ||||
| performance improvements than reverse engineering RED to benefit TCP. | ||||
| 6.2.2. Transport Bias when Decoding | 6.2.2. Transport Bias when Decoding | |||
| The above proposals to alter the network layer to fix TCP's | The above proposals to alter the network layer to fix TCP's | |||
| insensitivity to segment size have largely carried on outside the | insensitivity to segment size have largely carried on outside the | |||
| IETF process (unless one counts a reference in an informational RFC | IETF process (unless one counts a reference in an informational RFC | |||
| to an archived email!). | to an archived email!). | |||
| However, a recently approved experimental RFC adapts its transport | Within the IETF, a recently approved experimental RFC adapts its | |||
| layer protocol to take account of packet sizes relative to typical | transport layer protocol to take account of packet sizes relative to | |||
| TCP packet sizes. This proposes a new small-packet variant of TCP- | typical TCP packet sizes. This proposes a new small-packet variant | |||
| friendly rate control [RFC3448] called TFRC-SP [RFC4828]. | of TCP-friendly rate control [RFC3448] called TFRC-SP [RFC4828]. | |||
| Essentially, it proposes a rate equation that inflates the flow rate | Essentially, it proposes a rate equation that inflates the flow rate | |||
| by the ratio of a typical TCP segment size (1500B including TCP | by the ratio of a typical TCP segment size (1500B including TCP | |||
| header) over the actual segment size [PktSizeEquCC]. There are also | header) over the actual segment size [PktSizeEquCC]. (There are also | |||
| other important differences of detail relative to TFRC, such as using | other important differences of detail relative to TFRC, such as using | |||
| virtual packets [CCvarPktSize] to avoid responding to multiple losses | virtual packets [CCvarPktSize] to avoid responding to multiple losses | |||
| per round trip and using a minimum inter-packet interval. | per round trip and using a minimum inter-packet interval.) | |||
| Section 4.5.1 of this TFRC-SP spec discusses the implications of | Section 4.5.1 of this TFRC-SP spec discusses the implications of | |||
| operating in an environment where routers have been configured to | operating in an environment where routers have been configured to | |||
| drop smaller packets with proportionately lower probability than | drop smaller packets with proportionately lower probability than | |||
| larger ones. But surprisingly, it only discusses TCP operating in | larger ones. But surprisingly, it only discusses TCP operating in | |||
| such an environment, only mentioning TFRC-SP briefly when discussing | such an environment, only mentioning TFRC-SP briefly when discussing | |||
| how to define fairness with TCP. And it only discusses the byte-mode | how to define fairness with TCP. And it only discusses the byte-mode | |||
| dropping version of RED as it was before Cnodder et al pointed out it | dropping version of RED as it was before Cnodder et al pointed out it | |||
| didn't sufficiently bias towards small packets to make TCP | didn't sufficiently bias towards small packets to make TCP | |||
| independent of packet size. | independent of packet size. | |||
| skipping to change at page 13, line 41 | skipping to change at page 13, line 52 | |||
| The paper originally proposing TFRC with virtual packets (VP-TFRC) | The paper originally proposing TFRC with virtual packets (VP-TFRC) | |||
| [CCvarPktSize] proposed that there should perhaps be two variants to | [CCvarPktSize] proposed that there should perhaps be two variants to | |||
| cater for the different variants of RED. However, as the TFRC-SP | cater for the different variants of RED. However, as the TFRC-SP | |||
| authors point out, there is no way for a transport to know whether | authors point out, there is no way for a transport to know whether | |||
| some queues on its path have deployed RED with byte-mode packet drop | some queues on its path have deployed RED with byte-mode packet drop | |||
| (except if an exhaustive survey found that no-one has deployed it!-- | (except if an exhaustive survey found that no-one has deployed it!-- | |||
| see Section 6.2.3). Incidentally, VP-TFRC also proposed that byte- | see Section 6.2.3). Incidentally, VP-TFRC also proposed that byte- | |||
| mode RED dropping should really square the packet size compensation | mode RED dropping should really square the packet size compensation | |||
| factor (like that of RED_5, but apparently unaware of it). | factor (like that of RED_5, but apparently unaware of it). | |||
| Pre-congestion notification [PCN] is a proposal to use a virtual | Pre-congestion notification [I-D.ietf-pcn-architecture] is a proposal | |||
| queue for AQM marking for packets within one Diffserv class in order | to use a virtual queue for AQM marking for packets within one | |||
| to give early warning prior to any real queuing. The proposed PCN | Diffserv class in order to give early warning prior to any real | |||
| marking algorithms have been designed not to take account of packet | queuing. The proposed PCN marking algorithms have been designed not | |||
| size on routers. Instead the general principle has been to take | to take account of packet size on routers. Instead the general | |||
| account of the sizes of marked packets when monitoring the fraction | principle has been to take account of the sizes of marked packets | |||
| of marking at the edge of the network. | when monitoring the fraction of marking at the edge of the network. | |||
| 6.2.3. Congestion Coding: Summary of Status | 6.2.3. Congestion Coding: Summary of Status | |||
| +-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | | | transport | RED_1 (packet | RED_4 (linear | RED_5 (square byte | | |||
| | cc | mode drop) | byte mode drop) | mode drop) | | | cc | mode drop) | byte mode drop) | mode drop) | | |||
| +-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | | TCP or | s/sqrt(p) | sqrt(s/p) | 1/sqrt(p) | | |||
| | TFRC | | | | | | TFRC | | | | | |||
| | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | | | TFRC-SP | 1/sqrt(p) | 1/sqrt(sp) | 1/(s.sqrt(p)) | | |||
| +-----------+----------------+-----------------+--------------------+ | +-----------+----------------+-----------------+--------------------+ | |||
| Table 1: Dependence of flow bit-rate per RTT on packet size s and | Table 1: Dependence of flow bit-rate per RTT on packet size s and | |||
| drop rate p when network and/or transport bias towards small packets | drop rate p when network and/or transport bias towards small packets | |||
| to varying degrees | to varying degrees | |||
| Table 1 aims to summarise the positions we may now be in. Each | Table 1 aims to summarise the positions we may now be in. Each | |||
| column shows a different possible AQM behaviour in the network, using | column shows a different possible AQM behaviour on different routers | |||
| the terminology of Cnodder et al outlined earlier (RED_1 is basic RED | in the network, using the terminology of Cnodder et al outlined | |||
| with packet-mode drop). Each row shows a different transport | earlier (RED_1 is basic RED with packet-mode drop). Each row shows a | |||
| behaviour: TCP [RFC2581] and TFRC [RFC3448] on the top row with | different transport behaviour: TCP [RFC2581] and TFRC [RFC3448] on | |||
| TFRC-SP [RFC4828] below. Suppressing all inessential details, the | the top row with TFRC-SP [RFC4828] below. Suppressing all | |||
| table shows that independence from packet size should either be | inessential details, the table shows that independence from packet | |||
| achievable by not altering the TCP transport in a RED_5 network, or | size should either be achievable by not altering the TCP transport in | |||
| using the small packet TFRC-SP transport in a network without any | a RED_5 network, or using the small packet TFRC-SP transport in a | |||
| byte-mode dropping RED (top right and bottom left). Top left is the | network without any byte-mode dropping RED (top right and bottom | |||
| `do nothing' scenario, while bottom right is the `do-both' scenario | left). Top left is the `do nothing' scenario, while bottom right is | |||
| in which bit-rate would become far too biased towards small packets. | the `do-both' scenario in which bit-rate would become far too biased | |||
| Of course, if any form of byte-mode dropping RED has been deployed on | towards small packets. Of course, if any form of byte-mode dropping | |||
| some congested routers, each path will present a different hybrid | RED has been deployed on a selection of congested routers, each path | |||
| scenario to its transport. | will present a different hybrid scenario to its transport. | |||
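
The entries in Table 1 can be reproduced with a short calculation. The following Python sketch is illustrative only and is not part of either draft version. It assumes that packet size s is normalised to the maximum packet size (so 0 < s <= 1), that linear and square byte-mode drop scale the drop probability seen by a packet by s and s^2 respectively, and that all constant factors can be ignored, as the table does. The names `AQM_SIZE_EXPONENT` and `relative_bit_rate` are hypothetical.

```python
from math import sqrt

# Assumed model: the drop probability seen by a packet of normalised size s
# is p * s**k, where k depends on the AQM variant (column of Table 1).
AQM_SIZE_EXPONENT = {
    "RED_1": 0,  # packet-mode drop: independent of packet size
    "RED_4": 1,  # linear byte-mode drop
    "RED_5": 2,  # square byte-mode drop
}


def relative_bit_rate(transport, aqm, s, p):
    """Return the bit-rate dependence on s and p, ignoring constants.

    transport: "TCP" (also covers TFRC) or "TFRC-SP"
    aqm:       "RED_1", "RED_4" or "RED_5"
    s:         packet size as a fraction of the maximum packet size
    p:         drop probability for a maximum-sized packet
    """
    p_seen = p * s ** AQM_SIZE_EXPONENT[aqm]
    if transport == "TCP":
        # Packet rate goes as 1/sqrt(p_seen), so bit rate goes as s/sqrt(p_seen).
        return s / sqrt(p_seen)
    if transport == "TFRC-SP":
        # TFRC-SP targets the bit rate of a large-packet flow: 1/sqrt(p_seen).
        return 1.0 / sqrt(p_seen)
    raise ValueError("unknown transport: " + transport)


if __name__ == "__main__":
    small, large, p = 60 / 1500, 1.0, 0.01
    for transport in ("TCP", "TFRC-SP"):
        for aqm in ("RED_1", "RED_4", "RED_5"):
            ratio = (relative_bit_rate(transport, aqm, small, p)
                     / relative_bit_rate(transport, aqm, large, p))
            print(transport, aqm, "small/large bit-rate ratio:", ratio)
```

Under these assumptions the small/large ratio comes out close to 1 only for the top right (TCP with RED_5) and bottom left (TFRC-SP with RED_1) cells, which matches the two packet-size-independent cases identified in the text.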
| Either way, we can see that the linear byte-mode drop column in the | Either way, we can see that the linear byte-mode drop column in the | |||
| middle considerably complicates the Internet. It's a half-way house | middle considerably complicates the Internet. It's a half-way house | |||
| that doesn't bias enough towards small packets even if one believes | that doesn't bias enough towards small packets even if one believes | |||
| the network should be doing the biasing. We argue below that _all_ | the network should be doing the biasing. We argue below that _all_ | |||
| network layer bias towards small packets should be turned off--if | network layer bias towards small packets should be turned off--if | |||
| indeed any router vendors have implemented it--leaving packet size | indeed any router vendors have implemented it--leaving packet size | |||
| bias solely as the preserve of the transport layer (solely the | bias solely as the preserve of the transport layer (solely the | |||
| leftmost, packet-mode drop column). | leftmost, packet-mode drop column). | |||
| A survey is being conducted of over a hundred vendors to assess how | A survey has been conducted of 84 vendors to assess how widely drop | |||
| widely drop probability based on packet size has been implemented in | probability based on packet size has been implemented in RED. Prior | |||
| RED. Prior to the survey, an individual approach to Cisco received | to the survey, an individual approach to Cisco received confirmation | |||
| confirmation that, having checked the codebase for each of the | that, having checked the code-base for each of the product ranges, | |||
| product ranges, Cisco has not implemented any discrimination based on | Cisco has not implemented any discrimination based on packet size in | |||
| packet size in any AQM algorithm in any of its products. Also an | any AQM algorithm in any of its products. Also an individual | |||
| individual approach to Alcatel-Lucent drew a confirmation that it was | approach to Alcatel-Lucent drew a confirmation that it was very | |||
| very likely that none of their products contained RED code that | likely that none of their products contained RED code that | |||
| implemented any packet-size bias. | implemented any packet-size bias. | |||
| Turning to our more formal survey, about 10% of those surveyed have | Turning to our more formal survey, about 19% of those surveyed have | |||
| replied so far, giving a sample size of only about a dozen. They | replied so far, giving a sample size of 16. Although we do not have | |||
| range across the large network equipment vendors at L3 & L2, firewall | permission to identify the respondents, we can say that those that | |||
| vendors, wireless equipment vendors, as well as large software | have responded include most of the larger vendors, covering a large | |||
| businesses with a small selection of networking products. So far all | fraction of the market. They range across the large network | |||
| have confirmed that they have not implemented the variant of RED with | equipment vendors at L3 & L2, firewall vendors, wireless equipment | |||
| drop dependent on packet size. Where reasons have been given, the | vendors, as well as large software businesses with a small selection | |||
| extra complexity of packet bias code has been most prevalent, though | of networking products. So far, all those who have responded have | |||
| one vendor had a more principled reason for avoiding it--similar to, | confirmed that they have not implemented the variant of RED with drop | |||
| but not the same as the argument of this document. We have | dependent on packet size (2 are fairly sure they haven't but need to | |||
| established that Linux does not implement RED with packet size drop | check more thoroughly). | |||
| bias, although we have not investigated a wider range of open source | ||||
| code. | ||||
| It is RECOMMENDED that adjusting drop probability relative to packet | ||||
| size (byte-mode dropping) SHOULD NOT be used in router AQM algorithms | ||||
| and SHOULD be turned off wherever it has been deployed. Note that | ||||
| RED as a whole SHOULD NOT be turned off, as without it, a drop tail | ||||
| queue also biases against large packets. Also note that turning off | ||||
| byte-mode may alter the relative performance of applications using | ||||
| different packet sizes, so it would be advisable to establish the | ||||
| implications before turning it off. | ||||
| Instead we argue that only transports, not AQM in the network, SHOULD | Where reasons have been given, the extra complexity of packet bias | |||
| make allowance for the size of dropped or marked packets. If a | code has been most prevalent, though one vendor had a more principled | |||
| transport protocol doesn't take account of packet size when | reason for avoiding it--similar to the argument of this document. We | |||
| controlling the rate of a flow, it SHOULD be corrected in that | have established that Linux does not implement RED with packet size | |||
| transport protocol. No matter how predominant a transport protocol | drop bias, although we have not investigated a wider range of open | |||
| is (even if it's TCP), trying to correct for its failings in the | source code. | |||
| network layer creates a perverse incentive to break down all flows | ||||
| from all transports into tiny segments. | ||||
| 7. Outstanding Issues and Next Steps | 7. Outstanding Issues and Next Steps | |||
| 7.1. Bit-congestible World | 7.1. Bit-congestible World | |||
| For a connectionless network with only bit-congestible resources we | For a connectionless network with only bit-congestible resources we | |||
| believe the recommended position is now unarguably clear--that the | believe the recommended position is now unarguably clear--that the | |||
| network should not make allowance for packet sizes and the transport | network should not make allowance for packet sizes and the transport | |||
| should. This leaves two outstanding issues: | should. This leaves two outstanding issues: | |||
| o How to handle any legacy of AQM with byte-mode drop already | o How to handle any legacy of AQM with byte-mode drop already | |||
| deployed; | deployed; | |||
| o The need to start a programme to update transport congestion | o The need to start a programme to update transport congestion | |||
| control protocol standards to take account of packet size. | control protocol standards to take account of packet size. | |||
| The sample of returns from our vendor survey (Section 6.2.3) suggests | The sample of returns from our vendor survey (Section 6.2.3) suggests | |||
| that byte-mode packet drop seems not to be implemented at all, let | that byte-mode packet drop seems not to be implemented at all, let | |||
| alone deployed, or if it is, deployment is likely to be very sparse. | alone deployed, or if it is, deployment is likely to be very sparse. | |||
| Therefore, we do not really need a migration strategy from nearly | Therefore, we do not really need a migration strategy from all but | |||
| nothing to nothing. | nothing to nothing. | |||
| A programme of standards updates to take account of packet size in | A programme of standards updates to take account of packet size in | |||
| transport congestion control protocols has started with TFRC-SP | transport congestion control protocols has started with TFRC-SP | |||
| [RFC4828], while weighted TCPs implemented in the research community | [RFC4828], while weighted TCPs implemented in the research community | |||
| [MulTCP][WindowPropFair] could form the basis of a future change to | [WindowPropFair] could form the basis of a future change to TCP | |||
| TCP congestion control [RFC2581] itself. | congestion control [RFC2581] itself. | |||
| 7.2. Bit- & Packet-congestible World | 7.2. Bit- & Packet-congestible World | |||
| Nonetheless, a connectionless network with both bit-congestible and | Nonetheless, a connectionless network with both bit-congestible and | |||
| packet-congestible resources is a different matter. If we believe we | packet-congestible resources is a different matter. If we believe we | |||
| should allow for this possibility in the future, this space contains | should allow for this possibility in the future, this space contains | |||
| a truly open research issue. | a truly open research issue. | |||
| The idealised wire protocol coding described in Section 5 requires at | The idealised wire protocol coding described in Section 5 requires at | |||
| least two flags for congestion of bit-congestible and packet- | least two flags for congestion of bit-congestible and packet- | |||
| congestible resources. This hides a fundamental problem--much more | congestible resources. This hides a fundamental problem--much more | |||
| fundamental than whether we can magically create header space for yet | fundamental than whether we can magically create header space for yet | |||
| another ECN flag in IPv4, or whether it would work while being | another ECN flag in IPv4, or whether it would work while being | |||
| deployed incrementally. A congestion notification protocol must | deployed incrementally. A congestion notification protocol must | |||
| survive a transition from low levels of congestion to high. Marking | survive a transition from low levels of congestion to high. Marking | |||
| two states is feasible with explicit marking, but much harder if | two states is feasible with explicit marking, but much harder if | |||
| packets are dropped. Also, it will not always be cost-effective to | packets are dropped. Also, it will not always be cost-effective to | |||
| implement AQM at every low level resource, so drop will often have to | implement AQM at every low level resource, so drop will often have to | |||
| suffice. Distinguishing drop from delivery naturally provides just | suffice. Distinguishing drop from delivery naturally provides just | |||
| one congestion flag--it is hard to drop a packet in two ways that are | one congestion flag--it is hard to drop a packet in two ways that are | |||
| distinguishable remotely. This is the same problem we have | distinguishable remotely. This is a similar problem to that of | |||
| distinguishing wireless transmission losses from congestive losses. | distinguishing wireless transmission losses from congestive losses. | |||
| We should also note that, strictly, packet-congestible resources are | We should also note that, strictly, packet-congestible resources are | |||
| actually cycle-congestible because load also depends on the | actually cycle-congestible because load also depends on the | |||
| complexity of each look-up and whether the pattern of arrivals is | complexity of each look-up and whether the pattern of arrivals is | |||
| amenable to caching or not. Further, this reminds us that any | amenable to caching or not. Further, this reminds us that any | |||
| solution must not require a forwarding engine to use excessive | solution must not require a forwarding engine to use excessive | |||
| processor cycles in order to decide how to say it has no spare | processor cycles in order to decide how to say it has no spare | |||
| processor cycles. | processor cycles. | |||
| The problem of signalling packet processing congestion is not | The problem of signalling packet processing congestion is not | |||
| pressing, as most if not all Internet resources are designed to be | pressing, as most if not all Internet resources are designed to be | |||
| bit-congestible before packet processing starts to congest. However, | bit-congestible before packet processing starts to congest. However, | |||
| given the task is to reach consensus on generic router mechanisms | given the IRTF ICCRG has set itself the task of reaching consensus on | |||
| that are necessary and sufficient to support the Internet's future | generic router mechanisms that are necessary and sufficient to | |||
| congestion control requirements, we must not give this problem no | support the Internet's future congestion control requirements | |||
| thought at all, just because it is hard and currently hypothetical. | [I-D.irtf-iccrg-welzl-congestion-control-open-research], we must not | |||
| give this problem no thought at all, just because it is hard and | ||||
| currently hypothetical. | ||||
| 8. Security Considerations | 8. Security Considerations | |||
| This draft recommends that routers do not bias drop probability | This draft recommends that queues do not bias drop probability | |||
| towards small packets as this creates a perverse incentive for | towards small packets as this creates a perverse incentive for | |||
| transports to break down their flows into tiny segments. Of course, | transports to break down their flows into tiny segments. One of the | |||
| this still involves transports being trusted to adjust their rate to | benefits of implementing AQM was meant to be to remove this perverse | |||
| take account of the size of dropped or marked packets. But, in the | incentive that drop-tail queues gave to small packets. Of course, if | |||
| current Internet architecture, transports are already trusted to act | transports really want to make the greatest gains, they don't have to | |||
| against their own interests by reducing their rate in response to | respond to congestion anyway. But we don't want applications that | |||
| congestion. Therefore at least this recommendation makes the problem | are trying to behave to discover that they can go faster by using | |||
| no worse. | smaller packets. | |||
| Much more importantly though, the ability of networks to police the | ||||
| response of _any_ transport to congestion depends on networks only | ||||
| doing packet-mode not byte-mode drop, as we will now try to explain. | ||||
| Byte-mode drop was originally proposed alongside a RED-based approach | ||||
| to policing unusually high rate TCP flows [pBox] that has spawned | ||||
| other similar approaches in the research community. The idea was to | ||||
| place this policing function at any potential bottleneck. It was | ||||
| crafted specifically around policing the bit-rate (not packet rate) | ||||
| of TCP or TCP-friendly flows, by using its knowledge of its own local | ||||
| MTU. If these bottleneck TCP policers were effective against | ||||
| cheating (which [Re-TCP] has shown they are not), they would end up | ||||
| embedding a TCP-fairness policy throughout the network layer. | ||||
| [I-D.briscoe-tsvarea-fair] has recently shown that TCP fairness is an | ||||
| insufficient basis for judging fairness because (amongst other | ||||
| criticisms) it is instantaneous, myopically not taking account of | ||||
| which individuals have congested resources more over time. If | ||||
| fairness did take account of factors like duration, instantaneous | ||||
| flow rates would necessarily have to be very _unequal_ to be fair. | ||||
| So if TCP-fairness were to be embedded throughout the network layer, | ||||
| it would prevent these highly unequal rate allocations that would be | ||||
| essential for improving fairness. | ||||
| So far, the argument goes that we will need transports that are not | ||||
| TCP-`fair' in order to be more truly fair. So far this is only an | ||||
| argument against bottleneck TCP-policers, not against byte-mode | ||||
| packet drop. | ||||
| The argument continues that, to be able to police a transport's | ||||
| response to congestion when fairness can only be judged over time and | ||||
| over all an individual's flows, the policer has to have an integrated | ||||
| view of all the congestion an individual (not just one flow) is | ||||
| causing due to all traffic entering the Internet from that | ||||
| individual. | ||||
| But with byte-mode drop, one marked packet is not necessarily | ||||
| equivalent to another unless you know the MTU that caused it to be | ||||
| marked. If congestion policing has to be located at an individual's | ||||
| attachment point to the Internet, it cannot know the MTU of each | ||||
| remote router that caused each mark. Therefore it cannot take an | ||||
| integrated approach to policing all the responses to congestion of | ||||
| all the transports of one individual. Therefore it cannot police any | ||||
| of the flows. | ||||
| That has been quite a specialised although strong argument against | ||||
| byte-mode drop. The security/incentive argument _for_ packet-mode | ||||
| drop is similar. | ||||
| Firstly, confining RED to packet-mode drop would not preclude | ||||
| bottleneck policing approaches such as [pBox] as it seems likely they | ||||
| could work just as well by monitoring the volume of dropped bytes | ||||
| rather than packets. | ||||
| Secondly, packet-mode drop naturally allows the congestion marking on | ||||
| packets to be globally meaningful without relying on information held | ||||
| elsewhere. Given this congestion marking has an economic | ||||
| interpretation, it can be used as part of a globally distributed | ||||
| incentive system to ensure the parties responsible for congestion can | ||||
| be made accountable for it. | ||||
| Such a system has recently been proposed based on a protocol called | In practice, transports cannot all be trusted to respond to | |||
| re-ECN [Re-TCP]. Re-ECN was designed to be robust to the self- | congestion. So another reason for recommending that queues do not | |||
| interest of the different parties providing and using the Internet, | bias drop probability towards small packets is to avoid the | |||
| based on this economic interpretation of congestion. Re-ECN policers | vulnerability to small packet DDoS attacks that would otherwise | |||
| are specifically designed to allow evolution of new congestion | result. One of the benefits of implementing AQM was meant to be to | |||
| control protocols operating across multiple domains by confining | remove drop-tail's DoS vulnerability to small packets, so we | |||
| policing to the extreme edges of the Internet. | shouldn't add it back again. | |||
| Because a marked packet is taken to mean all the bytes in the packet | If most queues implemented AQM with byte-mode drop, the resulting | |||
| are congestion marked, the re-ECN system remains robust against bits | network would amplify the potency of a small packet DDoS attack. At | |||
| being re-divided into different size packets or across different size | the first queue the stream of packets would push aside a greater | |||
| flows [I-D.briscoe-tsvarea-fair]. Therefore it works naturally with | proportion of large packets, so more of the small packets would | |||
| just simple packet-mode drop in RED. | survive to attack the next queue. Thus a flood of small packets | |||
| would continue on towards the destination, pushing regular traffic | ||||
| with large packets out of the way in one queue after the next, but | ||||
| suffering much less drop itself. | ||||
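
The amplification effect described above can be illustrated numerically. The following Python sketch is a simplified expected-value model, not a simulation of RED itself: it assumes every queue on the path applies the same drop probability to full-sized packets, that linear byte-mode drop scales this probability by packet size divided by MTU, and that drops at successive queues are independent. The 25% drop probability is an arbitrary illustrative figure, and the function names are hypothetical.

```python
def byte_mode_drop_prob(p_large, size_bytes, mtu_bytes):
    """Linear byte-mode drop: scale the full-size drop probability by size/MTU."""
    return p_large * size_bytes / mtu_bytes


def surviving_fraction(drop_prob, hops):
    """Expected fraction of a packet stream left after `hops` identical queues."""
    return (1.0 - drop_prob) ** hops


if __name__ == "__main__":
    p_large = 0.25           # assumed drop probability for 1500B packets
    small, mtu = 60, 1500    # packet sizes in bytes
    p_small = byte_mode_drop_prob(p_large, small, mtu)
    for hops in (1, 3, 5):
        print("hops:", hops,
              "large packets surviving:", surviving_fraction(p_large, hops),
              "small packets surviving (byte-mode):", surviving_fraction(p_small, hops),
              "small packets surviving (packet-mode):", surviving_fraction(p_large, hops))
```

With packet-mode drop, both streams are thinned at the same rate at every queue. With byte-mode drop, the small-packet stream loses far less at each hop, so the gap between the two widens as the path gets longer.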
| In summary, making drop probability depend on the size of the packets | Appendix C explains why the ability of networks to police the | |||
| that bits happen to be divided into simply encourages the bits to be | response of _any_ transport to congestion depends on bit-congestible | |||
| divided into smaller packets. Byte-mode drop would therefore | network resources only doing packet-mode not byte-mode drop. In | |||
| irreversibly complicate any attempt to fix the Internet's incentive | summary, it says that making drop probability depend on the size of | |||
| structures. | the packets that bits happen to be divided into simply encourages the | |||
| bits to be divided into smaller packets. Byte-mode drop would | ||||
| therefore irreversibly complicate any attempt to fix the Internet's | ||||
| incentive structures. | ||||
| 9. Conclusions | 9. Conclusions | |||
| The strong conclusion is that AQM algorithms such as RED SHOULD NOT | The strong conclusion is that AQM algorithms such as RED SHOULD NOT | |||
| use byte-mode drop. More generally, the Internet's congestion | use byte-mode drop. More generally, the Internet's congestion | |||
| notification protocols (drop and ECN) SHOULD take account of packet | notification protocols (drop and ECN) SHOULD take account of packet | |||
| size when the notification is read by the transport layer, NOT when | size when the notification is read by the transport layer, NOT when | |||
| it is written by the network layer. This approach offers sufficient | it is written by the network layer. This approach offers sufficient | |||
| and correct congestion information for all known and future transport | and correct congestion information for all known and future transport | |||
| protocols and also ensures no perverse incentives are created that | protocols and also ensures no perverse incentives are created that | |||
| would encourage transports to use inappropriately small packet sizes. | would encourage transports to use inappropriately small packet sizes. | |||
| The alternative of deflating RED's drop probability for smaller | The alternative of deflating RED's drop probability for smaller | |||
| packet sizes (byte-mode drop) has no enduring advantages. It is more | packet sizes (byte-mode drop) has no enduring advantages. It is more | |||
| complex and creates the perverse incentive to fragment segments into | complex, it creates the perverse incentive to fragment segments into | |||
| tiny pieces. It was proposed as a way for the network layer to make | tiny pieces and it reopens the vulnerability to floods of small | |||
| packets that drop-tail queues suffered from and AQM was designed to | ||||
| remove. Byte-mode drop is a change to the network layer that makes | ||||
| allowance for an omission from the design of TCP, effectively reverse | allowance for an omission from the design of TCP, effectively reverse | |||
| engineering the network layer to contrive to make TCPs with different | engineering the network layer to contrive to make two TCPs with | |||
| packet sizes run at equal bit rates (rather than packet rates) under | different packet sizes run at equal bit rates (rather than packet | |||
| the same path conditions. We SHOULD NOT hack the network layer to | rates) under the same path conditions. It also improves TCP | |||
| fix a problem with certain transport protocols, even one as prevalent | performance by reducing the chance that a SYN or a pure ACK will be | |||
| as TCP. | dropped, because they are small. But we SHOULD NOT hack the network | |||
| layer to improve or fix certain transport protocols. No matter how | ||||
| predominant a transport protocol is (even if it's TCP), trying to | ||||
| correct for its failings by biasing towards small packets in the | ||||
| network layer creates a perverse incentive to break down all flows | ||||
| from all transports into tiny segments. | ||||
| So far, our survey of over 100 vendors across the industry has drawn | So far, our survey of 84 vendors across the industry has drawn | |||
| responses from about 10%, none of whom have implemented the byte mode | responses from about 19%, none of whom have implemented the byte mode | |||
| packet drop variant of RED. | packet drop variant of RED. Given there appears to be little, if | |||
| any, installed base, recommending removal of byte-mode drop from RED | ||||
| is possibly only a paper exercise with few, if any, incremental | ||||
| deployment issues. | ||||
| If a vendor has implemented byte-mode drop, and an operator has | If a vendor has implemented byte-mode drop, and an operator has | |||
| turned it on, it is strongly RECOMMENDED that it SHOULD be turned | turned it on, it is strongly RECOMMENDED that it SHOULD be turned | |||
| off. Note that RED as a whole SHOULD NOT be turned off, as without | off. Note that RED as a whole SHOULD NOT be turned off, as without | |||
| it, a drop tail queue also biases against large packets. Turning off | it, a drop tail queue also biases against large packets. But note | |||
| byte-mode may alter the relative performance of applications using | also that turning off byte-mode may alter the relative performance of | |||
| different packet sizes, so it would be advisable to establish the | applications using different packet sizes, so it would be advisable | |||
| implications before turning it off. | to establish the implications before turning it off. | |||
| Instead, the IETF transport area should continue its programme of | Instead, the IETF transport area should continue its programme of | |||
| updating congestion control protocols to take account of packet size. | updating congestion control protocols to take account of packet size | |||
| and to make transports less sensitive to losing control packets like | ||||
| SYNs and pure ACKs. | ||||
| NOTE WELL that RED's byte-mode queue measurement is fine, being | NOTE WELL that RED's byte-mode queue measurement is fine, being | |||
| completely orthogonal to byte-mode drop. If a RED implementation has | completely orthogonal to byte-mode drop. If a RED implementation has | |||
| a byte-mode but does not specify what sort of byte-mode, it is most | a byte-mode but does not specify what sort of byte-mode, it is most | |||
| probably byte-mode queue measurement, which is fine. However, if in | probably byte-mode queue measurement, which is fine. However, if in | |||
| doubt, the vendor should be consulted. | doubt, the vendor should be consulted. | |||
| The above conclusions cater for the Internet as it is today with | The above conclusions cater for the Internet as it is today with | |||
| most, if not all, resources being primarily bit-congestible. A | most, if not all, resources being primarily bit-congestible. A | |||
| secondary conclusion of this memo is that we may see more packet- | secondary conclusion of this memo is that we may see more packet- | |||
| congestible resources in the future, so research may be needed to | congestible resources in the future, so research may be needed to | |||
| extend the Internet's congestion notification (drop or ECN) so that | extend the Internet's congestion notification (drop or ECN) so that | |||
| it can handle a mix of bit-congestible and packet-congestible | it can handle a mix of bit-congestible and packet-congestible | |||
| resources. | resources. | |||
| 10. Acknowledgements | 10. Acknowledgements | |||
| Sally Floyd and Arnaud Jacquet gave very useful review comments. | Thank you to Sally Floyd, who gave extensive and useful review | |||
| Bruce Davie and his colleagues provided a timely and efficient survey | comments. Also thanks for the reviews from Toby Moncaster and Arnaud | |||
| of RED implementation in Cisco's product range. Toby Moncaster, Will | Jacquet. I am grateful to Bruce Davie and his colleagues for | |||
| providing a timely and efficient survey of RED implementation in | ||||
| Cisco's product range. Also grateful thanks to Toby Moncaster, Will | ||||
| Dormann, John Regnault, Simon Carter and Stefaan De Cnodder further | Dormann, John Regnault, Simon Carter and Stefaan De Cnodder further | |||
| helped survey the current status of RED implementation and | helped survey the current status of RED implementation and deployment | |||
| deployment. | and, finally, thanks to the anonymous individuals who responded. | |||
| 11. Comments Solicited | 11. Comments Solicited | |||
| Comments and questions are encouraged and very welcome. They can be | Comments and questions are encouraged and very welcome. They can be | |||
| addressed to the IETF Transport Area working group mailing list | addressed to the IETF Transport Area working group mailing list | |||
| <tsvwg@ietf.org>, and/or to the authors. | <tsvwg@ietf.org>, and/or to the authors. | |||
| Editorial Comments | ||||
| [Note_Variation] The algorithm of the byte-mode drop variant of RED | ||||
| switches off any bias towards small packets | ||||
| whenever the smoothed queue length dictates that | ||||
| the drop probability of large packets should be | ||||
| 100%. In the example in the Introduction, as the | ||||
| large packet drop probability varies around 25% the | ||||
| small packet drop probability will vary around 1%, | ||||
| but with occasional jumps to 100% whenever the | ||||
| instantaneous queue (after drop) manages to sustain | ||||
| a length above the 100% drop point for longer than | ||||
| the queue averaging period. | ||||
| Appendix A. Example Scenarios | Appendix A. Example Scenarios | |||
| A.1. Notation | A.1. Notation | |||
| To prove the two sets of assertions in the idealised wire protocol | To prove the two sets of assertions in the idealised wire protocol | |||
| (Section 5) are true, we will compare two flows with different packet | (Section 5) are true, we will compare two flows with different packet | |||
| sizes, s_1 and s_2 [bit/pkt], to make sure their transports each see | sizes, s_1 and s_2 [bit/pkt], to make sure their transports each see | |||
| the correct congestion notification. Initially, within each flow we | the correct congestion notification. Initially, within each flow we | |||
| will take all packets as having equal sizes, but later we will | will take all packets as having equal sizes, but later we will | |||
| generalise to flows within which packet sizes vary. A flow's bit | generalise to flows within which packet sizes vary. A flow's bit | |||
| skipping to change at page 21, line 21 | skipping to change at page 20, line 37 | |||
| instance, a flow of 60B packets would have to send 25x more packets | instance, a flow of 60B packets would have to send 25x more packets | |||
| to achieve the same bit rate as a flow of 1500B packets. If a | to achieve the same bit rate as a flow of 1500B packets. If a | |||
| congested resource marks proportion p_b of packets irrespective of | congested resource marks proportion p_b of packets irrespective of | |||
| size, the ratio of marked packets received by each transport will | size, the ratio of marked packets received by each transport will | |||
| still be the same as the ratio of their packet rates, p_b.u_2/p_b.u_1 | still be the same as the ratio of their packet rates, p_b.u_2/p_b.u_1 | |||
| = s_1/s_2. So of the 25x more 60B packets sent, 25x more will be | = s_1/s_2. So of the 25x more 60B packets sent, 25x more will be | |||
| marked than in the 1500B packet flow, but 25x more won't be marked | marked than in the 1500B packet flow, but 25x more won't be marked | |||
| too. | too. | |||
| In this scenario, the resource is bit-congestible, so it always uses | In this scenario, the resource is bit-congestible, so it always uses | |||
| the bit-congestion field when it marks packets. Therefore the | our idealised bit-congestion field when it marks packets. Therefore | |||
| transport should count marked bytes not packets. But it doesn't | the transport should count marked bytes not packets. But it doesn't | |||
| actually matter. The ratio of marked to unmarked bytes seen by each | actually matter for ratio-based transports like TCP (Section 5). The | |||
| flow will be p_b, as will the ratio of marked to unmarked packets. | ratio of marked to unmarked bytes seen by each flow will be p_b, as | |||
| Because they are ratios (as used by TCP), the units cancel out. | will the ratio of marked to unmarked packets. Because they are | |||
| ratios, the units cancel out. | ||||
| If a flow sent an inconsistent mixture of packet sizes, we have said | If a flow sent an inconsistent mixture of packet sizes, we have said | |||
| it should count the ratio of marked and unmarked bytes not packets in | it should count the ratio of marked and unmarked bytes not packets in | |||
| order to correctly decode the level of congestion. But actually, if | order to correctly decode the level of congestion. But actually, if | |||
| all it is trying to do is decode p_b, it still doesn't matter. For | all it is trying to do is decode p_b, it still doesn't matter. For | |||
| instance, imagine the two equal bit rate flows were actually one flow | instance, imagine the two equal bit rate flows were actually one flow | |||
| at twice the bit rate sending a mixture of one 1500B packet for every | at twice the bit rate sending a mixture of one 1500B packet for every | |||
| twenty-five 60B packets. 25x more small packets will be marked and 25x | twenty-five 60B packets. 25x more small packets will be marked and 25x | |||
| more will be unmarked. The transport can still calculate p_b whether | more will be unmarked. The transport can still calculate p_b whether | |||
| it uses bytes or packets for the ratio. In general, for any | it uses bytes or packets for the ratio. In general, for any | |||
| algorithm which works on a ratio of marks to non-marks, either bytes | algorithm which works on a ratio of marks to non-marks, either bytes | |||
| or packets can be counted interchangeably, because the choice cancels | or packets can be counted interchangeably, because the choice cancels | |||
| out in the ratio calculation. | out in the ratio calculation. | |||
| However, where the absolute rather than relative volume of congestion | However, where an absolute target rather than relative volume of | |||
| caused is important, as it is for cost-fairness | congestion caused is important (Section 5), as it is for congestion | |||
| [I-D.briscoe-tsvarea-fair], the transport must count marked bytes not | accountability [Rate_fair_Dis], the transport must count marked bytes | |||
| packets, in this bit-congestible case. Aside from the goal of cost- | not packets, in this bit-congestible case. Aside from the goal of | |||
| fairness, this is how the bit rate of a transport can be made | congestion accountability, this is how the bit rate of a transport | |||
| independent of packet size; by ensuring the rate of congestion caused | can be made independent of packet size; by ensuring the rate of | |||
| is kept to a constant weight [WindowPropFair], rather than merely | congestion caused is kept to a constant weight [WindowPropFair], | |||
| responding to the ratio of marked and unmarked bytes. | rather than merely responding to the ratio of marked and unmarked | |||
| bytes. | ||||
| Note the unit of byte-congestion volume is the byte. | Note the unit of byte-congestion volume is the byte. | |||
| A.3. Bit-congestible resource, equal packet rates (Bi) | A.3. Bit-congestible resource, equal packet rates (Bi) | |||
| If two flows send different packet sizes but at the same packet rate, | If two flows send different packet sizes but at the same packet rate, | |||
| their bit rates will be in the same ratio as their packet sizes, x_2/ | their bit rates will be in the same ratio as their packet sizes, x_2/ | |||
| x_1 = s_2/s_1. For instance, a flow sending 1500B packets at the | x_1 = s_2/s_1. For instance, a flow sending 1500B packets at the | |||
| same packet rate as another sending 60B packets will be sending at | same packet rate as another sending 60B packets will be sending at | |||
| 25x greater bit rate. In this case, if a congested resource marks | 25x greater bit rate. In this case, if a congested resource marks | |||
| proportion p_b of packets irrespective of size, the ratio of packets | proportion p_b of packets irrespective of size, the ratio of packets | |||
| received with the byte-congestion field marked by each transport will | received with the byte-congestion field marked by each transport will | |||
| be the same, p_b.u_2/p_b.u_1 = 1. | be the same, p_b.u_2/p_b.u_1 = 1. | |||
| Because the byte-congestion field is marked, the transport should | Because the byte-congestion field is marked, the transport should | |||
| count marked bytes not packets. But because each flow sends | count marked bytes not packets. But because each flow sends | |||
| consistently sized packets it still doesn't matter. The ratio of | consistently sized packets it still doesn't matter for ratio-based | |||
| marked to unmarked bytes seen by each flow will be p_b, as will the | transports. The ratio of marked to unmarked bytes seen by each flow | |||
| ratio of marked to unmarked packets. Therefore, if the congestion | will be p_b, as will the ratio of marked to unmarked packets. | |||
| control algorithm is only concerned with the ratio of marked to | Therefore, if the congestion control algorithm is only concerned with | |||
| unmarked packets (as is TCP), both flows will be able to decode p_b | the ratio of marked to unmarked packets (as is TCP), both flows will | |||
| correctly whether they count packets or bytes. | be able to decode p_b correctly whether they count packets or bytes. | |||
| But if the absolute volume of congestion is important, as it is to | But if the absolute volume of congestion is important, e.g. for | |||
| achieve cost-fairness, the transport must count marked bytes not | congestion accountability, the transport must count marked bytes not | |||
| packets. Then the lower bit rate flow using smaller packets will | packets. Then the lower bit rate flow using smaller packets will | |||
| rightly be perceived as causing less byte-congestion even though its | rightly be perceived as causing less byte-congestion even though its | |||
| packet rate is the same. | packet rate is the same. | |||
| If the two flows are mixed into one, of bit rate x_1+x_2, with equal | If the two flows are mixed into one, of bit rate x_1+x_2, with equal | |||
| packet rates of each size packet, the ratio p_b will still be | packet rates of each size packet, the ratio p_b will still be | |||
| measurable by counting the ratio of marked to unmarked bytes (or | measurable by counting the ratio of marked to unmarked bytes (or | |||
| packets because the ratio cancels out the units). However, if the | packets because the ratio cancels out the units). However, if the | |||
| absolute volume of congestion is required, the transport must count | absolute volume of congestion is required, the transport must count | |||
| the sum of congestion marked bytes, which indeed gives a correct | the sum of congestion marked bytes, which indeed gives a correct | |||
| skipping to change at page 23, line 7 | skipping to change at page 22, line 24 | |||
| bit-congestible resource, the flow with smaller packets will have a | bit-congestible resource, the flow with smaller packets will have a | |||
| higher packet rate, so more packets will be both marked and unmarked, | higher packet rate, so more packets will be both marked and unmarked, | |||
| but in the same proportion. | but in the same proportion. | |||
| This time, the transport should only count marks without taking into | This time, the transport should only count marks without taking into | |||
| account packet sizes. Transports will get the same result, p_p, by | account packet sizes. Transports will get the same result, p_p, by | |||
| decoding the ratio of marked to unmarked packets in either flow. | decoding the ratio of marked to unmarked packets in either flow. | |||
| If one flow imitates the two flows but merged together, the bit rate | If one flow imitates the two flows but merged together, the bit rate | |||
| will double with more small packets than large. The ratio of marked | will double with more small packets than large. The ratio of marked | |||
| to unmarked packets will still be p_p. But if the absolute volume of | to unmarked packets will still be p_p. But if the absolute number of | |||
| pkt-congestion marked packets is counted it will accumulate at the | pkt-congestion marked packets is counted it will accumulate at the | |||
| combined packet rate times the marking probability, p_p(u_1+u_2), 26x | combined packet rate times the marking probability, p_p(u_1+u_2), 26x | |||
| faster than packet congestion accumulates in the single 1500B packet | faster than packet congestion accumulates in the single 1500B packet | |||
| flow of our example, as required. | flow of our example, as required. | |||
| But if the transport is interested in the absolute volume of packet | But if the transport is interested in the absolute number of packet | |||
| congestion, it should just count how many marked packets arrive. For | congestion, it should just count how many marked packets arrive. For | |||
| instance, a flow sending 60B packets will see 25x more marked packets | instance, a flow sending 60B packets will see 25x more marked packets | |||
| than one sending 1500B packets at the same bit rate, because it is | than one sending 1500B packets at the same bit rate, because it is | |||
| sending more packets through a packet-congestible resource. | sending more packets through a packet-congestible resource. | |||
| Note the unit of packet congestion is packets. | Note the unit of packet congestion is packets. | |||
| A.5. Pkt-congestible resource, equal packet rates (Bii) | A.5. Pkt-congestible resource, equal packet rates (Bii) | |||
| Finally, if two flows with the same packet rate pass through a | Finally, if two flows with the same packet rate pass through a | |||
| skipping to change at page 23, line 40 | skipping to change at page 23, line 10 | |||
| Even if the transport is monitoring the absolute amount of packet | Even if the transport is monitoring the absolute amount of packet | |||
| congestion over a period, still it will see the same amount of packet | congestion over a period, still it will see the same amount of packet | |||
| congestion from either flow. | congestion from either flow. | |||
| And if the two equal packet rates of different size packets are mixed | And if the two equal packet rates of different size packets are mixed | |||
| together in one flow, the packet rate will double, so the absolute | together in one flow, the packet rate will double, so the absolute | |||
| volume of packet-congestion will accumulate at twice the rate of | volume of packet-congestion will accumulate at twice the rate of | |||
| either flow, 2p_p.u_1 = p_p(u_1+u_2). | either flow, 2p_p.u_1 = p_p(u_1+u_2). | |||
| Appendix B. Congestion Notification Definition: Further Justification | ||||
| In Section 3 on the definition of congestion notification, load not | ||||
| capacity was used as the denominator. This also has a subtle | ||||
| significance in the related debate over the design of new transport | ||||
| protocols--typical new protocol designs (e.g. in XCP | ||||
| [I-D.falk-xcp-spec] & Quickstart [RFC4782]) expect the sending | ||||
| transport to communicate its desired flow rate to the network and | ||||
| network elements to progressively subtract from this so that the | ||||
| achievable flow rate emerges at the receiving transport. | ||||
| Congestion notification with total load in the denominator can serve | ||||
| a similar purpose (though in retrospect not in advance like XCP & | ||||
| QuickStart). Congestion notification is a dimensionless fraction but | ||||
| each source can extract necessary rate information from it because it | ||||
| already knows what its own rate is. Even though congestion | ||||
| notification doesn't communicate a rate explicitly, from each | ||||
| source's point of view congestion notification represents the | ||||
| fraction of the rate it was sending a round trip ago that couldn't | ||||
| (or wouldn't) be served by available resources. After they were | ||||
| sent, all these fractions of each source's offered load added up to | ||||
| the aggregate fraction of offered load seen by the congested | ||||
| resource. So, the source can also know the total excess rate by | ||||
| multiplying total load by congestion level. Therefore congestion | ||||
| notification, as one scale-free dimensionless fraction, implicitly | ||||
| communicates the instantaneous excess flow rate, albeit a RTT ago. | ||||
| Appendix C. Byte-mode Drop Complicates Policing Congestion Response | ||||
| This appendix explains why the ability of networks to police the | ||||
| response of _any_ transport to congestion depends on bit-congestible | ||||
| network resources only doing packet-mode not byte-mode drop. | ||||
| To be able to police a transport's response to congestion when | ||||
| fairness can only be judged over time and over all of an individual's | ||||
| flows, the policer has to have an integrated view of all the | ||||
| congestion an individual (not just one flow) has caused due to all | ||||
| traffic entering the Internet from that individual. This is termed | ||||
| congestion accountability. | ||||
| But with byte-mode drop, one dropped or marked packet is not | ||||
| necessarily equivalent to another unless you know the MTU that caused | ||||
| it to be dropped/marked. To have an integrated view of a user, we | ||||
| believe congestion policing has to be located at an individual's | ||||
| attachment point to the Internet [Re-TCP]. But from there it cannot | ||||
| know the MTU of each remote router that caused each mark. Therefore | ||||
| it cannot take an integrated approach to policing all the responses | ||||
| to congestion of all the transports of one individual. Therefore it | ||||
| cannot police anything. | ||||
| The security/incentive argument _for_ packet-mode drop is similar. | ||||
| Firstly, confining RED to packet-mode drop would not preclude | ||||
| bottleneck policing approaches such as [pBox] as it seems likely they | ||||
| could work just as well by monitoring the volume of dropped bytes | ||||
| rather than packets. Secondly, packet-mode marking naturally allows | ||||
| the congestion marking on packets to be globally meaningful without | ||||
| relying on MTU information held elsewhere. | ||||
| Because we recommend that a marked packet should be taken to mean | ||||
| that all the bytes in the packet are congestion marked, a policer can | ||||
| remain robust against bits being re-divided into different size | ||||
| packets or across different size flows [Rate_fair_Dis]. Therefore | ||||
| policing would work naturally with just simple packet-mode drop in | ||||
| RED. | ||||
| In summary, making drop probability depend on the size of the packets | ||||
| that bits happen to be divided into simply encourages the bits to be | ||||
| divided into smaller packets. Byte-mode drop would therefore | ||||
| irreversibly complicate any attempt to fix the Internet's incentive | ||||
| structures. | ||||
| Changes from Previous Versions | ||||
| To be removed by the RFC Editor on publication. | ||||
| From -00 to -01: | ||||
| Clarified applicability to drop as well as ECN. | ||||
| Highlighted DoS vulnerability. | ||||
| Emphasised that drop-tail suffers from similar problems to | ||||
| byte-mode drop, so only byte-mode drop should be turned off, | ||||
| not RED itself. | ||||
| Clarified the original apparent motivations for recommending | ||||
| byte-mode drop included protecting SYNs and pure ACKs more than | ||||
| equalising the bit rates of TCPs with different segment sizes. | ||||
| Removed some conjectured motivations. | ||||
| Added support for updates to TCP in progress (ackcc & ecn-syn- | ||||
| ack). | ||||
| Updated survey results with newly arrived data. | ||||
| Pulled all recommendations together into the conclusions. | ||||
| Moved some detailed points into two additional appendices and a | ||||
| note. | ||||
| Considerable clarifications throughout. | ||||
| Updated references. | ||||
| 12. References | 12. References | |||
| 12.1. Normative References | 12.1. Normative References | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, March 1997. | Requirement Levels", BCP 14, RFC 2119, March 1997. | |||
| [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | [RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, | |||
| S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | S., Estrin, D., Floyd, S., Jacobson, V., Minshall, G., | |||
| Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | Partridge, C., Peterson, L., Ramakrishnan, K., Shenker, | |||
| skipping to change at page 24, line 43 | skipping to change at page 26, line 24 | |||
| Siris, V., "Resource Control for Elastic Traffic in CDMA | Siris, V., "Resource Control for Elastic Traffic in CDMA | |||
| Networks", Proc. ACM MOBICOM'02, September 2002, <http:// | Networks", Proc. ACM MOBICOM'02, September 2002, <http:// | |||
| www.ics.forth.gr/netlab/publications/ | www.ics.forth.gr/netlab/publications/ | |||
| resource_control_elastic_cdma.html>. | resource_control_elastic_cdma.html>. | |||
| [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | [Evol_cc] Gibbens, R. and F. Kelly, "Resource pricing and the | |||
| evolution of congestion control", Automatica 35(12)1969-- | evolution of congestion control", Automatica 35(12)1969-- | |||
| 1985, December 1999, | 1985, December 1999, | |||
| <http://www.statslab.cam.ac.uk/~frank/evol.html>. | <http://www.statslab.cam.ac.uk/~frank/evol.html>. | |||
| [I-D.briscoe-tsvarea-fair] | ||||
| Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", | ||||
| draft-briscoe-tsvarea-fair-01 (work in progress), | ||||
| March 2007. | ||||
| [I-D.falk-xcp-spec] | [I-D.falk-xcp-spec] | |||
| Falk, A., "Specification for the Explicit Control Protocol | Falk, A., "Specification for the Explicit Control Protocol | |||
| (XCP)", draft-falk-xcp-spec-02 (work in progress), | (XCP)", draft-falk-xcp-spec-03 (work in progress), | |||
| November 2006. | July 2007. | |||
| [I-D.floyd-tcpm-ackcc] | ||||
| Floyd, S. and I. Property, "Adding Acknowledgement | ||||
| Congestion Control to TCP", draft-floyd-tcpm-ackcc-02 | ||||
| (work in progress), November 2007. | ||||
| [I-D.ietf-pcn-architecture] | ||||
| Eardley, P., "Pre-Congestion Notification Architecture", | ||||
| draft-ietf-pcn-architecture-01 (work in progress), | ||||
| October 2007. | ||||
| [I-D.ietf-tcpm-ecnsyn] | ||||
| Floyd, S. and I. Property, "Adding Explicit Congestion | ||||
| Notification (ECN) Capability to TCP's SYN/ACK Packets", | ||||
| draft-ietf-tcpm-ecnsyn-03 (work in progress), | ||||
| November 2007. | ||||
| [I-D.ietf-tcpm-rfc2581bis] | [I-D.ietf-tcpm-rfc2581bis] | |||
| Allman, M., "TCP Congestion Control", | Allman, M., "TCP Congestion Control", | |||
| draft-ietf-tcpm-rfc2581bis-02 (work in progress), | draft-ietf-tcpm-rfc2581bis-03 (work in progress), | |||
| February 2007. | September 2007. | |||
| [I-D.irtf-iccrg-welzl-congestion-control-open-research] | ||||
| Papadimitriou, D., "Open Research Issues in Internet | ||||
| Congestion Control", | ||||
| (work in progress), July 2007. | ||||
| [MulTCP] Crowcroft, J. and Ph. Oechslin, "Differentiated End to End | [MulTCP] Crowcroft, J. and Ph. Oechslin, "Differentiated End to End | |||
| Internet Services using a Weighted Proportional Fair | Internet Services using a Weighted Proportional Fair | |||
| Sharing TCP", CCR 28(3) 53--69, July 1998, <http:// | Sharing TCP", CCR 28(3) 53--69, July 1998, <http:// | |||
| www.cs.ucl.ac.uk/staff/J.Crowcroft/hipparch/pricing.html>. | www.cs.ucl.ac.uk/staff/J.Crowcroft/hipparch/pricing.html>. | |||
| [PCN] Briscoe, B., Eardley, P., Songhurst, D., Le Faucheur, F., | ||||
| Charny, A., Liatsos, V., Babiarz, J., Chan, K., Dudley, | ||||
| S., Westberg, L., Bader, A., and G. Karagiannis, "Pre- | ||||
| Congestion Notification Marking", | ||||
| draft-briscoe-tsvwg-cl-phb-03 (work in progress), | ||||
| October 2006. | ||||
| [PCNcharter] | [PCNcharter] | |||
| IETF, "Congestion and Pre-Congestion Notification (pcn)", | IETF, "Congestion and Pre-Congestion Notification (pcn)", | |||
| IETF w-g charter, Feb 2007, | IETF w-g charter, Feb 2007, | |||
| <http://www.ietf.org/html.charters/pcn-charter.html>. | <http://www.ietf.org/html.charters/pcn-charter.html>. | |||
| [PktSizeEquCC] | [PktSizeEquCC] | |||
| Vasallo, P., "Variable Packet Size Equation-Based | Vasallo, P., "Variable Packet Size Equation-Based | |||
| Congestion Control", ICSI Technical Report tr-00-008, | Congestion Control", ICSI Technical Report tr-00-008, | |||
| 2000, <http://http.icsi.berkeley.edu/ftp/global/pub/ | 2000, <http://http.icsi.berkeley.edu/ftp/global/pub/ | |||
| techreports/2000/tr-00-008.pdf>. | techreports/2000/tr-00-008.pdf>. | |||
| skipping to change at page 26, line 6 | skipping to change at page 27, line 44 | |||
| Computers and Communications (ISCC) 793--799, July 2000, | Computers and Communications (ISCC) 793--799, July 2000, | |||
| <http://www.icir.org/floyd/red/Elloumi99.pdf>. | <http://www.icir.org/floyd/red/Elloumi99.pdf>. | |||
| [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | [RFC3714] Floyd, S. and J. Kempf, "IAB Concerns Regarding Congestion | |||
| Control for Voice Traffic in the Internet", RFC 3714, | Control for Voice Traffic in the Internet", RFC 3714, | |||
| March 2004. | March 2004. | |||
| [RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick- | [RFC4782] Floyd, S., Allman, M., Jain, A., and P. Sarolahti, "Quick- | |||
| Start for TCP and IP", RFC 4782, January 2007. | Start for TCP and IP", RFC 4782, January 2007. | |||
| [Re-TCP] Briscoe, B., Jacquet, A., Salvatori, A., and M. Koyabi, | [Rate_fair_Dis] | |||
| "Re-ECN: Adding Accountability for Causing Congestion to | Briscoe, B., "Flow Rate Fairness: Dismantling a Religion", | |||
| TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-03 (work in | ACM CCR 37(2)63--74, April 2007, | |||
| progress), October 2006. | <http://portal.acm.org/citation.cfm?id=1232926>. | |||
| [Re-TCP] Briscoe, B., Jacquet, A., Salvatori, A., Koyabi, M., and | ||||
| T. Moncaster, "Re-ECN: Adding Accountability for Causing | ||||
| Congestion to TCP/IP", draft-briscoe-tsvwg-re-ecn-tcp-04 | ||||
| (work in progress), July 2007. | ||||
| [WindowPropFair] | [WindowPropFair] | |||
| Siris, V., "Service Differentiation and Performance of | Siris, V., "Service Differentiation and Performance of | |||
| Weighted Window-Based Congestion Control and Packet | Weighted Window-Based Congestion Control and Packet | |||
| Marking Algorithms in ECN Networks", Computer | Marking Algorithms in ECN Networks", Computer | |||
| Communications 26(4) 314--326, 2002, <http:// | Communications 26(4) 314--326, 2002, <http:// | |||
| www.ics.forth.gr/netgroup/publications/ | www.ics.forth.gr/netgroup/publications/ | |||
| weighted_window_control.html>. | weighted_window_control.html>. | |||
| [gentle_RED] | ||||
| Floyd, S., "Recommendation on using the "gentle_" variant | ||||
| of RED", Web page, March 2000, | ||||
| <http://www.icir.org/floyd/red/gentle.html>. | ||||
| [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | [pBox] Floyd, S. and K. Fall, "Promoting the Use of End-to-End | |||
| Congestion Control in the Internet", IEEE/ACM Transactions | Congestion Control in the Internet", IEEE/ACM Transactions | |||
| on Networking 7(4) 458--472, August 1999, | on Networking 7(4) 458--472, August 1999, | |||
| <http://www.aciri.org/floyd/end2end-paper.html>. | <http://www.aciri.org/floyd/end2end-paper.html>. | |||
| [pktByteEmail] | [pktByteEmail] | |||
| Floyd, S., "RED: Discussions of Byte and Packet Modes", | Floyd, S., "RED: Discussions of Byte and Packet Modes", | |||
| email, March 1997, | email, March 1997, | |||
| <http://www-nrg.ee.lbl.gov/floyd/REDaveraging.txt>. | <http://www-nrg.ee.lbl.gov/floyd/REDaveraging.txt>. | |||
| End of changes. 79 change blocks. | ||||
| 411 lines changed or deleted | 512 lines changed or added | |||
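The arithmetic in the Appendix A scenarios (A.2 to A.5) can be checked numerically. The following is a minimal sketch, not part of either draft version. The marking proportion `P`, the base packet rate `U_LARGE`, and the helper names `expected_marks` and `merge` are all assumptions introduced for illustration. It also reads the draft's "ratio of marked to unmarked" as the marked fraction of all packets (or bytes), which is an interpretation. It uses expected values rather than simulating random marking.

```python
from fractions import Fraction


def expected_marks(size_bytes, pkt_rate, p):
    """Expected per-second marking for one flow of equal-sized packets,
    when the resource marks proportion p of packets irrespective of size.
    (Hypothetical helper, not defined in the draft.)"""
    marked_pkts = p * pkt_rate
    marked_bytes = marked_pkts * size_bytes
    return {
        "marked_pkts": marked_pkts,
        "marked_bytes": marked_bytes,
        # Marked fraction counted in packets and in bytes.
        "pkt_fraction": marked_pkts / pkt_rate,
        "byte_fraction": marked_bytes / (pkt_rate * size_bytes),
    }


def merge(*flows):
    """Treat several flows as one mixed-size flow by summing absolute counts."""
    return {
        "marked_pkts": sum(f["marked_pkts"] for f in flows),
        "marked_bytes": sum(f["marked_bytes"] for f in flows),
    }


P = Fraction(1, 100)   # assumed marking proportion (p_b or p_p)
U_LARGE = 100          # assumed packet rate of the 1500B flow [pkt/s]

large = expected_marks(1500, U_LARGE, P)
# Equal bit rates: the 60B flow must send 25x more packets (A.2, A.4).
small_equal_bits = expected_marks(60, 25 * U_LARGE, P)
# Equal packet rates: the 60B flow has 1/25 of the bit rate (A.3, A.5).
small_equal_pkts = expected_marks(60, U_LARGE, P)

# A.2: 25x more marked packets, but the same marked fraction either way.
print(small_equal_bits["marked_pkts"] / large["marked_pkts"])
print(small_equal_bits["pkt_fraction"], small_equal_bits["byte_fraction"])

# A.3: equal marked packet counts, but 25x more marked bytes in the large flow.
print(large["marked_bytes"] / small_equal_pkts["marked_bytes"])

# A.4: merged flow accumulates marked packets at p_p(u_1+u_2), 26x the 1500B flow.
print(merge(large, small_equal_bits)["marked_pkts"] / large["marked_pkts"])

# A.5: merged equal-packet-rate flows accumulate at 2 * p_p * u_1.
print(merge(large, small_equal_pkts)["marked_pkts"] == 2 * P * U_LARGE)
```

Exact fractions are used instead of floats so that the "units cancel out" claims can be compared with strict equality. The fractions agree whether counted in bytes or packets, while the absolute counts differ, which is the distinction the appendix draws between ratio-based transports and those tracking absolute congestion volume.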