Network Working Group F. Templin, Ed. Internet-Draft Boeing Phantom Works Intended status: Experimental May 11, 2007 Expires: November 12, 2007 Link Adaptation for IPv6-in-(foo)*-in-IPv4 Tunnels draft-templin-linkadapt-06.txt Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on November 12, 2007. Copyright Notice Copyright (C) The IETF Trust (2007). Abstract IPv6-in-(foo)*-in-IPv4 tunnels must support a minimum Maximum Transmission Unit (MTU) of 1280 bytes for IPv6 via static prearrangements and/or dynamic MTU determination based on ICMPv4 messages, but these methods have known operational limitations. This document specifies a link adaptation mechanism for IPv6-in-(foo)*-in- IPv4 tunnels that presents an assured MTU to the IPv6 layer using tunnel endpoint-based segmentation/reassembly and dynamic segment size probing. Templin Expires November 12, 2007 [Page 1] Internet-Draft Link Adaptation for Tunnels May 2007 Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . 
. . 3 3. Tunnel MTU Assurance Methods and Issues . . . . . . . . . . . 4 4. Link Adaptation for IPv6-in-(foo)*-in-IPv4 Tunnels . . . . . . 4 4.1. Layering . . . . . . . . . . . . . . . . . . . . . . . . . 4 4.2. Initial Negotiation Phase . . . . . . . . . . . . . . . . 5 4.3. Tunnel MTU and MRU . . . . . . . . . . . . . . . . . . . . 5 4.4. Ingress Tunnel Endpoint Specification . . . . . . . . . . 5 4.4.1. Segmentation and Encapsulation . . . . . . . . . . . . 6 4.4.2. IPv4 Fragmentation and Setting the DF Bit . . . . . . 8 4.4.3. Probing . . . . . . . . . . . . . . . . . . . . . . . 8 4.4.4. Processing Errors . . . . . . . . . . . . . . . . . . 9 4.5. Egress Tunnel Endpoint Specification . . . . . . . . . . . 10 4.5.1. Decapsulation and Reassembly . . . . . . . . . . . . . 10 4.5.2. Sending Errors . . . . . . . . . . . . . . . . . . . . 11 4.5.3. Sending Probe Replies . . . . . . . . . . . . . . . . 11 4.5.4. Active Reassembly Buffer Management . . . . . . . . . 12 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 6. Security Considerations . . . . . . . . . . . . . . . . . . . 12 7. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 12 8. Appendix A: Additional Considerations . . . . . . . . . . . . 12 9. Appendix B: Changes . . . . . . . . . . . . . . . . . . . . . 13 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14 10.1. Normative References . . . . . . . . . . . . . . . . . . . 14 10.2. Informative References . . . . . . . . . . . . . . . . . . 15 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16 Intellectual Property and Copyright Statements . . . . . . . . . . 17 Templin Expires November 12, 2007 [Page 2] Internet-Draft Link Adaptation for Tunnels May 2007 1. Introduction IPv6-in-(foo)*-in-IPv4 tunnels may span multiple IPv4 network hops yet are seen by IPv6 as ordinary links that must support the minimum IPv6 Maximum Transmission Unit (MTU) of 1280 bytes ([RFC2460], Section 5). 
Common tunneling mechanisms (e.g., [RFC3056][RFC4213][RFC4214][RFC4380]) meet this requirement through conservative static prearrangements at the expense of degraded performance over some paths due to excessive IPv4 network-based fragmentation and/or missed opportunities to discover larger MTUs. Optional dynamic MTU determination methods [RFC1191] are also available, but may not provide adequate robustness.

This document specifies a link adaptation mechanism for IPv6-in-(foo)*-in-IPv4 tunnels that presents an assured MTU to the IPv6 layer. It uses tunnel endpoint-based segmentation/reassembly and dynamic segment size probing with authenticated probe feedback. Thus, it provides greater robustness and efficiency by avoiding IPv4 network-based fragmentation and dependence on ICMPv4 feedback from IPv4 network middleboxes.

2. Terminology

The following terms are defined within the scope of this document:

Upper Layer Payload (ULP)
   a whole IPv6 packet, or a fragment packet created by IPv6 fragmentation.

Ingress Tunnel Endpoint (ITE)
   the tunnel interface endpoint that accepts ULPs from the IP layer and segments/packetizes them for transmission into a tunnel.

Egress Tunnel Endpoint (ETE)
   the tunnel interface endpoint that receives packets from a tunnel and de-packetizes/reassembles them into ULPs for delivery to the IP layer.

IP Layer
   the layer above the tunnel interface, i.e., the IPv6 layer.

Sub-IP Layer
   any sublayers that occur within the tunnel interface, i.e., any (foo)* layers and including the upper portion of the IPv4 layer. Note that IPv4 is also viewed as the Layer 2 protocol from the perspective of the tunnel, so the Sub-IP layer begins below the IP layer and extends into Layer 2.
The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this document, are to be interpreted as described in [RFC2119].

3. Tunnel MTU Assurance Methods and Issues

Common tunnel MTU assurance methods include classical IPv4 fragmentation [RFC0791] and IPv4/IPv6 Path MTU discovery [RFC1191][RFC1981]. Other possibilities include operational assurance of widely-deployed links with large MTUs. However, these methods have well-documented operational limitations [FRAG][I-D.heffner-frag-harmful][RFC2923][RFC4459].

This document specifies a link adaptation scheme for IPv6-in-(foo)*-in-IPv4 tunnels that is distinct from the above alternatives and avoids their issues. It entails segmentation at the ITE and reassembly at the ETE at a logical mid-layer between IPv6 fragmentation and IPv4 fragmentation. It therefore resembles classical IPv4 fragmentation but: 1) only allows fragmentation to occur at the ITE, 2) supports path probing to detect the optimum segment size, and 3) avoids sequence number wrapping and data integrity issues through careful reassembly buffer management at the ETE. The scheme is specified in the following sections.

4. Link Adaptation for IPv6-in-(foo)*-in-IPv4 Tunnels

The following subsections specify link adaptation mechanisms for IPv6-in-(foo)*-in-IPv4 tunnels with properties similar to the link adaptation mechanisms defined for AAL5 [RFC2684] and IEEE 802.11 [WLAN]:

4.1. Layering

IPv6-in-(foo)*-in-IPv4 tunnel endpoints operate at a logical midpoint between the IPv6 and IPv4 protocol modules. From the viewpoint of IPv6, the tunnel appears as an ordinary network interface module that delivers whole IPv6 packets and IPv6 fragment packets as ULPs to and from an underlying link.
From the viewpoint of IPv4, the tunnel appears as a packetization layer protocol that segments and reassembles ULPs.

This document refers to the IPv6 layer as the "IP Layer" (i.e., layer 3) and any sublayers that occur within the tunnel interface (i.e., any (foo)* layers and including the upper portion of the IPv4 layer itself) as the "Sub-IP layer". Note that IPv4 is also viewed as the Layer 2 protocol from the perspective of the tunnel, so the Sub-IP layer begins below the IP layer and extends into Layer 2. Note also that (foo)* may entail multiple nested sublayers or may even be NULL, i.e., in the case of IPv6-in-IPv4 tunnels.

4.2. Initial Negotiation Phase

IPv6-in-(foo)*-in-IPv4 tunnel endpoints MUST first determine that the link adaptation mechanisms are implemented by both the ITE and ETE through an initial negotiation phase specified outside the scope of this document. ITEs/ETEs for which one or both ends of the tunnel do not implement the scheme MUST use the default MTU assurance mechanisms specified for the particular IPv6-in-(foo)*-in-IPv4 tunneling mechanism, and do not implement any other aspects of this specification.

4.3. Tunnel MTU and MRU

ITEs MUST configure a minimum IPv6 link MTU of 1280 bytes for all flows and SHOULD provide a configuration knob to set larger values. A nominal per-flow MTU of 9180 bytes (i.e., the same as defined in [RFC1626]) is RECOMMENDED, since it is large enough to accommodate frame sizes as large as Gigabit Ethernet Jumbo Frames [GIGE]. ITEs MAY set still larger MTU values, but are advised that this may lead to excessive packet loss and ICMPv6 "packet too big" messages.

ETEs MUST configure a minimum per-flow Sub-IP layer reassembly buffer size (i.e., a minimum Sub-IP layer Maximum Receive Unit (MRU)) of 1280 bytes, and SHOULD configure an MRU of 9180 bytes or larger to accommodate the recommended nominal MTU for ITEs.
A maximum MRU of 11454 bytes is RECOMMENDED, since 11454 bytes is the maximum packet size for which a 32-bit CRC can provide Ethernet-quality bit error detection [JAIN][AARNET]. ETEs MAY set still larger MRU values, but are advised that larger values may lead to unacceptable levels of undetected errors unless all physical segments in the path provide assured error-free delivery for larger packets.

4.4. Ingress Tunnel Endpoint Specification

The following subsections specify mechanisms implemented by the ITE:

4.4.1. Segmentation and Encapsulation

ITEs maintain a per-flow MTU and per-flow segment size ("SEGSIZE") for the purpose of segmenting ULPs that are too large to traverse the tunnel. It is RECOMMENDED that ITEs configure an initial per-flow SEGSIZE such that (SEGSIZE + length((foo)* headers) + length(IPv4 header)) yields an IPv4 datagram size between 256 and 576 bytes (since 256 bytes can safely accommodate the recommended nominal MTU (see below), and since IPv4 nodes are only required to accept datagrams of up to 576 bytes [RFC0791]). Since most IPv4 links in the Internet configure still larger MTUs [RFC3150][RFC3819], and since IPv4 nodes should accept packets as large as the underlying link MTU [RFC1122], ITEs MAY use a still larger initial per-flow SEGSIZE if there is assurance that it would not cause gratuitous IPv4 fragmentation and/or overrun the IPv4 reassembly buffer. ITEs probe the path to maintain SEGSIZE and/or discover larger SEGSIZEs during the lifetime of a flow (see: Section 4.4.3).

ITEs split each ULP they send into a tunnel into chains of segments for packetization and presentation to the IPv4 layer. For ULPs that will span multiple segments, the ITE first uses the 2's complement Fletcher-32 checksum [STONE][RFC3385] to calculate a checksum across the entire ULP, then appends the A and B results as a trailing 32-bit checksum at the end of the ULP.
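The checksum step can be illustrated with a minimal sketch. It assumes that "2's complement Fletcher-32" means two running sums over 16-bit words taken modulo 2**16, that words are read in network byte order, that an odd-length ULP is padded with one zero byte for the calculation only, and that A precedes B in the trailer. None of these details are fixed by this document, and the function names are hypothetical.

```python
from typing import Tuple


def fletcher32_twos_complement(data: bytes) -> Tuple[int, int]:
    """Return the (A, B) sums over 16-bit big-endian words, modulo 2**16.

    Assumptions (not taken from the specification): big-endian words,
    zero padding of odd-length input, sums reduced modulo 2**16.
    """
    if len(data) % 2:
        data += b"\x00"  # pad for the calculation only
    a = b = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]
        a = (a + word) & 0xFFFF  # A: running sum of the words
        b = (b + a) & 0xFFFF     # B: running sum of the A values
    return a, b


def append_checksum(ulp: bytes) -> bytes:
    """Append A then B as a trailing 32-bit checksum (ordering is assumed)."""
    a, b = fletcher32_twos_complement(ulp)
    return ulp + a.to_bytes(2, "big") + b.to_bytes(2, "big")
```

An ITE following this sketch would call append_checksum() only for ULPs that will span more than one segment.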
For ULPs that fit within a single segment, the ITE omits the trailing checksum.

The ITE next splits the ULP into a chain of consecutive segments that MUST be created as contiguous and non-overlapping, i.e., the final byte of the (i)th segment MUST be the byte that immediately precedes the first byte of the (i+1)th segment. Non-final segments in the chain MUST be identical in length and no larger than SEGSIZE bytes; the final segment MAY be of different length.

The ITE encapsulates each segment in Sub-IP layer headers (including any (foo)* headers and an IPv4 header) to form a chain of IPv4 packets; each packet in the chain MUST include Sub-IP layer encapsulation headers of identical length. The ITE sets the DF bit in the IPv4 header according to the specification in Section 4.4.2, and encodes the following information in the 16-bit IPv4 "Identification" field of each segment:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     ULPID     |   SEGID   |P|A|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

       IPv4 Identification Field

   ULPID: 8 bits
      An identifying value assigned by the ITE to aid the ETE in reassembling the segments of a ULP.

   SEGID: 6 bits
      A value that identifies a specific segment within a ULP.

   P: 1 bit
      Probe flag; 0 = Ordinary Segment, 1 = Probe Segment.

   A: 1 bit
      Additional Segments flag; 0 = Last Segment, 1 = Additional Segments.

The ITE encodes an identical value in the "ULPID" field (bits 0 - 7 of the IPv4 Identification field) of each IPv4 packet in a chain to identify the segments of a specific ULP; it encodes different ULPID values in IPv4 packets that encapsulate segments of different ULPs. The ITE also encodes an increasing Segment ID value between 0 - 62 in the "SEGID" field (bits 8 - 13 of the IPv4 Identification field) of consecutive packets in a chain, i.e., it encodes the value '0' in the first packet, encodes the value '1' in the second packet, etc.
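As an illustration of the field layout and segmentation rules, the following sketch packs the four fields into a 16-bit Identification value and splits a ULP into a chain. It assumes the bit numbering of the diagram (bit 0 is the most significant bit), so ULPID occupies the high-order byte and the A flag is the least significant bit. The names pack_ident, unpack_ident and segment_ulp are hypothetical, and the input to segment_ulp is assumed to already carry the trailing checksum when one is needed.

```python
from typing import List, Tuple

MAX_CHAIN = 63  # SEGID values 0..62 are available for ordinary chains


def pack_ident(ulpid: int, segid: int, probe: bool, more: bool) -> int:
    """Pack ULPID (8 bits), SEGID (6 bits), P and A into 16 bits."""
    if not (0 <= ulpid <= 0xFF and 0 <= segid <= 0x3F):
        raise ValueError("ULPID or SEGID out of range")
    return (ulpid << 8) | (segid << 2) | (int(probe) << 1) | int(more)


def unpack_ident(ident: int) -> Tuple[int, int, bool, bool]:
    """Return (ULPID, SEGID, P, A) from a 16-bit Identification value."""
    return (ident >> 8) & 0xFF, (ident >> 2) & 0x3F, bool(ident & 2), bool(ident & 1)


def segment_ulp(ulp: bytes, segsize: int, ulpid: int) -> List[Tuple[int, bytes]]:
    """Split a ULP into contiguous, equal-length segments (the last may be shorter).

    Returns (identification, segment) pairs in increasing SEGID order.
    """
    if segsize <= 0:
        raise ValueError("SEGSIZE must be positive")
    chunks = [ulp[i:i + segsize] for i in range(0, len(ulp), segsize)]
    if len(chunks) > MAX_CHAIN:
        # A real ITE would return an ICMPv6 "packet too big" message instead.
        raise ValueError("ULP exceeds the chain length limit for this SEGSIZE")
    last = len(chunks) - 1
    return [(pack_ident(ulpid, i, False, i != last), c) for i, c in enumerate(chunks)]
```

Under these assumptions the largest chain carries 63 * SEGSIZE bytes. For example, a 256-byte IPv4 datagram with a 20-byte IPv4 header and no (foo)* headers leaves a SEGSIZE of 236 bytes, so a chain can carry 63 * 236 = 14868 bytes, which is more than the 9180-byte nominal MTU plus the 4-byte checksum.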
The ITE then sets the "Additional Segments - A" bit (bit 15 of the IPv4 Identification field) in each packet in the chain except the final one to indicate that additional segments follow. Finally, it delivers each packet in the chain to the link layer (i.e., the IPv4 layer) in increasing SEGID order, i.e., SEGID 0 first, followed by SEGID 1, etc., up to the final packet. The IPv4 layer SHOULD NOT reorder the packets in a chain, but rather SHOULD deliver them to the underlying link in the order in which the tunnel interface produced them.

Note that IPv4 fragmentation in the network could theoretically result in silent packet loss along certain paths even for packets with the smallest recommended initial SEGSIZE (see: Section 4.4.2). As such, a robust ITE implementation could reduce its IPv4 packet sizes to as small as 68 bytes if it suspects that larger packets are disappearing into a fragmentation-related black hole, but such small packets might not satisfy the nominal tunnel MTU of 9180 bytes. ITEs SHOULD therefore return locally-generated IPv6 "packet too big" messages for IPv6 packets that cannot be segmented and encapsulated within current IPv4 packet size and chain length limitations for the tunnel.

4.4.2. IPv4 Fragmentation and Setting the DF Bit

When an ITE segments a ULP (see: Section 4.4.1), it can optionally set or clear the "Don't Fragment - DF" bit in the encapsulating IPv4 headers of packets in the chain. If the DF bit is cleared, gratuitous network-based IPv4 fragmentation could result in well-known operational issues [FRAG][I-D.heffner-frag-harmful]. Also, some middleboxes (such as IPv4 NATs and firewalls) may only be capable of passing the first fragment of a multi-fragment IPv4 datagram, and large multi-fragment datagrams could result in IPv4 reassembly buffer overruns.
Finally, the minimum IPv4 MTU is only 68 bytes (i.e., the size required to encapsulate a maximum-length (60 byte) IPv4 header and a minimum-length (8 byte) fragment [RFC0791]) such that a limited amount of IPv4 fragmentation may occur in the network even for relatively small packets. Nonetheless, clearing the DF bit can in some circumstances increase the packet delivery ratio when setting the DF bit would otherwise result in excessive packet loss due to temporal link MTU restrictions.

In view of the above considerations, the ITE:

o  SHOULD set the DF bit in probe packets (see: Section 4.4.3) larger than 576 bytes.

o  SHOULD set the DF bit in all packets larger than 576 bytes if it will not perform active probing (see: Section 4.4.3).

o  MAY clear the DF bit in any packets larger than 576 bytes if it will perform active probing.

o  MAY clear the DF bit in any packets of 576 bytes or smaller.

4.4.3. Probing

To increase efficiency and avoid excessive packet chain lengths, ITEs SHOULD probe the path periodically to increase a flow's SEGSIZE to larger values. ITEs probe a candidate SEGSIZE value 'N' by setting the "Probe Segment - P" bit (bit 14 of the IPv4 Identification field) in packets that encapsulate a probe segment of size N. For probe segments that contain valid data for reassembly as part of a packet chain, the ITE sets the appropriate SEGID value in the IPv4 packet header as for ordinary segmentation. For probe segments that are to be discarded by the ETE, the ITE sets the value 63 in the SEGID field.

When the ITE sends a probe packet, it marks the probe as "pending" for a period of 'MaxProbeDelay' msec (i.e., a per-flow round-trip time estimate for the tunnel) and caches the probe packet's IPv4 destination, length and identification field values, as well as the IPv6 flow label value [RFC3697].
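The probe bookkeeping can be sketched as follows. The sketch caches the values listed above and compares them with the Nonce field of an NI Reply, using the Nonce layout given in this section (IPv4 length in bits 0-15, Identification in bits 16-31, flow label in bits 32-51). It assumes the 64-bit Nonce is handled as an integer with bit 0 as the most significant bit, and that bits 52-63, which this document does not assign, are ignored on comparison. PendingProbe, build_nonce and probe_succeeded are hypothetical names, and the checks on the Type, Code, Qtype and Flags fields are omitted.

```python
from dataclasses import dataclass


@dataclass
class PendingProbe:
    dst: str          # IPv4 destination of the probe packet
    length: int       # IPv4 total length of the probe packet
    ident: int        # 16-bit IPv4 Identification value
    flow_label: int   # 20-bit IPv6 flow label
    deadline: float   # send time plus MaxProbeDelay, in seconds


def build_nonce(length: int, ident: int, flow_label: int) -> int:
    """Pack the three values into the upper 52 bits of a 64-bit nonce."""
    return ((length & 0xFFFF) << 48) | ((ident & 0xFFFF) << 32) | ((flow_label & 0xFFFFF) << 12)


def probe_succeeded(probe: PendingProbe, nonce: int, now: float) -> bool:
    """True if the reply arrived in time and its nonce matches the cached probe."""
    if now > probe.deadline:
        return False
    expected = build_nonce(probe.length, probe.ident, probe.flow_label)
    return (nonce >> 12) == (expected >> 12)  # low 12 bits ignored (assumption)
```

An ETE could use the same build_nonce() helper when it constructs the reply, which keeps both ends consistent with a single layout.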
If the ITE receives a valid Node Information Query reply (NI Reply) [RFC4620] from the ETE (see: Section 4.5.3) before the probe period expires, it marks the probe as successful; otherwise, it marks the probe as failed. A valid NI Reply MUST have:

o  the Type, Code, Qtype and Flags fields set as specified for a NOOP reply in ([RFC4620], Section 6.1), and

o  bits 0-15 of the Nonce field equal to the IPv4 length of the probe packet, and

o  bits 16-31 of the Nonce field equal to the IPv4 identification of the probe packet, and

o  bits 32-51 of the Nonce field equal to the IPv6 flow label value.

Following a successful probe, but before advancing SEGSIZE to N, the ITE SHOULD enter a brief verification phase during which it sends additional probe segments to detect asymmetric multipath MTU restrictions and/or route fluctuations. Thereafter, the ITE SHOULD re-probe periodically to confirm that packets with up to SEGSIZE byte segments are still reaching the ETE.

After probing the path to discover a new SEGSIZE, the ITE may elect to set or clear the DF bit in subsequent non-probe packets (see: Section 4.4.2). For example, the ITE may elect to clear the DF bit to maintain an optimal packet delivery ratio across temporal link MTU restrictions (e.g., due to dynamic rerouting of flows), while it may elect to set the DF bit to avoid all IPv4 fragmentation in the network.

ITEs that elect to clear the DF bit in non-probe packets SHOULD engage in "active probing" to periodically confirm SEGSIZE "frequently enough" such that cyclical misassociations and possible data corruptions at the ETE do not occur [I-D.heffner-frag-harmful] if a flow begins to fragment. ITEs that elect to set the DF bit in non-probe packets SHOULD carefully consider any ICMPv4 "fragmentation needed" messages that arrive (see: Section 4.4.4) but are advised that packet delivery ratios may suffer when the flow transmission rate is high and/or the path round trip time is large.

4.4.4.
Processing Errors

ITEs may receive ICMPv4 "fragmentation needed" error messages from middleboxes inside a tunnel, but are advised to consider them as "soft errors". Implementers are advised to consult [RFC1191][RFC2923][RFC4821] for operational recommendations on processing ICMPv4 "fragmentation needed" messages.

ITEs may receive encapsulated ICMPv6 "packet too big" messages [RFC1981] from an ETE at the far end of a tunnel (see: Section 4.5.2). The ITE SHOULD cache the MTU value encoded in the "packet too big" message as the new MTU for the flow, and relay the ICMPv6 message back to the original source.

ITEs may receive encapsulated ICMPv6 "parameter problem" messages with code "reassembly/checksum error" [RFC4443] from an ETE at the far end of the tunnel (see: Section 4.5.2). This may indicate an isolated packet splicing error at the ETE, or packet loss due to temporal network conditions such as congestion, MTU restrictions, link errors, signal intermittence, etc. If the ITE receives persistent reassembly/checksum errors from an ETE, it SHOULD take adaptive measures, e.g., reduce the SEGSIZE for the flow, rate-limit the packets it sends into the tunnel, etc. Since each reassembly/checksum error corresponds to a dropped packet, the ITE SHOULD relay the messages back to the original source (subject to rate limiting).

4.5. Egress Tunnel Endpoint Specification

The following subsections specify mechanisms implemented by the ETE:

4.5.1. Decapsulation and Reassembly

The IPv4 length, ULPID, SEGID and A fields in the IPv4 packets in a chain (along with the IPv6 flow label [RFC3697]) provide sufficient information for the ETE to reassemble an original ULP with protection for packet reordering in the network.
ETEs MUST configure per-flow reassembly buffers of at least 1280 bytes and SHOULD configure reassembly buffers of 9180 bytes or larger to accommodate the nominal tunnel MTU (see: Section 4.3). Note that these reassembly buffers occur at the Sub-IP layer and are thus distinct from the IPv4 and IPv6 reassembly caches.

ETEs use per-flow reassembly buffers to concatenate the segments received in packet chains for a particular ULPID in increasing SEGID order (i.e., SEGID 0, followed by SEGID 1, etc.) even if the packets were re-ordered by the network. When all segments for a particular ULPID have been concatenated into the reassembly buffer, the ETE uses 2's complement Fletcher-32 to verify the checksum if one was included (see: Section 4.4.1). The ETE then discards the Sub-IP layer encapsulation headers and trailing checksum, and delivers correctly-reassembled ULPs to the IP layer (i.e., IPv6). It discards incomplete ULPs and ULPs with incorrect checksums, and sends an appropriate error message as specified in Section 4.5.2.

4.5.2. Sending Errors

If the ETE receives a packet chain that would overflow the reassembly buffer, it discards the chain and sends an ICMPv6 "packet too big" message [RFC1981] back to the IPv6 source via the reverse tunnel back to the ITE. The ETE includes in the message body up to 1280 bytes beginning with the upper layer packet headers (IPv6 and above) and the contents of the reassembly buffer beyond the upper layer packet headers; it encodes the size of the reassembly buffer in the MTU value.

If the ETE receives at least one segment, but one or more segments are lost and/or checksum verification fails, it SHOULD send an ICMPv6 "parameter problem" message with code "reassembly/checksum error" [RFC4443] back to the IPv6 source via the reverse tunnel back to the ITE.
The ETE includes in the message body up to 1280 bytes beginning with the upper layer packet headers (IPv6 and above) and contents of the reassembly buffer beyond the upper layer packet headers, and sets the pointer to either the beginning of the first missing segment or the beginning of the 4 byte checksum field (if no segments were missing). After sending the error, the ETE discards the packet-in-error, i.e., it does not deliver the packet as a ULP to the IP layer.

4.5.3. Sending Probe Replies

If the ETE receives a segment used for probing (i.e., an IPv4 packet in the chain with the 'P' flag set), it sends a Node Information Query reply (NI Reply) [RFC4620] message back to the ITE. The ETE MUST construct the NI Reply as follows:

o  the Type, Code, Qtype and Flags fields set as specified for a NOOP reply in ([RFC4620], Section 6.1), and

o  the IPv4 length of the probe packet encoded in bits 0-15 of the Nonce field, and

o  the IPv4 identification of the probe packet encoded in bits 16-31 of the Nonce field, and

o  the IPv6 flow label value encoded in bits 32-51 of the Nonce field.

If the IPv4 packet containing the probe segment encodes the value 63 in the SEGID field, the ETE discards the segment; otherwise, it includes the segment as part of the normal reassembly procedure described above.

4.5.4. Active Reassembly Buffer Management

The ETE MUST actively manage reassembly buffers and discard as early as possible any reassemblies that are not likely to complete due to, e.g., loss of one or more packets in the chain, gross reordering of packets in the network, etc. In particular, the ETE must discard partial reassemblies before the 8-bit ULPID encoded by the ITE wraps. The ETE therefore must augment the classical timer-driven reassembly buffer management strategy with an event-driven strategy.

5.
IANA Considerations

The IANA is instructed to assign a code type for "reassembly/checksum error" under the ICMPv6 Parameter Problem message type in the "ICMPv6 Type Numbers" registry.

6. Security Considerations

The nonce values in NI Reply messages from ETEs provide spoofing protection against off-path attackers.

7. Acknowledgments

This work has benefited from helpful discussions with many colleagues, friends and family.

8. Appendix A: Additional Considerations

ITEs can use the probing mechanism described in Section 4.4.3 as a general-purpose method for eliciting acknowledgements from an ETE if improved reliability at the expense of additional overhead is desired.

The equal size restriction for non-final segments and the non-overlapping restriction for all segments in packet chains together provide a significant simplification for reassembly algorithms [RFC0815].

Use of the link adaptation mechanisms specified in this document may lead to an overall increase in short chains of small packets in the Internet. Network administrators are advised to follow the recommendations in [RFC3150] to minimize packet loss and packet reordering. Also, overly-long packet chains should be avoided if possible due to interactions with Active Queue Management (AQM) in the network.

Since link-layer CRC-32 checks normally occur on each segment in the path, most errors detected during ULP reassembly are due to packet splices and/or errors in the data path between the NIC hardware and the reassembly buffer. The Fletcher-32 checksum algorithm has been shown to provide an effective edge-to-edge error detection capability for such errors [STONE]. The Fletcher-32 checksum is also dissimilar from both CRC-32 and the Internet checksum used by many upper layer protocols, thereby decreasing the likelihood of undetected errors.
Some upper layer packetization protocols (e.g., NFS) may generate fixed payload sizes and rely on the network layer to deliver the payloads either as whole IP packets or as chains of IP fragments. Since NFS performance (and the performance of other upper layer packetization protocols) is sensitive to packet handling overhead, implementations should periodically attempt to increase the SEGSIZE through probing even if initial probe attempts fail. 9. Appendix B: Changes (Note to RFC Editor - please remove this section before publishing as an RFC.) Changes since -05: o Added back informative references to common tunneling mechanisms. o Citation of RFC4459 Changes since -04: o Rearranged sections for clarity. o removed setting of IPv4 "Reserved Fragmentation", since ITE/ETE capabilities can be discovered during the initial tunnel negotiation. Changes since -03: o Clarified that mechanisms cover IPv6-in-(foo)-in-IPv4; not just IPv6-in-IPv4. o New terminology for ITE/ETE o Clarifications to layering model o Replaced RA with NI Reply as probe response Templin Expires November 12, 2007 [Page 13] Internet-Draft Link Adaptation for Tunnels May 2007 o Reduced SEGID to 6 bits and increased ULPID to 8 bits o IPv6 flow label RFC cited Changes since -01, -02: o Updated references Changes since -00: o Defined new coding of segmentation/reassembly info in the IPv4 Identification field o Changed "tunneling mechanism" to "tunnel endpoint" o Clarified text on trailing checksums o general document cleanup; removed "additional considerations" that no longer apply 10. References 10.1. Normative References [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981. [RFC1122] Braden, R., "Requirements for Internet Hosts - Communication Layers", STD 3, RFC 1122, October 1989. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2460] Deering, S. and R. 
Hinden, "Internet Protocol, Version 6 (IPv6) Specification", RFC 2460, December 1998. [RFC3697] Rajahalme, J., Conta, A., Carpenter, B., and S. Deering, "IPv6 Flow Label Specification", RFC 3697, March 2004. [RFC4443] Conta, A., Deering, S., and M. Gupta, "Internet Control Message Protocol (ICMPv6) for the Internet Protocol Version 6 (IPv6) Specification", RFC 4443, March 2006. [RFC4620] Crawford, M. and B. Haberman, "IPv6 Node Information Queries", RFC 4620, August 2006. Templin Expires November 12, 2007 [Page 14] Internet-Draft Link Adaptation for Tunnels May 2007 10.2. Informative References [AARNET] "AARNet: Network: Large MTU: Size, http:// www.aarnet.edu.au/engineering/networkdesign/mtu/ size.html", April 2007. [FRAG] Mogul, J. and C. Kent, "Fragmentation Considered Harmful, In Proc. SIGCOMM '87 Workshop on Frontiers in Computer Communications Technology.", August 1987. [GIGE] Dykstra, P., "Gigabit Ethernet Jumboframes (And Why You Should Care), http://sd.wareonearth.com/~phil/jumbo.html", December 1999. [I-D.heffner-frag-harmful] Heffner, J., "IPv4 Reassembly Errors at High Data Rates", draft-heffner-frag-harmful-05 (work in progress), May 2007. [JAIN] Jain, R., "Error Characteristics of Fiber Distributed Data Interface (FDDI), http://www.cse.wustl.edu/~jain/papers.html", August 1990. [RFC0815] Clark, D., "IP datagram reassembly algorithms", RFC 815, July 1982. [RFC1191] Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, November 1990. [RFC1626] Atkinson, R., "Default IP MTU for use over ATM AAL5", RFC 1626, May 1994. [RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery for IP version 6", RFC 1981, August 1996. [RFC2684] Grossman, D. and J. Heinanen, "Multiprotocol Encapsulation over ATM Adaptation Layer 5", RFC 2684, September 1999. [RFC2923] Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923, September 2000. [RFC3056] Carpenter, B. and K. Moore, "Connection of IPv6 Domains via IPv4 Clouds", RFC 3056, February 2001. 
[RFC3150] Dawkins, S., Montenegro, G., Kojo, M., and V. Magret, "End-to-end Performance Implications of Slow Links", BCP 48, RFC 3150, July 2001. Templin Expires November 12, 2007 [Page 15] Internet-Draft Link Adaptation for Tunnels May 2007 [RFC3385] Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna, "Internet Protocol Small Computer System Interface (iSCSI) Cyclic Redundancy Check (CRC)/Checksum Considerations", RFC 3385, September 2002. [RFC3819] Karn, P., Bormann, C., Fairhurst, G., Grossman, D., Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L. Wood, "Advice for Internet Subnetwork Designers", BCP 89, RFC 3819, July 2004. [RFC4213] Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms for IPv6 Hosts and Routers", RFC 4213, October 2005. [RFC4214] Templin, F., Gleeson, T., Talwar, M., and D. Thaler, "Intra-Site Automatic Tunnel Addressing Protocol (ISATAP)", RFC 4214, October 2005. [RFC4380] Huitema, C., "Teredo: Tunneling IPv6 over UDP through Network Address Translations (NATs)", RFC 4380, February 2006. [RFC4459] Savola, P., "MTU and Fragmentation Issues with In-the- Network Tunneling", RFC 4459, April 2006. [RFC4821] Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, March 2007. [STONE] Stone, J., "Checksums in the Internet (Stanford Doctoral Dissertation)", August 2001. [WLAN] Society, I., "Part 11: Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications, IEEE Computer Society, ANSI/IEEE 802.11, 1999 Edition.". Author's Address Fred L. Templin (editor) Boeing Phantom Works P.O. Box 3707 Seattle, WA 98124 USA Email: fred.l.templin@boeing.com Templin Expires November 12, 2007 [Page 16] Internet-Draft Link Adaptation for Tunnels May 2007 Full Copyright Statement Copyright (C) The IETF Trust (2007). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights. 
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Intellectual Property The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79. Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr. The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org. Acknowledgment Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA). Templin Expires November 12, 2007 [Page 17]