Network Working Group                                    F. Templin, Ed.
Internet-Draft                                      Boeing Phantom Works
Intended status: Experimental                               May 11, 2007
Expires: November 12, 2007


           Link Adaptation for IPv6-in-(foo)*-in-IPv4 Tunnels
                     draft-templin-linkadapt-06.txt

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
   http://www.ietf.org/ietf/1id-abstracts.txt.

   The list of Internet-Draft Shadow Directories can be accessed at
   http://www.ietf.org/shadow.html.

   This Internet-Draft will expire on November 12, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Abstract

   IPv6-in-(foo)*-in-IPv4 tunnels must support a minimum Maximum
   Transmission Unit (MTU) of 1280 bytes for IPv6 via static
   prearrangements and/or dynamic MTU determination based on ICMPv4
   messages, but these methods have known operational limitations.  This
   document specifies a link adaptation mechanism for IPv6-in-(foo)*-in-
   IPv4 tunnels that presents an assured MTU to the IPv6 layer using
   tunnel endpoint-based segmentation/reassembly and dynamic segment
   size probing.


Templin                 Expires November 12, 2007               [Page 1]

Internet-Draft         Link Adaptation for Tunnels              May 2007


Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Tunnel MTU Assurance Methods and Issues  . . . . . . . . . . .  4
   4.  Link Adaptation for IPv6-in-(foo)*-in-IPv4 Tunnels . . . . . .  4
     4.1.  Layering . . . . . . . . . . . . . . . . . . . . . . . . .  4
     4.2.  Initial Negotiation Phase  . . . . . . . . . . . . . . . .  5
     4.3.  Tunnel MTU and MRU . . . . . . . . . . . . . . . . . . . .  5
     4.4.  Ingress Tunnel Endpoint Specification  . . . . . . . . . .  5
       4.4.1.  Segmentation and Encapsulation . . . . . . . . . . . .  6
       4.4.2.  IPv4 Fragmentation and Setting the DF Bit  . . . . . .  8
       4.4.3.  Probing  . . . . . . . . . . . . . . . . . . . . . . .  8
       4.4.4.  Processing Errors  . . . . . . . . . . . . . . . . . .  9
     4.5.  Egress Tunnel Endpoint Specification . . . . . . . . . . . 10
       4.5.1.  Decapsulation and Reassembly . . . . . . . . . . . . . 10
       4.5.2.  Sending Errors . . . . . . . . . . . . . . . . . . . . 11
       4.5.3.  Sending Probe Replies  . . . . . . . . . . . . . . . . 11
       4.5.4.  Active Reassembly Buffer Management  . . . . . . . . . 12
   5.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 12
   6.  Security Considerations  . . . . . . . . . . . . . . . . . . . 12
   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 12
   8.  Appendix A: Additional Considerations  . . . . . . . . . . . . 12
   9.  Appendix B: Changes  . . . . . . . . . . . . . . . . . . . . . 13
   10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
     10.1. Normative References . . . . . . . . . . . . . . . . . . . 14
     10.2. Informative References . . . . . . . . . . . . . . . . . . 15
   Author's Address . . . . . . . . . . . . . . . . . . . . . . . . . 16
   Intellectual Property and Copyright Statements . . . . . . . . . . 17


1.  Introduction

   IPv6-in-(foo)*-in-IPv4 tunnels may span multiple IPv4 network hops
   yet are seen by IPv6 as ordinary links that must support the minimum
   IPv6 Maximum Transmission Unit (MTU) of 1280 bytes ([RFC2460],
   Section 5).  Common tunneling mechanisms (e.g.,
   [RFC3056][RFC4213][RFC4214][RFC4380]) meet this requirement
   through conservative static prearrangements at the expense of
   degraded performance over some paths due to excessive IPv4 network-
   based fragmentation and/or missed opportunities to discover larger
   MTUs.  Optional dynamic MTU determination methods [RFC1191] are also
   available, but may not provide adequate robustness.

   This document specifies a link adaptation mechanism for IPv6-in-
   (foo)*-in-IPv4 tunnels that presents an assured MTU to the IPv6
   layer.  It uses tunnel endpoint-based segmentation/reassembly and
   dynamic segment size probing with authenticated probe feedback.
   Thus, it provides greater robustness and efficiency by avoiding IPv4
   network-based fragmentation and dependence on ICMPv4 feedback from
   IPv4 network middleboxes.


2.  Terminology

   The following terms are defined within the scope of this document:

   Upper Layer Payload (ULP)
      a whole IPv6 packet, or a fragment packet created by IPv6
      fragmentation.


   Ingress Tunnel Endpoint (ITE)
      the tunnel interface endpoint that accepts ULPs from the IP layer
      and segments/packetizes them for transmission into a tunnel.


   Egress Tunnel Endpoint (ETE)
      the tunnel interface endpoint that receives packets from a tunnel
      and de-packetizes/reassembles them into ULPs for delivery to the
      IP layer.


   IP Layer
      the layer above the tunnel interface, i.e., the IPv6 layer.


   Sub-IP Layer
      any sublayers that occur within the tunnel interface, i.e., any
      (foo)* layers and including the upper portion of the IPv4 layer.
      Note that IPv4 is also viewed as the Layer 2 protocol from the
      perspective of the tunnel, so the Sub-IP layer begins below the IP
      layer and extends into Layer 2.


   The keywords MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD,
   SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear in this
   document, are to be interpreted as described in [RFC2119].


3.  Tunnel MTU Assurance Methods and Issues

   Common tunnel MTU assurance methods include classical IPv4
   fragmentation [RFC0791], and IPv4/IPv6 Path MTU discovery
   [RFC1191][RFC1981].  Other possibilities include operational
   assurance of widely-deployed links with large MTUs.  However, these
   methods have operational limitations that are well documented
   [FRAG][I-D.heffner-frag-harmful][RFC2923][RFC4459].

   This document specifies a link adaptation scheme for IPv6-in-(foo)*-
   in-IPv4 tunnels that is distinct from the above alternatives and
   avoids the issues.  It entails segmentation at the ITE and reassembly
   at the ETE at a logical mid-layer between IPv6 fragmentation and IPv4
   fragmentation.  It therefore resembles classical IPv4 fragmentation
   but: 1) only allows fragmentation to occur at the ITE, 2) supports
   path probing to detect the optimum segment size, and 3) avoids
   sequence number wrapping and data integrity issues through careful
   reassembly buffer management at the ETE.  The scheme is specified in
   the following sections.


4.  Link Adaptation for IPv6-in-(foo)*-in-IPv4 Tunnels

   The following subsections specify link adaptation mechanisms for
   IPv6-in-(foo)*-in-IPv4 tunnels with properties similar to the link
   adaptation mechanisms defined for AAL5 [RFC2684] and IEEE 802.11
   [WLAN]:

4.1.  Layering

   IPv6-in-(foo)*-in-IPv4 tunnel endpoints operate at a logical midpoint
   between the IPv6 and IPv4 protocol modules.  From the viewpoint of
   IPv6, the tunnel appears as an ordinary network interface module that
   delivers whole IPv6 packets and IPv6 fragment packets as ULPs to and
   from an underlying link.  From the viewpoint of IPv4, the tunnel
   appears as a packetization layer protocol that segments and
   reassembles ULPs.

   This document refers to the IPv6 layer as the "IP Layer" (i.e., layer
   3) and any sublayers that occur within the tunnel interface (i.e.,
   any (foo)* layers and including the upper portion of the IPv4 layer
   itself) as the "Sub-IP layer".  Note that IPv4 is also viewed as the
   Layer 2 protocol from the perspective of the tunnel, so the Sub-IP
   layer begins below the IP layer and extends into Layer 2.  Note also
   that (foo)* may entail multiple nested sublayers or may even be NULL,
   i.e., in the case of IPv6-in-IPv4 tunnels.

4.2.  Initial Negotiation Phase

   IPv6-in-(foo)*-in-IPv4 tunnel endpoints MUST first determine that the
   link adaptation mechanisms are implemented by both the ITE and ETE
   through an initial negotiation phase specified outside the scope of
   this document.  ITEs/ETEs for which one or both ends of the tunnel do
   not implement the scheme MUST use the default MTU assurance
   mechanisms specified for the particular IPv6-in-(foo)*-in-IPv4
   tunneling mechanism, and do not implement any other aspects of this
   specification.

4.3.  Tunnel MTU and MRU

   ITEs MUST configure a minimum IPv6 link MTU of 1280 bytes for all
   flows and SHOULD provide a configuration knob to set larger values.
   A nominal per-flow MTU of 9180 bytes (i.e., the same as defined in
   [RFC1626]) is RECOMMENDED, since it is large enough to accommodate
   frame sizes as large as Gigabit Ethernet Jumbo Frames [GIGE].  ITEs
   MAY set still larger MTU values, but are advised that this may lead
   to excessive packet loss and ICMPv6 "packet too big" messages.

   ETEs MUST configure a minimum per-flow Sub-IP layer reassembly buffer
   size (i.e., a minimum Sub-IP layer Maximum Receive Unit (MRU)) of
   1280 bytes, and SHOULD configure an MRU of 9180 bytes or larger to
   accommodate the recommended nominal MTU for ITEs.  A maximum MRU of
   11454 bytes is RECOMMENDED, since 11454 bytes is the maximum packet
   size for which a 32-bit CRC can provide Ethernet-quality bit error
   detection [JAIN][AARNET].  ETEs MAY set still larger MRU values, but
   are advised that larger values may lead to unacceptable levels of
   undetected errors unless all physical segments in the path provide
   assured error-free delivery for larger packets.

4.4.  Ingress Tunnel Endpoint Specification

   The following subsections specify mechanisms implemented by the ITE:


4.4.1.  Segmentation and Encapsulation

   ITEs maintain a per-flow MTU and per-flow segment size ("SEGSIZE")
   for the purpose of segmenting ULPs that are too large to traverse the
   tunnel.  It is RECOMMENDED that ITEs configure an initial per-flow
   SEGSIZE such that (SEGSIZE + length((foo)* headers) + length(IPv4
   header)) yields an IPv4 datagram size between 256-576 bytes (since
   256 bytes can safely accommodate the recommended nominal MTU (see
   below), and since IPv4 nodes are only required to accept datagrams of
   up to 576 bytes [RFC0791]).  Since most IPv4 links in the Internet
   configure still larger MTUs [RFC3150][RFC3819], and since IPv4 nodes
   should accept packets as large as the underlying link MTU [RFC1122],
   ITEs MAY use a still larger initial per-flow SEGSIZE if there is
   assurance that it would not cause gratuitous IPv4 fragmentation
   and/or overrun the IPv4 reassembly buffer.  ITEs probe the path to
   maintain SEGSIZE and/or discover larger SEGSIZEs during the lifetime
   of a flow (see: Section 4.4.3).
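
   As a rough illustration of the sizing rule above, the following
   Python sketch derives an initial SEGSIZE from a target IPv4 datagram
   size and checks how large a ULP a full chain could carry.  The
   function names and the 20-byte default IPv4 header length are
   assumptions made for this example; the 63-segment limit follows from
   the SEGID range 0 - 62, and the 4-byte trailer from the checksum
   described later in this section.

```python
MAX_SEGMENTS = 63     # SEGID values 0..62
CHECKSUM_TRAILER = 4  # trailing 32-bit checksum on multi-segment ULPs


def initial_segsize(ipv4_datagram_size: int, foo_hdr_len: int = 0,
                    ipv4_hdr_len: int = 20) -> int:
    """Return SEGSIZE so that SEGSIZE + headers == ipv4_datagram_size."""
    if not 256 <= ipv4_datagram_size <= 576:
        raise ValueError("recommended initial size is 256-576 bytes")
    segsize = ipv4_datagram_size - foo_hdr_len - ipv4_hdr_len
    if segsize <= 0:
        raise ValueError("headers leave no room for a segment")
    return segsize


def largest_ulp(segsize: int) -> int:
    """Largest multi-segment ULP (in bytes) that a full chain can carry."""
    return segsize * MAX_SEGMENTS - CHECKSUM_TRAILER


if __name__ == "__main__":
    small = initial_segsize(256)
    print(small, largest_ulp(small))
```

   With no (foo)* headers, a 256-byte datagram yields a SEGSIZE of 236
   bytes, and 63 such segments carry 14864 bytes of ULP, which is above
   the nominal 9180-byte MTU.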

   ITEs split each ULP they send into a tunnel into chains of segments
   for packetization and presentation to the IPv4 layer.  For ULPs that
   will span multiple segments, the ITE first uses the 2's complement
   Fletcher-32 checksum [STONE][RFC3385] to calculate a checksum across
   the entire ULP, then appends the A and B results as a trailing 32-bit
   checksum at the end of the ULP.  For ULPs that fit within a single
   segment, the ITE omits the trailing checksum.
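
   The checksum step can be sketched as follows.  This is a minimal
   illustration, not a normative definition: it assumes 16-bit
   big-endian words, zero padding of an odd-length ULP for the
   calculation only, sums kept modulo 65536 (the 2's complement
   variant), initial sums of zero, and a trailer that carries A followed
   by B.  None of these details are fixed by the text above, and the
   function names are hypothetical.

```python
from typing import Tuple


def fletcher32_twos(data: bytes) -> Tuple[int, int]:
    """Return the (A, B) sums over 16-bit words, each modulo 65536."""
    if len(data) % 2:
        data += b"\x00"          # assumed: pad for the calculation only
    a = b = 0
    for i in range(0, len(data), 2):
        word = (data[i] << 8) | data[i + 1]   # assumed: big-endian words
        a = (a + word) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a, b


def append_trailer(ulp: bytes) -> bytes:
    """Append A then B as a 4-byte trailer (ordering is an assumption)."""
    a, b = fletcher32_twos(ulp)
    return ulp + a.to_bytes(2, "big") + b.to_bytes(2, "big")
```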

   The ITE next splits the ULP into a chain of consecutive segments that
   MUST be created as contiguous and non-overlapping, i.e., the final
   byte of the (i)th segment MUST be the byte that immediately precedes
   the first byte of the (i+1)th segment.  Non-final segments in the
   chain MUST be identical in length and no larger than SEGSIZE bytes;
   the final segment MAY be of different length.  The ITE encapsulates
   each segment in Sub-IP layer headers (including any (foo)* headers
   and an IPv4 header) to form a chain of IPv4 packets; each packet in
   the chain MUST include Sub-IP layer encapsulation headers of
   identical length.  The ITE sets the DF bit in the IPv4 header
   according to the specification in Section 4.4.2, and encodes the
   following information in the 16-bit IPv4 "Identification" field of
   each segment:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     ULPID     |   SEGID   |P|A|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       IPv4 Identification Field


   ULPID:  8 bits
      An identifying value assigned by the ITE to aid the ETE in
      reassembling the segments of a ULP.


   SEGID:  6 bits
      A value that identifies a specific segment within a ULP.


   P:  1 bit
      Probe flag; 0 = Ordinary Segment, 1 = Probe Segment.


   A:  1 bit
      Additional Segments flag; 0 = Last Segment, 1 = Additional
      Segments.
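
   The field definitions above can be illustrated with a pair of helper
   functions.  The sketch below assumes the usual convention that bit 0
   is the most significant bit of the 16-bit field; the names pack_ident
   and unpack_ident are hypothetical.

```python
from typing import Tuple


def pack_ident(ulpid: int, segid: int, probe: bool, more: bool) -> int:
    """Build the 16-bit Identification value: ULPID | SEGID | P | A."""
    if not 0 <= ulpid <= 0xFF:
        raise ValueError("ULPID is 8 bits")
    if not 0 <= segid <= 0x3F:
        raise ValueError("SEGID is 6 bits")
    return (ulpid << 8) | (segid << 2) | (int(probe) << 1) | int(more)


def unpack_ident(ident: int) -> Tuple[int, int, bool, bool]:
    """Split a 16-bit Identification value into (ULPID, SEGID, P, A)."""
    return ((ident >> 8) & 0xFF, (ident >> 2) & 0x3F,
            bool(ident & 0x2), bool(ident & 0x1))
```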


   The ITE encodes an identical value in the "ULPID" field (bits 0 - 7
   of the IPv4 Identification field) of each IPv4 packet in a chain to
   identify the segments of a specific ULP; it encodes different ULPID
   values in IPv4 packets that encapsulate segments of different ULPs.
   The ITE also encodes an increasing Segment ID value between 0 - 62 in
   the "SEGID" field (bits 8 - 13 of the IPv4 Identification field) of
   consecutive packets in a chain, i.e., it encodes the value '0' in the
   first packet, encodes the value '1' in the second packet, etc.

   The ITE then sets the "Additional Segments - A" bit (bit 15 of the
   IPv4 Identification field) in each packet in the chain except the
   final one to indicate that additional segments follow.  Finally, it
   delivers each packet in the chain to the link layer (i.e., the IPv4
   layer) in increasing SEGID order, i.e., SEGID 0 first, followed by
   SEGID 1, etc., up to the final packet.  The IPv4 layer SHOULD NOT
   reorder the packets in a chain, but rather SHOULD deliver them to the
   underlying link in the order in which the tunnel interface produced
   them.
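
   The segmentation and numbering rules in this section can be sketched
   as follows.  The example assumes the caller has already appended the
   trailing checksum where one is required, and it models each packet as
   a (SEGID, A flag, segment) tuple instead of building real IPv4
   headers; build_chain is a hypothetical name.

```python
from typing import List, Tuple

MAX_SEGMENTS = 63  # SEGID 0..62; the value 63 marks discardable probes


def build_chain(ulp: bytes, segsize: int) -> List[Tuple[int, bool, bytes]]:
    """Split a ULP into (SEGID, A flag, segment) tuples in sending order."""
    if not ulp or segsize <= 0:
        raise ValueError("need a non-empty ULP and a positive SEGSIZE")
    # Contiguous, non-overlapping slices; only the last may be shorter.
    segments = [ulp[i:i + segsize] for i in range(0, len(ulp), segsize)]
    if len(segments) > MAX_SEGMENTS:
        # The ITE would return a "packet too big" message in this case.
        raise ValueError("ULP needs more than 63 segments")
    last = len(segments) - 1
    return [(segid, segid != last, seg)
            for segid, seg in enumerate(segments)]
```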

   Note that IPv4 fragmentation in the network could theoretically
   result in silent packet loss along certain paths even for packets
   with the smallest recommended initial SEGSIZE (see: Section 4.4.2).
   As such, a robust ITE implementation could reduce its IPv4 packet
   sizes to as small as 68 bytes if it suspects that larger packets are
   disappearing into a fragmentation-related black hole, but such small
   packets might not satisfy the nominal tunnel MTU of 9180 bytes.  ITEs
   SHOULD therefore return locally-generated IPv6 "packet too big"
   messages for IPv6 packets that cannot be segmented and encapsulated
   within current IPv4 packet size and chain length limitations for the
   tunnel.


4.4.2.  IPv4 Fragmentation and Setting the DF Bit

   When an ITE segments a ULP (see: Section 4.4.1), it can optionally
   set or clear the "Don't Fragment - DF" bit in the encapsulating IPv4
   headers of packets in the chain.  If the DF bit is cleared,
   gratuitous network-based IPv4 fragmentation could result in well-
   known operational issues [FRAG] [I-D.heffner-frag-harmful].  Also,
   some middleboxes (such as IPv4 NATs and firewalls) may only be
   capable of passing the first fragment of a multi-fragment IPv4
   datagram, and large multi-fragment datagrams could result in IPv4
   reassembly buffer overruns.  Finally, the minimum IPv4 MTU is only 68
   bytes (i.e., the size required to encapsulate a maximum-length (60
   byte) IPv4 header and a minimum-length (8 byte) fragment [RFC0791])
   such that a limited amount of IPv4 fragmentation may occur in the
   network even for relatively small packets.

   Nonetheless, clearing the DF bit can in some circumstances increase
   the packet delivery ratio when setting the DF bit would otherwise
   result in excessive packet loss due to temporal link MTU
   restrictions.  In view of the above considerations, the ITE:

   o  SHOULD set the DF bit in probe packets (see: Section 4.4.3) larger
      than 576 bytes.

   o  SHOULD set the DF bit in all packets larger than 576 bytes if it
      will not perform active probing (see: Section 4.4.3).

   o  MAY clear the DF bit in any packets larger than 576 bytes if it
      will perform active probing.

   o  MAY clear the DF bit in any packets of 576 bytes or smaller.
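
   One possible reading of the rules above is shown below as a small
   decision function.  The prefer_clear parameter is an assumption that
   stands in for local policy wherever the rules say MAY; the function
   name is hypothetical.

```python
def df_bit(packet_len: int, is_probe: bool, active_probing: bool,
           prefer_clear: bool = True) -> bool:
    """Return True if the DF bit should be set for this packet."""
    if packet_len <= 576:
        return not prefer_clear      # MAY clear for small packets
    if is_probe or not active_probing:
        return True                  # SHOULD set
    return not prefer_clear          # MAY clear when actively probing
```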

4.4.3.  Probing

   To increase efficiency and avoid excessive packet chain lengths, ITEs
   SHOULD probe the path periodically to increase a flow's SEGSIZE to
   larger values.  ITEs probe a candidate SEGSIZE value 'N' by setting
   the "Probe Segment - P" bit (bit 14 of the IPv4 Identification field)
   in packets that encapsulate a probe segment of size N.  For probe
   segments that contain valid data for reassembly as part of a packet
   chain, the ITE sets the appropriate SEGID value in the IPv4 packet
   header as for ordinary segmentation.  For probe segments that are to
   be discarded by the ETE, the ITE sets the value 63 in the SEGID
   field.

   When the ITE sends a probe packet, it marks the probe as "pending"
   for a period of 'MaxProbeDelay' msec (i.e., a per-flow round-trip
   time estimate for the tunnel) and caches the probe packet's IPv4
   destination, length and identification field values, as well as the
   IPv6 flow label value [RFC3697].  If the ITE receives a valid Node
   Information Query reply (NI Reply) [RFC4620] from the ETE (see:
   Section 4.5.3) before the probe period expires, it marks the probe as
   successful; otherwise, it marks the probe as failed.  In a valid NI
   Reply:

   o  the Type, Code, Qtype and Flags fields MUST be set as specified
      for a NOOP reply in ([RFC4620], Section 6.1), and

   o  bits 0-15 of the Nonce field MUST match the IPv4 length of the
      probe packet, and

   o  bits 16-31 of the Nonce field MUST match the IPv4 identification
      of the probe packet, and

   o  bits 32-51 of the Nonce field MUST match the IPv6 flow label
      value.
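
   The Nonce checks listed above might be implemented along the
   following lines.  This sketch assumes that the Type, Code, Qtype and
   Flags fields have already been checked, that bit 0 is the most
   significant bit of the 64-bit Nonce, and that bits 52-63 are ignored;
   the class and function names are hypothetical.

```python
class PendingProbe:
    """Values the ITE caches when it sends a probe."""

    def __init__(self, ipv4_len: int, ipv4_id: int, flow_label: int,
                 sent_at_ms: int) -> None:
        self.ipv4_len = ipv4_len        # IPv4 length of the probe packet
        self.ipv4_id = ipv4_id          # 16-bit Identification field
        self.flow_label = flow_label    # 20-bit IPv6 flow label
        self.sent_at_ms = sent_at_ms    # time the probe was sent


def probe_succeeded(probe: PendingProbe, nonce: int, now_ms: int,
                    max_probe_delay_ms: int) -> bool:
    """Check the Nonce of an NI Reply against a pending probe."""
    if now_ms - probe.sent_at_ms > max_probe_delay_ms:
        return False                    # probe period expired
    return (((nonce >> 48) & 0xFFFF) == probe.ipv4_len
            and ((nonce >> 32) & 0xFFFF) == probe.ipv4_id
            and ((nonce >> 12) & 0xFFFFF) == probe.flow_label)
```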

   Following a successful probe, but before advancing SEGSIZE to N, the
   ITE SHOULD enter a brief verification phase during which it sends
   additional probe segments to detect asymmetric multipath MTU
   restrictions and/or route fluctuations.  Thereafter, the ITE SHOULD
   re-probe periodically to confirm that packets with up to SEGSIZE byte
   segments are still reaching the ETE.

   After probing the path to discover a new SEGSIZE, the ITE may elect
   to set or clear the DF bit in subsequent non-probe packets (see:
   Section 4.4.2).  For example, the ITE may elect to clear the DF bit
   to maintain an optimal packet delivery ratio across temporal link MTU
   restrictions (e.g., due to dynamic rerouting of flows, etc.) while it
   may elect to set the DF bit to avoid all IPv4 fragmentation in the
   network.

   ITEs that elect to clear the DF bit in non-probe packets SHOULD
   engage in "active probing" to periodically confirm SEGSIZE
   "frequently enough" such that cyclical misassociations and possible
   data corruptions at the ETE do not occur [I-D.heffner-frag-harmful]
   if a flow begins to fragment.  ITEs that elect to set the DF bit in
   non-probe packets SHOULD carefully consider any ICMPv4 "fragmentation
   needed" messages that arrive (see: Section 4.4.4) but are advised
   that packet delivery ratios may suffer when the flow transmission
   rate is high and/or the path round trip time is large.

4.4.4.  Processing Errors

   ITEs may receive ICMPv4 "fragmentation needed" error messages from
   middleboxes inside a tunnel, but are advised to consider them as
   "soft errors".  Implementers are advised to consult
   [RFC1191][RFC2923][RFC4821] for operational recommendations on
   processing ICMPv4 "fragmentation needed" messages.

   ITEs may receive encapsulated ICMPv6 "packet too big" messages
   [RFC1981] from an ETE at the far end of a tunnel (see:
   Section 4.5.2).  The ITE SHOULD cache the MTU value encoded in the
   "packet too big" message as the new MTU for the flow, and relay the
   ICMPv6 message back to the original source.

   ITEs may receive encapsulated ICMPv6 "parameter problem" messages
   with code "reassembly/checksum error" [RFC4443] from an ETE at the
   far end of the tunnel (see: Section 4.5.2).  This may indicate an
   isolated packet splicing error at the ETE, or packet loss due to
   temporal network conditions such as congestion, MTU restrictions,
   link errors, signal intermittence, etc.  If the ITE receives
   persistent reassembly/checksum errors from an ETE, it SHOULD take
   adaptive measures, e.g., reduce the SEGSIZE for the flow, rate-limit
   the packets it sends into the tunnel, etc.  Since each reassembly/
   checksum error corresponds to a dropped packet, the ITE SHOULD relay
   the messages back to the original source (subject to rate limiting).

4.5.  Egress Tunnel Endpoint Specification

   The following subsections specify mechanisms implemented by the ETE:

4.5.1.  Decapsulation and Reassembly

   The IPv4 length, ULPID, SEGID and A fields in the IPv4 packets in a
   chain (along with the IPv6 flow label [RFC3697]) provide sufficient
   information for the ETE to reassemble an original ULP with protection
   for packet reordering in the network.  ETEs MUST configure per-flow
   reassembly buffers of at least 1280 bytes and SHOULD configure
   reassembly buffers of 9180 bytes or larger to accommodate the nominal
   tunnel MTU (see: Section 4.3).  Note that these reassembly buffers
   occur at the Sub-IP layer and are thus distinct from the IPv4 and
   IPv6 reassembly caches.

   ETEs use per-flow reassembly buffers to concatenate the segments
   received in packet chains for a particular ULPID in increasing SEGID
   order (i.e., SEGID 0, followed by SEGID 1, etc.) even if the packets
   were re-ordered by the network.  When all segments for a particular
   ULPID have been concatenated into the reassembly buffer, the ETE uses
   2's complement Fletcher-32 to verify the checksum if one was included
   (see: Section 4.4.1).  The ETE then discards the Sub-IP layer
   encapsulation headers and trailing checksum, and delivers correctly-
   reassembled ULPs to the IP layer (i.e., IPv6).  It discards
   incomplete ULPs and ULPs with incorrect checksums, and sends an
   appropriate error message as specified in Section 4.5.2.
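
   The reassembly procedure above can be sketched as a small per-ULPID
   state object.  The checksum details (16-bit big-endian words, sums
   modulo 65536, trailer carrying A then B) are assumptions made for the
   example, as are the class and method names; timers, buffer limits and
   error reporting are omitted.

```python
from typing import Dict, Optional, Tuple


def fletcher32_twos(data: bytes) -> Tuple[int, int]:
    """2's complement Fletcher sums over 16-bit words (assumed layout)."""
    if len(data) % 2:
        data += b"\x00"
    a = b = 0
    for i in range(0, len(data), 2):
        a = (a + ((data[i] << 8) | data[i + 1])) & 0xFFFF
        b = (b + a) & 0xFFFF
    return a, b


class Reassembly:
    """Reassembly state for one ULPID at the ETE (illustrative only)."""

    def __init__(self) -> None:
        self.segments: Dict[int, bytes] = {}
        self.final_segid: Optional[int] = None

    def add(self, segid: int, more: bool, data: bytes) -> None:
        self.segments[segid] = data
        if not more:                      # A flag clear: last segment
            self.final_segid = segid

    def complete(self) -> bool:
        return (self.final_segid is not None and
                all(i in self.segments
                    for i in range(self.final_segid + 1)))

    def ulp(self) -> Optional[bytes]:
        """Return the ULP, or None if incomplete; raise on bad checksum."""
        if not self.complete():
            return None
        buf = b"".join(self.segments[i]
                       for i in range(self.final_segid + 1))
        if self.final_segid == 0:         # single segment: no trailer
            return buf
        body, trailer = buf[:-4], buf[-4:]
        a, b = fletcher32_twos(body)
        if trailer != a.to_bytes(2, "big") + b.to_bytes(2, "big"):
            raise ValueError("reassembly/checksum error")
        return body
```

   Because segments are stored by SEGID and concatenated in increasing
   order, the result does not depend on the order in which the packets
   arrived.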


4.5.2.  Sending Errors

   If the ETE receives a packet chain that would overflow the reassembly
   buffer, it discards the chain and sends an ICMPv6 "packet too big"
   message [RFC1981] back to the IPv6 source via the reverse tunnel back
   to the ITE.  The ETE includes in the message body up to 1280 bytes
   beginning with the upper layer packet headers (IPv6 and above) and
   the contents of the reassembly buffer beyond the upper layer packet
   headers; it encodes the size of the reassembly buffer in the MTU
   value.

   If the ETE receives at least one segment, but one or more segments
   are lost and/or checksum verification fails, it SHOULD send an ICMPv6
   "parameter problem" message with code "reassembly/checksum error"
   [RFC4443] back to the IPv6 source via the reverse tunnel back to the
   ITE.  The ETE includes in the message body up to 1280 bytes beginning
   with the upper layer packet headers (IPv6 and above) and contents of
   the reassembly buffer beyond the upper layer packet headers, and sets
   the pointer to either the beginning of the first missing segment or
   the beginning of the 4 byte checksum field (if no segments were
   missing).

   After sending the error, the ETE discards the packet-in-error, i.e.,
   it does not deliver the packet as a ULP to the IP layer.

4.5.3.  Sending Probe Replies

   If the ETE receives a segment used for probing (i.e., an IPv4 packet
   in the chain with the 'P' flag set), it sends a Node Information
   Query reply (NI Reply) [RFC4620] message back to the ITE.  The ETE
   MUST construct the NI Reply as follows:

   o  the Type, Code, Qtype and Flags fields set as specified for a NOOP
      reply in ([RFC4620], Section 6.1), and

   o  the IPv4 length of the probe packet encoded in bits 0-15 of the
      Nonce field, and

   o  the IPv4 identification of the probe packet encoded in bits 16-31
      of the Nonce field, and

   o  the IPv6 flow label value encoded in bits 32-51 of the Nonce field
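
   The Nonce layout in the list above could be produced as shown below.
   The sketch assumes that bit 0 is the most significant bit of the
   64-bit Nonce field and that the remaining bits 52-63 are set to zero,
   which the list does not specify; build_probe_nonce is a hypothetical
   name.

```python
def build_probe_nonce(ipv4_len: int, ipv4_id: int,
                      flow_label: int) -> bytes:
    """Encode probe length, identification and flow label as 8 bytes."""
    if not 0 <= ipv4_len <= 0xFFFF:
        raise ValueError("IPv4 length is 16 bits")
    if not 0 <= ipv4_id <= 0xFFFF:
        raise ValueError("IPv4 identification is 16 bits")
    if not 0 <= flow_label <= 0xFFFFF:
        raise ValueError("IPv6 flow label is 20 bits")
    value = (ipv4_len << 48) | (ipv4_id << 32) | (flow_label << 12)
    return value.to_bytes(8, "big")
```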

   If the IPv4 packet containing the probe segment encodes the value 63
   in the SEGID field, the ETE discards the segment; otherwise, it
   includes the segment as part of the normal reassembly procedure
   described above.


4.5.4.  Active Reassembly Buffer Management

   The ETE MUST actively manage reassembly buffers and discard as early
   as possible any reassemblies that are not likely to complete due to,
   e.g., loss of one or more packets in the chain, gross reordering of
   packets in the network, etc.  In particular, the ETE must discard
   partial reassemblies before the 8-bit ULPID encoded by the ITE wraps.
   The ETE therefore must augment the classical timer-driven reassembly
   buffer management strategy with an event-driven strategy.
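
   One way to realize the event-driven part of this strategy is to
   discard partial reassemblies whose ULPID has fallen too far behind
   the most recent one.  The sketch below assumes that the ITE assigns
   ULPIDs sequentially modulo 256 and that the caller tracks the most
   advanced ULPID seen so far; neither is stated in this document, and
   the window of 64 and the function name are arbitrary choices for the
   example.

```python
from typing import Iterable, Set


def stale_ulpids(active: Iterable[int], newest: int,
                 window: int = 64) -> Set[int]:
    """Return active ULPIDs that lag `newest` by >= window (mod 256)."""
    return {u for u in active if (newest - u) % 256 >= window}
```

   For example, with active ULPIDs 250, 3 and 120 and a newest ULPID of
   4, only 120 is reported as stale, since 250 is just 10 steps behind
   once the 8-bit wrap is taken into account.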


5.  IANA Considerations

   The IANA is instructed to assign a code type for "reassembly/checksum
   error" under the ICMPv6 Parameter Problem message type in the "ICMPv6
   Type Numbers" registry.


6.  Security Considerations

   The nonce values in NI Reply messages from ETEs provide spoofing
   protection against off-path attackers.


7.  Acknowledgments

   This work has benefited from helpful discussions with many
   colleagues, friends and family.


8.  Appendix A: Additional Considerations

   ITEs can use the probing mechanism described in Section 4.4.3 as a
   general-purpose method for eliciting acknowledgements from an ETE if
   improved reliability at the expense of additional overhead is
   desired.

   The equal size restriction for non-final segments and the
   non-overlapping restriction for all segments in packet chains
   together provide a significant simplification for reassembly
   algorithms [RFC0815].

   Use of the link adaptation mechanisms specified in this document may
   lead to an overall increase in short chains of small packets in the
   Internet.  Network administrators are advised to follow the
   recommendations in [RFC3150] to minimize packet loss and packet
   reordering.  Also, overly-long packet chains should be avoided if
   possible due to interactions with Active Queue Management (AQM) in
   the network.


   Since link-layer CRC-32 checks normally occur on each segment in the
   path, most errors detected during ULP reassembly are due to packet
   splices and/or errors in the data path between the NIC hardware and
   the reassembly buffer.  The Fletcher-32 checksum algorithm has been
   shown to provide an effective edge-to-edge error detection capability
   for such errors [STONE].  The Fletcher-32 checksum is also dissimilar
   from both CRC-32 and the Internet checksum used by many upper layer
   protocols, thereby decreasing the likelihood of undetected errors.

   Some upper layer packetization protocols (e.g., NFS) may generate
   fixed payload sizes and rely on the network layer to deliver the
   payloads either as whole IP packets or as chains of IP fragments.
   Since NFS performance (and the performance of other upper layer
   packetization protocols) is sensitive to packet handling overhead,
   implementations should periodically attempt to increase the SEGSIZE
   through probing even if initial probe attempts fail.


9.  Appendix B: Changes

   (Note to RFC Editor - please remove this section before publishing as
   an RFC.)

   Changes since -05:

   o  Added back informative references to common tunneling mechanisms.

   o  Citation of RFC4459

   Changes since -04:

   o  Rearranged sections for clarity.

   o  removed setting of IPv4 "Reserved Fragmentation", since ITE/ETE
      capabilities can be discovered during the initial tunnel
      negotiation.

   Changes since -03:

   o  Clarified that mechanisms cover IPv6-in-(foo)-in-IPv4; not just
      IPv6-in-IPv4.

   o  New terminology for ITE/ETE

   o  Clarifications to layering model

   o  Replaced RA with NI Reply as probe response


   o  Reduced SEGID to 6 bits and increased ULPID to 8 bits

   o  IPv6 flow label RFC cited

   Changes since -01, -02:

   o  Updated references

   Changes since -00:

   o  Defined new coding of segmentation/reassembly info in the IPv4
      Identification field

   o  Changed "tunneling mechanism" to "tunnel endpoint"

   o  Clarified text on trailing checksums

   o  general document cleanup; removed "additional considerations" that
      no longer apply


10.  References

10.1.  Normative References

   [RFC0791]  Postel, J., "Internet Protocol", STD 5, RFC 791,
              September 1981.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC3697]  Rajahalme, J., Conta, A., Carpenter, B., and S. Deering,
              "IPv6 Flow Label Specification", RFC 3697, March 2004.

   [RFC4443]  Conta, A., Deering, S., and M. Gupta, "Internet Control
              Message Protocol (ICMPv6) for the Internet Protocol
              Version 6 (IPv6) Specification", RFC 4443, March 2006.

   [RFC4620]  Crawford, M. and B. Haberman, "IPv6 Node Information
              Queries", RFC 4620, August 2006.


10.2.  Informative References

   [AARNET]   "AARNet: Network: Large MTU: Size",
              http://www.aarnet.edu.au/engineering/networkdesign/mtu/
              size.html, April 2007.

   [FRAG]     Kent, C. and J. Mogul, "Fragmentation Considered Harmful",
              Proc. SIGCOMM '87 Workshop on Frontiers in Computer
              Communications Technology, August 1987.

   [GIGE]     Dykstra, P., "Gigabit Ethernet Jumboframes (And Why You
              Should Care)", http://sd.wareonearth.com/~phil/jumbo.html,
              December 1999.

   [I-D.heffner-frag-harmful]
              Heffner, J., "IPv4 Reassembly Errors at High Data Rates",
              draft-heffner-frag-harmful-05 (work in progress),
              May 2007.

   [JAIN]     Jain, R., "Error Characteristics of Fiber Distributed Data
              Interface (FDDI)",
              http://www.cse.wustl.edu/~jain/papers.html, August 1990.

   [RFC0815]  Clark, D., "IP datagram reassembly algorithms", RFC 815,
              July 1982.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1626]  Atkinson, R., "Default IP MTU for use over ATM AAL5",
              RFC 1626, May 1994.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2684]  Grossman, D. and J. Heinanen, "Multiprotocol Encapsulation
              over ATM Adaptation Layer 5", RFC 2684, September 1999.

   [RFC2923]  Lahey, K., "TCP Problems with Path MTU Discovery",
              RFC 2923, September 2000.

   [RFC3056]  Carpenter, B. and K. Moore, "Connection of IPv6 Domains
              via IPv4 Clouds", RFC 3056, February 2001.

   [RFC3150]  Dawkins, S., Montenegro, G., Kojo, M., and V. Magret,
              "End-to-end Performance Implications of Slow Links",
              BCP 48, RFC 3150, July 2001.


   [RFC3385]  Sheinwald, D., Satran, J., Thaler, P., and V. Cavanna,
              "Internet Protocol Small Computer System Interface (iSCSI)
              Cyclic Redundancy Check (CRC)/Checksum Considerations",
              RFC 3385, September 2002.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4213]  Nordmark, E. and R. Gilligan, "Basic Transition Mechanisms
              for IPv6 Hosts and Routers", RFC 4213, October 2005.

   [RFC4214]  Templin, F., Gleeson, T., Talwar, M., and D. Thaler,
              "Intra-Site Automatic Tunnel Addressing Protocol
              (ISATAP)", RFC 4214, October 2005.

   [RFC4380]  Huitema, C., "Teredo: Tunneling IPv6 over UDP through
              Network Address Translations (NATs)", RFC 4380,
              February 2006.

   [RFC4459]  Savola, P., "MTU and Fragmentation Issues with In-the-
              Network Tunneling", RFC 4459, April 2006.

   [RFC4821]  Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", RFC 4821, March 2007.

   [STONE]    Stone, J., "Checksums in the Internet", Stanford Doctoral
              Dissertation, August 2001.

   [WLAN]     IEEE Computer Society, "Part 11: Wireless LAN Medium
              Access Control (MAC) and Physical Layer (PHY)
              Specifications", ANSI/IEEE 802.11, 1999 Edition.


Author's Address

   Fred L. Templin (editor)
   Boeing Phantom Works
   P.O. Box 3707
   Seattle, WA  98124
   USA

   Email: fred.l.templin@boeing.com


Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.


Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.


Acknowledgment

   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).

