full pass

Randy Bush 2019-07-07 10:27:54 -07:00
parent b3211adc46
commit 6d9b41fba5


@ -11,14 +11,14 @@
<?rfc tocindent="yes"?> <?rfc tocindent="yes"?>
<?rfc tocompact="yes"?> <?rfc tocompact="yes"?>
<rfc category="std" docName="draft-ietf-lsvr-l3dl-01" ipr="trust200902"> <rfc category="std" docName="draft-ietf-lsvr-l3dl-02" ipr="trust200902">
<front> <front>
<title>Layer 3 Discovery and Liveness</title> <title>Layer 3 Discovery and Liveness</title>
<author fullname="Randy Bush" initials="R." surname="Bush"> <author fullname="Randy Bush" initials="R." surname="Bush">
<organization>Arrcus &amp; IIJ</organization> <organization>Arrcus &amp; Internet Initiative Japan</organization>
<address> <address>
<postal> <postal>
<street>5147 Crystal Springs</street> <street>5147 Crystal Springs</street>
@ -60,9 +60,9 @@
protocols are used to build topology and reachability databases. protocols are used to build topology and reachability databases.
These protocols need to discover IP Layer 3 attributes of links, These protocols need to discover IP Layer 3 attributes of links,
such as logical link IP encapsulation abilities, IP neighbor address such as logical link IP encapsulation abilities, IP neighbor address
discovery, and link liveness. The Layer 3 Discovery and Liveness discovery, and link liveness. This Layer 3 Discovery and Liveness
protocol specified in this document collects these data, which are protocol collects these data, which may then be disseminated using
then disseminated using BGP-SPF and similar protocols.</t> BGP-SPF and similar protocols.</t>
</abstract> </abstract>
@ -83,10 +83,10 @@
<section anchor="intro" title="Introduction"> <section anchor="intro" title="Introduction">
<t>The Massive Data Center (MDC) environment presents unusual <t>The Massive Data Center (MDC) environment presents unusual
problems of scale, e.g. O(10,000) devices, while its homogeneity problems of scale, e.g. O(10,000) forwarding devices, while its
presents opportunities for simple approaches. Approaches such as homogeneity presents opportunities for simple approaches.
Jupiter Rising <xref target="JUPITER"/> use a central controller to Approaches such as Jupiter Rising <xref target="JUPITER"/> use a
deal with scaling, while BGP-SPF <xref central controller to deal with scaling, while BGP-SPF <xref
target="I-D.ietf-lsvr-bgp-spf"/> provides massive scale-out without target="I-D.ietf-lsvr-bgp-spf"/> provides massive scale-out without
centralization using a tried and tested scalable distributed control centralization using a tried and tested scalable distributed control
plane, offering a scalable routing solution in Clos <xref plane, offering a scalable routing solution in Clos <xref
@ -99,17 +99,16 @@
<t>Layer 3 Discovery and Liveness (L3DL) provides brutally simple <t>Layer 3 Discovery and Liveness (L3DL) provides brutally simple
mechanisms for devices to <list style="symbols"> mechanisms for devices to <list style="symbols">
<t>Discover unique identities of devices/ports/... on a logical
link,</t>
<t>Run Layer 2 keep-alive messages for session continuity,</t>
<t>Discover each other's unique endpoint identification,</t> <t>Discover each other's unique endpoint identification,</t>
<t>Discover mutually supported encapsulations, e.g. IP/MPLS,</t> <t>Discover mutually supported layer 3 encapsulations,
e.g. IP/MPLS,</t>
<t>Discover Layer 3 IP and/or MPLS addressing of interfaces of the <t>Discover Layer 3 IP and/or MPLS addressing of interfaces of the
encapsulations,</t> encapsulations,</t>
<t>Enable layer 3 link liveness such as BFD, and finally</t>
<t>Present these data, using a very restricted profile of a BGP-LS <t>Present these data, using a very restricted profile of a BGP-LS
<xref target="RFC7752"/> API, to BGP-SPF which computes the <xref target="RFC7752"/> API, to BGP-SPF which computes the
topology and builds routing and forwarding tables.</t> topology and builds routing and forwarding tables,</t>
<t>Enable layer 3 link liveness such as BFD, and finally</t>
<t>Provide Layer 2 keep-alive messages for session continuity.</t>
</list></t> </list></t>
<t>This protocol may be more widely applicable to a range of routing <t>This protocol may be more widely applicable to a range of routing
@ -133,7 +132,7 @@
external components using the BGP routing protocol. See <xref external components using the BGP routing protocol. See <xref
target="RFC7752"/>.</t> target="RFC7752"/>.</t>
<t hangText="BGP-SPF">A hybrid protocol using BGP transport but <t hangText="BGP-SPF">A hybrid protocol using BGP transport but
a Dijkstra SPF decision process. See <xref a Dijkstra Shortest Path First decision process. See <xref
target="I-D.ietf-lsvr-bgp-spf"/>.</t> target="I-D.ietf-lsvr-bgp-spf"/>.</t>
<t hangText="Clos:">A hierarchic subset of a crossbar switch <t hangText="Clos:">A hierarchic subset of a crossbar switch
topology commonly used in data centers.</t> topology commonly used in data centers.</t>
@ -141,7 +140,7 @@
frame. A full L3DL PDU may be packaged in multiple Datagrams.</t> frame. A full L3DL PDU may be packaged in multiple Datagrams.</t>
<t hangText="Encapsulation:">Address Family Indicator and <t hangText="Encapsulation:">Address Family Indicator and
Subsequent Address Family Indicator (AFI/SAFI). I.e. classes of Subsequent Address Family Indicator (AFI/SAFI). I.e. classes of
layer 2.5 and 3 addresses such as IPv4, IPv6, MPLS, ...</t> layer 2.5 and 3 addresses such as IPv4, IPv6, MPLS, etc.</t>
<t hangText="Frame:">A Layer 2 packet.</t> <t hangText="Frame:">A Layer 2 packet.</t>
<t hangText="Link or Logical Link:">A logical connection between <t hangText="Link or Logical Link:">A logical connection between
two logical ports on two devices. E.g. two VLANs between the same two logical ports on two devices. E.g. two VLANs between the same
@ -153,8 +152,8 @@
since they are used by all widely deployed Layer 2 network since they are used by all widely deployed Layer 2 network
technologies of interest, especially Ethernet. See <xref technologies of interest, especially Ethernet. See <xref
target="IEEE.802_2001"/>.</t> target="IEEE.802_2001"/>.</t>
<t hangText="MDC:">Massive Data Center, commonly thousands of <t hangText="MDC:">Massive Data Center, commonly composed of
TORs.</t> thousands of Top of Rack Switches (TORs).</t>
<t hangText="MTU:">Maximum Transmission Unit, the size in octets <t hangText="MTU:">Maximum Transmission Unit, the size in octets
of the largest packet that can be sent on a medium, see <xref of the largest packet that can be sent on a medium, see <xref
target="RFC1122"/> 1.3.3.</t> target="RFC1122"/> 1.3.3.</t>
@ -201,7 +200,7 @@
in interfaces with thousands of disaggregated prefixes.</t> in interfaces with thousands of disaggregated prefixes.</t>
<t>Therefore the L3DL protocol is session oriented and uses <t>Therefore the L3DL protocol is session oriented and uses
incremental announcement and withdrawal with hot restart, a la BGP incremental announcement and withdrawal with session restart, a la BGP
(<xref target="RFC4271"/>).</t> (<xref target="RFC4271"/>).</t>
</section> </section>
@ -247,7 +246,7 @@
</figure> </figure>
<t>There are two protocols, the inter-device per-link layer 3 <t>There are two protocols, the inter-device per-link layer 3
discovery and the interface to the upper level BGP-like API: discovery and the API to the upper level BGP-like routing protocol:
<list style="symbols"> <list style="symbols">
<t>Inter-device PDUs are used to exchange device and logical link <t>Inter-device PDUs are used to exchange device and logical link
@ -272,21 +271,21 @@
<section anchor="ilpo" title="Inter-Link Protocol Overview"> <section anchor="ilpo" title="Inter-Link Protocol Overview">
<t>Two devices discover each other and their respective identities <t>Two devices discover each other and their respective identities
by sending multicast HELLO PDUs (<xref target="hello"/>). To allow by sending multicast HELLO PDUs (<xref target="hello"/>). To assure
discovery of new devices coming up on a multi-link topology, devices discovery of new devices coming up on a multi-link topology, devices
on such a topology send periodic HELLOs forever, see <xref on such a topology send periodic HELLOs forever, see <xref
target="dhello"/>.</t> target="dhello"/>.</t>
<t>Once a new device is recognized, both devices attempt to <t>Once a new device is recognized, both devices attempt to
negotiate and establish peering by sending unicast OPEN PDUs (<xref negotiate and establish a session by sending unicast OPEN PDUs
target="open"/>). In an established peering, the Encapsulations (<xref target="open"/>). In an established session, the
(<xref target="afisafi"/>) configured on an end point may be Encapsulations (<xref target="afisafi"/>) configured on an end point
announced and modified. Note that these are only the encapsulation may be announced and modified. Note that these are only the
and addresses on the announcing interface; though a device's encapsulation and addresses configured on the announcing interface;
loopback interface(s) may also be announced. When two devices on a though a device's loopback and overlay interface(s) may also be
link have compatible Encapsulations and addresses, i.e. the same announced. When two devices on a link have compatible
AFI/SAFI and the same subnet, the link is announced via the BGP-LS Encapsulations and addresses, i.e. the same AFI/SAFI and the same
API.</t> subnet, the link is announced via the BGP-LS API.</t>
<section anchor="ladder" title="L3DL Ladder Diagram"> <section anchor="ladder" title="L3DL Ladder Diagram">
@ -302,7 +301,7 @@
PDUs are optional; though at least one encapsulation SHOULD be PDUs are optional; though at least one encapsulation SHOULD be
agreed at some point.</t> agreed at some point.</t>
<t>The following is a ladder-style sketch of the L3DL protocol <t>The following is a ladder-style diagram of the L3DL protocol
exchanges:</t> exchanges:</t>
<figure> <figure>
@ -380,8 +379,8 @@
<section anchor="transport" title="Transport Layer"> <section anchor="transport" title="Transport Layer">
<t>L3DL PDUs are carried by a simple transport layer which allows <t>L3DL PDUs are carried by a simple transport layer which allows
long PDUs to occupy many Ethernet frames. An L3DL frame is referred PDUs to occupy many Ethernet frames. An L3DL Ethernet frame is
to as a Datagram.</t> referred to as a Datagram.</t>
<t>The L3DL Transport Layer encapsulates each Datagram using a <t>The L3DL Transport Layer encapsulates each Datagram using a
common transport header.</t> common transport header.</t>
@ -402,7 +401,7 @@
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Datagram Length | Checksum ~ | Datagram Length | Checksum ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~ | Payload... | ~ | Payload... ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork> </artwork>
</figure> </figure>
@ -411,9 +410,9 @@
<list style="hanging"> <list style="hanging">
<t hangText="Version:">Seven-bit Version number of the protocol, <t hangText="Version:">Seven-bit Version number of the protocol,
currently 0. Values other than 0 are treated as errors. The currently 0. Values other than 0 MUST BE treated as an error.
protocol version needs to be in one and only one place, so it is in The protocol version needs to be in one and only one place, so it
the datagram as opposed to, for example, the PDU header.</t> is in the datagram as opposed to, for example, the PDU header.</t>
<t hangText="L:">A bit that is set to one if this Datagram is the <t hangText="L:">A bit that is set to one if this Datagram is the
last Datagram of the PDU. For a PDU which fits in only one last Datagram of the PDU. For a PDU which fits in only one
@ -436,6 +435,12 @@
thereof.</t> thereof.</t>
</list></t> </list></t>
<t>To avoid the need for a receiver to reassemble two PDUs at the
same time, a sender MUST NOT send a subsequent PDU when a PDU is
already in flight and not yet acknowledged if it is an ACKed PDU
Type.</t>
</section> </section>
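The one-PDU-in-flight rule added above can be illustrated with a small sender-side gate. This is only a sketch under an assumed reading of the rule: while a PDU of an ACKed type is unacknowledged, every subsequent PDU is held back. The class name `OneInFlightSender`, the `transmit` callable, and the `acked_type` parameter are hypothetical and do not appear in the draft.

```python
from collections import deque


class OneInFlightSender:
    """Hypothetical gate: at most one ACKed-type PDU outstanding per session."""

    def __init__(self, transmit):
        self.transmit = transmit   # callable that puts one PDU on the wire
        self.in_flight = None      # ACKed-type PDU still waiting for its ACK
        self.pending = deque()     # PDUs held back until that ACK arrives

    def send(self, pdu, acked_type=True):
        """Send now if nothing is outstanding, otherwise queue. Returns True if sent."""
        if self.in_flight is not None:
            self.pending.append((pdu, acked_type))
            return False
        self._put_on_wire(pdu, acked_type)
        return True

    def _put_on_wire(self, pdu, acked_type):
        self.transmit(pdu)
        if acked_type:
            self.in_flight = pdu

    def on_ack(self):
        """The outstanding PDU was acknowledged; release queued PDUs in order."""
        self.in_flight = None
        while self.pending and self.in_flight is None:
            pdu, acked_type = self.pending.popleft()
            self._put_on_wire(pdu, acked_type)
```

With this gate the receiver never has to reassemble two multi-Datagram PDUs at once, because the second PDU's first Datagram is not transmitted until the first PDU has been acknowledged.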
<section anchor="checksum" title="The Checksum"> <section anchor="checksum" title="The Checksum">
@ -528,7 +533,7 @@ uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sig Type | Signature Length | ~ | Sig Type | Signature Length | ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
~ Signature | ~ Signature ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork> </artwork>
</figure> </figure>
@ -557,7 +562,7 @@ uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
<t hangText="Signature Length:">The length of the Signature, <t hangText="Signature Length:">The length of the Signature,
possibly including padding, in octets. If Sig Type is 0, possibly including padding, in octets. If Sig Type is 0,
Signature Length must be 0.</t> Signature Length MUST BE 0.</t>
<t hangText="Signature:">The result of running the signature <t hangText="Signature:">The result of running the signature
algorithm specified in Sig Type over all octets of the PDU except algorithm specified in Sig Type over all octets of the PDU except
@ -636,10 +641,6 @@ uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
<section anchor="hello" title="HELLO"> <section anchor="hello" title="HELLO">
<t>WARNING: The second multicast address below is incorrect. We
need to get a new assignment. , which is what we really wanted with the second address
below.</t>
<t>The HELLO PDU is unique in that it is encapsulated in a multicast <t>The HELLO PDU is unique in that it is encapsulated in a multicast
Ethernet frame. It solicits response(s) from other LLEI(s) on the Ethernet frame. It solicits response(s) from other LLEI(s) on the
link. See <xref target="dhello"/> for why multicast is used. The link. See <xref target="dhello"/> for why multicast is used. The
@ -649,13 +650,15 @@ uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
<t hangText="01-80-C2-00-00-0E:">Nearest Bridge = Propagation <t hangText="01-80-C2-00-00-0E:">Nearest Bridge = Propagation
constrained to a single physical link; stopped by all types of constrained to a single physical link; stopped by all types of
bridges (including MPRs (media converters)).</t> bridges (including MPRs (media converters)). This SHOULD BE used
when the link is known to be a simple point to point link.</t>
<t hangText="To Be Assigned:"> When a switch receives a frame with <t hangText="To Be Assigned:"> When a switch receives a frame with
a multicast destination MAC it does not recognize, it forwards to a multicast destination MAC it does not recognize, it forwards to
all ports. This destination MAC is to be sent when the interface all ports. This destination MAC is to be sent when the interface
is known to be connected to a switch. See <xref is known to be connected to a switch. See <xref
target="ieee"/>.</t> target="ieee"/>. This SHOULD BE used when the link may be a
multi-point link.</t>
<?rfc subcompact="no"?></list></t> <?rfc subcompact="no"?></list></t>
@ -664,11 +667,12 @@ uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
exchange.</t> exchange.</t>
<t>When an interface is turned up on a device, it SHOULD issue a <t>When an interface is turned up on a device, it SHOULD issue a
HELLO.</t> HELLO if it is to participate in L3DL sessions.</t>
<t>If a constrained destination address configured, see above, then <t>If a constrained Nearest Bridge destination address is configured
the HELLO need not be repeated once a session has been created by an for a point-to-point interface, see above, then the HELLO SHOULD NOT
exchange of OPENs.</t> be repeated once a session has been created by an exchange of
OPENs.</t>
<t>If the configured destination address is one that is propagated <t>If the configured destination address is one that is propagated
by switches, the HELLO SHOULD be repeated at a configured interval, by switches, the HELLO SHOULD be repeated at a configured interval,
@ -696,8 +700,8 @@ uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
separate logical link.</t> separate logical link.</t>
<t>When a HELLO is received from a source MAC address with which <t>When a HELLO is received from a source MAC address with which
there is no established L3DL adjacency, the receiver SHOULD respond there is no established L3DL session, the receiver SHOULD respond
with an OPEN PDU. The two devices establish an L3DL adjacency by with an OPEN PDU. The two devices establish an L3DL session by
exchanging OPEN PDUs.</t> exchanging OPEN PDUs.</t>
<t>The Payload Length is zero as there is no payload.</t> <t>The Payload Length is zero as there is no payload.</t>
@ -711,7 +715,7 @@ uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
<t>Each device has learned the other's MAC Address from the HELLO <t>Each device has learned the other's MAC Address from the HELLO
exchange, see <xref target="hello"/>. Therefore the OPEN and exchange, see <xref target="hello"/>. Therefore the OPEN and
subsequent PDUs are unicast, as opposed to the HELLO's multicast subsequent PDUs MUST BE unicast, as opposed to the HELLO's multicast
frame.</t> frame.</t>
<!-- <!--
@ -754,9 +758,10 @@ q-->
<t>My LLEI is the sender's LLEI, see <xref target="llei"/>.</t> <t>My LLEI is the sender's LLEI, see <xref target="llei"/>.</t>
<t>AttrCount is the number of attributes in the Attribute List. <t>AttrCount is the number of attributes in the Attribute List.
Attributes are single octets whose semantics are user-defined.</t> Attributes are single octets the semantics of which are
operator-defined.</t>
<t>A node may have zero or more user-defined attributes, e.g. <t>A node may have zero or more operator-defined attributes, e.g.:
spine, leaf, backbone, route reflector, arabica, ...</t> spine, leaf, backbone, route reflector, arabica, ...</t>
<t>Attribute syntax and semantics are local to an operator or <t>Attribute syntax and semantics are local to an operator or
@ -767,19 +772,19 @@ q-->
target="tlv"/>.</t> target="tlv"/>.</t>
<t>Key Length is a 16-bit field denoting the length in octets of the <t>Key Length is a 16-bit field denoting the length in octets of the
Key itself, not including the Auth Type or the Key Lengths. If Key itself, not including the Auth Type or the Key Length. If there
there is no Key, the Auth Type and key Length MUST both be zero.</t> is no Key, the Auth Type and key Length MUST both be zero.</t>
<t>The Key is specific to the operational environment. A failure to <t>The Key is specific to the operational environment. A failure to
authenticate is a failure to start the L3DL session, an ERROR PDU is authenticate is a failure to start the L3DL session, an ERROR PDU
sent (Error Code 2), and HELLOs MUST be restarted.</t> MUST BE sent (Error Code 2), and HELLOs MUST be restarted.</t>
<t>The Serial Number is that of the last received and processed <t>The Serial Number is that of the last received and processed PDU.
Encapsulation PDU. This allows a receiver sending an OPEN to tell This allows a receiver sending an OPEN to tell the sender that the
the sender that the receiver wants to resume a session and the receiver wants to resume a session and the sender only needs to send
sender only needs to send data more recent than the Serial Number. data more recent than the Serial Number. If this OPEN is not trying
If this OPEN is not trying to restart a lost session, the Serial to restart a lost session, the Serial Number MUST BE set to
Number MUST be set to zero.</t> zero.</t>
<t>The Signature fields are described in <xref target="tlv"/> and in <t>The Signature fields are described in <xref target="tlv"/> and in
an asymmetric key environment serve as a proof of possession of the an asymmetric key environment serve as a proof of possession of the
@ -791,19 +796,29 @@ q-->
keep the session semantics alive. The timing and acceptable drop of keep the session semantics alive. The timing and acceptable drop of
KEEPALIVE PDUs are discussed in <xref target="keepalive"/>.</t> KEEPALIVE PDUs are discussed in <xref target="keepalive"/>.</t>
<t>If a sender of OPEN does not receive an ACK of the OPEN PDU Type, <t>If a sender of OPEN does not receive an ACK of the OPEN PDU, then
then they MUST resend the same OPEN PDU, with the same Nonce. they MUST resend the same OPEN PDU, with the same Nonce. Resending
Resending an unacknowledged OPEN PDU, like other ACKed PDUs, SHOULD an unacknowledged OPEN PDU, like other ACKed PDUs, SHOULD use
use exponential back-off, see <xref target="RFC1122"/>.</t> exponential back-off, see <xref target="RFC1122"/>.</t>
<t>If a properly authenticated OPEN arrives with a new Nonce from an <t>If a properly authenticated OPEN arrives with a new Nonce from an
LLEI with which the receiving logical link endpoint believes it LLEI with which the receiving logical link endpoint believes it
already has an L3DL session (OPENs have already been exchanged), the already has an L3DL session (OPENs have already been exchanged), and
receiver MAY assume that the sending LLEI or entire device has been the Serial Number in the OPEN is non-zero, the receiver SHOULD
reset. If the Serial Number in the OPEN is zero, then all establish a new session by sending an OPEN with the Serial Number of
discovered encapsulation data SHOULD be withdrawn via the BGP-LS API the last data it received. Each party MUST resume sending
and the recipient MUST respond with a new OPEN. In this encapsulations etc. subsequent to the other party's Serial Number.
circumstance encapsulations SHOULD NOT be kept.</t> And each MUST retain all previously discovered encapsulation and
other data.</t>
<t>If a properly authenticated OPEN arrives with a new Nonce from an
LLEI with which the receiving logical link endpoint believes it
already has an L3DL session (OPENs have already been exchanged), and
the Serial Number in the OPEN is zero, then the receiver MUST assume
that the sending LLEI or entire device has been reset. All
previously discovered encapsulation data MUST NOT be kept and MUST
be withdrawn via the BGP-LS API and the recipient MUST respond with
a new OPEN.</t>
</section> </section>
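The two cases in the new text for an authenticated OPEN arriving on an existing session can be sketched as a single decision function. This is a hedged illustration, not the draft's state machine: the names `OpenDecision` and `decide_on_new_open` are hypothetical, authentication is assumed to have already succeeded, and sending Serial Number zero in the reply after a peer reset is an assumption based on the rule that an OPEN which is not resuming a session carries zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OpenDecision:
    keep_data: bool            # retain previously discovered encapsulations
    withdraw_via_bgp_ls: bool  # withdraw them through the BGP-LS API
    reply_serial: int          # Serial Number placed in our responding OPEN
    resume_after: int          # send only data newer than this Serial Number


def decide_on_new_open(known_nonce: int, open_nonce: int,
                       open_serial: int, last_rx_serial: int) -> Optional[OpenDecision]:
    """Decide how to react to an authenticated OPEN on an existing session."""
    if open_nonce == known_nonce:
        # Same Nonce: a retransmitted OPEN, outside the two cases sketched here.
        return None
    if open_serial != 0:
        # Peer wants to resume: keep all data, tell it the last serial we
        # received, and resume sending after the serial it reported.
        return OpenDecision(True, False, last_rx_serial, open_serial)
    # Serial Number zero: the peer LLEI or device was reset. Discard and
    # withdraw everything, then answer with a fresh OPEN (serial 0 assumed).
    return OpenDecision(False, True, 0, 0)
```

For example, a peer that reconnects with a new Nonce and Serial Number 17 while the receiver last saw serial 40 yields a decision to keep data, reply with serial 40, and resume sending after serial 17.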
@ -836,7 +851,7 @@ q-->
PDU, etc.</t> PDU, etc.</t>
<t>The ACKed PDU is the PDU Type of the PDU being acknowledged, <t>The ACKed PDU is the PDU Type of the PDU being acknowledged,
e.g., OPEN or one of the Encapsulations.</t> e.g., OPEN, one of the Encapsulations, etc.</t>
<t>If there was an error processing the received PDU, then the EType <t>If there was an error processing the received PDU, then the EType
is non-zero. If the EType is zero, Error Code and Error Hint MUST is non-zero. If the EType is zero, Error Code and Error Hint MUST
@ -848,12 +863,21 @@ q-->
error.</t> error.</t>
<t>The decimal value of EType gives a strong hint how the receiver <t>The decimal value of EType gives a strong hint how the receiver
sending the ACK believes things should proceed. The ETypes are sending the ACK believes things should proceed:
listed in <xref target="iana-error"/>. Someone stuck in the 1990s <list style="empty">
might think of the error codes as 0x1zzz, 0x2zzz, etc. They might <?rfc subcompact="yes"?>
be right. Or not.</t> <t>0 - No Error, Error Code and Error Hint MUST be zero</t>
<t>1 - Warning, something not too serious happened, continue</t>
<t>2 - Session should not be continued, try to restart</t>
<t>3 - Restart is hopeless, call the operator</t>
<t>4-15 - Reserved</t>
<?rfc subcompact="no"?>
</list></t>
<t>The Error Code indicates the type of error.</t> <t>The Error Codes, noting protocol failures listed in this document,
are listed in <xref target="iana-error"/>. Someone stuck in the
1990s might think of the catenation of EType and Error Code as an echo
of 0x1zzz, 0x2zzz, etc. They might be right; or not.</t>
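The EType values listed above map naturally onto a small enumeration. The sketch below is illustrative only: the member names, the `ack_action` function, and the returned action strings are hypothetical, and the 0 to 15 range check is inferred from the "4-15 - Reserved" entry rather than from a stated field width.

```python
from enum import IntEnum


class EType(IntEnum):
    NO_ERROR = 0   # Error Code and Error Hint MUST be zero
    WARNING = 1    # something not too serious happened, continue
    RESTART = 2    # session should not be continued, try to restart
    GIVE_UP = 3    # restart is hopeless, call the operator


def ack_action(etype: int, error_code: int, error_hint: int) -> str:
    """Map the EType of a received ACK to a suggested next step."""
    if not 0 <= etype <= 15:
        raise ValueError("EType out of range")
    if etype >= 4:
        return "reserved"
    kind = EType(etype)
    if kind is EType.NO_ERROR:
        if error_code != 0 or error_hint != 0:
            raise ValueError("EType 0 requires zero Error Code and Error Hint")
        return "ok"
    return {
        EType.WARNING: "continue",
        EType.RESTART: "restart_session",
        EType.GIVE_UP: "alert_operator",
    }[kind]
```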
<t>The Error Hint is any additional data the sender of the error PDU <t>The Error Hint is any additional data the sender of the error PDU
thinks will help the recipient or the debugger with the particular thinks will help the recipient or the debugger with the particular
@ -873,8 +897,7 @@ q-->
case of this ACK failure.</t> case of this ACK failure.</t>
<t>If the link is broken at layer 2, retransmission MAY BE retried <t>If the link is broken at layer 2, retransmission MAY BE retried
when the link comes back up if data have not changed in the when the link is restored.</t>
interim.</t>
</section> </section>
@ -887,11 +910,10 @@ q-->
session is considered established, and the devices SHOULD exchange session is considered established, and the devices SHOULD exchange
L3 interface encapsulations, L3 addresses, and L2.5 labels.</t> L3 interface encapsulations, L3 addresses, and L2.5 labels.</t>
<t>The Encapsulation types the peers exchange may be IPv4 <t>The Encapsulation types the peers exchange may be IPv4 (<xref
Announcement (<xref target="ipv4"/>), IPv6 Announcement (<xref target="ipv4"/>), IPv6 (<xref target="ipv6"/>), MPLS IPv4 (<xref
target="ipv6"/>), MPLS IPv4 Announcement (<xref target="mpls4"/>), target="mpls4"/>), MPLS IPv6 (<xref target="mpls6"/>), and/or
MPLS IPv6 Announcement (<xref target="mpls6"/>), and/or possibly possibly others not defined here.</t>
others not defined here.</t>
<t>The sender of an Encapsulation PDU MUST NOT assume that the peer <t>The sender of an Encapsulation PDU MUST NOT assume that the peer
is capable of the same Encapsulation Type. An ACK (<xref is capable of the same Encapsulation Type. An ACK (<xref
@ -937,12 +959,12 @@ q-->
</artwork> </artwork>
</figure> </figure>
<t>The 24-bit Count is the number of Encapsulations in the
Encapsulation list.</t>
<t>An Encapsulation PDU describes zero or more addresses of the <t>An Encapsulation PDU describes zero or more addresses of the
encapsulation type.</t> encapsulation type.</t>
<t>The 24-bit Count is the number of Encapsulations in the
Encapsulation list.</t>
<t>The Serial Number is a monotonically increasing 32-bit value <t>The Serial Number is a monotonically increasing 32-bit value
representing the sender's state in time. It may be an integer, a representing the sender's state in time. It may be an integer, a
timestamp, etc. On session restart (new OPEN), a receiver MAY timestamp, etc. On session restart (new OPEN), a receiver MAY
@ -950,7 +972,7 @@ q-->
send newer data.</t> send newer data.</t>
<t>If a sender has multiple links on the same interface, separate <t>If a sender has multiple links on the same interface, separate
state: data, ACKs, etc. must be kept for each peer.</t> state: data, ACKs, etc. must be kept for each peer session.</t>
<t>Over time, multiple Encapsulation PDUs may be sent for an <t>Over time, multiple Encapsulation PDUs may be sent for an
interface as configuration changes.</t> interface as configuration changes.</t>
@ -988,9 +1010,10 @@ q-->
</artwork> </artwork>
</figure> </figure>
<t>An Encapsulation PDU of Type T may announce new and/or withdraw <t>Each encapsulation in an Encapsulation PDU of Type T may
old encapsulations of Type T. It indicates this with the Ann/With announce new and/or withdraw old encapsulations of Type T. It
Encapsulation Flag, Announce == 1, Withdraw == 0.</t> indicates this with the Ann/With Encapsulation Flag, Announce ==
1, Withdraw == 0.</t>
<t>Each Encapsulation interface address in an Encapsulation PDU is <t>Each Encapsulation interface address in an Encapsulation PDU is
either a new encapsulation to be announced (Ann/With == 1) (yes, a la either a new encapsulation to be announced (Ann/With == 1) (yes, a la
@ -1006,20 +1029,18 @@ q-->
be marked as primary for a particular encapsulation type.</t> be marked as primary for a particular encapsulation type.</t>
<t>An Encapsulation interface address in an Encapsulation PDU MAY <t>An Encapsulation interface address in an Encapsulation PDU MAY
be marked as a loopback, in which case the Loopback bit is be marked as a loopback, in which case the Loopback bit is set.
set.</t> Loopback addresses are generally not seen directly on an external
interface. One or more loopback addresses MAY be exposed by
<t>Loopback addresses are generally not seen directly on an configuration on one or more L3DL speaking external interfaces,
external interface. One or more loopback addresses MAY be exposed
by configuration on one or more L3DL speaking external interfaces,
e.g. for iBGP peering. They SHOULD be marked as such, Loopback e.g. for iBGP peering. They SHOULD be marked as such, Loopback
Flag == 1.</t> Flag == 1.</t>
<t>Each Encapsulation interface address in an Encapsulation PDU is <t>Each Encapsulation interface address in an Encapsulation PDU is
that of the direct 'underlay' interface (Under/Over == 1), or an that of the direct 'underlay' interface (Under/Over == 1), or an
'overlay' address (Under/Over == 0), likely that of a VM or 'overlay' address (Under/Over == 0), likely that of a VM or
container guest bridged on to the interface with an underlay container guest bridged or configured on to the interface already
address.</t> having an underlay address.</t>
</section> </section>
@ -1053,7 +1074,8 @@ q-->
</artwork> </artwork>
</figure> </figure>
<t>The 24-bit Count is the number of IPv4 Encapsulations.</t> <t>The 24-bit Count is the number of IPv4 Encapsulations being
announced and/or withdrawn.</t>
</section> </section>
@ -1094,7 +1116,8 @@ q-->
</artwork> </artwork>
</figure> </figure>
<t>The 24-bit Count is the number of IPv6 Encapsulations.</t> <t>The 24-bit Count is the number of IPv6 Encapsulations being
announced and/or withdrawn.</t>
</section> </section>
@ -1160,7 +1183,8 @@ q-->
</artwork> </artwork>
</figure> </figure>
<t>The 24-bit Count is the number of MPLSv4 Encapsulations.</t> <t>The 24-bit Count is the number of MPLSv4 Encapsulations being
announced and/or withdrawn.</t>
</section> </section>
@ -1169,7 +1193,7 @@ q-->
<t>The MPLS IPv4 Encapsulation describes a logical link's ability <t>The MPLS IPv4 Encapsulation describes a logical link's ability
to exchange labeled IPv4 packets on one or more subnets. It does to exchange labeled IPv4 packets on one or more subnets. It does
so by stating the interface's addresses, the corresponding prefix so by stating the interface's addresses, the corresponding prefix
lengths, and the corresponding labels which will be accepted fpr lengths, and the corresponding labels which will be accepted for
each address.</t> each address.</t>
<!-- <!--
protocol "PDU Type = 7:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,MPLS Label List ...:16,IPv6 Address:128,Prefix Len:8,more ...:8,Sig Type:8,Signature Length:16,Signature ...:32" protocol "PDU Type = 7:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,MPLS Label List ...:16,IPv6 Address:128,Prefix Len:8,more ...:8,Sig Type:8,Signature Length:16,Signature ...:32"
@ -1203,14 +1227,9 @@ q-->
</artwork> </artwork>
</figure> </figure>
<t>The 24-bit Count is the number of MPLSv6 Encapsulations.</t> <t>The 24-bit Count is the number of MPLSv6 Encapsulations being
announced and/or withdrawn.</t>
<t>The MPLS IPv6 Encapsulation describes a logical link's ability
to exchange labeled IPv6 packets on one or more subnets. It does
so by stating the interface's addresses, the corresponding prefix
lengths, and the corresponding labels which will be accepted fpr
each address.</t>
</section> </section>
</section> </section>
@ -1256,26 +1275,6 @@ q-->
<section anchor="keepalive" title="KEEPALIVE - Layer 2 Liveness"> <section anchor="keepalive" title="KEEPALIVE - Layer 2 Liveness">
<t>L3DL devices SHOULD beacon frequent Layer 2 KEEPALIVE PDUs to
ensure session continuity. A receiver may choose to ignore
KEEPALIVE PDUs.</t>
<t>An operational deployment MUST BE configured whether to use
KEEPALIVEs or not, either globally, or down to per-link granularity.
Disagreement MAY result in repeated session break and
reestablishment.</t>
<t>KEEPALIVEs SHOULD be beaconed at a configured frequency. One per
second is the default. Layer 3 liveness, such as BFD, may be more
(or less) aggressive.</t>
<t>If a KEEPALIVE is not received from a peer with which a receiver
has an open session for a configurable time (default 30 seconds),
the link SHOULD BE presumed down. The devices MAY keep
configuration state and restore it without retransmission if no data
have changed. Otherwise, a new session SHOULD BE established and
new Encapsulation PDUs exchanged.</t>
<!-- <!--
protocol "PDU Type = 2:8,Payload Length = 0:32,Sig Type = 0:8,Signature Length = 0:16" protocol "PDU Type = 2:8,Payload Length = 0:32,Sig Type = 0:8,Signature Length = 0:16"
--> -->
@ -1292,6 +1291,31 @@ q-->
</artwork> </artwork>
</figure> </figure>
<t>L3DL devices SHOULD beacon frequent Layer 2 KEEPALIVE PDUs to
ensure session continuity. A receiver may choose to ignore
KEEPALIVE PDUs.</t>
<t>An operational deployment MUST BE configured whether to use
KEEPALIVEs or not, either globally, or down to per-link granularity.
Disagreement MAY result in repeated session break and
reestablishment.</t>
<t>KEEPALIVEs SHOULD be beaconed at a configured frequency. One per
second is the default. Layer 3 liveness, such as BFD, may be more
(or less) aggressive.</t>
<t>When a sender transmits a PDU which is not a KEEPALIVE, the
sender SHOULD reset the KEEPALIVE timer; i.e., sending any PDU acts
as a keepalive. The KEEPALIVE timer SHOULD BE restarted once the
last fragment has been sent, without waiting for the ACK.</t>
<t>If no KEEPALIVE or other PDU has been received from a peer
with which a receiver has an open session for a configurable time
(default 30 seconds), the link SHOULD BE presumed down. The devices
MAY keep configuration state and restore it without retransmission
if no data have changed. Otherwise, a new session SHOULD BE
established and new Encapsulation PDUs exchanged.</t>
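<t>The timer behavior described above can be sketched as follows. This
is an illustration only, assuming a caller that supplies monotonic
timestamps in seconds; the class and method names are hypothetical,
and only the one second and 30 second defaults come from the text.</t>
<figure>
<artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class KeepaliveState:
    """Per-session liveness timers (hypothetical names)."""
    send_interval: float = 1.0   # default: one KEEPALIVE per second
    dead_interval: float = 30.0  # default: silence before link is presumed down
    last_sent: float = 0.0
    last_received: float = 0.0

    def on_pdu_sent(self, now: float) -> None:
        # Call once the last fragment of any PDU has been sent.
        # Sending any PDU acts as a keepalive; the ACK is not awaited.
        self.last_sent = now

    def on_pdu_received(self, now: float) -> None:
        # A KEEPALIVE or any other PDU from the peer counts as liveness.
        self.last_received = now

    def keepalive_due(self, now: float) -> bool:
        return now - self.last_sent >= self.send_interval

    def link_presumed_down(self, now: float) -> bool:
        return now - self.last_received >= self.dead_interval
```
]]></artwork>
</figure>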
</section> </section>
<section anchor="l3liveness" title="Layers 2.5 and 3 Liveness"> <section anchor="l3liveness" title="Layers 2.5 and 3 Liveness">
@ -1303,7 +1327,7 @@ q-->
technique.</t> technique.</t>
<t>This protocol assumes that one or more Encapsulation addresses <t>This protocol assumes that one or more Encapsulation addresses
will be used to ping, run BFD, or whatever the operator may be used to ping, run BFD, or whatever the operator
configures.</t> configures.</t>
</section> </section>
@ -1317,7 +1341,7 @@ q-->
LLEIs and Encapsulations on each logical link interface.</t> LLEIs and Encapsulations on each logical link interface.</t>
<t>Full topology discovery is not appropriate at the L3DL layer, so <t>Full topology discovery is not appropriate at the L3DL layer, so
Dijkstra à la IS-IS etc. is assumed to be done by higher level Dijkstra a la IS-IS etc. is assumed to be done by higher level
protocols such as BGP-SPF.</t> protocols such as BGP-SPF.</t>
<t>Therefore the LLEIs, link Encapsulations, and state changes are <t>Therefore the LLEIs, link Encapsulations, and state changes are
@ -1370,24 +1394,15 @@ q-->
<section anchor="dhello" title="HELLO Discussion"> <section anchor="dhello" title="HELLO Discussion">
<!--
<t>There is the question of whether to allow an intermediate
switch to be transparent to discovery. We consider that an
interface on a device is a Layer 2 or a Layer 3 interface. In
theory it could be a Layer 3 interface with no encapsulation or
Layer 3 addressing currently configured.</t>
-->
<t>A device with multiple Layer 2 interfaces, traditionally called <t>A device with multiple Layer 2 interfaces, traditionally called
a switch, may be used to forward frames and therefore packets from a switch, may be used to forward frames and therefore packets from
multiple devices to one logical interface (LLEI), I, on an L3DL multiple devices to one logical interface (LLEI), I, on an L3DL
speaking device. Interface I could discover a peer J across the speaking device. Interface I could discover a peer J across the
switch. Later, a prospective peer K could come up across the switch. Later, a prospective peer K could come up across the
switch. If I was not still sending and listening for HELLOs, the switch. If I was not still sending and listening for HELLOs, the
potential peering with K could not be discovered. Therefore, potential peering with K could not be discovered. Therefore,
interfaces MUST continue to send HELLOs as long as they are turned multi-link interfaces MUST continue to send HELLOs as long as they
up.</t> are turned up.</t>
</section> </section>
@ -1444,15 +1459,15 @@ q-->
encapsulation, the implementation MAY mark it as primary by encapsulation, the implementation MAY mark it as primary by
default.</t> default.</t>
<t>An implementation SHOULD allow optional configuration which <t>An implementation MAY allow optional configuration which updates
updates the local forwarding table with overlay and underlay data the local forwarding table with overlay and underlay data both
both learned from L3DL peers and configured locally.</t> learned from L3DL peers and configured locally.</t>
</section> </section>
<section anchor="security" title="Security Considerations"> <section anchor="security" title="Security Considerations">
<t>The protocol as it is MUST NOT be used outside a datacenter or <t>The protocol as is MUST NOT be used outside a datacenter or
similarly closed environment due to lack of formal definition of the similarly closed environment due to lack of formal definition of the
authentication and authorization mechanism. Sufficient mechanisms authentication and authorization mechanism. Sufficient mechanisms
may be described in separate documents.</t> may be described in separate documents.</t>
@ -1588,12 +1603,13 @@ q-->
<section anchor="acks" title="Acknowledgments"> <section anchor="acks" title="Acknowledgments">
<t>The authors thank Cristel Pelsser for multiple reviews, Jeff Haas <t>The authors thank Cristel Pelsser for multiple reviews, Harsha
for review and comments, Joe Clarke for a useful review, John Kovuru for comments during implementation, Jeff Haas for review and
Scudder for deeply serious review and comments, Larry Kreeger for a comments, Joe Clarke for a useful review, John Scudder for deeply
lot of layer 2 clue, Martijn Schmidt for his contribution, Neeraj serious review and comments, Larry Kreeger for a lot of layer 2
Malhotra for review, Russ Housley for checksum discussion and sBox, clue, Martijn Schmidt for his contribution, Neeraj Malhotra for
and Steve Bellovin for checksum advice.</t> review, Russ Housley for checksum discussion and sBox, and Steve
Bellovin for checksum advice.</t>
</section> </section>