<?xml version="1.0" encoding="utf-8"?>
|
|
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
|
|
<?rfc comments="yes"?>
|
|
<?rfc compact="yes"?>
|
|
<?rfc subcompact="no"?>
|
|
<?rfc inline="yes"?>
|
|
<?rfc sortrefs="yes"?>
|
|
<?rfc symrefs="yes"?>
|
|
<?rfc toc="yes"?>
|
|
<?rfc tocdepth="6"?>
|
|
<?rfc tocindent="yes"?>
|
|
<?rfc tocompact="yes"?>
|
|
|
|
<rfc category="std" docName="draft-ietf-lsvr-l3dl-04" ipr="trust200902">
|
|
|
|
<front>
|
|
|
|
<title>Layer 3 Discovery and Liveness</title>
|
|
|
|
<author fullname="Randy Bush" initials="R." surname="Bush">
|
|
<organization>Arrcus &amp; Internet Initiative Japan</organization>
|
|
<address>
|
|
<postal>
|
|
<street>5147 Crystal Springs</street>
|
|
<city>Bainbridge Island</city>
|
|
<region>WA</region>
|
|
<code>98110</code>
|
|
<country>US</country>
|
|
</postal>
|
|
<email>randy@psg.com</email>
|
|
</address>
|
|
</author>
|
|
|
|
<author initials="R." surname="Austein" fullname="Rob Austein">
|
|
<organization abbrev="Arrcus">Arrcus, Inc</organization>
|
|
<address>
|
|
<email>sra@hactrn.net</email>
|
|
</address>
|
|
</author>
|
|
|
|
<author fullname="Keyur Patel" initials="K." surname="Patel">
|
|
<organization>Arrcus</organization>
|
|
<address>
|
|
<postal>
|
|
<street>2077 Gateway Place, Suite #400</street>
|
|
<city>San Jose</city>
|
|
<region>CA</region>
|
|
<code>95119</code>
|
|
<country>US</country>
|
|
</postal>
|
|
<email>keyur@arrcus.com</email>
|
|
</address>
|
|
</author>
|
|
|
|
<date />
|
|
|
|
<abstract>
|
|
|
|
<t>In Massive Data Centers, BGP-SPF and similar routing protocols
|
|
are used to build topology and reachability databases. These
|
|
protocols need to discover IP Layer 3 attributes of links, such as
|
|
logical link IP encapsulation abilities, IP neighbor address
|
|
discovery, and link liveness. This Layer 3 Discovery and Liveness
|
|
protocol collects these data, which may then be disseminated using
|
|
BGP-SPF and similar protocols.</t>
|
|
|
|
</abstract>
|
|
|
|
<note title="Requirements Language">
|
|
|
|
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
|
|
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
|
|
"OPTIONAL" in this document are to be interpreted as described in
|
|
BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when,
|
|
and only when, they appear in all capitals, as shown here.</t>
|
|
|
|
</note>
|
|
|
|
</front>
|
|
|
|
<middle>
|
|
|
|
<section anchor="intro" title="Introduction">
|
|
|
|
<t>The Massive Data Center (MDC) environment presents unusual
|
|
problems of scale, e.g. O(10,000) forwarding devices, while its
|
|
homogeneity presents opportunities for simple approaches.
|
|
Approaches such as Jupiter Rising <xref target="JUPITER"/> use a
|
|
central controller to deal with scaling, while BGP-SPF <xref
|
|
target="I-D.ietf-lsvr-bgp-spf"/> provides massive scale-out without
|
|
centralization using a tried and tested scalable distributed control
|
|
plane, offering a scalable routing solution in Clos <xref
|
|
target="Clos0"/><xref target="Clos1"/> and similar environments.
|
|
But BGP-SPF and similar higher level device-spanning protocols,
|
|
e.g. <xref target="I-D.malhotra-bess-evpn-lsoe"/>, need logical link
|
|
state and addressing data from the network to build the routing
|
|
topology. They also need prompt but prudent reaction to (logical)
|
|
link failure.</t>
|
|
|
|
<t>Layer 3 Discovery and Liveness (L3DL) provides brutally simple
|
|
mechanisms for devices to <list style="symbols">
|
|
<t>Discover each other's unique endpoint identification,</t>
|
|
<t>Discover mutually supported layer 3 encapsulations, e.g.
|
|
IP/MPLS,</t>
|
|
<t>Discover Layer 3 IP and/or MPLS addressing of interfaces of the
|
|
encapsulations,</t>
|
|
<t>Present these data, using a very restricted profile of a BGP-LS
|
|
<xref target="RFC7752"/> API, to BGP-SPF which computes the
|
|
topology and builds routing and forwarding tables,</t>
|
|
<t>Enable Layer 3 link liveness such as BFD,</t>
|
|
<t>Provide Layer 2 keep-alive messages for session continuity, and
|
|
finally</t>
|
|
<t>Provide for authenticity verification of protocol messages.</t>
|
|
</list></t>
|
|
|
|
<t>In this document, the use case for L3DL is for point to point
|
|
links in a datacenter Clos in order to exchange the data needed for
|
|
BGP-SPF <xref target="I-D.ietf-lsvr-bgp-spf"/> bootstrap and
|
|
continuity. Once layer two connectivity has been leveraged to get
|
|
layer three addressability and forwarding capabilities, normal layer
|
|
three forwarding and routing can take over.</t>
|
|
|
|
<t>L3DL might be found to be more widely applicable to a range of
|
|
routing and similar protocols which need layer three discovery and
|
|
characterisation.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="terminology" title="Terminology">
|
|
|
|
<t>Even though it concentrates on the inter-device layer, this
|
|
document relies heavily on routing terminology. The following
|
|
attempts to clarify the use of some possibly confusing terms:
|
|
<list hangIndent="11" style="hanging">
|
|
<?rfc subcompact="yes"?>
|
|
<t hangText="ASN:">Autonomous System Number <xref
|
|
target="RFC4271"/>, a BGP identifier for an originator of
|
|
Layer 3 routes, particularly BGP announcements.</t>
|
|
<t hangText="BGP-LS:">A mechanism by which link-state and TE
|
|
information can be collected from networks and shared with
|
|
external components using the BGP routing protocol. See <xref
|
|
target="RFC7752"/>.</t>
|
|
<t hangText="BGP-SPF:">A hybrid protocol using BGP transport but
|
|
a Dijkstra Shortest Path First decision process. See <xref
|
|
target="I-D.ietf-lsvr-bgp-spf"/>.</t>
|
|
<t hangText="Clos:">A hierarchic subset of a crossbar switch
|
|
topology commonly used in data centers.</t>
|
|
<t hangText="Datagram:">The L3DL content of a single Layer 2
|
|
frame, sans Ethernet framing. A full L3DL PDU may be packaged in
|
|
multiple Datagrams.</t>
|
|
<t hangText="Encapsulation:">Address Family Indicator and
|
|
Subsequent Address Family Indicator (AFI/SAFI). I.e. classes of
|
|
layer 2.5 and 3 addresses such as IPv4, IPv6, MPLS, etc.</t>
|
|
<t hangText="Frame:">A Layer 2 Ethernet packet.</t>
|
|
<t hangText="Link or Logical Link:">A logical connection between
|
|
two logical ports on two devices. E.g. two VLANs between the same
|
|
two ports are two links.</t>
|
|
<t hangText="LLEI:">Logical Link Endpoint Identifier, the unique
|
|
identifier of one end of a logical link, see <xref
|
|
target="llei"/>.</t>
|
|
<t hangText="MAC Address:">48-bit Layer 2 addresses are assumed
|
|
since they are used by all widely deployed Layer 2 network
|
|
technologies of interest, especially Ethernet. See <xref
|
|
target="IEEE.802_2001"/>.</t>
|
|
<t hangText="MDC:">Massive Data Center, commonly composed of
|
|
thousands of Top of Rack Switches (TORs).</t>
|
|
<t hangText="MTU:">Maximum Transmission Unit, the size in octets
|
|
of the largest packet that can be sent on a medium, see <xref
|
|
target="RFC1122"/> 1.3.3.</t>
|
|
<t hangText="PDU:">Protocol Data Unit, an L3DL application layer
|
|
message. A PDU's content may need to be broken into multiple
|
|
Datagrams to make it through MTU or other restrictions.</t>
|
|
<t hangText="RouterID:">A 32-bit identifier unique in the
|
|
current routing domain, see <xref target="RFC6286"/>.</t>
|
|
<t hangText="Session:">An established, via OPEN PDUs, session
|
|
between two L3DL capable link end-points.</t>
|
|
<t hangText="SPF:">Shortest Path First, an algorithm for finding
|
|
the shortest paths between nodes in a graph; AKA Dijkstra's
|
|
algorithm.</t>
|
|
<t hangText="System Identifier:">An eight octet ISO System
|
|
Identifier a la <xref target="RFC1629"/> System ID.</t>
|
|
<t hangText="TOR:">Top Of Rack switch, aggregates the servers in
|
|
a rack and connects to aggregation layers of the Clos tree, AKA
|
|
the Clos spine.</t>
|
|
<t hangText="ZTP:">Zero Touch Provisioning gives devices initial
|
|
addresses, credentials, etc. on boot/restart.</t>
|
|
<?rfc subcompact="no"?>
|
|
</list></t>
|
|
|
|
</section>
|
|
|
|
<section anchor="background" title="Background">
|
|
|
|
<t>L3DL is primarily designed for a Clos type datacenter scale and
|
|
topology, but can accommodate richer topologies which contain
|
|
potential cycles.</t>
|
|
|
|
<t>While L3DL is designed for the MDC, there are no inherent reasons
|
|
it could not run on a WAN. The authentication and authorization
|
|
needed to run safely on a WAN need to be considered, and the
|
|
appropriate level of security options chosen.</t>
|
|
|
|
<t>L3DL assumes a new IEEE assigned EtherType (TBD).</t>
|
|
|
|
<t>The number of addresses of one Encapsulation type on an interface
|
|
link may be quite large given a TOR with tens of servers, each
|
|
server having a few hundred micro-services, resulting in an
|
|
inordinate number of addresses. And highly automated micro-service
|
|
migration can cause serious address prefix disaggregation, resulting
|
|
in interfaces with thousands of disaggregated prefixes.</t>
|
|
|
|
<t>Therefore the L3DL protocol is session oriented and uses
|
|
incremental announcement and withdrawal with session restart, a la
|
|
BGP (<xref target="RFC4271"/>).</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="top" title="Top Level Overview">
|
|
|
|
<t><list style="symbols">
|
|
<t>Devices discover each other on logical links</t>
|
|
<t>Logical Link Endpoint Identifiers (LLEIs) are exchanged</t>
|
|
<t>Layer 2 Liveness checks may be started</t>
|
|
<t>Encapsulation data are exchanged and IP-Level Liveness checks
|
|
enabled</t>
|
|
<t>A BGP-like upper layer protocol is assumed to use the
|
|
identifiers and encapsulation data to discover and build a topology
|
|
database</t>
|
|
</list></t>
|
|
|
|
<figure>
|
|
<artwork>
+-------------------+   +-------------------+   +-------------------+
|      Device       |   |      Device       |   |      Device       |
|                   |   |                   |   |                   |
|+-----------------+|   |+-----------------+|   |+-----------------+|
||                 ||   ||                 ||   ||                 ||
||     BGP-SPF     &lt;+---+&gt;     BGP-SPF     &lt;+---+&gt;     BGP-SPF     ||
||                 ||   ||                 ||   ||                 ||
|+--------^--------+|   |+--------^--------+|   |+--------^--------+|
|         |         |   |         |         |   |         |         |
|         |         |   |         |         |   |         |         |
|+--------+--------+|   |+--------+--------+|   |+--------+--------+|
|| Encapsulations  ||   || Encapsulations  ||   || Encapsulations  ||
|| Addresses       ||   || Addresses       ||   || Addresses       ||
|| L2 Liveness     ||   || L2 Liveness     ||   || L2 Liveness     ||
|+--------^--------+|   |+--------^--------+|   |+--------^--------+|
|         |         |   |         |         |   |         |         |
|         |         |   |         |         |   |         |         |
|+--------v--------+|   |+--------v--------+|   |+--------v--------+|
||                 ||   ||                 ||   ||                 ||
||Inter-Device PDUs&lt;+---+&gt;Inter-Device PDUs&lt;+---+&gt;Inter-Device PDUs||
||                 ||   ||                 ||   ||                 ||
|+-----------------+|   |+-----------------+|   |+-----------------+|
+-------------------+   +-------------------+   +-------------------+
</artwork>
|
|
</figure>
|
|
|
|
<t>There are two protocols, the inter-device (left-right in the
|
|
diagram) per-link layer 3 discovery and the API to the upper level
|
|
BGP-like routing protocol (up-down in the above diagram):
|
|
<list style="symbols">
|
|
|
|
<t>Inter-device PDUs are used to exchange device and logical link
|
|
identities and layer 2.5 (MPLS) and 3 identifiers (not payloads),
|
|
e.g. device IDs, port identities, VLAN IDs, Encapsulations, and IP
|
|
addresses.</t>
|
|
|
|
<t>A Link Layer to BGP API presents these data up the stack to
|
|
a BGP protocol or another device-spanning upper layer protocol,
|
|
presenting them using the BGP-LS BGP-like data format.</t>
|
|
|
|
</list></t>
|
|
|
|
<t>The upper layer BGP family routing protocols cross all the
|
|
devices, though they are not part of these L3DL protocols.</t>
|
|
|
|
<t>To simplify this document, Layer 2 framing is not shown. L3DL is
|
|
about layer 3.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="ilpo" title="Inter-Link Protocol Overview">
|
|
|
|
<t>Two devices discover each other and their respective identities
|
|
by sending multicast HELLO PDUs (<xref target="hello"/>). To assure
|
|
discovery of new devices coming up on a multi-link topology, devices
|
|
on such a topology, and only on a multi-link topology, send periodic
|
|
HELLOs forever, see <xref target="dhello"/>.</t>
|
|
|
|
<t>Once a new device is recognized, both devices attempt to
|
|
negotiate and establish a session by sending unicast OPEN PDUs
|
|
(<xref target="open"/>) to the source MAC addresses (plus VIDs if
|
|
VLANs) of the received HELLOs. Once a session is established
|
|
through the OPEN exchange, the Encapsulations (<xref
|
|
target="afisafi"/>) configured on an end point may be announced and
|
|
modified. Note that these are only the encapsulations and addresses
|
|
configured on the announcing interface; though a device's loopback
|
|
and overlay interface(s) may also be announced. When two devices on
|
|
a link have compatible Encapsulations and addresses, i.e. the same
|
|
AFI/SAFI and the same subnet, the link is announced via the BGP-LS
|
|
API.</t>
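<t>As a non-normative illustration of the compatibility test just
described, the following C sketch checks whether two IPv4 interface
addresses fall in the same subnet. The function name, the host byte
order representation, and the single shared prefix length are
assumptions of this example and are not defined by L3DL.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not defined by L3DL: true if two IPv4
   addresses (host byte order) share their first prefix_len bits. */
bool l3dl_same_ipv4_subnet(uint32_t a, uint32_t b, unsigned prefix_len)
{
  assert(prefix_len <= 32);
  /* A zero-length prefix matches everything; handling it separately
     also avoids an undefined shift by 32. */
  uint32_t mask = (prefix_len == 0)
                ? 0u
                : (uint32_t)(0xFFFFFFFFu << (32u - prefix_len));
  return (a & mask) == (b & mask);
}
```
]]></artwork>
</figure>

<t>An implementation would presumably run such a test per AFI/SAFI
before handing the link to the BGP-LS API; IPv6 would need a 128-bit
variant of the same comparison.</t>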
|
|
|
|
<section anchor="ladder" title="L3DL Ladder Diagram">
|
|
|
|
<t>The HELLO, <xref target="hello"/>, is a priming message sent on
|
|
all configured logical links. It is a small L3DL PDU encapsulated
|
|
in an Ethernet multicast frame with the simple goal of discovering
|
|
the identities of logical link endpoint(s) reachable from a
|
|
Logical Link Endpoint, <xref target="llei"/>.</t>
|
|
|
|
<t>The HELLO and OPEN, <xref target="open"/>, PDUs, which are used
|
|
to discover and exchange detailed Logical Link Endpoint
|
|
Identifiers, LLEIs, and the ACK/ERROR PDU, are mandatory; other
|
|
PDUs are optional; though at least one encapsulation SHOULD be
|
|
agreed at some point.</t>
|
|
|
|
<t>The following is a ladder-style diagram of the L3DL protocol
|
|
exchanges:</t>
|
|
|
|
<figure>
|
|
<artwork>
|            HELLO            | Logical Link Peer discovery
|----------------------------&gt;|
|            HELLO            | Mandatory
|&lt;----------------------------|
|                             |
|                             |
|            OPEN             | MACs, IDs, etc.
|----------------------------&gt;|
|             ACK             |
|&lt;----------------------------|
|                             |
|            OPEN             | Mandatory
|&lt;----------------------------|
|             ACK             |
|----------------------------&gt;|
|                             |
|                             |
|  Interface IPv4 Addresses   | Interface IPv4 Addresses
|----------------------------&gt;| Optional
|             ACK             |
|&lt;----------------------------|
|                             |
|  Interface IPv4 Addresses   |
|&lt;----------------------------|
|             ACK             |
|----------------------------&gt;|
|                             |
|                             |
|  Interface IPv6 Addresses   | Interface IPv6 Addresses
|----------------------------&gt;| Optional
|             ACK             |
|&lt;----------------------------|
|                             |
|  Interface IPv6 Addresses   |
|&lt;----------------------------|
|             ACK             |
|----------------------------&gt;|
|                             |
|                             |
|   Interface MPLSv4 Labels   | Interface MPLSv4 Labels
|----------------------------&gt;| Optional
|             ACK             |
|&lt;----------------------------|
|                             |
|   Interface MPLSv4 Labels   | Interface MPLSv4 Labels
|&lt;----------------------------| Optional
|             ACK             |
|----------------------------&gt;|
|                             |
|                             |
|   Interface MPLSv6 Labels   | Interface MPLSv6 Labels
|----------------------------&gt;| Optional
|             ACK             |
|&lt;----------------------------|
|                             |
|   Interface MPLSv6 Labels   | Interface MPLSv6 Labels
|&lt;----------------------------| Optional
|             ACK             |
|----------------------------&gt;|
|                             |
|                             |
|       L3DL KEEPALIVE        | Layer 2 Liveness
|----------------------------&gt;| Optional
|       L3DL KEEPALIVE        |
|&lt;----------------------------|
</artwork>
|
|
</figure>
|
|
</section>
|
|
</section>
|
|
|
|
<section anchor="transport" title="Transport Layer">
|
|
|
|
<t>L3DL PDUs are carried by a simple transport layer which allows
|
|
long PDUs to occupy many Ethernet frames. The L3DL content of a
|
|
single Ethernet frame, exclusive of Ethernet framing data, is
|
|
referred to as a Datagram.</t>
|
|
|
|
<t>The L3DL Transport Layer encapsulates each Datagram using a
|
|
common transport header.</t>
|
|
|
|
<t>If a PDU does not fit in a single datagram, it is broken into
|
|
multiple Datagrams and reassembled by the receiver a la <xref
|
|
target="RFC0791"/> Section 2.3 Fragmentation.</t>
|
|
|
|
<t>This is not classic 'fragmentation', but rather decomposition at
|
|
the origin to allow PDU payloads larger than the frame allows.
|
|
There are no intermediate devices capable of further fragmentation
|
|
or reassembly.</t>
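<t>The decomposition at the origin can be sketched in C as follows.
This is a non-normative illustration: the function names and the
max_datagram parameter (the largest Datagram the link can carry,
transport header included) are assumptions of this example, and the
12-octet header size is taken from the bit widths in the transport
header figure of this section.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Transport header size per the figure: 8+16+1+23+16+32 bits. */
#define L3DL_HDR_LEN 12u

/* Hypothetical helper: number of Datagrams needed to carry a PDU of
   pdu_len octets when each Datagram, header included, may be at most
   max_datagram octets.  An empty PDU still takes one Datagram. */
size_t l3dl_datagram_count(size_t pdu_len, size_t max_datagram)
{
  assert(max_datagram > L3DL_HDR_LEN);
  size_t room = max_datagram - L3DL_HDR_LEN;  /* payload per Datagram */
  if (pdu_len == 0)
    return 1;
  return (pdu_len + room - 1) / room;         /* ceiling division */
}

/* Payload octets carried by Datagram number idx (zero-based).  Every
   Datagram but the last is full; the last carries the remainder and
   is the one that would have the L bit set. */
size_t l3dl_datagram_payload_len(size_t pdu_len, size_t max_datagram,
                                 size_t idx)
{
  size_t room = max_datagram - L3DL_HDR_LEN;
  size_t count = l3dl_datagram_count(pdu_len, max_datagram);
  assert(idx < count);
  return (idx + 1 < count) ? room : pdu_len - idx * room;
}
```
]]></artwork>
</figure>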
|
|
|
|
<t>L3DL is carrying relatively small amounts of data on relatively
|
|
high bandwidth links, and at a time when the link is not active with
|
|
other data as it does not yet have layer three connectivity. So
|
|
congestion is not considered a sufficiently significant risk to
|
|
warrant additional complexity.</t>
|
|
|
|
<t>Should a PDU need to be retransmitted, it MUST be sent as the
|
|
identical Datagram set as the original transmission. The
|
|
Transmission Sequence Number informs the receiver that it is the
|
|
same PDU.</t>
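<t>A receiver might compare Transmission Sequence Numbers along the
following lines. This is a non-normative sketch; the function names
are assumptions of this example, and the comparison follows the
serial number arithmetic of <xref target="RFC1982"/> with 16-bit
values.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if b names the same PDU as a, i.e. b is a retransmission. */
bool l3dl_tsn_same(uint16_t a, uint16_t b)
{
  return a == b;
}

/* RFC 1982 style comparison with 16-bit serial numbers: true if b is
   newer than a, allowing for the wrap from 2^16-1 to 0.  RFC 1982
   leaves the result undefined when the two differ by exactly 2^15;
   this sketch returns false in that case. */
bool l3dl_tsn_newer(uint16_t a, uint16_t b)
{
  uint16_t diff = (uint16_t)(b - a);
  return diff != 0 && diff < 0x8000u;
}
```
]]></artwork>
</figure>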
|
|
|
|
<!--
|
|
protocol "Version:8,Transmission Sequence Number:16,L:1,Datagram Number:23,Datagram Length:16,Checksum:32,Payload...:32"
|
|
-->
|
|
<figure>
|
|
<artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|    Version    | Transmission Sequence Number  |L|             ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~        Datagram Number        |        Datagram Length        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Checksum                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          Payload...                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
|
|
</figure>
|
|
|
|
<t>The fields of the L3DL Transport Header are as follows:
|
|
<list style="hanging">
|
|
|
|
<t hangText="Version:">Eight-bit Version number of the protocol,
|
|
currently 0. Values other than 0 MUST be treated as an error.
|
|
The protocol version needs to be in one and only one place, so it
|
|
is in the datagram as opposed to, for example, the PDU header.</t>
|
|
|
|
<t hangText="L:">A bit that is set to one if this Datagram is the
|
|
last Datagram of the PDU. For a PDU which fits in
|
|
only one Datagram, it is set to one. Note that this is the
|
|
inverse of the marking technique used by <xref
|
|
target="RFC0791"/>.</t>
|
|
|
|
<t hangText="Transmission Sequence Number:">A 16-bit strictly
|
|
increasing unsigned integer identifying this PDU, possibly across
|
|
retransmissions, that wraps from 2^16-1 to 0. The initial value
|
|
is arbitrary. See <xref target="RFC1982"/> on DNS Serial Number
|
|
Arithmetic for too much detail on comparing and incrementing a
|
|
wrapping sequence number.</t>
|
|
|
|
<t hangText="Datagram Number:">A monotonically increasing 23-bit
value which starts at zero for each PDU. This is used to
reassemble frames into PDUs a la <xref target="RFC0791"/> Section
2.3. Note that this limits an L3DL PDU to 2^23 frames.</t>
|
|
|
|
<t hangText="Datagram Length:">Total number of octets in the
|
|
Datagram including all payloads and fields. Note that this limits
|
|
a datagram to 2^16 octets; though Ethernet framing is likely to
|
|
impose a smaller limit.</t>
|
|
|
|
<t hangText="Checksum:">A 32 bit hash over the Datagram to detect
|
|
bit flips, see <xref target="checksum"/>.</t>
|
|
|
|
<t>If a Datagram fails checksum verification, the datagram is
|
|
invalid and should be silently discarded. The sender will
|
|
retransmit the PDU, and the receiver can reassemble it.</t>
|
|
|
|
<t hangText="Payload:">The PDU being transported or a fragment
|
|
thereof.</t>
|
|
|
|
</list></t>
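<t>The header layout can be illustrated with a small pack and unpack
routine in C. This is a non-normative sketch that follows the bit
widths shown in the figure (8, 16, 1, 23, 16, and 32 bits). The
struct, the function names, and the use of network (big-endian) byte
order for multi-octet fields are assumptions of this example.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define L3DL_HDR_LEN 12u

/* Hypothetical in-memory form of the transport header fields. */
struct l3dl_hdr {
  uint8_t  version;       /* 8 bits in the figure, currently 0 */
  uint16_t tsn;           /* Transmission Sequence Number */
  bool     last;          /* the L bit */
  uint32_t dgram_number;  /* 23 bits in the figure */
  uint16_t dgram_length;  /* octets in the Datagram, header included */
  uint32_t checksum;
};

/* Write the header into out[0..11], most significant octet first. */
void l3dl_hdr_pack(const struct l3dl_hdr *h, uint8_t out[L3DL_HDR_LEN])
{
  assert(h->dgram_number < (1u << 23));
  out[0]  = h->version;
  out[1]  = (uint8_t)(h->tsn >> 8);
  out[2]  = (uint8_t)(h->tsn & 0xFF);
  /* L is the top bit of octet 3; the remaining seven bits are the
     high-order bits of Datagram Number. */
  out[3]  = (uint8_t)((h->last ? 0x80 : 0x00) |
                      ((h->dgram_number >> 16) & 0x7F));
  out[4]  = (uint8_t)((h->dgram_number >> 8) & 0xFF);
  out[5]  = (uint8_t)(h->dgram_number & 0xFF);
  out[6]  = (uint8_t)(h->dgram_length >> 8);
  out[7]  = (uint8_t)(h->dgram_length & 0xFF);
  out[8]  = (uint8_t)(h->checksum >> 24);
  out[9]  = (uint8_t)((h->checksum >> 16) & 0xFF);
  out[10] = (uint8_t)((h->checksum >> 8) & 0xFF);
  out[11] = (uint8_t)(h->checksum & 0xFF);
}

/* Inverse of l3dl_hdr_pack(). */
void l3dl_hdr_unpack(const uint8_t in[L3DL_HDR_LEN], struct l3dl_hdr *h)
{
  h->version      = in[0];
  h->tsn          = (uint16_t)((in[1] << 8) | in[2]);
  h->last         = (in[3] & 0x80) != 0;
  h->dgram_number = ((uint32_t)(in[3] & 0x7F) << 16) |
                    ((uint32_t)in[4] << 8) | in[5];
  h->dgram_length = (uint16_t)((in[6] << 8) | in[7]);
  h->checksum     = ((uint32_t)in[8] << 24) | ((uint32_t)in[9] << 16) |
                    ((uint32_t)in[10] << 8) | in[11];
}
```
]]></artwork>
</figure>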
|
|
|
|
<t>To avoid the need for a receiver to reassemble two PDUs at the
|
|
same time, a sender MUST NOT send a subsequent PDU when a PDU is
|
|
already in flight and not yet acknowledged; assuming it is an ACKed
|
|
PDU Type.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="checksum" title="The Checksum">
|
|
|
|
<t>There is a reason conservative folk use a checksum in UDP. And
|
|
as many operators stretch to jumbo frames (over 1,500 octets), longer
|
|
checksums are the prudent approach.</t>
|
|
|
|
<t>For the purpose of computing a checksum, the checksum field
|
|
itself is assumed to be zero.</t>
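<t>Treating the checksum field as zero can be sketched in C as
follows. This is a non-normative illustration: the helper names, the
offset of the Checksum field (octet 8, read from the transport header
figure), and the big-endian storage of the checksum are assumptions of
this example. The l3dl_toy_sum32() routine is only a stand-in so the
sketch is self-contained; it is not the suggested algorithm.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define L3DL_CKSUM_OFF 8u   /* assumed offset of Checksum in header */
#define L3DL_CKSUM_LEN 4u

/* Any routine with the shape of the suggested checksum function. */
typedef uint32_t (*l3dl_cksum_fn)(const uint8_t *b, size_t n);

/* Stand-in checksum for illustration only: a plain octet sum. */
uint32_t l3dl_toy_sum32(const uint8_t *b, size_t n)
{
  uint32_t s = 0;
  for (size_t i = 0; i < n; i++)
    s += b[i];
  return s;
}

/* Checksum of the Datagram with the Checksum field taken as zero.
   The field's current content is restored before returning. */
uint32_t l3dl_checksum_compute(uint8_t *dgram, size_t n, l3dl_cksum_fn fn)
{
  uint8_t saved[L3DL_CKSUM_LEN];
  assert(n >= L3DL_CKSUM_OFF + L3DL_CKSUM_LEN);
  memcpy(saved, dgram + L3DL_CKSUM_OFF, L3DL_CKSUM_LEN);
  memset(dgram + L3DL_CKSUM_OFF, 0, L3DL_CKSUM_LEN);
  uint32_t sum = fn(dgram, n);
  memcpy(dgram + L3DL_CKSUM_OFF, saved, L3DL_CKSUM_LEN);
  return sum;
}

/* Sender side: compute the checksum and store it in the field. */
void l3dl_checksum_set(uint8_t *dgram, size_t n, l3dl_cksum_fn fn)
{
  uint32_t sum = l3dl_checksum_compute(dgram, n, fn);
  dgram[L3DL_CKSUM_OFF + 0] = (uint8_t)(sum >> 24);
  dgram[L3DL_CKSUM_OFF + 1] = (uint8_t)((sum >> 16) & 0xFF);
  dgram[L3DL_CKSUM_OFF + 2] = (uint8_t)((sum >> 8) & 0xFF);
  dgram[L3DL_CKSUM_OFF + 3] = (uint8_t)(sum & 0xFF);
}

/* Receiver side: false means the Datagram should be discarded. */
bool l3dl_checksum_ok(uint8_t *dgram, size_t n, l3dl_cksum_fn fn)
{
  if (n < L3DL_CKSUM_OFF + L3DL_CKSUM_LEN)
    return false;
  const uint8_t *c = dgram + L3DL_CKSUM_OFF;
  uint32_t received = ((uint32_t)c[0] << 24) | ((uint32_t)c[1] << 16) |
                      ((uint32_t)c[2] << 8) | c[3];
  return l3dl_checksum_compute(dgram, n, fn) == received;
}
```
]]></artwork>
</figure>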
|
|
|
|
<t>The following code describes a suggested algorithm. This
|
|
specification does not define a mandatory-to-implement algorithm, algorithm agility, etc.
|
|
What matters is that the same algorithm is used consistently in any
|
|
deployment.</t>
|
|
|
|
<figure>
|
|
<preamble>Sum up 32-bit unsigned ints in a 64-bit long, then take
|
|
the high-order section, shift it right, rotate, add it in, repeat
|
|
until zero.</preamble>
|
|
|
|
<artwork><![CDATA[

<CODE BEGINS>
#include <stddef.h>
#include <stdint.h>

/* The F table from Skipjack, and it would work for the S-Box. */
static const uint8_t sbox[256] = {
0xa3,0xd7,0x09,0x83,0xf8,0x48,0xf6,0xf4,0xb3,0x21,0x15,0x78,
0x99,0xb1,0xaf,0xf9,0xe7,0x2d,0x4d,0x8a,0xce,0x4c,0xca,0x2e,
0x52,0x95,0xd9,0x1e,0x4e,0x38,0x44,0x28,0x0a,0xdf,0x02,0xa0,
0x17,0xf1,0x60,0x68,0x12,0xb7,0x7a,0xc3,0xe9,0xfa,0x3d,0x53,
0x96,0x84,0x6b,0xba,0xf2,0x63,0x9a,0x19,0x7c,0xae,0xe5,0xf5,
0xf7,0x16,0x6a,0xa2,0x39,0xb6,0x7b,0x0f,0xc1,0x93,0x81,0x1b,
0xee,0xb4,0x1a,0xea,0xd0,0x91,0x2f,0xb8,0x55,0xb9,0xda,0x85,
0x3f,0x41,0xbf,0xe0,0x5a,0x58,0x80,0x5f,0x66,0x0b,0xd8,0x90,
0x35,0xd5,0xc0,0xa7,0x33,0x06,0x65,0x69,0x45,0x00,0x94,0x56,
0x6d,0x98,0x9b,0x76,0x97,0xfc,0xb2,0xc2,0xb0,0xfe,0xdb,0x20,
0xe1,0xeb,0xd6,0xe4,0xdd,0x47,0x4a,0x1d,0x42,0xed,0x9e,0x6e,
0x49,0x3c,0xcd,0x43,0x27,0xd2,0x07,0xd4,0xde,0xc7,0x67,0x18,
0x89,0xcb,0x30,0x1f,0x8d,0xc6,0x8f,0xaa,0xc8,0x74,0xdc,0xc9,
0x5d,0x5c,0x31,0xa4,0x70,0x88,0x61,0x2c,0x9f,0x0d,0x2b,0x87,
0x50,0x82,0x54,0x64,0x26,0x7d,0x03,0x40,0x34,0x4b,0x1c,0x73,
0xd1,0xc4,0xfd,0x3b,0xcc,0xfb,0x7f,0xab,0xe6,0x3e,0x5b,0xa5,
0xad,0x04,0x23,0x9c,0x14,0x51,0x22,0xf0,0x29,0x79,0x71,0x7e,
0xff,0x8c,0x0e,0xe2,0x0c,0xef,0xbc,0x72,0x75,0x6f,0x37,0xa1,
0xec,0xd3,0x8e,0x62,0x8b,0x86,0x10,0xe8,0x08,0x77,0x11,0xbe,
0x92,0x4f,0x24,0xc5,0x32,0x36,0x9d,0xcf,0xf3,0xa6,0xbb,0xac,
0x5e,0x6c,0xa9,0x13,0x57,0x25,0xb5,0xe3,0xbd,0xa8,0x3a,0x01,
0x05,0x59,0x2a,0x46
};

/* non-normative example C code, constant time even */

uint32_t sbox_checksum_32(const uint8_t *b, const size_t n)
{
  uint32_t sum[4] = {0, 0, 0, 0};
  uint64_t result = 0;

  /* Substitute each octet through the S-Box and accumulate it into
     one of four lanes, selected by the octet's position mod 4. */
  for (size_t i = 0; i < n; i++)
    sum[i & 3] += sbox[*b++];

  /* Combine the lanes, shifting the running total left by one octet
     before adding each lane. */
  for (size_t i = 0; i < sizeof(sum)/sizeof(*sum); i++)
    result = (result << 8) + sum[i];

  /* Fold the high 32 bits into the low 32 bits; the second fold
     absorbs any carry produced by the first. */
  result = (result >> 32) + (result & 0xFFFFFFFF);
  result = (result >> 32) + (result & 0xFFFFFFFF);
  return (uint32_t) result;
}
<CODE ENDS>
]]></artwork>
|
|
</figure>
|
|
|
|
</section>
|
|
|
|
<section anchor="tlv" title="TLV PDUs">
|
|
|
|
<t>The basic L3DL application layer PDU is a typical TLV (Type
|
|
Length Value) PDU. It includes a signature to provide optional
|
|
integrity and authentication. It may be broken into multiple
|
|
Datagrams, see <xref target="transport"/>.</t>
|
|
|
|
<!--
|
|
protocol "PDU Type:8,Payload Length:32,Payload ...:24,Sig Type:8,Signature Length:16,Signature:40"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   PDU Type    |                Payload Length                 ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |                  Payload ...                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Sig Type    |       Signature Length        |               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
~                           Signature                           ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
|
|
</figure>
|
|
|
|
<t>The fields of the basic L3DL header are as follows:
|
|
<list style="hanging">
|
|
|
|
<t hangText="PDU Type:">An integer differentiating PDU payload
|
|
types. See <xref target="iana-types"/>.</t>
|
|
|
|
<t hangText="Payload Length:">Total number of octets in the
|
|
Payload field.</t>
|
|
|
|
<t hangText="Payload:">The application layer content of the L3DL
|
|
PDU.</t>
|
|
|
|
<t hangText="Sig Type:">The type of the Signature, see <xref
|
|
target="iana-sigtype"/>. Type 0, a null signature, is defined in
|
|
this document.</t>
|
|
|
|
<t>Sig Type 0 indicates a null Signature. For a trivial PDU such
|
|
as KEEPALIVE, the underlying Datagram checksum may be sufficient
|
|
for integrity, though it lacks authenticity.</t>
|
|
|
|
<t>Other Sig Types may be defined in other documents, cf. <xref
|
|
target="I-D.ymbk-lsvr-l3dl-signing"/>.</t>
|
|
|
|
<t hangText="Signature Length:">The length of the Signature,
|
|
possibly including padding, in octets. If Sig Type is 0,
|
|
Signature Length MUST be 0.</t>
|
|
|
|
<t hangText="Signature:">The result of running the signature
|
|
algorithm specified in Sig Type over all octets of the PDU except
|
|
for the Signature itself.</t>
|
|
|
|
</list></t>
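<t>The layout of an unsigned PDU (Sig Type 0) can be illustrated by
the following non-normative C encoder. The function name, the buffer
interface, and big-endian encoding of the length fields are
assumptions of this example.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fixed octets around the Payload: PDU Type (1) + Payload Length (4)
   + Sig Type (1) + Signature Length (2). */
#define L3DL_PDU_OVERHEAD 8u

/* Hypothetical encoder for a PDU with a null signature.  Returns the
   number of octets written, or 0 if out is too small. */
size_t l3dl_pdu_encode_unsigned(uint8_t type,
                                const uint8_t *payload,
                                uint32_t payload_len,
                                uint8_t *out, size_t out_cap)
{
  if (out_cap < L3DL_PDU_OVERHEAD ||
      payload_len > out_cap - L3DL_PDU_OVERHEAD)
    return 0;
  assert(payload != NULL || payload_len == 0);

  size_t off = 0;
  out[off++] = type;
  out[off++] = (uint8_t)(payload_len >> 24);
  out[off++] = (uint8_t)((payload_len >> 16) & 0xFF);
  out[off++] = (uint8_t)((payload_len >> 8) & 0xFF);
  out[off++] = (uint8_t)(payload_len & 0xFF);
  if (payload_len > 0) {
    memcpy(out + off, payload, payload_len);
    off += payload_len;
  }
  out[off++] = 0;  /* Sig Type 0: null signature */
  out[off++] = 0;  /* Signature Length, high octet */
  out[off++] = 0;  /* Signature Length, low octet */
  return off;
}
```
]]></artwork>
</figure>

<t>With PDU Type 0 and an empty payload, this encoder yields eight
zero octets.</t>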
|
|
|
|
</section>
|
|
|
|
<section anchor="llei" title="Logical Link Endpoint Identifier">
|
|
|
|
<t>L3DL discovers neighbors on logical links and establishes
|
|
sessions between the two ends of all consenting discovered logical
|
|
links. A logical link is described by a pair of Logical Link
|
|
Endpoint Identifiers, LLEIs.</t>
|
|
|
|
<t>An LLEI is a variable length descriptor which could be an ASN, a
|
|
classic RouterID, a catenation of the two, an eight octet ISO System
|
|
Identifier <xref target="RFC1629"/>, or any other identifier unique
|
|
to a single logical link endpoint in the topology.</t>
|
|
|
|
<t>An L3DL deployment will choose and define an LLEI which suits its
|
|
needs, simple or complex. Examples of two extremes follow:</t>
|
|
|
|
<t>A simplistic view of a link between two devices is two ports,
|
|
identified by unique MAC addresses, carrying a layer 3 protocol
|
|
conversation. In this case, the MAC addresses might suffice for the
|
|
LLEIs.</t>
|
|
|
|
<t>Unfortunately, things can get more complex. Multiple VLANs can
|
|
run between those two MAC addresses. In practice, many real devices
|
|
use the same MAC address on multiple ports and/or
|
|
sub-interfaces.</t>
|
|
|
|
<t>Therefore, in the general circumstance, a fully described LLEI
|
|
might be as follows:</t>
|
|
<!--
|
|
protocol "System Identifier:64,ifIndex:32"
|
|
-->
|
|
<figure>
|
|
<artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+                       System Identifier                       +
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            ifIndex                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
|
|
</figure>
|
|
|
|
<t>System Identifier, a la <xref target="RFC1629"/>, is an eight
|
|
octet identifier unique in the entire operational space. Routers
|
|
and switches usually have internal MAC Addresses which can be padded
|
|
with high order zeros and used if no System ID exists on the device.
|
|
If no unique identifier is burned into a device, the local L3DL
|
|
configuration SHOULD create and assign a unique one, likely by
|
|
configuration.</t>
|
|
|
|
<t>ifIndex is the SNMP identifier of the (sub-)interface, see <xref
|
|
target="RFC1213"/>. This uniquely identifies the port.</t>
|
|
|
|
<t>For a layer 3 tagged sub-interface or a VLAN/SVI interface,
|
|
ifIndex is that of the logical sub-interface, so no further
|
|
disambiguation is needed.</t>
|
|
|
|
<t>L3DL PDUs learned over VLAN-ports may be interpreted by upper
|
|
layer-3 routing protocols as being learned on the corresponding
|
|
layer-3 SVI interface for the VLAN.</t>
|
|
|
|
<t>LLEIs are big-endian.</t>
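<t>The fully described LLEI above can be illustrated by the following
non-normative C sketch, which builds the 12-octet form from a MAC
address and an ifIndex. The function names are assumptions of this
example; the zero padding and big-endian encoding follow the text of
this section.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 8-octet System Identifier followed by a 4-octet ifIndex. */
#define L3DL_LLEI_LEN 12u

/* Pad a 48-bit MAC address with high-order zeros to form an
   eight-octet System Identifier. */
void l3dl_sysid_from_mac(const uint8_t mac[6], uint8_t sysid[8])
{
  sysid[0] = 0;
  sysid[1] = 0;
  memcpy(sysid + 2, mac, 6);
}

/* Encode System Identifier followed by a big-endian ifIndex. */
void l3dl_llei_encode(const uint8_t sysid[8], uint32_t ifindex,
                      uint8_t out[L3DL_LLEI_LEN])
{
  memcpy(out, sysid, 8);
  out[8]  = (uint8_t)(ifindex >> 24);
  out[9]  = (uint8_t)((ifindex >> 16) & 0xFF);
  out[10] = (uint8_t)((ifindex >> 8) & 0xFF);
  out[11] = (uint8_t)(ifindex & 0xFF);
}
```
]]></artwork>
</figure>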
|
|
|
|
</section>
|
|
|
|
<section anchor="hello" title="HELLO">
|
|
|
|
<t>The HELLO PDU is unique in that it is encapsulated in a multicast
|
|
Ethernet frame. It solicits response(s) from other LLEI(s) on the
|
|
link. See <xref target="dhello"/> for why multicast is used. The
|
|
destination multicast MAC Address to be used MUST be one of the
following, see Clause 9.2.2 of <xref target="IEEE802-2014"/>:
|
|
<list style="hanging"> <?rfc subcompact="yes"?>
|
|
|
|
<t hangText="01-80-C2-00-00-0E:">Nearest Bridge = Propagation
|
|
constrained to a single physical link; stopped by all types of
|
|
bridges (including MPRs (media converters)). This SHOULD be used
|
|
when the link is known to be a simple point to point link.</t>
|
|
|
|
<t hangText="To Be Assigned:"> When a switch receives a frame with
|
|
a multicast destination MAC it does not recognize, it forwards to
|
|
all ports. This destination MAC is to be sent when the interface
|
|
is known to be connected to a switch. See <xref
|
|
target="ieee"/>. This SHOULD be used when the link may be a
|
|
multi-point link.</t>
|
|
|
|
<?rfc subcompact="no"?></list></t>
|
|
|
|
<t>All other L3DL PDUs are encapsulated in unicast frames, as the
|
|
peer's destination MAC address is known after the HELLO
|
|
exchange.</t>
|
|
|
|
<t>When an interface is turned up on a device, it SHOULD issue a
|
|
HELLO if it is to participate in L3DL sessions.</t>
|
|
|
|
<t>If a constrained Nearest Bridge destination address has been
|
|
configured for a point-to-point interface, see above, then the HELLO
|
|
SHOULD NOT be repeated once a session has been created by an
|
|
exchange of OPENs.</t>
|
|
|
|
<t>If the configured destination address is one that is propagated
|
|
by switches, the HELLO SHOULD be repeated at a configured interval,
|
|
with a default of 60 seconds. This allows discovery by new devices
|
|
which come up on the layer-2 mesh. In this multi-link scenario, the
|
|
operator should be aware of the trade-off between timer tuning and
|
|
network noise and adjust the inter-HELLO timer accordingly.</t>
|
|
|
|
<!--
|
|
protocol "PDU Type = 0:8,Payload Length = 0:32,Sig Type = 0:8,Signature Length = 0:16"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| PDU Type = 0  |              Payload Length = 0               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               | Sig Type = 0  |     Signature Length = 0      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
|
|
</figure>
|
|
|
|
<t>If more than one device responds, one adjacency is formed for
|
|
each unique source LLEI response. L3DL treats each adjacency as a
|
|
separate logical link.</t>
|
|
|
|
<t>When a HELLO is received from a source MAC address (plus VID if
|
|
VLAN) with which there is no established L3DL session, the receiver
|
|
SHOULD respond by sending an OPEN PDU to the source MAC address
|
|
(plus VID). The two devices establish an L3DL session by exchanging
|
|
OPEN PDUs.</t>
|
|
|
|
<t>To ameliorate possible load spikes during bootstrap or event
|
|
recovery, there SHOULD be a jittered delay between receipt of a
|
|
HELLO and issue of the OPEN. The default delay range SHOULD be zero
|
|
to five seconds, and MUST be configurable.</t>
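<t>A jittered delay of this kind might be chosen as in the following
non-normative C sketch. The function name, the millisecond
granularity, and the use of the C library rand() are assumptions of
this example; any uniform source of randomness would serve, and the
slight modulo bias is immaterial for load spreading.</t>

<figure>
<artwork><![CDATA[
```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Default upper bound of the delay range: five seconds. */
#define L3DL_OPEN_DELAY_MAX_MS_DEFAULT 5000u

/* Hypothetical helper: pick a delay in [0, max_ms] milliseconds to
   wait between receipt of a HELLO and sending the OPEN.  max_ms is
   the configured upper bound.  Note that rand() only spans
   0..RAND_MAX, which may be as small as 32767. */
uint32_t l3dl_open_delay_ms(uint32_t max_ms)
{
  assert(max_ms < UINT32_MAX);  /* max_ms + 1 must not wrap to 0 */
  if (max_ms == 0)
    return 0;
  return (uint32_t)rand() % (max_ms + 1u);
}
```
]]></artwork>
</figure>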
|
|
|
|
<t>If a HELLO is received from a MAC address with which there is an
|
|
established session, the HELLO should be dropped.</t>
|
|
|
|
<t>The Payload Length is zero as there is no payload.</t>
|
|
|
|
<t>HELLO PDUs cannot be signed as keying material has yet to be
|
|
exchanged. Hence the signature MUST always be the null type.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="open" title="OPEN">
|
|
|
|
<t>Each device has learned the other's MAC Address from the HELLO
|
|
exchange, see <xref target="hello"/>. Therefore the OPEN and all
|
|
subsequent PDUs MUST be unicast, as opposed to the HELLO's multicast
|
|
frame.</t>
|
|
|
|
<!--
|
|
protocol "PDU Type = 1:8,Payload Length:32,Nonce:32,LLEI Length:8,My LLEI:32,AttrCount:8,Attribute List ...:24,Auth Type:8,Key Length:16,Key ...:24,Serial Number:32,Sig Type:8,Signature Length:16,Signature ...:8"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 1 | Payload Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Nonce ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | LLEI Length | My LLEI |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-~
|
|
~ | AttrCount | ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ Attribute List ... | Auth Type | Key Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Key ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Serial Number |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Sig Type | Signature Length | Signature ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>The Payload Length is the number of octets in all fields of the
|
|
PDU from the Nonce through the Serial Number, not including the
|
|
three final signature fields.</t>
|
|
|
|
<t>The Nonce enables detection of a duplicate OPEN PDU. It SHOULD
|
|
be either a random number or a high resolution timestamp. It is
|
|
needed to prevent session closure due to a repeated OPEN caused by a
|
|
race or a dropped or delayed ACK.</t>
|
|
|
|
<t>My LLEI is the sender's LLEI, see <xref target="llei"/>.</t>
|
|
|
|
<t>AttrCount is the number of attributes in the Attribute List.
|
|
Attributes are single octets the semantics of which are
|
|
operator-defined.</t>
|
|
|
|
<t>A node may have zero or more operator-defined attributes, e.g.:
|
|
spine, leaf, backbone, route reflector, arabica, ...</t>
|
|
|
|
<t>Attribute syntax and semantics are local to an operator or
|
|
datacenter; hence there is no global registry. Nodes exchange
|
|
their attributes only in the OPEN PDU.</t>
|
|
|
|
<t>Auth Type is the Signature algorithm suite, see <xref
|
|
target="tlv"/>.</t>
|
|
|
|
<t>Key Length is a 16-bit field denoting the length in octets of the
|
|
Key itself, not including the Auth Type or the Key Length. If the
|
|
Auth Type is zero, then the Key Length MUST also be zero, and there
|
|
MUST be no Key data.</t>
|
|
|
|
<t>The Key is specific to the operational environment. A failure to
authenticate is a failure to start the L3DL session; an ERROR PDU
MUST be sent (Error Code 3), and HELLOs MUST be restarted.</t>
|
|
|
|
<t>The Serial Number is that of the last received and processed PDU.
|
|
This allows a receiver sending an OPEN to tell the sender that the
|
|
receiver wants to resume a session and the sender only needs to send
|
|
data more recent than the Serial Number. If this OPEN is not trying
|
|
to restart a lost session, the Serial Number MUST be set to
|
|
zero.</t>
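<t>The field layout and the Payload Length rule described so far can
be sketched in Python as follows. This is an illustration, not a
reference encoder: it assumes network byte order, assumes LLEI Length
counts octets, emits a null signature, and uses the hypothetical name
encode_open.</t>

<figure>
<artwork><![CDATA[
```python
import struct


def encode_open(nonce, llei, attributes, serial,
                auth_type=0, key=b""):
    """Encode an OPEN PDU with a null signature (sketch)."""
    if auth_type == 0 and key:
        raise ValueError("Auth Type 0 requires an empty Key")
    # Payload: every field from the Nonce through the Serial Number.
    payload = b"".join([
        struct.pack("!I", nonce),            # Nonce
        struct.pack("!B", len(llei)), llei,  # LLEI Length, My LLEI
        struct.pack("!B", len(attributes)),  # AttrCount
        bytes(attributes),                   # one octet per attribute
        struct.pack("!BH", auth_type, len(key)), key,
        struct.pack("!I", serial),           # Serial Number
    ])
    header = struct.pack("!BI", 1, len(payload))  # PDU Type = 1
    signature = struct.pack("!BH", 0, 0)          # Sig Type, Length
    return header + payload + signature
```
]]></artwork>
</figure>

<t>For a six-octet LLEI, two attributes, and no Key, the payload is
4 + 1 + 6 + 1 + 2 + 3 + 4 = 21 octets, and the three signature
octets are not counted.</t>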
|
|
|
|
<t>The Signature fields are described in <xref target="tlv"/> and in
|
|
an asymmetric key environment serve as a proof of possession of the
|
|
signing auth data by the sender.</t>
|
|
|
|
<t>Once two logical link endpoints know each other, and have ACKed
|
|
each other's OPEN PDUs, Layer 2 KEEPALIVEs (see <xref
|
|
target="keepalive"/>) MAY be started to ensure Layer 2 liveness and
|
|
keep the session semantics alive. The timing and acceptable drop of
|
|
KEEPALIVE PDUs are discussed in <xref target="keepalive"/>.</t>
|
|
|
|
<t>If a sender of OPEN does not receive an ACK of the OPEN PDU, then
|
|
they MUST resend the same OPEN PDU, with the same Nonce. Resending
|
|
an unacknowledged OPEN PDU, like other ACKed PDUs, SHOULD use
|
|
exponential back-off, see <xref target="RFC1122"/>.</t>
|
|
|
|
<t>If a properly authenticated OPEN arrives with a new Nonce from an
|
|
LLEI with which the receiving logical link endpoint believes it
|
|
already has an L3DL session (OPENs have already been exchanged), and
|
|
the Serial Number in the OPEN PDU is non-zero, the receiver SHOULD
|
|
establish a new session by sending an OPEN with the Serial Number
|
|
being the same as that of the last sent and ACKed PDU. Each party
|
|
MUST resume sending encapsulations etc. subsequent to the other
|
|
party's Serial Number. And each MUST retain all previously
|
|
discovered encapsulation and other data.</t>
|
|
|
|
<t>If a properly authenticated OPEN arrives with a new Nonce from an
|
|
LLEI with which the receiving logical link endpoint believes it
|
|
already has an L3DL session (OPENs have already been exchanged), and
|
|
the Serial Number in the OPEN is zero, then the receiver MUST assume
|
|
that the sending LLEI or entire device has been reset. All
|
|
previously discovered encapsulation data MUST NOT be kept and MUST
|
|
be withdrawn via the BGP-LS API, and the recipient MUST respond with
|
|
a new OPEN.</t>
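<t>The handling of an authenticated OPEN, as described in the
preceding paragraphs, reduces to a small decision on the Nonce and
the Serial Number. The Python sketch below is one possible reading;
the function name, the session dictionary, and the returned labels
are hypothetical.</t>

<figure>
<artwork><![CDATA[
```python
def classify_open(session, nonce, serial):
    """Decide how to treat an authenticated OPEN from an LLEI.

    `session` is None when no session exists, otherwise a dict
    holding the 'nonce' of the OPEN that established the session.
    """
    if session is None:
        return "new"        # no session yet: normal OPEN exchange
    if nonce == session["nonce"]:
        return "duplicate"  # resent OPEN: keep the session
    if serial != 0:
        return "resume"     # keep learned data, send newer PDUs
    return "reset"          # peer restarted: withdraw learned data
```
]]></artwork>
</figure>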
|
|
|
|
</section>
|
|
|
|
<section anchor="ack" title="ACK">
|
|
|
|
<!--
|
|
protocol "PDU Type = 3:8,Payload Length = 5:32,ACKed PDU:8,EType:4,Error Code:12,Error Hint:16,Sig Type:8,Signature Length:16,Signature ...:24"
|
|
-->
|
|
|
|
<t>The ACK PDU acknowledges receipt of a PDU and reports any error
|
|
condition which might have been raised.</t>
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 3 | Payload Length = 5 ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | ACKed PDU | EType | Error Code |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Error Hint | Sig Type |Signature Leng.~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Signature ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>The ACK acknowledges receipt of an OPEN, Encapsulation, VENDOR
|
|
PDU, etc.</t>
|
|
|
|
<t>The ACKed PDU is the PDU Type of the PDU being acknowledged,
|
|
e.g., OPEN, one of the Encapsulations, etc.</t>
|
|
|
|
<t>If there was an error processing the received PDU, then the EType
|
|
is non-zero. If the EType is zero, Error Code and Error Hint MUST
|
|
also be zero.</t>
|
|
|
|
<t>A non-zero EType is the receiver's way of telling the PDU's
|
|
sender that the receiver had problems processing the PDU. The Error
|
|
Code and Error Hint will tell the sender more detail about the
|
|
error.</t>
|
|
|
|
<t>The decimal value of EType gives a strong hint as to how the receiver
|
|
sending the ACK believes things should proceed:
|
|
<list style="empty">
|
|
<?rfc subcompact="yes"?>
|
|
<t>0 - No Error, Error Code and Error Hint MUST be zero</t>
|
|
<t>1 - Warning, something not too serious happened, continue</t>
|
|
<t>2 - Session should not be continued, try to restart</t>
|
|
<t>3 - Restart is hopeless, call the operator</t>
|
|
<t>4-15 - Reserved</t>
|
|
<?rfc subcompact="no"?>
|
|
</list></t>
|
|
|
|
<t>The Error Codes, noting protocol failures, are listed in <xref
|
|
target="iana-error"/>. Someone stuck in the 1990s might think of the
catenation of EType and Error Code as an echo of 0x1zzz, 0x2zzz,
|
|
etc. They might be right; or not.</t>
|
|
|
|
<t>The Error Hint, an arbitrary 16 bits, is any additional data the
|
|
sender of the error PDU thinks will help the recipient or the
|
|
debugger with the particular error.</t>
|
|
|
|
<t>The Signature fields are described in <xref target="tlv"/>.</t>
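<t>The 4-bit EType and 12-bit Error Code share one 16-bit word, which
gives the fixed Payload Length of five octets. A non-normative Python
sketch of packing and unpacking an ACK follows. It assumes network
byte order and a null signature; encode_ack and decode_ack are
hypothetical names.</t>

<figure>
<artwork><![CDATA[
```python
import struct

# PDU Type, Payload Length, ACKed PDU, EType+Error Code,
# Error Hint, Sig Type, Signature Length.
ACK_FORMAT = "!BIBHHBH"


def encode_ack(acked_pdu, etype=0, error_code=0, error_hint=0):
    """Encode an ACK PDU with a null signature (sketch)."""
    if not (0 <= etype <= 0xF and 0 <= error_code <= 0xFFF):
        raise ValueError("EType is 4 bits, Error Code is 12 bits")
    if etype == 0 and (error_code or error_hint):
        raise ValueError("EType 0 requires zero Code and Hint")
    err = (etype << 12) | error_code   # EType in the top 4 bits
    return struct.pack(ACK_FORMAT, 3, 5, acked_pdu, err,
                       error_hint, 0, 0)


def decode_ack(pdu):
    """Return (acked_pdu, etype, error_code, error_hint)."""
    _, _, acked, err, hint, _, _ = struct.unpack(ACK_FORMAT, pdu)
    return acked, err >> 12, err & 0xFFF, hint
```
]]></artwork>
</figure>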
|
|
|
|
<section anchor="retrans" title="Retransmission">
|
|
|
|
<t>If a PDU sender expects an ACK, e.g. for an OPEN, an
|
|
Encapsulation, a VENDOR PDU, etc., and does not receive the ACK
|
|
for a configurable time (default one second), and the interface is
|
|
live at layer 2, the sender resends the PDU using exponential
|
|
back-off, see <xref target="RFC1122"/>. This cycle MAY be
|
|
repeated a configurable number of times (default three) before it
|
|
is considered a failure. The session MAY be considered closed
in the case of this ACK failure.</t>
|
|
|
|
<t>If the link is broken at layer 2, retransmission MAY be retried
|
|
when the link is restored.</t>
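<t>The retransmission cycle might be sketched in Python as follows.
The doubling factor is an assumption (the text asks only for
exponential back-off), as is reading the default of three as three
retransmissions after the initial send. The callables send and
wait_for_ack are hypothetical and supplied by the caller.</t>

<figure>
<artwork><![CDATA[
```python
def send_with_retries(send, wait_for_ack,
                      initial=1.0, retries=3, factor=2.0):
    """Send a PDU, resending with back-off until it is ACKed.

    `send()` transmits the PDU; `wait_for_ack(timeout)` returns
    True if the ACK arrived within `timeout` seconds.
    """
    timeout = initial
    for _attempt in range(1 + retries):
        send()
        if wait_for_ack(timeout):
            return True
        timeout *= factor   # back off before the next attempt
    return False  # caller may now consider the session closed
```
]]></artwork>
</figure>

<t>With the defaults, the waits are 1, 2, 4, and 8 seconds before
the failure is reported.</t>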
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section anchor="afisafi" title="The Encapsulations">
|
|
|
|
<t>Once the devices know each other's LLEIs, know each other's upper
|
|
layer (L2.5 and L3) identities, have means to ensure link state,
|
|
etc., the L3DL session is considered established, and the devices
|
|
SHOULD exchange L3 interface encapsulations, L3 addresses, and L2.5
|
|
labels.</t>
|
|
|
|
<t>The Encapsulation types the peers exchange may be IPv4 (<xref
|
|
target="ipv4"/>), IPv6 (<xref target="ipv6"/>), MPLS IPv4 (<xref
|
|
target="mpls4"/>), MPLS IPv6 (<xref target="mpls6"/>), and/or
|
|
possibly others not defined here.</t>
|
|
|
|
<t>The sender of an Encapsulation PDU MUST NOT assume that the peer
|
|
is capable of the same Encapsulation Type. An ACK (<xref
|
|
target="ack"/>) merely acknowledges receipt. Only if both peers
|
|
have sent the same Encapsulation Type is it safe for Layer 3
|
|
protocols to assume that they are compatible for that type.</t>
|
|
|
|
<t>A receiver of an encapsulation might recognize an addressing
|
|
conflict, such as both ends of the link trying to use the same
|
|
address. In this case, the receiver SHOULD respond with an error
|
|
(Error Code 2) ACK. As there may be other usable addresses or
|
|
encapsulations, this error might log and continue, letting an upper
|
|
layer topology builder deal with what works.</t>
|
|
|
|
<t>Further, to consider a logical link of a type to formally be
|
|
established so that it may be pushed up to upper layer protocols,
|
|
the addressing for the type must be compatible, e.g. on the same
|
|
IP subnet.</t>
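<t>For IP encapsulations, the address conflict and subnet
compatibility checks described above might look like the following
Python sketch. The helper names are hypothetical, and treating
"compatible" as "same network prefix" is an assumption drawn from
the example in the text.</t>

<figure>
<artwork><![CDATA[
```python
import ipaddress


def same_subnet(local, remote):
    """True if two 'address/prefixlen' strings share a subnet."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    return a.version == b.version and a.network == b.network


def address_conflict(local, remote):
    """True if both ends of the link claim the same address."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    return a.version == b.version and a.ip == b.ip
```
]]></artwork>
</figure>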
|
|
|
|
<section anchor="encaps" title="The Encapsulation PDU Skeleton">
|
|
|
|
<t>The header for all encapsulation PDUs is as follows:</t>
|
|
|
|
<!--
|
|
protocol "PDU Type:8,Payload Length:32,Count:24,Serial Number:32,Encapsulation List...:24,Sig Type:8,Signature Length:16,Signature ...:16"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type | Payload Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Count |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Serial Number |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Encapsulation List... | Sig Type |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Signature Length | Signature ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>An Encapsulation PDU describes zero or more addresses of the
|
|
encapsulation type.</t>
|
|
|
|
<t>The 24-bit Count is the number of Encapsulations in the
|
|
Encapsulation list.</t>
|
|
|
|
<t>The Serial Number is a monotonically increasing 32-bit value
|
|
representing the sender's state in time. It may be an integer, a
|
|
timestamp, etc. On session restart (new OPEN), a receiver MAY
|
|
send the last received Serial Number to tell the sender to only
|
|
send newer data.</t>
|
|
|
|
<t>If a sender has multiple links on the same interface, separate
|
|
state (data, ACKs, etc.) must be kept for each peer session.</t>
|
|
|
|
<t>Over time, multiple Encapsulation PDUs may be sent for an
|
|
interface as configuration changes.</t>
|
|
|
|
<t>If the length of an Encapsulation PDU exceeds the Datagram size
|
|
limit on media, the PDU is broken into multiple Datagrams. See
|
|
<xref target="tlv"/>.</t>
|
|
|
|
<t>The Signature fields are described in <xref target="tlv"/>.</t>
|
|
|
|
<t>The Receiver MUST acknowledge the Encapsulation PDU with a
Type=3 ACK PDU (<xref target="ack"/>) whose ACKed PDU field is the
PDU Type of the Encapsulation PDU being acknowledged.</t>
|
|
|
|
<t>If the Sender does not receive an ACK in a configurable
|
|
interval (default one second), and the interface is live at layer
|
|
2, they SHOULD retransmit. After a user configurable number of
|
|
failures (default three), the L3DL session should be considered
|
|
dead and the OPEN process SHOULD be restarted.</t>
|
|
|
|
<t>If the link is broken at layer 2, retransmission MAY be retried
|
|
if data have not changed in the interim.</t>
|
|
|
|
</section>
|
|
<section anchor="eflags" title="Encapsulation Flags">
|
|
|
|
<t>The Encapsulation Flags are a sequence of bit fields as
|
|
follows:</t>
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3 4 ... 7
|
|
+------------+------------+------------+------------+------------+
|
|
| Ann/With | Primary | Under/Over | Loopback | Reserved ..|
|
|
+------------+------------+------------+------------+------------+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>Each encapsulation in an Encapsulation PDU of Type T may
|
|
announce new and/or withdraw old encapsulations of Type T. It
|
|
indicates this with the Ann/With Encapsulation Flag, Announce ==
|
|
1, Withdraw == 0.</t>
|
|
|
|
<t>Each Encapsulation interface address in an Encapsulation PDU is
either a new encapsulation to be announced (Ann/With == 1) (yes, a la
BGP) or a request that one be withdrawn (Ann/With == 0). Adding an
encapsulation which already exists SHOULD raise an
Announce/Withdraw Error (see <xref target="iana-error"/>); the EType
SHOULD be 2, suggesting a session restart (see <xref
target="ack"/>), so all encapsulations will be resent.</t>
|
|
|
|
<t>If an LLEI has multiple addresses for an encapsulation type,
|
|
one and only one address MAY be marked as primary (Primary Flag ==
|
|
1) for that Encapsulation Type.</t>
|
|
|
|
<t>An Encapsulation interface address in an Encapsulation PDU MAY
|
|
be marked as a loopback, in which case the Loopback bit is set.
|
|
Loopback addresses are generally not seen directly on an external
|
|
interface. One or more loopback addresses MAY be exposed by
|
|
configuration on one or more L3DL speaking external interfaces,
|
|
e.g. for iBGP peering. They SHOULD be marked as such, Loopback
|
|
Flag == 1.</t>
|
|
|
|
<t>Each Encapsulation interface address in an Encapsulation PDU is
|
|
that of the direct 'underlay' interface (Under/Over == 1), or an
|
|
'overlay' address (Under/Over == 0), likely that of a VM or
|
|
container guest bridged or configured on to the interface already
|
|
having an underlay address.</t>
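<t>The flag bits can be decoded with simple masks. The Python sketch
below assumes that bit 0 in the figure is the most significant bit
of the octet, as is conventional in such diagrams; the names
EncapsFlags and describe are hypothetical.</t>

<figure>
<artwork><![CDATA[
```python
from enum import IntFlag


class EncapsFlags(IntFlag):
    """Encapsulation Flags, assuming bit 0 is most significant."""
    ANNOUNCE = 0x80   # bit 0: 1 = announce, 0 = withdraw
    PRIMARY = 0x40    # bit 1: primary address for this type
    UNDERLAY = 0x20   # bit 2: 1 = underlay, 0 = overlay
    LOOPBACK = 0x10   # bit 3: loopback address


def describe(flags_octet):
    """Summarize one Encaps Flags octet as a dict of booleans."""
    flags = EncapsFlags(flags_octet & 0xF0)  # drop reserved bits
    return {
        "announce": EncapsFlags.ANNOUNCE in flags,
        "primary": EncapsFlags.PRIMARY in flags,
        "underlay": EncapsFlags.UNDERLAY in flags,
        "loopback": EncapsFlags.LOOPBACK in flags,
    }
```
]]></artwork>
</figure>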
|
|
|
|
</section>
|
|
|
|
<section anchor="ipv4" title="IPv4 Encapsulation">
|
|
|
|
<t>The IPv4 Encapsulation describes a device's ability to exchange
|
|
IPv4 packets on one or more subnets. It does so by stating the
|
|
interface's addresses and the corresponding prefix lengths.</t>
|
|
|
|
<!--
|
|
protocol "PDU Type = 4:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,IPv4 Address:32,PrefixLen:8,more ...:8,Sig Type:8,Signature Length:16,Signature ...:16"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 4 | Payload Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Count |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Serial Number |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Encaps Flags | IPv4 Address ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | PrefixLen | more ... | Sig Type |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Signature Length | Signature ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>The 24-bit Count is the sum of the number of IPv4
|
|
Encapsulations being announced and/or withdrawn.</t>
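<t>A non-normative Python sketch of building an IPv4 Encapsulation
PDU follows. It assumes network byte order, assumes that the Payload
Length covers the Count through the end of the Encapsulation List,
and emits a null signature. The name encode_ipv4_encaps is
hypothetical.</t>

<figure>
<artwork><![CDATA[
```python
import ipaddress
import struct


def encode_ipv4_encaps(serial, entries):
    """Encode an IPv4 Encapsulation PDU (sketch, null signature).

    `entries` is a list of (flags, "address/prefixlen") tuples.
    """
    body = b""
    for flags, text in entries:
        iface = ipaddress.IPv4Interface(text)
        # Encaps Flags, IPv4 Address, PrefixLen: six octets each.
        body += struct.pack("!B4sB", flags, iface.ip.packed,
                            iface.network.prefixlen)
    count = len(entries).to_bytes(3, "big")       # 24-bit Count
    payload = count + struct.pack("!I", serial) + body
    header = struct.pack("!BI", 4, len(payload))  # PDU Type = 4
    return header + payload + struct.pack("!BH", 0, 0)
```
]]></artwork>
</figure>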
|
|
|
|
</section>
|
|
|
|
<section anchor="ipv6" title="IPv6 Encapsulation">
|
|
|
|
<t>The IPv6 Encapsulation describes a logical link's ability to
|
|
exchange IPv6 packets on one or more subnets. It does so by
|
|
stating the interface's addresses and the corresponding prefix
|
|
lengths.</t>
|
|
|
|
<!--
|
|
protocol "PDU Type = 5:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,IPv6 Address:128,PrefixLen:8,more ...:8,Sig Type:8,Signature Length:16,Signature ...:16"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 5 | Payload Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Count |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Serial Number |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Encaps Flags | |
|
|
+-+-+-+-+-+-+-+-+ +
|
|
| |
|
|
+ +
|
|
| |
|
|
+ +
|
|
| IPv6 Address |
|
|
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| | PrefixLen | more ... | Sig Type |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Signature Length | Signature ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>The 24-bit Count is the sum of the number of IPv6
|
|
Encapsulations being announced and/or withdrawn.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="mplslist" title="MPLS Label List">
|
|
|
|
<t>As an MPLS enabled interface may have a label stack, see <xref
|
|
target="RFC3032"/>, a variable length list of labels is needed.
|
|
These are the labels the sender will accept for the prefix to
|
|
which the list is attached.</t>
|
|
|
|
<!--
|
|
protocol "Label Count:8,Label:20,Exp:3,S:1,Label:20,Exp:3,S:1,more ...:8"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Label Count | Label | Exp |S|
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Label | Exp |S| more ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>A Label Count of zero is an implicit withdraw of all labels for
|
|
that prefix on that interface.</t>
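<t>In the figure, each label entry occupies three octets: a 20-bit
Label, a 3-bit Exp, and the S bit. A Python sketch of encoding such a
list follows. Setting S only on the last entry follows the
bottom-of-stack convention of RFC 3032 and is an assumption here, as
is the name encode_label_list.</t>

<figure>
<artwork><![CDATA[
```python
def encode_label_list(labels):
    """Encode a Label Count octet and 3-octet label entries.

    `labels` is a list of (label, exp) tuples, top of stack first.
    An empty list yields a Label Count of zero.
    """
    if len(labels) > 255:
        raise ValueError("Label Count is a single octet")
    out = bytearray([len(labels)])
    for index, (label, exp) in enumerate(labels):
        if not (0 <= label < (1 << 20) and 0 <= exp < 8):
            raise ValueError("Label is 20 bits, Exp is 3 bits")
        # Assumption: S marks the last (bottom) entry of the list.
        bottom = 1 if index == len(labels) - 1 else 0
        entry = (label << 4) | (exp << 1) | bottom
        out += entry.to_bytes(3, "big")
    return bytes(out)
```
]]></artwork>
</figure>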
|
|
|
|
</section>
|
|
|
|
<section anchor="mpls4" title="MPLS IPv4 Encapsulation">
|
|
|
|
<t>The MPLS IPv4 Encapsulation describes a logical link's ability
|
|
to exchange labeled IPv4 packets on one or more subnets. It does
|
|
so by stating the interface's addresses, the corresponding prefix
lengths, and the corresponding labels which will be accepted for
|
|
each address.</t>
|
|
|
|
<!--
|
|
protocol "PDU Type = 6:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,MPLS Label List ...:16,IPv4 Address:32,PrefixLen:8,more ...:8,Sig Type:8,Signature Length:16,Signature:32"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 6 | Payload Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Count |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Serial Number |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Encaps Flags | MPLS Label List ... | ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ IPv4 Address | PrefixLen |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| more ... | Sig Type | Signature Length |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Signature |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>The 24-bit Count is the sum of the number of MPLSv4
Encapsulations being announced and/or withdrawn.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="mpls6" title="MPLS IPv6 Encapsulation">
|
|
|
|
<t>The MPLS IPv6 Encapsulation describes a logical link's ability
|
|
to exchange labeled IPv6 packets on one or more subnets. It does
|
|
so by stating the interface's addresses, the corresponding prefix
|
|
lengths, and the corresponding labels which will be accepted for
|
|
each address.</t>
|
|
<!--
|
|
protocol "PDU Type = 7:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,MPLS Label List ...:16,IPv6 Address:128,Prefix Len:8,more ...:8,Sig Type:8,Signature Length:16,Signature ...:32"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 7 | Payload Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Count |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Serial Number |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Encaps Flags | MPLS Label List ... | |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +
|
|
| |
|
|
+ +
|
|
| |
|
|
+ +
|
|
| IPv6 Address |
|
|
+ +-+-+-+-+-+-+-+-+
|
|
| | Prefix Len |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| more ... | Sig Type | Signature Length |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Signature ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>The 24-bit Count is the sum of the number of MPLSv6
|
|
Encapsulations being announced and/or withdrawn.</t>
|
|
|
|
</section>
|
|
</section>
|
|
|
|
<section anchor="vendor" title="VENDOR - Vendor Extensions">
|
|
|
|
<!--
|
|
protocol "PDU Type = 255:8,Payload Length:32,Serial Number:32,Enterprise Number:24,Ent Type:8,Enterprise Data ...:32,Sig Type:8,Signature Length:16,Signature ...:32"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 255| Payload Length ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Serial Number ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Enterprise Number |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Ent Type | Enterprise Data ... ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Sig Type | Signature Length |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| Signature ... |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>Vendors or enterprises may define TLVs beyond the scope of L3DL
|
|
standards. This is done using a Private Enterprise Number <xref
|
|
target="IANA-PEN"/> followed by Enterprise Data in a format
|
|
defined for that Enterprise Number and Ent Type.</t>
|
|
|
|
<t>Ent Type allows a VENDOR PDU to be sub-typed in the event that
|
|
the vendor/enterprise needs multiple PDU types.</t>
|
|
|
|
<t>As with Encapsulation PDUs, a receiver of a VENDOR PDU MUST
|
|
respond with an ACK or an ERROR PDU. Similarly, a VENDOR PDU MUST
|
|
only be sent over an open session.</t>
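<t>A VENDOR PDU might be assembled as in the following Python
sketch. It assumes network byte order, assumes that the Payload
Length covers the Serial Number through the Enterprise Data, and
emits a null signature. The name encode_vendor is hypothetical.</t>

<figure>
<artwork><![CDATA[
```python
import struct


def encode_vendor(serial, enterprise, ent_type, data=b""):
    """Encode a VENDOR PDU with a null signature (sketch)."""
    if not 0 <= enterprise < (1 << 24):
        raise ValueError("Enterprise Number field is 24 bits")
    payload = (struct.pack("!I", serial)        # Serial Number
               + enterprise.to_bytes(3, "big")  # Enterprise Number
               + struct.pack("!B", ent_type)    # Ent Type
               + data)                          # Enterprise Data
    header = struct.pack("!BI", 255, len(payload))
    return header + payload + struct.pack("!BH", 0, 0)
```
]]></artwork>
</figure>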
|
|
|
|
</section>
|
|
|
|
<section anchor="keepalive" title="KEEPALIVE - Layer 2 Liveness">
|
|
|
|
<!--
|
|
protocol "PDU Type = 2:8,Payload Length = 0:32,Sig Type = 0:8,Signature Length = 0:16"
|
|
-->
|
|
|
|
<figure>
|
|
<artwork>
|
|
0 1 2 3
|
|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
| PDU Type = 2 | Payload Length = 0 ~
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
~ | Sig Type = 0 | Signature Length = 0 |
|
|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|
|
</artwork>
|
|
</figure>
|
|
|
|
<t>L3DL devices SHOULD beacon frequent Layer 2 KEEPALIVE PDUs to
|
|
ensure session continuity. The inter-KEEPALIVE interval is
|
|
configurable, with a default of ten seconds. A receiver may choose
|
|
to ignore KEEPALIVE PDUs.</t>
|
|
|
|
<t>An operational deployment MUST be configured whether to use
KEEPALIVEs or not, either globally or as finely as per-link
granularity. Disagreement MAY result in repeated session failure
|
|
and reestablishment.</t>
|
|
|
|
<t>KEEPALIVEs SHOULD be beaconed at a configured frequency. One per
|
|
second is the default. Layer 3 liveness, such as BFD, may be more
|
|
(or less) aggressive.</t>
|
|
|
|
<t>When a sender transmits a PDU which is not a KEEPALIVE, the
|
|
sender SHOULD reset the KEEPALIVE timer. I.e. sending any PDU acts
|
|
as a keepalive. Once the last fragment has been sent, the
|
|
KEEPALIVE timer SHOULD be restarted. Do not wait for the ACK.</t>
|
|
|
|
<t>If a KEEPALIVE or other PDUs have not been received from a peer
|
|
with which a receiver has an open session for a configurable time
|
|
(default 30 seconds), the link SHOULD be presumed down. The devices
|
|
MAY keep configuration state and restore it without retransmission
|
|
if no data have changed. Otherwise, a new session SHOULD be
|
|
established and new Encapsulation PDUs exchanged.</t>
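<t>The two timers involved, one restarted by any PDU sent and one by
any PDU received, might be tracked as in the following Python
sketch. The class and method names are hypothetical, time is passed
in by the caller, and the interval default is the ten-second value
given above; since this section also mentions one per second, it
is best treated as a configured value.</t>

<figure>
<artwork><![CDATA[
```python
class Liveness:
    """Track KEEPALIVE send and receive timers for one session."""

    def __init__(self, now, interval=10.0, dead_after=30.0):
        self.interval = interval      # inter-KEEPALIVE interval
        self.dead_after = dead_after  # silence before link is down
        self.last_sent = now
        self.last_received = now

    def pdu_sent(self, now):
        """Any PDU sent acts as a keepalive; restart the timer."""
        self.last_sent = now

    def pdu_received(self, now):
        """Any PDU received shows that the peer is alive."""
        self.last_received = now

    def keepalive_due(self, now):
        """True once the send timer has expired."""
        return now - self.last_sent >= self.interval

    def link_down(self, now):
        """True once the peer has been silent for too long."""
        return now - self.last_received >= self.dead_after
```
]]></artwork>
</figure>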
|
|
|
|
</section>
|
|
|
|
<section anchor="l3liveness" title="Layers 2.5 and 3 Liveness">
|
|
|
|
<t>Layer 2 liveness may be continuously tested by KEEPALIVE PDUs,
|
|
see <xref target="keepalive"/>. As layer 2.5 or layer 3
|
|
connectivity could still break, liveness above layer 2 MAY be
|
|
frequently tested using BFD (<xref target="RFC5880"/>) or a similar
|
|
technique.</t>
|
|
|
|
<t>This protocol assumes that one or more Encapsulation addresses
|
|
may be used to ping, run BFD, or whatever the operator
|
|
configures.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="northsouth" title="The North/South Protocol">
|
|
|
|
<t>Thus far, a one-hop point-to-point logical link discovery
|
|
protocol has been defined.</t>
|
|
|
|
<t>The devices know their unique LLEIs and know the unique peer
|
|
LLEIs and Encapsulations on each logical link interface.</t>
|
|
|
|
<t>Full topology discovery is not appropriate at the L3DL layer, so
|
|
Dijkstra a la IS-IS etc. is assumed to be done by higher level
|
|
protocols such as BGP-SPF.</t>
|
|
|
|
<t>Therefore the LLEIs, link Encapsulations, and state changes are
|
|
pushed North via a small subset of the BGP-LS API. The upper layer
|
|
routing protocol(s), e.g. BGP-SPF, learn and maintain the topology,
|
|
run Dijkstra, and build the routing database(s).</t>
|
|
|
|
<t>For example, if a neighbor's IPv4 Encapsulation address changes,
|
|
the devices seeing the change push that change Northbound.</t>
|
|
|
|
<section anchor="ls" title="Use BGP-LS as Much as Possible">
|
|
|
|
<t>BGP-LS <xref target="RFC7752"/> defines BGP-like Datagrams
|
|
describing logical link state (links, nodes, link prefixes, and
|
|
many other things), and a new BGP path attribute providing
|
|
Northbound transport, all of which can be ingested by upper layer
|
|
protocols such as BGP-SPF; see Section 4 of <xref
|
|
target="I-D.ietf-lsvr-bgp-spf"/>.</t>
|
|
|
|
<t>For IPv4 links, TLVs 259 and 260 are used. For IPv6 links,
|
|
TLVs 261 and 262. If there are multiple addresses on a link,
|
|
multiple TLV pairs are pushed North, having the same ID pairs.</t>
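<t>Choosing the TLV pair for an address pair is a lookup on the IP
version, as in the following Python sketch. The pairing of each code
point with the interface or neighbor address follows the author's
reading of RFC 7752; the names LINK_ADDRESS_TLVS and link_tlvs are
hypothetical.</t>

<figure>
<artwork><![CDATA[
```python
import ipaddress

# (interface address TLV, neighbor address TLV) per IP version.
LINK_ADDRESS_TLVS = {4: (259, 260), 6: (261, 262)}


def link_tlvs(local, neighbor):
    """Return [(tlv_code, packed_address), ...] for one pair."""
    a = ipaddress.ip_address(local)
    b = ipaddress.ip_address(neighbor)
    if a.version != b.version:
        raise ValueError("address families differ")
    local_tlv, neighbor_tlv = LINK_ADDRESS_TLVS[a.version]
    return [(local_tlv, a.packed), (neighbor_tlv, b.packed)]
```
]]></artwork>
</figure>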
|
|
|
|
</section>
|
|
|
|
<section anchor="ls-ext" title="Extensions to BGP-LS">
|
|
|
|
<t>The Northbound protocol needs a few minor extensions to BGP-LS.
|
|
Luckily, others have needed the same extensions.</t>
|
|
|
|
<t>Similarly to BGP-SPF, the BGP protocol is used in the
|
|
Protocol-ID field specified in table 1 of <xref
|
|
target="I-D.ietf-idr-bgpls-segment-routing-epe"/>. The local and
|
|
remote node descriptors for all NLRI are the IDs described in
|
|
<xref target="open"/>. This is equivalent to an adjacency SID or
|
|
a node SID if the address is a loopback address.</t>
|
|
|
|
<t>Label Sub-TLVs from <xref
|
|
target="I-D.ietf-idr-bgp-ls-segment-routing-ext"/> Section 2.1.1,
|
|
are used to associate one or more MPLS Labels with a link.</t>
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section anchor="discuss" title="Discussion">
|
|
|
|
<t>This section explores some trade-offs taken and some
|
|
considerations.</t>
|
|
|
|
<section anchor="dhello" title="HELLO Discussion">
|
|
|
|
<t>A device with multiple Layer 2 interfaces, traditionally called
|
|
a switch, may be used to forward frames and therefore packets from
|
|
multiple devices to one logical interface (LLEI), I, on an L3DL
|
|
speaking device. Interface I could discover a peer J across the
|
|
switch. Later, a prospective peer K could come up across the
|
|
switch. If I was not still sending and listening for HELLOs, the
|
|
potential peering with K could not be discovered. Therefore, on
|
|
multi-link interfaces, L3DL MUST continue to send HELLOs as long
|
|
as they are turned up.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="dkeepalive" title="HELLO versus KEEPALIVE">
|
|
|
|
<t>Both HELLO and KEEPALIVE are periodic. KEEPALIVE might be
|
|
eliminated in favor of keeping only HELLOs. But KEEPALIVEs are
|
|
unicast, and thus less noisy on the network, especially if HELLO
|
|
is configured to transit layer-2-only switches, see <xref
|
|
target="dhello"/>.</t>
|
|
|
|
</section>
|
|
|
|
</section>
|
|
|
|
<section anchor="vlans" title="VLANs/SVIs/Sub-interfaces">
|
|
|
|
<t>One can think of the protocol as an instance (i.e. state machine)
|
|
which runs on each logical link of a device.</t>
|
|
|
|
<t>As the upper routing layer must view VLAN topologies as separate
|
|
graphs, L3DL treats VLAN ports as separate links.</t>
|
|
|
|
<t>L3DL PDUs learned over VLAN-ports may be interpreted by upper
|
|
layer-3 routing protocols as being learned on the corresponding
|
|
layer-3 SVI interface for the VLAN.</t>
|
|
|
|
<t>As Sub-Interfaces each have their own LLEIs, they act as separate
|
|
interfaces, forming their own links.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="impl" title="Implementation Considerations">
|
|
|
|
<t>An implementation SHOULD provide the ability to configure each
|
|
logical interface as L3DL speaking or not.</t>
|
|
|
|
<t>An implementation SHOULD provide the ability to configure whether
|
|
HELLOs on an L3DL enabled interface send Nearest Bridge or the MAC
|
|
which is propagated by switches from that interface; see <xref
|
|
target="hello"/>.</t>
|
|
|
|
<t>An implementation SHOULD provide the ability to distribute one or
|
|
more loopback addresses or interfaces into L3DL on an external L3DL
|
|
speaking interface.</t>
|
|
|
|
<t>An implementation SHOULD provide the ability to distribute one or
|
|
more overlay and/or underlay addresses or interfaces into L3DL on an
|
|
external L3DL speaking interface.</t>
|
|
|
|
<t>An implementation SHOULD provide the ability to configure one of
|
|
the addresses of an encapsulation as primary on an L3DL speaking
|
|
interface. If there is only one address for a particular
|
|
encapsulation, the implementation MAY mark it as primary by
|
|
default.</t>
|
|
|
|
<t>An implementation MAY allow optional configuration which updates
|
|
the local forwarding table with overlay and underlay data both
|
|
learned from L3DL peers and configured locally.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="security" title="Security Considerations">
|
|
|
|
<t>The protocol as is MUST NOT be used outside a datacenter or
|
|
similarly closed environment without authentication and
|
|
authorisation mechanisms such as <xref
|
|
target="I-D.ymbk-lsvr-l3dl-signing"/>.</t>
|
|
|
|
<t>Many MDC operators have a strange belief that physical walls and
|
|
firewalls provide sufficient security. This is not credible. All
|
|
MDC protocols need to be examined for exposure and attack surface.
|
|
In the case of L3DL, Authentication and Integrity as provided in
|
|
<xref target="I-D.ymbk-lsvr-l3dl-signing"/> is strongly recommended.</t>
|
|
|
|
<t>It is generally unwise to assume that on the wire Layer 2 is
|
|
secure. Strange/unauthorized devices may plug into a port.
|
|
Mis-wiring is very common in datacenter installations. A poisoned
|
|
laptop might be plugged into a device's port, form malicious
|
|
sessions, etc. to divert, intercept, or drop traffic.</t>
|
|
|
|
<t>Similarly, malicious nodes/devices could mis-announce
|
|
addressing.</t>
|
|
|
|
<t>If OPENs are not being authenticated, an attacker could forge an
|
|
OPEN for an existing session and cause the session to be reset.</t>
|
|
|
|
<t>For these reasons, the OPEN PDU's authentication data exchange
|
|
SHOULD be used.</t>
|
|
|
|
<t>If the KEEPALIVE PDU is not signed (as suggested in <xref
|
|
target="tlv"/>) to save computation, then a MITM could fake a
|
|
session being alive.</t>
|
|
|
|
</section>
|
|
|
|
<section anchor="iana" title="IANA Considerations">

<section anchor="iana-types" title="PDU Types">

<t>This document requests that IANA create a registry for L3DL PDU
Types, whose values may range from 0 to 255. The name of the registry
should be L3DL-PDU-Type. The policy for adding to the registry is
RFC Required per <xref target="RFC5226"/>, either Standards Track or
Experimental. The initial entries should be the following:</t>

<figure>
<artwork>
    PDU
    Code      PDU Name
    ----      -------------------
    0         HELLO
    1         OPEN
    2         KEEPALIVE
    3         ACK
    4         IPv4 Announcement
    5         IPv6 Announcement
    6         MPLS IPv4 Announcement
    7         MPLS IPv6 Announcement
    8-254     Reserved
    255       VENDOR
</artwork>
</figure>
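
<t>As a minimal illustrative sketch, and not part of the protocol
specification, the initial registry entries above could be held in a
lookup table. The Python names PDU_TYPES and pdu_type_name are
hypothetical and exist only for this example; the code values and
names are copied from the table above.</t>

<figure>
<artwork><![CDATA[
```python
from typing import Dict

# Initial L3DL-PDU-Type entries, copied from the registry table above.
PDU_TYPES: Dict[int, str] = {
    0: "HELLO",
    1: "OPEN",
    2: "KEEPALIVE",
    3: "ACK",
    4: "IPv4 Announcement",
    5: "IPv6 Announcement",
    6: "MPLS IPv4 Announcement",
    7: "MPLS IPv6 Announcement",
    255: "VENDOR",
}


def pdu_type_name(code: int) -> str:
    """Hypothetical helper: map a PDU Type code to its registry name."""
    if not 0 <= code <= 255:
        raise ValueError("PDU Type must be in the range 0-255")
    # Codes 8-254 are Reserved in the initial registry.
    return PDU_TYPES.get(code, "Reserved")
```
]]></artwork>
</figure>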

</section>

<section anchor="iana-sigtype" title="Signature Type">

<t>This document requests that IANA create a registry for L3DL
Signature Types, also known as Sig Types, whose values may range from
0 to 255. The name of the registry should be L3DL-Signature-Type.
The policy for adding to the registry is RFC Required per <xref
target="RFC5226"/>, either Standards Track or Experimental. The
initial entries should be the following:</t>

<figure>
<artwork>
    Number    Name
    ------    -------------------
    0         Null
    1-255     Reserved
</artwork>
</figure>

</section>

<section anchor="iana-flags" title="Flag Bits">

<t>This document requests that IANA create a registry for L3DL PL
Flag Bits, whose bit positions may range from 0 to 7. The name of the
registry should be L3DL-PL-Flag-Bits. The policy for adding to the
registry is RFC Required per <xref target="RFC5226"/>, either
Standards Track or Experimental. The initial entries should be the
following:</t>

<figure>
<artwork>
    Bit       Bit Name
    ----      -------------------
    0         Announce/Withdraw (ann == 0)
    1         Primary
    2         Underlay/Overlay (under == 0)
    3         Loopback
    4-7       Reserved
</artwork>
</figure>
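
<t>As a hedged sketch only, the flag bits above might be decoded as
follows. The function name decode_pl_flags and the dictionary keys
are hypothetical. Two things are assumptions of this example rather
than statements of this section: that bit 0 is the most significant
bit of an 8-bit field, the usual convention in RFC packet diagrams,
and that a set Primary or Loopback bit means the property holds.</t>

<figure>
<artwork><![CDATA[
```python
from typing import Dict


def decode_pl_flags(flags: int) -> Dict[str, bool]:
    """Hypothetical decoder for an 8-bit PL Flags value.

    ASSUMPTION: bit 0 is the most significant bit of the field.
    """
    if not 0 <= flags <= 0xFF:
        raise ValueError("flags must be in the range 0-255")

    def bit(n: int) -> int:
        # Bit n counted from the most significant end (assumed order).
        return (flags >> (7 - n)) & 1

    return {
        "withdraw": bit(0) == 1,   # 0 means announce, per the table
        "primary": bit(1) == 1,    # assumed: set means primary
        "overlay": bit(2) == 1,    # 0 means underlay, per the table
        "loopback": bit(3) == 1,   # assumed: set means loopback
        # Bits 4-7 are Reserved; under the assumed bit order they
        # are the low four bits of the value.
        "reserved_set": (flags & 0x0F) != 0,
    }
```
]]></artwork>
</figure>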

</section>

<section anchor="iana-error" title="Error Codes">

<t>This document requests that IANA create a registry for L3DL Error
Codes, which are 16-bit integers. The name of the registry should be
L3DL-Error-Codes. The policy for adding to the registry is RFC
Required per <xref target="RFC5226"/>, either Standards Track or
Experimental. The initial entries should be the following:</t>

<figure>
<artwork>
    Error
    Code      Error Name
    ----      -------------------
    0         No Error
    1         Checksum Error
    2         Logical Link Addressing Conflict
    3         Authorization Failure
    4         Announce/Withdraw Error
</artwork>
</figure>

</section>

</section>

<section anchor="ieee" title="IEEE Considerations">

<t>This document requires a new EtherType.</t>

<t>This document requires a new multicast MAC address that will be
broadcast through a switch.</t>

</section>

<section anchor="acks" title="Acknowledgments">

<t>The authors thank Cristel Pelsser for multiple reviews, Harsha
Kovuru for comments during implementation, Jeff Haas for review and
comments, Jörg Ott for an early but deep transport review, Joe
Clarke for a useful review, John Scudder for deeply serious review
and comments, Larry Kreeger for a lot of layer 2 clue, Martijn
Schmidt for his contribution, Nalinaksh Pai for transport
discussions, Neeraj Malhotra for review, Paul Congdon for Ethernet
hints, Russ Housley for checksum discussion and sBox, and Steve
Bellovin for checksum advice.</t>

</section>

</middle>

<back>

<references title="Normative References">
<?rfc include="reference.RFC.1213"?>
<?rfc include="reference.RFC.1629"?>
<?rfc include="reference.RFC.2119"?>
<?rfc include="reference.RFC.3032"?>
<?rfc include="reference.RFC.4271"?>
<?rfc include="reference.I-D.ietf-lsvr-bgp-spf"?>
<?rfc include="reference.RFC.5226"?>
<?rfc include="reference.RFC.5880"?>
<?rfc include="reference.RFC.6286"?>
<?rfc include="reference.RFC.7752"?>
<?rfc include="reference.RFC.8174"?>
<?rfc include="reference.I-D.ietf-idr-bgpls-segment-routing-epe"?>
<?rfc include="reference.I-D.ietf-idr-bgp-ls-segment-routing-ext"?>
<?rfc include="reference.I-D.ymbk-lsvr-l3dl-signing"?>
<reference anchor="IANA-PEN" target="https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers">
<front>
<title>IANA Private Enterprise Numbers</title>
<author/>
<date/>
</front>
</reference>
<reference anchor="IEEE.802_2001"
target="http://ieeexplore.ieee.org/servlet/opac?punumber=7732">
<front>
<title>IEEE Standard for Local and Metropolitan Area Networks:
Overview and Architecture</title>
<author>
<organization>IEEE</organization>
</author>
<date day="27" month="July" year="2002"/>
</front>
<seriesInfo name="IEEE" value="802-2001"/>
<seriesInfo name="DOI" value="10.1109/ieeestd.2002.93395"/>
</reference>
<reference anchor="IEEE802-2014">
<front>
<title>Local and Metropolitan Area Networks: Overview and Architecture</title>
<author>
<organization>Institute of Electrical and Electronics Engineers</organization>
</author>
<date year="2014"/>
</front>
<seriesInfo name="IEEE" value="Std 802-2014"/>
</reference>
</references>

<references title="Informative References">
<?rfc include="reference.RFC.0791"?>
<?rfc include="reference.RFC.1122"?>
<?rfc include="reference.RFC.1982"?>
<?rfc include="reference.I-D.malhotra-bess-evpn-lsoe"?>
<?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml-doi/reference.DOI.10.1145/2975159.xml?anchor=JUPITER"?>
<reference anchor="Clos0" >
<front>
<title>A study of non-blocking switching networks [PAYWALLED]</title>
<author initials="C." surname="Clos" fullname="Charles Clos">
<organization></organization>
</author>
<date month="March" year="1953"/>
</front>
<seriesInfo name="Bell System Technical Journal" value="32 (2), pp 406-424"/>
</reference>
<reference anchor="Clos1" target="https://en.wikipedia.org/wiki/Clos_network/">
<front>
<title>Clos Network</title>
<author/>
<date/>
</front>
</reference>
</references>

</back>
</rfc>