reorganized and framed Races, Ordering, and Transactions

-20 published
Randy Bush 2025-04-22 12:15:42 -07:00
parent 28289e3b8f
commit 9461a09b73

@ -1739,84 +1739,115 @@ Cache Router
</t>
</section>
<section anchor="races" title="ROA PDU Race Minimization">
<t>
When a cache is sending ROA (IPv4 or IPv6) PDUs to a router
undesirable race conditions are possible:
<list style="hanging">
<t hangText="Make Before Break:">
For some prefix P, an operator may create two or more ROAs
with different ASes because they are in the process of
changing what provider AS may announce P. This is a case of
"make before break." If a cache is feeding a router and sends
the one not yet in service a significant time before sending
the one currently in service, then BGP data could be marked
Invalid during the interval. To minimize that interval, the
cache SHOULD send all VRPs for the same prefix as close to
sequentially as possible.
</t>
<t hangText="Longest Prefix Match:">
If an operator has created a ROA for prefix P0, and another
operator (often their customer) has created a ROA for P1 which
is a sub-prefix covered by P0, a router which receives the ROA
for P0 before that for P1 might mark BGP for prefix P1 Invalid
until the P1 announcement is processed. Therefore, the cache
SHOULD announce the sub-prefix P1 before the covering prefix
P0. Conversely, the cache SHOULD withdraw covering prefixes
before covered sub-prefixes.
</t>
<t hangText="AS 0:">
To minimize risk of inadvertent marking of BGP data as
Invalid, an announcement VRP for prefix P which has an AS of
0, SHOULD be sent after all other prefix PDUs for prefix P.
Conversely, a withdrawal VRP for prefix P which has an AS of
0, SHOULD be sent before all other prefix PDUs for prefix P.
</t>
</list>
<section anchor="rot" title="Races, Ordering, and Transactions">
<t>
In order to further mitigate such race conditions, a router MAY
choose not to make effective the PDUs received in response to a
request until the relevant End of Data PDU is received.
<xref target="races"/> describes race conditions which could, if
not mitigated, cause BGP states to be temporarily marked as
Invalid by Route Origin Validation. Minimal mitigations are
described.
</t>
<t>
However, a router MAY apply a time limit for how long it is
willing to wait for the End of Data PDU.
<xref target="order"/> describes an optional but RECOMMENDED
ordering of all PDUs which allows mitigation of the race
conditions described in <xref target="races"/>.
</t>
<t>
<xref target="trans"/> describes how a router MAY process
ordered PDUs, especially IPvX VRPs, into atomic transactions to
be committed in a manner to mitigate the race conditions.
</t>
</t>
</section>
<section anchor="races" title="ROA PDU Race Minimization">
<t>
When a cache is sending ROA (IPv4 or IPv6) VRPs to a router,
undesirable race conditions are possible:
<list style="hanging">
<t hangText="Make Before Break:">
For prefix P, an operator may create two or more ROAs with
different ASes because they are in the process of changing
what provider AS may announce P. This is known as "make
before break." If a cache is feeding a router and sends
the one not yet in service a significant time before
sending the one currently in service, then BGP data could
be marked Invalid during the interval. To minimize that
interval, the cache SHOULD send all VRPs for the same
prefix as close to sequentially as possible.
</t>
<t hangText="Longest Prefix Match:">
If an operator has created a ROA for prefix P0, and another
operator (often their customer) has created a ROA for P1 which
is a sub-prefix covered by P0, a router which receives the VRP
for P0 before the VRP for P1 might mark BGP for prefix P1 Invalid
until the P1 announcement is processed. Therefore, the cache
SHOULD announce the sub-prefix P1 before the covering prefix
P0. Conversely, the cache SHOULD withdraw covering prefixes
before covered sub-prefixes.
</t>
<t hangText="AS 0:">
To minimize risk of inadvertent marking of BGP data as
Invalid, an announcement VRP for prefix P which has an AS
of 0, SHOULD be sent after all other VRPs for prefix P.
Conversely, a withdrawal VRP for prefix P which has an AS
of 0, SHOULD be sent before all other prefix PDUs for
prefix P.
</t>
</list>
</t>
</section>
<section anchor="ordering" title="PDU Ordering">
<t>
A Version 2 Cache SHOULD, unless it requires major revision of
existing code, order payload PDUs (IPvX, Router Key, ASPA) it
sends to routers. Ascending order is considered somewhat more
efficient as routers are likely building trees. Iff ordering,
with the exceptions in <xref target="races"/> above, ordering MUST
be, as follows:
</t>
<list style="symbols">
<t>
PDUs are first ordered by PDU Type,
</t>
<t>
IPv4 and IPv6 Prefix VRPs are ordered by: first IPvX Prefix,
second Prefix Length, third Max Length, and fourth Autonomous
System Number. Treating announcements of VRPs with AS 0 as
sorting last, and withdrawals as sorting first, fulfills the "AS
0" requirement of <xref target="races"/>,
</t>
<t>
Router Key PDUs are ordered by AS Number and then Subject Public
Key Info,
</t>
<t>
And ASPA PDUs ordered by Customer AS.
</t>
</list>
<t>
Routers MUST NOT depend on payload PDU ordering.
</t>
<section anchor="order" title="PDU Ordering">
<t>
A Version 2 Cache SHOULD, unless it requires major revision of
existing code, order Payload PDUs it sends to routers.
Ascending order is considered somewhat more efficient as
routers are likely building trees. Iff ordering, with the
exceptions in <xref target="races"/> above, the ordering MUST be
as follows:
</t>
<list style="symbols">
<t>
PDUs are first ordered by PDU Type,
</t>
<t>
IPv4 and IPv6 Prefix VRPs are ordered by: first IPvX Prefix,
second Prefix Length, third Max Length, and fourth Autonomous
System Number. Treating announcements of VRPs with AS 0 as
sorting last, and withdrawals as sorting first, fulfills the "AS
0" requirement of <xref target="races"/>. Treating
announcements of sub-prefixes as sorting first fulfills the
"Longest Prefix Match" requirement of <xref target="races"/>.
</t>
<t>
Router Key PDUs are ordered by AS Number and then Subject Public
Key Info.
</t>
<t>
ASPA PDUs are ordered by Customer AS.
</t>
</list>
<t>
Unless specifically configured for a particular cache, a
router MUST NOT depend on payload PDU ordering.
</t>
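<t>
As a non-normative illustration, the VRP ordering above could be
expressed as a sort key. This is a minimal sketch under stated
assumptions: the names Vrp and vrp_sort_key are hypothetical, the IP
version stands in for the PDU Type of the two prefix PDUs, and the
AS 0 rule is applied within a single prefix (same address and
length), which is one possible reading of the text. The sub-prefix
exception for "Longest Prefix Match" is not modeled here.
</t>

```python
import ipaddress
from typing import NamedTuple


class Vrp(NamedTuple):
    """Hypothetical in-memory form of an IPvX Prefix PDU payload."""
    prefix: str            # e.g. "192.0.2.0/24"
    max_len: int
    asn: int
    announce: bool = True  # False means the PDU is a withdrawal


def vrp_sort_key(v: Vrp):
    """Ascending key: IP version, prefix, prefix length, AS 0 rank,
    max length, AS number."""
    net = ipaddress.ip_network(v.prefix)
    # Assumption: AS 0 sorts last among announcements and first among
    # withdrawals for the same prefix, per the "AS 0:" item.
    if v.asn == 0:
        as0_rank = 1 if v.announce else -1
    else:
        as0_rank = 0
    return (
        net.version,               # stands in for ordering by PDU Type
        int(net.network_address),  # first: IPvX Prefix
        net.prefixlen,             # second: Prefix Length
        as0_rank,                  # AS 0 exception within the prefix
        v.max_len,                 # third: Max Length
        v.asn,                     # fourth: Autonomous System Number
    )


vrps = [
    Vrp("192.0.2.0/24", 24, 0),
    Vrp("192.0.2.0/24", 24, 64500),
    Vrp("192.0.2.0/24", 25, 64501),
    Vrp("10.0.0.0/8", 8, 64502),
]
ordered = sorted(vrps, key=vrp_sort_key)
```

<t>
In this sketch, a cache would emit the entries of ordered in
sequence, so the AS 0 announcement for 192.0.2.0/24 follows the other
VRPs for that prefix.
</t>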
</section>
<section anchor="trans" title="Transaction-like Commit">
<t>
Iff a router has been configured to know that a particular
cache's data are ordered per <xref target="order"/>, a router
MAY wait to commit, i.e. make effective, IPvX VRPs only after
all sub-prefixes of a received covering prefix are received.
</t>
<t>
Another method a router MAY choose to mitigate the above race
conditions is not to commit, i.e. make effective, the VRPs
received in response to a request until the relevant End of
Data PDU is received. During start or restart of a session,
this approach may consume considerable memory. If using this
approach, a router MUST apply a time limit for how long it is
willing to wait for the End of Data PDU.
</t>
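<t>
The End of Data approach above might be sketched as follows. This is
an illustrative outline only: the class PendingUpdate, its methods,
and the tuple form of a VRP are hypothetical, and what a router does
once the time limit expires is left to the caller because the text
above does not specify it.
</t>

```python
import time


class PendingUpdate:
    """Stages VRP changes until End of Data arrives, with a time limit."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._clock = clock
        self._deadline = clock() + timeout_s
        self._staged = []  # list of (announce, vrp) in arrival order

    def stage(self, announce, vrp):
        """Record a received VRP PDU without making it effective."""
        self._staged.append((announce, vrp))

    def expired(self):
        """True once the router has waited as long as it is willing to."""
        return self._clock() >= self._deadline

    def commit(self, active):
        """On End of Data, apply all staged changes to the active set."""
        for announce, vrp in self._staged:
            if announce:
                active.add(vrp)
            else:
                active.discard(vrp)
        self._staged.clear()


# Example with an injected clock so the time limit is deterministic.
now = [0.0]
active = {("192.0.2.0/24", 24, 64500)}
update = PendingUpdate(timeout_s=10.0, clock=lambda: now[0])
update.stage(True, ("192.0.2.0/24", 24, 64501))
update.stage(False, ("192.0.2.0/24", 24, 64500))
before_commit = set(active)  # unchanged while changes are only staged
update.commit(active)        # called when End of Data is received
```

<t>
Staging costs memory in proportion to the number of buffered PDUs,
which is why the sketch pairs it with a deadline check.
</t>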
</section>
</section>