reorganized and framed Races, Ordering, and Transactions

-20 published
Randy Bush 2025-04-22 12:15:42 -07:00
parent 28289e3b8f
commit 9461a09b73

@ -1739,87 +1739,118 @@ Cache Router
</t> </t>
</section> </section>
<section anchor="races" title="ROA PDU Race Minimization"> <section anchor="rot" title="Races, Ordering, and Transactions">
<t> <t>
When a cache is sending ROA (IPv4 or IPv6) PDUs to a router <xref target="races"/> describes race conditions which could, if
undesirable race conditions are possible: not mitigated, cause BGP states to be temporarily marked as
<list style="hanging"> Invalid by Route Origin Validation. Minimal mitigations are
<t hangText="Make Before Break:"> described.
For some prefix P, an operator may create two or more ROAs </t>
with different ASes because they are in the process of <t>
changing what provider AS may announce P. This is a case of <xref target="order"/> describes an optional but RECOMMENDED
"make before break." If a cache is feeding a router and sends ordering of all PDUs which allows mitigation of the race
the one not yet in service a significant time before sending conditions described in <xref target="races"/>.
the one currently in service, then BGP data could be marked </t>
Invalid during the interval. To minimize that interval, the <t>
cache SHOULD send all VRPs for the same prefix as close to <xref target="trans"/> describes how a router MAY process
sequentially as possible. ordered PDUs, especially IPvX VRPs, into atomic transactions to
be committed in a manner to mitigate the race conditions.
</t>
<section anchor="races" title="ROA PDU Race Minimization">
<t>
When a cache is sending ROA (IPv4 or IPv6) VRPs to a router
undesirable race conditions are possible:
<list style="hanging">
<t hangText="Make Before Break:">
For prefix P, an operator may create two or more ROAs with
different ASes because they are in the process of changing
what provider AS may announce P. This is known as "make
before break." If a cache is feeding a router and sends
the one not yet in service a significant time before
sending the one currently in service, then BGP data could
be marked Invalid during the interval. To minimize that
interval, the cache SHOULD send all VRPs for the same
prefix as close to sequentially as possible.
</t>
<t hangText="Longest Prefix Match:">
If an operator has created a ROA for prefix P0, and another
operator (often their customer) has created a ROA for P1 which
is a sub-prefix covered by P0, a router which receives the VRP
for P0 before the VRP for P1 might mark BGP for prefix P1 Invalid
until the P1 announcement is processed. Therefore, the cache
SHOULD announce the sub-prefix P1 before the covering prefix
P0. Conversely, the cache SHOULD withdraw covering prefixes
before covered sub-prefixes.
</t>
<t hangText="AS 0:">
To minimize risk of inadvertent marking of BGP data as
Invalid, an announcement VRP for prefix P which has an AS
of 0, SHOULD be sent after all other VRPs for prefix P.
Conversely, a withdrawal VRP for prefix P which has an AS
of 0, SHOULD be sent before all other prefix PDUs for
prefix P.
</t>
</list>
</t> </t>
<t hangText="Longest Prefix Match:"> </section>
If an operator has created a ROA for prefix P0, and another
operator (often their customer) has created a ROA for P1 which <section anchor="order" title="PDU Ordering">
is a sub-prefix covered by P0, a router which receives the ROA <t>
for P0 before that for P1 might mark BGP for prefix P1 Invalid A Version 2 Cache SHOULD, unless it requires major revision of
until the P1 announcement is processed. Therefore, the cache existing code, order Payload PDUs it sends to routers.
SHOULD announce the sub-prefix P1 before the covering prefix Ascending order is considered somewhat more efficient as
P0. Conversely, the cache SHOULD withdraw covering prefixes routers are likely building trees. Iff ordering, with the
before covered sub-prefixes. exceptions in <xref target="races"/> above, ordering MUST be,
as follows:
</t> </t>
<t hangText="AS 0:"> <list style="symbols">
To minimize risk of inadvertent marking of BGP data as <t>
Invalid, an announcement VRP for prefix P which has an AS of PDUs are first ordered by PDU Type,
0, SHOULD be sent after all other prefix PDUs for prefix P. </t>
Conversely, a withdrawal VRP for prefix P which has an AS of <t>
0, SHOULD be sent before all other prefix PDUs for prefix P. IPv4 and IPv6 Prefix VRPs are ordered by: first IPvX Prefix,
second Prefix Length, third Max Length, and fourth Autonomous
System Number. Treating announcements of VRPs with AS 0 as
sorting last, and withdrawals as sorting first, fulfills the "AS
0" requirement of <xref target="races"/>. Treating
announcements of sub-prefixes as sorting first fulfills the
"Longest Prefix Match" requirement of <xref target="races"/>.
</t>
<t>
Router Key PDUs are ordered by AS Number and then Subject Public
Key Info.
</t>
<t>
And ASPA PDUs ordered by Customer AS.
</t>
</list>
<t>
Unless specifically configured for a particular cache, a
router MUST NOT depend on payload PDU ordering.
</t> </t>
</list> </section>
<t>
In order to further mitigate such race conditions, a router MAY <section anchor="trans" title="Transaction-like Commit">
choose not to make effective the PDUs received in response to a <t>
request until the relevant End of Data PDU is received. Iff a router has been configured to know that a particular
</t> cache's data are ordered per <xref target="order"/>, a router
<t> MAY wait to commit, i.e. make effective, IPvX VRPs only after
However, a router MAY apply a time limit for how long it is all sub-prefixes of a received covering prefix are received.
willing to wait for the End of Data PDU. </t>
</t> <t>
</t> Another method a router MAY choose to mitigate the above race
conditions is not to commit, i.e. make effective, the VRPs
</section> received in response to a request until the relevant End of
Data PDU is received. During start or restart of a session,
<section anchor="ordering" title="PDU Ordering"> this approach may consume considerable memory. If using this
<t> approach, a router MUST apply a time limit for how long it is
A Version 2 Cache SHOULD, unless it requires major revision of willing to wait for the End of Data PDU.
existing code, order payload PDUs (IPvX, Router Key, ASPA) it </t>
sends to routers. Ascending order is considered somewhat more </section>
efficient as routers are likely building trees. Iff ordering,
with the exceptions in <xref target="races"/> above, ordering MUST
be, as follows:
</t>
<list style="symbols">
<t>
PDUs are first ordered by PDU Type,
</t>
<t>
IPv4 and IPv6 Prefix VRPs are ordered by: first IPvX Prefix,
second Prefix Length, third Max Length, and fourth Autonomous
System Number. Treating announcements of VRPs with AS 0 as
sorting last, and withdrawals as sorting first, fulfills the "AS
0" requirement of <xref target="races"/>,
</t>
<t>
Router Key PDUs are ordered by AS Number and then Subject Public
Key Info,
</t>
<t>
And ASPA PDUs ordered by Customer AS.
</t>
</list>
<t>
Routers MUST NOT depend on payload PDU ordering.
</t>
</section> </section>
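Editorial sketch, not part of the commit: the VRP ordering rules in the "PDU Ordering" text can be read as a pair of sort keys, one for announcements and one for withdrawals. The `Vrp` tuple and both key functions below are hypothetical names introduced only for illustration. Two choices here are assumptions rather than anything the draft states: sorting announcements by the last address of each prefix (so that every sub-prefix precedes its covering prefix), and placing the AS 0 test ahead of Max Length (so that an AS 0 VRP is last, or first for withdrawals, among all VRPs for the same prefix).

```python
import ipaddress
from typing import NamedTuple


class Vrp(NamedTuple):
    """Hypothetical VRP record: prefix in CIDR form, max length, origin AS."""
    prefix: str
    max_len: int
    asn: int


def announcement_key(v: Vrp):
    """Sort key for announcements.

    Sub-prefixes sort before their covering prefix, and an AS 0 VRP
    sorts after every other VRP for the same prefix.
    """
    net = ipaddress.ip_network(v.prefix)
    return (
        net.version,                  # keep IPv4 and IPv6 apart
        int(net.broadcast_address),   # last address: sub-prefixes end no later
        -net.prefixlen,               # same last address: longer prefix first
        v.asn == 0,                   # AS 0 last (assumption: before max_len)
        v.max_len,
        v.asn,
    )


def withdrawal_key(v: Vrp):
    """Sort key for withdrawals.

    Covering prefixes sort before their sub-prefixes, and an AS 0 VRP
    sorts before every other VRP for the same prefix.
    """
    net = ipaddress.ip_network(v.prefix)
    return (
        net.version,
        int(net.network_address),     # first address: covering prefix starts no later
        net.prefixlen,                # same first address: shorter prefix first
        v.asn != 0,                   # AS 0 first (assumption: before max_len)
        v.max_len,
        v.asn,
    )
```

A cache following this sketch would call `sorted(vrps, key=announcement_key)` on the VRPs it is about to announce and `sorted(vrps, key=withdrawal_key)` on those it is about to withdraw. How the two lists are interleaved within one response is left open here, as the text above does not specify it.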
<!--- <!---
<section anchor="Scenarios" title="Deployment Scenarios"> <section anchor="Scenarios" title="Deployment Scenarios">
<t> <t>