Finding and Using Geofeed Data

IIJ & Arrcus, 5147 Crystal Springs, Bainbridge Island, Washington 98110, United States of America, randy@psg.com
NTT, Veemweg 23, Barneveld 3771 MT, Netherlands, massimo@ntt.net
Google, 1600 Amphitheatre Parkway, Mountain View, CA 94043, United States of America, warren@kumari.net
Vigil Security, LLC, 516 Dranesville Road, Herndon, VA 20170, United States of America, housley@vigilsec.com

Keywords: geolocation, geo-location, RPSL, inetnum
This document specifies how to augment the Routing Policy
Specification Language inetnum: class to refer specifically to
geofeed data files and describes an optional scheme that uses
the Resource Public Key Infrastructure to authenticate the
geofeed data files.
Introduction
Providers of Internet content and other services may wish to
customize those services based on the geographic location of the
user of the service. This is often done using the source IP
address used to contact the service, which may not point to a
user; see , Section 14 in particular.
Also, infrastructure and other services might wish to publish
the locale of their services. defines geofeed, a syntax to associate
geographic locales with IP addresses, but it does not specify
how to find the relevant geofeed data given an IP address.
This document specifies how to augment the Routing Policy
Specification Language (RPSL) inetnum: class to refer specifically to
geofeed data files and how to prudently use them. In all places
inetnum: is used, inet6num: should also be assumed .
The reader may find
and informative, and
certainly more verbose, descriptions of the inetnum: database
classes.
An optional utterly awesome but slightly complex means for
authenticating geofeed data is also defined in .
This document obsoletes . Changes from
include the following:
RIPE has implemented the geofeed: attribute.
Allow, but discourage, an inetnum: to have both a geofeed
remarks: attribute and a geofeed: attribute.
Rewrite Authentication to be more
formal.
Geofeed file only UTF-8 CSV.
Stress that authenticating geofeed data is optional.
IP Address Delegation extensions must not use "inherit".
If geofeed data are present, ignore geographic location
hints in other data.
Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 when, and only when, they appear in all
capitals, as shown here.
Geofeed Files
Geofeed files are described in . They provide a facility for an IP address
resource "owner" to associate those IP addresses to geographic
locales.
Per , geofeed files consist of CSVs
(Comma Separated Values) in UTF-8 text format; not HTML,
richtext, or other formats.
Content providers and other parties who wish to locate an IP
address to a geographic locale need to find the relevant geofeed
data. In , this
document specifies how to find the relevant geofeed file given an IP address.
Geofeed data for large providers with significant horizontal
scale and high granularity can be quite large. The size of a
file can be even larger if an unsigned geofeed file combines
data for many prefixes, if dual IPv4/IPv6 spaces are
represented, etc.
Geofeed data do have privacy considerations (see ); this process makes bulk
access to those data easier.
This document also suggests an optional signature to strongly
authenticate the data in the geofeed files.
inetnum: Class
The original RPSL specifications starting with , , and a trail of subsequent documents were
written by the RIPE community. The IETF standardized RPSL in
and . Since then, it has been
modified and extensively enhanced in the Regional Internet
Registry (RIR) community, mostly by RIPE . Currently, change control effectively lies
in the operator community.
The RPSL, and and
used by the Regional
Internet Registries (RIRs), specify the inetnum: database class.
Each of these objects describes an IP address range and its
attributes. The inetnum: objects form a hierarchy ordered on
the address space.
Ideally, RPSL would be augmented to define a new RPSL geofeed:
attribute in the inetnum: class. Absent implementation of the
geofeed: attribute in a particular RIR database, this document
defines the syntax of a Geofeed remarks: attribute, which
contains an HTTPS URL of a geofeed file. The format of the
inetnum: geofeed remarks: attribute MUST be as in this example,
"remarks: Geofeed ", where the token "Geofeed " MUST be case
sensitive, followed by a URL that will vary, but it MUST refer
only to a single geofeed file.
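As an illustration only, the following Python sketch extracts a geofeed reference from the text of a single inetnum: object. It assumes the object is available as plain "attribute: value" lines (RPSL continuation lines are not handled). It matches the "Geofeed " token case-sensitively, prefers a geofeed: attribute over a remarks: one, and accepts only HTTPS URLs. The function name geofeed_reference and the example.com URLs are hypothetical.

```python
import re
from typing import Optional

# "name: value" attribute lines; attribute names are compared
# case-insensitively here, but the "Geofeed " token inside remarks:
# is matched case-sensitively, as the text requires.
_ATTR = re.compile(r"^([A-Za-z0-9-]+):\s*(.*)$")


def geofeed_reference(rpsl_object: str) -> Optional[str]:
    """Return the geofeed URL referenced by one inetnum: object, if any.

    A geofeed: attribute is preferred over "remarks: Geofeed <URL>".
    Continuation lines and other RPSL details are ignored in this sketch.
    """
    geofeed_attr: Optional[str] = None
    remarks_attr: Optional[str] = None
    for line in rpsl_object.splitlines():
        match = _ATTR.match(line)
        if not match:
            continue
        name, value = match.group(1).lower(), match.group(2).strip()
        if name == "geofeed" and geofeed_attr is None:
            geofeed_attr = value
        elif (name == "remarks" and remarks_attr is None
              and value.startswith("Geofeed ")):
            remarks_attr = value[len("Geofeed "):].strip()
    url = geofeed_attr or remarks_attr
    # Only HTTPS URLs are acceptable geofeed references.
    if url is not None and url.startswith("https://"):
        return url
    return None
```

For example, an object carrying both forms yields the geofeed: URL, while "remarks: geofeed ..." (lowercase token) yields nothing.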
While we leave global agreement of RPSL modification to the
relevant parties, we specify that a proper geofeed: attribute in
the inetnum: class MUST be "geofeed:" and
MUST be followed by a single URL that will vary,
but it MUST refer only to a single geofeed file.
The URL uses HTTPS, so the WebPKI provides authentication,
integrity, and confidentiality for the fetched geofeed file.
However, the WebPKI cannot provide authentication of IP address
space assignment. In contrast, the RPKI (see ) can be used to authenticate
IP space assignment; see optional authentication in .
Until all producers of inetnum: objects, i.e., the RIRs, state
that they have migrated to supporting a geofeed: attribute,
consumers looking at inetnum: objects to find geofeed URLs
MUST be able to consume both the remarks: and
geofeed: forms.
The migration not only implies that the RIRs support the
geofeed: attribute, but that all registrants have migrated any
inetnum: objects from remarks: to geofeed: attributes.
Any particular inetnum: object SHOULD have, at
most, one geofeed reference, whether a remarks: or a proper
geofeed: attribute when it is implemented. If there is more
than one, the geofeed: attribute SHOULD be used.
For inetnum:s covering the same address range, or an inetnum:
with both remarks: and geofeed: attributes, a signed geofeed
file SHOULD be preferred over an unsigned file.
If a geofeed file describes multiple disjoint ranges of IP
address space, there are likely to be geofeed references from
multiple inetnum: objects. Files with geofeed references from
multiple inetnum: objects are not compatible with the signing
procedure in .
An unsigned, and only an unsigned, geofeed file MAY be
referenced by multiple inetnum:s and MAY contain prefixes from
more than one registry.
When geofeed references are provided by multiple inetnum:
objects that have identical address ranges, then the geofeed
reference on the inetnum: with the most recent last-modified:
attribute SHOULD be preferred.
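A minimal sketch of this tie-break, assuming each candidate has already been reduced to a (URL, last-modified) pair and that last-modified: values are ISO 8601 timestamps such as "2021-05-26T10:20:47Z". The helper names pick_latest and parse_last_modified are hypothetical.

```python
from datetime import datetime
from typing import List, Tuple


def parse_last_modified(value: str) -> datetime:
    # Assumed format: ISO 8601 with a trailing "Z". Replacing "Z" lets
    # datetime.fromisoformat() accept it on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def pick_latest(refs: List[Tuple[str, str]]) -> str:
    """Given (url, last_modified) pairs from inetnum: objects with identical
    address ranges, return the URL of the most recently modified object."""
    url, _ = max(refs, key=lambda ref: parse_last_modified(ref[1]))
    return url
```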
As inetnum: objects form a hierarchy, geofeed references
SHOULD be at the lowest applicable inetnum:
object covering the relevant address ranges in the referenced
geofeed file. When fetching, the most specific inetnum: object
with a geofeed reference MUST be used.
It is significant that geofeed data may have finer granularity
than the inetnum: that refers to them. For example, an inetnum:
object for an address range P could refer to a geofeed file in
which P has been subdivided into one or more longer prefixes.
Fetching Geofeed Data
This document provides guidelines for how interested
parties should fetch and read geofeed files.
Historically, before , this was done in
varied ways, at the discretion of the implementer, often without
consistent authentication, where data were mostly imported from
email without formal authorisation or validation.
To minimize the load on RIRs' WHOIS
services, the RIRs' FTP services SHOULD
be used for large-scale access to gather inetnum:s with geofeed
references. This uses efficient bulk access instead of fetching
via brute-force search through the IP space.
When an inetnum: with a geofeed file reference is identified,
the file MUST be downloaded using HTTPS.
When reading data from the geofeed file, one MUST ignore data
outside the referring inetnum: object's address range. This is
to avoid importing data about ranges not under the control of
the operator. If geofeed files are fetched, other location
information from the inetnum: MUST be ignored.
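One way to apply this rule is sketched below in Python, assuming the referring inetnum: range is known as a first and last address. Rows whose prefix falls outside that range are dropped, and so are comment lines, blank lines, and rows with an unparsable prefix. The helper name filter_geofeed is hypothetical.

```python
import csv
import io
import ipaddress
from typing import List


def filter_geofeed(csv_text: str, first: str, last: str) -> List[List[str]]:
    """Keep only geofeed rows whose prefix lies inside [first, last],
    the referring inetnum: object's address range (inclusive)."""
    lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines and "#" comments (including a signature block).
        if not row or row[0].lstrip().startswith("#"):
            continue
        try:
            # Strict parsing: a prefix with host bits set is rejected.
            net = ipaddress.ip_network(row[0].strip())
        except ValueError:
            continue  # malformed prefix: ignore the row
        if net.version != lo.version:
            continue  # IPv4 rows cannot fall in an IPv6 range, and vice versa
        if lo <= net.network_address and net.broadcast_address <= hi:
            kept.append(row)
    return kept
```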
Given an address range of interest, the most specific inetnum:
object with a geofeed reference MUST be used to fetch the
geofeed file. For example, if the fetching party finds
the following inetnum: objects:
and the file geofeed_1 contains geolocation data about
192.0.2.0/29, this MUST be discarded because 192.0.2.0/29 is
within the more specific inetnum: covering the address range and
that inetnum: has a geofeed reference.
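The selection rule can be sketched as follows. The inetnum: objects used with it are hypothetical and are expressed as (first address, last address, URL) tuples. With a /24 object pointing at geofeed_1 and a narrower /26 object pointing at geofeed_2, a lookup for 192.0.2.0/29 selects the narrower object, so data for that prefix in geofeed_1 would be discarded. The name most_specific_url is hypothetical.

```python
import ipaddress
from typing import List, Optional, Tuple

# (first address, last address, geofeed URL) for one inetnum: object
# that carries a geofeed reference.
Inetnum = Tuple[str, str, str]


def most_specific_url(inetnums: List[Inetnum], prefix: str) -> Optional[str]:
    """Return the geofeed URL of the smallest inetnum: range covering prefix."""
    net = ipaddress.ip_network(prefix)
    best, best_size = None, None
    for first, last, url in inetnums:
        lo, hi = ipaddress.ip_address(first), ipaddress.ip_address(last)
        if lo.version != net.version:
            continue
        if lo <= net.network_address and net.broadcast_address <= hi:
            size = int(hi) - int(lo)
            if best_size is None or size < best_size:
                best, best_size = url, size
    return best
```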
If an inetnum: object has both remarks: with geofeed data and
also has a geofeed: attribute, the geofeed: attribute SHOULD be
used and the remarks: ignored.
Hints in inetnum:s such as country:, geoloc:, etc. tend to be
administrative, and not deployment specific. Consider large,
possibly global, providers with headquarters very far from most
of their deployments. Therefore, if geofeed data are specified,
either as a geofeed: attribute or in a geofeed remarks:
attribute, other geographic hints such as country:, geoloc:, DNS
geoloc RRsets, etc., for that address range MUST be ignored.
There is open-source code to traverse the RPSL data across all
of the RIRs, collect all geofeed references, and process them
. It implements the steps above
and all of the Operational Considerations described in , including caching. It produces a single geofeed
file, merging all the geofeed files found. This open-source
code can be run daily by a cronjob, and the output file can be
directly used.
Authenticating Geofeed Data (Optional)
The question arises whether a particular geofeed data set is valid, i.e., is authorized by the
"owner" of the IP address space and is authoritative in some
sense. The inetnum: that points to the geofeed file provides some assurance. Unfortunately,
the RPSL in some repositories is weakly authenticated at best.
An approach where RPSL was signed per
would be good, except it would have to be deployed by all RPSL
registries, and there is a fair number of them.
The remainder of this section specifies an optional
authenticator for the geofeed data set that follows the Signed
Object Template for the Resource Public Key Infrastructure
(RPKI) .
A single optional authenticator MAY be appended to a geofeed
file. It is a digest of the main body
of the file signed by the private key of the relevant RPKI
certificate for a covering address range. The following format
bundles the relevant RPKI certificate with a signature over the
geofeed text.
The canonicalization procedure converts the data from their
internal character representation to the UTF-8 character encoding, and the <CRLF>
sequence MUST be used to denote the end of each line of text. A
blank line is represented solely by the <CRLF> sequence.
For robustness, any non-printable characters MUST NOT be changed
by canonicalization. Trailing blank lines MUST NOT appear at
the end of the file. That is, the file must not end with
multiple consecutive <CRLF> sequences. Any end-of-file
marker used by an operating system is not considered to be part
of the file content. When present, such end-of-file markers
MUST NOT be covered by the digital signature.
If the authenticator is not in the canonical form described above,
then the authenticator is invalid.
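A minimal Python sketch of the canonicalization follows. It assumes the file has already been decoded into a Python string. Each LF, CR, or CRLF is treated as a line ending and rewritten as CRLF, trailing blank lines are removed, and the result is encoded as UTF-8. Splitting on an explicit pattern, rather than with str.splitlines(), leaves characters such as form feed unchanged, as the text requires. Operating system end-of-file markers are not handled here. The name canonicalize is hypothetical.

```python
import re

# Only LF, CR, and CRLF are treated as line endings.
_LINE_END = re.compile(r"\r\n|\r|\n")


def canonicalize(text: str) -> bytes:
    """Return geofeed text in canonical form for signing (sketch).

    Every line, including the last, ends in CRLF. Trailing blank lines
    are dropped so the file never ends with consecutive CRLFs. All other
    characters, including non-printable ones, are left untouched.
    """
    lines = _LINE_END.split(text)
    while lines and lines[-1] == "":
        lines.pop()
    return "".join(line + "\r\n" for line in lines).encode("utf-8")
```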
Borrowing detached signatures from ,
after file canonicalization, the Cryptographic Message Syntax
(CMS) is used to create a detached
DER-encoded signature that is then Base64 encoded with padding
(as defined in Section 4 of ) and line
wrapped to 72 or fewer characters. The same digest algorithm
MUST be used for calculating the message digest of the content
being signed, which is the geofeed file, and for calculating the
message digest on the SignerInfo SignedAttributes . The message digest algorithm identifier
MUST appear in both the CMS SignedData
DigestAlgorithmIdentifiers and the SignerInfo
DigestAlgorithmIdentifier . The RPKI
certificate covering the geofeed inetnum: object's address range
is included in the CMS SignedData certificates field .
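Producing the DER-encoded CMS signature itself requires a CMS library and is not shown. The following sketch covers only the final encoding step and assumes the DER bytes are already available. It applies standard Base64 with padding and splits the result into lines of at most 72 characters. The name wrap_signature is hypothetical.

```python
import base64
from typing import List


def wrap_signature(der_signature: bytes, width: int = 72) -> List[str]:
    """Base64-encode a DER-encoded CMS signature, with padding, and split
    it into lines of at most `width` characters."""
    encoded = base64.b64encode(der_signature).decode("ascii")  # "=" padded
    return [encoded[i:i + width] for i in range(0, len(encoded), width)]
```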
The address range of the signing certificate MUST cover all
prefixes in the signed geofeed file. The signing certificate
MUST NOT include the Autonomous System Identifier Delegation
certificate extension .
As with many other RPKI signed objects, the IP Address
Delegation certificate extension MUST NOT use the "inherit"
capability defined in Section 2.2.3.5 of . An IP Address Delegation extension using
"inherit" would complicate processing. The implementation would
have to build the certification path from the end-entity to the
trust anchor, then validate the path from the trust anchor to
the end-entity, and then the parameter would have to be
remembered when the validated public key was used to validate a
signature on a CMS object. Having to remember things from
certification path validation for use with CMS object processing
is too hard. Also, the certificates do not get that much bigger
by repeating the information.
An address range A "covers" address range B if the range of B is
identical to or a subset of A. "Address range" is used here
because inetnum: objects and RPKI certificates need not align on
Classless Inter-Domain Routing (CIDR)
prefix boundaries, while those of the lines in a geofeed file do
align.
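The "covers" relation can be sketched with Python's ipaddress module by treating every range as an inclusive pair of first and last addresses. This accommodates inetnum: ranges that are not CIDR-aligned, such as 192.0.2.0 to 192.0.2.95. The helper names covers and prefix_range are hypothetical.

```python
import ipaddress
from typing import Tuple

Range = Tuple[str, str]  # (first address, last address), inclusive


def covers(a: Range, b: Range) -> bool:
    """True if range a covers range b, i.e. b is identical to or a subset of a."""
    a_lo, a_hi = (ipaddress.ip_address(x) for x in a)
    b_lo, b_hi = (ipaddress.ip_address(x) for x in b)
    if a_lo.version != b_lo.version:
        return False
    return a_lo <= b_lo and b_hi <= a_hi


def prefix_range(prefix: str) -> Range:
    """Express a CIDR prefix from a geofeed line as an address range."""
    net = ipaddress.ip_network(prefix)
    return str(net.network_address), str(net.broadcast_address)
```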
The CA MUST sign only one geofeed with a particular generated
private key and MUST generate a new key pair for each new
version of the geofeed. An associated EE certificate used in
this fashion is termed a "one-time-use" EE certificate (see
Section 3 of ).
Identifying the private key associated with the certificate and
getting the department that controls the private key (which
might be stored in a Hardware Security Module (HSM)) to generate
the CMS signature is left as an exercise for the implementor.
On the other hand, verifying the signature has no similar
complexity; the certificate, which is validated in the public
RPKI, contains the needed public key. The RPKI trust anchors
for the RIRs are expected to already be available to the party
performing signature validation. Validation of the CMS
signature over the geofeed file involves:
Obtain the signer's certificate from the CMS SignedData
CertificateSet . The certificate
SubjectKeyIdentifier extension MUST
match the SubjectKeyIdentifier in the CMS SignerInfo
SignerIdentifier . If the key
identifiers do not match, then validation MUST fail.
Validation of the signer's certificate MUST ensure that it is
part of the current manifest and that
all resources are covered by the RPKI certificate.
Construct the certification path for the signer's certificate.
All of the needed certificates are expected to be readily
available in the RPKI repository. The certification path MUST
be valid according to the validation algorithm in and the additional checks specified in
associated with the IP Address
Delegation certificate extension and the Autonomous System
Identifier Delegation certificate extension. If certification
path validation is unsuccessful, then validation MUST fail.
Validate the CMS SignedData as specified in using the public key from the validated
signer's certificate. If the signature validation is
unsuccessful, then validation MUST fail.
Confirm that the eContentType object identifier (OID) is
id-ct-geofeedCSVwithCRLF (1.2.840.113549.1.9.16.1.47). This
OID MUST appear within both the eContentType in the
encapContentInfo object and the ContentType signed attribute
in the signerInfo object (see ).
Verify that the IP Address Delegation certificate extension
covers all of the address ranges of
the geofeed file. If all of the address ranges are not
covered, then validation MUST fail.
All of the above steps MUST be successful to consider the
geofeed file signature as valid.
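The sketch below illustrates only the final validation step. It checks that every prefix in a geofeed file is covered by the resources of the signing certificate. It assumes those resources have already been extracted from the IP Address Delegation extension as a list of prefixes (parsing the certificate is not shown). Adjacent or overlapping entries are merged, so that, for example, two /25s together cover a /24. The names resources_cover and _merge are hypothetical.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple


def _merge(ranges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Coalesce overlapping or adjacent integer ranges."""
    merged: List[Tuple[int, int]] = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def resources_cover(cert_prefixes: Iterable[str],
                    geofeed_prefixes: Iterable[str]) -> bool:
    """True if the certificate resources cover every geofeed prefix."""
    by_version: Dict[int, List[Tuple[int, int]]] = {4: [], 6: []}
    for p in cert_prefixes:
        net = ipaddress.ip_network(p)
        by_version[net.version].append(
            (int(net.network_address), int(net.broadcast_address)))
    merged = {v: _merge(r) for v, r in by_version.items()}
    for p in geofeed_prefixes:
        net = ipaddress.ip_network(p)
        lo, hi = int(net.network_address), int(net.broadcast_address)
        if not any(m_lo <= lo and hi <= m_hi for m_lo, m_hi in merged[net.version]):
            return False  # validation fails on the first uncovered prefix
    return True
```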
The authenticator MUST be hidden as a series of "#" comments at the
end of the geofeed file. The following simple example is
cryptographically incorrect:
A correct and full example is in Appendix A.
The CMS signature does not cover the signature lines.
The bracketing "# RPKI Signature:" and "# End Signature:" MUST
be present as shown in the example. The RPKI Signature's IP
address range MUST match that of the geofeed URL in the inetnum:
that points to the geofeed file.
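The following sketch, with hypothetical names, separates a signed geofeed file into three parts: the signed body, the address range given on the "# RPKI Signature:" line, and the DER-encoded signature. It does this by locating the bracketing lines and Base64-decoding the comment lines between them. Verifying the CMS signature over the returned body is left to a CMS/RPKI implementation.

```python
import base64
from typing import Tuple

BEGIN = "# RPKI Signature:"
END = "# End Signature:"


def split_signed_geofeed(text: str) -> Tuple[str, str, bytes]:
    """Return (signed body, signature address range, DER CMS signature)."""
    lines = text.splitlines(keepends=True)
    start = next((i for i, line in enumerate(lines) if line.startswith(BEGIN)), None)
    if start is None:
        raise ValueError("no RPKI signature block")
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith(END)), None)
    if end is None:
        raise ValueError("unterminated RPKI signature block")
    sig_range = lines[start][len(BEGIN):].strip()
    # Each signature line is "# <base64>"; drop the "#" and whitespace.
    b64 = "".join(line.strip()[1:].strip() for line in lines[start + 1:end])
    der = base64.b64decode(b64, validate=True)
    # The CMS signature covers only the lines before the signature block.
    body = "".join(lines[:start])
    return body, sig_range, der
```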
Operational Considerations
To create the needed inetnum: objects, an operator wishing to register
the location of their geofeed file needs to coordinate with their
Regional Internet Registry (RIR) or National Internet Registry (NIR)
and/or any provider Local Internet Registry (LIR) that has assigned
address ranges to them. RIRs/NIRs provide means for assignees to
create and maintain inetnum: objects. They also provide means of
assigning or sub-assigning IP address resources and allowing the
assignee to create WHOIS data, including inetnum: objects, thereby
referring to geofeed files.
The geofeed files MUST be published via and fetched using
HTTPS .
When using data from a geofeed file, one MUST ignore data
outside the referring inetnum: object's inetnum: attribute
address range.
If and only if the geofeed file is not signed per , then multiple inetnum: objects MAY
refer to the same geofeed file, and the consumer MUST
use only lines in the geofeed file where the prefix is covered by the
address range of the inetnum: object whose URL it has followed.
If the geofeed file is signed, and the signer's certificate
changes, the signature in the geofeed file MUST
be updated.
It is good key hygiene to use a given key for only one purpose.
To dedicate a signing private key for signing a geofeed file, an
RPKI Certification Authority (CA) may issue a subordinate
certificate exclusively for the purpose shown in .
Harvesting and publishing aggregated geofeed data outside of the
RPSL model should be avoided as it can have the effect that more
specifics from one aggregatee could undesirably affect the less
specifics of a different aggregatee. Moreover, publishing
aggregated geofeed data prevents the reader of the data from
performing the checks described in and .
Currently, geolocation providers have bulk WHOIS data access at
all the RIRs. An anonymized version of such data is openly
available for all RIRs except ARIN, which requires an
authorization. However, for users without such authorization,
the same result can be achieved with extra RDAP effort. There is
open-source code to pass over such data across all RIRs, collect
all geofeed references, and process them .
To prevent undue load on RPSL and geofeed servers,
entities fetching geofeed data using these mechanisms MUST
NOT do frequent real-time lookups. suggests use of the HTTP Expires header to signal when geofeed data
should be refetched. As the data change very infrequently, in
the absence of such an HTTP header signal, collectors
SHOULD NOT fetch more frequently than weekly. It
would be polite not to fetch at magic times such as midnight
UTC, the first of the month, etc., because too many others are
likely to do the same.
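The following is a minimal scheduling sketch. It assumes the collector has a timezone-aware current time and the raw Expires header value. It honors a parseable Expires date and otherwise waits one week. It then adds a random offset so that fetches do not cluster at times such as midnight UTC. The six-hour jitter window and the function names are illustrative assumptions, not values from this document.

```python
import random
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

DEFAULT_INTERVAL = timedelta(weeks=1)  # no Expires header: at most weekly
MAX_JITTER = timedelta(hours=6)        # hypothetical spread, not from the text


def _expires(header: Optional[str]) -> Optional[datetime]:
    if not header:
        return None
    try:
        when = parsedate_to_datetime(header)
    except (TypeError, ValueError):  # malformed header: treat as absent
        return None
    if when is None:
        return None
    # A date without a zone is interpreted as UTC in this sketch.
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def next_fetch(now: datetime, expires_header: Optional[str] = None) -> datetime:
    """Choose when to refetch a geofeed file; `now` must be timezone-aware."""
    when = _expires(expires_header) or now + DEFAULT_INTERVAL
    when = max(when, now)  # never schedule in the past
    # Random offset so collectors do not all fetch at, e.g., 00:00 UTC.
    jitter = timedelta(seconds=random.uniform(0, MAX_JITTER.total_seconds()))
    return when + jitter
```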
Privacy Considerations
Geofeed data may reveal the
approximate location of an IP address, which might in turn reveal the
approximate location of an individual user. Unfortunately, provides no privacy guidance on
avoiding or ameliorating possible damage due to this exposure of the
user. In publishing pointers to geofeed files as described in this
document, the operator should be aware of this exposure in geofeed
data and be cautious. All the privacy considerations of
apply to this document.
Where provided the ability
to publish location data, this document makes bulk access to those data
readily available. This is a goal, not an accident.
Implementation Status
Currently, the geofeed: attribute in inetnum: objects has
been implemented in the RIPE and APNIC databases.
Registrants in databases that do not yet support the geofeed:
attribute are using the remarks:, or equivalent, attribute.
Currently, the registry data published by ARIN are not the same
RPSL as that of the other registries (see for a survey of the WHOIS Tower of Babel);
therefore, when fetching from ARIN via FTP , WHOIS , the Registration Data
Access Protocol (RDAP) , etc., the "NetRange" attribute/key must be
treated as "inetnum", and the "Comment" attribute must be
treated as "remarks".
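Consumers reusing RPSL-oriented code can apply the key mapping described above as a simple renaming step. This sketch assumes ARIN data arrive as "Key: value" lines, skips "#" comment lines, and passes all other keys through unchanged. The name normalise_arin and the sample record are hypothetical.

```python
from typing import List, Tuple

# Key renaming taken from the text above; other ARIN keys pass through.
ARIN_TO_RPSL = {"NetRange": "inetnum", "Comment": "remarks"}


def normalise_arin(record: str) -> List[Tuple[str, str]]:
    """Turn ARIN "Key: value" lines into RPSL-style (attribute, value) pairs."""
    attrs = []
    for line in record.splitlines():
        if line.startswith("#"):
            continue
        # Split on the first colon only, so URLs and IPv6 values stay intact.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        attrs.append((ARIN_TO_RPSL.get(key, key), value.strip()))
    return attrs
```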
can be used to authenticate a
signed geofeed file.
Security Considerations
It is generally prudent for a consumer of geofeed data to also
use other sources to cross-validate the data. All the security
considerations of
apply here as well.
The consumer of geofeed data SHOULD fetch and process the data
themselves. Importing datasets produced and/or processed by a
third party places ill-advised trust in the third party.
As mentioned in , some
RPSL repositories have weak, if any, authentication. This
allows spoofing of inetnum: objects pointing to malicious
geofeed files. suggests
an unfortunately complex method for stronger authentication
based on the RPKI.
For example, if an inetnum: for a wide address range (e.g., a
/16) points to an RPKI-signed geofeed file, a customer or
attacker could publish an unsigned equal or narrower (e.g., a
/24) inetnum: in a WHOIS registry that has weak authorization,
abusing the rule that the most-specific inetnum: object with a
geofeed reference MUST be used.
If signatures were mandatory, the above attack would be stymied, but
of course that is not happening anytime soon.
The RPSL providers have had to throttle fetching from their
servers due to too-frequent queries. Usually, they throttle by
the querying IP address or block. Similar defenses will likely
need to be deployed by geofeed file servers.
IANA Considerations
There are no new actions needed by the IANA.
Thanks to Rob Austein for CMS and detached signature clue,
George Michaelson for the first and substantial external review,
and Erik Kline who was too shy to agree to coauthorship.
Additionally, we express our gratitude to early implementors,
including Menno Schepers; Flavio Luciani; Eric Dugas; and Kevin
Pack. Also, thanks to the following geolocation providers who
are consuming geofeeds with this described solution: Jonathan
Kosgei (ipdata.co), Ben Dowling (ipinfo.io), and Pol Nisenblat
(bigdatacloud.com). For an amazing number of helpful reviews,
we thank Job Snijders, who also found an ASN.1 'inherit' issue;
Adrian Farrel; Antonio Prado; Francesca Palombini; Jean-Michel
Combes (INTDIR); John Scudder; Kyle Rose (SECDIR); Martin Duke;
Murray Kucherawy; Paul Kyzivat (GENART); Rob Wilton; Roman
Danyliw; and Ties de Kock.

Representation Of IP Routing Policies In The RIPE Database, RIPE NCC
Representation Of IP Routing Policies In A Routing Registry, RIPE NCC
RIPE Database Documentation, RIPE NCC
Description of the INETNUM Object, RIPE NCC
Description of the INET6NUM Object, RIPE NCC
geofeed-finder, commit 5f557a4
Example on how to use rpki-client to authenticate a signed Geofeed
This appendix provides an example, including a trust anchor,
a CRL signed by the trust anchor, a CA certificate subordinate to
the trust anchor, a CRL signed by the CA, an end-entity certificate
subordinate to the CA for signing the geofeed, and a detached signature.
The trust anchor is represented by a self-signed certificate. As usual in
the RPKI, the trust anchor has authority over all IPv4 address blocks,
all IPv6 address blocks, and all AS numbers.
The CRL issued by the trust anchor.
The CA certificate is issued by the trust anchor. This
certificate grants authority over one IPv4 address block
(192.0.2.0/24) and two AS numbers (64496 and 64497).
The CRL issued by the CA.
The end-entity certificate is issued by the CA. This
certificate grants signature authority for one IPv4 address block
(192.0.2.0/24). Signature authority for AS numbers is not needed
for geofeed data signatures, so no AS numbers are included in the
end-entity certificate.
The end-entity certificate is displayed below in detail. For
brevity, the other two certificates are not.
To allow reproduction of the signature results, the end-entity
private key is provided. For brevity, the other two private
keys are not.
Signing of "192.0.2.0/24,US,WA,Seattle," (terminated by CR and LF)
yields the following detached CMS signature.