<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE rfc PUBLIC "-//IETF//DTD RFC 2629//EN" "http://xml.resource.org/authoring/rfc2629.dtd" [
  <!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
  <!ENTITY rfc7049 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7049.xml">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xsl" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc toc="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc symrefs="yes" ?>
<rfc category="std" ipr="trust200902" submissionType="independent"
  docName="draft-sporny-hashlink-07">
<front>

  <title>Cryptographic Hyperlinks</title>

  <author initials="M.S." surname="Sporny" fullname="Manu Sporny">
    <organization>Digital Bazaar</organization>
    <address>
      <postal>
        <street>203 Roanoke Street W.</street>
        <city>Blacksburg</city>
        <region>VA</region>
        <code>24060</code>
        <country>US</country>
      </postal>
      <phone>+1 540 961 4469</phone>
      <email>msporny@digitalbazaar.com</email>
      <uri>http://manu.sporny.org/</uri>
    </address>
  </author>

  <author initials="L.R." surname="Rosenthol" fullname="Leonard Rosenthol">
    <organization>Adobe Systems</organization>
    <address>
      <postal>
        <street>345 Park Ave.</street>
        <city>San Jose</city>
        <region>CA</region>
        <code>95110-2704</code>
        <country>US</country>
      </postal>
      <phone>+1 800 833 6687</phone>
      <email>lrosenth@adobe.com</email>
      <uri>https://www.linkedin.com/in/lrosenthol/</uri>
    </address>
  </author>

  <date month="May" year="2021" />
  <area>Security</area>
  <workgroup />
  <keyword>hyperlink</keyword>
  <keyword>cryptography</keyword>
  <keyword>security</keyword>

  <abstract>
    <t>
When using a hyperlink to fetch a resource from the Internet, it is often useful
to know if the resource has changed since the data was published. Cryptographic
hashes, such as SHA-256, are often used to determine if published data has
changed in unexpected ways. Due to the nature of most hyperlinks, the
cryptographic hash is often published separately from the link itself.
This specification describes a data model and serialization formats for
expressing cryptographically protected hyperlinks. The mechanisms described
in the document enables a system to publish a hyperlink in a way that
empowers a consuming application to determine if the resource associated with
the hyperlink has changed in unexpected ways.
    </t>
  </abstract>

  <note title="Feedback">
    <t>
This specification is a work product of the
<eref target="https://w3c-dvcg.github.io/">W3C Digital Verification Community Group</eref>
and the
<eref target="https://w3c-ccg.github.io/">W3C Credentials Community Group</eref>.
Feedback related to this specification should be logged in the
<eref target="https://github.com/w3c-dvcg/hashlink/issues">issue tracker</eref>
or be sent to
<eref target="mailto:public-credentials@w3.org">public-credentials@w3.org</eref>.
</t>
  </note>
</front>
<middle>
  <section anchor="intro" title="Introduction">
    <t>
Uniform Resource Locators (URLs) enable software developers to build
distributed systems that are able to publish information using hyperlinks.
When a client fetches a resource at the given hyperlink, the result is
typically a stream of data that the client may further process. Due to the
design of most hyperlinks, the data associated with a hyperlink may change
over time. This design feature is often not an issue for systems that
do not depend on static data.
    </t>
    <t>
Some software systems expect data published at a specific URL to not change.
For example, firmware files, operating system releases, security upgrades,
and other high-risk files are often distributed with associated manifest files.
These manifest files typically utilize a cryptographic hash per URL to
ensure that an attack to modify the files themselves will be detected:
      <figure>
        <artwork>
b1a653e5...de5d3e8f3  https://example.com/operating-system.iso
7b23bf52...557a0902c  https://example.com/firmware-v4.35.bin
        </artwork>
      </figure>
    </t>
    <t>
An unfortunate downside of the manifest file approach is that a separate system
from the URL itself must be utilized to add this level of content integrity
protection. In addition, the cryptographic hash format for the files are often
application specific and are not easily upgradeable once newer and more
advanced cryptographic hash formats are standardized.
    </t>
    <t>
New types of distributed file storage networks have been deployed over the
past several decades. Examples include HTTP file mirrors for the Debian
Operating System, peer-to-peer file networks such as BitTorrent, and
content-addressed networks, such as the Inter Planetary File System (IPFS).
While each one of these systems have their own URL format, it is currently not
possible to express a content-addressed URL that associates the content
address to a file published on each one of these networks.
    </t>
    <t>
This specification provides a simple data model and serialization formats
for cryptographic hyperlinks that:
      <list style="symbols">
         <t>Enable existing URLs to add content integrity protection.</t>
         <t>Provide a URL format for multi-sourced content integrity protected data.</t>
         <t>Enable URL metadata to be discarded without having to re-encode the URL.</t>
         <t>Enable algorithm agility for all data model components</t>
      </list>
    </t>

    <section anchor="encodings" title="Multiple Encodings">
      <t>
A hashlink can be encoded in two different ways, the RECOMMENDED way to
express a hashlink is:
        <figure>
          <artwork>hl:&lt;resource-hash&gt;:&lt;optional-metadata&gt;</artwork>
        </figure>
      </t>
      <t>
To enable existing applications utilizing historical URL schemes to provide
content integrity protection, hashlinks may also be encoded using URL
parameters:
        <figure>
          <artwork>&lt;url&gt;?hl=&lt;resource-hash&gt;</artwork>
        </figure>
      </t>

      <t>
Implementers should take note that the URL parameter-based encoding mechanism
is application specific and SHOULD NOT be used unless the URL resolver for the
application cannot be upgraded to support the RECOMMENDED encoding.
      </t>
    </section>
  </section>

  <section anchor="data-model" title="Hashlink Data Model">
    <t>
The hashlink data model is a simple expression of a cryptographic hash of
the resource, one or more URLs, and a content type.
    </t>
    <section anchor="resource-hash" title="The Resource Hash">
      <t>
The resource hash is the the mechanism that enables content integrity
protection for the associated data stream. The resource hash value MUST be
provided in a hashlink.
      </t>
    </section>
    <section anchor="metadata" title="The Optional Metadata">
      <t>
All metadata associated with the hashlink is optional and is provided to
enable a client to more easily discover data that matches the provided
resource hash.
      </t>
      <section anchor="md-url" title="URLs">
        <t>
A hashlink may be associated with a set of one or more URLs that, when
dereferenced, result in data that matches the resource hash.
        </t>
      </section>
      <section anchor="md-content-type" title="Content Type">
        <t>
A hashlink may be associated with exactly one Content Type that may be used
in protocols that support content types, such as HTTP's Accept header.
        </t>
      </section>
      <section anchor="md-x" title="Experimental Metadata">
        <t>
Application developers often need to express other important metadata related
to their specific application. These developers MUST use this field to do so.
Data expressed in this field MAY conflict with keys chosen by other developers
in other applications. Experimental fields that become widely used are expected
to be standardized and become core metadata fields.
        </t>
      </section>
    </section>
  </section>

  <section anchor="serialization" title="Hashlink Serialization">
    <t>
A hashlink may be serialized in one or two ways. The first is the
RECOMMENDED method, called a "Hashlink URL", which is a compact URL
representation of the Hashlink data model. The second is called a
"Hashlink as a Parameterized URL", which MUST NOT be used unless there is no
mechanism available to upgrade the application's URL resolver.
    </t>

    <section anchor="hashlink-url" title="Hashlink URL">
      <t>
The beginning of a Hashlink URL always starts with the following three
characters:
      </t>
      <figure>
        <artwork>hl:</artwork>
      </figure>
      <t>
The remainder of the URL is a concatenation of the resource hash and,
optionally, the Hashlink URL metadata.
      </t>
      <section anchor="serializing-rh" title="Serializing the Resource Hash">
        <t>
The value of the resource hash can be generated by utilizing the following
algorithm:
        <list style="numbers">
          <t>
Generate the raw hash value by processing the resource data using the
cryptographic hashing algorithm.
          </t>
          <t>
Generate the multihash value by encoding the raw hash using the
<xref target="multihash">Multihash Data Format</xref>.
          </t>
          <t>
Generate the multibase hash by encoding the multihash value using the
<xref target="multibase">Multibase Data Format</xref>.
          </t>
          <t>
Output the multibase hash as the resource hash.
          </t>
        </list>
The example below demonstrates the output of the algorithm above for a
hashlink that expresses the data "Hello World!" processed using the
SHA-2, 256 bit, 32 byte cryptographic algorithm which is then expressed using
the base-58 Bitcoin base-encoding format:
          <figure>
            <artwork>zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e</artwork>
          </figure>
        </t>
      </section>
      <section anchor="serializing-md" title="Serializing the Metadata">
        <t>
To generate the value for the metadata, the metadata values are encoded in
the <xref target="RFC7049">CBOR Data Format</xref> using the following
algorithm:
        <list style="numbers">
          <t>
Create the raw output map (CBOR major type 5).
          </t>
          <t>
If at least one URL exists, add a CBOR key of 15 (0x0f) to the raw output map
with a value that is an array (CBOR major type 4).
            <list style="numbers">
              <t>
Encode each URL as a CBOR URI (CBOR type 32) and place it into the array.
              </t>
            </list>
          </t>
          <t>
If the content type exists, add a CBOR key of 14 (0x0e) to the raw output
map with a value that is a UTF-8 byte string (0x6) and the
value of the content type.
          </t>
          <t>
If experimental metadata exists, add a CBOR key of 13 (0x0d) and encode it
as a map by creating a raw output map (CBOR major type 5). For each item
in the map, serialize to CBOR where the CBOR major types, the key name,
and the value is derived from the input data. For example a key of "foo" and a
value of 200 would be encoded as a CBOR major type of 2 for the key and a
CBOR major type of 0 for the value.
          </t>
          <t>
Generate the multibase value by encoding the raw output map using the
Multibase Data Format.
          </t>
        </list>
The example below demonstrates the output of the algorithm above for metadata
containing a single URL ("http://example.org/hw.txt") with a content type
of "text/plain" expressed using the base-58 Bitcoin base-encoding format:
          <figure>
            <artwork>zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF</artwork>
          </figure>
        </t>
      </section>

      <section anchor="deserializing-md" title="Deserializing the Metadata">
        <t>
To deserialize the metadata, the "Serializing the Metadata" algorithm is
reversed. Implementers MUST use the following table to deserialize keys to
JSON:
        </t>

        <texttable anchor="mh-registry-table" title="Multihash Algorithms Registry">
          <ttcol align="center">Key (hex)</ttcol>
          <ttcol align="center">JSON key</ttcol>
          <ttcol align="center">JSON value</ttcol>
          <c>0x0f</c><c>"url"</c><c>Array of strings</c>
          <c>0x0e</c><c>"content-type"</c><c>string</c>
          <c>0x0d</c><c>"experimental"</c><c>JSON Object</c>
        </texttable>

        <t>
The example below demonstrates the output of the algorithm above for metadata
containing a single URL ("http://example.org/hw.txt") with a content type
of "text/plain", and an experimental metadata key of "foo" and value of 123:
          <figure>
            <artwork>{
  "url": ["http://example.org/hw.txt"],
  "content-type": "text/plain",
  "experimental": {
    "foo": 123
  }
}</artwork>
          </figure>
        </t>
      </section>

      <section anchor="ex-simple" title="A Simple Hashlink Example">
        <t>
The example below demonstrates a simple hashlink that provides content
integrity protection for the "http://example.org/hw.txt" file, which has a
content type of "text/plain" (line breaks added for readability purposes):
          <figure>
            <artwork>hl:
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF</artwork>
          </figure>
        </t>
      </section>
    </section>

    <section anchor="hl-url-params" title="Hashlink as a Parameterized URL">
      <t>
An algorithm resulting in the same output as the one below MUST be used when
encoding the hashlink data model as a set of parameters in a URL:
        <list style="numbers">
          <t>
Create an empty string and assign it to the output value.
          </t>
          <t>
Append the first URL in the URL metadata array to the output URL.
          </t>
          <t>
Append a URL parameter with a key of "hl" and the value of the resource hash
as generated in <xref target="serializing-rh"></xref>.
          </t>
        </list>
      </t>
      <section anchor="ex-params" title="Hashlink as a Parameterized URL Example">
        <t>
The example below demonstrates a simple hashlink that provides content
integrity protection for the "http://example.org/hw.txt" file, which has a
content type of "text/plain":
          <figure>
            <artwork>http://example.org/hw.txt?hl=
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e
            </artwork>
          </figure>
        </t>
      </section>
    </section>
  </section>

  <section anchor="processors" title="Hashlink Encoders and Decoders">
    <t>
Hashlink encoders and decoders MUST support the following core algorithms:
    <list style="numbers">
      <t>
The SHA-2, 256 bit, 32 byte output cryptographic hashing algorithm and the
associated Multihash Data Format.
      </t>
      <t>
The Bitcoin base58-encoding and decoding algorithm and the associated Multibase
Data Format.
      </t>
    </list>
Implementations MAY support algorithms and data formats in addition to the ones
listed above.
    </t>
  </section>

  <section anchor="security" title="Security Considerations">
    <t>
This section documents the security attacks that are out of scope for this
specification as well as known attacks and mitigations against those attacks.
    </t>

    <section anchor="sec-badhash" title="Insecure Hashing Functions">
      <t>
There are a number of insecure cryptographic hashing functions in deployment
today. Among these are MD5 and SHA-1. Implementers MUST throw an error by
default when encoding or decoding these values. Implementers MAY provide a
non-default library option to override the error.
      </t>
    </section>
  </section>
</middle>

<back>
  <references title="Normative References">
&rfc2119;
&rfc7049;
    <reference anchor="multibase" target="https://tools.ietf.org/id/draft-multiformats-multibase-00.html">
      <front>
        <title>The Multihash Data Format</title>
        <author initials="J.B." surname="Benet" fullname="Juan Benet">
          <organization>Protocol Labs</organization>
        </author>

        <author initials="M.S." surname="Sporny" fullname="Manu Sporny">
          <organization>Digital Bazaar</organization>
        </author>
        <date month="December" year="2018" />
      </front>
    </reference>

    <reference anchor="multihash" target="https://tools.ietf.org/id/draft-multiformats-multihash-00.html">
      <front>
        <title>The Multihash Data Format</title>
        <author initials="J.B." surname="Benet" fullname="Juan Benet">
          <organization>Protocol Labs</organization>
        </author>

        <author initials="M.S." surname="Sporny" fullname="Manu Sporny">
          <organization>Digital Bazaar</organization>
        </author>
        <date month="August" year="2018" />
      </front>
    </reference>
  </references>

  <section anchor="appendix-a" title="Security Considerations">
    <t>
There are a number of security considerations to take into account when
implementing or utilizing this specification: TBD
    </t>
  </section>
  <section anchor="appendix-c" title="Test Values">

    <t>
The following test values may be used to verify the conformance of Hashlink
encoders and decoders.
    </t>

    <section anchor="tv-simple" title="Simple Hashlink URL">
      <t>
The following Hashlink URL encodes the data "Hello World!" served from the
"http://example.org/hw.txt" URL with a content type of "text/plain". The
resource hash is generated using the SHA-2, 256 bit, 32 byte cryptographic
algorithm which is then encoded using the base-58 Bitcoin base-encoding format.
The metadata options are encoded using the base-58 Bitcoin base-encoding format.
The final Hashlink URL is (new lines added for readability purposes):
        <figure>
          <artwork>
hl:
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
zuh8iaLobXC8g9tfma1CSTtYBakXeSTkHrYA5hmD4F7dCLw8XYwZ1GWyJ3zwF
          </artwork>
        </figure>
      </t>
    </section>

    <section anchor="tv-multisource" title="Multi-sourced Hashlink URL">
      <t>
The following Hashlink URL encodes the data "Hello World!" served from three
different networks. The first is a standard Web-based URL
("http://example.org/hw.txt"), the second is an IPFS-based URL
("ipfs:/ipfs/QmXfrS3pHerg44zzK6QKQj6JDk8H6cMtQS7pdXbohwNQfK/hello"), and
the third is a Tor-based URL ("http://c4m3g2upq6pkufl4.onion/hworld.txt"). The
resource hash is generated using the SHA-2, 256 bit, 32 byte cryptographic
algorithm which is then encoded using the base-58 Bitcoin base-encoding format.
The metadata options are encoded using the base-58 Bitcoin base-encoding format.
The final Hashlink URL is (new lines added for readability purposes):
        <figure>
          <artwork>
hl:
zQmWvQxTqbG2Z9HPJgG57jjwR154cKhbtJenbyYTWkjgF3e:
z333PdTakFeJueF2bim3PaaDqbtqjkpxUc8ETSWXe6dQLWXQWvqiUdw8TJrncx3uKhwfc
88MtM5xZbR27FhVRUKv9ogekamVtdE3UbXnXpMRT1AseCtoBUt1NE8x2SsnJxGfiZN45V
VSCp6jh4dgcufL16tWrHREiSYESEGP1J75yXCvAdvKPr7nb5aYujLeay8Ww
          </artwork>
        </figure>
      </t>
    </section>

  </section>
  <section anchor="acknowledgements" title="Acknowledgements">
    <t>
The editors would like to thank the following individuals for feedback on and
implementations of the specification (in alphabetical order): TBD
    </t>
    <t>
Portions of the work on this specification have been funded by the United
States Department of Homeland Security's Science and Technology Directorate
under contract HSHQDC-17-C-00019. The content of this specification does not
necessarily reflect the position or the policy of the U.S. Government and no
official endorsement should be inferred.
    </t>
  </section>
</back>
</rfc>
