The Multihash Data Format

Protocol Labs
548 Market Street, #51207
San Francisco, CA 94104
US
Phone: +1 619 957 7606
Email: juan@protocol.ai
URI: http://juan.benet.ai/

Digital Bazaar
203 Roanoke Street W.
Blacksburg, VA 24060
US
Phone: +1 540 961 4469
Email: msporny@digitalbazaar.com
URI: http://manu.sporny.org/
Security
Keywords: digest algorithm, digital signature, PKI, SHA, BLAKE, poseidon
Cryptographic hash functions often have multiple output sizes and encodings.
This variability makes it difficult for applications to examine a series of
bytes and determine which hash function produced them. Multihash is a universal
data format for encoding outputs from hash functions. It is useful for writing
applications that must simultaneously support different hash function outputs
and upgrade their use of hashes over time; Multihash is intended to address
these needs.
This specification is a joint work product of
Protocol Labs and the
W3C Credentials Community Group.
Feedback related to this specification should be logged in the
issue tracker
or be sent to
public-credentials@w3.org.
Multihash is particularly important in systems which depend on
cryptographically secure hash functions. Attacks may break the cryptographic
properties of secure hash functions. These cryptographic breaks are
particularly painful in large tool ecosystems, where tools may have made
assumptions about hash values, such as function and digest size. Upgrading
becomes a nightmare, as all tools which make those assumptions would have
to be upgraded to use the new hash function and new hash digest length.
Tools may face serious interoperability problems or error-prone special casing.
How many programs out there assume a git hash is a SHA-1 hash?
How many scripts assume the hash value digest is exactly 160 bits?
How many tools will break when these values change?
How many programs will fail silently when these values change?
This is precisely why Multihash was created. It was designed for
seamlessly upgrading systems that depend on cryptographic hashes.
When using Multihash, a system warns the consumers of its hash values that
these may have to be upgraded in case of a break. Even though the system
may still only use a single hash function at a time, the use of multihash
makes it clear to applications that hash values may use different hash
functions or be longer in the future. Tooling, applications, and scripts
can avoid making assumptions about the length, and read it from the
multihash value instead. This way, the vast majority of tooling, which
may not do any checking of hashes, would not have to be upgraded at all.
This vastly simplifies the upgrade process, avoiding the waste of hundreds
or thousands of software engineering hours, deep frustrations, and high
blood pressure.
A multihash follows the TLV (type-length-value) pattern and consists of
three fields: two unsigned variable length integers (the hash function
identifier and the digest length) followed by the digest bytes.
The following section details the core data types used by the Multihash
data format.
A data type that enables one to express an unsigned integer of variable length.
The format uses the Little Endian Base 128 (LEB128) encoding that is defined in
Appendix C of the
DWARF Debugging Information Format standard,
initially released in 1993.
As suggested by the name, this variable length encoding is only capable of
representing unsigned integers. Further, while there is no theoretical maximum
integer value that can be represented by the format, implementations MUST NOT
encode more than nine (9) bytes, giving a practical range of integers from
0 to 2^63 - 1.
When encoding an unsigned variable integer, the unsigned integer is serialized
seven bits at a time, starting with the least significant bits. The most
significant bit in each output byte indicates if there is a
continuation byte. It is not possible to express a signed integer with this
data type.
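+-------+----------------------------+----------------------+
| Value | Encoding (bits)            | Hexadecimal notation |
+-------+----------------------------+----------------------+
| 1     | 00000001                   | 0x01                 |
| 127   | 01111111                   | 0x7F                 |
| 128   | 10000000 00000001          | 0x8001               |
| 255   | 11111111 00000001          | 0xFF01               |
| 300   | 10101100 00000010          | 0xAC02               |
| 16384 | 10000000 10000000 00000001 | 0x808001             |
+-------+----------------------------+----------------------+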
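The encoding procedure described above can be sketched in Python as follows. This is an illustrative sketch, not a reference implementation; the name `encode_uvarint` is hypothetical. It refuses values above 2^63 - 1 so that it never emits more than nine bytes.

```python
MAX_UVARINT = 2**63 - 1  # largest value that fits in nine 7-bit groups


def encode_uvarint(value: int) -> bytes:
    """Serialize a non-negative integer as an unsigned LEB128 varint."""
    if not 0 <= value <= MAX_UVARINT:
        raise ValueError("value outside the range 0 to 2^63 - 1")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)                      # last byte, continuation bit clear
    return bytes(out)


for n in (1, 127, 128, 255, 300, 16384):
    print(n, encode_uvarint(n).hex())
```

The loop prints the same hexadecimal values as the example table (in lowercase), for instance `300 ac02` and `16384 808001`.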
Implementations MUST restrict the size of the varint to a maximum of nine
bytes (63 bits). In order to avoid memory attacks on the encoding, the
aforementioned practical maximum length of nine bytes is used. There is
no theoretical limit, and future specs can grow this number if it is truly
necessary to have code or length values larger than 2^63 - 1.
A multihash follows the TLV (type-length-value) pattern.
The hash function identifier is an
unsigned variable integer
identifying the hash
function. The possible values for this field are provided in
The Multihash Identifier Registry.
The digest length is an
unsigned variable integer
counting the length of the digest in bytes.
The digest value is the hash function digest; its length is exactly the
number of bytes specified in the digest length field.
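To illustrate the three fields, the following Python sketch builds a multihash by concatenating the varint-encoded identifier, the varint-encoded digest length, and the digest bytes. The helper names `_uvarint` and `encode_multihash` are hypothetical, and the digest is computed with Python's standard `hashlib` module over an arbitrary input.

```python
import hashlib


def _uvarint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, least significant group first."""
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)  # set the continuation bit
        n >>= 7
    out.append(n)                      # final byte, continuation bit clear
    return bytes(out)


def encode_multihash(code: int, digest: bytes) -> bytes:
    """<hash function identifier> <digest length> <digest value>."""
    return _uvarint(code) + _uvarint(len(digest)) + digest


SHA2_256 = 0x12  # identifier from the Multihash Identifier Registry

mh = encode_multihash(SHA2_256, hashlib.sha256(b"hello world").digest())
print(mh[:2].hex(), len(mh))  # prints: 1220 34
```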
For example, the following is an expression of a SHA2-256 hash in hexadecimal
notation (spaces added for readability purposes):
The first byte (0x12) specifies the SHA2-256 hash function. The second byte
(0x20) specifies the length of the hash, which is 32 bytes. The rest of the
data specifies the value of the output of the hash function.
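Parsing runs the same steps in reverse: read the identifier varint, read the length varint, then take that many digest bytes. The sketch below uses hypothetical names and only the Python standard library, and it assumes the input holds exactly one multihash with no trailing data. The usage line borrows the sha2-256 digest from the examples later in this document.

```python
from typing import NamedTuple


class Multihash(NamedTuple):
    code: int      # hash function identifier
    length: int    # digest length in bytes
    digest: bytes  # digest value


def _read_uvarint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode one unsigned varint at data[pos]; return (value, next_pos)."""
    value = 0
    for i in range(9):  # at most nine bytes, per the varint section
        if pos + i >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # continuation bit clear: last byte
            return value, pos + i + 1
    raise ValueError("varint longer than nine bytes")


def decode_multihash(data: bytes) -> Multihash:
    code, pos = _read_uvarint(data, 0)
    length, pos = _read_uvarint(data, pos)
    digest = data[pos:]
    # Assumption: the buffer holds exactly one multihash and nothing else.
    if len(digest) != length:
        raise ValueError(f"expected {length} digest bytes, got {len(digest)}")
    return Multihash(code, length, digest)


example = bytes.fromhex(
    "12" "20"
    "41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8"
)
mh = decode_multihash(example)
print(hex(mh.code), mh.length)  # prints: 0x12 32
```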
[DWARF] DWARF Debugging Information Format, Version 3. This document defines the format for the information generated by compilers, assemblers and linkage editors, that is necessary for symbolic, source-level debugging.

[RFC6234] US Secure Hash Algorithms (SHA and SHA-based HMAC and HKDF).

[FIPS202] SHA-3 Standard: Permutation-Based Hash and Extendable-Output Functions. Federal Information Processing Standard (FIPS) 202. This Standard specifies the Secure Hash Algorithm-3 (SHA-3) family of functions on binary data.

[POSEIDON] POSEIDON: A New Hash Function for Zero-Knowledge Proof Systems. A modular framework and concrete instances of cryptographic hash functions which work natively with GF(p) objects. The POSEIDON hash function uses up to 8x fewer constraints per message bit than a Pedersen Hash.

[RFC7693] The BLAKE2 Cryptographic Hash and Message Authentication Code (MAC). This document describes the cryptographic hash function BLAKE2 and makes the algorithm specification and C source code conveniently available to the Internet community. BLAKE2 comes in two main flavors: BLAKE2b is optimized for 64-bit platforms and BLAKE2s for smaller architectures. BLAKE2 can be directly keyed, making it functionally equivalent to a Message Authentication Code (MAC).

[RFC6150] MD4 to Historic Status. This document retires RFC 1320, which documents the MD4 algorithm, and discusses the reasons for doing so. This document moves RFC 1320 to Historic status. This document is not an Internet Standards Track specification; it is published for informational purposes.

[RFC6151] Updated Security Considerations for the MD5 Message-Digest and the HMAC-MD5 Algorithms.
There are a number of security considerations to take into account when
implementing or utilizing this specification.
TBD
The multihash examples are chosen to show different hash functions and
different hash digest lengths at play. The input test data for all of the
examples in this section is:
The fields for this multihash are:
  hashing function: sha1 (0x11)
  length: 20 (0x14)
  digest: 0x8a173fd3e32c0fa78b90fe42d305f202244e2739

The fields for this multihash are:
  hashing function: sha2-256 (0x12)
  length: 32 (0x20)
  digest: 0x41dd7b6443542e75701aa98a0c235951a28a0d851b11564d20022ab11d2589a8

The fields for this multihash are:
  hashing function: sha2-512 (0x13)
  length: 32 (0x20)
  digest: 0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4
  (the first 32 bytes of the full sha2-512 digest in the next example)

The fields for this multihash are:
  hashing function: sha2-512 (0x13)
  length: 64 (0x40)
  digest: 0x52eb4dd19f1ec522859e12d89706156570f8fbab1824870bc6f8c7d235eef5f4c2cbbafd365f96fb12b1d98a0334870c2ce90355da25e6a1108a6e17c4aaebb0

The fields for this multihash are:
  hashing function: blake2b-512 (0xb240)
  length: 64 (0x40)
  digest: 0xd91ae0cb0e48022053ab0f8f0dc78d28593d0f1c13ae39c9b169c136a779f21a0496337b6f776a73c1742805c1cc15e792ddb3c92ee1fe300389456ef3dc97e2

The fields for this multihash are:
  hashing function: blake2b-256 (0xb220)
  length: 32 (0x20)
  digest: 0x7d0a1371550f3306532ff44520b649f8be05b72674e46fc24468ff74323ab030

The fields for this multihash are:
  hashing function: blake2s-256 (0xb260)
  length: 32 (0x20)
  digest: 0xa96953281f3fd944a3206219fad61a40b992611b7580f1fa091935db3f7ca13d

The fields for this multihash are:
  hashing function: blake2s-128 (0xb250)
  length: 16 (0x10)
  digest: 0x0a4ec6f1629e49262d7093e2f82a3278
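Because the identifier is a varint, identifiers of 0x80 and above occupy more than one byte on the wire. The following sketch (hypothetical helper `uvarint`) prints the identifier-plus-length prefix for each example above; it shows, for instance, that each blake2 identifier encodes to three bytes rather than two.

```python
def uvarint(n: int) -> bytes:
    # Unsigned LEB128: 7 bits per byte, least significant group first,
    # high bit set on every byte except the last.
    out = bytearray()
    while n >= 0x80:
        out.append((n & 0x7F) | 0x80)
        n >>= 7
    out.append(n)
    return bytes(out)


# (name, identifier, digest length in bytes) taken from the examples above
EXAMPLES = [
    ("sha1", 0x11, 20),
    ("sha2-256", 0x12, 32),
    ("sha2-512", 0x13, 32),
    ("sha2-512", 0x13, 64),
    ("blake2b-512", 0xB240, 64),
    ("blake2b-256", 0xB220, 32),
    ("blake2s-256", 0xB260, 32),
    ("blake2s-128", 0xB250, 16),
]

for name, code, length in EXAMPLES:
    header = uvarint(code) + uvarint(length)
    print(f"{name:<12} {header.hex()}")
```

With these inputs the prefixes are `1114`, `1220`, `1320`, `1340`, `c0e40240`, `a0e40220`, `e0e40220`, and `d0e40210`, in that order.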
The editors would like to thank the following individuals for feedback on and
implementations of the specification (in alphabetical order).
The Multihash Identifier Registry contains hash functions supported by Multihash,
each with its canonical name, its value in hexadecimal notation, its status, and
its specification.
The following initial entries should be added
to the registry to be created and maintained at (the suggested URI)
http://www.iana.org/assignments/multihash-identifiers:
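+---------------------------+------------+--------+---------------+
| Name                      | Identifier | Status | Specification |
+---------------------------+------------+--------+---------------+
| identity                  | 0x00       | active | Unknown       |
| sha1                      | 0x11       | active | RFC 6234      |
| sha2-256                  | 0x12       | active | RFC 6234      |
| sha2-512                  | 0x13       | active | RFC 6234      |
| sha3-512                  | 0x14       | active | FIPS 202      |
| sha3-384                  | 0x15       | active | FIPS 202      |
| sha3-256                  | 0x16       | active | FIPS 202      |
| sha3-224                  | 0x17       | active | FIPS 202      |
| sha2-384                  | 0x20       | active | RFC 6234      |
| sha2-256-trunc254-padded  | 0x1012     | active | RFC 6234      |
| sha2-224                  | 0x1013     | active | RFC 6234      |
| sha2-512-224              | 0x1014     | active | RFC 6234      |
| sha2-512-256              | 0x1015     | active | RFC 6234      |
| blake2b-256               | 0xb220     | active | RFC 7693      |
| poseidon-bls12_381-a2-fc1 | 0xb401     | active | POSEIDON      |
+---------------------------+------------+--------+---------------+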
NOTE: The most up-to-date place for developers to find the table above, plus
all multihash headers in "draft" status, is
https://github.com/multiformats/multicodec/blob/master/table.csv.
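An application that reads the identifier instead of assuming a fixed hash function can dispatch on it. The Python sketch below maps a subset of the registry identifiers to `hashlib` constructors. The mapping, the name `verify`, and the choice to treat a shorter digest as a prefix of the full output (as in the truncated sha2-512 example) are assumptions of this sketch. Entries that `hashlib` does not reliably provide, such as sha2-512-224 or poseidon, are left out.

```python
import hashlib
import hmac

# Hypothetical dispatch table: registry identifier -> callable returning a
# hashlib object. Only constructors hashlib documents as always present.
HASHERS = {
    0x11: hashlib.sha1,
    0x12: hashlib.sha256,
    0x13: hashlib.sha512,
    0x14: hashlib.sha3_512,
    0x15: hashlib.sha3_384,
    0x16: hashlib.sha3_256,
    0x17: hashlib.sha3_224,
    0x20: hashlib.sha384,
    0x1013: hashlib.sha224,
    # blake2b-256: BLAKE2b configured for a 32-byte output
    0xB220: lambda data: hashlib.blake2b(data, digest_size=32),
}


def verify(code: int, expected: bytes, data: bytes) -> bool:
    """Check `expected` against the digest of `data` named by `code`.

    A digest shorter than the full output is compared against the leading
    bytes of that output (assumed truncation semantics).
    """
    hasher = HASHERS.get(code)
    if hasher is None:
        raise ValueError(f"unsupported hash function identifier 0x{code:x}")
    full = hasher(data).digest()
    if not 0 < len(expected) <= len(full):
        return False
    # constant-time comparison to avoid leaking how many bytes matched
    return hmac.compare_digest(full[: len(expected)], expected)
```

Raising on an unknown identifier, rather than guessing, lets callers surface an upgrade requirement explicitly. A real verifier would probably also enforce a minimum truncated length.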
This memo registers the "mh" digest-algorithm in the
HTTP Digest Algorithm Values
registry with the following values:
Digest Algorithm: mh
Description: The multibase-serialized value of a multihash-supported algorithm.
References: this document
Status: standard
This memo registers the "mh" hash algorithm in the
Named Information Hash Algorithm
registry with the following values:
ID: 49
Hash Name String: mh
Value Length: variable
Reference: this document
Status: current