Merkle Disclosure Proof 2021

Terminology

We use the term proof in place of signature throughout this document. This is important because not all cryptographic proving techniques rely exclusively on a single digital signature.

See [[DID-CORE]] for definitions of commonly-used DID terminology.

See [[VC-DATA-MODEL]] for definitions of commonly-used verifiable credential terminology.

Motivation

Single message signature schemes make generic selective disclosure proofs difficult or impossible to implement on top of standard cryptographic tooling.

Single Message Proofs

Traditional signature and proof formats have focused on single message signature and verification schemes.

For example, when a JWT encodes a Verifiable Credential, the input to the signature and verification algorithms is:

          "base64url(JSON.stringify(header)).base64url(JSON.stringify(payload))"

        {
            "alg": "EdDSA",
            "kid": "did:key:z6MkokrsVo8DbGDsnMAjnoHhJotMbDZiHfvxM4j65d8prXUr#z6MkokrsVo8DbGDsnMAjnoHhJotMbDZiHfvxM4j65d8prXUr"
        }

Example outdated, will be revised to use JWP.

        {
            "iss": "did:key:z6MkokrsVo8DbGDsnMAjnoHhJotMbDZiHfvxM4j65d8prXUr",
            "sub": "did:example:ebfeb1f712ebc6f1c276e12ec21",
            "vc": {
              "@context": [
                "https://www.w3.org/2018/credentials/v1",
                "https://w3id.org/security/suites/jws-2020/v1"
              ],
              "id": "http://example.edu/credentials/3732",
              "type": [
                "VerifiableCredential"
              ],
              "issuer": {
                "id": "did:key:z6MkokrsVo8DbGDsnMAjnoHhJotMbDZiHfvxM4j65d8prXUr"
              },
              "issuanceDate": "2010-01-01T19:23:24Z",
              "credentialSubject": {
                "id": "did:example:ebfeb1f712ebc6f1c276e12ec21"
              },
              "proof": {
                "type": "JsonWebSignature2020",
                "created": "2010-01-01T19:23:24Z",
                "verificationMethod": "did:key:z6MkokrsVo8DbGDsnMAjnoHhJotMbDZiHfvxM4j65d8prXUr#z6MkokrsVo8DbGDsnMAjnoHhJotMbDZiHfvxM4j65d8prXUr",
                "proofPurpose": "assertionMethod",
                "jws": "eyJhbGciOiJFZERTQSIsImI2NCI6ZmFsc2UsImNyaXQiOlsiYjY0Il19..k_7t6h5IGSWFAqIlqru3zyZ0FDPQGo88p9jDeKC1yw8oxd7xj6B70tZNSaspWkMyWbXFmZ5yCO8dlZZ9_kKbAQ"
              }
            },
            "jti": "http://example.edu/credentials/3732",
            "nbf": 1262373804
          }
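As a minimal sketch of the payload preparation described above, assuming Node.js and its built-in `Buffer` (the helper names `base64url` and `signingInput`, and the short example payload, are illustrative and not part of any specification), the signing input can be built like this:

```typescript
// Encode a JSON value as base64url, as JOSE does for headers and payloads.
const base64url = (value: unknown): string =>
  Buffer.from(JSON.stringify(value), 'utf8').toString('base64url');

// The signing input is the encoded header and payload joined by a dot.
const signingInput = (header: object, payload: object): string =>
  `${base64url(header)}.${base64url(payload)}`;

const header = { alg: 'EdDSA' };
const payload = { iss: 'did:example:issuer', nbf: 1262373804 };
const input = signingInput(header, payload);

// Changing or removing any claim changes the signing input,
// so a signature over the original input no longer verifies.
const tampered = signingInput(header, { iss: 'did:example:issuer' });
console.log(input !== tampered); // true
```

Because the whole payload is folded into one signed string, there is no way to withhold a single claim without invalidating the signature.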

In the case of JSON-LD Linked Data Proofs, the input to the signature is typically calculated like this:

        async canonize(
            input,
            { documentLoader, expansionMap, skipExpansion }
          ) {
            return jsonld.canonize(input, {
              algorithm: 'URDNA2015',
              format: 'application/n-quads',
              documentLoader,
              expansionMap,
              skipExpansion,
              useNative: this.useNativeCanonize,
            });
          }
        
        async canonizeProof(proof, { documentLoader, expansionMap }) {
            // `jws` must not be included in the proof
            proof = { ...proof };
            delete proof.jws;
            return this.canonize(proof, {
              documentLoader,
              expansionMap,
              skipExpansion: false,
            });
          }
        
        async createVerifyData({
            document,
            proof,
            documentLoader,
            expansionMap,
          }) {
            // canonize the proof options and the document separately
            const c14nProofOptions = await this.canonizeProof(proof, {
              documentLoader,
              expansionMap,
            });
            const c14nDocument = await this.canonize(document, {
              documentLoader,
              expansionMap,
            });
            // the bytes to sign are the two SHA-256 digests, concatenated
            return Buffer.concat([
              await sha256(c14nProofOptions),
              await sha256(c14nDocument),
            ]);
          }

While the JSON-LD approach is more complex, it performs the same function as the base64url and string encoding used by JOSE.

At the end of these "payload preparation" steps, a single digital signature sign or verify operation is applied to the prepared payload.

Problem Statement

Tampering with a payload breaks an associated signature.

This requires a holder to return to the issuer for a new verifiable credential when attempting to reveal a subset of the claims the issuer has attested to in the original verifiable credential.

Requiring a Holder to interact with the original issuer harms privacy and can be expensive in time and bandwidth, or impossible in offline scenarios.

How can a holder reveal some subset of issuer attested claims to a verifier, without contacting the issuer or asking the verifier to contact the issuer? Solutions to this problem are often referred to as Selective Disclosure.

Multi Message Proofs

A multi message proof provides cryptographic tamper protection and authentication capabilities for a set of messages.

Because the payload of the proof is broken up before the sign and verify operations, the holder can disclose parts of the `payload` and parts of the `proof` without breaking the cryptographic assurances.

There are a few examples of this approach under development:

[[LDP-BBS2020]]
JSON Web Proof

A multi message proof that is applied to an object will require some stable transformations between messages and object. See the Normalization section.

Cryptographic Toolkit

This suite proposes a solution for selective disclosure of issuer attested claims (verifiable credentials).

Unlike previous solutions such as CL Signatures or BBS+ Signatures 2020, this approach does not rely on Zero Knowledge Proofs. Instead, it relies on Merkle Proofs.

Merkle Proofs

A key advantage of using merkle proofs is proving set membership by only relying on cryptographic hash functions.

Because a verifier will learn some information about undisclosed set members when verifying a proof for disclosed ones, this solution does leak some information. The information a verifier learns is the path from a leaf to a merkle root, which proves a member exists in the set, but this path is built from hashes of members of the set that the prover may not be disclosing.
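The following is a minimal sketch of a merkle membership proof, assuming SHA-256 and a simple binary tree in which an unpaired node is hashed with itself. It is not the implementation the proof of concept relies on, and the names `buildLevels`, `proofFor` and `verifyProof` are hypothetical. It shows that a proof consists only of sibling hashes along the path to the root, which is exactly the information a verifier learns about undisclosed members.

```typescript
import { createHash } from 'crypto';

const sha256 = (data: Buffer): Buffer => createHash('sha256').update(data).digest();

// One proof step: a sibling hash and the side it sits on.
type ProofStep = { sibling: Buffer; siblingOnLeft: boolean };

// Build every level of the tree, from leaf hashes up to the root.
const buildLevels = (messages: string[]): Buffer[][] => {
  let level: Buffer[] = messages.map(m => sha256(Buffer.from(m, 'utf8')));
  const levels: Buffer[][] = [level];
  while (level.length > 1) {
    const next: Buffer[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // An unpaired last node is paired with itself (an assumed convention).
      const right = i + 1 < level.length ? level[i + 1] : level[i];
      next.push(sha256(Buffer.concat([level[i], right])));
    }
    levels.push(next);
    level = next;
  }
  return levels;
};

// Collect the sibling hashes along the path from a leaf to the root.
const proofFor = (levels: Buffer[][], leafIndex: number): ProofStep[] => {
  const proof: ProofStep[] = [];
  let index = leafIndex;
  for (let depth = 0; depth < levels.length - 1; depth++) {
    const level = levels[depth];
    const siblingIndex = index % 2 === 0 ? index + 1 : index - 1;
    proof.push({
      sibling: siblingIndex < level.length ? level[siblingIndex] : level[index],
      siblingOnLeft: index % 2 === 1,
    });
    index = Math.floor(index / 2);
  }
  return proof;
};

// Recompute the root from a message and its proof, then compare.
const verifyProof = (message: string, proof: ProofStep[], root: Buffer): boolean => {
  let hash = sha256(Buffer.from(message, 'utf8'));
  for (const { sibling, siblingOnLeft } of proof) {
    hash = sha256(Buffer.concat(siblingOnLeft ? [sibling, hash] : [hash, sibling]));
  }
  return hash.equals(root);
};

const messages = ['a', 'b', 'c', 'd', 'e'];
const levels = buildLevels(messages);
const root = levels[levels.length - 1][0];
const proof = proofFor(levels, 2);
console.log(verifyProof('c', proof, root)); // true
console.log(verifyProof('x', proof, root)); // false
```

Note that the proof length also reveals the depth of the tree, and therefore a rough bound on the number of set members.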

A robust summary of merkle proofs is beyond the scope of this specification. The proof of concept we build relies on this implementation. The diagram below is from the Wikipedia page on merkle trees.

JSON Web Signatures

The most popular solution for encoding digital signatures that rely on standard cryptography in JSON is [[RFC7515]].

A robust summary of JSON Web Signatures is beyond the scope of this specification.

By using a standard digital signature approach to sign the merkle root, a holder can then disclose messages and proofs, which can be verified as originating from the issuer who produced the signature using their private key.
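As an illustration only, the sketch below signs a stand-in merkle root as a compact JWS using Ed25519 from the Node.js `crypto` module. The payload encoding (the raw root bytes, base64url encoded) and the helper name `verifyRootJws` are assumptions made for this example, and a production verifier would also validate the protected header.

```typescript
import { createHash, generateKeyPairSync, sign, verify } from 'crypto';

// Stand-in merkle root; a real one is computed from the tree over the messages.
const merkleRoot = createHash('sha256').update('example merkle root').digest();

// Freshly generated issuer key pair, for illustration only.
const { publicKey, privateKey } = generateKeyPairSync('ed25519');

const protectedHeader = Buffer.from(JSON.stringify({ alg: 'EdDSA' }), 'utf8').toString('base64url');
const payload = merkleRoot.toString('base64url');
const signingInput = Buffer.from(`${protectedHeader}.${payload}`, 'ascii');

// Ed25519 takes no separate digest algorithm, so the first argument is null.
const signature = sign(null, signingInput, privateKey);
const jws = `${protectedHeader}.${payload}.${signature.toString('base64url')}`;

// Verifier side: check the signature, then return the authenticated root (or null).
const verifyRootJws = (compact: string): Buffer | null => {
  const parts = compact.split('.');
  if (parts.length !== 3) return null;
  const [h, p, s] = parts;
  const ok = verify(null, Buffer.from(`${h}.${p}`, 'ascii'), publicKey, Buffer.from(s, 'base64url'));
  return ok ? Buffer.from(p, 'base64url') : null;
};

const recovered = verifyRootJws(jws);
console.log(recovered !== null && recovered.equals(merkleRoot)); // true
```

Only the root is covered by the signature, so the same issuer key can serve both single message proofs and this multi message scheme.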

An advantage of building selective disclosure proofs on top of JWS is that keys already in use for single message proofs can be used with multi message selective disclosure proofs.

[[RFC7515]] has been implemented in many languages. JWS and JWT are used as the foundation of most modern identity assurance systems.

Compression

One of the disadvantages of merkle proofs is their size.

As you can see in the merkle tree diagram, the size of a single set membership proof is O(log n) hashes, where n is the number of set members. Depending on the size of the associated hashes, this can make disclosures that reveal all but a few members very expensive in proof size, because one proof is needed per disclosed member.

Luckily, membership proofs share common nodes in the tree, which allows compression algorithms to provide a significant advantage when disclosing most of the members of a set.

In our proof of concept we use this compression implementation, which is essentially the same as gzip.
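The effect of shared nodes can be sketched with the Node.js `zlib` module, assuming plain deflate as a stand-in for the compression implementation used in the proof of concept. The proofs below are hypothetical: the hashes are placeholders that only reproduce the sharing pattern of an 8-leaf tree in which every member is disclosed.

```typescript
import { createHash } from 'crypto';
import { deflateSync, inflateSync } from 'zlib';

const h = (label: string): string => createHash('sha256').update(label).digest('hex');

// For leaf i, the proof holds the sibling at each level. Siblings higher in
// the tree are shared by more proofs, so they repeat across the proof set.
const proofs: string[][] = Array.from({ length: 8 }, (_, i) => [
  h(`leaf-${i ^ 1}`),          // sibling leaf, appears in one proof
  h(`node-1-${(i >> 1) ^ 1}`), // level 1 sibling, shared by two proofs
  h(`node-2-${(i >> 2) ^ 1}`), // level 2 sibling, shared by four proofs
]);

const raw = Buffer.from(JSON.stringify(proofs), 'utf8');
const compressed = deflateSync(raw);

// Base64 makes the compressed bytes safe to embed in a JSON proof object.
const encoded = compressed.toString('base64');

// Decoding reverses both steps and yields the original proofs.
const decoded: string[][] = JSON.parse(
  inflateSync(Buffer.from(encoded, 'base64')).toString('utf8'),
);

console.log(`raw: ${raw.length} bytes, compressed: ${compressed.length} bytes`);
```

The exact savings depend on the tree shape, the hash encoding, and how many members are disclosed.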

Compressed encoding of merkle proofs is an area where better standards are needed. The solution we have used is subject to BREAKING CHANGES.

Proof Suite

This suite specification describes an approach to selective disclosure proofs that is based on the original [[LD-PROOFS]] specification.

We are working with the community to develop this same proof technique for use without [[LD-PROOFS]] at the DIF Applied Cryptography Working Group. There is currently no registered way to encode multi message proofs in JOSE, but we are working with the community to remedy this.

There are two features, currently unsupported, that we require to enable multi message disclosure proofs in JOSE:

Standard normalization algorithms for converting between objects and messages.
Standard proof encodings for multi message proofs, which allow a holder to derive new proofs.

JSON-LD based proofs already support these requirements, as was first demonstrated in [[LDP-BBS2020]]. This suite takes a more generic approach to the problem in order to support normalization algorithms that operate on JSON (which might or might not be JSON-LD).

Normalization

In order to support signing and verifying of objects where object members are disclosed or omitted, a bi-directional lossless message conversion process is required.

In our proof of concept we name two functions:

objectToMessages: Converts a JSON object to a set of string messages.
messagesToObject: Converts a set of string messages to a JSON object.

It is important that these processes be stable, such that chaining them together does not result in an object that differs from the input.

JSON Pointer

[[RFC6901]] defines a string syntax for identifying a specific value within a JSON document, which is sufficient for use with this suite.

Here is some TypeScript code that implements our required functions:

import pointer from 'json-pointer';

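const objectToMessages = (obj: any) => {
  // map of JSON Pointer -> leaf value
  const dict = pointer.dict(obj);
  const messages = Object.keys(dict).map(key => {
    // JSON.stringify keeps value types and escapes quotes, so the
    // conversion stays lossless for numbers, booleans and null
    return JSON.stringify({ [key]: dict[key] });
  });
  return messages;
};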

const messagesToObject = (messages: string[]) => {
  const obj = {};
  messages
    .map(m => {
      return JSON.parse(m);
    })
    .forEach(m => {
      const [key] = Object.keys(m);
      const value = m[key];
      pointer.set(obj, key, value);
    });
  return obj;
};

export { objectToMessages, messagesToObject };
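The stability requirement can be checked with a round trip. The sketch below is a dependency-free stand-in for the two functions above, with the hypothetical names `flatten` and `unflatten`, and an example credential whose `age` and `a/b` members are invented for illustration. It assumes that empty objects and arrays may be dropped and that all-numeric path tokens always denote array indices.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// RFC 6901 escaping: "~" becomes "~0" and "/" becomes "~1".
const escapeToken = (t: string): string => t.replace(/~/g, '~0').replace(/\//g, '~1');
const unescapeToken = (t: string): string => t.replace(/~1/g, '/').replace(/~0/g, '~');

// Flatten a value into one message per leaf, keyed by its JSON Pointer.
const flatten = (value: Json, path: string = ''): string[] => {
  if (value === null || typeof value !== 'object') {
    return [JSON.stringify({ [path]: value })];
  }
  const container = value as { [key: string]: Json };
  const messages: string[] = [];
  for (const key of Object.keys(container)) {
    messages.push(...flatten(container[key], `${path}/${escapeToken(key)}`));
  }
  return messages;
};

// Rebuild an object from messages by walking each pointer.
const unflatten = (messages: string[]): Json => {
  const root: any = {};
  for (const message of messages) {
    const parsed = JSON.parse(message);
    const ptr = Object.keys(parsed)[0];
    const tokens = ptr.split('/').slice(1).map(unescapeToken);
    let node: any = root;
    tokens.forEach((token, i) => {
      if (i === tokens.length - 1) {
        node[token] = parsed[ptr];
        return;
      }
      if (node[token] === undefined) {
        // A numeric next token means the container is an array.
        node[token] = /^\d+$/.test(tokens[i + 1]) ? [] : {};
      }
      node = node[token];
    });
  }
  return root;
};

const credential: Json = {
  id: 'http://example.edu/credentials/3732',
  type: ['VerifiableCredential'],
  credentialSubject: { id: 'did:example:ebfeb1f712ebc6f1c276e12ec21', age: 42, 'a/b': true },
};

const roundTripped = unflatten(flatten(credential));
console.log(JSON.stringify(roundTripped) === JSON.stringify(credential)); // true
```

A normalization that fails this check cannot be used safely, because the verifier would reconstruct a different object than the one the issuer attested to.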

URDNA2015

[[RDF-DATASET-NORMALIZATION]] defines a canonicalization algorithm over RDF datasets, including those expressed as JSON-LD, that is sufficient for use with this suite.

This normalization approach is different from [[LD-PROOFS]] and [[LDP-BBS2020]]. The reason for the difference is to provide a common way to encode object payloads as messages that is not bound to RDF but remains compatible with it. Also, the examples in the current repo are massively outdated, and will be revised to support JWP.

URDNA2015 normalization is not recommended due to its fragility with respect to context changes.

See the source code here.

Membership Proofs

The membership proofs are expected to be represented as JWPs.

We are maintaining the DRAFT for JWP Merkle Proofs here.

Sign and Verify

Unlike traditional single message proof schemes such as compact JWTs, we are only signing the merkle root. This allows a Holder to adjust both messages and proofs to selectively disclose object members.

Because messages and proofs are not themselves signed, it is critical that the merkle root signature be verified first, before verifying merkle proofs for the individual messages.
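The ordering requirement can be sketched as follows. This is a simplified illustration, not the suite's verification algorithm: it assumes an Ed25519 signature directly over the raw root bytes rather than a JWS, SHA-256 leaf hashing, and the hypothetical names `foldProof` and `verifyDisclosure`.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from 'crypto';

const sha256 = (data: Buffer): Buffer => createHash('sha256').update(data).digest();

type ProofStep = { sibling: Buffer; siblingOnLeft: boolean };
type Disclosure = { message: string; proof: ProofStep[] };

// Recompute a candidate root from a message and its sibling path.
const foldProof = (message: string, proof: ProofStep[]): Buffer => {
  let hash = sha256(Buffer.from(message, 'utf8'));
  for (const { sibling, siblingOnLeft } of proof) {
    hash = sha256(Buffer.concat(siblingOnLeft ? [sibling, hash] : [hash, sibling]));
  }
  return hash;
};

const verifyDisclosure = (
  root: Buffer,
  rootSignature: Buffer,
  issuerKey: KeyObject,
  disclosures: Disclosure[],
): boolean => {
  // Step 1: authenticate the root before trusting any proof.
  if (!verify(null, root, issuerKey, rootSignature)) return false;
  // Step 2: every disclosed message must hash up to the authenticated root.
  return disclosures.every(d => foldProof(d.message, d.proof).equals(root));
};

// Usage with a two-leaf tree and a freshly generated issuer key.
const leafA = sha256(Buffer.from('{"/name":"Alice"}', 'utf8'));
const leafB = sha256(Buffer.from('{"/age":42}', 'utf8'));
const root = sha256(Buffer.concat([leafA, leafB]));
const { publicKey, privateKey } = generateKeyPairSync('ed25519');
const rootSignature = sign(null, root, privateKey);

const disclosed: Disclosure[] = [
  { message: '{"/name":"Alice"}', proof: [{ sibling: leafB, siblingOnLeft: false }] },
];
console.log(verifyDisclosure(root, rootSignature, publicKey, disclosed)); // true
```

If step 1 were skipped, an attacker could build a tree over arbitrary messages and present its root, and every merkle proof would still verify.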

Encoding

As mentioned in Compression, merkle proofs can be large, especially when many proofs must be provided because only a single message is withheld by a Holder.

In order to address this challenge, we rely on a proof representation that makes use of binary compression:

Example outdated, will be revised to use JWP.

{
  "type": "MerkleDisclosureProof2021",
  "created": "2021-08-22T19:36:43Z",
  "verificationMethod": "did:example:123#key-0",
  "proofPurpose": "assertionMethod",
  "normalization": "jsonPointer",
  "proofs": "eJzNzjeSq0gAANC7TNpbBQiEIGy8Ny0at7UBIIGEl4Q//f97hAmmauKXvH+/YPwMhwSEJabAxAOLpLiw7pLXZKRKfowbyt2zV/jmSpc9DJ91XMyE5YUIKOvzokey0gaHejnyOJh71jFIDYhC240BvOfb61CsWmw8gtY78UkSNl2S9+gs3T5wsU/LbI3eqFLNBMur8rFfp2hVz+URAUlib5DUlh6eEyLvEvzQvRIvF3wvH1//fEFFK5Vdoje3HsKamlSB7vmLxIrNwptqX3t1nbJP9zgYEv+WMoqf3JOoQPKWrQxbR5dRO1XJt8MYsm1geNGxIl90bo4kQ2egs0s/WznKjBDHQPP147o/QsBt8uJdG1KuqCy+KKNx/smyMwbyKq7c+b7onLJ4g6VxEmeLL25mfCU69vySRY31ljD1W8oomw7pgx0T1qVcG+2x2h9AH/6HtIgXBqwdeZWPPEzgDualrI39mh7Xtzn1804NEpEwN30uUj4XIs4vZUzm1uJVDdRIfhnda9WxBIs30YDZ4zw3nRwAj+KV0/wJr0WuXZLYb75bFmPu/r7xHUcFaEBStKWmJaBMDCJ9OyWYv5ZtrWY2cHT2t5Tv72UJHST54WiZGhd2GU4B9eJFhPiXbQCmkSbNShxPCaCyv2GGWdsPkBNhuhDyzY8Dl5szLzFPwFDcrHBEA8iz8pNlm9r7jI/bIqFP4f0ZRe6UxtG7CqUK1ZVDmew90f8KYzi/pRzngzCh4ipaZtcIDxK8qqQZiHTpk9YVW7/ko/qq3uQVVbA4K6XdiDMWtU0UykEYH52jDEVqFC22SaBKBAyqV+8UKoRvBaoNhXYfyeWiWh7ZEMzEzNQ0xS65T2GqOUASdelo/m8IpcIn161C2r1CmUrpY5Q2KevNsqKxnLZ+Hqrv6pmejuFPNgrn1hAHXcCpS6Y+sdhPOr54fwVEsLzX7ByWMM0raSmdDZq8LFVnJiJD8GRqnd48jXXalVDbnH+ceukerik66TxY1+82TG5Y7Xly2+es7xbCdnbZ6/aASUzbptBsO3PoppuLKNV+rPHfHzccgQI=",
  "jws": "eyJhbGciOiJFZERTQSJ9.ImkrMUVBbU9mMDJUM2JwdHdTcW5DNG1sNlc5TGNmYUU1cGVSY3JLbHdvUnc9Ig.ZXlKaGJHY2lPaUpGWkVSVFFTSXNJbUkyTkNJNlptRnNjMlVzSW1OeWFYUWlPbHNpWWpZMElsMTkuLmpEUFJMbW9taVJmc1kwX1hFOFdwVVNTZXdOeEUwRHI4LVlxNXBOeGdoZUJmVnhORlQ3aFZlMnBsU3NsT05PLXMwUzlLcGpTcXhqM2I2alowdDFqSERR"
}

Deriving

In order to derive a new Verifiable Credential which discloses a subset of the original, the holder must filter the messages associated with the original object, and the proofs associated with those messages.

Unlike [[LDP-BBS2020]], our proof of concept does not rely on [[JSON-LD-FRAMING]]. This is because it also does not rely exclusively on [[RDF-DATASET-NORMALIZATION]]. Instead we compute the messages and proofs by taking the set difference of the original and derived document objects. This approach works with any stable normalization algorithm, and is the reason for the difference in our normalization process compared to [[LDP-BBS2020]].
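A minimal sketch of this derivation step, assuming both documents have already been normalized into string messages and that each original message is stored next to its proof (the `SignedMessage` shape and the `deriveDisclosure` name are hypothetical, and the proofs are shown as placeholder strings):

```typescript
type SignedMessage = { message: string; proof: string };

// Split the original messages into those the derived document keeps and
// those it withholds (the set difference of original and derived).
const deriveDisclosure = (
  original: SignedMessage[],
  derivedMessages: string[],
): { disclosed: SignedMessage[]; withheld: SignedMessage[] } => {
  const known = new Set(original.map(entry => entry.message));
  for (const message of derivedMessages) {
    // A derived document may only remove claims, never add or change them.
    if (!known.has(message)) {
      throw new Error(`message not attested by issuer: ${message}`);
    }
  }
  const keep = new Set(derivedMessages);
  return {
    disclosed: original.filter(entry => keep.has(entry.message)),
    withheld: original.filter(entry => !keep.has(entry.message)),
  };
};

const original: SignedMessage[] = [
  { message: '{"/credentialSubject/id":"did:example:123"}', proof: 'proof-0' },
  { message: '{"/credentialSubject/name":"Alice"}', proof: 'proof-1' },
  { message: '{"/credentialSubject/age":42}', proof: 'proof-2' },
];

const derivedMessages = [
  '{"/credentialSubject/id":"did:example:123"}',
  '{"/credentialSubject/age":42}',
];

const { disclosed, withheld } = deriveDisclosure(original, derivedMessages);
console.log(disclosed.map(entry => entry.proof)); // [ 'proof-0', 'proof-2' ]
```

Because the comparison is done on messages rather than on RDF quads or frames, the same step works for any normalization that is stable.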

Use Cases

These use cases are hypothetical.

Supply chain traceability

The GS1 Digital Link https://id.gs1.org/01/9506000134352 is also known as Dal Giardino Risotto Rice with Mushrooms 411g.

Perhaps not all manufacturing details need to be disclosed until a recall is issued, at which point sensitive product and supply chain details (costs, locations, times) can be disclosed from the associated original credentials.

During an investigation, supply chain participants might be compelled to fully disclose credentials to an auditor or trusted third party.

Disclosing known aliases

Sometimes an authority or public registry maintainer may know that a single entity is known by multiple pseudonymous identifiers. For example:

The driver's license Q6780 22812 41253 might also be known as Pearline Abshire. During an investigation, her legal counsel might want to be able to prove that she used to be known as Katarina Kozey, with driver's license number 9375599, when she worked as an informant on narcotics activity in Alaska before being relocated under a witness protection program.

Minimizing verifier liability

Data processors should not collect sensitive information they do not need.

Protecting subject privacy

Data subjects should not need to expose sensitive information that a verifier does not need.