Engineering Privacy for Verified Credentials

In Which We Describe Data Minimization, Selective Disclosure, and Progressive Trust

Draft Community Group Report

Latest editor's draft:
https://w3c-ccg.github.io/data-minimization/
Editor:
Lionel Wolberger
Authors:
Brent Zundel (Evernym/Sovrin)
Irene Hernandez
Christopher Allen
Zachary Larson
Katryna Dow (Meeco)
Participate:
GitHub w3c-ccg/data-minimization

Abstract

We often share information on the World Wide Web, though some of it is private. The W3C Credentials Community Group focuses on how privacy can be enhanced when attributes are shared electronically. In the course of our work, we have identified three related but distinct privacy enhancing strategies: "data minimization," "selective disclosure," and "progressive trust." These enhancements are enabled with cryptography. The goal of this paper is to enable decision makers, particularly non-technical ones, to gain a nuanced grasp of these enhancements along with some idea of how their enablers work. We describe them below in plain English, but with some rigor. This knowledge will help readers recognize when they need privacy enhancements, select the type of enhancement needed, assess the techniques that enable those enhancements, and adopt the correct enhancement for each use case.

Status of This Document

This specification was published by the Credentials Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

If you wish to make comments regarding this document, please send them to public-credentials@w3.org (subscribe, archives).

1. Introduction

We often share information on the World Wide Web, though some of it is private. The W3C Credentials Community Group focuses on how privacy can be enhanced when attributes are shared electronically. In the course of our work, we have identified three related but distinct privacy enhancing strategies: "data minimization," "selective disclosure," and "progressive trust." These enhancements are enabled with cryptography. The goal of this paper is to enable decision makers, particularly non-technical ones, to gain a nuanced grasp of these enhancements along with some idea of how their enablers work. We describe them below in plain English, but with some rigor. This knowledge will help readers recognize when they need privacy enhancements, select the type of enhancement needed, assess the techniques that enable those enhancements, and adopt the correct enhancement for each use case.

2. Three examples

Three examples of how people would like their privacy preserved in the process of sharing credentials help to illuminate these three techniques.

Diego attempts to use an online service and is asked to share his location data in order to prove where he is. Diego hesitates, since the service doesn't need to know his location every day, everywhere. He knows that the service may share this information with other parties without meaningful consent on his part. Thoughts pass through his mind: What location data does the service actually need? What will it read in the future? Is there a way for him to share his location just this once, or to share only an approximate location?

Selena hands her driver's license to a bouncer to prove she is of drinking age. As he looks it over, she sees him inspecting her date of birth and home address. He only needs to know that she is over 21. Is there a way to disclose that she is indeed old enough without revealing her exact age, home address, and city of residence?

Proctor, negotiating with a real estate agent to purchase a home, reveals a letter from his bank stating his credit limit. He wanted to reveal only its approximate amount, but the agent insisted on verifying that the letter was authentic. Proctor feels the agent now has the upper hand in the negotiation, as the letter reveals more than just its authenticity. Could he have revealed only an approximate amount and then revealed more details as the negotiations progressed?

Each story features information that is verifiable: a home address, age, or credit limit. We call such information a credential, and a detail of a credential we call an attribute. We have three strategies for enhancing the privacy of digitally shared credential attributes, and each story highlights one. Diego's story highlights the need for "data minimization," Selena's for "selective disclosure," and Proctor's for "progressive trust." Let's examine each one in detail before discussing enablers.

3. Privacy Enhancements

We propose the following three privacy enhancements. (Sources used to curate these definitions are listed in Appendix A, Definition Sources.)

3.1 Data Minimization

Data minimization is the act of limiting the amount of shared data strictly to the minimum necessary in order to successfully accomplish a task or goal. There are three types of minimization:

Data minimization is enacted primarily by policy decisions made by stakeholders in the credentials ecosystem:

Data minimization policies impact selective disclosure, the next privacy enhancement.
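As a rough illustration of data minimization as a policy decision, the sketch below filters a record down to the attributes a task actually requires before anything is shared. The record, its field names, and the `REQUIRED_FOR_SIGNUP` policy are hypothetical and serve only to make the idea concrete.

```python
# Hypothetical full record held by the user; field names are illustrative only.
full_record = {
    "name": "Diego",
    "street_address": "12 Example Street",
    "county_district": "North District",
}

# Hypothetical policy: the only attributes this task actually needs.
REQUIRED_FOR_SIGNUP = {"county_district"}


def minimize(record: dict, required: set) -> dict:
    """Return a copy of the record containing only the required attributes."""
    missing = required - set(record)
    if missing:
        raise KeyError(f"record lacks required attributes: {sorted(missing)}")
    return {key: record[key] for key in required}


shared = minimize(full_record, REQUIRED_FOR_SIGNUP)
print(shared)
```

Everything outside the policy, such as the name and street address, never leaves the user's system.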

3.2 Selective Disclosure

Selective disclosure is the ability of an individual to granularly decide what information to share. Stakeholders in the credentials ecosystem enable selective disclosure capabilities in the following ways:

Once data minimization policies and selective disclosure are in place, the third and last enhancement can be applied.
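One simple way to picture selective disclosure is with salted hash commitments: the issuer commits to each attribute separately, and the holder later reveals only the attributes they choose. This is a deliberately simpler technique than the zero-knowledge approach discussed later in this paper, and the attribute names and helper functions below are hypothetical. Note that the digests stay the same across presentations, so this sketch does not prevent correlation.

```python
import hashlib
import secrets


def commit(name: str, value: str, salt: bytes) -> str:
    """Salted SHA-256 digest that hides the value until the salt is revealed."""
    return hashlib.sha256(salt + name.encode() + b"=" + value.encode()).hexdigest()


# Issuer side: commit to every attribute separately (the issuer would sign the digests).
attributes = {"name": "Selena", "over_21": "true", "home_address": "1 Example Road"}
salts = {name: secrets.token_bytes(16) for name in attributes}
digests = {name: commit(name, value, salts[name]) for name, value in attributes.items()}


def disclose(names):
    """Holder side: reveal the value and salt of the chosen attributes only."""
    return {n: (attributes[n], salts[n]) for n in names}


def verify(disclosed, published_digests):
    """Verifier side: recompute each digest from the revealed value and salt."""
    return all(
        commit(n, value, salt) == published_digests[n]
        for n, (value, salt) in disclosed.items()
    )


presentation = disclose(["over_21"])
print(verify(presentation, digests))
```

The verifier learns the one attribute that was opened and nothing about the values behind the other digests.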

3.3 Progressive Trust

Progressive trust is the ability of an individual to gradually increase the amount of relevant data revealed as trust is built or value generated.

To enable progressive trust capabilities, stakeholders in the credentials ecosystem act in the following ways:

4. Crypto Enablers

Implementing privacy enhancements depends on organizational decisions. Determining what data is needed, with an eye towards data minimization, along with a clear model of how data is used over the lifecycle of engagement, goes a long way towards enabling progressive trust. However, policies are not enough. When enhancing privacy online, some parts of the data must be revealed while others remain concealed. Concealment is achieved mostly through cryptography, from the Greek word "kryptos," meaning hidden, as in a crypt. Crypto (a short word we will use for cryptography) enables us to achieve our goal by means of three primary enablers: having a secret, having a difficult mathematical task, and having zero-knowledge enablers. The children's "Where's Waldo?" illustrated book series helps us to understand these three enablers. In these books a distinctively dressed man wearing a striped hat appears only once on each page. Readers are asked to scour the page and locate him. We can understand the three enablers by examining Where's Waldo one step at a time.

Where's Waldo books are drawings, while crypto is built from mathematical equations, basically puzzles based on numbers. We provide the interested reader with a layman's overview in Appendix B, Basic Crypto Concepts.

5. Three Solutions

We now return to our opening examples, apply the privacy preserving strategies and enablers described, and describe the improved outcomes.

The online service that Diego uses does an internal policy review and realizes (a) it only needs a location when a user signs up for an account, and (b) it does not need an exact address, only the county district. It changes its interface to request a Verifiable Credential for Diego's location. Diego's system creates this credential for him, which can be inspected to reveal the county district. The crypto to enable this would be similar to that described in Appendix C, Drinking Age Credential Implementation. With this data minimization, the online service has less risk of violating data protection rules, is less of a target for hacking, and has lower overall costs, while at the same time preserving Diego's privacy.

The bar seeking to verify Selena's age uses selective disclosure as built into the Verifiable Claims system. Selena will no longer share her date of birth. Instead, Selena creates a secret that we harness to craft a crypto-formatted credential. This crypto makes it easy to verify her age, but difficult to determine her exact date of birth. The bouncer's system can perform a zero-knowledge proof to determine that the credential is valid and that Selena is older than twenty-one, without revealing her birthday or her secret. The bouncer sees she is over twenty-one without seeing her date of birth, residence address, or any other unnecessary information. In Appendix C, Drinking Age Credential Implementation, we show the process step-by-step.

The real estate agency working with Proctor implements a data model specifying what is required at each step of the real estate negotiation. The first step requires only proof of being an account holder in good standing at a known bank, so Proctor does not have to reveal the detailed letter at this point. As their negotiation continues, Proctor reveals more and more information as required. Some steps of the process may share Verifiable Claims encoded with crypto.
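The staged data model described above can be sketched as an ordered list of steps, each naming the attributes it requires. The stage names, attribute names, and values below are hypothetical; the point is only that nothing is revealed before the step that needs it.

```python
# Hypothetical negotiation stages, each naming the attributes required at that step.
STAGES = [
    ("introduction", {"account_in_good_standing"}),
    ("offer", {"approximate_credit_range"}),
    ("contract", {"exact_credit_limit"}),
]

# Illustrative values only.
credential = {
    "account_in_good_standing": True,
    "approximate_credit_range": "400k-500k",
    "exact_credit_limit": 465_000,
}


def revealed_up_to(stage_name: str) -> dict:
    """Return every attribute disclosed by the time the given stage is reached."""
    revealed = {}
    for name, required in STAGES:
        revealed.update({attr: credential[attr] for attr in required})
        if name == stage_name:
            return revealed
    raise ValueError(f"unknown stage: {stage_name}")


print(revealed_up_to("offer"))
```

At the introduction stage Proctor reveals only that he is an account holder in good standing; the exact limit appears only if the negotiation reaches the final step.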

6. Summary

The World Wide Web accelerates the sharing of credentials and other digital interactions, and many regulations have been passed and strategies proposed to protect privacy, some of which require cryptography. To align terminology, the W3C Credentials Community Group has identified three related but distinct privacy enhancing strategies that create a useful rubric for discussing the challenges and arriving at solutions. We share the examples of Diego, Selena, and Proctor and propose "data minimization," "selective disclosure," and "progressive trust," with accompanying crypto protocols, as useful semantics for accelerating the adoption of digital interaction while protecting privacy.

A. Definition Sources

This section contains definitions we curated, based on research and oral interviews, to create the definitions of data minimization, selective disclosure and progressive trust.

A.1 Data Minimization

Definitions of data minimization that we considered in the formation of our definition above.

A.2 Selective Disclosure

Definitions of selective disclosure that we considered in the formation of our definition above.

A.3 Progressive Trust

Definitions of progressive trust that we considered in the formation of our definition above. Note that we included definitions of progressive trust and progressive disclosure as well.

B. Basic Crypto Concepts

This appendix describes basic cryptographic concepts critical to the privacy preserving engineering of credential attributes. For readability, we use the short word, "crypto."

B.1 Overview

Crypto is a huge field with highly specialized jargon, too much to cover here. But non-specialists would benefit from some understanding of relevant crypto in order to make informed decisions. We begin with a brief overview of several concepts from number theory that serve as a foundation for all crypto used in this process. This is a curated list of topics progressing from the simple to the more complex. Notice how ideas are re-used and layered as you read on.

B.2 Number Theory

Number theory refers to the study of the behavior of integer numbers such as one, three, or two hundred. The following are behaviors of these numbers that make them useful for crypto:

B.3 Primary Objectives

The curious behavior of numbers is exploited to achieve four primary crypto objectives.

B.4 Ten Crypto Concepts

Over the decades, hundreds if not thousands of crypto protocols, processes, and algorithms have been devised to achieve these objectives by combining the above six behaviors in different ways. We present here a brief tour of the ten most significant ones in our field of verifiable credentials:

C. Drinking Age Credential Implementation

The birthday of an individual is formatted into a verifiable credential, which can be inspected to reveal the age of the credential holder without revealing their birthdate. The flow described here is based on the developing Verifiable Claims standard of the W3C Credentials Community Group. It uses cryptography developed by Jan Camenisch, as implemented by Sovrin.

This is a work in progress. Note that other types of crypto could be applied to achieve the same privacy preserving goals.

C.1 Communication Flow

The flow below may be copied and pasted into the [WEB-SEQ] webpage to generate a flow diagram.

title Verifiable credential using Selective Disclosure
participant Valid Time Oracle
participant Janet
participant ID Provider
participant Ledger
participant Bar

note over Janet:Prover
note over Bar:Validator

note over Janet,Bar: Preparation and Setup

note right of ID Provider:Infrastructure
ID Provider->Ledger: Define Schema (Name, Birthdate, Address)
ID Provider->Ledger: credential Definition (Pub Key, etc.)
ID Provider->ID Provider: Generate Prv Key for this credential
ID Provider->Ledger:Revocation Registry

note left of Bar: Prepare to accept credentials
Bar->Bar:Install Agent
Bar->Ledger: Check schema

note over Janet,Bar: Begin Use Case
Janet->ID Provider: Request ID
ID Provider-->Janet: ID will be issued as a digital credential
note right of Janet: Prepare to receive credentials
Janet->Janet: Install Agent
Janet->Janet: Prv Key Generate, Store
Janet->Ledger:Check Schema
Ledger->Janet:credential Definition
Janet-->ID Provider:Proof of Name, Birthdate, Address
Janet->ID Provider: Blinded secret
ID Provider->Janet: credential
Janet->Janet: Validate credential against credential Def

note over Janet,Bar: Janet goes to the bar
note left of Bar: Can Janet Enter?
Bar->Janet: Request Proof of Age
Janet->Valid Time Oracle: Get time
Valid Time Oracle->Janet: Time credential
Janet->Janet:Generate Proof (This person is over 21)
Janet->Bar: Provide Proof
Bar->Bar: Evaluate proof
Bar->Ledger: Verify on Ledger
Ledger->Bar: Verification
Bar->Janet: Come in

note left of Bar: Invite to club

Bar->Janet: Join loyalty club? (requires valid postal code)
Janet->Janet:Generate Proof (postal code)
Janet->Bar: Provide Proof
Bar->Bar: Evaluate proof
Bar->Ledger: Verify on Ledger
Ledger->Bar: Verification
Bar->Janet: Have Loyalty Card
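
The statement Janet proves in the step "Generate Proof (This person is over 21)" is a predicate over her birthdate and the time obtained from the Valid Time Oracle. The sketch below evaluates that predicate in the clear, purely to show what is being asserted; in the flow above the Bar receives only a proof that the predicate holds and never sees the birthdate. The function name is hypothetical.

```python
from datetime import date


def is_at_least(birthdate: date, today: date, years: int) -> bool:
    """True if a person born on `birthdate` has reached `years` years of age by `today`."""
    age = today.year - birthdate.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birthdate.month, birthdate.day):
        age -= 1
    return age >= years


# `today` stands in for the value carried by the time credential.
print(is_at_least(date(2000, 6, 15), date(2021, 6, 15), 21))
```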

C.2 Crypto Details

Below are some of the detailed mathematics involved in issuing a verifiable credential as implemented by Sovrin, a non-profit organization dedicated to managing a decentralized, public network for the purposes of self-sovereign identity.

C.2.1 Issuer Setup

The following setup is a necessary precursor to issuing a privacy-preserving credential.

C.2.1.1 Compute

Perform the mathematical calculations that produce the essential ingredients of the operations that follow. Some of these results, such as the private key, are very sensitive and must be kept secret by the issuer; others are to be shared.

  • Random $p'$, $q'$, 1024-bit prime numbers, such that $p = 2p' + 1$ and $q = 2q' + 1$ are both 1024-bit prime numbers.
  • $n = pq$.
  • Random quadratic residue: $S \bmod n$
  • Random $X_Z, X_{R_1}, \dots, X_{R_l} \in [2,\ p'q' - 1]$, where $l$ is the number of attributes in the credential.
  • $Z = S^{X_Z} \bmod n$
  • $R_i = S^{X_{R_i}} \bmod n$, $1 \le i \le l$
  • Issuer private key $sk_c = p'q'$
  • Issuer public key $pk_c = \{n, S, Z, R_1, \dots, R_l\}$
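
The setup steps above can be sketched with toy numbers. The primes 1019 and 1031 are an assumption chosen so the example runs instantly; they stand in for the 1024-bit values the text calls for and offer no security. The helper names are hypothetical.

```python
import math
import secrets


def is_prime(x: int) -> bool:
    """Trial division; adequate only for the toy sizes used here."""
    if x < 2:
        return False
    return all(x % d for d in range(2, math.isqrt(x) + 1))


# Toy primes p', q' such that p = 2p' + 1 and q = 2q' + 1 are also prime.
p_prime, q_prime = 1019, 1031
p, q = 2 * p_prime + 1, 2 * q_prime + 1
assert all(map(is_prime, (p_prime, q_prime, p, q)))

n = p * q
order = p_prime * q_prime  # issuer private key sk_c = p'q'


def random_exponent() -> int:
    """Uniform integer in [2, p'q' - 1]."""
    return 2 + secrets.randbelow(order - 2)


def random_quadratic_residue() -> int:
    """Square a random unit modulo n, skipping the trivial residue 1."""
    while True:
        a = 2 + secrets.randbelow(n - 3)
        if math.gcd(a, n) == 1 and pow(a, 2, n) != 1:
            return pow(a, 2, n)


num_attributes = 3  # l, illustrative
S = random_quadratic_residue()
x_Z = random_exponent()
x_R = [random_exponent() for _ in range(num_attributes)]
Z = pow(S, x_Z, n)
R = [pow(S, x, n) for x in x_R]

public_key = {"n": n, "S": S, "Z": Z, "R": R}  # pk_c, to be published
private_key = order                            # sk_c, kept secret
```

Because every published value is a power of the quadratic residue S, each one lies in the subgroup of order p'q', which is what lets the holder of p'q' invert exponents later on.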
C.2.1.2 Proof of Correctness

Using the values computed above, the issuer then generates the following proof that the public key was formed correctly. This proof, together with the public key, is what will be used to validate the credential.

  • Random $X'_Z, X'_{R_1}, \dots, X'_{R_l} \in [2,\ p'q' - 1]$
  • $Z' = S^{X'_Z} \bmod n$
  • $R'_i = S^{X'_{R_i}} \bmod n$, $1 \le i \le l$
  • $c = \mathrm{Hash}(Z \,\|\, R_1 \,\|\, \dots \,\|\, R_l \,\|\, Z' \,\|\, R'_1 \,\|\, \dots \,\|\, R'_l)$
  • $X''_Z = X'_Z + c X_Z$
  • $X''_{R_i} = X'_{R_i} + c X_{R_i}$, $1 \le i \le l$

The credential definition (Cred Def) comprises the public key and the proof of correctness; it is published to the distributed ledger.

C.2.2 Issuing a Credential

With setup complete, we can now issue the credential in a privacy-preserving manner.

C.2.2.1 For Each Credential

For each credential issued, perform the following operations.

C.2.2.1.1 Issuer Computes

A cryptographic accumulator is constructed in order to enable zero-knowledge queries further on. It is a one-way membership function, including the claim in the membership set. The operation can then answer a query as to whether a potential candidate is a member of a set without revealing the individual members of the set.

  • $A_i$ = accumulator index
  • $U_i$ = user index
  • $m_2 = \mathrm{Hash}(A_i \,\|\, U_i)$
  • 256-bit integer representations of each of the attributes: $m_3, \dots, m_l$
  • $n_0$ = nonce
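
One plausible way to obtain a 256-bit integer representation of an attribute is sketched below: small non-negative integers are kept as they are, and everything else is hashed with SHA-256. This encoding rule and the function name are assumptions for illustration, not necessarily the encoding Sovrin uses.

```python
import hashlib


def encode_attribute(value) -> int:
    """Map an attribute to a non-negative integer below 2**256."""
    # Keep genuine integers as-is (bool is excluded even though it subclasses int).
    if isinstance(value, int) and not isinstance(value, bool) and 0 <= value < 2**256:
        return value
    # Hash every other value to a fixed-size 256-bit integer.
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


print(encode_attribute(19900517))
print(encode_attribute("Janet") < 2**256)
```

Keeping a numeric attribute such as a birthdate in integer form, rather than hashing it, is what would allow a later proof to compare it against a threshold.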

C.2.2.1.2 Issuer Sends $n_0$ to Prover

This nonce is provided to the Prover for calculation of the Prover's proof of correctness.

C.2.2.1.3 Prover Receives $n_0$ and Computes the Following

The prover aggregates and prepares public keys for use in validating the signatures. The prover also commits to a chosen value while keeping it temporarily hidden, making the calculation binding.

  • Retrieves Issuer's public key $pk_c$
  • Retrieves Issuer's proof of correctness
  • Generates:
    • $m_1$ = Pedersen commitment of claim link secret
    • Random $v'$, $v''$, $m'_1$
  • $n_1$ = nonce

C.2.2.1.4 Prover Verifies the Issuer's Proof of Correctness
  • $\hat{Z} = Z^{-c} S^{X''_Z} \bmod n$
  • $\hat{R}_i = R_i^{-c} S^{X''_{R_i}} \bmod n$, $1 \le i \le l$
  • Verifies $c = \mathrm{Hash}(Z \,\|\, R_1 \,\|\, \dots \,\|\, R_l \,\|\, \hat{Z} \,\|\, \hat{R}_1 \,\|\, \dots \,\|\, \hat{R}_l)$
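
The issuer's proof of correctness from C.2.1.2 and the prover's check of it can be sketched end to end with toy numbers. The small primes, the fixed base S = 4, and the `hash_to_int` helper (SHA-256 over fixed-width encodings) are assumptions for illustration. The check raises Z and each R_i to the exponent -c, which is what makes the recomputed values equal Z' and R'_i when X'' = X' + cX.

```python
import hashlib
import secrets

# Toy parameters (assumption: real deployments use 1024-bit primes).
p_prime, q_prime = 1019, 1031
n = (2 * p_prime + 1) * (2 * q_prime + 1)
order = p_prime * q_prime


def rand_exp() -> int:
    """Uniform integer in [2, p'q' - 1]."""
    return 2 + secrets.randbelow(order - 2)


def hash_to_int(*values: int) -> int:
    """SHA-256 over the fixed-width concatenation of the inputs, read as an integer."""
    data = b"".join(v.to_bytes(32, "big") for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


# Issuer key material (C.2.1.1) with two attributes.
S = pow(2, 2, n)                      # a fixed quadratic residue, for brevity
x_Z = rand_exp()
x_R = [rand_exp() for _ in range(2)]
Z = pow(S, x_Z, n)
R = [pow(S, x, n) for x in x_R]

# Issuer: proof of correctness (C.2.1.2).
t_Z = rand_exp()                      # X'_Z
t_R = [rand_exp() for _ in x_R]       # X'_{R_i}
Z_t = pow(S, t_Z, n)                  # Z'
R_t = [pow(S, t, n) for t in t_R]     # R'_i
c = hash_to_int(Z, *R, Z_t, *R_t)
resp_Z = t_Z + c * x_Z                # X''_Z
resp_R = [t + c * x for t, x in zip(t_R, x_R)]  # X''_{R_i}


def verify_correctness(c: int, resp_Z: int, resp_R: list) -> bool:
    """Prover's check: rebuild Z' and R'_i from the responses, then re-derive c."""
    # pow() with a negative exponent and a modulus needs Python 3.8 or later.
    Z_hat = pow(Z, -c, n) * pow(S, resp_Z, n) % n
    R_hat = [pow(Ri, -c, n) * pow(S, r, n) % n for Ri, r in zip(R, resp_R)]
    return c == hash_to_int(Z, *R, Z_hat, *R_hat)


print(verify_correctness(c, resp_Z, resp_R))
```

The prover never learns X_Z or the X_{R_i}; it only sees the responses, which are masked by the random X' values.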
C.2.2.1.5 Prover Computes
  • $U = S^{v'} R_1^{m_1} \bmod n$
  • $U' = S^{v''} R_1^{m'_1} \bmod n$
  • $c' = \mathrm{Hash}(U \,\|\, U' \,\|\, n_0)$
  • $\hat{v} = v'' + c' v'$
  • $\hat{m}_1 = m'_1 + c' m_1$

C.2.2.1.6 Prover Sends $P = \{U, c', \hat{v}, \hat{m}_1, n_1\}$ to the Issuer

C.2.2.1.7 Issuer Verifies Prover Setup
  • Computes $\hat{U} = U^{-c'} S^{\hat{v}} R_1^{\hat{m}_1} \bmod n$
  • Verifies $c' = \mathrm{Hash}(U \,\|\, \hat{U} \,\|\, n_0)$
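
The prover's blinded commitment and the issuer's check of it can be sketched the same way. The toy primes, the fixed base S = 4, the bit lengths of the random values, and the `hash_to_int` helper are assumptions for illustration. The challenge used in the issuer's check is the prover's c', since that is the value the final hash comparison is made against.

```python
import hashlib
import secrets

# Toy parameters (assumption: real deployments use 1024-bit primes).
p_prime, q_prime = 1019, 1031
n = (2 * p_prime + 1) * (2 * q_prime + 1)
order = p_prime * q_prime
S = pow(2, 2, n)                                       # fixed quadratic residue
R1 = pow(S, 2 + secrets.randbelow(order - 2), n)      # issuer's base R_1


def hash_to_int(*values: int) -> int:
    """SHA-256 over the fixed-width concatenation of the inputs, read as an integer."""
    data = b"".join(v.to_bytes(32, "big") for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


# Issuer: fresh nonce n_0 sent to the prover.
n0 = secrets.randbits(80)

# Prover (C.2.2.1.3 and C.2.2.1.5).
m1 = secrets.randbits(64)        # stand-in for the link secret commitment m_1
v1 = secrets.randbits(64)        # v'
v2 = secrets.randbits(96)        # v''
m1_t = secrets.randbits(96)      # m'_1
U = pow(S, v1, n) * pow(R1, m1, n) % n
U_t = pow(S, v2, n) * pow(R1, m1_t, n) % n            # U'
c1 = hash_to_int(U, U_t, n0)                          # c'
v_hat = v2 + c1 * v1
m1_hat = m1_t + c1 * m1
n1 = secrets.randbits(80)
P = {"U": U, "c": c1, "v_hat": v_hat, "m1_hat": m1_hat, "n1": n1}


def issuer_verifies(P: dict, n0: int) -> bool:
    """Issuer's check (C.2.2.1.7): rebuild U' and re-derive the challenge c'."""
    # pow() with a negative exponent and a modulus needs Python 3.8 or later.
    U_hat = (
        pow(P["U"], -P["c"], n)
        * pow(S, P["v_hat"], n)
        * pow(R1, P["m1_hat"], n)
        % n
    )
    return P["c"] == hash_to_int(P["U"], U_hat, n0)


print(issuer_verifies(P, n0))
```

Binding the challenge to the issuer's nonce n_0 means a proof recorded in one session does not verify in another.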
C.2.2.1.8 Issuer Signs the Credential by Computing the Following
  • $Q = Z / (U S^{v^*} R_2^{m_2} R_3^{m_3} \cdots R_l^{m_l}) \bmod n$
  • $d = e^{-1} \bmod p'q'$
  • $A = Q^{d} \bmod n$
  • $A' = Q^{r} \bmod n$
  • $c'' = \mathrm{Hash}(Q \,\|\, A \,\|\, A' \,\|\, n_1)$
  • $s_e = (r - c'' e^{-1}) \bmod p'q'$

C.2.2.1.9 Issuer Sends $O = \{A, e, v^*, s_e, c'', m_2, \dots, m_l\}$ to the Prover

C.2.2.1.10 Prover Receives $O$ and Does the Following
Prover Computes
  • $v = v' + v^*$
  • $Q' = Z / (S^{v} R_1^{m_1} R_2^{m_2} R_3^{m_3} \cdots R_l^{m_l}) \bmod n$
  • $d' = c'' + s_e e$
  • $\hat{A} = A^{d'} \bmod n$
Prover Verifies
  • $e$ is prime and $2^{596} \le e \le 2^{596} + 2^{119}$
  • $Q' = A^{e} \bmod n$
  • $c'' = \mathrm{Hash}(Q' \,\|\, A \,\|\, \hat{A} \,\|\, n_1)$
Prover Stores Primary Claim $(\{m_1, \dots, m_l\}, A, e, v)$
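
The signing and verification steps can be sketched with toy numbers under a few stated assumptions. The small primes, the fixed base S = 4, the exponent e = 65537 (which ignores the size range required of e), and the `hash_to_int` helper are illustrative only, and v*, e, and r are taken to be chosen by the issuer. In this reading the prover's Q' includes the factor R_1^{m_1} that is folded into U, and the recomputed value is A raised to d' = c'' + s_e e, so that Q' matches Q and the recomputed value matches A'.

```python
import hashlib
import secrets

# Toy parameters (assumption: real deployments use 1024-bit primes).
p_prime, q_prime = 1019, 1031
n = (2 * p_prime + 1) * (2 * q_prime + 1)
order = p_prime * q_prime                 # issuer private key p'q'
S = pow(2, 2, n)                          # fixed quadratic residue


def rand_exp() -> int:
    """Uniform integer in [2, p'q' - 1]."""
    return 2 + secrets.randbelow(order - 2)


def hash_to_int(*values: int) -> int:
    """SHA-256 over the fixed-width concatenation of the inputs, read as an integer."""
    data = b"".join(v.to_bytes(32, "big") for v in values)
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


# Issuer public key with three attribute bases R_1..R_3.
Z = pow(S, rand_exp(), n)
R = [pow(S, rand_exp(), n) for _ in range(3)]

# Prover's blinded commitment U = S^{v'} R_1^{m_1} and nonce n_1.
m = [secrets.randbits(64) for _ in range(3)]   # m_1 (hidden), m_2, m_3
v1 = secrets.randbits(64)                      # v'
U = pow(S, v1, n) * pow(R[0], m[0], n) % n
n1 = secrets.randbits(80)

# Issuer signs (C.2.2.1.8).
e = 65537                                      # toy prime, coprime to p'q'
v_star = secrets.randbits(64)                  # v*
r = rand_exp()
denom = U * pow(S, v_star, n) % n
for Ri, mi in zip(R[1:], m[1:]):
    denom = denom * pow(Ri, mi, n) % n
Q = Z * pow(denom, -1, n) % n                  # modular inverse needs Python 3.8+
d = pow(e, -1, order)
A = pow(Q, d, n)
A_t = pow(Q, r, n)                             # A'
c2 = hash_to_int(Q, A, A_t, n1)                # c''
s_e = (r - c2 * d) % order

# Prover verifies (C.2.2.1.10).
v = v1 + v_star
denom_p = pow(S, v, n)
for Ri, mi in zip(R, m):
    denom_p = denom_p * pow(Ri, mi, n) % n
Q_p = Z * pow(denom_p, -1, n) % n              # Q'
A_hat = pow(A, c2 + s_e * e, n)                # A^{d'}
valid = Q_p == pow(A, e, n) and c2 == hash_to_int(Q_p, A, A_hat, n1)
print(valid)
```

Only the holder of p'q' can compute the inverse of e, which is why a valid A convinces the prover that the issuer produced the signature.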

C.3 For Additional Information

The crypto used here is originally from [IDENTITY-MIXER].

The Sovrin team shares additional information and working code at the following links.

D. References

D.1 Informative references

[IDENTITY-MIXER]
Identity Mixer. URL: https://www.zurich.ibm.com/identity_mixer/
[VC-CODE]
Verifiable Credentials Code. URL: https://github.com/hyperledger/indy-sdk/blob/master/libindy/src/api/anoncreds.rs
[VC-EXAMPLE]
Verifiable Credentials Example Usage in Python. URL: https://github.com/hyperledger/indy-sdk/blob/master/samples/python/src/anoncreds.py
[WEB-SEQ]
Web Sequence Diagrams. URL: https://www.websequencediagrams.com/