We often share information on the World Wide Web, though some of it is private. The W3C Credentials Community Group focuses on how privacy can be enhanced when attributes are shared electronically. In the course of our work, we have identified three related but distinct privacy-enhancing strategies: "data minimization," "selective disclosure," and "progressive trust." These enhancements are enabled with cryptography. The goal of this paper is to give decision makers, particularly non-technical ones, a nuanced grasp of these enhancements along with some idea of how their enablers work. We describe them below in plain English, but with some rigor. With this knowledge, readers will be better able to recognize when they need privacy enhancements, to select the type of enhancement needed, to assess the techniques that enable those enhancements, and to adopt the correct enhancement for each use case.

Three examples

Three examples of how people would like their privacy preserved in the process of sharing credentials help to illuminate these three techniques.

Diego attempts to use an online service and is asked to share his location in order to prove where he is. Diego hesitates, since the service doesn't need his location every day, everywhere. He knows that the service may share this information with other parties without meaningful consent on his part. Thoughts pass through his mind: What location data does the service actually need? What will it read in the future? Is there a way for him to share his location just this once, or to share only an approximate location?

Selena hands her driver's license to a bouncer to prove she is of drinking age. As he looks it over, she sees him inspecting her date of birth and home address. He only needs to know that she is over 21. Is there a way to disclose that she is indeed old enough without revealing her exact age, her home address, or her city of residence?

Proctor, negotiating with a real estate agent to purchase a home, reveals a letter from his bank stating his credit limit. He wanted to reveal only its approximate amount, but the agent insisted on verifying that the letter was authentic. Proctor feels the agent now has the upper hand in the negotiation, as the letter reveals more than just its authenticity. Could he have revealed only an approximate amount at first, and revealed more details as the negotiations progressed?

Each story features information that is verifiable: a home address, age, or credit limit. We call such information a credential, and a detail of a credential we call an attribute. We have three strategies for enhancing the privacy of digitally shared credential attributes, and each story highlights one. Diego's story highlights the need for "data minimization," Selena's for "selective disclosure," and Proctor's for "progressive trust." Let's examine each one in detail before discussing enablers.

Privacy Enhancements

We propose the following three privacy enhancements. (Sources used to curate these definitions are listed in the Definition Sources section.)

Data Minimization

Data minimization is the act of limiting the amount of shared data strictly to the minimum necessary in order to successfully accomplish a task or goal. There are three types of minimization:

Data minimization is enacted primarily by policy decisions made by stakeholders in the credentials ecosystem:

Data minimization policies impact selective disclosure, the next privacy enhancement.

Selective Disclosure

Selective disclosure is the ability of an individual to granularly decide what information to share. Stakeholders in the credentials ecosystem enable selective disclosure capabilities in the following ways:

Once data minimization policies and selective disclosure are in place, the third and last enhancement can be applied.

Progressive Trust

Progressive trust is the ability of an individual to gradually increase the amount of relevant data revealed as trust is built or value generated.

To enable progressive trust capabilities, stakeholders in the credentials ecosystem act in the following ways:

Crypto Enablers

Implementing privacy enhancements depends on organizational decisions. Determining what data is needed, with an eye towards data minimization, and building a clear model of how data is used over the lifecycle of engagement go a long way towards enabling progressive trust. However, policies are not enough. When enhancing privacy online, some parts of the data must be revealed while others remain concealed. Concealment is achieved mostly by the art of cryptography, from the Greek word "kryptos," meaning hidden, as in a crypt. Crypto (a short word we will use for cryptography) enables us to achieve our goal by means of three primary enablers: having a secret, having a difficult mathematical task, and having zero-knowledge enablers. The children's "Where's Waldo?" illustrated book series helps us to understand these three enablers. In these books a distinctively dressed man, wearing a striped hat, appears only once on each page. Readers are asked to scour the page and locate him. We can understand the three enablers by examining Where's Waldo one step at a time.

Where's Waldo books are drawings, while crypto is built from mathematical equations, basically puzzles based on numbers. We provide the interested reader with a layman's overview in the Basic Crypto Concepts appendix.

Three Solutions

We now return to our opening examples, apply the privacy preserving strategies and enablers described, and describe the improved outcomes.

The online service that Diego uses does an internal policy review and realizes (a) it only needs a location when a user signs up for an account, and (b) it does not need an exact address, only the county district. It changes its interface to request a Verifiable Credential for Diego's location. Diego's system creates this credential for him, which can be inspected to reveal the county district. The crypto to enable this would be similar to that described in . With this data minimization, the online service has less risk of violating data protection rules, is less a target for hacking, and has lower overall costs, while at the same time preserving Diego's privacy.

The bar seeking to verify Selena's age uses selective disclosure as built into the Verifiable Claims system. Selena will no longer share her date of birth. Instead, Selena creates a secret that we harness to craft a crypto-formatted credential. This crypto makes it easy to verify her age, but difficult to determine her exact date of birth. The bouncer's system can perform a zero-knowledge proof to determine that the credential is valid and that Selena is older than twenty-one, without revealing her birthday or her secret. The bouncer sees she is over twenty-one without seeing her date of birth, residence address, or any other unnecessary information. We show the process step by step in the Drinking Age Credential Implementation section.
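
To make the idea of revealing one attribute while concealing the rest concrete, here is a minimal sketch. It deliberately swaps in a much simpler technique than the zero-knowledge proof described above: each attribute is hidden behind a salted hash, and the holder reveals the value and salt only for the attributes she chooses. The function names, the derived `over_21` attribute, and the sample values are all hypothetical, and the issuer's signature over the digest list is omitted.

```python
import hashlib
import json
import secrets


def digest(name: str, value, salt: bytes) -> str:
    """Salted SHA-256 digest of one attribute (name and value)."""
    payload = json.dumps([name, value]).encode("utf-8")
    return hashlib.sha256(salt + payload).hexdigest()


def issue(attributes: dict):
    """Issuer side: one random salt and one digest per attribute.

    In a real system the issuer would sign the digest list; that step
    is left out of this sketch.
    """
    salts = {name: secrets.token_bytes(16) for name in attributes}
    digests = sorted(digest(n, v, salts[n]) for n, v in attributes.items())
    return digests, salts


def disclose(attributes: dict, salts: dict, names: list) -> list:
    """Holder side: reveal value and salt for the chosen attributes only."""
    return [(n, attributes[n], salts[n]) for n in names]


def verify(digests: list, disclosed: list) -> bool:
    """Verifier side: every disclosed attribute must match a known digest."""
    return all(digest(n, v, s) in digests for n, v, s in disclosed)


# Hypothetical sample credential; the values are illustrative only.
attributes = {"name": "Selena", "birthdate": "1990-01-01", "over_21": True}
digests, salts = issue(attributes)
shown = disclose(attributes, salts, ["over_21"])
print(verify(digests, shown))
```

The verifier learns that `over_21` is true and sees only opaque digests for the other attributes. Unlike the zero-knowledge approach, the issuer must decide in advance which derived attributes (such as `over_21`) exist.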

The real estate agency working with Proctor implements a data model specifying what is required at each step of the real estate negotiation. The first step requires only proof of being an account holder in good standing at a known bank, so Proctor does not have to reveal the detailed letter at this point. As their negotiation continues, Proctor reveals more and more information as required. Some steps of the process may share Verifiable Claims encoded with crypto.

Summary

The World Wide Web accelerates the sharing of credentials and other digital interactions, and many regulations have been passed and strategies proposed to protect privacy, some of which require cryptography. To align terminology, the W3C Credentials Community Group has identified three related but distinct privacy-enhancing strategies that create a useful rubric for discussing the challenges and arriving at solutions. We share the examples of Diego, Selena, and Proctor and propose "data minimization," "selective disclosure," and "progressive trust," with accompanying crypto protocols, as useful semantics for accelerating the adoption of digital interaction while protecting privacy.

Definition Sources

This section contains definitions we curated, based on research and oral interviews, to create the definitions of data minimization, selective disclosure and progressive trust.

Data Minimization

Definitions of data minimization that we considered in the formation of our definition above.

Selective Disclosure

Definitions of selective disclosure that we considered in the formation of our definition above.

Progressive Trust

Definitions of progressive trust that we considered in the formation of our definition above. Note that we included definitions of progressive trust and progressive disclosure as well.

Basic Crypto Concepts

This appendix describes basic cryptographic concepts critical to the privacy preserving engineering of credential attributes. For readability, we use the short word, "crypto."

Overview

Crypto is a huge field with highly specialized jargon, too much to cover here. But non-specialists would benefit from some understanding of relevant crypto in order to make informed decisions. We begin with a brief overview of several concepts from number theory that serve as a foundation for all crypto used in this process. This is a curated list of topics progressing from the simple to the more complex. Notice how ideas are re-used and layered as you read on.

Number Theory

Number theory refers to the study of the behavior of integer numbers such as one, three, or two hundred. The following are behaviors of these numbers that make them useful for crypto:

Primary Objectives

The curious behavior of numbers is exploited to achieve four primary crypto objectives.

Ten Crypto Concepts

Over the decades, hundreds if not thousands of crypto protocols, processes, and algorithms have been devised to achieve these objectives by combining the above six behaviors in different ways. We present here a brief tour of the ten most significant ones in our field of verifiable credentials:

Drinking Age Credential Implementation

The birthday of an individual is formatted into a verifiable credential, which can be inspected to reveal the age of the credential holder without revealing their birthdate. The flow described here is based on the developing Verifiable Claims standard of the W3C Credentials Community Group. It uses cryptography developed by Jan Camenisch, as implemented by Sovrin.

This is a work in progress. Note that other types of crypto could be applied to achieve the same privacy preserving goals.

Communication Flow

The flow below may be copied and pasted into the [[WEB-SEQ]] webpage to generate a flow diagram.

title Verifiable credential using Selective Disclosure
participant Valid Time Oracle
participant Janet
participant ID Provider
participant Ledger
participant Bar

note over Janet:Prover
note over Bar:Validator

note over Janet,Bar: Preparation and Setup

note right of ID Provider:Infrastructure
ID Provider->Ledger: Define Schema (Name, Birthdate, Address)
ID Provider->Ledger: credential Definition (Pub Key, etc.)
ID Provider->ID Provider: Generate Prv Key for this credential
ID Provider->Ledger:Revocation Registry

note left of Bar: Prepare to accept credentials
Bar->Bar:Install Agent
Bar->Ledger: Check schema

note over Janet,Bar: Begin Use Case
Janet->ID Provider: Request ID
ID Provider-->Janet: ID will be issued as a digital credential
note right of Janet: Prepare to receive credentials
Janet->Janet: Install Agent
Janet->Janet: Prv Key Generate, Store
Janet->Ledger:Check Schema
Ledger->Janet:credential Definition
Janet-->ID Provider:Proof of Name, Birthdate, Address
Janet->ID Provider: Blinded secret
ID Provider->Janet: credential
Janet->Janet: Validate credential against credential Def

note over Janet,Bar: Janet goes to the bar
note left of Bar: Can Janet Enter?
Bar->Janet: Request Proof of Age
Janet->Valid Time Oracle: Get time
Valid Time Oracle->Janet: Time credential
Janet->Janet:Generate Proof (This person is over 21)
Janet->Bar: Provide Proof
Bar->Bar: Evaluate proof
Bar->Ledger: Verify on Ledger
Ledger->Bar: Verification
Bar->Janet: Come in

note left of Bar: Invite to club

Bar->Janet: Join loyalty club? (requires valid postal code)
Janet->Janet:Generate Proof (postal code)
Janet->Bar: Provide Proof
Bar->Bar: Evaluate proof
Bar->Ledger: Verify on Ledger
Ledger->Bar: Verification
Bar->Janet: Have Loyalty Card
            

Crypto Details

Below are some of the detailed mathematics involved in issuing a verifiable credential as implemented by Sovrin, a non-profit organization dedicated to managing a decentralized, public network for the purposes of self-sovereign identity.

Issuer Setup

The following setup is a necessary precursor to issuing a privacy-preserving credential.

Compute

Perform the mathematical calculations that produce the essential ingredients for the operations we are about to perform. Some of these results, like the private key, are very sensitive and must be kept secret by the issuer; others are to be shared.

  • Random $p'$, $q'$, 1024-bit prime numbers, such that $p = 2p' + 1$ and $q = 2q' + 1$ are both 1024-bit prime numbers.
  • $n = pq$.
  • Random quadratic residue: $S \bmod n$
  • Random $x_Z, x_{R_1}, \ldots, x_{R_\ell} \in [2, p'q' - 1]$, where $\ell$ is the number of attributes in the credential.
  • $Z = S^{x_Z} \bmod n$
  • $R_i = S^{x_{R_i}} \bmod n$, $1 \le i \le \ell$
  • Issuer private key $sk_c = p'q'$
  • Issuer public key $pk_c = \{n, S, Z, R_1, \ldots, R_\ell\}$
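
The setup steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Sovrin implementation: it uses 16-bit primes instead of 1024-bit ones so that simple trial division suffices, and the helper names (`is_prime`, `random_safe_prime`, `issuer_setup`) are hypothetical. The secret exponents are returned alongside $p'q'$ only so that later steps can use them.

```python
import math
import secrets


def is_prime(n: int) -> bool:
    """Trial division; adequate only for the toy sizes used here."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


def random_safe_prime(bits: int):
    """Return (p', p) where p' is a `bits`-bit prime and p = 2p' + 1 is prime."""
    while True:
        candidate = secrets.randbits(bits) | (1 << (bits - 1)) | 1
        if is_prime(candidate) and is_prime(2 * candidate + 1):
            return candidate, 2 * candidate + 1


def issuer_setup(num_attributes: int, bits: int = 16):
    """Toy issuer key generation following the list above."""
    p_prime, p = random_safe_prime(bits)
    q_prime, q = random_safe_prime(bits)
    while q == p:
        q_prime, q = random_safe_prime(bits)
    n = p * q
    order = p_prime * q_prime  # issuer private key sk_c = p'q'

    # Random quadratic residue S mod n: square a random unit.
    while True:
        a = 2 + secrets.randbelow(n - 3)
        S = pow(a, 2, n)
        if math.gcd(a, n) == 1 and S != 1:
            break

    # Secret exponents drawn from [2, p'q' - 1].
    x_Z = 2 + secrets.randbelow(order - 2)
    x_R = [2 + secrets.randbelow(order - 2) for _ in range(num_attributes)]

    public_key = {"n": n, "S": S, "Z": pow(S, x_Z, n),
                  "R": [pow(S, x, n) for x in x_R]}
    private_key = {"order": order, "x_Z": x_Z, "x_R": x_R}
    return public_key, private_key


pk, sk = issuer_setup(num_attributes=3)
print(sorted(pk))
```

Because $S$ is a quadratic residue modulo a product of two safe primes, raising it to the power $p'q'$ gives 1, which is why $p'q'$ acts as the issuer's secret.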

Proof of Correctness

Using the results of the above computations, the issuer then computes the following. This proof, together with the public key, is what will be used to validate the credential.

  • Random $x'_Z, x'_{R_1}, \ldots, x'_{R_\ell} \in [2, p'q' - 1]$
  • $Z' = S^{x'_Z} \bmod n$
  • $R'_i = S^{x'_{R_i}} \bmod n$, $1 \le i \le \ell$
  • $c = \mathrm{Hash}(Z \| R_1 \| \cdots \| R_\ell \| Z' \| R'_1 \| \cdots \| R'_\ell)$
  • $x''_Z = x'_Z + c\, x_Z$
  • $x''_{R_i} = x'_{R_i} + c\, x_{R_i}$, $1 \le i \le \ell$

The credential definition (Cred Def) consists of the public key and the proof of correctness; it is published to the distributed ledger.
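
The proof of correctness can be sketched as follows. This is a toy illustration under stated assumptions: the modulus is built from the tiny safe primes 23 and 47, SHA-256 over fixed-width encodings stands in for the Hash function, and the function names are hypothetical. The verifier recomputes each commitment as $Z^{-c} S^{x''_Z} \bmod n$, which equals $Z'$ because $x''_Z = x'_Z + c\,x_Z$.

```python
import hashlib
import secrets

# Toy public parameters (assumed for illustration only).
N = 23 * 47      # n = pq with p = 2*11 + 1 and q = 2*23 + 1
ORDER = 11 * 23  # p'q', known only to the issuer
S = 4            # a quadratic residue mod n


def hash_to_int(*values: int) -> int:
    """SHA-256 over 32-byte big-endian encodings of the inputs."""
    h = hashlib.sha256()
    for v in values:
        h.update(v.to_bytes(32, "big"))
    return int.from_bytes(h.digest(), "big")


def prove_correctness(x_Z: int, x_R: list):
    """Issuer side: prove knowledge of the exponents behind Z and R_i."""
    Z = pow(S, x_Z, N)
    R = [pow(S, x, N) for x in x_R]
    xp_Z = 2 + secrets.randbelow(ORDER - 2)                 # x'_Z
    xp_R = [2 + secrets.randbelow(ORDER - 2) for _ in x_R]  # x'_{R_i}
    Zp = pow(S, xp_Z, N)                                    # Z'
    Rp = [pow(S, x, N) for x in xp_R]                       # R'_i
    c = hash_to_int(Z, *R, Zp, *Rp)
    xpp_Z = xp_Z + c * x_Z                                  # x''_Z
    xpp_R = [xp + c * x for xp, x in zip(xp_R, x_R)]        # x''_{R_i}
    return (Z, R), (c, xpp_Z, xpp_R)


def verify_correctness(Z: int, R: list, proof) -> bool:
    """Anyone can check the proof using only public values."""
    c, xpp_Z, xpp_R = proof
    Z_hat = pow(Z, -c, N) * pow(S, xpp_Z, N) % N
    R_hat = [pow(Ri, -c, N) * pow(S, xi, N) % N for Ri, xi in zip(R, xpp_R)]
    return c == hash_to_int(Z, *R, Z_hat, *R_hat)


(Z, R), proof = prove_correctness(x_Z=101, x_R=[5, 17, 29])
print(verify_correctness(Z, R, proof))
```

The negative exponent in `pow(Z, -c, N)` computes a modular inverse and requires Python 3.8 or later.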

Issuing a Credential

With setup complete, we can now issue the credential in a privacy-preserving manner.

For Each Credential

For each credential issued, perform the following operations.

Issuer Computes

A cryptographic accumulator is constructed in order to enable zero-knowledge queries further on. It is a one-way membership function that includes the claim in the membership set. It can then answer a query as to whether a candidate is a member of the set without revealing the individual members of the set.

  • $A_i$ = accumulator index
  • $U_i$ = user index
  • $m_2 = \mathrm{Hash}(A_i \| U_i)$
  • 256-bit integer representations of each of the attributes: $m_3, \ldots, m_\ell$
  • $n_0$ = nonce
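
As a small illustration of the encoding steps above, the sketch below turns attribute values into 256-bit integers and derives $m_2$ from the two indices. The encoding rules are assumptions made for this example (small non-negative integers kept as-is, everything else hashed with SHA-256, indices serialized as 8 big-endian bytes); the source does not specify them, and the function names are hypothetical.

```python
import hashlib


def to_256_bit_int(data: bytes) -> int:
    """Interpret a SHA-256 digest as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def encode_attribute(value) -> int:
    """Encode one attribute as an integer below 2**256 (assumed rule)."""
    is_small_int = isinstance(value, int) and not isinstance(value, bool)
    if is_small_int and 0 <= value < 2**32:
        return value
    return to_256_bit_int(str(value).encode("utf-8"))


def context_attribute(accumulator_index: int, user_index: int) -> int:
    """m_2 = Hash(A_i || U_i), each index as 8 big-endian bytes (assumed)."""
    data = accumulator_index.to_bytes(8, "big") + user_index.to_bytes(8, "big")
    return to_256_bit_int(data)


# Attribute names taken from the schema in the communication flow;
# the values are placeholders.
m2 = context_attribute(accumulator_index=1, user_index=7)
encoded = [encode_attribute(v) for v in ("Janet", "placeholder birthdate", 12345)]
print(all(0 <= m < 2**256 for m in [m2, *encoded]))
```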

Issuer Sends $n_0$ to Prover

This nonce is provided to the Prover for calculation of the Prover's proof of correctness.

Prover Receives $n_0$ and Computes the Following

The prover aggregates and prepares public keys for use in validating the signatures. The prover also commits to a chosen value while keeping it temporarily hidden, making the calculation binding.

  • Retrieves Issuer’s public key $pk_c$
  • Retrieves Issuer’s proof of correctness
  • Generates:
    • $m_1$ = Pedersen commitment of claim link secret
    • Random $v'$, $v''$, $m'_1$
  • $n_1$ = nonce

Prover Verifies the Issuer’s Proof of Correctness

  • $\hat{Z} = Z^{-c} S^{x''_Z} \bmod n$
  • $\hat{R}_i = R_i^{-c} S^{x''_{R_i}} \bmod n$, $1 \le i \le \ell$
  • Verifies $c = \mathrm{Hash}(Z \| R_1 \| \cdots \| R_\ell \| \hat{Z} \| \hat{R}_1 \| \cdots \| \hat{R}_\ell)$

Prover Computes

  • $U = S^{v'} R_1^{m_1} \bmod n$
  • $U' = S^{v''} R_1^{m'_1} \bmod n$
  • $c' = \mathrm{Hash}(U \| U' \| n_0)$
  • $\hat{v} = v'' + c' v'$
  • $\hat{m}_1 = m'_1 + c' m_1$

Prover Sends $P = \{U, c', \hat{v}, \hat{m}_1, n_1\}$ to the Issuer

Issuer Verifies Prover Setup

  • Computes $\hat{U} = U^{-c'} S^{\hat{v}} R_1^{\hat{m}_1} \bmod n$
  • Verifies $c' = \mathrm{Hash}(U \| \hat{U} \| n_0)$
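
The blinded-secret exchange above can be sketched as follows. This is a toy illustration under stated assumptions: the modulus and the value of $R_1$ are tiny fixed numbers, SHA-256 stands in for the Hash function, the blinding factors are far too small to hide anything in practice, and the function names are hypothetical. The issuer's check works because $S^{\hat{v}} R_1^{\hat{m}_1} = U' \cdot U^{c'}$.

```python
import hashlib
import secrets

# Toy public parameters (assumed for illustration only).
N = 23 * 47        # n
ORDER = 11 * 23    # p'q', used here only to bound the toy random values
S = 4              # quadratic residue mod n
R1 = pow(S, 5, N)  # R_1 from the issuer's public key


def hash_to_int(*values: int) -> int:
    """SHA-256 over 32-byte big-endian encodings of the inputs."""
    h = hashlib.sha256()
    for v in values:
        h.update(v.to_bytes(32, "big"))
    return int.from_bytes(h.digest(), "big")


def blind_link_secret(m1: int, n0: int):
    """Prover side: commit to m_1 and prove knowledge of it."""
    v1 = secrets.randbelow(ORDER)    # v'
    v2 = secrets.randbelow(ORDER)    # v''
    m1p = secrets.randbelow(ORDER)   # m'_1
    U = pow(S, v1, N) * pow(R1, m1, N) % N
    Up = pow(S, v2, N) * pow(R1, m1p, N) % N
    c1 = hash_to_int(U, Up, n0)      # c'
    message = {
        "U": U,
        "c": c1,
        "v_hat": v2 + c1 * v1,
        "m1_hat": m1p + c1 * m1,
        "n1": secrets.randbits(80),  # prover's nonce n_1
    }
    return message, v1               # v' stays with the prover


def issuer_verify(message: dict, n0: int) -> bool:
    """Issuer side: recompute U-hat and compare challenges."""
    U_hat = (pow(message["U"], -message["c"], N)
             * pow(S, message["v_hat"], N)
             * pow(R1, message["m1_hat"], N)) % N
    return message["c"] == hash_to_int(message["U"], U_hat, n0)


n0 = secrets.randbits(80)            # issuer's nonce
P, v_prime = blind_link_secret(m1=9, n0=n0)
print(issuer_verify(P, n0))
```

The issuer never sees $m_1$ or $v'$, only the commitment $U$ and the responses, yet a proof built against a different nonce is rejected.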

Issuer Signs the Credential by Computing the Following

  • $Q = Z / (U S^{v^*} R_2^{m_2} R_3^{m_3} \cdots R_\ell^{m_\ell}) \bmod n$
  • $d = e^{-1} \bmod p'q'$
  • $A = Q^d \bmod n$
  • $A' = Q^r \bmod n$
  • $c'' = \mathrm{Hash}(Q \| A \| A' \| n_1)$
  • $s_e = (r - c'' e^{-1}) \bmod p'q'$

Issuer Sends $O = \{A, e, v^*, s_e, c'', m_2, \ldots, m_\ell\}$ to the Prover

Prover Receives $O$ and Does the Following

Prover Computes

  • $v = v' + v^*$
  • $Q' = Z / (S^{v} R_1^{m_1} R_2^{m_2} R_3^{m_3} \cdots R_\ell^{m_\ell}) \bmod n$
  • $d' = c'' + s_e e$
  • $\hat{A} = A^{d'} \bmod n$

Prover Verifies

  • $e$ is prime and $2^{596} \le e \le 2^{596} + 2^{119}$
  • $Q' = A^e \bmod n$
  • $c'' = \mathrm{Hash}(Q' \| A \| \hat{A} \| n_1)$

Prover Stores Primary Claim $(\{m_1, \ldots, m_\ell\}, A, e, v)$
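
The signing and verification steps can be sketched end to end as follows. This is a toy illustration under stated assumptions: the parameters are tiny fixed numbers, $e = 7$ is used instead of a prime near $2^{596}$ (so the range check on $e$ is skipped), SHA-256 stands in for the Hash function, and all function names are hypothetical. The prover's $Q'$ includes $R_1^{m_1}$ because $U S^{v^*} = S^{v} R_1^{m_1}$, and $\hat{A} = A^{c'' + s_e e}$ reproduces the issuer's $A'$.

```python
import hashlib
import secrets

# Toy parameters (assumed for illustration only).
N = 23 * 47      # n
ORDER = 11 * 23  # issuer private key p'q'
S = 4            # quadratic residue mod n
Z = pow(S, 101, N)
R = {1: pow(S, 5, N), 2: pow(S, 17, N), 3: pow(S, 29, N)}  # R_1..R_3


def hash_to_int(*values: int) -> int:
    """SHA-256 over 32-byte big-endian encodings of the inputs."""
    h = hashlib.sha256()
    for v in values:
        h.update(v.to_bytes(32, "big"))
    return int.from_bytes(h.digest(), "big")


def attribute_product(attrs: dict) -> int:
    """Product of R_i^{m_i} mod n over the given attributes."""
    product = 1
    for i, m in attrs.items():
        product = product * pow(R[i], m, N) % N
    return product


def issuer_sign(U: int, issuer_attrs: dict, n1: int, e: int = 7) -> dict:
    """Issuer side: sign the blinded commitment U plus attributes m_2..m_l."""
    v_star = secrets.randbelow(ORDER)
    denominator = U * pow(S, v_star, N) * attribute_product(issuer_attrs) % N
    Q = Z * pow(denominator, -1, N) % N
    d = pow(e, -1, ORDER)            # e^{-1} mod p'q'
    A = pow(Q, d, N)
    r = secrets.randbelow(ORDER)
    A_prime = pow(Q, r, N)
    c2 = hash_to_int(Q, A, A_prime, n1)   # c''
    s_e = (r - c2 * d) % ORDER
    return {"A": A, "e": e, "v_star": v_star, "s_e": s_e,
            "c2": c2, "attrs": issuer_attrs}


def prover_verify(sig: dict, m1: int, v_prime: int, n1: int) -> bool:
    """Prover side: unblind and check the signature and its proof."""
    v = v_prime + sig["v_star"]
    all_attrs = {1: m1, **sig["attrs"]}
    denominator = pow(S, v, N) * attribute_product(all_attrs) % N
    Q2 = Z * pow(denominator, -1, N) % N              # Q'
    if Q2 != pow(sig["A"], sig["e"], N):
        return False
    A_hat = pow(sig["A"], sig["c2"] + sig["s_e"] * sig["e"], N)
    return sig["c2"] == hash_to_int(Q2, sig["A"], A_hat, n1)


# The prover's blinded commitment U = S^{v'} R_1^{m_1} from the earlier step.
m1, v_prime, n1 = 9, 13, 555
U = pow(S, v_prime, N) * pow(R[1], m1, N) % N
signature = issuer_sign(U, {2: 33, 3: 44}, n1)
print(prover_verify(signature, m1, v_prime, n1))
```

A prover who substitutes a different link secret $m_1$ obtains a $Q'$ that no longer equals $A^e$, so the check fails.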

For Additional Information

The crypto used here is originally from [[IDENTITY-MIXER]].

The Sovrin team shares additional information and working code at the following links.