Part 3: Privacy Implications and Enhancements in Identity Verification

October 1, 2019

In Part 1 of this series, we explored the merits of modern identity documents, such as the e-Passport (or biometric passport), that include cryptographic features to aid machine verification of the presented document. In Part 2, we discussed how computer vision and deep learning can be used to improve the efficacy of machine-verifying legacy documents (i.e. those without cryptographically verifiable elements) through the use of print-based protection features and machine-readable optical codes such as PDF417. In both posts, the focus was on machine-based techniques that can rigorously evaluate the authenticity of a document presented as proof of identification, in an effort to prevent false documents from being accepted as valid.

In this installment, we’ll discuss the privacy implications that arise once an identification document has been verified to be authentic and its contents have been collected. Further, we’ll introduce the techniques of selective disclosure and data minimization, which make the data collected via identity proofing more actionable while simultaneously reducing the risk that sensitive personal information (PI) will be compromised as an unintended side effect of verifying an individual’s identity.

At the time of identity verification, a veritable treasure trove of personal data may be presented, in full, depending upon which specific government-issued identity document is being verified. These PI elements commonly include:

    • Full legal name
    • Personal identification numbers (e.g. driver’s license number, passport number, national ID number)
    • Residential address
    • Date of birth
    • Gender
    • Document validity dates
    • Nationality
    • Physical traits (e.g. height, weight, hair color)

Many of these elements are intrinsic to the individual and cannot be changed, while others may change, but do so very infrequently. To compound the problem, many of these elements are treated by unsophisticated authentication systems as secrets, known only to the individual. At some point, we’ve all likely encountered systems that grant us access if we know a last name, the last 4 digits of a social security number, and a date of birth that all correspond to an existing record. This practice is known as static knowledge-based authentication (KBA).
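To make the weakness concrete, here is a minimal Python sketch of a static KBA check. The record layout, the names `CustomerRecord` and `static_kba_check`, and the sample values are all hypothetical and do not describe any particular system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical record layout; field names are illustrative only.
    account_id: str
    last_name: str
    ssn_last4: str
    date_of_birth: date


def static_kba_check(records, last_name, ssn_last4, date_of_birth):
    """Return the matching account ID if all three static answers match, else None."""
    for record in records:
        if (
            record.last_name.casefold() == last_name.strip().casefold()
            and record.ssn_last4 == ssn_last4
            and record.date_of_birth == date_of_birth
        ):
            return record.account_id
    return None


records = [CustomerRecord("acct-001", "Rivera", "1234", date(1985, 6, 15))]

# The check cannot tell the true individual from anyone who has read the
# same values off an identity document or a breached data set.
owner_attempt = static_kba_check(records, "Rivera", "1234", date(1985, 6, 15))
impostor_attempt = static_kba_check(records, "rivera ", "1234", date(1985, 6, 15))
```

Both calls return the same account ID, because the check verifies only knowledge of the values, not who is supplying them.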

An improvement on static KBA is dynamic KBA, which incorporates so-called out-of-wallet (OOW) data elements that purportedly could only be known by the true individual. However, given the recent explosion in the number and scope of data breaches, much of this OOW data has been leaked and can be searched and associated with the very same data that can be retrieved from an identity document, which undermines the security of the dynamic KBA approach.

Fortunately, the shortcomings of KBA have become undeniable, and its use in practice is waning as a result. Many such systems remain entrenched, however, and are unlikely to disappear overnight. Furthermore, even when KBA has been fully discontinued as a practice, the collection of raw data from an identity document will continue to be highly undesirable, for reasons that will become clear in the following explanation of emerging privacy legislation.

The sensitivity of the full complement of data on an identity document notwithstanding, an entity relying on the authenticity of an individual’s identity document will likely find most of the information provided on it to be extraneous. In many cases, this relying party (RP) is simply seeking to prove that the individual at the keys corresponds to the party authorized to perform an action that’s being requested, and perhaps meets some other specific requirements.

A real-world example that makes this concept more concrete is an online video streaming subscription service. In this scenario, an individual’s actual name, date of birth, and address may not matter, but it is of prime importance for the video streaming service to know that the individual is an authorized user of the method of payment presented, that they are of legal age to enter into contracts and to view the content being provided, and that the individual’s sales/use tax has been collected in compliance with the laws of the jurisdiction in which they reside.

While all of these conditions can be verified by performing operations directly on the raw name, date of birth, and address available in most forms of identification, all of these data elements are considered PI and are therefore protected under emerging privacy laws like the EU’s General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the New York Privacy Act (NYPA; which is, as of this writing, up for vote but not yet passed into law). Each of these regulations presents considerable penalties for non-compliance.

The National Institute of Standards and Technology (NIST) provides guidance, by way of its Digital Identity Guidelines (NIST 800-63) special publications, for various assurance levels and for minimization of PI, and defines the role of a credential service provider (CSP) to mediate and limit disclosure of sensitive data, so that only actionable information is passed on from the ID resolution/proofing/verification process. Three identity assurance levels (IALs) are prescribed as follows:

    • IAL1 – There is no requirement to link the applicant to a specific real-life identity
    • IAL2 – Evidence supports the real-world existence of the claimed identity and verifies that the applicant is appropriately associated with this real-world identity
    • IAL3 – Physical presence is required for identity proofing
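For implementers, the three levels can be modeled as an ordered enumeration. The sketch below is illustrative only: the names `IAL` and `satisfies` are hypothetical, and it assumes, in line with the NIST guidance quoted further on, that an identity proofed at a higher level may be used for transactions that require a lower one.

```python
from enum import IntEnum


class IAL(IntEnum):
    """Identity assurance levels, ordered from weakest to strongest."""
    IAL1 = 1  # no requirement to link the applicant to a real-life identity
    IAL2 = 2  # evidence supports the real-world existence of the claimed identity
    IAL3 = 3  # physical presence is required for identity proofing


def satisfies(proofed_at: IAL, required: IAL) -> bool:
    """True if an identity proofed at `proofed_at` meets the `required` level."""
    return proofed_at >= required


# An identity proofed at IAL3 can serve an IAL1 transaction, but not the reverse.
high_for_low = satisfies(IAL.IAL3, IAL.IAL1)
low_for_high = satisfies(IAL.IAL1, IAL.IAL2)
```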

With IAL2 and IAL3, pseudonymity (the unique identification of an individual without specific knowledge of strongly identifying characteristics) is enabled by limiting the number of attributes sent from the CSP to the RP, or by deriving from them information that is specific to the RP’s use case. Section 2.2 of NIST 800-63A, on enrollment and identity proofing, gives the following example of data minimization and selective disclosure of identifying attributes:

[I]f the RP needs a valid birthdate, but no other personal details, the RP should leverage a CSP to only request the birthdate of the subscriber. Wherever possible, the RP should ask the CSP for an attribute reference. For example, if the RP needs to know whether a claimant is older than 18 years old, they should request a boolean value, not the entire birthdate, to evaluate age. Conversely, it may be beneficial to the user that leverages a high assurance CSP for transactions at lower assurance levels. For example, a user may maintain an IAL3 identity, yet should be able to use their CSP for IAL2 and IAL1 transactions.

Therefore, best practice is for the CSP to minimize the data shared with the RP, or to provide derivative attributes (e.g. the boolean value for “older than 18 years old”) in place of overly broad and potentially sensitive PI values. This practice greatly reduces the burden on the RP of properly handling sensitive PI it does not even require, and it provides identity proofing data that are far more relevant to the use case and thus more actionable by the RP.
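As a rough illustration of a derivative attribute, the Python sketch below shows a CSP answering an RP's request with a boolean rather than the birthdate itself. Everything here is an assumption made for the example: the names `VerifiedIdentity`, `age_on`, and `derive_claims`, the claim name `age_over_18`, and the reading of “older than 18” as having reached one's 18th birthday.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class VerifiedIdentity:
    # Raw PI held by the CSP only; it is never returned to the RP.
    full_name: str
    date_of_birth: date
    residential_address: str


def age_on(date_of_birth: date, today: date) -> int:
    """Whole years of age on `today` (a Feb 29 birthday counts from Mar 1 in non-leap years)."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


def derive_claims(identity: VerifiedIdentity, requested, today: date) -> dict:
    """Return only the derived attributes the RP asked for, never raw PI."""
    available = {
        # Assumption: "older than 18" means the 18th birthday has been reached.
        "age_over_18": lambda: age_on(identity.date_of_birth, today) >= 18,
    }
    # Requests for anything not in `available` (e.g. a raw birthdate) are dropped.
    return {name: available[name]() for name in requested if name in available}


identity = VerifiedIdentity("Jane Q. Public", date(2000, 3, 1), "123 Main St, Springfield")
claims = derive_claims(identity, ["age_over_18"], today=date(2019, 10, 1))
```

In this sketch the RP receives `{"age_over_18": True}` and nothing else, so it has no name, birthdate, or address to store or protect.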

The bottom line is that the very mechanisms that make it easier to verify an individual’s identity are also the ones that raise privacy issues. One can only imagine what the future of identity verification might look like, given the privacy implications of chipped identification documents, if all 50 U.S. states and every federal entity coordinated efforts to issue RFID documents for remote verification purposes. While a coordinated effort of this magnitude is not a reality today, there is enough reason to believe it could happen in the not-so-distant future.

Organizations that collect PI to perform remote verifications have a duty to demonstrate compliance with emerging global data protection regulations. They are more likely to achieve this by combining artificial intelligence technologies, such as computer vision and machine learning, with best practices for selective disclosure of data, so that identity documents can be leveraged to provide high assurance with just the right actionable data for each specific use case.


Read other blogs in this series:

Part 1: Remote Identity Verification – Cryptography vs. Artificial Intelligence

Part 2: Computer Vision Alternatives to Cryptographic Verification
