Why is this information sensitive? The deeper Equifax problem
by Jeffrey Goldberg on
As the world now knows Equifax, the credit rating company and master of our fates, suffered a data breach in May and June 2017, which revealed to criminals details of 143 million people.
(I would have liked to say, “143 million customers", but that is very far from the case. We have no control at all over Equifax and other credit rating companies collecting information about us. We are neither their customers nor users.)
The revealed data includes:
There are many important things to ask about this incident, but what I am focusing on today is why has non-secret information become sensitive? None of those numbers were designed to be used as secrets (including social security numbers and credit card numbers), yet we live in a world in which we have to keep these secret. What is going on here?
Names only provide a first pass at identifying individuals in some list or database. There are a lot of Jeffrey Goldbergs out there. (For example, I am not the journalist and now editor-in-chief at the Atlantic. But there are lots of others that I also am not.) Also people change their names. Some people change their name when they get married. (My wife, Lívia Markóczy, decided to keep her name because we figure it is easier to spell than “Goldberg”.) Others change their names for other reasons. We have three “Jeffreys” at AgileBits, but fortunately we have distinct family names. Though sometimes I think that everyone who joins the company should just go by “Jeffrey” to avoid confusion. Anyway, names alone are not enough to figure out who we are talking about once we get beyond a small group of people. So we use other things. Social security numbers worked well in the US for some time. They didn’t change over your lifetime (except in rare circumstances) and nearly everyone had one. Dates of birth also don’t change. So a combination of a name, a date of birth, and a social security number was a good way to create an identifier for nearly every individual in the US, with the understanding that a name might change.
Sometimes it is not a person that we need to uniquely and reliably identify. Sometimes it is something like a bank account or charge account. Cheques (remember writing those?) have the account number printed on them. They uniquely identify the particular account within a bank, and a routing number (in the US) identifies the bank. The routing number is also printed on each cheque. Things like social security numbers and driver’s license numbers are designed as “identifiers” of people. They are ways to know which Jeffrey Goldberg is which. Occasionally getting email meant for the journalist is no big problem, but if he gets himself on the no-fly list, I want to be sure that I don’t get caught up in that net. Likewise, I don’t want my doctor or pharmacist mixing me up with some other Jeffrey Goldberg who isn’t allergic to the same stuff that I am. Nor does some other Jeffrey Goldberg want the record of speeding tickets I seem to acquire. Things like bank or charge account numbers are used to uniquely and reliably identify the particular account. While I wouldn’t mind if my credit card charges were charged against someone else’s account, they would certainly mind, and so would the the relevant bank. (I’m going to just start using the word “bank” broadly to include credit card issuers, automobile loan issuers, and the like.) A username on some system is also an identifier. It identifies to the service which particular user or account is being talked about. I am jpgoldberg on our discussion forums. That username is how the system knows what permissions I have and how to verify my password.
Something that is designed and used as an identifier is hard to keep secret. A service can hash a password, but it needs to know which account is being talked about before it can look up any information. In many database systems, identifiers are used as record locators. These need to be efficiently searchable for lookup. Identifiers also need to be communicated before secret stuff can happen. Bank account numbers are printed on cheques for a reason. Now really clever cryptographic protocols – like the one behind Zero Cash – can allow for transactions which don’t reveal the account identifier of the parties, but for almost everything else, account identifiers are not secret. Identifiers are hard to change. If you depend on the secrecy of some identifier for your security, then you are stuck with a problem when those secrets do get compromised. It is a pain to get a new credit card number, and it is far worse trying to get a new social security number. Getting a new date of birth might also be a teeny tiny problem. The point here is that, given what identifiers are designed to do, they aren’t designed to be kept secret.
Authentication is the process of proving some identity. And this almost always involves proving that you have access to a secret that only you should have access to. When I use 1Password to fill in my username (jpgoldberg) and password to our discussion forums, I am proving to the system that I have access to the secret (the password) associated with that particular account. The password is designed to be kept secret. The server running the discussion forum doesn’t need to search to find the password (unlike searching to do a lookup from my username), so it can get away with storing a salted hash of the password. Also, I can change the password without losing all of the stuff that lives under my account. (Changing my username would require more work.) Plus, my username is used to identify me to other people using the system, and so is made very public. My password, on the other hand, is not.
The mess we are in today is because financial institutions have been using knowledge of identifiers as authentication secrets. The fact that someone can defraud a credit card issuer by knowing my credit card number (an account number) and my name and address (matters of public record) is all because at one point, credit card issuers decided that knowledge of the credit card number (a non-secret account number) was good way to authenticate. I have not researched the history in detail, but I believe that this started with credit card numbers when telephone shopping first became a thing (early 1970s, I believe). Prior to then, credit cards were always used when the account holder was physically present and could show the merchant an ID with a signature. The credit card number was used solely as designed up until that point: as a record locator. The same thing is true of social security numbers. Social security numbers were not secret until banks started to use knowledge of them as authentication proofs when they introduced telephone banking. Before then, there was nothing secret about them.
Because high-value systems use knowledge of identifiers as authentication proofs we are in deep doo-doo. And it will take a long time to dig ourselves out. But we continue to dig ourselves deeper.
It is fine to be asked for non-secret identifying information to help someone or something figure out who they are talking about. I like it when my doctor asks for my date of birth to make sure that they are looking at and updating the right records. But when they won’t reveal certain information to me unless I give them my date of birth, then we have a problem. That is when they start using knowledge of an identifier as an authentication secret.
Over the past decade or so, various institutions have been told that they can’t hold on to social security numbers, and so can’t use them for identifiers. That is a pity, because those are the best identifiers we have in the US. But what is worse is that knowledge of the new identifiers is being used for authentication.
Right now, Baskin-Robbins knows my date of birth (so they can offer me some free ice-cream on my birthday). In ten years, will I have to keep my birth date a closely guarded secret so that I don’t become a victim of some financial or medical records crime? If we keep on making this mistake – using identifiers as authentication secrets – that is where we are headed.
I do not want to dismiss the technological hurdles in fixing this problem, but I believe that there is a bigger (and harder) problem that will need to be fixed first: the incentives are in the wrong place.
When Fraudster Freddy gets a loan from Bank Bertha using the identity of Victim Victor, Bertha is (correctly) responsible for the direct financial loss. The problem is that there are costs beyond the immediate fraudulent loan that are borne by Victor. But Victor has no capacity or opportunity to prevent himself from being a victim. In economics jargon, Victor suffers a negative externality.
Bertha factors in the risk of the direct cost to her of issuing a loan to a fraudster. She looks at that risk when deciding how thoroughly to check that Freddy is who he says he is. Bertha could insist that new customers submit notarized documents, but if she insists on that and her competitors don’t, then she would lose business to those competitors.
But Bertha does not factor in the indirect costs to Victor. She has no dealings with Victor. Victor isn’t a potential customer. So if Victor has costly damage to his credit and reputation that requires a lot of effort to sort out, that is not Bertha’s problem (and it certainly isn’t Freddy’s problem.)
Only when Freddy and Bertha (the parties to the original deal) have to pay the cost of the damage done to Victor (Economics jargon: “internalizing the externalities”) will Bertha have the incentives to improve authentication. I don’t have an answer to how we get there from here, but that is the direction we need to head. In the meantime, if you find yourself a victim (whether you’re a Victor, a Jeffrey, or something else entirely).