A breach by another name? There is probably more coming. We need to prepare.


Imagine a data scientist working at Facebook. Let’s call her Alice. As part of her assignment, Alice collects a couple hundred thousand Facebook users’ profiles and stores them on her laptop. The data contains not just what users entered into Facebook, but what Facebook gathered and inferred about them. Alice is excited. The users whose data is being used largely trust Facebook to be a good custodian of this data.

But Alice’s boyfriend, Bob, has another idea. He knows that the user data Alice has on her laptop can be sold to some data broker. He’s been unhappy with where their relationship has been going anyway, with his startup going through down rounds while Facebook stock just keeps going up and up.

So Bob gets going; he borrows Alice’s laptop and downloads the user data to his computer. And much to Bob’s surprise, he realizes he can not only get all this user data, but also make a few API requests via some app he builds and collect all those people’s friends’ data too.

Then Bob goes online, takes the name of a Bond villain, and sells this data to some sketchy character, Guccifer 1.7 or whatever.

Obviously that’s not what happened. But it’s also oddly not too far off what happened. It wasn’t Alice, a Facebook employee, who collected the data, but a psychologist, Dr. Kogan, who built an app that operates on Facebook’s platform. And it was the same third party that also expanded the data set to include unsuspecting friends, and then sold it to Cambridge Analytica. This is already pretty horrible. But to add insult to injury, Facebook hired the psychologist’s coworker who was involved in the transaction. If that doesn’t sound made up, here’s more. Kogan, who used to list altruism as one of his research interests on his faculty website, at some point changed his name to Dr. Spectre. Committed to the bit, he is.

Facebook figured out what happened and asked everyone pretty please to delete all this data that didn’t belong to them (it belonged to the psychologist, I guess it argued). But of course they didn’t. Why would they? They told Facebook they did, though. Then a whistleblower blew the…whistle. When Facebook found out, they first cooperated with the whistleblower but then kicked him off the platform, probably correctly all things considered. Also correctly, Facebook cut off Cambridge Analytica’s access to Facebook. Less correctly, probably, they announced this on a Friday night, at 9 PM, a time slot customarily saved for news you hope you can bury over the weekend. And even less correctly, Facebook then threatened to sue journalists for reporting on this.

I made up the admittedly belabored Alice and Bob story to illustrate a point though. Facebook has gone out of their way to call this a fraud, as opposed to a data breach. It makes sense, if you are Facebook, to be the victim here, primarily because breach has a specific legal meaning (I am not a lawyer). So calling it a breach would not only mean Facebook would have to notify the affected users, but also probably open a huge can of legal worms, all the way from the FTC in the US to various EU bodies.

If you are one of the users whose data was exposed though, does it matter? You could argue that the small set of users consented to the app collecting their data. It is technically correct, but the informed consent framework is problematic. Certainly none of those users knew their data would be changing hands. Did any of them even understand the app wasn’t a Facebook feature but a third party app? My guess is, no. Most users probably do not even know Facebook or a third party could infer their personality traits and keep them in a database. How comfortable would you feel doing anything on Facebook if you knew your activity could be linked to your sexual preferences? Yet!

And of course, the expanded data set. The unsuspecting set of friends. Facebook closed this API back in the day, because it was a scarily open API that allowed you to scoop up a given user’s friends’ data without anyone knowing, but also probably because it was an API that allowed a really zealous app developer to duplicate Facebook’s friend graph with high fidelity.

So yes, Facebook is a victim. But the real victims are the users. And it’s hard to claim Facebook isn’t at fault here, to some degree. The initial set of users consented to their data being scooped up, but definitely didn’t allow it to be shared this way. The expanded set of users probably gave their consent by virtue of being a Facebook user and accepting its terms, but again, certainly didn’t allow anyone else to have it.

And Facebook clearly knew something went awry as early as 2015, but never told the affected users what happened. This doesn’t bode well. But what I am more curious about is this: does Facebook even know of all the cases where this happened? The APIs existed for a few years, and I for one know many apps that used them liberally. We are only having this conversation, seemingly, because a whistleblower couldn’t take it anymore and some brave journalists weren’t deterred by legal threats.

How many other app developers ended up with such a trove of data and decided to sell it to even sketchier people? Millions of people connected to thousands (if not millions) of apps that were similarly able to siphon this information out of Facebook for the many years such APIs existed. Do we know where this data is now? And you can even think slightly more nefariously. Are there any other cases where Facebook found out about the leak but was able to successfully suppress it? We don’t know. We probably should. We need to have rules and regulations globally to allow us to do this, and social media companies need to be more cognizant.

This isn’t to say Facebook is an unethical company that would go out of its way to facilitate espionage and hide it, or turn a blind eye to its most valuable asset changing hands on sketchy parts of the web. But Facebook is a for-profit company that makes decisions within a certain framework where they can be disturbingly haphazard with such a huge amount of user data.

There are many problems with Facebook that I want to expand on. I personally think a single entity that governs two billion users’ media diet and communication channels is scary, firstly due to its size. We expect just a bunch of people mostly living in a suburb in Northern California to safeguard US elections in 2018, not cause ethnic cleansing in Myanmar, not help Duterte kill off people he doesn’t like in the Philippines, not kill off media companies by mistake in Serbia, and a few more.

And lack of competition is a problem too; users of Facebook are locked into a network they can’t easily get out of. And Facebook goes out of their way not only to buy companies that threaten their business model, but even to buy companies that allow them to spy on users to see what **could** threaten their business. This is untenable. And users are not the only ones paying the price; publishers who compete for the same dwindling ad money have very little leverage over Facebook for social media traffic.

In the end, however, the real losers are the users. The data that got leaked was detailed and personal; it included personality traits that were used for political targeting.

Social media companies with troves of private data are able to create vast amounts of wealth for themselves and provide free services to their users; it’s clear such data isn’t without commercial value. But how much of that value is extracted against users’ wishes, and what are the costs of misusing it? And is there a point where such a giant bag of private data takes on a life of its own, such that no single entity can really keep it safe? Facebook has more than 2 billion users; a breach of 50M users’ data represents less than 3% of them. History suggests there’ll be more.