“In God we trust. All others must bring data.” (W. Edwards Deming, Statistician)
“It’s difficult to imagine the power that you’re going to have when so many different sorts of data are available.” (Tim Berners-Lee, father of the World Wide Web)
I know … everybody is talking about this subject. All available media, online and offline, hungrily targeted this topic last week, as Facebook’s CEO testified before Congress in an almost pitiful attempt to provide a reasonable explanation of how this was even possible.
This blog post will try to summarize the timeline of the events that led to the scandal and analyze the impact that such incidents can have on us.
CNBC has a very well documented article on the Facebook/Cambridge Analytica (CA) scandal. More or less the facts are the following:
- In April 2010, Facebook announced the launch of a platform called Open Graph that allowed third-party apps to reach out to Facebook users and request permission to access a large chunk of their personal data and their friends’ personal data.
- In 2013, Aleksandr Kogan created an app called “thisisyourdigitallife”, prompting users to answer questions for a psychological profile. Kogan ended up having access to the data of millions of Facebook profiles.
- In 2014, Facebook updated its rules to limit a developer’s access to the data of a user’s friends without first asking permission.
- In late 2015, Cambridge Analytica was helping Ted Cruz’s presidential campaign, using psychological data based on research spanning tens of millions of Facebook users (at that point Cruz was still competing with Donald Trump). In response, Facebook banned Kogan’s app and pressured both Kogan and Cambridge Analytica (CA) to remove all of the data. Apparently they certified that the data had been deleted.
- March 2018, whistleblower Christopher Wylie (a co-founder of CA) reported that 50 million Facebook profiles (later revised to 87 million) were harvested by Cambridge Analytica, then used to develop “psychographic” profiles of people and deliver pro-Trump material to them online. CA has denied everything.
- March 20, the Federal Trade Commission (FTC) opened an investigation into possible violations of US privacy regulations. Zuckerberg is urged to testify before Congress. A similar request comes from the UK Parliament, but Facebook’s CEO cannot make it and will send someone else instead.
- March 21, Zuckerberg published a post regarding the incident. Among other things, he said: “We have a responsibility to protect your data, and if we can’t then we don’t deserve to serve you. I’ve been working to understand exactly what happened and how to make sure this doesn’t happen again.”
- March 25, Zuckerberg apologised for a “breach of trust” with full-page ads in several British and American newspapers: “I’m sorry we didn’t do more at the time. We’re now taking steps to ensure this doesn’t happen again”.
- April 10, joint hearing of the US Senate Judiciary and Commerce committees.
- April 12, House Energy and Commerce Committee hearing.
Now, just by reading the timeline above, several questions come to mind:
- How was it possible that Facebook was so careless with its users’ data? Was it just a mistake overlooked in the frenzy that characterizes online businesses? Or is it an accepted practice that ensures a continuous cash flow towards the internet giants?
- How many other apps have benefited from the same faulty rules and downloaded millions of records across the world? 2010 to 2014 was a lot of time during which abuses could have happened. God knows how many other apps acted in the same way.
- How can a company like Facebook have such inadequate procedures for data breach response and follow-up? How did they make sure the compromised data was deleted? Was it through email confirmation?
- Is it just me, or does Zuckerberg seem to treat the world differently?
Actually, if you think about it, the real question is whether Facebook consciously allowed privacy violations within the network in order to grow and secure its cash flow, or whether it was simply careless, minding only the aspects it considered important. As you have probably noticed while browsing online or checking your Spam folder, attempts to collect personal data are many and they work pretty well. The whole Internet is fed by data that, in turn, feeds the advertisers who make it possible for you to have other types of content at your fingertips.
In a rather unconvincing attempt to explain what happened, Zuckerberg addressed the US Senate and the House Energy and Commerce Committee between the 10th and the 12th of April 2018. The hearings lasted several hours and provided even more insight into the scandal, but also left a lot of questions unanswered. You can find the links to the hearings above.
While we focus on the Facebook part of the scandal, there is also CA, the company that took advantage of what may be a widespread industry practice and applied some Machiavellian tactics to gain an economic advantage.
The fact that politicians try to influence election results is not new. A very good insight on the topic is provided in an article (in Romanian) by Oxford University Professor Corneliu Bjola.
According to Prof. Bjola, classical methods of political influence rely on controlling or manipulating the context. More concretely, a technique called framing is used when presenting the real situation to the public. For example, if you want to buy a car, the salesman will try to highlight performance-related characteristics if you are a man (frame 1), or aesthetic aspects if you are a woman (frame 2). Both points of view are correct, as each presents true facts, but the way the context is built matters a great deal. What the buyer must do is step out of the context and ask additional questions to get the overall picture, which will help in making the best decision. Relying only on gut feeling or intuition might not be the best way to make decisions. But, according to science, most of us accept gut decisions, which are in many cases influenced by techniques such as framing.
What CA did in this case was to target certain user profiles, framing its messages with powerful emotional content that could have influenced their voting decisions. The technique is not new; it has been used throughout history. But what made this case exceptionally interesting, and what turned it into a full scandal, was the ease of access to millions of users and the casualness with which their psychological profiles were built. What we don’t know though, again according to Prof. Bjola, is whether these efforts were successful or not.
What makes this incident a worldwide scandal is that it proved how much power data controllers and processors, to use GDPR-friendly terms, can have nowadays. This is clearly a case where Facebook would have been fined under the EU’s new General Data Protection Regulation had it happened after the 25th of May.
This scandal is a turning point, I might even say ground zero, for the next course of action on privacy worldwide. Data becomes crucial with new technological developments such as big data, artificial intelligence and IoT. None of the technologies that dominate the end of this decade can function properly without being fed massive amounts of data. No machine learning algorithm can learn without data. Data has become an expensive currency and must be protected properly. As Pedro Domingos mentions in his book “The Master Algorithm”, we will probably need some kind of data banks that will protect our data against abusive access.
There are also other ideas out there on what the future might bring us after this scandal. One idea that I find particularly interesting is “self-sovereign identity – open standards-based infrastructure that centers on people”, as described in this article. The main point is that we should not rely on platforms such as Facebook to provide us with an online identity; instead, we should start developing open-source standards and protocols for identification that different platforms can later use for login. Take for example the TCP/IP protocol stack, which underpins most online communication today. TCP/IP would not have been such a success if it had been owned by a single player in the market.
As you have probably seen, many internet giants have recently started updating their privacy policies due to the new requirements of the EU General Data Protection Regulation (GDPR). I am also pretty sure that, following the CA scandal, they are now looking into their data and who has access to it. If such a disclosure were to happen after the 25th of May 2018 (the date on which the regulation starts to apply), there would be some serious fines for the responsible party. Our takeaway from this should be that data has become a very valuable asset. Whole industries depend almost entirely on data, so we might want to be more responsible in this area.