Privacy: an anomaly soon to be corrected

Vinton Cerf in 2010 by Вени Марковски Veni Markovski, picture under licence CC BY 3.0 via Wikimedia Commons.

Vinton Cerf played an essential role in modern communication systems: he is one of the inventors of the TCP/IP protocol suite, which is the basis of the Internet. He now works for Google, whose parent company recently became Alphabet. On November 20th, 2013, he stated that ‘privacy could be an anomaly.’ Rest assured, this anomaly is about to be corrected …

It appears that his point was to warn about the difficulty of protecting privacy. Indeed, in an interview given to the French magazine La Recherche in February 2016 [1], he affirmed the importance of strong authentication and related security techniques to ensure that personal data does not fall into the wrong hands. Nowadays, however, the protection of privacy is in jeopardy.

Actually, the privacy issue is not really a new one. As indicated, the quote from Vinton Cerf given in the first paragraph of this article dates from 2013. Edward Snowden’s revelations also began in 2013 and show that the NSA is likely to interfere in the private life of any Internet user, regardless of her or his nationality, but especially if she or he is not on US territory. Before that, WikiLeaks began operations in 2006, highlighting, among other things, violations of privacy. Even earlier, ECHELON was revealed in 1988 and has since expanded to the Internet.

However, several recent news items have revived interest in these matters. For instance, issue 501–502 of La Recherche, dated July and August 2015, features an interview on the subject with Yves-Alexandre de Montjoye, a researcher in computational science at MIT [2]. Once again, serious questions arise concerning privacy in the new version of Microsoft Windows. In France, there is of course the recent adoption of the Bill on intelligence, which prompted five articles on this blog. Let us examine these issues before trying to give some solutions.

Meta-data have more to say than data themselves

Refraining from posting photos of an evening when you were drunk is not enough. Whenever you browse the Internet, each service you use produces meta-data. These meta-data say a great deal about the habits and personality of the user. And you do not produce meta-data only on the Internet.

Everyday acts can be used to track you

Malte Spitz in 2011 – picture by BÜNDNIS 90/DIE GRÜNEN under licence CC BY 3.0 via Wikimedia Commons.

In 2011, German Green Party politician Malte Spitz managed to obtain from his phone company all the data it held about him. With the help of the German newspaper Die Zeit, he put online a tool that traces not only all of his phone calls, but also all of his movements between September 2009 and February 2010. Indeed, even a mobile phone that is not equipped with GPS allows its user to be located quite precisely, based on the cell sites to which it connects.

When an individual uses a credit card, her or his bank records four pieces of information: an identifier for the client, as well as the date, the place and the price of the transaction. Note that such a record contains neither the first nor the last name of the client, only an identifier. This procedure is called anonymization (strictly speaking, it is pseudonymization). However, it does not guarantee anonymity: Yves-Alexandre de Montjoye and his collaborators calculated that, knowing nothing but the day and the place of a few transactions (four, in their study), an individual can be re-identified in 90 % of cases [3].

Still according to them, this problem does not only affect bank data. It arises with any meta-data, and four data points are enough to defeat anonymization performed according to the procedure I described above.
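To make the idea of re-identification concrete, here is a minimal sketch in Python. It is not the method of the cited study: the records, the identifiers and the function names are all hypothetical, and the data set is tiny. It only illustrates the principle that someone who knows a few (day, place) points about a person can single out that person's record, even though the record carries an identifier instead of a name.

```python
import random

# Hypothetical pseudonymized records: identifier -> set of (day, place) points.
traces = {
    "id_001": {("mon", "bakery"), ("tue", "station"), ("wed", "cinema")},
    "id_002": {("mon", "bakery"), ("tue", "station"), ("thu", "gym")},
    "id_003": {("mon", "bakery"), ("wed", "cinema"), ("fri", "market")},
}


def is_unique(traces, user, known_points):
    """Return True if `user` is the only record containing all known points."""
    matches = [u for u, points in traces.items() if known_points <= points]
    return matches == [user]


def unicity(traces, p, seed=0):
    """Fraction of users singled out by p points drawn at random from their own trace."""
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        known = set(rng.sample(sorted(points), min(p, len(points))))
        if is_unique(traces, user, known):
            unique += 1
    return unique / len(traces)


# One point is shared by everybody, so it identifies nobody.
print(is_unique(traces, "id_001", {("mon", "bakery")}))
# Two well-chosen points are enough to single out one record.
print(is_unique(traces, "id_001", {("tue", "station"), ("wed", "cinema")}))
```

With real data, the fraction returned by the hypothetical `unicity` function would be estimated over many users and many random draws; the sketch only shows why replacing a name with an identifier does not, by itself, protect anyone.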

Meta-data collected by smart-phones

Yves-Alexandre de Montjoye – picture by Bryce Vickmark.

With meta-data obtained using a smart-phone, it is possible to go further than simply identifying the presence of a person in a given location. Indeed, Yves-Alexandre de Montjoye and his collaborators have shown that meta-data can be used to determine personality traits of the user [4].

The researchers considered five major personality traits: openness to experience, self-discipline, extroversion, agreeableness and emotional instability. Each trait is assigned a level (low, medium or high), which makes it possible to summarize a personality. On this basis, using meta-data from smart-phones owned by several volunteers, the researchers developed an algorithm. This algorithm can predict the personality of a smart-phone user fairly accurately by analysing the meta-data the phone contains.

Also, by using nothing but meta-data collected with a smart-phone, Jonathan Mayer and his collaborators have shown that one can obtain highly personal information such as health problems [5]. Moreover, taking the NSA surveillance program as an example, they assumed that this agency follows the individuals who contact those who have directly contacted an individual under surveillance. Under this assumption, the authors estimated that, starting from a single individual, the agency ends up following 25,000 individuals or more.

A desire to collect data on all Internet users

Some of the elements presented in this article come from Max Schrems’ work – picture by Eleleleven under licence CC BY-SA 2.0 via Wikimedia Commons.

However, we also produce a great deal of meta-data when using certain services on the Internet, notably Facebook, Twitter, Instagram and the various Alphabet services (Alphabet being the new parent company of Google, as you will recall). To think that protecting your privacy only requires paying attention to what you publish is to misjudge the real threats.

In the end, what you actually publish does not provide much information about your private life. In contrast, the people with whom you share, the items on which you clicked “I like” (or “I do not like”), location information and your date of birth speak volumes about you. How frequently you update your status is also very revealing. All of this says a lot about your tastes, your habits, your needs, and so on. It can lead to a fairly accurate psychological profile and even makes it possible to anticipate what you might do.

It is interesting to note that Facebook also collects as much data as possible about people who are not registered with its services. The Facebook button present on many websites installs a cookie on the user’s computer, even if she or he does not actually click the button [6]. This cookie allows Facebook to collect information about the user’s browsing. Therefore, to preserve the privacy of the few people who read these web pages, my website does not have such a button.

By any means

In any case, Facebook has other ways to collect data about people who are not among its users. Through the address book import function it offers its users, it collects all the email addresses they hold. For those that do not match an existing profile, it builds ghost profiles that are fed by cross-checking.

Moreover, its servers keep all the date and location data collected by its mobile application. And you need to keep in mind that nothing that is published or collected is ever deleted: at most, it is hidden from public view while Facebook internally retains full access. It is quite simple: for Facebook, privacy simply does not exist.

It is therefore established that Facebook collects data about me, even though I am not one of its users. Consequently, in accordance with European Directive 95/46/EC, I wanted to contact the company and ask it to make this data available to me. I have not found a way to do so without being connected to its services, that is, without a Facebook profile. I am not certain this is legal …

Facebook knows more about you than you know yourself

At this point, you may be thinking: “Facebook knows I love kittens, big deal!” However, Facebook knows much more.

Michal Kosinski, David Stillwell, and Thore Graepel used solely the “likes” from 58,000 US Facebook accounts [7]. Using nothing but these, they were able to infer whether an account belonged to a Caucasian (white) or an African American (black) user with an accuracy of 95 %, the gender with 93 % and the sexual orientation with 88 %, but also the use of narcotics with 65 % and whether the user’s parents were still a couple when she or he was 21 with 60 %, among other features that intuitively do not seem inferable from a simple “like.”

Now, as we have seen, Facebook has much more information than just the “likes.” With this, it can finely determine the profiles of its users. It is no exaggeration to say that, on several points, these data allow it to anticipate their reactions. Therefore, on some issues, Facebook knows its users more accurately than they know themselves …

Moreover, Facebook controls the algorithm that fills the news feed its users consult. Thus, it has a very efficient means of manipulating opinions.

Others can use data from Facebook

Of course, for some there is a huge temptation to drink from this data source. A simple way is to consult the profiles of the people about whom one seeks information. To take one example among others, Matthieu Manant, Serge Pajak, and Nicolas Soulié created two fictitious identities with identical CVs [8]. Only a single detail distinguished them: the Facebook profile associated with one of them indicated that the person was born in Marrakesh and spoke Arabic, while the other stated that the person was born in Brive-la-Gaillarde (a small town in southern France) and spoke Italian. For the rest, the Facebook pages had identical contents, including identical photos.

The researchers sent the CVs of the two identities in response to several job offers. For 230 applications, the identity born in Brive-la-Gaillarde received 49 positive responses (i.e., a response from a recruiter to continue the hiring process), a rate of 21.3 %. The other identity received 31 positive responses for 232 applications, a rate of 13.4 %. Such a gap, with the first rate more than 50 % higher than the second in relative terms, is significant.
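The arithmetic behind these figures can be checked with a few lines of Python. The counts come from the study as quoted above; the two-proportion z-test is my own addition, chosen as one common way to judge such a gap, and is not necessarily the analysis the authors performed. The function name is hypothetical.

```python
from math import erfc, sqrt

# Counts quoted in the text above.
positive_brive, sent_brive = 49, 230
positive_marrakesh, sent_marrakesh = 31, 232

rate_brive = positive_brive / sent_brive
rate_marrakesh = positive_marrakesh / sent_marrakesh

# Relative gap: how much higher the first rate is compared with the second.
relative_gap = rate_brive / rate_marrakesh - 1


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided, normal approximation
    return z, p_value


z, p_value = two_proportion_z(
    positive_brive, sent_brive, positive_marrakesh, sent_marrakesh
)
print(f"rates: {rate_brive:.1%} vs {rate_marrakesh:.1%}")
print(f"relative gap: {relative_gap:.0%}")
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

The absolute gap is about eight percentage points, while the relative gap is what exceeds 50 %; the test merely indicates that a difference of this size is unlikely to be due to chance alone under the usual normal approximation.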

That job applicants face discrimination based on their origins, though it is illegal, is unfortunately not a recent discovery [9]. The novelty highlighted in this study is that recruiters used Facebook to learn about the candidates, looking at information that should not be relevant because it is supposed to belong to private life …

Privacy for sale

A third party can easily use a Facebook account to view other Facebook accounts. However, as we have seen, Facebook itself has the possibility to know much more by analysing meta-data. Consequently, some people would appreciate being able to access such information. Well, I have some good news …

We have known for a long time that Facebook sells personal data about its users. Of course, it does not sell all the data at its disposal (in order to keep its competitive advantage), but it still sells some of it. There is no reason to be surprised: in economics, a company is defined by its ability to make a profit, which is its reason for being. The example of free software shows that the maxim ‘if it is free, you are the product’ is false. However, as a company cannot exist without making a profit, and as Facebook neither charges its users nor can rely on revenue from untargeted advertisements, it must sell what it has: data about private lives. Therefore, whoever is ready to pay the price can obtain very fine-grained access to the private life of a large part of humanity, as Facebook does not just collect all it can about its users, but potentially about every Internet user.

To guard yourself against this, you should neither share anything with anyone, nor press “Like”, nor update your status, and so on. In other words: do not use the service … However, that is not enough, because any website offering services related to Facebook potentially sends information about your private life.

I have given many details about Facebook. This is because it is probably the entity collecting the most data about private lives at present, just ahead of Google (at least by its own account). However, one should not forget that they are far from being the only ones to collect such data, often without notifying the user. Sometimes, this collection is not even done on purpose. For instance, the use of a JavaScript script to perform auto-completion (automatically completing entries made by the user) can lead to personal information being transmitted without validation from the user. The web was developed without worrying about privacy, so that today a number of elements threaten it, even in good faith.

Indeed, a simple mistake can have huge effects. For example, on an iPhone, the game Pokémon Go can get access to large sections of the user’s Google account. Provided she or he uses a fairly wide range of Google services, this means particularly broad access to her or his private life. Although, as I write this, the publisher is trying to correct the problem, this story highlights how a single error can have very large consequences.

There is something new indeed

Edward Snowden in 2013 – screen shot from the movie Citizenfour (2014, Laura Poitras, Praxis Films) under licence Creative Commons BY 3.0 via Wikimedia Commons.

We must not forget that, for a long time now, the police have been able to obtain the bank statements of an individual and deduce a great deal of information about her or his private life. We should not underestimate how far such investigations can go. However, the current situation presents two novelties.

The first novelty is the type of data that can be obtained nowadays. For instance, I have found no reported case where such an investigation determined whether the parents of the subject were still a couple when she or he was 21. Using meta-data, unprecedented knowledge of anyone’s intimate life can be achieved.

The second concerns the conditions of access to such information. As part of a police investigation, if everything goes as it should, a request to gain access to private data must be substantiated. Then, a judge must decide whether it is legitimate. But nowadays, the only limit on access to personal data is the price you are able to pay.

In security, trust is defined as the ability to do harm. Whether you find me friendly or not, I do not have access to your medical records: you do not trust me on this matter, unlike your doctor. By storing details about the private lives, habits and personalities of their users, many companies acquire a particularly large ability to do harm.

I have spoken out against the lack of oversight of state surveillance, as seen from here. However, within a given framework, a state has the legitimacy to carry out surveillance and investigations. We must constantly ensure that this framework is both appropriate and respected (as I write this, Edward Snowden’s disclosures have shown that this is not the case in most countries), but such legitimacy does exist. For its part, a business has no legitimate claim whatsoever to unlimited trust.

The end of privacy is not inevitable

Even though some try to make us believe the contrary, violation of privacy is not inherent in the use of social networks in particular, nor in the use of the Internet in general. Moreover, some practices that help to protect privacy already exist.

Limit risks while browsing

As a starting point, you can use Privacy Badger when browsing the web. It is a plug-in that blocks elements of the web pages you visit that may collect personal data. It is quite a simple way to reduce the risk that your data is passed on to a third party without your being notified.

Besides, if you browse this site using Privacy Badger, you will find that, depending on your browser settings, it may block some items by default. This is because WordPress, the software I use to manage the site, includes elements that may be used to collect information, for example through Google tools. I do not do such things. I am looking for a way to stop using these WordPress elements. However, this cannot be achieved so easily, as I cannot afford, at least for now, to let my WordPress version differ too much from the standard one, if only because that would make updates difficult, which would be a big security problem.

In any case, I have not installed any module that would allow me to collect private information: the only items I collect with this website are the pages being viewed and the approximate location of the IP addresses consulting these pages. However, none of these are really reliable. As the browsing experience on this site is not altered by the use of Privacy Badger, I encourage you to use it even on these pages. That way, you will not have to trust me; keep in mind that trust is the ability to do harm.

Anyhow, I encourage you to read the privacy policy of this website.

Use social networks concerned with privacy

As we have seen, the way Facebook works is inherently intrusive. As a consequence, it is not reasonable to use it. However, there are social networks that have been built with respect for privacy in mind.

For instance, you can join me on Diaspora*. This is a social network based on decentralized technology and free software. More than that, respect for privacy, which is the subject of this article, has been a key point of its design from the start. Quite a large number of us use it nowadays. Also, though still in active development, it is already perfectly usable. My ID on this network is ylebars@framasphere.org.

And for those who might be tempted by Manichaeism, note that Mark Zuckerberg, Facebook co-founder, made a donation to the Diaspora* Kickstarter campaign. On that occasion, he said he considered it a “cool” idea.

Get out of the influence of Google

Google’s privacy policy is also problematic. Yet it is quite possible to do without its services. For your web searches, you can use alternative search engines such as DuckDuckGo and Qwant.

Moreover, the accounts a user holds on various Google services, such as Gmail and YouTube, are linked to each other. This allows large-scale data collection and cross-checking. The same problems arise with services like Dropbox. However, there are alternative services that are equally effective and much less intrusive, such as those offered by ownCloud and Framasoft. The latter has also initiated the project “De-Google-ify Internet,” which aims to offer free and privacy-respecting alternatives to the most popular web services.

Should you change your operating system?

Some may think that protecting privacy requires technical skills and the use of obscure systems. This is not entirely true.

It turns out that Microsoft must constantly be called to order about privacy, as it uses its Windows operating system to collect a great deal of private data from its customers. Apple, for its part, makes commitments and provides tools that should allow you to avoid disseminating private information. However, the MacOS operating system has a long history of very intrusive practices. In particular, Apple’s definition of private data is extremely restrictive, excluding many elements that actually belong to it.

Ultimately, it is true that the best guarantee for your privacy lies in the use of alternative systems such as GNU/Linux and *BSD. Admittedly, one has to make the decision to install and use these systems, in contrast to Windows and MacOS, which generally come pre-installed. However, contrary to popular belief, using GNU/Linux is not (or at any rate is no longer) difficult. Also, I think this site shows that, with this system, you can work efficiently on publishing, music, videos and other things to come. In any case, I think it shows that my difficulties do not come from the tools, but from my skills, or the lack thereof. You can also gradually adopt more privacy-protecting practices and plan a possible transition to a new operating system as a medium-term process.

Use a smart-phone in privacy

I think that the various studies I cited clearly establish that smart-phones are essentially tools for collecting metadata about their users. Such metadata are exploited by manufacturers and operators. Therefore, the cost-benefit ratio should be seriously questioned.

However, if you decide to use such a device, you have several options to forestall some privacy violations.

Regarding devices running Android, several websites offer valuable advice on protecting your privacy with such a system. However, they usually lack advice on protecting yourself against data collection by Google. This starts with the Google account: it is better not to use one. To this end, you can use independent application markets such as F-Droid and Aptoide, on which the vast majority of Android applications are available. With these markets, there is no need to set up a Google account.

All applications available on F-Droid are open source and free software. This means that their source code is available, so one can check that they are not malicious. However, the principle of Aptoide is to allow everyone to create a store without specific criteria. As a result, some stores may offer malicious applications. To guard against this, it is better to use only stores rated “~ppa” and to install only applications marked with the badge “verified.” You should also pay attention to what an application requests access to. For example, a simple flash-light application requesting access to your private data probably has some hidden purpose.

As for the iPhone, Apple once again makes commitments. However, considering the history of Apple on the subject, the effectiveness of these commitments is still in question.

Note that Edward Snowden and Andrew Huang presented an iPhone case which will allow the user to control whether the smart-phone is transmitting data. At the time I write this, it should be available in a few months and they plan to make versions for each type of smart-phone.

There are also alternative systems that provide much better protection of privacy, among them Ubuntu for Phones and Firefox OS.

Go further while preserving effectiveness for the various systems

We have seen that anonymization performed by replacing an identity with an identifier is not sufficient. However, since the 1990s, more effective methods of protecting privacy have been devised.

For instance, Latanya Sweeney introduced k-anonymity [10]. Its principle is to coarsen the information: instead of the date of birth, the age is requested; instead of the postal code, the department; and so on. Such a method increases the number of people who can match a given piece of information, which strengthens the protection of privacy.
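The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Sweeney's Datafly system: the records are invented, the function names are hypothetical, the reference date used to compute ages is arbitrary, and the department is approximated by the first two digits of a French postal code. The value k is the size of the smallest group of records sharing the same combination of attributes.

```python
from collections import Counter
from datetime import date

# Hypothetical records; in a real data set each would also carry sensitive fields.
records = [
    {"birth_date": date(1980, 3, 12), "postal_code": "29200"},
    {"birth_date": date(1980, 7, 30), "postal_code": "29000"},
    {"birth_date": date(1975, 1, 5), "postal_code": "19100"},
    {"birth_date": date(1975, 11, 20), "postal_code": "19000"},
]

REFERENCE_DATE = date(2016, 12, 31)  # arbitrary date used to compute ages


def raw_key(record):
    """Precise attributes: full date of birth and full postal code."""
    return (record["birth_date"], record["postal_code"])


def generalized_key(record):
    """Coarsened attributes: age in years and department."""
    birth = record["birth_date"]
    had_birthday = (REFERENCE_DATE.month, REFERENCE_DATE.day) >= (birth.month, birth.day)
    age = REFERENCE_DATE.year - birth.year - (0 if had_birthday else 1)
    return (age, record["postal_code"][:2])


def k_of(records, key):
    """Size of the smallest group of records sharing the same key."""
    groups = Counter(key(record) for record in records)
    return min(groups.values())


print(k_of(records, raw_key))          # every record stands alone
print(k_of(records, generalized_key))  # each record now hides among others
```

With the precise attributes, every record is alone in its group; after coarsening, each record shares its attributes with at least one other, which is exactly what "increasing the number of people who can match" means.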

However, where metadata are concerned, this is still not sufficient. A credible solution would be for users to become the sole custodians of such data, rather than having the various systems collect them. For their part, the services can then send queries without seeking excessive precision.

Take the example of a GPS system that uses information from its users to evaluate traffic. Rather than actually collecting their coordinates, the application may send questions like: ‘Are you located less than five kilometres from this train station?’ The user’s phone would just answer yes or no. The application could then assess the traffic just as effectively as it would by collecting its users’ coordinates, but with increased respect for privacy. This provides a credible way to protect private life without jeopardizing the efficiency of the applications that are emerging.
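Here is a minimal sketch of this question-and-answer scheme in Python. It is an illustration only: the class and function names are hypothetical, the coordinates are made up, and the distance is computed with the standard haversine formula on a spherical Earth. The point is that the coordinates stay on the phone, and the service only ever receives a yes or a no.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


class Phone:
    """Holds the user's position locally and only answers yes/no questions."""

    def __init__(self, lat, lon):
        self._lat = lat
        self._lon = lon

    def is_within(self, lat, lon, radius_km):
        return haversine_km(self._lat, self._lon, lat, lon) < radius_km


def count_nearby(phones, lat, lon, radius_km=5.0):
    """What the service learns: how many phones answered yes, nothing more."""
    return sum(phone.is_within(lat, lon, radius_km) for phone in phones)


# Hypothetical train station and two hypothetical users.
station = (48.388, -4.479)
phones = [Phone(48.397, -4.479), Phone(48.8566, 2.3522)]
print(count_nearby(phones, *station))
```

A real service would still have to limit how many such questions it may ask, since many narrow queries could be combined to pin down a position; the sketch leaves that aspect aside.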

No reason to resign

While credible solutions do exist, it is clear that the companies collecting data are reluctant to adopt them. However, regulatory agencies tend to impose them. Building the political will to promote such approaches is probably a good solution.

I have heard it said that the end of privacy is inevitable in any case, and that we have no choice but to accept it. Yet, as we have seen, nothing really compels us to give up our privacy.

I write this article from France. I do not want to use grand words, but let us note that in early 1789, it seemed clear that the French regime (since called the Old Regime), heir to a centuries-old tradition, was immutable. No one would have thought it could be changed except at the margins. Yet by the summer of the same year, the regime had undergone a profound change.

The first reason Apple opposed the FBI is that its customers wanted their privacy to be preserved from government investigations. What is surprising is that there is not at least as much concern about privacy violations committed by commercial companies.

Still, if users decide that respect for privacy is essential, companies will risk losing their customers, all the more so as real solutions exist. We can choose what we want to accept: we have the means to act, because a company cannot exist without customers.

My thanks to the Diaspora* community for its constructive comments, which helped me improve this article.

Notes

1 Philippe Pajot, 2016. Entretien avec Vinton Cerf, La Recherche n° 508, pp. 4–8.
2 Yves-Alexandre de Montjoye and Gautier Cariou, 2015. Les métadonnées menacent la vie privée, La Recherche n° 501–502, pp. 114–117.
3 Yves-Alexandre de Montjoye, Laura Radaelli, Vivek Kumar Singh, and Alex Pentland, 2015. Unique in the shopping mall: On the reidentifiability of credit card metadata, Science n° 347 (6221), pp. 536–539. Doi: 10.1126/science.1256297.
4 Yves-Alexandre de Montjoye, Jordi Quoidbach, Florent Robic, and Alex Pentland, 2013. Predicting Personality Using Novel Mobile Phone-Based Metrics, chapter of Social Computing, Behavioral-Cultural Modeling and Prediction, volume 7812 in the series Lecture Notes in Computer Science, pp. 48–55. Doi: 10.1007/978-3-642-37210-0_6.
5 Jonathan Mayer, Patrick Mutchler, and John C. Mitchell, 2016. Evaluating the privacy properties of telephone metadata, PNAS vol. 113, n° 20, pp. 5536–5541. Doi: 10.1073/pnas.1508081113. Available online.
6 Arnold Roosendaal, 2010. Facebook Tracks and Traces Everyone: Like This!, Tilburg Law School Legal Studies Research Paper Series n° 03/2011. Available online.
7 Michal Kosinski, David Stillwell, and Thore Graepel, 2013. Private traits and attributes are predictable from digital records of human behavior, PNAS vol. 110, n° 15, pp. 5802–5805. Doi: 10.1073/pnas.1218772110. Available online.
8 Matthieu Manant, Serge Pajak, and Nicolas Soulié, 2014. Online social networks and hiring: a field experiment on the French labor market, MPRA Paper. Available online.
9 Nicolas Jacquemet and Anthony Edo, 2013. La Discrimination à l’embauche sur le marché du travail français, Éditions rue d’Ulm. Available online.
10 Latanya Sweeney, 1997. Datafly: A System for Providing Anonymity in Medical Data, Proceedings of the IFIP TC11 WG11.3 Eleventh International Conference on Database Security XI: Status and Prospects, Chapman & Hall, Ltd. London, UK, pp. 356–381.

Published by

Yoann Le Bars

A researcher and teacher with slightly too many interests to sum this up …
