We are in an age of “Big Data”, “Open Data” and the intended and unintended sharing and misuse of such data. This blog post is intended to collect some examples and notes to contribute to the task of informing the public about some of the privacy issues that need to be considered and addressed. It is not acceptable that big business and government agencies continue to decide on how such data is collected, stored, used, monetized and transferred to third parties without a more informed public debate.
There are valid and important social uses of data held by governments, companies, service providers and individuals. But there are serious misuses of such data. Informatics people need to be at the forefront of explaining to politicians, agencies and individuals what is happening, what is possible and what the terminology means. They should explain the positive uses, but also the current and potential risks and misuses.
Data collected on individuals is sometimes kept in a form that allows each person to be tracked across related data sets, which helps in getting better quality results and more useful information. In health-related data, for example, this can assist in many ways. To protect individuals from misuse of very sensitive data and from potentially damaging consequences, names and other identifying information are usually removed so that the actual individual cannot subsequently be identified and the data misused to their detriment. In research this is a normal ethical requirement for even collecting the data in the first place. In things like health records, by contrast, the identifying information is kept with the stored data so that agencies with legitimate access can serve the individual by making proper use of it.
The problem arises when government agencies and research organizations decide that this data can assist them in their studies and apply a simplistic and easily reversed mechanism for pseudo-anonymization, for example removing names but leaving an individual identification number. There is also a strong tendency to retain the individual’s post code, as it helps in many data correlation and study aspects, but that makes reversal of the pseudo-anonymization trivial.
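To make the point concrete, here is a minimal sketch of how such a reversal could work. Every record, name, post code and field name below is invented for illustration, and the "public register" simply stands in for any second data set that lists names alongside the same attributes. The sketch assumes only post code and gender are shared between the two data sets.

```python
from collections import defaultdict

# Invented "pseudo-anonymized" release: names removed, ID and post code kept.
released = [
    {"id": 1001, "postcode": "AB1 2CD", "gender": "F", "diagnosis": "asthma"},
    {"id": 1002, "postcode": "AB1 2CD", "gender": "M", "diagnosis": "diabetes"},
    {"id": 1003, "postcode": "EF3 4GH", "gender": "F", "diagnosis": "depression"},
]

# Invented second data set that carries names (hypothetical public register).
public_register = [
    {"name": "Alice Example", "postcode": "AB1 2CD", "gender": "F"},
    {"name": "Bob Example", "postcode": "AB1 2CD", "gender": "M"},
    {"name": "Carol Example", "postcode": "EF3 4GH", "gender": "F"},
    {"name": "Dana Example", "postcode": "EF3 4GH", "gender": "F"},
]

# Attributes present in both data sets (an assumption of this sketch).
SHARED_FIELDS = ("postcode", "gender")


def shared_key(record):
    """Return the tuple of shared attribute values for one record."""
    return tuple(record[field] for field in SHARED_FIELDS)


def reidentify(released_records, register):
    """Map released IDs to names wherever the shared attributes match exactly one person."""
    names_by_key = defaultdict(list)
    for person in register:
        names_by_key[shared_key(person)].append(person["name"])

    matches = {}
    for record in released_records:
        candidates = names_by_key.get(shared_key(record), [])
        if len(candidates) == 1:  # a unique match means the name is recovered
            matches[record["id"]] = candidates[0]
    return matches


recovered = reidentify(released, public_register)
print(recovered)
```

In this toy data two of the three "anonymous" records are tied back to a name, and the third is narrowed down to two candidates. Each further attribute kept in the release tends to shrink the candidate groups further.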
The NHS Care.Data system is planned to become active from Autumn 2014 for everyone in England unless individuals opt out. Care.Data proposes to store key medical information and consultations along with an individual’s unique NHS identification number, full postcode, ethnicity and gender [ref], yet states to the public that:
This new record will not contain information that identifies you.
This is quite clearly wrong or misleading. Third parties will be given access to this data for legitimate and contracted purposes, and they include some of the same companies that have been accused of dubious practices in their approaches to research publication and in their data gathering and handling.
In future, governments or cash-strapped agencies will be tempted to “monetize” the valuable data resources they have collected, especially as the costs of data warehousing and curation mount, and will find that personal data is of irresistible interest and value to companies. If they can hide behind the term “Pseudo-anonymization” and make the public believe that it protects them, there will be many pitfalls and serious implications for individuals in future.
Hiding behind the term “Meta-Data”
Meta-data is data that describes other data, the latter often referred to as “Content”. I think the term is misunderstood (or deliberately used in a way that misleads) by many press and media commentators, security agencies and politicians. A simple example shows the danger of allowing such records to be put in the hands of anyone who wishes to buy them: the record of a web search made by an individual on a health web site contains the search query itself and much identifying information about the person performing the search.
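The web search example can be sketched in a few lines. The log entry below is entirely invented: the site name, address, timestamp and field names are assumptions made for illustration, not the format of any real server log. The point is only that the address of a request, which is often classed as meta-data, can carry the content of the search.

```python
from urllib.parse import parse_qs, urlsplit

# Invented request record of the kind a web server might keep (hypothetical fields).
log_entry = {
    "timestamp": "2014-02-24T09:15:02Z",
    "client_ip": "203.0.113.7",  # address from a range reserved for documentation
    "url": "https://health.example/search?q=early+symptoms+of+diabetes",
    "user_agent": "ExampleBrowser/1.0",
}


def summarise(entry):
    """Pull out who searched, when, on which site, and for what."""
    parts = urlsplit(entry["url"])
    query = parse_qs(parts.query)  # decodes "+" and percent-escapes
    return {
        "who": entry["client_ip"],
        "when": entry["timestamp"],
        "site": parts.hostname,
        "searched_for": query.get("q", [""])[0],
    }


summary = summarise(log_entry)
print(summary)
```

Nothing here reads the page the person was eventually shown, yet the summary already says who asked what, where and when.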
Why does this matter… no one is “looking” at your data… there is so much data no one can look at it all… everybody shares data these days… today’s kids are not bothered… what harm can it do… it’s illegal to misuse data… there must be only a few isolated cases where data is lost or misused… you are paranoid…
I have heard every one of these comments.
Besides the obvious loss and subsequent misuse of data through illegal activities such as theft and losses of data through human error, process flaws and casual mistakes, there are many documented cases of “secure” systems being targeted and the data of millions of individuals made available to third parties, criminals and agencies who find that data useful.
Large “blue chip” companies are known to illegally or unethically obtain and use data in private or to affect their business. This is known through successful prosecutions and many investigations that are underway.
An investigation by the UK Information Commissioner into big business obtaining and misusing personal data, including health records, is reported in the Independent’s i paper on Monday 24th February, 2014. I chose that example only because it is the day on which this blog post is written. Such reports can be found on many days.
Big business has used illegal black lists to affect the hiring of individuals who had no way to know they were on such a list or to challenge it. Companies have been shown to use private investigators to blag or otherwise obtain private information, including medical records, to affect their dealings with individuals. Insurance companies in the USA have been reported as requiring the data chip in cars to be handed over to investigate driving behaviour when a claim is made.
It is naive to assume that companies such as loss adjustors and insurance agencies, which have already been shown to employ dubious or unforeseen means today, will not be tempted to take data they are given for one purpose and misuse it for another.
We act within the legal constraints and monitoring mechanisms of the local jurisdiction
Reactions to events such as bomb attacks and terrorism risks can cause radical temporary changes in the law and in the balance of rights between government and individuals in society. Temporary emergency measures can and sometimes must be used to maintain order even in largely democratic societies. But laws may be drafted that are very wide ranging, with little depth of risk analysis of the potential for individual harm as opposed to matters of state. Reliance on individual ministers to exercise the wide powers they are given, and on oversight bodies with minimal capability, is a danger. Under such legal frameworks essentially anything becomes “legal”. And such powers tend to be left in place, and even strengthened, to give more surveillance capability to centralised authorities.
Alliances can be used to broaden the scope of what can “legally” be done by going “off-shore” when required to achieve broader coverage.
But if one country that purports to be working within its legal framework is doing things considered by some to be objectionable, why not others, some of them working without such legal constraints? A free-for-all and a race to the bottom will arise.
For the really paranoid… or just those with an imagination… and some knowledge of past and current affairs… consider the situation where a future government agency or group is able to obtain access (legally or illegally) to personal data and takes it upon itself to identify a cohort that meets some negative criteria (as deemed at the time or by the policies of some specific regime or group). A quick trawl over historical data could then identify those who would one day fail some test and become targets for attacks and action by such agencies and groups.