“My daughter got this in the mail!” he said. “She’s still in high school, and you’re sending her coupons for baby clothes and cribs? Are you trying to encourage her to get pregnant?”
In 2012, the US discount retailer Target refined its customer analytics to the point that it predicted the pregnancy of one of its teenage customers before her own father knew.
When we work with data analytics we often lose sight of the context in which our users and customers live their lives. As data scientists, we focus on collecting data, filtering it, and improving our predictive accuracy. After all, getting the prediction right is a challenge in and of itself. We do not want to make incorrect predictions, or miss out on correct ones. However, as the example above illustrates, sometimes we should also avoid making the correct prediction: just because we can does not mean we should.
This holds not only for specific inferences like the one about the pregnancy of a teenage daughter, but also for other aspects of data analytics. Data is often personal, and occasionally sensitive. It’s time to think more carefully about inferences across data streams, and to consider whether we are making inferences that users did not consent to when they initially provided the data. Users are also influenced by algorithmic biases, like the filter bubbles that formed on social media during the EU referendum in the UK. Pre-existing biases are amplified, because machine learning algorithms are only as impartial as their developers and the data they are trained on. There are times when these biases need to be made visible to users: this is a good time to start a conversation about how explanations can be used to improve transparency.
How did we get here? Well, many of us believe that data analytics has massive potential to be used for commercial and societal benefit. We are experiencing explosive growth in digital content. According to the International Data Corporation, there currently exist over 2.7 zettabytes of data. It is estimated that the digital universe in 2020 will be 50 times as big as in 2010, and that from now until 2020 it will double every two years. The commercial world has been transformed by big data, with companies competing on analytics. We are entering a new era of predictive analytics and data-intensive computing, which has been recognised worldwide in various high-profile reports.
The question is thus not whether we should be involved with analytics. Like any powerful toolkit, it can and should be used for the greater good. Consequently, this article is not interested in pointing fingers. Rather, it is a call to recognise that we are in uncharted territory, and that the issues surrounding collecting and analysing personal data are complex and delicate. There are many issues that we already know about, but also many we have yet to encounter.
the internet has become the marketplace where algorithms collecting and selling data about human attention act as the driver of their digital economies
The current giants of the global economy (including, but not limited to, Google, Facebook, and Amazon) all trade profitably in human attention. For them, the internet has become the marketplace where algorithms collecting and selling data about human attention act as the driver of their digital economies. As we collect and analyse this data, we must keep our consumers in mind. Firstly, not everyone is happy with their personal or usage data being collected, and many users are worried about how it is being used. A recent Chartered Institute of Marketing (CIM) survey of 2,500 people found that nine in ten have no idea what companies do with the personal information the firms hold about them. Among other things, the survey concludes that personal data policies on websites should be clearer and simpler. Similar concerns were voiced by participants at a recent event at the ESRC Festival of Social Science titled “What is the internet hiding from you?”. This event covered the topic of information sharing online, naming both risks (such as sensitive personal information being directly collected or inferred) and benefits (such as improved tailoring and personalisation).
These concerns are also recognised at a policy level, by both the European Union (EU) and the Information Commissioner’s Office (ICO) in the UK. The new EU General Data Protection Regulation (GDPR), coming into effect in 2018, recognises privacy as a legal right, and includes a “right to explanation” whereby a user can ask for an explanation of an algorithmic decision that was made about them. Despite the planned UK exit from the EU, the ICO confirms that comparable regulations will be put into effect in the UK, and that the ICO will have the legal capacity to enforce compliance with them. Privacy policies will need to be geared towards the customer and expressed in clear and plain language.
Overlooking the person-centred ethical issues may result in negative social impact.
It is largely a welcome development that analytics platforms have gained a great deal of power through their access to individuals’ usage data. It is now time to start using that power wisely, both now and moving forward. Let us consider what will happen if information falls into the wrong hands, or gets combined with other streams of data. Let us reflect on the societal impact of our growing reliance on algorithms, AI, and machine learning, combined with the gradual reduction of human engagement with many automatic processes. The public is justifiably concerned, and it is our responsibility to think critically about what data we need to collect and store, and for which purposes. Computers can collect data and run algorithms, but humans working with big data are the ones who establish the analytical programmes, professional practices, and codes surrounding them. Overlooking the person-centred ethical issues may result in negative social impact.
Let there be a balance between the innovation and economic opportunity of big data, and respect for privacy and human rights within open, tolerant societies. To allow this to happen, we need to work together to establish best practices, and to record positive case studies where these have been observed. This is a conversation that is going to need all hands on deck: customers, policy makers, data analytics companies, and academic researchers. Let’s get cracking!