Big Data: how to minimise risk in data analysis

Posted by media on December 29, 2015 at 9:00 AM


Today, an increasing number of companies are analysing online data to predict the behaviour of existing or potential customers and to offer more tailor-made products and services.

However, any data analysis from the Internet or Cloud must comply with a series of standards, namely Spain’s Data Protection Act (Law 15/1999 of 13th December) that applies to any online information involving individual persons.

Any breach of regulation would invoke a number of legal risks that could undermine a company's profitability through loss of reputation or credibility. Moreover, any failure to comply with the law could incur fines ranging from €900 to €600,000 (Art. 45) issued by the Spanish Data Protection Agency.

In today's post we explore the factors needing consideration in order to minimise any potential legal risks. 

This post is also available in Spanish.

Risks to user privacy.

If a particular method of data analysis involves the identification of individuals, their explicit permission must be sought. According to the rules of data protection, if a company analyses data from an individual’s social media profile, that user must be informed that their data is being used, and for what purpose, in order to obtain their full approval.

Having easy access to social media should not allow us to retrieve and use whatever information we find online. It must be understood that each and every individual with an online profile is entitled to their privacy, and while it is one thing for an individual to allow the use of his or her data for analysis, it is quite another to take it without warning or prior notice.

It is necessary to either have full user consent or an agreement with a company managing a platform to ensure that any data used is in accordance with the terms of an agreement and respects user privacy by only focusing on relevant information. Anything to the contrary would mean breaking the law and incurring penalties as described earlier in the post.
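As a rough illustration of the two conditions above (consent, and a focus on relevant information only), the following Python sketch filters a set of profiles before analysis. The `Profile` class, the `consented_purposes` field and the purpose name `market_research` are hypothetical names invented for this example; how consent is actually recorded will depend on the platform and on the agreement in place, and a filter like this does not by itself establish legal compliance.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    # Hypothetical record: an identifier, the profile data, and the purposes
    # for which the user has given explicit consent.
    user_id: str
    fields: dict
    consented_purposes: set = field(default_factory=set)


def select_for_analysis(profiles, purpose, relevant_fields):
    """Keep only consenting users, and only the fields relevant to the purpose."""
    selected = []
    for profile in profiles:
        if purpose not in profile.consented_purposes:
            continue  # no recorded consent for this purpose: leave the record out
        selected.append(
            {k: v for k, v in profile.fields.items() if k in relevant_fields}
        )
    return selected


profiles = [
    Profile("u1", {"age_band": "25-34", "city": "Madrid", "email": "a@example.com"},
            {"market_research"}),
    Profile("u2", {"age_band": "35-44", "city": "Bilbao", "email": "b@example.com"}),
]
rows = select_for_analysis(profiles, "market_research", {"age_band", "city"})
```

Here only the first profile is kept, and its email address is dropped because it is not among the fields needed for the stated purpose.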

If we want to work with social networks while ensuring both the security and privacy of users, the best option would be to first review each network’s terms and conditions as well as sign partnerships in full awareness of the relevant legislation. This prior research would ensure that any levels of compliance can be met before any agreement is formalised.

Data analysis may also be carried out by a third party. If an outsourced contractor has access to information in which individuals can clearly be identified, the contractor will become what the Data Protection Law classifies as a data processor. The contractor will therefore be obliged to sign a contract clearly detailing which files it is authorised to access as well as the security measures it must observe in order to work in a lawful manner.

Should such services not be outsourced, the company offering them would effectively assume the role of “data controller” and therefore be responsible for adhering to any relevant legislation on the client’s behalf.

If anonymised information is being analysed from various sources, that is to say, data that does not identify individuals and may prove useful to a company for statistical purposes, we must also consider whether it is lawful to use in accordance with the guidelines below.

Infringements.

As mentioned previously, accessing data on social networks not only invokes rights relating to individual privacy, but also those of companies, which are entitled to protect their content and regulate its use, as well as to control their trademarks, image rights and copyright. These rights are also protected by other legislation which may in some cases overlap with data protection laws. If, for example, an image were edited without the individual’s consent, both data protection laws and the person's rights to honour, privacy and their own image would be violated.


If data or content is gleaned from websites without first establishing whether it is royalty-free or protected by copyright, intellectual property law is at risk of being broken. Another example is the unlawful use of logos belonging to potential clients that have been taken from their websites and used without permission in another company's documents or databases. This effectively infringes the brand owner’s exclusive right to use its own logo, and may even lead to valid accusations that a company’s brand or reputation has been damaged by the unlawful use.

Another possible offence is data collection by scraping (a technique that uses software to simulate Internet navigation and that conflicts with the terms and conditions of many websites) or through the use of web crawlers or bots (software programs that inspect websites in a methodical and automated way, following URLs and creating copies of the visited pages, which a search engine later indexes to create a rapid search system).

Many of these techniques contravene the terms of use of many websites and social networks. LinkedIn, for example, does not allow its databases to be accessed by some of these systems and explicitly prohibits them. Great care must therefore be taken as to where data is sourced. Terms of use must always be checked.
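One machine-readable signal of a website's position on automated access is its robots.txt file. The Python sketch below uses the standard library's `urllib.robotparser` to check whether a given URL may be fetched by a given bot. The robots.txt content, the bot name `research-bot` and the example.com URLs are made up for illustration and do not represent any real site's policy. A robots.txt check is only a first filter: it does not replace reading the terms of use, which may prohibit automated collection even where robots.txt allows it.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the text directly, no network call
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: every bot is barred from the /profiles/ section.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /profiles/
"""

profile_ok = allowed_by_robots(
    EXAMPLE_ROBOTS, "research-bot", "https://example.com/profiles/123"
)
about_ok = allowed_by_robots(
    EXAMPLE_ROBOTS, "research-bot", "https://example.com/about"
)
```

With this example file, `profile_ok` is `False` and `about_ok` is `True`. In practice the file would be retrieved from the site itself, for instance with the parser's `set_url` and `read` methods.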

Other websites are subject to copyright or licences. If data were extracted and used from them regardless, a copyright infringement would subsequently occur.

In any of these scenarios, a failure to observe the relevant terms and conditions of a source may result in a violation of regulations and in claims for the harm or damage suffered by the holders of those rights.

Security Risks.

Another area of uncertainty related to data use is the reliability of a source. Without thoroughly checking sources, it is not always clear whether they have been hacked or whether they meet a minimum standard of reliability and security. If large-scale data has been extracted from different places, how do we know that the information is wholly reliable? How would we know if the data drawn upon was the result of a truthful enquiry and accords with what is needed for research purposes? How do we know if websites or databases used as sources of information are as clear and objective as they could possibly be?

It will therefore be necessary to investigate each and every source from which data is obtained and to check its legal notices, terms and conditions, procedures and security measures thoroughly. If a third party is hired to assist with this process, they must be able to show an appropriate level of experience and credentials to demonstrate that they are safe and reliable.

Suppose information has been gathered from a hacked website featuring false data? In such an instance, the results obtained from the data collection would simply not be accurate.

Data analysis is a practice that self-evidently allows for the technological, social, economic or financial progress of the business sector. But it also invites a number of risks. In this post, we wanted to highlight the factors that must be taken into consideration to ensure that data analysis is reliable, safe and lawful.

This is a guest post by Vanesa Alarcón Caparrós and is also available in Spanish.

Vanesa is a lawyer specialising in new technologies and intellectual property, and a founding member of Avatic Abogados.




Topics: Legality
