So, what’s it all about?
The Guidance provides practical advice on how to resolve apparent tensions between data protection compliance on the one hand, and improving efficiency and accuracy by deploying AI on the other. By its nature, AI relies on large datasets and raises questions around accuracy and data minimisation that some may see as at odds with privacy requirements. The Guidance explains how these issues can be resolved. It focuses mainly on machine learning (ML), but acknowledges that the risks and controls it highlights will be of use to any development or deployment of AI, whether or not ML is involved.
Who is the Guidance aimed at?
The Guidance is, of course, aimed at those working in compliance (DPOs, GCs and so on). Importantly, though, it has also been designed with tech specialists and privacy teams in mind, so developers and IT teams will be able to use it to identify the practical steps they need to take in their day-to-day work with AI and in longer-term project planning.
Hasn’t the ICO already released guidance on AI?
Yes. The Guidance is part of a longer-standing initiative, and it relates to other documentation on AI. This includes the ICO’s explAIn guidance, produced in collaboration with the Alan Turing Institute, which sets out the key considerations to be taken into account by organisations when explaining the processes, services and decisions delivered or assisted by AI, to the individuals affected by them. Separately, the ICO recently produced a series of blog posts providing updates on specific AI data protection challenges and on how its approach to AI is developing. The amount of buzz around AI is unsurprising given that it is one of the ICO’s top three strategic priorities for 2018-2021.
Which GDPR issues does the Guidance address?
The Guidance works through the GDPR’s principles and data subject rights as follows:
1. The accountability and governance implications of AI, including data protection impact assessments (DPIAs)
The key message here is DPIAs, DPIAs, DPIAs - Do them Please, In Advance! The ICO notes that AI processing activities are likely to be high risk to individuals’ rights and freedoms, so a DPIA will almost always be required. On occasion, prior consultation with the ICO will also be required. Helpfully, the ICO provides guidance here on additional AI-specific DPIA requirements, including explaining variations and margins of error (i.e. statistical accuracy), the degree of human involvement and any trade-offs made.
Rather surprisingly, the ICO suggests producing two versions of a DPIA: one presenting a thorough technical description for specialists, and another containing a more high-level description of the processing. This may not go down particularly well with compliance officers, whose workload may double as a result. In addition, given that AI often depends heavily on third-party processing, it is recommended that controllers consult with their processors on the intended use of the AI - get checking those data processing agreements!
2. Lawfulness and fairness
As we all know, a lawful basis is required for any processing of personal data, and this section of the Guidance provides some detail in the context of AI. It acknowledges, for example, that it is difficult to collect valid consent given that consent needs to be specific and informed, and at the developmental stages of AI its ultimate purposes are often not yet clear. A further complication in seeking to rely on consent is that the rights to withdraw consent and to erasure must always be borne in mind. The Guidance notes that it will be difficult to rely on the performance of a contract as a lawful basis for processing personal data, save where such processing is “intrinsically” linked to the service. Unsurprisingly, ‘vital interests’ is a no-go as a basis for processing personal data to train AI systems. That leaves most companies with ‘legitimate interests’ as the most appropriate lawful basis on which to process personal data - bring on the legitimate interests assessments!
Automated decision-making is highly relevant in the context of AI, and the Guidance reminds controllers of their Article 22 obligations - namely, ensuring that there are ways for meaningful human intervention to be introduced when necessary.
In relation to fairness, the Guidance contains specific detail around discrimination, which is helpful given that this is a common criticism of and concern about AI. The ICO notes that AI can perpetuate historic discrimination by learning to reproduce it, because the training data used may itself reflect past discrimination (e.g. more men historically being hired into a certain role). The ICO says that this can be addressed by manually modifying the AI model after it has been trained.
AI discrimination can also occur where a certain protected characteristic is underrepresented in the training data. The ICO says that this can be addressed by balancing out the training data, adding or removing records relating to the under- or overrepresented subset. The Guidance also recommends proactively assessing the chances that your AI model may be inferring protected characteristics from a dataset and inadvertently discriminating against those groups. Some protected characteristics constitute special category data, so you will need an Article 9 condition to infer that data - something your ML model may be doing without you realising.
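By way of illustration, the sketch below shows one way of rebalancing a training set by oversampling an underrepresented group. It is a minimal example only: the dataframe and the "gender"/"hired" columns are hypothetical and are not taken from the Guidance.

```python
# A minimal sketch of rebalancing training data by oversampling an
# underrepresented group. The dataframe and column names are hypothetical.
import pandas as pd

def oversample_group(df: pd.DataFrame, column: str, group: str, target_size: int) -> pd.DataFrame:
    """Duplicate (with replacement) rows for an underrepresented group until it
    reaches target_size, leaving all other rows untouched."""
    minority = df[df[column] == group]
    rest = df[df[column] != group]
    extra = target_size - len(minority)
    if extra <= 0:
        return df
    resampled = minority.sample(n=extra, replace=True, random_state=42)
    return pd.concat([rest, minority, resampled], ignore_index=True)

# Example: historic hiring data in which one group is underrepresented.
training = pd.DataFrame({
    "gender": ["M"] * 80 + ["F"] * 20,
    "hired":  [1] * 60 + [0] * 20 + [1] * 5 + [0] * 15,
})
balanced = oversample_group(training, column="gender", group="F", target_size=80)
print(balanced["gender"].value_counts())
```

Oversampling is only one option; the Guidance equally contemplates removing records relating to the overrepresented subset, and any rebalancing should itself be documented and tested.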
The Guidance stipulates that you should document your approach to bias and discrimination mitigation from the outset, and build in safeguards during the AI design and build phase. These measures should be robustly tested. One suggestion is that where human decision-making processes are being replaced by AI, the two processes should be run concurrently for a period of time to identify any potential issues.
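The parallel-run suggestion could be implemented as simply as logging both outcomes and monitoring where they diverge. The sketch below, using hypothetical case records and a hypothetical protected-group field, is one way of doing this; it is not a method prescribed by the Guidance.

```python
# A minimal sketch of running a human process and an AI model in parallel and
# monitoring where they diverge. Records and field names are hypothetical.
from collections import defaultdict

records = [
    # (case_id, human_decision, ai_decision, protected_group)
    ("c1", "approve", "approve", "A"),
    ("c2", "reject",  "approve", "B"),
    ("c3", "approve", "reject",  "B"),
    ("c4", "approve", "approve", "A"),
]

disagreements = defaultdict(int)
totals = defaultdict(int)
for _, human, ai, group in records:
    totals[group] += 1
    if human != ai:
        disagreements[group] += 1

for group in totals:
    rate = disagreements[group] / totals[group]
    print(f"group {group}: {rate:.0%} of AI decisions differ from the human decision")
```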
3. Accuracy
Ensuring the accuracy of personal data is a principle of the GDPR, and statistical accuracy is a key measure of performance for most AI models. However, the ICO emphasises that the two concepts are not one and the same. Under the GDPR, all personal data should be accurate, but the output of an AI system does not need to be 100 per cent statistically accurate in order to comply with this principle, because in many cases the outputs of an AI system are not intended to be treated as factual. This means that such outputs will not be subject to data rectification requests: the outputted data represents a statistical guess, and should be labelled as such.
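In practice, "labelling as such" might mean storing model outputs in a structure that explicitly records them as inferences with an associated confidence, rather than as facts. The sketch below is a minimal, hypothetical illustration; the field names are assumptions rather than anything prescribed by the ICO.

```python
# A minimal sketch of labelling an AI output as a statistical inference rather
# than a statement of fact; the fields are illustrative assumptions only.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class InferredAttribute:
    subject_id: str
    attribute: str
    value: str
    confidence: float          # statistical confidence, not a guarantee of truth
    generated_at: datetime
    is_inference: bool = True  # flags the record as a prediction, not verified fact

prediction = InferredAttribute(
    subject_id="12345",
    attribute="likelihood_to_churn",
    value="high",
    confidence=0.72,
    generated_at=datetime.now(timezone.utc),
)
print(prediction)
```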
4. Security and data minimisation
Vast amounts of personal data may have to be moved around various databases in order to be fed into an AI system. For example, datasets may need to be exported from a database, emailed to a supplier, exported again and then imported into an AI training environment. Each time a new copy of a dataset is created, the risk to data subjects potentially increases, as does the difficulty in determining where a potential breach may have originated. For this reason, the ICO recommends that all movements and flows of personal data are recorded and documented, and that intermediate files created along the way are purged as soon as they are no longer required.
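As a rough illustration of the record-and-purge point, the sketch below logs each copy of a dataset as it moves through a pipeline and deletes intermediate files once they are no longer needed. The paths, function names and log format are illustrative assumptions, not requirements drawn from the Guidance.

```python
# A minimal sketch of recording dataset movements and purging intermediate
# copies; paths and names are hypothetical.
import shutil
from datetime import datetime, timezone
from pathlib import Path

data_flow_log = []  # in practice this might be a register kept outside the pipeline

def copy_and_log(src: Path, dst: Path, purpose: str) -> Path:
    """Copy a dataset to a new location and record where it went and why."""
    shutil.copy2(src, dst)
    data_flow_log.append({
        "from": str(src),
        "to": str(dst),
        "purpose": purpose,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    return dst

def purge_intermediate(path: Path) -> None:
    """Delete an intermediate copy as soon as it is no longer required, and log it."""
    path.unlink(missing_ok=True)
    data_flow_log.append({
        "purged": str(path),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

# Hypothetical usage:
# staging = copy_and_log(Path("exports/customers.csv"), Path("training/input.csv"),
#                        purpose="AI model training")
# ... train the model ...
# purge_intermediate(staging)
```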
In relation to the use of open source code, the Guidance recommends subscribing to security advisories so as to be notified of any newly discovered vulnerabilities in the code. Risks can further be mitigated at the development stage by running the AI training environment within a virtual machine. Similar guidance is given to manage the risk of model inversion attacks and membership inference attacks. Examples are also given of images that have been deliberately modified so that they are reliably misclassified, resulting in an image of a turtle being classified as a gun. Given that this was at page 75 of 105, I wasn’t sure whether I had read that correctly or whether I was just delirious.
There is an obvious potential clash of principles when it comes to data minimisation (which, by its nature, requires that only the minimum amount of data necessary to fulfil a purpose be collected) and training an AI model (which, by its nature, requires a large amount of data in order to adequately train the model and, in turn, improve accuracy). The ICO fully acknowledges this issue and recommends certain techniques to achieve data minimisation, including differential privacy (whereby noise is added to the data or to the model’s outputs so that no individual record can be singled out) and federated learning (whereby multiple parties train ML models locally on their own data, then combine some of the patterns identified by those models, known as gradients, into a single global ML model without sharing the underlying personal data).
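For readers who want to see what the federated learning description amounts to in practice, here is a minimal sketch of federated gradient averaging on a toy linear model. The parties, data and learning rate are entirely synthetic assumptions; real deployments use dedicated frameworks and additional safeguards.

```python
# A minimal sketch of federated averaging: each party computes a gradient on
# its own data locally, and only the gradients are shared and combined.
import numpy as np

rng = np.random.default_rng(0)

def local_gradient(weights, X, y):
    """Gradient of mean squared error for a linear model, computed locally by one party."""
    preds = X @ weights
    return 2 * X.T @ (preds - y) / len(y)

# Three parties each hold their own dataset; the raw data never leaves their environment.
parties = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(3)]

weights = np.zeros(3)
for _ in range(100):                                 # training rounds
    grads = [local_gradient(weights, X, y) for X, y in parties]
    weights -= 0.05 * np.mean(grads, axis=0)         # only gradients are shared and averaged

print(weights)  # a single global model trained without pooling the raw datasets
```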
5. Data subject rights and AI
Now comes the fun part. How do you adequately address a data subject access request when it comes to AI? Unhelpfully, the Guidance “does not cover each right in detail”, so we’ll just have to keep guessing. However, it does at least identify the four areas in which an AI system may hold personal data: the training data, the data used to make predictions, the results or outputs of those predictions, and data contained in the ML model itself. This provides a useful framework for thinking these issues through.
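One way to operationalise that framework is a simple checklist, with each subject access request searched against the four locations in turn. The sketch below is purely illustrative; the search logic is a placeholder to be implemented against your own systems.

```python
# A minimal sketch of a checklist covering the four areas in which an AI system
# may hold personal data; the search functions are placeholders.
AI_DATA_LOCATIONS = [
    "training data",
    "data used to make predictions",
    "results / outputs of predictions",
    "data contained within the model itself",
]

def handle_access_request(subject_id: str) -> dict:
    """Search each location in turn and record what (if anything) was found."""
    findings = {}
    for location in AI_DATA_LOCATIONS:
        # Placeholder: replace with a real search of the relevant store.
        findings[location] = f"search for subject {subject_id} not yet implemented"
    return findings

print(handle_access_request("12345"))
```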
The ICO acknowledges some of the difficulties of managing data subject rights but urges that any request be considered on a case-by-case basis, depending on the dataset and AI involved. Training data, for example, may still single out the individual it relates to (e.g. training data in a purchase prediction model that includes a pattern of purchases unique to one customer), while in other cases it may simply be impossible to identify an individual in the training data. There is no one-size-fits-all answer.
What about fulfilling erasure requests where the data subject’s data is crucial to the functioning of the model, and the model has been developed with that data from its inception? The ICO states that the chance of such a request being received is very small, but that it is possible and, if it is made, the entire model may have to be deleted. Apparently, this shouldn’t be costly “if you have a well-organised model management system”. I think some organisations would beg to differ!
Interestingly, the Guidance also notes that where the right to rectification is exercised, you may not have to rectify, for example, an incorrect delivery address held in training data, because it is likely more important to rectify the same incorrect delivery address held in the customer’s live record. The suggestion that rectification rights can be qualified by the relative ‘importance’ of the rectification is new. We will see whether it remains in the final published text.
The difficulty in providing privacy notices to individuals whose personal data forms part of training data is also addressed in the Guidance. This difficulty arises because training data has usually been stripped of obvious identifiers like name and email address, so providing a privacy notice to individuals to whom training data relates may be impossible (i.e. how do you know who to provide the notice to, and how do you contact them?). The Guidance says that the solution to this may lie in publicly posting information to explain where you obtained the training data from.
Conclusion
In short, expect a thorough overview of the risks to consider when implementing a new AI model, and of the controls an organisation can put in place to mitigate those risks. Although helpful, the Guidance may prove overwhelming to a number of organisations, given the sheer amount of documentation and governance procedures recommended for compliance. These include an overarching data governance framework, training programmes and competency assessments, discrimination and bias procedures, system evaluations, AI third-party policies, monitoring of outputs against expectations, testing procedures, enhanced DPIAs, regular ML model peer review, algorithmic fairness monitoring and security advisory subscriptions - to name just a few! Implementing every recommendation in the Guidance is likely to be a very large task indeed.
Client Alert 2020-243