AI is transforming e-discovery, but are your validation practices keeping pace? In this episode of Tech Law Talks, host Anthony Diana sits down with Kiriaki Tourikis and Marcin Krieger to tackle one of the most critical, and often misunderstood, aspects of AI-enabled discovery: how to prove your process was sound.
The panel unpacks what validation really means, why it's far more than a checkbox exercise, and how it protects legal teams when opposing parties or courts come calling. They also share hard-won lessons on navigating aggressive discovery demands, and what not to agree to. Tune in for practical strategies to stay defensible as generative AI raises the stakes.
Transcript:
Anthony: Hello, this is Anthony Diana, a partner at Reed Smith. And today I'm welcoming you to Tech Law Talks, where we're going to be continuing a podcast series on AI-enabled e-discovery. The podcast series will focus on practical and legal issues when considering using AI-enabled e-discovery, with a focus on actual use cases, not just the theoretical. Joining me today are Kiriaki Tourikis and Marcin Krieger. And we're going to be discussing validation. One of the key aspects of making sure that the use of AI-enabled discovery is legally defensible is validation, so we're going to be talking about that today. Welcome, guys. So let's just start with, you know, what is validation, particularly in the context of AI-enabled discovery. Kiriaki?
Kiriaki: Sure, thanks, Anthony. So I think we all know AI tools, especially GenAI tools, can be incredibly powerful, but they introduce some new risks, more than manual review, more than even traditional TAR and analytics, right? They hallucinate, they can misinterpret. So I think because of all of those reasons, validation becomes the mechanism that allows lawyers to trust but verify. It's how I think we maintain our ethical duties of competence, candor, and supervision. The expectation isn't that you understand the math, become a coder, and understand the algorithm. The expectation is that if we use AI to support decisions about relevance, privilege, or factual narratives, we're taking reasonable steps to ensure that the tool is accurate, reliable, and properly supervised. So validation is not about proving the technology is perfect; it's about demonstrating your process was thoughtful, your oversight was meaningful, your decisions were informed. You simply didn't just let the tool run unchecked. You supervised it the same way you'd supervise a junior reviewer or a vendor or any other non-lawyer assistant under the rules of professional conduct. And I think it's important to think about validation as not just a one-time event. It's not something you just kind of do at the beginning or just do at the end, right? It's something that's continuous. Models update, data changes, reviewers evolve their understanding, and so the model may shift. So ongoing validation will allow you to catch those changes. And we're really just talking about the defensibility of AI-enabled workflows, right? I think everything we're going to cover builds on this concept: validation is the tissue that ties technology to our professional obligations and ensures that AI supports the integrity of the discovery process.
Marcin: Yeah, validation is also more than that. Validation, or at least the processes that go into what we call validation, are also separate and distinct from what we refer to as QC or QC workflows. The information that we get from validation is information that we have to be prepared to share. And that information is what informs the court and the requesting party that what we did is in fact complete. And that information needs to be structured in such a way that does not give the requesting party excessive insight or oversight of our discovery process. It allows us as the producing party to maintain control of our document reviews. And as Sedona Principle 6 says, you know, we as a producing party are the best situated to define our processes for our reviews. And it's the validation process that we use to show the court, you don't need to look any deeper than this. The validation proves what we did was right and that's all you need to know.
Anthony: So when you say that, Marcin, what activities or outputs are you actually validating? When you say we're going to the court to prove something, what are we validating?
Marcin: So at its most basic, and the most common use of any AI tools, we are validating that our document productions are substantially complete when it comes to responding to the document requests that we received from the other side. So in American adversarial litigation and in the discovery process, we all know that the requesting party sends us their document requests. And after we negotiate the scope of them, we have an ethical obligation to produce documents that are responsive to those requests. But the standard is not perfection. The standard is reasonableness. And when you use any kind of AI tool, those AI tools are never going to be 100% complete. But the goal is to get to a level of completion where a court can say this is substantially complete and the cost and burden of continuing down the discovery process is excessive. But how do you do that? How do you validate that? You can't just say, here are our documents and we found substantially everything. There's a whole world of documents that you aren't looking at. That's the whole point of this AI technology: to save us cost, to eliminate irrelevant documents. We have to validate that what we aren't looking at, and aren't producing, isn't significantly relevant, is largely nonresponsive, and that a court will agree that the cost of actually going into what we aren't looking at far exceeds the benefit of what we're missing. So at its most basic, that's what we're validating. That's the most common use: validate that what you're producing is substantially complete. Now you can use AI for other things beyond just responsiveness. You can use AI for things like privilege, and you can use AI for things like automated redactions. But in the same way, you have to validate what the AI is doing and be able to defend that process to a court, so that if mistakes come up, or if gaps are discovered, or if things are missed, a court will agree that your process itself was sound and well executed, and that you validated that process, so those errors are not used against you.
Anthony: So when you're saying validation, are you saying, like, again, you're talking about sampling, right? You're basically sampling documents that haven't been produced to make sure that there's nothing responsive there, right?
Marcin: Right. So what does validation actually look like? The most frequently used validation process is what's called elusion testing. And that's a fancy technical term, but what we mean is you take the entire universe of documents that the AI said you don't need to look at, or that you've determined is below your cutoff score. So somewhere along the way you're doing your QC, you are testing the boundaries of what is in your review, and you get to a point where you say, look, I believe that at this point we can stop our review. We're going to pause. We're going to stop reviewing, and maybe 50% of the documents haven't even been looked at, but the AI says that the likelihood of them being relevant is really low. So you do a sample test. You take a statistically significant random sample. It almost doesn't matter how many documents you haven't looked at. It could be 50,000. It could be half a million. The math is easy to find online. But essentially, if you take a sample of approximately 2,600 documents, that gets you to a very high level of confidence on your statistical sampling. You have somebody review all of those documents, and you can calculate how many responsive documents are in there. And that gives you a percentage. Ideally, it's zero, but you will always find one or two documents. And let's say you find 26 documents in your 2,600-document sample; that means you have an elusion rate of 1%, or 1 in 100 of the documents that you haven't looked at are relevant. And from there, you can figure out how many responsive documents you have missed. You take that number and you can compare it against what you have found and marked responsive, what you're producing or holding on your privilege log. And you can come up with what's called a recall score, which is just a percentage. So you can say, hey, based upon that random sample, I think I've missed 1% of the documents I never looked at, and we know that's going to be a certain number, and here's what we're producing, that's a different number. And you could say, look, I found 90% of all of the responsive documents. That's how I validate my TAR process. And you want a high enough recall score. There are very few opinions that actually discuss what an appropriate recall score is, but in the industry and based upon existing case law, recall scores above 80% are generally considered sufficient, and with a recall score of 90% a court would be hard pressed to find that the process was deficient. So you can disclose that number, because it doesn't disclose anything privileged and doesn't disclose anything about your process. You can go to the court and the opposing party and say, hey, I validated my review and we have a recall score of 90%. And based upon that, and also my existing ethical obligations, I can say that my productions are substantially complete.
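For readers who want to see the arithmetic behind the elusion rate and recall figures Marcin describes, here is a minimal Python sketch. The inputs (500,000 unreviewed documents and 45,000 documents already marked responsive) are hypothetical illustration values, chosen only so the output matches the 1% elusion rate and 90% recall used in the example.

# Minimal sketch of the elusion-rate and recall arithmetic described above.
# All inputs are hypothetical illustration values, not figures from a real matter.

def elusion_and_recall(unreviewed_total, sample_size, responsive_in_sample,
                       responsive_found_in_review):
    # Elusion rate: share of the sampled, unreviewed documents that turned out responsive.
    elusion_rate = responsive_in_sample / sample_size
    # Point estimate of responsive documents missed across the whole unreviewed set.
    estimated_missed = elusion_rate * unreviewed_total
    # Recall: responsive documents found, over the estimated total responsive documents.
    recall = responsive_found_in_review / (responsive_found_in_review + estimated_missed)
    return elusion_rate, estimated_missed, recall

# Example roughly matching the numbers in the discussion: a 2,600-document sample
# drawn from 500,000 unreviewed documents, 26 responsive documents in the sample,
# and 45,000 documents already marked responsive in the review.
elusion, missed, recall = elusion_and_recall(500_000, 2_600, 26, 45_000)
print(f"Elusion rate: {elusion:.1%}")                      # 1.0%
print(f"Estimated missed responsive docs: {missed:,.0f}")  # 5,000
print(f"Estimated recall: {recall:.1%}")                   # 90.0%

Note that these are point estimates; in practice, the sample size also determines a confidence interval around the elusion rate and recall, which is why the sample needs to be statistically significant.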
Anthony: So, Kiriaki, just in terms of, I know you have a lot of experience, particularly with TAR, the use of technology-assisted review, but given what Marcin described, what are the sort of challenges that you see from courts, regulators, plaintiffs, and the like when we start talking about validation?
Kiriaki: I think some of it is understanding what validation is in its entirety, right? There's the aspect of it which is QC, and there's the aspect of validating a process, right? And so I think with TAR, we kind of understood the process, and really it was validating the results; it was very results-driven, right? Are my results correct? Am I leaving things behind that really should get produced? Am I making sure I'm protecting against any privilege waivers? You know, this, I think, is a little bit more nuanced. There's more involved when you're using GenAI. And so I think it's less about just the QC aspect and also about validating the process, because there is a difference, I think, between QC and validation, right? So the quality control piece is about checking the work, and QC is part of the overall validation when you're talking about AI-enabled e-discovery, right? But QC asks, did the model get this document right? Is it classified correctly, et cetera? And the focus is on the output, right? The specific decisions the AI made. And it should be something we're comfortable doing already, right? With manual review, we QC it. With TAR, we QC. And so again, we still have to have that piece, which we should be used to doing. And I think validation in its entirety is also about checking the method and not just the results, right? Do we use the right tool for the right task? Are we producing reliable results? Are the workflows, the assumptions, and the parameters sound? Are the outcomes consistent, repeatable, and explainable, right? So validation in its entirety focuses on the process and not just the specific outputs. QC answers the question, did this decision make sense? Is it correct? And I think validation approaches it from the standpoint, does the entire approach make sense, right? So think of it this way: QC is like proofreading the brief, making sure the citations are right, and validation is making sure the entire writing process is right. Did you do the right research, outline the document correctly, QC your sources, right? So you want to make sure the entire process is solid. And I think that's the important distinction between, say, a QC workflow in TAR versus GenAI. GenAI models can, as we keep saying, hallucinate, they can be confidently wrong, and QC may not necessarily catch some of these systemic issues if you're just looking at some sampling. So I think validation, the way we think of it more broadly, is trying to see that the model is behaving consistently. What were the settings and the prompts? Are those appropriate? Did you have a good workflow? And did you monitor the model over time? So I think validation, when we think about it, is evidence that your process was supervised, tested, and defensible. And QC is just a piece of that.
Anthony: And then just in terms of thinking about challenges associated with validation, right? I know for TAR, one of the things that came up quite a bit when it was first coming out was that plaintiffs and others were saying, but we want to see the validation set, right? That has been litigated, and I think ultimately the answer was generally no, you don't have to do that. But that's always a risk. So what are your thoughts on that? Because that's obviously one of the things that is going to happen when we use GenAI as well, right? People are going to say, if you're validating, I'm not going to take your word for it that you have 90% recall. I want to see the validation set, particularly the responsive documents that have not been captured, to make sure I'm comfortable with that. That's what the plaintiffs are saying.
Marcin: Yeah, so we're now more than a decade beyond the seminal opinion by Judge Peck that the use of technology-assisted review is black letter law. And yet we are still actively fighting with opposing counsel about what we refer to as a validation protocol or a TAR protocol, where the party that's issuing the document requests, which a mountain of case law says has absolutely no right to control the process, still seeks to have a certain level of control over the process. Sometimes it is well intentioned, and there can be very healthy negotiations where parties come to an agreement about what we're willing to disclose about our validation process. And then there are what I would consider to be very aggressive or heavy-handed requests, where parties demand not only an unprecedented level of insight into the validation process, but oftentimes control over the validation process. That can include things like requesting to see samples of the nonresponsive documents, even though there's absolutely nothing in the law that says we have an obligation to produce nonresponsive documents. In the world of generative AI in particular, I think we're going to see a movement where those same parties that were very aggressive about how we validate TAR are going to seek insight into our GenAI validation process. And with GenAI, we talked about doing the end-stage validation, but there's actually a lot of validation that happens up front in terms of prompt engineering. I can see a world where requesting parties are going to ask to see things like, what are your prompts? They're going to ask for revision control. They're going to ask for things like, what is the difference between running this prompt and that prompt? And I want you to produce the documents that are in the margins so that I can see that your GenAI process is working correctly. It's a great overreach, in my opinion. We as the producing party have our ethical obligations to conduct our reviews in an appropriate manner. It's our obligation to validate those processes. It's our obligation, maybe, to disclose the results of those processes. But parties are going to want to pierce that veil. They're going to use every opportunity to glance behind the curtain. And it's going to be very important for producing parties to be careful about what they agree to. Some of the ugliest opinions that we have in the world of continuous active learning or any other kind of TAR are opinions grounded in the fact that the producing party agreed to something that they didn't actually have to agree to, but they agreed to it. Now they're being held to it. They violated what they agreed to. And now the court is ordering sanctions that include a heavy look into the e-discovery process. But discovery on discovery is disfavored, and producing parties that are adopting new technology just have to be very careful and smart about what they do and do not agree to in response to requests from the requesting party.
Anthony: Yeah, that's a great tip, Marcin. So again, I think that is key. And this is the challenge we had with TAR not being used as frequently as it needs to be: it's because of these issues, right? Fear of, what if I get into a fight? I'm afraid of what I'm going to agree to. I'm going to have to produce my validation set. All of those issues came up with TAR and kept it from moving forward, and hopefully that's not going to happen with GenAI. So Kiriaki, over to you: what other tips do you have if you're starting to think about validation with the use of AI?
Kiriaki: I mean, I think it's about documenting your process, about not waiting to do it, not waiting to have that plan, right? As you're formulating your strategy, there should be a plan in place about validation. You should be thinking about when you're doing it, how often you're going to do it. This isn't something you kind of do at the end just to check the box, right? Because we are dealing with a more complex technology, there are different risks. And so I think you need to do that early, and I think you need to document it. You want to be able to show courts or regulators, if ever asked, that you are addressing your professional responsibility by supervising and monitoring the AI, right? So again, it's not disclosing the algorithm or being prepared to show all of the documents, but what did you do? What did you do to show that you were supervising this technology and this process, that you devised a workflow, that at various points you considered how to validate it, that you had documentation of your workflow, whether it's sampling or something else? Because I don't think anyone will expect perfection. But to be defensible, you have to be able to show your work on some level, right? Which is: we thought about validation and we checked for it. We thought about hallucinations, and we did something to guard against that. I think the bottom line is that courts or regulators aren't necessarily policing the technology, they're policing the lawyers, right? It's about user error. Are users doing the right thing when they are implementing GenAI, TAR, things like that? So I think the best thing you can do is to have a plan in place early on, and you may adjust it, you may adjust that strategy, but have it early on and document your decisions.
Anthony: Well, thanks, guys. We're out of time, but thanks, everybody, for listening. We have other podcasts coming up in our series where we can talk further, and obviously validation will be a key part of using AI tools in the future. So thanks, all, and we hope you'll listen in for the next podcast.
Outro: Tech Law Talks is a Reed Smith production. Our producer is Shannon Ryan. For more information about Reed Smith's Emerging Technologies Practice, please email [email protected]. You can find our podcast on all streaming platforms, reedsmith.com and our social media accounts at Reed Smith LLP.
Disclaimer: This podcast is provided for educational purposes. It does not constitute legal advice and is not intended to establish an attorney-client relationship, nor is it intended to suggest or establish standards of care applicable to particular lawyers in any given situation. Prior results do not guarantee a similar outcome. Any views, opinions, or comments made by any external guest speaker are not to be attributed to Reed Smith LLP or its individual lawyers.
All rights reserved.
Transcript is auto-generated.