
AI-enabled e-discovery: Beyond TAR - How GenAI is rewriting the rules of document review

Generative AI is transforming document review, but are you ready to use it defensibly? In this episode, Anthony Diana sits down with e-discovery veterans Therese Craparo and Marcin Krieger to explore what makes GenAI different from traditional TAR, how to get your prompts right, and the pitfalls to avoid. Whether you're facing complex litigation or a regulatory investigation, tune in to learn how to make AI work for your next review.

Transcript:

Anthony: Hello, this is Anthony Diana from Reed Smith, and welcome to Tech Law Talks. Today we are continuing a podcast series on AI-enabled e-discovery. This podcast series will focus on practical and legal issues to consider when using AI-enabled e-discovery, with a focus on actual use cases, not just theoretical ones. Joining me today are Therese Craparo and Marcin Krieger, and we're going to be discussing the use of AI-enabled e-discovery for document review and QC, so just the general topic of document review. Let's start with Therese. What does it look like to use AI for document review?

Therese: Well, I think one of the challenges with answering this question is that there are so many things we do in document review that we could probably have several podcasts talking about all the different ways you can use GenAI within your document review process generally. What I'm going to focus on today is really just using GenAI for first level review, second level review, and QC of data that you want to produce, and what we're seeing as the opportunities for using GenAI in those processes at a high level. At a high level, there are probably three big bucket categories. One is just identifying categories of relevant documents. At the outset of a case, we have a complaint and we have requests for production, and we think we know generally what may be relevant. We're putting together our review protocols around what we think is going to be relevant, and a lot of times you find out later on that there are additional categories or things you didn't think of, which is the natural progression in any legal matter and any document review. One of the things GenAI can do is help us surface those categories of relevant documents sooner, so that at the outset of our document reviews we're in a better position to say what may be relevant, what categories are out there, things we may not have thought of, and to help structure our review. The next big category where GenAI can be used is first level review: using GenAI to review, so to speak, and tag pre-identified documents that are likely to be relevant, for your productions and the like. And then the third big category is around QC of your document review, whether that is at second level, to evaluate whether documents are being coded consistently. Are there gaps? Are there disagreements between the reviewers? Are there gaps in the production that we're sending out, where we are missing categories of things? Are things being miscategorized? All to make sure that the production, after our first level review, whether that is human or computer, is complete and accurate and nothing is being missed. So really, at a super high level, those are the three categories where we're seeing the ability to use GenAI, again, specifically for what I need to produce and for making sure those productions are complete and accurate.

Anthony: And Therese, when you're talking about first level review, my understanding is that, like TAR and other tools, it's going to score each document: highly relevant, less relevant, whatever. It's going to be a score. And I guess one of the issues people are going to have to think through is, when we say we're going to use it in first level review, does that mean we're not going to look at the documents, and if it's highly relevant, we're just going to produce it? Or do we think people are still going to review the documents, but with the scoring in front of them? What is your view in terms of how this is probably going to play out?

Therese: Look, I think it depends on who you talk to, and I think it's also a matter of timing. We're at a pretty early stage in using GenAI tools for document review. There are definitely great use cases out there and people are using it, but we're at early stages. I think there are people who will tell you it is so good, or will get so good, that we will really be able to forgo a first level review; there are definitely people who believe that. I think most lawyers are, for now, going to be a lot more cautious than that. There will probably still be some level of review of the documents to make sure that things going out are not just relevant, but also not privileged. We've talked about privilege before in these podcasts. I expect you're still going to see some level of human review for a while, to make sure that anything that goes out the door is accurately categorized, is not privileged, and that we're not missing anything. I think the ideal goal would be to get to a point where we eliminate a lot of humans physically looking at each document, and more human time is spent on the prompts, on making sure the process is accurate, on validation and things like that, using those tools to make sure that the productions are accurate, correct, relevant, and not including privileged documents and the like. But as a starting point, we are still going to see some level of human review, because we don't quite trust the technology that much yet. I do think the long-term goal, should the technology get us there, would be to largely eliminate manual document-by-document review, and instead have more of a technology-level review to make sure the substance is correct. But we're not there yet.

Anthony: And so, Marcin, why don't you explain a little bit of how you would actually do this? How would you actually do the three stages that Therese described if you're using some type of AI tool, particularly a GenAI tool?

Marcin: All right, so generative AI tools in document review work drastically differently than the type of technology we're used to using, which is TAR. With generative AI, you have to imagine a computer, or a virtual room, with a person who looks at only one document, and you have to write that one person a set of instructions so that they can read your protocol, look at that one document, and hopefully be 100% accurate every time. What that means is there's no cross-document intelligence. If you have a million documents in your review population, imagine you had to write a memo that you will hand to a million people, and all each of them can do is read that memo once, look at one document, and tell you: is it responsive or not to that memo? Same thing for privilege, same thing for every issue code. So all of the work has to happen up front. Now, that doesn't mean you spend a month writing one crazy instruction and then just run it and walk away, because that would be incredibly expensive and wasteful. The way we set up a generative AI review is that you start by selecting a small group of documents to test against. My best advice is you want about 30 documents that you absolutely know are relevant, about 30 documents that you absolutely know are not relevant, and then, just to round it out, an easy 40 randomly selected documents from your population. What you're going to do is then draft your prompt instruction. Now, attorneys write and think very differently than generative AI does. Generative AI is very literal, and you also have a limit on the amount of text you can put in. So where we used to write document review protocols that were 20 pages long and contained every single nuance and detail, you have to really distill that drafting process down to clear, near-binary instructions: here is a description of the case, here is a description of what has been requested, this is what is relevant, and then it's always best to have a little bit of instruction on what is not relevant. And it's a lot easier to do when you're just doing relevant versus not relevant. There are tools out there that let you do, for example, up to 15 different issue codes; each issue code needs to have that same thought process applied, and I actually have a tip about how to use those issue codes for later. But you start with your 100 and you test it. You run the GenAI against this 100-document population, and what you are looking for is not only that the AI agrees with what you expect to happen, but also the rationales these GenAI tools give you. They tell you why they think a document is relevant or not relevant. You need to be looking at the rationales the AI is generating to make sure it isn't giving you a correct score with a rationale that's completely out of left field, because that means you just got lucky. You want to make sure that not only is the document relevant, but the rationale is grounded in the prompt you wrote. If you have to make small adjustments, you do, until you get to a point where you say: hey, I feel pretty good here. My 30 documents that were relevant, it says they're relevant and I agree. The 30 that I selected as not relevant are in fact being called not relevant by the GenAI, and its rationale is accurate. And within that random sample of 40, I agree with everything it did.
You go from 100 documents to a thousand documents. Now, do you take the time to validate all 1,000 documents? Probably yes. Maybe at this point it's not you and I, Anthony; at this point, maybe you're using one of your senior e-discovery attorneys. And what they are doing is not just confirming whether or not the documents are or aren't responsive, but identifying why they think the prompt got it wrong, because remember, the prompt controls everything here. So somebody might come back and say: hey, this document is relevant and the GenAI got it wrong, and that's because in your prompt you forgot to mention that documents about Blue Buffalo dog food are not the same as conversations about wild buffalo on the Plains, or whatever the case may be. You have to go back to the drawing board, you have to retool your prompt, and unfortunately you have to run it against the whole thousand again. You do this a couple of times until you feel comfortable that your margin of error is small enough, and then you run it against the whole population. That's how we kick off a GenAI document review. Now, documents are going to be scored. Most of these GenAI tools, unlike TAR, which uses a scale of 100 down to zero, have basically only five scores: very likely relevant, relevant, maybe relevant, not relevant, and often an error or junk flag meaning "I couldn't look at the document." What you really want is to have almost no documents in the maybe relevant bucket; those are the most problematic and costly. Documents that your AI says are not relevant you should, at this point, be able to set aside and just do some validation sampling on. Documents that the AI says are highly relevant should be going to your senior attorneys: these are your hot documents, your key documents, and they are not only likely fast-tracked to your production or privilege queue, but also going to your case team for things like chronologies. And you have your human review team looking at the documents the AI said are relevant. Again, ideally you have very few documents in the maybe relevant bucket, because that's where you get your cost bloat. I think that if you stratify your review in that way, and then you employ traditional validation techniques, you can get to a defensible document review. But the important part is that, especially in those early prompting iterations, you have almost zero documents in that maybe responsive, unsure world. You really have to get your prompts to a point where the AI can, with certainty, look at what you wrote and say yes, this document is relevant, or no, this document is not. If you have too much "I don't know" early on, you're going to be saddled with a whole lot of review when you turn it loose on a million-document population.
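For readers who want to see the shape of that workflow on paper, here is a minimal, hypothetical Python sketch of the test set and prompt-iteration loop Marcin describes. Everything in it is illustrative: `classify_document` is a stand-in for whichever GenAI review platform or API is actually used (no real tool's interface is shown), and the score names and routing tiers simply mirror the five buckets mentioned in the episode.

```python
# Hypothetical sketch of the iterative prompt-testing workflow described above.
# classify_document() is a placeholder for the GenAI review tool, NOT a real API.

from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    known_label: str | None = None  # "relevant" / "not_relevant" for seeded test docs


@dataclass
class Result:
    doc_id: str
    score: str       # e.g. "very_likely_relevant", "relevant", "maybe_relevant", "not_relevant", "error"
    rationale: str   # the model's explanation; must be checked against the prompt


def classify_document(prompt: str, doc: Document) -> Result:
    """Placeholder: each call sees the prompt and ONE document, with no memory of others."""
    # In a real review this would call the review platform or LLM API in use.
    return Result(doc.doc_id, "maybe_relevant", "placeholder rationale")


def build_test_set(known_relevant, known_not_relevant, population, rng):
    """~30 known relevant + ~30 known not relevant + ~40 randomly selected documents."""
    random_docs = rng.sample(population, k=min(40, len(population)))
    return known_relevant[:30] + known_not_relevant[:30] + random_docs


def run_iteration(prompt: str, test_set: list[Document]):
    """Run one prompt version over the test set; collect disagreements for prompt retooling."""
    disagreements, maybes = [], []
    for doc in test_set:
        result = classify_document(prompt, doc)
        called_relevant = result.score in ("very_likely_relevant", "relevant")
        if doc.known_label == "relevant" and not called_relevant:
            disagreements.append((doc, result))
        elif doc.known_label == "not_relevant" and called_relevant:
            disagreements.append((doc, result))
        if result.score == "maybe_relevant":
            maybes.append((doc, result))
        # Rationales still need human eyes: a "correct" score with an off-the-wall
        # rationale means the prompt got lucky, not that it works.
    return disagreements, maybes


# Illustrative routing once the full population is scored:
ROUTING = {
    "very_likely_relevant": "senior attorneys: hot docs, chronologies, production/privilege fast track",
    "relevant": "human review team",
    "maybe_relevant": "minimize via prompt revision; this bucket drives cost",
    "not_relevant": "set aside; validation sampling only",
    "error": "re-run or route to manual review",
}
```

The loop is then repeated at each scale (100 documents, then 1,000, then the full population), retooling the prompt whenever `disagreements` or the `maybe_relevant` bucket stays too large.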

Anthony: And so, again, I've done some of this, and it's very difficult to figure out how to revise the prompt when what Marcin mentioned happens. Obviously, in a simple case it's probably relatively straightforward to describe the case. But a lot of cases are incredibly complex, right? Really complex cases, lots of different parties; they're messy. It's really hard to know what all the issues are and what is relevant when you're starting out and being asked to say, okay, tell me everything you know about the case on day one, when we all know that by day 30 or whatever, our knowledge is going to change. We're going to find a new fact. How do you deal with that if you're starting with a prompt and everything is already scored, and now you've just done custodian interviews or whatever and you found out, oh, here are some new issues? How does that work with GenAI prompts and the like?

Marcin: Yeah, this is definitely an emerging area of workflow development, because with GenAI prompting you kind of have to consider: if I have to retool my GenAI prompt, do I have an ethical obligation to rerun it on the whole population, or do I have an ethical obligation to just rerun it on the stuff the AI already missed? Also, how do you get ahead of that? Rather than asking how to adapt if I've already done all this prompt engineering and then a week later I find new facts, maybe a different question is: how do I use GenAI to find those facts before I go through that crazy process of writing these prompts? The other thing GenAI tools create for you is what we call early case insights. You can leverage more basic prompts. Here's an actual thing that I did. Tools like Relativity aiR let you upload a complaint and then write your prompt framework based off that complaint. But we already know that complaints themselves are biased, because they favor the person writing them. So you can also feed it things like the complaint and the answer and try to get it to write a prompt. What I did recently is I took the complaint, the answer, in-house interview notes, and the document requests, and I put them into a generative AI tool called Harvey. I asked Harvey to write me a 2,000-word prompt that summarized just a factual statement of the issues in the case. I then took our entire document population, pre-review, and exported it by printing parents and attachments as single documents, so that the GenAI can see the whole family as one document. I put all of that into Harvey as its own mini database. Then I took that 2,000-word summary, and after I vetted it and made sure it was actually accurate, I put it into Harvey and said: role, you are a senior attorney for this client; task, you've been asked to review these documents to look for information that supports or refutes these allegations. Please do it, then write me a memo about it and cite some documents for me. It did it. It found me two or three hundred out of 10,000 documents. I reviewed those, and the knowledge from those documents helped inform the conversations we had with the client. We discovered some new custodians. We discovered some new facts that we had no awareness of. We also found some great documents that we were able to take directly to opposing counsel and wave under their noses. Having that early knowledge helps us get ahead of surprises later, because you don't want those kinds of surprises in a GenAI review. Traditional TAR reviews are more tolerant of changing course halfway through; with the TAR reviews we've been doing for 10 years, those algorithms adapt. You could be two or three days, even two or three weeks, into your review, and you can still steer the ship in a new direction. You can adjust your scoring, you can do sampling to fix earlier errors. And usually these TAR models are pushing what they think is irrelevant down, so if you discover a new fact that is relevant, you haven't missed your opportunity to bring those documents up. But with GenAI, where you've already invested 40, 50, 60 hours of senior attorney time to write these prompts, be smart about it: use GenAI first to find these weird facts, and then inform yourself before you write your prompt structures for the primary document review.
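A hedged sketch of the data-preparation side of that early-insights pass: flattening each parent document and its attachments into one text so the model sees the family as a single document, and wrapping a pre-vetted factual summary in a role/task prompt. The field names and the `ask_genai_tool` placeholder are hypothetical; Harvey's and Relativity aiR's actual interfaces are not shown or implied here.

```python
# Hypothetical sketch only: prepares document families and an early-insights prompt.
# ask_genai_tool() is a stand-in, not Harvey's or Relativity aiR's real API.

from dataclasses import dataclass, field


@dataclass
class Family:
    parent_id: str
    parent_text: str
    attachment_texts: list[str] = field(default_factory=list)


def flatten_family(fam: Family) -> str:
    """Concatenate the parent and its attachments so the model sees the family as one document."""
    parts = [f"[PARENT {fam.parent_id}]", fam.parent_text]
    for i, attachment in enumerate(fam.attachment_texts, start=1):
        parts += [f"[ATTACHMENT {i}]", attachment]
    return "\n\n".join(parts)


def build_insights_prompt(factual_summary: str) -> str:
    """Role/task framing wrapped around a human-vetted ~2,000-word factual summary of the case."""
    return (
        "Role: You are a senior attorney for this client.\n"
        "Task: Review the documents provided and identify information that supports "
        "or refutes the allegations summarized below. Write a short memo and cite "
        "the specific documents you rely on.\n\n"
        f"Case summary:\n{factual_summary}"
    )


def ask_genai_tool(prompt: str, documents: list[str]) -> str:
    """Placeholder for the GenAI platform call that would return the memo text."""
    raise NotImplementedError("Replace with the actual tool or API in use.")
```

The key design point from the episode is the vetting step: the factual summary is checked by a human before it is used to drive the pass, and the memo's cited documents are then reviewed, not accepted at face value.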

Anthony: That's fair. So, Therese, what do we think? Obviously, that's wildly complex in a lot of ways. Helpful, but it's going to take some time for people to learn this and figure out how to do it. What do we think courts, regulators, plaintiffs, and the like are going to be thinking about when they hear about what Marcin just described? What are the challenges there?

Therese: I think a lot of the challenges we're going to see are going to be similar to what we saw with TAR, and yet, on the flip side, I think some of this might actually be easier given the climate we're in today. When TAR started to become a thing we were using in e-discovery, there was a lot of: what is this technology? Who's using it? What's going on? What is this machine learning thing? And I think that generated a lot of resistance and fear, with people saying, how do I know that it works? Are we sure we should be using this? Maybe we don't need it. Interestingly, in the era we're in today, GenAI is so prolific. Everybody's using it, and using it for everything. People are using it online to do searches, to ask therapy questions. People are using it so prolifically that there is also a general acceptance of it, that this is a thing we should use, that everyone's going to be using it, and that it has great value in many different ways. And we're seeing this: we are seeing litigation teams who historically might have been worried about using TAR saying, how can I use GenAI in my review? How can we leverage this to do some great things on my case? So on the one hand, there's more of an expectation that it will be used and more acceptance of the use of generative AI, and I think you're going to see a correspondingly more general acceptance, even from opposing counsel, courts, and regulators, that of course people are going to be using this; maybe you'd be crazy not to use it. So I do think we're going to see a little less resistance, which may give us some traction to use it. On the flip side, some of the challenges we had with TAR that made litigators reluctant to use it were the requests from opposing counsel, and sometimes from regulators and the like, saying: I want to see your seed set; I want to be involved in how you're making determinations about what the tool is going to identify as relevant; I want to know the details of your process. And then, of course, there were lots of arguments about what's privileged and what's not, about to what degree you should have to disclose your use of TAR, and to what degree the other side should have any say in or review of what you're doing. I think we've got a lot of cases that helped establish the idea that it's the producing party's responsibility, that it is on you, and that some information about the use of these tools may be protected by the work product protection and the like, but there still was a lot of disclosure, particularly early on, and negotiating, and discovery on discovery about what you're doing. I think we're still going to see some of that surfacing: what are the prompts you're using? Should the other side be able to have a say in your prompts? Should they be able to review your prompts to decide if they are relevant? I'd like to think that we are far enough down the road that, if you're being questioned, you're really just talking about the high-level process and what the protections are, things like validation, to prove that your review met the standards. But I am certain that there will be some disputes over the level of involvement or disclosure relating to the use of GenAI.
Again, it'll be interesting to see if the recognition that everybody's using it, that everybody should use it, maybe lowers the likelihood that we have fights about what prompts people are using or how they're doing it. I somehow think that that is at least going to be an issue for some period of time until we get past it. But I do think there are worries about that; there will be concerns. And look, the reality is that in most cases we don't want to spend time on discovery on discovery. We don't want to spend more time than we need to fighting over prompts and things; we just want to get to the review, produce the documents, and get to the merits. But we all know that discovery disputes are inevitable. And I think that, as we've already seen, a lot of the time it's not the technology, it's the people using it. We are going to see situations just like we've already seen publicly with people using GenAI to draft briefs, not reviewing them, submitting them to the court, and getting in trouble because they submitted briefs with hallucinations in them, citing cases that don't exist. We are almost inevitably going to see a case where somebody used GenAI and perhaps did not use it properly because they weren't properly trained, or didn't have the proper experts who know how to use the tools, or assumed that because they put in a prompt, it all must be right, and produced the data anyway. That will happen. It will end up in sanctions, and it will create fear and worry in people about using these tools. I think by now we all know, and it's a little trite, that we have an ethical obligation, if we're using technology, to understand it, to understand how it works, and to educate ourselves so that we are using it properly. But people don't always do that, and I think that's going to be one of the big risks: people who don't properly educate themselves or get the proper experts involved. As Marcin said, these tools can be great, but it is complex. It requires a very specific skill set, and it's not a skill set we all already have. It is a skill set we all need to develop. It is not instinctive; it is not that easy to figure out. It is a skill set. You need to learn it, you need to make sure you're doing it correctly, and you need to take the time to learn how to do it correctly. And that's going to be our challenge. I think there's a big learning curve here, a much bigger learning curve than for TAR. There will inevitably be discovery disputes. Someone will not educate themselves and will do it poorly, and then it's going to raise questions about the use of it generally. I don't know that that's different than a lot of other situations; we've seen it happen with people using just electronic review tools, making mistakes, and it becoming an issue, with disciplinary hearings and the like. But I think it's something we all need to be aware of, so that we are educating ourselves properly to be able to represent our clients properly and to leverage the technology properly. Because it's also our ethical obligation to use technology where it will benefit our clients, and I think we need to make sure we're doing that and preparing ourselves for what's to come.

Anthony: Thanks, Therese and Marcin. I think we're going to see a lot of activity in the coming year, both in terms of court cases and best practice development, and it's probably next year that we'll have a lot more clarity on how this is going to work. In the meantime, I hope this was helpful. Thanks for listening to Tech Law Talks, and be on the lookout for more podcasts. Talk to you later.

Outro: Tech Law Talks is a Reed Smith production. Our producer is Shannon Ryan. For more information about Reed Smith's Emerging Technologies Practice, please email [email protected]. You can find our podcast on all streaming platforms, reedsmith.com and our social media accounts at Reed Smith LLP.

Disclaimer: This podcast is provided for educational purposes. It does not constitute legal advice and is not intended to establish an attorney-client relationship, nor is it intended to suggest or establish standards of care applicable to particular lawyers in any given situation. Prior results do not guarantee a similar outcome. Any views, opinions, or comments made by any external guest speaker are not to be attributed to Reed Smith LLP or its individual lawyers. 

All rights reserved.

Transcript is auto-generated.
