Authors: Anthony J. Diana, Samantha M. Walsh, Chris Baird
Anthony Diana and Samantha Walsh are joined by Lighthouse's Chris Baird as part of our series on what legal teams need to know about Microsoft 365's AI-driven productivity tool, Copilot.
This episode presents an overview of the risks relating to Copilot’s access to and use of privileged and sensitive data and how businesses can mitigate these risks, including using Microsoft 365's access control tools and user training.
In particular, the episode provides in-depth information about Microsoft 365's sensitivity labels and how they can be used to refine a business’s approach to managing risk associated with privileged and sensitive data stored in Microsoft 365.
Transcript:
Intro: Hello, and welcome to Tech Law Talks, a podcast brought to you by Reed Smith's Emerging Technologies Group. In each episode of this podcast, we will discuss cutting edge issues on technology, data, and the law. We will provide practical observations on a wide variety of technology and data topics to give you quick and actionable tips to address the issues you are dealing with every day.
Anthony: Hello, this is Anthony Diana, a partner here in Reed Smith's Emerging Technologies group, and welcome to Tech Law Talks and our podcast series on AI for legal departments, with a focus on managing legal and regulatory risks with Microsoft Copilot, that Reed Smith is presenting with Lighthouse. With me today are Sam Walsh from Reed Smith's Emerging Technologies Group and Chris Baird from Lighthouse. Welcome, guys. Just to level set, Copilot is the AI tool that Microsoft has launched relatively recently to improve productivity within the Microsoft environment. There are a number of risks, which we went through in a previous podcast, that you have to consider, particularly in legal departments, when you're launching Copilot within your organization. And let me just start to level set with Chris, if you could give a little bit of a technical background on how Copilot works.
Chris: Absolutely, Anthony. Thanks for having me. So I guess a couple of key points, because as we go through this conversation, things are going to come up around how Copilot is used. And you touched on it there. The key objective is to improve data quality and increase productivity. So we want really good data in, we want to maximize the data that we've got at our disposal and make the most of that data, make it available to Copilot. But we want to do so in a way that we're not oversharing data, we're not getting bad legacy data in, you know, stale data, and we're not getting data from departments that maybe we shouldn't have pulled it in from, right? So that's one of the key things. We all know what Copilot does.

In terms of its architecture, think about it. You're in your canvas, whatever your favorite canvas is. It's Microsoft Word, it's Teams, it's PowerPoint. You're going to ask Copilot to give you some information to help you with a task, right? And the first piece of the architecture is you're going to make that request. Copilot's going to send a request into your Microsoft 365 tenant, where your data is. It's going to use APIs. It's going to hit the Graph API. There's a whole semantic layer around that. And it's going to say, hey, I've got this guy, Chris. He wants to get access to this data. He's asking me this question. Have you got his data? And here there's this important term Microsoft uses. They call it grounding. When you make your request into Copilot, whatever you request, you're going to get data back that's grounded to you. So you're not going to get data back from an OpenAI model or from Bing AI. You're only going to get data that's available to you. The issue with that is if you've got access to data you didn't know you had, you know, through poor governance. Maybe somebody shared a link with you two years ago. That data is going to be available to you as well.

But a few clever things happen from an architecture perspective. The graph gives a response. It says, hey, I've got Chris's data. It looks like this. That's going to go into the large language model. That's going to make it look beautiful and pass you all that data back in a way you can understand it. There's a final check that Copilot does at that point. It goes back to the graph and it says, I've got this response. I need to give it to the user. Are there any compliance actions I need to perform on this response before I give it? And I think that's what we're going to focus on a lot today, Anthony, right? But the important thing is thinking about that grounding. And the one message I want to give to people listening is really, you know, don't be immediately scared and worried of Copilot. It respects a lot of the controls that are in there already. The challenge is, if you have poor access control and governance, there are things that you need to work on.
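To make Chris's grounding point concrete, here is a minimal sketch, not Microsoft's implementation, of permission-trimmed retrieval. It uses the Microsoft Graph Search API, which returns only content the calling user can access, the same access model Copilot's grounding relies on. The access token and query are placeholders.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def grounded_search(access_token: str, query: str) -> list[dict]:
    """Query the Microsoft Graph Search API, which trims results to what
    the calling user can access (the model behind Copilot's grounding)."""
    body = {
        "requests": [
            {
                "entityTypes": ["driveItem"],  # files in SharePoint/OneDrive
                "query": {"queryString": query},
            }
        ]
    }
    resp = requests.post(
        f"{GRAPH}/search/query",
        headers={"Authorization": f"Bearer {access_token}"},
        json=body,
        timeout=30,
    )
    resp.raise_for_status()
    # Flatten the hits out of the response's hitsContainers structure.
    hits: list[dict] = []
    for result in resp.json().get("value", []):
        for container in result.get("hitsContainers", []):
            hits.extend(container.get("hits", []))
    return hits

# Hypothetical usage: TOKEN must be a delegated token for the user whose
# "grounded" view you want to inspect (placeholder, not a real secret).
# results = grounded_search(TOKEN, "Project Falcon engagement letter")
```

The point of the sketch: the same query run with two different users' tokens returns two different result sets, which is why stale oversharing, like a link shared two years ago, widens what Copilot can surface for a given user.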
Anthony: Yeah. And I think that's one of the challenges. A lot of legal departments don't know what access controls the IT department has put in place in M365. And I think that's one of the things you have to understand, right? That's one of the things we'll be talking about today, the importance of that. So Sam, turning to our focus today, which is the risks associated with privileged information, highly confidential information, sensitive information, can you just give a brief description of what those risks are?
Samantha: Sure. So one of the risks, as Chris just alluded to, is that Copilot is going to have access to information that you have access to, whether you know it or not. And so if you have privileged information that is protected just by being in a spot where people don't know it's there, but it's not necessarily controlled in terms of access, that could be coming up when people are using Copilot. I think another thing is that when Copilot returns information to people, you lose a bit of context for the information. And when you're talking about privilege and other types of sensitivity, sometimes you need some clues to alert you to the privileged or sensitive nature of the information. And if you're just getting a document sort of from the ether, and you don't know, you know, where it came from and who put it there, you're potentially obscuring the sensitive nature of the document.
Anthony: Yeah. And then the fear there is that you don't realize that it's privileged or highly confidential and you start sharing it, which causes all kinds of issues. And then, just generally for everyone, there are the regulators. That's both on the privacy side, where there's a lot of concern about using AI against personal information or highly sensitive personal information, and the SEC, which is very focused on material non-public information and how you're using AI against it. One of the things the regulators are going to be asking is, what controls do you have in place to make sure that it's not being used inappropriately? So again, I think that sets the groundwork for why we think this is important. So let's talk about how you can manage the risk. One of the things you can do, which is pretty simple, is training, right? The users have to know how to use it. So Sam, what should they be thinking about in terms of training for this?
Samantha: I think you can train users both on the inputs and on what they're doing with the outputs from Copilot. There are certainly ways to prompt Copilot that would reduce the risk that you're going to get this information flooding in from parts unknown. And I think having clear rules about vetting of Copilot responses, or limitations on just indiscriminately sharing Copilot responses, you know, these are all things that you can train users on to try to mitigate some of the data risk.
Anthony: Yeah, no, absolutely. And I think we're also seeing people, in doing this and launching it, having user agreements that say the same thing, right? What are the key risks? The user agreement says, make sure you're aware of these risks, including the risks that we've been talking about with sensitive information and how to use it. Okay, so now let's switch to a more technical perspective, some things you can do within the M365 environment to protect this highly confidential or sensitive information. So let's start, Chris, with this concept, which I know is in there: when you have a SharePoint Online site, or a Teams site that has a SharePoint Online site, one of the things you can do is basically exclude those sites from Copilot. Could you give us a brief description of what that means, and then a little bit about the pros and cons?
Chris: Yeah, of course, Anthony. So that control, by the way, is nothing new. For anybody that's administered SharePoint, you've always had the ability to control whether a site appears in search results or not. So it is that control, right? It's excluding sites from being available via search and via Copilot. You would do that at the SharePoint site level. So, you know, Microsoft makes that available. There's a couple of other controls, maybe one I'll mention in a second as well. These are kind of, I don't want to call it a knee-jerk reaction, I guess I just did, but it's: what are the quick things you can do if you want to get access to Copilot quickly and you're worried about some really sensitive information? And it is a knee-jerk, right? It's a sledgehammer to crack a nut. You're going to turn off entire access to that whole site. But in reality, that site may have some real gems of data that you want to make accessible to Copilot, and you're going to miss that.

The other quick win that's similar to that one is a product called Double Key Encryption. A lot of the products I'm going to talk about today are part of the Microsoft Purview stack, and part of MIP, which is Microsoft Information Protection. We're definitely going to cover labels shortly, Anthony. One thing you can do with a label is apply something called Double Key Encryption, where you use your own encryption key. And that means Microsoft cannot see your data. So if you know you've got pockets of data that are really secret, really sensitive, but you want to activate Copilot quickly, you've got these options. You can disable a site from being available at the search level. That's option one. The other option is at a data level. You can label it all as secret, and that data is not going to be accessible at all to Copilot. But like I say, these are really quick things you can do that don't really fix the problem in the long term and don't help you get the best out of Copilot. The reason you're investing in Copilot is to get access to good quality data, and hiding that data is a problem.
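As a rough sketch of the site-level control Chris mentions, the snippet below flips the SP.Web NoCrawl flag over SharePoint's REST API, which is my assumption for the property backing the classic "allow this site to appear in search results" setting; in practice administrators usually change this in the SharePoint UI or with admin tooling. The site URL and token are placeholders.

```python
import requests

def exclude_web_from_search(site_url: str, access_token: str) -> None:
    """Set NoCrawl on a SharePoint web so its content stops appearing in
    search results, and therefore stops being reachable by Copilot's
    grounding. A blunt, short-term control, as discussed above."""
    resp = requests.post(
        f"{site_url}/_api/web",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json;odata=verbose",
            "Content-Type": "application/json;odata=verbose",
            "X-HTTP-Method": "MERGE",  # update-in-place semantics
            "IF-MATCH": "*",
        },
        json={"__metadata": {"type": "SP.Web"}, "NoCrawl": True},
        timeout=30,
    )
    resp.raise_for_status()

# Hypothetical usage (placeholder URL and token):
# exclude_web_from_search("https://contoso.sharepoint.com/sites/Legal", TOKEN)
```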
Anthony: Yeah. And Microsoft has basically said, even though it's available, they've been pretty open that this is not the way you should be managing the risks we're talking about here, because you do lose some functionality in that SharePoint site if you take it out of search. So it's an option if you're rushing. And that's basically why they said, if you frankly aren't comfortable, you don't have all the controls in place, and you really have certain data that you want excluded, it's an option. But as you said, it's a knee-jerk short-term option if you really have to launch; it's not a long-term solution. So now let's focus a little bit on what they think is the right way to do it. First, let's talk about the site level. You touched on this: putting a sensitivity label on a site. Now, before you do that, first you have to identify the site. So, Chris, why don't you talk a little bit about that, and then let's talk a little bit about the technical side.
Chris: No, absolutely. So a couple of terminology things. When I talk about data classification, I'm talking about something different to applying a label. When I say data classification to a lot of my clients, they think, oh, that's confidential, highly confidential, secret. What I mean when I talk about data classification is: what is the data in its business context? What does it mean to the organization? Let's understand who the data owner is, and what the risk of that data is if it falls into the wrong hands. What are the obligations around processing and handling and storing that data? How do we lifecycle it? Really simple things would be social security numbers, names, addresses, right? We're identifying data types. We can then build that up. We can move on from those simple things and do some really clever things to identify documents by their overall type, their shape, their structure. We can train machine learning models to look for specific documents: case files, legal files, customer files, client files, right? We can train these machine learning classifiers. But the great thing is, if you get a good handle on your classification, you will be able to discover and understand your data risk across your enterprise. So you'll see there are tools within Microsoft 365 Purview, like Content Explorer and data classification. These tools will give you insights into SharePoint sites in your organization that have high amounts of social security numbers, high amounts of case files, legal affairs documents, right? It's going to come back and tell you, these are the sites that have this type of information. And you can do that analysis. You don't have to go out and say, guys, you've got to put your hand up and tell us if you've got a SharePoint site with this information. The administrators, the guys that are running Purview, can do that discovery and reach out to the business and go and discuss that SharePoint site.

But Anthony, what you're talking about there is, once you've identified that SharePoint site, you know, if we know we've got a SharePoint site that contains specific case files that are highly confidential, we can apply a highly confidential label to that site. And the label does a number of things. It visually marks the file, right? And what I mean by that is, at a file level, from a metadata perspective, anybody interacting with that file electronically will receive a pop-up dialogue on a ribbon, or a pop-up. It's going to be front and center to say this file is labeled as highly confidential. I've also got options, which I'm sure we've all used before in our day-to-day work. You can mark the document itself. You can put a watermark across the document to say it's highly confidential. You can put headers and footers on. So the label isn't just this little concept. It takes it a step further even, and this is where it really, really works with Copilot: you can define custom permissions at a label level. So for highly confidential labels, we might have a label for a particular case, a particular project. And if it is a case label, then we could give permissions to only the people involved in that case.
So only those people can open that file, and that means only those people can talk about that file to Copilot. You know, if you're not part of that case, Anthony, and me and Sam are, and I use that label, you can ask Copilot to give you all the information it can about that case and you're not going to get any information back, because you don't have the permissions that are on that source file. So that's one of the first things we can do: we can take that label and apply it to a SharePoint site, and that's going to apply a default label across all the documents that are in that site.

What we're really talking about here, by the way, when we talk about labels, is plugging a hole in access control and governance. So think about SharePoint management and hygiene. The issue is SharePoint has just grown exponentially for many organizations. You know, there's organic growth, you've got SharePoint migrations, but then you have this explosion of use once you're on SharePoint Online. There are going to be public sites, SharePoint sites that are available to everybody in your organization. There'll be poor JML processes, joiner-mover-leaver processes, where people who move departments don't have their access revoked from a SharePoint site. The issue with Copilot is, if the site access control isn't strict, if it's open and the file doesn't carry permissions itself, Copilot is going to be able to see that file. If it's public, it's going to be able to see that file, right? So with the label, where that differs from site permissions is that it puts the access controls on the files that are in that SharePoint site directly. So if you lift those files from that site, if it is a public site and I take those files and put them in another SharePoint site or on my laptop, they carry the access control with them. And that's what's really important. That means that wherever that file goes, it's going to be hidden from Copilot if I don't have that access. So, you know, sensitivity labels are a huge part of ensuring compliance for Copilot, probably the biggest first step organizations can take. And I think you touch on the first step quite nicely, Anthony. A lot of our clients say, we're scared of labeling everything in the organization, going out immediately, doing all that discovery, labeling everything, right? Maybe just knock off the top SharePoint sites, the ones that you know contain the most sensitive data. Start there. Start applying those labels there.
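As a toy illustration of what one of those classification rules is doing, here is a deliberately simplified detector for U.S. social security numbers. Purview's built-in sensitive information types combine patterns, validation, supporting keywords, proximity, and confidence levels; this sketch only shows the shape of the idea.

```python
import re

# Simplified stand-in for a Purview "sensitive information type": an
# SSN-shaped pattern plus a supporting keyword to raise confidence.
SSN_PATTERN = re.compile(r"\b(?!000|666|9\d{2})\d{3}-(?!00)\d{2}-(?!0000)\d{4}\b")
SSN_KEYWORDS = ("ssn", "social security")

def looks_like_ssn_content(text: str) -> bool:
    """Flag text containing an SSN-shaped number near a supporting keyword."""
    if not SSN_PATTERN.search(text):
        return False
    return any(keyword in text.lower() for keyword in SSN_KEYWORDS)

print(looks_like_ssn_content("Employee SSN: 123-45-6789"))   # True
print(looks_like_ssn_content("Invoice total: 123-45-6789"))  # False, no keyword
```

Run at scale across sites, rules like this are what let Content Explorer report which SharePoint sites concentrate which data types, without asking the business to self-report first.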
Anthony: Yeah, and Sam, we've talked with some clients about using their provisioning process, attestation process, or site lifecycle management to start gathering this information, because it's a big project, right? If you have thousands of sites, figuring out which ones have that data is a challenge. Chris talked about the technical way you could do it, which would be fantastic, but there are also low-tech ways of doing this.
Samantha: Right. Just relying on your people to take a little bit more of a manual approach, speaking up about what kind of sensitive data they have and where they're putting it.
Anthony: Which they may be doing already, right? I think that's one of the things you have to track. An organization, you know, a specific business line, may know where their data is. They just haven't told IT to do something with it. So it's just gathering that information, whether through the provisioning process, an attestation or survey, or whatever, just to start. And then, as Chris said, once you have an idea of where the highly confidential information sites are, you start doing the work. And again, it's applying the labels. One thing I want to emphasize, and I want to make sure people understand this, is that sensitivity labels are not all or nothing. At least what I've seen, Chris, is that for each sensitivity label, and you could have different types of highly confidential information, maybe it's sensitive personal information, maybe material non-public information, maybe privileged information, you can have different settings. So, for example, you can have it where the site is in essence read-only, right, where nobody can touch it, nobody can transfer the data, you can't copy it. That's the most extreme. But then you can have others that are a little bit more permissive. And as you said, you can tailor it so that certain people have it, certain groups or security groups, however you want to structure it. So there is some flexibility there. And I think that's where the legal departments have to really talk to the IT folks and figure out the options for not just applying the sensitivity label, but what restrictions we want to have in place with it.
Chris: Anthony, you're touching on the really important thing there, and I'm going to go back to what Sam talked about earlier with training as well, about culture. But I guess, you know, the important thing is finding the balance, right? So with a sensitivity label, as an IT administrator, you can define the permissions for that label. And by the way, you can have sub-labels as well. So let's go with a common scheme that we see: public, internal, confidential, highly confidential. We've got four labels. Highly confidential could be a parent label, and when we click on that, we get a number of sub-labels. We could have sub-labels for cases. We could have sub-labels for departments. And at an administrative level, each of those labels can carry its own predefined permissions. So the administrator defines those permissions. And exactly as you say, Anthony, one of the great things about it is it's not just who can access it, it's what they can do with it. Do not forward. Block reply-all. You can block screen share, screen copy, save and edit, all of those kinds of things. Where I say you need to find a balance is that this becomes onerous for the administrator if every time there's a case you're going back for a new label; you're going to end up with thousands of labels, right? So what Microsoft gives you is an option to allow the users to define the permissions themselves, and this is where it really works well for Copilot.

But before I talk about what that option is, I want to go back to what Sam said about the training. One of the important things for me is really fostering a culture of data protection across the organization, making people realize the risk around their data, having frequent training. Make that training fun, make it interactive if you can. At Lighthouse, our training is kind of Netflix-style. There are some great coffee-shop clips where it's fun. We get to watch these little clips. But if you make people want to protect their data, when they realize data is going to be available to Copilot now, they'll be invested in it, right? They'll want to work with you. So then when you come to do the training, Sam, you need to say, right, we're not going to use the administrator-defined labels, it's too much burden on the admin. We're going to publish this label for highly confidential that allows the users to define the permissions themselves. And that's going to pop up in Word. If you're in your favorite canvas, you're in Word, you click highly confidential, it's going to pop up and say, what permissions do you want to set on this file? If you haven't trained, if you haven't fostered that culture of information protection amongst the user community, people are going to hate it, right? People aren't going to like that. So it's so important to start to engage and discuss and train and coach and just develop that culture. But when it's developed, people love it. People want to define the permissions. They want to be prescriptive. They want to make sure that information cannot be copied and extracted and so on. And anything you do at that level, again, protects that data from being read in by Copilot. That brings it back to the whole purpose of it.
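To visualize the scheme Chris describes, here is a small conceptual sketch of parent labels, sub-labels, and per-label usage rights as plain data structures. This is an illustration only, not the Purview API: real labels are configured in the Purview compliance portal, and every name below is made up.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LabelPermissions:
    allowed_users: list[str]          # who can open the file at all
    allow_forward: bool = False       # "do not forward" when False
    allow_copy_extract: bool = False  # copy / screen-capture rights
    allow_edit: bool = False          # save-and-edit rights

@dataclass
class SensitivityLabel:
    name: str
    permissions: Optional[LabelPermissions] = None  # None = marking only
    sublabels: list["SensitivityLabel"] = field(default_factory=list)

# The common four-label scheme, with a case-specific sub-label.
scheme = [
    SensitivityLabel("Public"),
    SensitivityLabel("Internal"),
    SensitivityLabel("Confidential"),
    SensitivityLabel(
        "Highly Confidential",
        sublabels=[
            SensitivityLabel(
                "Case 001",
                permissions=LabelPermissions(
                    allowed_users=["chris@contoso.com", "sam@contoso.com"],
                ),
            ),
        ],
    ),
]

# A user who is not in allowed_users cannot open a file carrying the
# "Case 001" label, so that file is also invisible to their Copilot prompts.
```

The design trade-off Chris raises shows up here too: administrator-defined `allowed_users` lists multiply labels per case, while user-defined permissions keep the taxonomy small at the cost of relying on trained users.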
Anthony: And I would just say, again, this all comes back to prioritization, because people say, I have 50,000 people in my organization, there's no way I'm going to train everybody. You don't. I mean, obviously some, but only certain people should have access to certain of this information, right? So you may want to train your HR people, because they have a lot of the sensitive personal information, or the benefits folks, or whatever. You have to break it down, because a lot of people get caught up in, I'm never going to have 50,000 people do this. But you don't. Everyone has different things that come across their desk based on the business process they're working on. So again, it's thinking logically about this and prioritizing, because people think training and say, oh my God, I'm relying on the user, and this is going to be too much. To your point, you do it in chunks and say, okay, here's a business line that we think is really high risk, and just train them on that. And like you said, it's part of their job, right? HR is not going to throw something like compensation data everywhere in the organization. They shouldn't be, right? But if they do handle it, they know to be sensitive about it. And now you're just giving them a tool, right? We know you want to protect this; here's the tool to do it. So again, I think this is really important. Before we end, I know, Chris, you had one more thing to add, which was on the monitoring side, which I had not heard of. Could you talk a little bit about that?
Chris: You know, this is really key information that you can take to the leaders in your organization to say, look, we've got a roadmap for Copilot adoption. It's X many months, or however long it's going to take, but we can implement some quick wins now that really give us visibility. So there are two products. Many of the listeners will probably know the second product that I'm about to talk about, but the first one might be new. There's a product called Communication Compliance. It's part of Microsoft E5, or E5 Compliance, or the IP and Governance suite. It's in Purview. Technically speaking, it's a digital surveillance product that looks at communications through Teams, through Outlook, and through Viva. But what Microsoft has introduced, and this is a stroke of genius, it really is, is Copilot monitoring. The prompts and the responses for Copilot can now be monitored by Communication Compliance. And what that means is we can create simple policies that say, if personal information, client information, or case information is passed through a prompt or a response in Copilot, let us know about it. We can take it a step further. If we get the sensitivity labels in, we can use the sensitivity labels as a condition on the policy as well. So now if we start to see highly confidential information spilling over into a Copilot response, we can get an alert on that as well. And for many of the listeners, that's a quick win. Because your CIO or, you know, your VP is going to be saying, we need Copilot, we want to use Copilot, while your CISO and your IT guys are saying, slow down. You can go to the CISOs and say, we've got some controls, guys. It's okay.

Now, the other tool, which a lot of the listeners will know about, is eDiscovery Premium. What you can do with Communication Compliance, once you're alerted, is raise a case in eDiscovery Premium to say, go and investigate that particular alert. And what that means is we can use the eDiscovery tools to do a search, a collection. We can export and download. We can look, at a forensic level, at what information came back in the response. And if it was data spillage, if that data came from a repository that we thought was secure, specific to some case or legal information, and now it's in the hands of a public-facing team in the organization, you can use the tools. You can use eDiscovery through the Graph API to go and delete that data, that newly created data. So two real quick wins there to think about: deploying Communication Compliance with eDiscovery.
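Communication Compliance policies themselves are set up in the Purview portal, but the eDiscovery (Premium) follow-up Chris mentions is scriptable through Microsoft Graph. Below is a minimal sketch, assuming a token carrying the eDiscovery.ReadWrite.All permission, of opening a case to investigate such an alert; the case name and description are placeholders.

```python
import requests

GRAPH = "https://graph.microsoft.com/v1.0"

def open_ediscovery_case(access_token: str, name: str, description: str) -> dict:
    """Create an eDiscovery (Premium) case via Microsoft Graph, e.g. to
    investigate a Communication Compliance alert about sensitive data
    surfacing in a Copilot response."""
    resp = requests.post(
        f"{GRAPH}/security/cases/ediscoveryCases",
        headers={"Authorization": f"Bearer {access_token}"},
        json={"displayName": name, "description": description},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Hypothetical usage (placeholder token and names):
# case = open_ediscovery_case(
#     TOKEN,
#     "Copilot data spillage review",
#     "Investigate CC alert: highly confidential label in Copilot response",
# )
```

From the returned case, searches, collections, and exports proceed through the same Graph eDiscovery API family.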
Anthony: That's fantastic. Well, thanks, everybody. This was really helpful. We're going to have additional podcasts; we'll probably talk about eDiscovery, retention, and the like in our next one. But thank you, Chris and Sam. This was highly informative. And thanks to our listeners. We hope you'll come back and keep listening to our podcast. Thanks.
Outro: Tech Law Talks is a Reed Smith production. Our producers are Ali McCardell and Shannon Ryan. For more information about Reed Smith's Emerging Technologies practice, please email techlawtalks@reedsmith.com. You can find our podcasts on Spotify, Apple Podcasts, Google Podcasts, reedsmith.com, and our social media accounts.
Disclaimer: This podcast is provided for educational purposes. It does not constitute legal advice and is not intended to establish an attorney-client relationship, nor is it intended to suggest or establish standards of care applicable to particular lawyers in any given situation.
All rights reserved.
Transcript is auto-generated.