Data Security and Generative AI: What Lawyers Need to Know

Written by Jamie Fonarev
Published on Dec 6, 2023

The integration of generative Artificial Intelligence (AI) into the legal domain heralds a new era of innovation, empowering practitioners with cutting-edge tools. This remarkable advancement, however, also presents significant data security challenges. Legal professionals considering the adoption of these advanced technologies must thoroughly scrutinize how these tools interact with their data. Through a conscientious and meticulous approach, both the generative AI industry and its users can unlock immense value, greatly enhancing work efficiency while upholding the stringent integrity and standards inherent to the legal profession.

Nevertheless, the complete range of risks associated with generative AI remains somewhat elusive, placing a proactive burden on users to thoroughly evaluate the security and ethical aspects of their AI interactions. Responsibility, however, is not solely the users' domain; it is equally imperative for providers servicing the legal industry to invest significant effort in constructing robust frameworks around data security and privacy of their platforms.

It is imperative that the core principles of integrity and confidentiality, synonymous with the legal profession, are upheld by both AI providers and users. In this discourse, we aim to elaborate on the principal data privacy and security risks intrinsic to generative AI, discuss the responsibilities of providers in mitigating these risks, and propose critical questions that users should pose to their platform providers, ensuring a secure and risk-mitigated experience.

Understanding the Risks of Generative AI in Law - Data Privacy and Security

Recognizing and comprehending potential risks is a crucial measure in protecting law firms from the data security challenges presented by generative AI. This article specifically focuses on the risks linked to data security and privacy. For a more comprehensive exploration of other types of risks, such as AI hallucinations (the creation of fictional information or drawing incorrect conclusions), we suggest perusing our detailed article on the matter.

In this article, we break down the top three data privacy risks associated with generative AI, framed as new risks, familiar risks seen in a new light, and old risks that still apply.

Novel Privacy Risks

Generative AI systems open up new data security risks that we haven’t seen before. The top concern is data leakage. Data leakage occurs when models take user inputs and add them to the training dataset, surfacing them as answers to another user’s query. For example, as you’re working on a particular matter, you upload your case strategy to your gen AI solution, iterating on your approach. If the model is subsequently trained on your data, your case strategy can become part of the model’s training data. If another lawyer asks for a strategy on a similar matter, the model may reproduce your strategy as part of its answer, and if your strategy contained confidential client information, that data would be exposed.

This risk arises when generative AI platforms don’t treat user input data as private - some providers retain user data for model retraining purposes (which could make the model better in general), thus exposing confidential data to other users. 

The risk that data input into models could inadvertently appear in future outputs is a serious one for the legal profession. Improper handling of this data by lawyers risks exposing confidential client information.

How do we minimize the risk of data leakage?

Minimizing the risk of data leakage requires deliberate effort from both providers and users.

Providers should enforce strict rules in their data management systems to ensure that customer data is never used to train foundation models. They should also implement zero-retention policies so that data is not stored beyond its immediate purpose. When building on third-party Large Language Models (LLMs) from providers such as OpenAI or Anthropic, they should consistently use zero-retention API options so that user data is not held within the LLM provider's systems, where it could be used for model retraining or exposed in a breach.
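To make this concrete, here is a minimal sketch (in Python) of what zero retention looks like from the application side: the prompt is forwarded to an LLM API and the answer is returned without either being logged or written to disk. The endpoint, model name, and environment variable are illustrative, and zero-retention treatment by the LLM vendor itself is typically an account- or contract-level arrangement rather than a flag in the code.

    # Minimal sketch: forward a user prompt to an LLM API without persisting
    # the prompt or the response anywhere on the provider's own systems.
    # The endpoint, model name, and environment variable are illustrative.
    import os
    import requests

    API_URL = "https://api.openai.com/v1/chat/completions"  # example endpoint
    API_KEY = os.environ["LLM_API_KEY"]  # keep credentials out of source control

    def ask_llm(prompt: str) -> str:
        """Send a single prompt and return the answer without storing either."""
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            json={
                "model": "gpt-4o-mini",  # example model name
                "messages": [{"role": "user", "content": prompt}],
            },
            timeout=30,
        )
        response.raise_for_status()
        answer = response.json()["choices"][0]["message"]["content"]
        # Deliberately no logging of `prompt` or `answer`; log metadata only.
        return answer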

Users of generative AI should ask thorough questions of the tools they are using or evaluating. Some suggested questions: 

  • Does the provider use uploaded data to train central models?
  • What are the company's data access levels, retention policies, and usage terms?
  • Who owns the data that is used throughout the systems? 

Where possible, users should opt out of data sharing for model training. Confirming that providers have thoughtful policies for model training and data retention helps identify the solutions with the safest infrastructure for everyday use.

Traditional Risks in a New Light

Generative AI also carries data privacy risks familiar from older, similar technologies. The main data privacy risk that worries legal technology providers is a data breach.

Even with state-of-the-art technology, there is a risk of confidential information being compromised through malicious breaches. Because users upload sensitive and confidential material to be parsed and analyzed, generative AI solutions hold coveted information. Bad actors may attempt to penetrate these data repositories and steal valuable data, holding it for ransom or selling it on the market.

Two factors in particular expose platforms to the risk of data breaches. First, inadequate encryption and failure to adhere to data retention and international transfer protocols significantly increase vulnerability. Second, prolonged data retention by providers amplifies the risk of a breach or attack, especially when vast quantities of confidential information are involved. Addressing both concerns is essential to safeguarding against potential threats.

Providers should take steps to maintain the highest standards for data encryption, employing best-in-class protocols such as TLS 1.2+ for data in transit. Establishing robust encryption and access protocols is critical to safeguarding client data from unauthorized access during storage and transfer. Client data should also be isolated within any generative AI system to the maximum extent possible; strict isolation ensures that it is not shared unnecessarily within or outside the system.
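As a small illustration, the Python sketch below refuses any connection negotiated below TLS 1.2 on the client side; the URL is a placeholder, and most web servers, SDKs, and load balancers expose an equivalent minimum-version setting.

    # Minimal sketch: enforce TLS 1.2+ for an outbound HTTPS request.
    import ssl
    import urllib.request

    context = ssl.create_default_context()            # verifies certificates by default
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse anything older than TLS 1.2

    with urllib.request.urlopen("https://example.com/health", context=context) as resp:
        print(resp.status)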

Providers should also aim to retain the minimum amount of data, thereby reducing breach risk. They must proactively disclose their data practices to clients and implement clear data retention policies that embrace the principle of data minimization, ensuring prompt deletion of data once it is no longer required.
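To make data minimization concrete, here is a minimal sketch of a scheduled retention job. The table name, timestamp column, and 30-day window are hypothetical; a real deployment would also purge derived artifacts such as logs, backups, and vector-store entries.

    # Minimal sketch: delete stored client content older than the retention window.
    # Assumes a hypothetical `documents` table with an ISO-8601 `uploaded_at` column.
    import sqlite3
    from datetime import datetime, timedelta, timezone

    RETENTION_DAYS = 30  # illustrative; actual periods come from the provider's policy

    def purge_expired(db_path: str) -> int:
        """Remove rows past the retention window and return how many were deleted."""
        cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
        with sqlite3.connect(db_path) as conn:
            cur = conn.execute(
                "DELETE FROM documents WHERE uploaded_at < ?",
                (cutoff.isoformat(),),
            )
        return cur.rowcount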

Users of generative AI should ask thorough questions of the tools they are using or evaluating. Some suggested questions:

  • What policies and procedures exist to address breaches or vulnerabilities? Do you have an incident response plan in place? 
  • What encryption, access controls, auditing, and penetration testing are implemented? How frequently are these reevaluated?
  • What policies govern client data use, privacy, retention, and deletion?
  • What provisions exist for data isolation, retention limitations, and access control?

As a baseline, users should carefully review security infrastructure and data policies when evaluating generative AI solutions; these should meet the standards of the firm's existing tech stack.

Old Risks, Not to Be Forgotten

With new tech come new risks. But the old risks should not be forgotten. As with all technology, data access and compliance should be carefully considered. 

Implementing stringent access controls and strictly adhering to data retention limits are paramount in preventing unauthorized data access and manipulation. It is imperative for firms to ensure that AI providers comply with legal and industry data security standards, such as SOC 2 Type II, to effectively mitigate potential legal ramifications. In addition, users should consider other compliance frameworks, including HIPAA and GDPR, to bolster data protection in specific practice areas.
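For illustration only, the sketch below shows a deny-by-default, per-matter access check. The roles and matter structure are hypothetical; in practice such checks would be backed by the firm's identity provider and recorded in an audit log.

    # Minimal sketch: only users explicitly assigned to a matter may read its documents.
    from dataclasses import dataclass, field

    @dataclass
    class Matter:
        matter_id: str
        authorized_users: set[str] = field(default_factory=set)  # ethical-wall style allow list

    def can_read(user_id: str, matter: Matter) -> bool:
        """Deny by default; allow only users on the matter's access list."""
        return user_id in matter.authorized_users

    matter = Matter("M-2023-001", authorized_users={"associate_a", "partner_b"})
    assert can_read("associate_a", matter)
    assert not can_read("paralegal_c", matter)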

Providers must prioritize regular, independent security audits as a critical measure to ensure ongoing compliance. These audits allow emerging security features and practices to be adopted swiftly and demonstrate a steadfast commitment to maintaining the highest standards. A prime example is the annual third-party SOC 2 Type II audit, which verifies a company's continued adherence to the framework's rigorous requirements.

Users should ask detailed questions of providers to make sure that they match the rigor of the legal industry’s security. Some suggested questions: 

  • Have they attained SOC 2, ISO 27001, or other security certifications? (Depending on your practice areas, are there other certifications or compliance frameworks crucial to your work, e.g., HIPAA or GDPR?)
  • Are security audits conducted regularly? What is the frequency of audits? 

While this list isn't exhaustive, the highlighted risks hold significant importance in the legal generative AI industry. Safeguarding the information handled by AI systems is of utmost importance, with potential serious implications if mishandled.

Final Thoughts

As generative AI increasingly becomes an integral and beneficial tool in legal work, prioritizing data security is imperative. Both providers and users of these technologies bear a shared responsibility to ensure that confidentiality is preserved.

For providers, this means a heavy investment in security infrastructure, achieving essential certifications, conducting regular audits, and maintaining transparent data management policies. Following best practices in encryption, access controls, and data isolation is also critical.

For legal professionals, due diligence is necessary in evaluating providers. Look for security certifications relevant to your practice area. Scrutinize data retention periods, usage terms, and model training policies. Opt out of data sharing if possible. Ask direct questions about encryption standards and access controls.

By engaging in diligent risk mitigation, maintaining transparent practices, and continually evaluating these measures, legal professionals can confidently integrate AI into their practices while upholding the highest standards of client data protection.
