Everyone is talking to OpenAI’s ChatGPT.1 Meanwhile, OpenAI has been talking to developers of legal technologies.2 In March, Casetext announced a collaboration with OpenAI and launched CoCounsel, Casetext’s latest product offering.3 CoCounsel is built on GPT-4,4 OpenAI’s most advanced Generative Pre-trained Transformer to date and the first large language model to be credited with a passing score in a simulated bar exam.5 Casetext describes CoCounsel as ‘groundbreaking’ and ‘[t]he legal AI you’ve been waiting for’.6 It invites lawyers to ‘delegate substantive work to … [CoCounsel] and trust the results.’7 Should they?
Casetext is no rookie start-up. The US-based company, which has just entered into an agreement to be acquired by Thomson Reuters for $650m,8 was an early adopter of advanced machine learning technologies. Its product offerings include Casetext Research, now simply ‘Research’ (a legal research platform), Compose (a document generation system) and AllSearch (a document search tool launched in June 2022). Compose and AllSearch leverage the functionality of Casetext’s Parallel Search, a transformer-based technology that enables concept-based searching using natural language processing as opposed to keyword search. Yet, according to Casetext, CoCounsel represents a ‘quantum leap in AI for the law’.9 What, precisely, does the system claim to offer, how does it work and what are the implications of using the technology?
The Typology of Legal Technologies
We draw on our experience in developing the Typology of Legal Technologies to answer these questions. The Typology is an online interactive tool developed by our team at COHUBICOL. It features thirty examples of different types of legal technologies and provides a systematic analysis of each. Each analysis is intended to tease out the potential implications of a technology, taking account of its intended users, claimed functionality, features, rationale and benefits, and design choices. The ultimate aim of the Typology is to contribute to – and provide a method for securing – a better understanding of the impact of legal technologies on legal outcomes, legal practice and, above all, the nature and quality of the protection afforded by law.10
CoCounsel – claimed functionality, rationale and benefits
Casetext claims that CoCounsel will allow lawyers to save time, cut costs,11 mitigate risks,12 achieve better outcomes13 and improve access to justice.14 These claims about the rationale for and the benefits of CoCounsel are unremarkable – many providers of legal technologies claim these or similar benefits. What is remarkable is that Casetext presents CoCounsel as ‘the first AI legal assistant’.15 ‘For the first time’, it says, ‘lawyers can delegate substantive work to AI and trust the results’.16 Notice, however, that Casetext’s Terms17 are much more cautious, making it clear that output should be evaluated, including by way of human review.18
According to Casetext, CoCounsel can
- Review documents – it will ‘read them in full, and answer with citations to sources’19
- Prepare for a deposition – using a description of the deponent and what’s at issue in the case, CoCounsel will ‘[identify] multiple highly relevant topics … and draft questions for [these]’20
- Search a database – CoCounsel will provide answers to questions about the content of the database
- Carry out legal research – CoCounsel will provide answers to research questions in the form of a memo, complete with supporting references
- Summarise documents including contracts or legal opinions
- Extract data from contracts – CoCounsel will answer questions about contracts and provide a ‘complete list of relevant clauses from contracts in a set’21
- Identify contractual clauses which do not conform to a specified policy, report on the risks of using non-compliant language and recommend revisions22
The claimed functionality is impressive but, as readers familiar with ChatGPT will know, the key question is not whether the system can provide answers, summaries or citations but whether, as Casetext claims, users can trust these results. We can begin to explore this issue, as well as the broader implications of CoCounsel for law and legal outcomes, by examining what is known about its design.
Claimed design choices
CoCounsel is built on GPT-4, one of a series of Generative Pre-Trained Transformers developed by OpenAI.23 However, Casetext differentiates CoCounsel from ‘a generalized, publicly available AI’ in three respects.24 First, CoCounsel integrates Casetext’s Parallel Search, which is also built using Transformer models,25 and has access to Casetext’s own legal databases. According to Casetext,
[t]his integration enables CoCounsel to draw upon reliable data to produce answers with exceptional accuracy and speed, supported by linked citations, making it easy for lawyers to verify CoCounsel’s output.26
Second, it is well-known that large language models tend to ‘hallucinate’ or make up facts.27 Casetext claims that:
CoCounsel does not … “hallucinate,” because we’ve implemented controls to limit CoCounsel to answering from known, reliable data sources … or not to answer at all.28
The ‘controls’ appear to consist of the integration of Parallel Search and Casetext’s legal databases with GPT-4, together with prompt engineering and continual review.29
Finally, Casetext claims that CoCounsel is configured so as to ‘keep […] lawyers’ and their clients’ data private and secure.’30 According to Casetext,
CoCounsel only accesses OpenAI’s model through dedicated servers – a “private entrance” – and through an interface between our technologies (an API) that never stores any content our customers upload to or enter into CoCounsel. This means none of the information used in CoCounsel is sent back to “train” OpenAI’s model.31
How, if at all, can we verify these claims? What issues – technical and legal – are potentially associated with the system in its context of use?
Substantiation and Potential Issues
Substantiation
Some information is available about GPT-4 and Parallel Search, the building blocks of CoCounsel.32 However, Casetext has not published details of the architecture of CoCounsel, nor of how, exactly, Parallel Search and GPT-4 are integrated into the system. In principle, large language models such as GPT-4 can be used in combination with other models and tools.33 In particular, they can be combined with information retrieval components and given access to external databases.34
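To make the general technique concrete, the following minimal sketch (in Python) shows how a question might first be matched against a curated database of legal materials, with only the best-matching passages then handed to a language model as context. The case names, passages and the crude bag-of-words scoring are our own illustrative assumptions; nothing here describes Casetext’s actual implementation.

```python
# Minimal retrieval-augmented generation (RAG) sketch: the question is used to
# retrieve the most relevant passages from a curated legal database, and only
# those passages would be passed to the language model as context.
import math
from collections import Counter

def vectorise(text: str) -> Counter:
    # Crude stand-in for a learned embedding: a bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    overlap = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return overlap / norm if norm else 0.0

# Hypothetical in-house database of vetted legal materials.
database = {
    "Smith v Jones [2001]": "The court held that a contractual notice clause must be strictly complied with.",
    "Doe v Roe [2015]": "Limitation periods may be extended where the claimant was under a disability.",
}

def retrieve(question: str, k: int = 1) -> list[tuple[str, str]]:
    q = vectorise(question)
    ranked = sorted(database.items(), key=lambda kv: cosine(q, vectorise(kv[1])), reverse=True)
    return ranked[:k]

question = "Must a notice clause be complied with strictly?"
passages = retrieve(question)
# The retrieved passages (with their identifiers) would then be inserted into the
# prompt sent to the language model, grounding its answer in known sources.
print(passages)
```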
Similarly, Casetext does not supply details of the technical means by which CoCounsel provides answers with linked citations. However, OpenAI suggests an approach to the use of large language models for question answering in which the model’s answers must be supported by references.35 DeepMind proposes a system in which the output of a large language model is accompanied by citations to external sources.36 The aim (as with CoCounsel) is to allow for independent verification of the model’s output.
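A hedged sketch of how citation-supported answers can be elicited is set out below: each retrieved passage is given an identifier, and the instructions require every claim in the answer to cite one of those identifiers. The prompt wording, source labels and passages are invented for illustration; neither Casetext nor OpenAI has published the prompts actually used.

```python
# Sketch of constraining answers to carry citations: the prompt lists labelled
# sources and instructs the model to cite them (or to refuse if unsupported).
passages = [
    ("S1", "Smith v Jones [2001]", "A contractual notice clause must be strictly complied with."),
    ("S2", "Doe v Roe [2015]", "Limitation periods may be extended where the claimant was under a disability."),
]

def build_prompt(question: str) -> str:
    sources = "\n".join(f"[{sid}] {name}: {text}" for sid, name, text in passages)
    return (
        "Answer the question using ONLY the sources below. "
        "Support every sentence with a citation such as [S1]. "
        "If the sources do not contain the answer, reply 'No supporting source found.'\n\n"
        f"Sources:\n{sources}\n\nQuestion: {question}\nAnswer:"
    )

print(build_prompt("Must a notice clause be complied with strictly?"))
```

The model’s reply could then be post-processed so that each bracketed marker is rendered as a hyperlink to the underlying source, allowing a lawyer to verify the output independently.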
Is it plausible that Casetext has eliminated hallucinations altogether? Existing research suggests that developers can employ a range of methods to reduce hallucinations, notably by constraining the system so that its answers are grounded in factual content.37 It is unclear whether these methods are wholly effective. It is doubtful that prompt engineering alone can eliminate hallucinations;38 sources of content can contain factual and other inaccuracies;39 and even if a source can be trusted, the system might still produce output that is inconsistent with it.40
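One such grounding method can be illustrated as follows: if no passage in the trusted database is sufficiently similar to the question, the system declines to answer rather than allowing the model to improvise. The threshold, the stub retrieval and generation functions, and the refusal message are placeholders of our own; Casetext has not disclosed how (or whether) CoCounsel implements such a check.

```python
# Illustration of 'answering from known sources or not at all': refuse to answer
# when retrieval does not surface a sufficiently relevant passage.
REFUSAL = "I cannot answer this from the available sources."

def answer_or_refuse(question, retrieve, generate, min_score=0.3):
    passages = retrieve(question)                  # [(source_id, text, score), ...]
    grounded = [p for p in passages if p[2] >= min_score]
    if not grounded:
        return REFUSAL                             # decline instead of guessing
    return generate(question, grounded)            # model answers from vetted text only

def demo_retrieve(question):
    # Stub: in practice this would query the vetted legal database and return scores.
    return [("S1", "A contractual notice clause must be strictly complied with.", 0.72)]

def demo_generate(question, passages):
    # Stub: in practice this would call the language model with the grounded passages.
    return f"Based on {passages[0][0]}: yes, strict compliance is required."

print(answer_or_refuse("Must a notice clause be complied with strictly?", demo_retrieve, demo_generate))
```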
Casetext maintains that it accesses OpenAI’s servers through a ‘zero-retention API’.41 Without information about Casetext’s contract with OpenAI, it is impossible to know what data-sharing arrangements are in place between the two. Ordinarily, OpenAI retains data submitted through the API ‘for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).’42
However, OpenAI states that:
For trusted customers with sensitive applications, zero data retention may be available. With zero data retention, request and response bodies are not persisted to any logging mechanism and exist only in memory in order to serve the request.43
OpenAI’s standard API data usage policies also confirm that
OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose.44
Potential technical issues
The central claim made for CoCounsel is that users can trust it. What technical issues might affect the trustworthiness of the system or its outputs?
Performance outside the scope of the data used to fine-tune the system
One of Casetext’s key claims is that it does not collect customer data,45 including prompts and responses generated by CoCounsel, and does not share that data with OpenAI. It is important for lawyers to be able to rely on these claims, but this design choice imposes certain limitations. The GPT-4 model must be fine-tuned on prompt/response pairs in order to provide appropriate responses to the very specific queries of CoCounsel’s customers. Since Casetext claims not to use customer data, it must test the system on prompts it expects customers to provide, or on data collected from customers previously. Casetext claims to have fine-tuned the system on 30,000 questions. A set of that size may cover a wide range of issues; it is, however, unclear how well the system will perform when faced with requests outside the scope of that training. If customers enter prompts that are unclear, poorly framed or lacking in context, the system may also generate suboptimal responses.
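By way of illustration only, the sketch below shows the kind of anticipated prompt/response pairs a vendor might assemble when customer data is off limits, stored as JSON Lines for fine-tuning or evaluation. The example prompts, responses and file name are invented; Casetext has not published its 30,000 questions or their format.

```python
# Sketch of an 'expected prompts' dataset: questions the developers anticipate,
# paired with vetted answers, written out as JSON Lines for tuning or testing.
import json

expected_pairs = [
    {"prompt": "Summarise the termination clause in the attached agreement.",
     "response": "The agreement may be terminated on 30 days' written notice ..."},
    {"prompt": "Draft deposition questions for a company's finance director.",
     "response": "1. Please describe your role in preparing the 2021 accounts ..."},
]

with open("expected_prompts.jsonl", "w", encoding="utf-8") as f:
    for pair in expected_pairs:
        f.write(json.dumps(pair) + "\n")
# A system tuned and tested only on such anticipated prompts may behave
# unpredictably when real users ask questions outside this distribution.
```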
Linked citations may be of limited utility
Casetext claims that CoCounsel provides links within its generated text so that the user may verify the output. However, if all of the information must be verified in any event, the links may not save lawyers much time. Moreover, this approach does not allow one to check whether any crucial information is missing from the output.
Hallucinations
Casetext claims to have implemented controls so that CoCounsel cannot hallucinate. We are sceptical of this claim. We accept that the controls Casetext describes may reduce hallucinations by preventing the system from relying on false information from the internet. However, because the system generates new text rather than simply copying from its sources, it may still produce incorrect responses even where the underlying sources are accurate. Hallucinations would affect system performance and would likely increase the time required for verification.
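The point can be illustrated with a deliberately crude check: because the model rewrites rather than copies, a separate step is needed to confirm that an answer is actually supported by the cited source. The overlap metric, threshold and example sentences below are our own assumptions; production systems would use stronger methods (for example, entailment models), and we make no claim that CoCounsel works this way.

```python
# Flag answers whose word overlap with the cited source falls below a threshold.
def supported(answer: str, source: str, min_overlap: float = 0.5) -> bool:
    a = set(answer.lower().split())
    s = set(source.lower().split())
    content = a - {"the", "a", "an", "is", "of", "to", "and"}
    if not content:
        return False
    return len(content & s) / len(content) >= min_overlap

source = "A contractual notice clause must be strictly complied with."
good = "The notice clause must be strictly complied with."
bad = "Notice clauses may be ignored where compliance is impractical."
print(supported(good, source), supported(bad, source))  # True False: the second answer is unsupported
```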
Evaluating and maintaining system performance
To our knowledge, Casetext does not publicly disclose how it evaluates the performance of its system. One must also take into account that, given the fast pace of change in the legal landscape, the system needs to be regularly updated or even retrained (at least the fine-tuned components) in order to provide up-to-date information to customers. This requires substantial resources, and the system might not perform at the same level with each iteration. Given that the GPT-4 model is not retrained on new data, system performance may also be affected.
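A minimal sketch of the kind of regression evaluation a vendor could run (and publish) follows: a fixed benchmark of questions with reference answers, re-run against each new version of the system so that any drift in performance is visible. The benchmark content, scoring metric and stub system under test are illustrative assumptions only.

```python
# Regression-style evaluation sketch: score each system version against a fixed
# benchmark of question/reference pairs and compare results across iterations.
benchmark = [
    {"question": "What is the limitation period for contract claims in England?",
     "reference": "six years from the date of breach"},
]

def score(system_answer: str, reference: str) -> float:
    # Placeholder metric: fraction of reference words present in the system answer.
    ref = reference.lower().split()
    return sum(w in system_answer.lower() for w in ref) / len(ref)

def evaluate(ask) -> float:
    # `ask` is the system under test: a callable from question -> answer string.
    return sum(score(ask(item["question"]), item["reference"]) for item in benchmark) / len(benchmark)

print(evaluate(lambda q: "Contract claims must be brought within six years from the date of breach."))
```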
Potential legal impact
Casetext claims the system is very accurate. Ironically, the better the system performs, the greater the risk of over-reliance;46 lawyers may be tempted to rely on the outputs without verifying their contents. Such over-reliance might impact on the nature and quality of legal advice, engender a passive and unproductive approach in aspects of lawyering and amount to an abrogation of lawyers’ professional obligations. Should lawyers fully ‘delegate to’ CoCounsel? Of course not; indeed this framing, and Casetext’s description of CoCounsel as an ‘AI legal assistant’, plays into anthropomorphic tropes. Should lawyers ‘trust’ CoCounsel’s outputs? That depends on what one means by ‘trust’. However well the system performs, however useful it may be, lawyers should recall that at its heart is a model trained to ‘predict the next token in a document’.47 It has no conception of the aims of law, no awareness of the context of its use; it can neither engage in legal reasoning nor offer legal advice. In Casetext’s own words, directed to lawyers:
You and your end users are responsible for all decisions made, advice given, actions taken, and failures to take action based on your use of AI Services.48
1. ‘ChatGPT: New AI Chatbot Has Everyone Talking to It’ BBC News (7 December 2022) https://www.bbc.com/news/technology-63861322 accessed 22 April 2023. ↩
2. OpenAI’s Startup Fund invested $5 million in Harvey in November last year. Kyle Wiggers, ‘Harvey, Which Uses AI to Answer Legal Questions, Lands Cash from OpenAI’ (TechCrunch, 23 November 2022) https://techcrunch.com/2022/11/23/harvey-which-uses-ai-to-answer-legal-questions-lands-cash-from-openai/ accessed 22 April 2023. ↩
3. ‘CoCounsel - The First AI Legal Assistant, Made for Lawyers’ (2 February 2023) https://casetext.com/cocounsel/ accessed 21 April 2023. ↩
4. Casetext, ‘Casetext’s CoCounsel, the First AI Legal Assistant, Is Powered by OpenAI’s GPT-4, the First Large Language Model to Pass Bar Exam’ https://www.prnewswire.com/news-releases/casetexts-cocounsel-the-first-ai-legal-assistant-is-powered-by-openais-gpt-4-the-first-large-language-model-to-pass-bar-exam-301771962.html accessed 16 May 2023. ↩
5. OpenAI, ‘GPT-4 Technical Report’ (arXiv, 27 March 2023) http://arxiv.org/abs/2303.08774 accessed 17 April 2023; Daniel Martin Katz and others, ‘GPT-4 Passes the Bar Exam’ (15 March 2023) https://papers.ssrn.com/abstract=4389233 accessed 17 April 2023. For a re-evaluation of GPT-4’s performance in the simulated bar exam see Eric Martínez, ‘Re-Evaluating GPT-4’s Bar Exam Performance’ (8 May 2023) https://papers.ssrn.com/abstract=4441311 accessed 19 May 2023. ↩
6. ‘CoCounsel - The First AI Legal Assistant, Made for Lawyers’ (n 3). ↩
7. ibid. ↩
8. ‘Casetext to Join Thomson Reuters, Ushering in a New Era of Legal Technology Innovation - Casetext’ (27 June 2023) https://casetext.com/blog/casetext-to-join-thomson-reuters-ushering-in-a-new-era-of-legal-technology-innovation/ accessed 3 July 2023. Assuming the deal closes, it will have significant implications for the legal search market. Thomson Reuters already owns Westlaw and Practical Law Company. ↩
9. ‘CoCounsel - The First AI Legal Assistant, Made for Lawyers’ (n 3). ↩
10. For further information about the Typology see ‘FAQs & Methodology’ (COHUBICOL publications, 11 April 2023) https://publications.cohubicol.com/typology/faqs-and-methodology/ accessed 18 May 2023. ↩
11. ‘CoCounsel - Legal AI for In-House Legal Teams’ (16 February 2023) https://casetext.com/in-house-counsel/ accessed 16 May 2023. ↩
12. ‘CoCounsel - Tackle Transactional Legal Work with Ease’ (20 February 2023) https://casetext.com/transactional-law/ accessed 16 May 2023. ↩
13. ‘CoCounsel - Your Secret Weapon for Stress-Free Litigation’ (16 February 2023) https://casetext.com/litigators/ accessed 16 May 2023. ↩
14. Casetext (n 4). ↩
15. ibid. ↩
16. ‘CoCounsel - The First AI Legal Assistant, Made for Lawyers’ (n 3) (emphasis added). ↩
17. ‘Terms - Casetext’ (25 February 2023) https://casetext.com/terms/ accessed 20 April 2023. ↩
18. Casetext’s Terms state that ‘You and your end users are responsible for all decisions made, advice given, actions taken, and failures to take action based on your use of AI Services. AI Services use machine learning models that generate predictions based on patterns in data. Output generated by a machine learning model is probabilistic and should be evaluated for accuracy as appropriate for your use case, including by employing human review of such output.’ ↩
19. ‘CoCounsel - The First AI Legal Assistant, Made for Lawyers’ (n 3). ↩
20. ibid. ↩
21. ibid. ↩
22. ibid. ↩
23. OpenAI (n 5). ↩
24. ‘CoCounsel, Our New AI Legal Assistant Powered by OpenAI, Is Here—and It Will Change the Practice of Law - Casetext’ (1 March 2023) https://casetext.com/blog/casetext-announces-cocounsel-ai-legal-assistant/ accessed 22 April 2023. ↩
25. ‘The Machine Learning Technology Behind Parallel Search - Casetext’ (7 December 2021) https://web.archive.org/web/20211207103405/https:/casetext.com/blog/machine-learning-behind-parallel-search/ accessed 24 April 2023. ↩
26. ‘CoCounsel, Our New AI Legal Assistant Powered by OpenAI, Is Here—and It Will Change the Practice of Law - Casetext’ (1 March 2023) https://casetext.com/blog/casetext-announces-cocounsel-ai-legal-assistant/ accessed 22 April 2023. ↩
27. OpenAI (n 5). ↩
28. ‘CoCounsel Harnesses GPT-4’s Power to Deliver Results That Legal Professionals Can Rely on - Casetext’ (5 May 2023) https://casetext.com/blog/cocounsel-harnesses-gpt-4s-power-to-deliver-results-that-legal-professionals-can-rely-on/ accessed 16 May 2023. ↩
29. ‘AI and Machine Learning Experts, Experienced Attorneys, Thousands of Hours of Prompt Engineering—and That’s Just to Launch - Casetext’ (12 May 2023) https://casetext.com/blog/building-an-ai-legal-assistant-lawyers-can-trust/ accessed 16 May 2023. ↩
30. ‘CoCounsel Harnesses GPT-4’s Power to Deliver Results That Legal Professionals Can Rely on - Casetext’ (n 28). ↩
31. ‘CoCounsel, Our New AI Legal Assistant Powered by OpenAI, Is Here—and It Will Change the Practice of Law - Casetext’ (n 26). ↩
32. See for example, OpenAI (n 5); ‘The Machine Learning Technology Behind Parallel Search - Casetext’ (n 25). ↩
33. Xavier Daull and others, ‘Complex QA and Language Models Hybrid Architectures, Survey’ (arXiv, 7 April 2023) http://arxiv.org/abs/2302.09051 accessed 20 May 2023. ↩
34. Patrick Lewis and others, ‘Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks’ (arXiv, 12 April 2021) http://arxiv.org/abs/2005.11401 accessed 21 May 2023; Sebastian Borgeaud and others, ‘Improving Language Models by Retrieving from Trillions of Tokens’ (arXiv, 7 February 2022) http://arxiv.org/abs/2112.04426 accessed 20 May 2023; Gautier Izacard and others, ‘Atlas: Few-Shot Learning with Retrieval Augmented Language Models’ (arXiv, 16 November 2022) http://arxiv.org/abs/2208.03299 accessed 20 May 2023. ↩
35. Reiichiro Nakano and others, ‘WebGPT: Browser-Assisted Question-Answering with Human Feedback’ (arXiv, 1 June 2022) http://arxiv.org/abs/2112.09332 accessed 20 May 2023. ↩
36. Jacob Menick and others, ‘Teaching Language Models to Support Answers with Verified Quotes’ (arXiv, 21 March 2022) http://arxiv.org/abs/2203.11147 accessed 20 May 2023. ↩
37. Lewis and others (n 34); Baolin Peng and others, ‘Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback’ (arXiv, 8 March 2023) http://arxiv.org/abs/2302.12813 accessed 21 May 2023. ↩
38. Richard MacManus, ‘Stopping AI Hallucinations for Enterprise Is Key for Vectara’ (The New Stack, 23 March 2023) https://thenewstack.io/stopping-ai-hallucinations-for-enterprise-is-key-for-vectara/ accessed 21 May 2023. ↩
39. Lewis and others (n 34). ↩
40. Retrieval Augmented Language Modeling (Directed by Melissa Dell, 2023) https://www.youtube.com/watch?v=XC4eFiIMOmY accessed 27 June 2023. ↩
41. ‘You Can Get the Benefits of AI without Putting Your Information at Risk, If You Know What to Look for in a Solution. - Casetext’ (19 May 2023) https://casetext.com/blog/how-to-use-ai-and-keep-law-firm-and-client-data-safe/ accessed 21 May 2023. ↩
42. ‘API Data Usage Policies’ https://openai.com/policies/api-data-usage-policies accessed 20 May 2023. ↩
43. ‘Models - OpenAI API’ https://platform.openai.com/docs/models/how-we-use-your-data accessed 12 June 2023. ↩
44. ‘API Data Usage Policies’ (n 42). ↩
45. ‘CoCounsel, Our New AI Legal Assistant Powered by OpenAI, Is Here—and It Will Change the Practice of Law - Casetext’ (n 26). ↩
46. OpenAI (n 5). See also Laurence Diver and Pauline McBride, ‘Argument by Numbers: The Normative Impact of Statistical Legal Tech’ (2022) Communitas https://doi.org/10.31235/osf.io/ts259. ↩
47. OpenAI (n 5). ↩
48. ‘Terms - Casetext’ (n 17) (emphasis added). ↩