Figure 1: Image generated by Dall-E 3

This article was prepared by the ASTC AI Task Force

This is Part 1 of a two-part series on the application of AI and AI chatbots to the field of trial consulting. Key to understanding the potential (and drawbacks) of AI chatbots in trial consulting is understanding what AI and AI chatbots are, how they are constructed and trained, and what their limitations are. Part 1 addresses these issues, and Part 2 focuses on the uses (and potential uses) of AI chatbots in trial consulting.

It has been a little more than a year since ChatGPT stormed onto the scene. Since then, dozens of other publicly available AI chatbots have been released, including Copilot (Microsoft’s AI chatbot powered by GPT-4), Gemini (Google’s AI chatbot), and Anthropic’s Claude. Much has been made of how AI, and AI chatbots in particular, may change our lives, improve human performance, and perhaps even replace humans on a variety of tasks. These could range from nonphysical, repetitive, and routine tasks like tagging images and proofreading texts to advanced tasks such as creating graphics, writing poetry and books, providing psychotherapy, drafting contracts and other standard legal documents, and performing legal analysis (e.g., Lexis AI). A recent study in the United Kingdom suggested that a number of occupations are likely to be threatened by AI and AI chatbots, including management professionals, business analysts, financial managers, accountants, and psychologists. (Interestingly, there was a strong relationship between an occupation’s high involvement with AI and its potential replacement by AI.)

Although heralded and hyped as the next great tool, AI chatbots have not been without problems. For example, lawyers were recently sanctioned for filing a ChatGPT-generated brief that included fabricated citations. In another case, a sanctions motion is pending over fabricated case citations that were generated by Google’s Bard (the predecessor to Gemini) and included in a brief submitted by lawyers for Michael Cohen, the former “fixer” for former president Donald Trump. Recently, a professor and a movie artist collaborated on a test of AI chatbots that produce graphics in response to natural language prompts and showed that Midjourney and DALL-E 3 produced images that copied copyrighted, IP-protected movie imagery.

In Part 1 of this article, we give an overview of AI chatbots, discuss their relationship to the larger universe of AI products, and point out some of their limitations. In Part 2, appearing in the next issue of The Jury Expert (TJE), we will explore what we see as their uses in trial consulting, now and in the future, and discuss ethical and security issues as well as the role that ASTC might play in overseeing the use of AI tools in our profession.

What are AI Chatbots?

AI chatbots are interactive programs trained to generate novel responses to a wide range of human requests submitted in written natural language through a chat interface. The four most widely used AI chatbots today are the general-purpose chatbots ChatGPT, Copilot, Gemini, and Claude. There is also a host of more specialized, publicly available chatbots that help with tasks like writing social media posts, marketing, and studying.

Two of these, Copilot and Gemini, were explicitly built to be AI tools for Internet searching. The other two general-purpose tools we list, Claude and ChatGPT (when ChatGPT’s web-browsing feature is not engaged), are closed systems that generate responses to questions based on the fixed universe of textual data on which each was trained. If you have a subscription, or license the technology to develop your own program, you can add your own textual data to customized versions that you create. Being disconnected from the Internet makes these chatbots more secure in certain ways, but it also limits their knowledge of current events and means that the programs have to be continually updated.

Under the hood of these chatbots are what computer scientists call large language models (LLMs). Large language models are programs designed to scan massive amounts of written text and, using predictive statistical modeling, identify patterns in that text that allow the chatbot to simulate language. Based on its ability to “understand” language, a chatbot can then generate novel responses to queries submitted in the language on which it has been trained, drawing on the data sources to which it currently has access.

A chatbot is, in essence, a giant language pattern recognition machine: first, in its training phase, the chatbot teaches itself the patterns of, say, English; and then, finished with its training and released to be used, the chatbot leverages its ability to understand and generate natural language to locate secondary patterns within a universe of texts written in English.

LLMs rely on an underlying computer architecture called a neural network, which is loosely modeled on the complex, interconnected, parallel networks of neurons in the human brain. This neural network design is what allows LLMs to teach themselves by “reading” vast troves of text. They eat a lot of data, they predict what comes next based on what they have eaten, and they learn from their mistakes.
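To make “predict what comes next, then learn from mistakes” concrete, here is a minimal Python sketch. It is a drastic simplification under stated assumptions: the tiny training text is made up, the model looks at only the single previous word, and it is a one-layer network rather than the many-layered networks with billions of parameters inside a real LLM. On each training pass it predicts the next word, measures how far that prediction was from the word that actually followed, and nudges its scores to shrink the error (gradient descent on a cross-entropy loss).

```python
import math

# Made-up training text for illustration only; real LLMs train on billions of words.
CORPUS = ("the jury heard the evidence . the jury heard the argument . "
          "the judge read the verdict .")


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train(text, epochs=300, learning_rate=1.0):
    """Learn one row of next-word scores for each word in the vocabulary."""
    words = text.split()
    vocab = sorted(set(words))
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    weights = [[0.0] * n for _ in range(n)]  # weights[prev][nxt]: score that nxt follows prev
    pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]

    for _ in range(epochs):
        grads = [[0.0] * n for _ in range(n)]
        for prev, actual in pairs:
            probs = softmax(weights[prev])  # 1. predict what comes next
            for j in range(n):
                target = 1.0 if j == actual else 0.0
                # 2. measure the mistake: predicted probability minus what really happened
                grads[prev][j] += (probs[j] - target) / len(pairs)
        for i in range(n):
            for j in range(n):
                # 3. learn from it: move each score a small step against its error
                weights[i][j] -= learning_rate * grads[i][j]
    return vocab, index, weights


def predict_next(model, word):
    """Return the word the trained model considers most likely to follow `word`."""
    vocab, index, weights = model
    probs = softmax(weights[index[word]])
    return vocab[probs.index(max(probs))]


if __name__ == "__main__":
    model = train(CORPUS)
    for w in ("the", "jury", "heard"):
        print(w, "->", predict_next(model, w))
```

Running it prints `the -> jury`, `jury -> heard`, and `heard -> the`. “Jury” wins after “the” only because it followed “the” more often in the training text, which is the “you are what you eat” point in miniature: the model can only echo the patterns in what it was fed. A real LLM makes the same kind of next-word prediction, but conditions on long stretches of preceding text.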

How Are AI Chatbots Trained?

Unlike human infants, who need exposure to only a small amount of language at the right time in infancy to learn it, LLMs need millions of texts to “learn” to speak, because they use statistical methods and brute computing force to simulate natural language. They also need correction and input from humans to help them learn, for example, what is and is not acceptable to say.

The companies that created these chatbots have not revealed the specific universe of texts used to train their models, considering that information proprietary, but we do know that they have used publicly available text datasets and may also have used privately owned digital libraries and data.

Just as with humans, the sophistication, scope, and accuracy of a chatbot’s answers to your queries depend, inevitably, on the universe of material on which it has been trained. It is unclear to what extent ChatGPT and other general-purpose chatbots have been trained on scholarly literature, because most of it exists either in undigitized form or behind paywalls. Because companies will not reveal their specific text sources (and may not fully know them, having perhaps simply set their programs to scour the Internet for everything), we can’t know in advance how “well educated” a particular chatbot is; we can judge only by the quality of its answers. This is vital to understand if you want to make effective use of AI chatbot technology for jury research and trial consulting. Using an AI chatbot is much like hiring a new research assistant: you have to find out how good it is, and what it can really do, by asking it to do work and seeing what it produces.

Understanding AI Chatbots in the World of AI

In this article, we focus on the uses and potential uses of publicly available AI chatbot tools like ChatGPT and Copilot.

There are generally three ways that these programs can be accessed:

  • Via free versions on the Web, which are designed for general research and information purposes and may contain usage restrictions.
  • Via an individual subscription, which gives you access to more features, possibly including the ability to create customized chatbots specialized in your areas of interest.
  • Via an enterprise version, designed for use in a large organization, that allows you to create and deploy a customized version for use within your organization.

AI chatbots are only one of many AI tools. Many others, some in use for years, are not designed for interactive use by the general public but for specialized tasks, such as reviewing digitized documents in discovery, facial recognition, reading medical images, and operating autonomous vehicles. AI chatbots suddenly grabbed so much attention because ChatGPT was the first generative language chatbot released to the general public that fulfilled some of the promise of a computer that could “talk” to anyone and respond like a human being. ChatGPT can even approximate some degree of emotional intelligence: its answers become more truthful and accurate when the user adds emotional language to prompts (e.g., “This is important to my career.”).

While the release of general-purpose AI chatbots caught the world’s attention, we believe the most useful AI tools for trial consulting will come from customized programs trained on texts particularly useful to our work, such as focus group transcripts, questionnaire data, trial transcripts, summary judgment motions, and mediation briefs. Some of the authors have been experimenting with creating individual customized chatbots that produce voir dire questions, draft supplemental juror questionnaires, and even construct basic opening statements. Trial consultants and research companies have also built more specialized tools on top of LLMs like ChatGPT to carry out tasks like analyzing and summarizing focus group transcripts or developing juror profiles. In addition, research and consulting companies have independently developed their own AI programs that are not chatbots but use neural networks and LLMs. More such tools will no doubt be coming. The AI Task Force will present demonstrations of specific products at the upcoming ASTC conference and will continue to help members understand and evaluate the developing universe of AI tools.

You Are What You Eat: Limits on AI Chatbots’ Knowledge

Whatever their abilities, every AI chatbot is specialized and limited by the specific domain of information on which it has been trained and the specific patterns it has been programmed to learn. It is also constrained by rules of speech and response programmed into it by humans.

For example, ChatGPT, like a lot of us, is good at English but not so good at math. That’s because it has been built to understand and generate natural language, not to do math. It can explain mathematical concepts in English, but on its own it is unreliable at performing arithmetic or analyzing numbers and statistical data, which is ironic, considering that its skills depend on statistics! It can write code and create sample datasets, but it can’t decide what is worth researching or what a computer program should do to be useful to human beings.

For this reason, you need both to read the fine print of their user licenses and to experiment with different chatbots to really understand what each can and cannot do.

It’s helpful to recognize that AI chatbots have significant limitations. None of the AI products currently available comes anywhere near demonstrating what is called AGI, or Artificial General Intelligence: the ability to think and act like a human, which includes things like finding patterns across domains, learning by analogy, and learning from the environment through sensations and emotions.

“Hallucinations” and Model “Bias”

While we will address issues specific to using AI chatbots in trial consulting in Part 2, two practical issues have already arisen with AI chatbots generally. The first is that the results they produce can contain incorrect information and outright fabrications. Computer scientists have come to refer to these as “hallucinations.”

Hallucinations occur with most, if not all, AI chatbots[1] and are not infrequent. In fact, a recent study of legal hallucinations (i.e., model responses that are not consistent with legal facts) found that the LLMs examined, including ChatGPT 3.5 and Llama 2, hallucinated between 69% and 88% of the time, with invented information occurring more frequently as the task became more complex.

Hallucinations are not limited to a minor misattribution or an inaccurate date. The arguments and content offered in support of a result can be fundamentally wrong and, when questioned, the AI chatbot may still defend them as correct. One of the authors has previously found this occurring in some AI chatbot summaries of Internet search results about potential jurors. This is particularly problematic because, absent verification, the content produced appears well written and convincing. It is important to recognize that these hallucinations are not “mistakes.” It’s not that the chatbot had access to the correct answer but chose the wrong one. The chatbot, using the rules and the texts it was given, has constructed novel information that it believes is correct. It has produced plausible misinformation that it believes to be true (which, alas, only increases the extent to which it seems human!).
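One modest safeguard against fabricated citations is to flag every case citation in a chatbot’s output that a person has not yet checked. The Python sketch below is a hypothetical illustration, not a tool the authors describe: the regular expression recognizes only a few federal reporter formats, and the `verified` set stands in for whatever list of citations someone has actually pulled and read. A citation that passes this check still has to be read to confirm that it says what the chatbot claims.

```python
import re

# Hypothetical, deliberately narrow pattern: "<volume> <reporter> <page>" for a few
# federal reporters only (U.S., S. Ct., F. Supp., F. Supp. 2d/3d, F., F.2d, F.3d, F.4th).
CITATION = re.compile(
    r"\b\d{1,4} (?:U\.S\.|S\. Ct\.|F\. Supp\.(?: 2d| 3d)?|F\.(?:2d|3d|4th)?) \d{1,5}\b"
)


def unverified_citations(chatbot_text, verified):
    """Return citations found in chatbot_text that are not in the verified set."""
    return [c for c in CITATION.findall(chatbot_text) if c not in verified]


if __name__ == "__main__":
    # Citations a person has already pulled and read (illustrative).
    verified = {"347 U.S. 483"}
    # Illustrative chatbot output; "Smith v. Jones" is a made-up citation.
    draft = ("See Brown v. Board of Education, 347 U.S. 483 (1954); "
             "Smith v. Jones, 999 F.3d 123 (2d Cir. 2021).")
    for citation in unverified_citations(draft, verified):
        print("CHECK BEFORE USE:", citation)
```

Here only `999 F.3d 123` is flagged. The design choice is to fail safe: anything the pattern finds that is not on the checked list goes back to a human, rather than the script trying to decide on its own whether a case exists.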

Because LLMs are trained on a limited universe of texts, chatbots also have inherent biases. For example, non-English speakers have expressed concern that ChatGPT was trained more heavily on English texts, appears to be biased toward the English language, and through its use could increase existing inequities in global commerce and culture. The Internet is also a recent invention, and while some texts that predate it have been digitized, Internet content is heavily skewed toward texts produced after 1989, so answers inevitably reflect a “presentist” bias. Publicly available content reflects biases embedded in our societies, and chatbots have generated responses that reflect these biases. For example, when asked to provide images of doctors, a chatbot has returned images in which all of the doctors were white men. According to researcher Petar Janovic, because of their generative ability, AI tools, including chatbots, can even recombine biases to create new ones.

Because every AI tool is trained on a specific set of data (which can include many kinds of data besides language), this problem of bias exists in AI tools broadly, not just in chatbots. The biased nature of these tools is of great concern as more and more AI tools are deployed by government and business to make critical decisions about such things as patient treatment, hiring, lending and loan collection practices, facial recognition, and criminal sentencing. Trial consultants need to be aware of the ways in which AI chatbots might reproduce and reinforce stereotypes and biases in the results they produce.

Understanding what AI chatbots are and how they work provides a sound knowledge base for appreciating how, and in what ways, AI chatbots can be applied to trial consulting. This understanding can also lessen the tendency to anthropomorphize these computer programs trained to seem human.

Stay tuned: Part 2, “AI Chatbots and Trial Consulting,” will appear in the next edition of The Jury Expert. In that article, we will address a variety of ways in which AI chatbots have been, and may well be, used in the field of trial consulting, including pretrial applications (e.g., Internet research on jurors, developing voir dire questions, generating supplemental juror questionnaires, and theme development) and trial applications (e.g., opening statements and closing arguments), and we will discuss ethical and security issues.

This article was prepared by (in alphabetical order) Erica Baer, Jeffrey Frederick, Anupama Gulati, Kristi Harrington, and Sarah Murray.

[1] Some companies (e.g., the maker of Lexis AI) have recently claimed to have minimized hallucinations through database access, machine learning, and associated programming. Such results may be limited to certain components of their products, e.g., legal citations.