The Case for Scientific Jury Experiments

[co-authors: Bernard Chao, Christopher Robertson, JD PhD, David Yokum, JD PhD]*


For decades, litigators have relied on focus groups. While this approach can help identify issues for further exploration, attorneys often use focus groups to shape trial strategy or even predict outcomes. But focus groups are ill-suited for these applications because they suffer from three basic weaknesses: 1) they cannot explore unconscious decision-making; 2) they use too few mock jurors to provide reliable answers; and 3) they can become echo chambers that surface only a subset of the issues an actual jury will consider.

Fortunately, recent technical advances in crowdsourcing and insights into human decision-making have opened the door to a better approach. We can now conduct large-scale (i.e., hundreds to thousands of mock jurors) A vs. B experiments for trial attorneys. These experiments avoid the problems of focus groups and can be used to test any number of issues. We highlight some examples from our research, including: 1) the effects of anchoring; 2) the problem with self-diagnosing bias; 3) how subsequent remedial measures affect juries; and 4) how juries respond to a variety of different jury instructions.

The Limits of Focus Groups

Limitation #1: People Cannot Report Unconscious Causes Of Behavior

There is often a disconnect between a person's expressed opinions or intentions and his or her actual behavior. The gap emerges because most decisions are made unconsciously, and as a result, cannot be accurately self-reported. See John A. Bargh & Tanya L. Chartrand, The Unbearable Automaticity of Being, 54 Am. Psychologist 462 (1999). Psychologists have repeatedly demonstrated this over the past century, generating enough examples to fill bookshelves.

Experiments have shown, for instance, that people are more likely to participate in a retirement savings plan or donate their organs if the enrollment checkbox is pre-checked on the paperwork (there is a psychological tendency to stick with the default selection). John Beshears, James J. Choi, David Laibson & Brigitte C. Madrian, The Importance of Default Options for Retirement Savings Outcomes: Evidence from the United States, NBER Working Paper No. 12009 (2006); Eric J. Johnson & Daniel Goldstein, Do Defaults Save Lives?, 302 Science 1338 (2003). Experiments have also shown that people overestimate the value of a house if first presented with an inflated, suggested price. Gregory B. Northcraft & Margaret A. Neale, Experts, Amateurs, and Real Estate: An Anchoring-and-Adjustment Perspective on Property Pricing Decisions, 39 Org. Behav. & Hum. Decision Processes 84 (1987). The suggestion “anchors” the cognitive process generating the estimate. Other experiments have shown that men are more likely to ask a woman on a date after first walking across a high bridge. Donald G. Dutton & Arthur P. Aron, Some Evidence for Heightened Sexual Attraction Under Conditions of High Anxiety, 30 J. Personality & Soc. Psych. 510 (1974). The adrenaline is misattributed as attraction. People have been shown to punish a moral violation more severely when a foul odor is nearby. Simone Schnall, Jonathan Haidt, Gerald L. Clore & Alexander H. Jordan, Disgust as Embodied Moral Judgment, 34 Personality & Soc. Psych. Bull. 1096 (2008). The irrelevant disgust transfers to the moral transgressor.

And yet even with such weighty decisions - money, organs, romance, morality - most people fail to realize that these contextual features play a causal role in their behavior. Even when directly confronted with experimental data showing what caused their behavior, many people continue to insist it cannot be so - that such trivialities as default checkboxes, irrelevant numbers, high bridges, and smelly socks could not possibly shape their actions unconsciously. Psychologists have even coined a label for the phenomenon: the “bias blind spot.”

Such blind spots mean a focus group participant’s self-report can fail to reflect the real causal factors driving jury outcomes. In one study, two of the co-authors of this paper asked about 1,000 mock jurors to adjudicate the same medical malpractice case. David V. Yokum, Christopher T. Robertson & Matt J. Palmer, The Inability of Jurors to Self-Diagnose Bias, 96 Denv. L. Rev. 869 (2019). Half of the participants, however, first read an article that was highly prejudicial against the physician defendant. When asked whether reading the article would affect their verdicts, the vast majority (9 out of 10) said no. In fact, these participants were more than twice as likely to find for the plaintiff. The pretrial publicity caused an enormous effect, but most people were either incapable of self-reporting the bias or unwilling to do so when asked.

For these reasons, mock jurors cannot reliably explain what caused their reactions to the case presentations that they saw. It may be anything or nothing in the case presentation itself.

Limitation #2: Focus Groups Involve Hopelessly Small Sample Sizes

Of course, not all reasons for behavior are inaccessible to the mind. Often, we can articulate why we believe certain facts or how we intend to behave in the future. Yet even when people are able to accurately self-report, a typical focus group is too small to support any reliable predictions about the average juror. After all, the jury consultant does not care about these twelve mock jurors in particular, but is instead trying to provide a prediction about any twelve people who may be selected from the voir dire panel. Six, 12, 30, or even 50 focus group participants are simply not enough to make such a prediction with precision.

Consider trying to predict the likelihood of winning your case with 24 mock jurors (whom you might divide into groups for deliberations). In their individual votes, suppose you get 16 mock jurors voting for your side of the case. Sounds great, right? That’s a 67% chance of winning.

However, using standard social science estimates of uncertainty, your consultant actually cannot tell you whether you will probably win the case or probably lose the case. The 95% confidence interval spans the range from 44.7% (you will probably lose, but it will be close) to 83.6% (you’ll probably win). With a different draw of jurors, your mock jury exercise could return any of those numbers!
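
The interval above can be checked with a few lines of code. The sketch below is our own illustration, using SciPy’s exact (Clopper-Pearson) binomial confidence interval:

```python
# A minimal sketch (ours, not the original analysis) of the interval above:
# an exact (Clopper-Pearson) 95% confidence interval for 16 wins out of 24.
from scipy.stats import binomtest

result = binomtest(k=16, n=24)                    # 16 of 24 jurors vote for you
ci = result.proportion_ci(confidence_level=0.95)  # exact method by default
print(f"Observed win rate: {16 / 24:.1%}")        # 66.7%
print(f"95% CI: {ci.low:.1%} to {ci.high:.1%}")   # 44.7% to 83.6%
```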

In contrast, a study with 1,000 mock jurors can narrow that interval to plus or minus 3%, ensuring that you know the real risks of your case. This is why a typical Gallup poll involves 1,000 people or more. Such a sample size empowers statements such as: the percentage of Americans who approve of the Supreme Court is 45%, ±4%. Michael Smith & Frank Newport, Most Republicans Continue to Disapprove of Supreme Court, Gallup (Sept. 29, 2016). You can rule out the proposition that most Americans approve. Had Gallup polled only 30 people, its margin of error would have been ±19%, unable to tell you whether most approve or most disapprove. Such a survey would be useless.
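
The way the margin of error shrinks with sample size follows directly from the standard formula, 1.96 × √(p(1−p)/n). The sketch below is our own illustration; published poll margins may be slightly larger because of survey design effects:

```python
# A back-of-the-envelope check (our illustration) of how the margin of error
# shrinks with sample size, using the normal approximation for a proportion.
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """95% margin of error for an observed proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n = 1,000: +/-{margin_of_error(0.45, 1_000):.1%}")  # about +/-3%
print(f"n = 30:    +/-{margin_of_error(0.45, 30):.1%}")     # about +/-18-19%
```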

The upshot is that focus groups involving small sample sizes are deeply unreliable predictors of how an actually empaneled jury will behave.

Limitation #3: Group Discussions Can Be A Polarizing Echo Chamber

Psychologists studying group interactions have documented a number of ways in which people behave differently in the presence of others. These peer pressure effects can undermine the ability to understand what an individual really thinks, since the observed behavior may be merely a façade meant to please the person asking the question or to get along with others in the crowd.

A famous example of the so-called “conformity effect” was provided by Solomon Asch in the 1950s. Solomon E. Asch, Studies of Independence and Conformity: A Minority of One Against a Unanimous Majority, 70 Psychological Monographs (1956). A group of participants was tasked with publicly saying which of three lines matched the length of a separate line. The task was repeated several times. Unbeknownst to one person—the actual research participant—all the other people in the group were confederates, hired staff of Dr. Asch. These confederates publicly gave their answers before the research participant was asked to give his or her answer. Over time the confederates began to give the same wrong answers—occasionally ridiculously wrong answers (e.g., that a line obviously matching “A” instead matched “B”). Illustrating the power of peer pressure, 3 out of 4 participants nonetheless would, on at least one occasion, parrot the clearly wrong answer.

The presence of a jury consultant can heighten this pressure to conform in a unique way. Namely, when subjects believe they understand what the researcher wants to know, they have a tendency to conform to those perceived expectations. The most famous example is the “Hawthorne effect,” a reference to experiments done at the Hawthorne Works in Cicero, Illinois. F.J. Roethlisberger & W.J. Dickson, Management and the Worker: An Account of a Research Program Conducted by the Western Electric Company, Hawthorne Works, Chicago, Harvard University Press (1939). Researchers found that factory output increased whenever employees were observed; their effort heightened when under a watchful eye. Yet productivity would return to normal as soon as the scientists left.

A perhaps counterintuitive feature of group discussion is a tendency not toward compromise, but toward polarization. Even when there is diversity of initial viewpoints, people tend to focus on discussion points of common knowledge and agreement, a tendency known as shared information bias. Debate highlights only agreed-upon positions, and in turn reinforces confidence in those beliefs. Occasionally the group position emerges as more extreme than that of any individual at the outset. Cass R. Sunstein, The Law of Group Polarization, 10 J. Pol. Phil. 175 (2002).

The upshot of such conformity, Hawthorne, and polarization effects is that the one genuinely useful function of focus groups can be undermined: rather than serving as a fountain of diverse idea generation, the group devolves into a polarized echo chamber, reinforcing a single strong voice or even parroting back the focus group leader’s own preferences or foreordained ideas.

The Way Forward: Large, Randomized, Blinded Experiments

Large-scale online experiments solve these problems. Here is how they work.

Conducting Online Trials

With the help of the attorneys involved in the case, we create shortened presentations of both the plaintiff’s and the defendant’s cases. Social scientists call this the “stimulus.” Depending on how much time and money the parties are prepared to spend, there are two primary options. The more elaborate version involves making a video: we create a PowerPoint presentation with an audio overlay. Attorneys record the arguments so that mock jurors can hear them while watching the evidence on the slides. The presentation can include photographs, documents, animations, or even videotaped deposition testimony. This sort of audio-visual presentation has the advantage of simulating what juries might actually see and hear in closing statements. We have found that the core of a case, even a relatively sophisticated one, can often be presented in about 15 minutes per side. For smaller-stakes cases, we can simply draft a short statement of each side’s position. Again, the statements can incorporate important evidence. In addition to the plaintiff’s and defendant’s presentations, we then add jury instructions that are read by a narrator acting as the judge.

To build the experiment, we create at least two versions of the case. For example, in one version the defendant simply argues that there is no liability and attacks the plaintiff’s damages case without providing an alternative damages figure. In another version, the defendant provides an alternative damages figure so we can see how that affects the jury. This is an A vs. B experiment with two experimental conditions. But we can build more sophisticated experiments by layering on more conditions. For example, attorneys may also want to learn about the impact of two different sets of jury instructions. The result would be a 2x2 experiment with four experimental conditions. In theory, we could keep adding conditions. But in practice, we have generally limited ourselves to three manipulations (e.g., different plaintiff’s damages arguments, different defendant’s damages arguments, and different limiting instructions).
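
As an illustration, here is a minimal sketch of how mock jurors might be randomly assigned across a 2x2 design like the one just described. The condition labels and group sizes are our own hypothetical choices, not those of any actual study:

```python
# A minimal sketch of random assignment to a 2x2 design, assuming the two
# manipulations described above (defense damages argument x jury instruction).
# Condition names and sizes are illustrative, not actual study labels.
import itertools
import random

damages_argument = ["no_alternative_figure", "alternative_figure"]
instructions = ["standard_instruction", "modified_instruction"]
conditions = list(itertools.product(damages_argument, instructions))  # 4 cells

jurors = [f"juror_{i}" for i in range(400)]  # e.g., 100 per condition
random.shuffle(jurors)

# Deal shuffled jurors round-robin into the four cells to keep sizes balanced.
assignment = {juror: conditions[i % len(conditions)]
              for i, juror in enumerate(jurors)}
```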

Once the mini-trial is ready, mock jurors are recruited from one of several crowdsourcing platforms. The panel should be broadly representative of the population of potential jurors and should also be given incentives to ensure that participants pay attention. The cost will vary depending on the length of the mini-trial, the number of mock jurors, and the number of experimental conditions. To take advantage of modern statistical techniques, we generally recommend over a hundred mock jurors per experimental condition. But if the attorneys simply want to get a sense of what a jury might decide, they can elect a smaller number and test only their baseline case. Regardless of how many they choose, the cost will likely be far lower than that of traditional mock juries. Participants on crowdsourcing platforms are typically paid minimum wage. When the trial lasts only half an hour, the cost for each mock juror will be minimal.
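
To make the cost point concrete, here is an illustrative back-of-the-envelope calculation. The wage, duration, and panel size are our own assumptions, and actual platform fees and pay rates vary:

```python
# An illustrative cost estimate (assumed: federal minimum wage of $7.25/hour
# for a half-hour study; platform fees excluded).
hourly_rate = 7.25   # dollars per hour (assumed)
study_hours = 0.5    # a half-hour mini-trial
n_jurors = 800       # hypothetical panel size

participant_cost = hourly_rate * study_hours * n_jurors
print(f"Participant payments: ${participant_cost:,.2f}")  # $2,900.00
```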

Online mock trials can be completed quickly. The authors have run cases in less than a week, sometimes in as little as a day. The results can be provided in easy-to-digest form. Imagine finding that 324 out of 800 mock jurors, or about 41%, determined that the defendant was liable. For the mock jurors who did find liability, the average damages award was $1.42 million, with a 95% confidence interval ranging from $450,000 to $3.7 million.
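
An asymmetric interval like the one above can arise when awards are heavily skewed. One common way to compute such an interval is a bootstrap over the individual awards; the sketch below uses simulated placeholder data, and the actual interval in any given study may come from a different method:

```python
# A sketch of one way an asymmetric interval can be computed: a bootstrap
# percentile CI over individual awards. The awards are simulated placeholders.
import numpy as np

rng = np.random.default_rng(0)
awards = rng.lognormal(mean=13.5, sigma=1.2, size=324)  # fake, skewed awards

boot_means = [rng.choice(awards, size=awards.size, replace=True).mean()
              for _ in range(10_000)]
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f"Mean award: ${awards.mean():,.0f}")
print(f"95% bootstrap CI for the mean: ${low:,.0f} to ${high:,.0f}")
```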

We can also calculate a case’s expected value by combining the data on verdicts and damages. We simply multiply the plaintiff’s chance of winning by the average recovery when winning. Here, the case expected value is 40.5% × $1.42 million, or $575,100. Of course, special jury forms can also impose comparative fault, third-party fault, or other affirmative defenses to yield a realistic estimate of case value.
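
The arithmetic is simple enough to show directly:

```python
# The expected-value arithmetic from the paragraph above.
p_liability = 324 / 800        # 40.5% of mock jurors found liability
mean_award = 1_420_000         # average award among those who did

expected_value = p_liability * mean_award
print(f"Case expected value: ${expected_value:,.0f}")  # $575,100
```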

The Advantages

There are a number of advantages to running an experiment in this way.

First, by crowdsourcing mock jurors from across the internet, we are able to recruit thousands of individuals to participate in a single study. In this way, we get a large and diverse sample, not unlike a Gallup poll.

Second, rather than asking people how they might behave, we experiment in order to directly observe the behavior. Rather than asking a focus group whether they would be affected by pretrial publicity (PTP), for example, we recruited 1,000 people and randomly assigned each person to one of two conditions—either they saw the PTP or not—and then everyone watched the identical trial. Because of the randomized design, the only difference between the two groups was the presence or absence of PTP. And because of the large sample, the average verdict rates of the two groups are statistically expected to be identical unless the one thing we controlled to differ between the groups (i.e., the PTP) causes a difference.
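
Once the data are in, the two randomized groups can be compared with a standard two-proportion test. The sketch below is ours; the counts are hypothetical, chosen only to mirror the “more than twice as likely” pattern described earlier:

```python
# A sketch of comparing the two randomized groups with a two-proportion
# z-test. The counts are hypothetical, not the study's actual data.
from statsmodels.stats.proportion import proportions_ztest

plaintiff_verdicts = [290, 140]   # PTP group vs. control group (hypothetical)
group_sizes = [500, 500]          # 1,000 jurors split evenly at random

z_stat, p_value = proportions_ztest(count=plaintiff_verdicts, nobs=group_sizes)
print(f"z = {z_stat:.2f}, p = {p_value:.4g}")  # a large, significant gap
```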

Third, we blind respondents to the purpose of the experiment, so they cannot just tell us what we want to hear. Because each respondent is simply deciding the case, rather than telling us about the case, we get a realism and candor that sheds real light.

This sort of randomized design is the same technique the Food and Drug Administration (FDA) requires pharmaceutical companies to use to test the efficacy of new drugs, and it represents the most rigorous tool that scientists have for understanding causality. It does not rely on self-report. We see the real behavior.

The Results

We have conducted a number of experiments on juries. Some of these have shown:

1) When a plaintiff asks for an absurdly high damages award (i.e., by anchoring), the tactic really does increase damages awards, but it can also cause defendants to win more often. John Campbell, Bernard Chao, Christopher Robertson & David Yokum, Countering the Plaintiff’s Anchor: Jury Simulations to Evaluate Damages Arguments, 101 Iowa L. Rev. 543 (2016).

2) Evidence of a defendant’s subsequent remedial measures increases the defendant’s chance of losing on liability but reduces the damages defendants must pay. Bernard Chao & Kylie Santos, How Evidence of Subsequent Remedial Measures Matters, 84 Mo. L. Rev. 609 (2019).

3) Limiting instructions appear to be more effective when they are accompanied by an explanation for the instruction. Athan P. Papailiou, David V. Yokum & Christopher T. Robertson, The Novel New Jersey Eyewitness Instruction Induces Skepticism but Not Sensitivity, PLoS ONE (2015); Chao & Santos, supra (examining the effect of limiting instructions).

In short, online experiments can be very useful for shaping trial strategy, advancing arguments on disqualification and jury instructions to the judge, and even settling cases. Bernard Chao, Christopher Robertson & David Yokum, Crowdsourcing and Data Analytics: The New Settlement Tools, 102 Judicature 62 (Fall 2018) (explaining how verdicts from crowdsourcing can provide an unbiased reference point for settlement).

* Hugo Analytics
