It was easy to see why many behavioural practitioners loved the idea that you could induce honesty by getting someone to sign a form at the top, not the bottom. It was practical. It was cheap to implement. It involved more than the common “send them a reminder” or “chuck a social norm on it” that comprises much of the applied behavioural science canon. (That said, don’t underestimate reminders.) It provided an unintuitive proposal to improve business outcomes that wasn’t likely to come from any other source.

The fraudulent data in the paper that first proposed this idea has attracted a lot of recent attention (plus a retraction). But beyond the pall it has cast over the field and over Dan Ariely’s work, the exposure of the fraud has had the unfortunate effect of distracting from what I think is a more important story arising from this study.

So, for a moment, let’s ignore the fraud and look at the state of play before its exposure.

In 2012, Lisa Shu and friends published a paper in PNAS in which they reported that “signing before—rather than after—the opportunity to cheat makes ethics salient when they are needed most and significantly reduces dishonesty.” The paper reported the results of two lab experiments and one field experiment, all pointing in the right direction.

Here is how the authors reported the field experiment:

Partnering with an automobile insurance company in the southeastern United States, we manipulated the policy review form, which asked customers to report the current odometer mileage of all cars insured by the company. Customers were randomly assigned to one of two forms, both of which required their signature following the statement: “I promise that the information I am providing is true.” Half the customers received the original forms used by the insurance company, where their signature was required at the end of the form; the other half received our treatment forms, where they were required to sign at the beginning. The forms were identical in every other respect. Reporting lower odometer mileage indicated less driving, lower risk of accident occurrence, and therefore lower insurance premiums. …

… Customers who signed at the beginning on average revealed higher use (M = 26,098.4, SD = 12,253.4) than those who signed at the end [M = 23,670.6, SD = 12,621.4; F(1, 13,485) = 128.63, P < 0.001]. The difference was 2,427.8 miles per car. That is, asking customers to sign at the beginning of the form led to a 10.25% increase in implied miles driven (based on reported odometer readings) over the current practice of asking for a signature at the end.
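The arithmetic in the quoted passage can be checked from the reported summary statistics. Below is a minimal Python sketch. It assumes, hypothetically, that the 13,487 observations implied by F(1, 13,485) for a two-group comparison were split roughly evenly between the two forms, because the excerpt does not give the actual group sizes. It checks only whether the reported numbers are internally consistent, not whether the underlying data were genuine.

```python
import math

# Summary statistics as reported for the 2012 field experiment.
mean_top, sd_top = 26_098.4, 12_253.4        # signed at the beginning
mean_bottom, sd_bottom = 23_670.6, 12_621.4  # signed at the end
df_error = 13_485                            # from F(1, 13,485)

# Difference in mean implied miles, and as a percentage of the control mean.
diff = mean_top - mean_bottom
pct = diff / mean_bottom * 100

# Assumption (hypothetical): two groups of near-equal size, N = df_error + 2.
n_total = df_error + 2
n1 = n_total // 2
n2 = n_total - n1

# Pooled-variance two-sample t statistic; with two groups, F = t**2.
pooled_var = ((n1 - 1) * sd_top**2 + (n2 - 1) * sd_bottom**2) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = diff / se
f_approx = t**2

print(f"difference: {diff:.1f} miles")
print(f"increase over signing at the end: {pct:.2f}%")
print(f"approximate F(1, {df_error}): {f_approx:.1f}")
```

Run as written, the difference comes to 2,427.8 miles, the percentage to about 10.26% (the paper reports 10.25%), and the implied F to about 128.4, close to the reported 128.63. A gap of that size is what unequal group sizes would produce.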

But then in 2020 came a new paper in PNAS by Ariella Kristal, Ashley Whillans and the authors of the 2012 paper. The abstract says it all:

In 2012, five of the current authors published a paper in PNAS showing that people are more honest when they are asked to sign a veracity statement at the beginning instead of at the end of a tax or insurance audit form. In a recent investigation, across five related experiments we failed to find an effect of signing at the beginning on dishonesty. Following up on these studies, we conducted one preregistered, high-powered direct replication of experiment 1 of the PNAS paper, in which we failed to replicate the original result. The current paper updates the scientific record by showing that signing at the beginning is unlikely to be a simple solution for increasing honest reporting.

These authors discussed what we should learn from these two papers in an article in Scientific American. It was pointedly titled “When We’re Wrong, It’s Our Responsibility as Scientists to Say So”.

The title points to where I depart from their interpretation. They suggest that scientists should say when they are wrong. But what, exactly, went wrong?

Here’s the authors’ take:

We also hope that this collaboration serves as a positive example, whereby upon learning that something they had been promoting for nearly a decade may not be true, the original authors confronted the issue directly by running new and more rigorous studies, and the original journal was open to publishing a new peer-reviewed article documenting the correction.

While we may have lost confidence in the effect of signing first (versus last) being the simple fix that we thought we found, this collaboration has strengthened our belief that psychology, like all good science, is continuously updating and self-correcting and that it is up to all of us to maintain this growing positive trend.

This undersells both what went wrong and the correction that is required.

The problem is not simply that a single study about signing a form failed to replicate. The authors are hardly Robinson Crusoe in having their work fail to replicate. This isn’t even the first failed replication involving Dan Ariely or Nina Mazar on the topic of inducing honesty.

Rather, the problem is that a huge swathe of the published behavioural science literature suffers from what Andrew Gelman calls the garden of forking paths, as well as from publication bias and similar problems. It is built on noisy experiments. The PNAS paper is a typical example, not an exception. Much of this literature won’t replicate. Yet people still treat that work as true unless and until a replication fails.

So rather than saying they “lost confidence in the effect of signing first”, a better-calibrated self-correction would have been to say “we should not have trusted this potentially spurious result absent rigorous replication, and we should be applying similar scepticism to more of the literature.”

To put it another way, the default position should not be to take all published results in the behavioural science literature as “the record”. Instead, we should be treating many of these published results as exploratory or as untested hypotheses. Start from a position near “unlikely to be true”, and update to a stronger belief in the presence of replications or other supporting evidence.
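That sceptical default can be made concrete with Bayes’ rule. The sketch below is an illustration only, not something drawn from either PNAS paper. It assumes a hypothetical 10% prior that a given published effect is real, a hypothetical 50% power for a noisy experiment, and the conventional 5% significance threshold. It also treats a replication as an independent test with the same power. The function name `posterior_true` is invented for this example.

```python
def posterior_true(prior: float, power: float, alpha: float) -> float:
    """Probability that an effect is real, given a statistically significant result.

    Bayes' rule: P(real | sig) = power*prior / (power*prior + alpha*(1 - prior)).
    """
    true_pos = power * prior          # real effect, and the test detects it
    false_pos = alpha * (1 - prior)   # no effect, but the test is significant anyway
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, chosen for illustration only.
prior = 0.10   # sceptical starting belief that the effect is real
power = 0.50   # noisy experiment: detects a real effect half the time
alpha = 0.05   # conventional significance threshold

after_original = posterior_true(prior, power, alpha)
# Assumption: the replication is independent and has the same power.
after_replication = posterior_true(after_original, power, alpha)

print(f"after original study: {after_original:.2f}")
print(f"after one successful replication: {after_replication:.2f}")
```

With these assumed inputs, one significant result moves the probability that the effect is real from 10% to about 53%, and a successful independent replication moves it to about 92%. Forking paths and publication bias effectively push the false-positive rate above the nominal 5%, which would leave the first figure lower still.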

Absent that approach, the story repeats. Well-meaning practitioners will continue to pick up ideas from the literature and waste time and effort on them until (if they are lucky) someone gets around to killing off the original idea in a replication. And as the Scientific American article highlights, this is costly:

This matters because governments worldwide have spent considerable money and time trying to put this intervention into practice with limited success. In particular, failed attempts have been reported in several countries, and thousands of dollars were spent when one of us (Whillans) was working with a government in Canada to attempt to change their tax forms.

As a final aside, I’m not as optimistic that behavioural science is continuously updating and self-correcting, at least not yet at a scale that matches the challenge. Responses to failed replications, such as the other failed replication of an honesty intervention that I mentioned above, are typically far more defensive. Then there is the sheer volume of papers making questionable claims, which still outnumber robust replications.

Maybe the distinguishing feature of these “sign at the top” experiments is that at least one incentive pointed in the right direction. By joining the replication effort and taking a less defensive approach to their work, the original authors got the great value of two PNAS papers for the price of one.