How data sleuths spotted fake numbers in widely cited article
IF YOU WRITE a book called “The Honest Truth About Dishonesty,” the last thing you want to be tied to is fake data. Yet an article co-written by Dan Ariely, a professor at Duke University, appears to be based on just that.
In 2012 Mr Ariely, Max Bazerman, Francesca Gino, Nina Mazar and Lisa Shu wrote an article claiming that people act more honestly when they state in advance that they will be truthful. It cited two lab studies and one based on auto insurance data.
On August 17, Leif Nelson, Joe Simmons and Uri Simonsohn, who run the Data Colada blog, wrote that they believed the insurance data was fake. All of the article's authors have asked for it to be retracted. They all deny responsibility, saying they were duped rather than dishonest.
The study asked auto insurance customers to report how many miles they had driven and to sign a statement saying they were telling the truth. Half signed at the top of the form, half at the bottom. The first group reported 10% more miles, potentially raising their premiums.
Data Colada found three smoking guns. First, the suspect data look different. The spreadsheet contains 6,744 values in Cambria font. Each has a twin in Calibri, identical except for small differences in mileage. This implies that the forgers duplicated real data, added random variation, and forgot to cover their tracks. Second, the distribution of miles driven is not bell-shaped, like most real data, but looks like a box: similar numbers of cars traveled each distance below 50,000 miles, and none exceeded that figure. Third, the data are too precise. Reported starting mileages are often rounded to powers of ten. Yet among the distances driven, zero is no more common than any other final digit, a hallmark of machine-generated data.
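The distribution and final-digit checks described above can be sketched in a few lines of Python. This is only an illustration on simulated numbers, not Data Colada's code and not the study's data: the function names, sample size, rounding rates, and bell-curve parameters are all assumptions chosen for the example.

```python
import random
from collections import Counter


def last_digit_shares(values):
    """Return the share of values ending in each digit 0-9."""
    counts = Counter(int(v) % 10 for v in values)
    total = len(values)
    return [counts.get(d, 0) / total for d in range(10)]


def chi_square_vs_flat(values, bins, upper):
    """Chi-square statistic of a histogram on [0, upper) against a flat one.

    A small value means the histogram looks like a box (uniform).
    """
    width = upper / bins
    counts = [0] * bins
    for v in values:
        idx = min(int(v / width), bins - 1)
        counts[idx] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
n = 10_000              # hypothetical sample size

# Simulated "human-reported" odometer readings (assumption: 40% of people
# round to the nearest 1,000 and another 30% to the nearest 100).
human = []
for _ in range(n):
    reading = rng.randint(0, 150_000)
    r = rng.random()
    if r < 0.4:
        reading = round(reading, -3)
    elif r < 0.7:
        reading = round(reading, -2)
    human.append(reading)

# Simulated "machine-generated" miles driven: uniform integers below 50,000.
machine = [rng.randint(0, 49_999) for _ in range(n)]

# Simulated bell-shaped miles driven, clipped to the same range.
bell = [min(max(int(rng.gauss(25_000, 8_000)), 0), 49_999) for _ in range(n)]

human_zero_share = last_digit_shares(human)[0]
machine_zero_share = last_digit_shares(machine)[0]
flat_stat = chi_square_vs_flat(machine, bins=10, upper=50_000)
bell_stat = chi_square_vs_flat(bell, bins=10, upper=50_000)

print(f"share ending in 0, human-like:   {human_zero_share:.2f}")
print(f"share ending in 0, machine-like: {machine_zero_share:.2f}")
print(f"chi-square vs flat, machine-like: {flat_stat:.1f}")
print(f"chi-square vs flat, bell-shaped:  {bell_stat:.1f}")
```

In this simulation the rounded readings end in zero far more often than one time in ten, while the uniform draws do not, and the uniform draws sit close to a flat histogram while the bell-shaped sample does not. Real forensic work would need the actual spreadsheet and more careful statistics.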
Who is responsible? Mr Bazerman, Ms Gino and Ms Shu say they only worked on the lab studies. Ms Mazar says she looked at the car data, but had no role in acquiring them. Mr Ariely says he was the only author to deal with the source of the data, although he has not named it. He says the data must have been manipulated before he saw them.
One potential explanation is that workers at the insurer falsified the numbers. The Hartford, an insurer, says it worked with Mr Ariely on “a small project” in 2007-08, but cannot “locate any data, deliverables or results.” It says most of the employees involved have left. None of Mr Ariely’s co-authors gave us an interview. “I didn’t fabricate the data,” Mr Ariely said. “I’m ready to take a lie test on this.”■
Source: Data Colada
This article appeared in the Graphic Detail section of the print edition under the headline “The Police is the Thing”