Sunday, August 27, 2017

New p-Value Thresholds for Statistical Significance

This is presently among the hottest topics / discussions / developments in statistics.  Seriously.  Just look at the abstract and dozens of distinguished authors of the paper below, which is forthcoming in one of the world's leading science outlets, Nature Human Behaviour.

Of course data mining, or overfitting, or whatever you want to call it, has always been a problem, which has always warranted strong and healthy skepticism regarding alleged "new discoveries".  But the whole point of examining p-values is to AVOID anchoring on arbitrary significance thresholds, whether the old magic .05 or the newly-proposed magic .005.  Just report the p-value, and let people decide for themselves how they feel.  Why obsess over asterisks, and whether/when to put them next to things?
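
To make "just report the p-value" concrete, here is a minimal sketch of what I have in mind: report the estimate, an interval, and the p-value itself, with no threshold and no asterisks. The function name summarize_difference and the simulated data are my own illustrative assumptions, and the large-sample normal approximation is used only for brevity (a t-based interval would be the better choice for small samples).

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def summarize_difference(a, b, level=0.95):
    """Summarize the difference in means of two samples.

    Returns the point estimate, a confidence interval, and a two-sided
    p-value. Uses a large-sample normal approximation (an assumption
    made here for brevity, not a recommendation).
    """
    diff = mean(a) - mean(b)
    # Standard error of the difference in means (unequal variances).
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = diff / se
    # Two-sided p-value, reported as a continuous quantity.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    half_width = NormalDist().inv_cdf(0.5 + level / 2) * se
    return {
        "diff": diff,
        "ci": (diff - half_width, diff + half_width),
        "p_value": p_value,
    }


# Hypothetical, simulated data; not taken from any real study.
random.seed(0)
treated = [random.gauss(0.3, 1.0) for _ in range(200)]
control = [random.gauss(0.0, 1.0) for _ in range(200)]

result = summarize_difference(treated, control)
print(f"difference in means: {result['diff']:.3f}")
print(f"95% CI: ({result['ci'][0]:.3f}, {result['ci'][1]:.3f})")
print(f"p-value: {result['p_value']:.4f}")
```

Note that nothing in the output is compared against .05 or .005. The reader sees the size of the effect, its uncertainty, and the p-value, and can decide for themselves how they feel.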


Reading the paper, which I had not done before writing the paragraph above (there's largely no need, as the wonderfully concise abstract says it all), I see that it anticipates my objection at the end of a section entitled "potential objections":
"Changing the significance threshold is a distraction from the real solution, which is to replace null hypothesis significance testing (and bright-line thresholds) with more focus on effect sizes and confidence intervals, treating the P-value as a continuous measure, and/or a Bayesian method."
Hear, hear! Marvelously well put.

The paper offers only a feeble refutation of that "potential" objection:
"Many of us agree that there are better approaches to statistical analyses than null hypothesis significance testing, but as yet there is no consensus regarding the appropriate choice of replacement. ... Even after the significance threshold is changed, many of us will continue to advocate for alternatives to null hypothesis significance testing."
I'm all for advocating alternatives to significance testing.  That's important and helpful.  As for continuing to promulgate significance testing with magic significance thresholds, whether .05 or .005, well, you can decide for yourself.

Redefine Statistical Significance
By: Daniel Benjamin ; James Berger ; Magnus Johannesson ; Brian Nosek ; E. Wagenmakers ; Richard Berk ; Kenneth Bollen ; Bjorn Brembs ; Lawrence Brown ; Colin Camerer ; David Cesarini ; Christopher Chambers ; Merlise Clyde ; Thomas Cook ; Paul De Boeck ; Zoltan Dienes ; Anna Dreber ; Kenny Easwaran ; Charles Efferson ; Ernst Fehr ; Fiona Fidler ; Andy Field ; Malcom Forster ; Edward George ; Tarun Ramadorai ; Richard Gonzalez ; Steven Goodman ; Edwin Green ; Donald Green ; Anthony Greenwald ; Jarrod Hadfield ; Larry Hedges ; Leonhard Held ; Teck Hau Ho ; Herbert Hoijtink ; James Jones ; Daniel Hruschka ; Kosuke Imai ; Guido Imbens ; John Ioannidis ; Minjeong Jeon ; Michael Kirchler ; David Laibson ; John List ; Roderick Little ; Arthur Lupia ; Edouard Machery ; Scott Maxwell ; Michael McCarthy ; Don Moore ; Stephen Morgan ; Marcus Munafo ; Shinichi Nakagawa ; Brendan Nyhan ; Timothy Parker ; Luis Pericchi ; Marco Perugini ; Jeff Rouder ; Judith Rousseau ; Victoria Savalei ; Felix Schonbrodt ; Thomas Sellke ; Betsy Sinclair ; Dustin Tingley ; Trisha Zandt ; Simine Vazire ; Duncan Watts ; Christopher Winship ; Robert Wolpert ; Yu Xie ; Cristobal Young ; Jonathan Zinman ; Valen Johnson

We propose to change the default P-value threshold for statistical significance for claims of new discoveries from 0.05 to 0.005.