The ASA recently released a statement on statistical significance and p-values. Here are the summary and the full text (the links point to my personal Dropbox and can be viewed without logging in):

Short summary: http://tinyurl.com/zswny43
The ASA's statement on p-values: context, process, and purpose: http://tinyurl.com/zw5yyum

The most important part is probably the following six principles:

1. P-values can indicate how incompatible the data are with a specified statistical model.
2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
4. Proper inference requires full reporting and transparency.
5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

A few thoughts of my own:

Take the first principle: "P-values can indicate how incompatible the data are with a specified statistical model." In other words, a p-value measures how far the data depart from the statistical distribution specified by the null hypothesis. To me this reads as a criticism of tests for whether data follow a particular distribution, which are clearly being misused. (The same issue comes up on page 9 of the full text, under the second principle: "Researchers often wish to turn a p-value into a statement about the truth of a null hypothesis, or about the probability that random chance produced the observed data. The p-value is neither.")

The most common case is the normality test: someone runs one, gets a p-value > 0.1, and declares that the data come from a normal distribution. The null hypothesis there is that the data are normal. By the first principle, what the test measures is how incompatible the data are with normality, not whether they are normal, so the conclusion has to be worded very carefully. I came across a paper title that I found interesting and want to share: Absence of evidence is not evidence of absence.

This is one of the key points of the ASA statement: a lack of evidence against the null hypothesis does not make the null hypothesis true. As with the normality test, when the p-value > 0.1 the conclusion is that you have no evidence that the data are non-normal, which is not the same as saying the data are normal.
(Absence of evidence: no evidence of non-normality)
(Evidence of absence: evidence of normality)
(The explanation of the second principle also mentions this: "It is a statement about data in relation to a specified hypothetical explanation, and is not a statement about the explanation itself.")

The fifth principle is important too: "A p-value, or statistical significance, does not measure the size of an effect or the importance of a result." P-values cannot be used to compare degrees of importance; a smaller p-value does not mean a more important result. The ASA lists other approaches that address what the fifth principle says p-values cannot, such as confidence, credibility, or prediction intervals; Bayesian methods; and alternative measures of evidence such as likelihood ratios or Bayes Factors. The full passage reads:

"In view of the prevalent misuses of and misconceptions concerning p-values, some statisticians prefer to supplement or even replace p-values with other approaches. These include methods that emphasize estimation over testing, such as confidence, credibility, or prediction intervals; Bayesian methods; alternative measures of evidence, such as likelihood ratios or Bayes Factors; and other approaches such as decision-theoretic modeling and false discovery rates. All these measures and approaches rely on further assumptions, but they may more directly address the size of an effect (and its associated uncertainty) or whether the hypothesis is correct."

Does anyone have thoughts on this ASA statement?

A blog post I came across on the morning of 3/11 that explains some of the value of p-values: http://tinyurl.com/jebjua6

--
※ Posted from: 批踢踢實業坊 (ptt.cc), from: 180.218.152.118
※ Article URL: https://www.ptt.cc/bbs/Statistics/M.1457625431.A.E49.html
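allen1985: I think this is a fairly "balanced" piece, worth a read 03/11 03:21
allen1985: There are a lot of anti-p-value people lately, but some of the criticism goes too far. After all, 03/11 03:22
Over the past few days I saw an R-bloggers post titled "ASA says NO to p-values"... That is really over the top ~"~. I lean toward reading the ASA statement as explaining the value of p-values and correcting misconceptions.
allen1985: statistics still needs some way to reach a conclusion 03/11 03:22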
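allen1985: A question I've always wanted to ask: do p-value = 0.8 and p-value = 0.6 03/11 03:23
allen1985: differ in any way, and do p-value = 0.01 and p-value = 0.00001 03/11 03:24
allen1985: differ in any way? 03/11 03:24
Comparing p-values is meaningless in itself, so asking whether they differ is even more beside the point.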
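andrew43: Replying to the comment above: I don't think this kind of comparison means much on its own; you still need to look at 03/11 07:27
andrew43: other measures, such as effect size or Bayes factor. 03/11 07:28
andrew43: Otherwise you could say they differ, but when drawing a conclusion you could just as well say they don't. 03/11 07:31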
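To make the effect-size point concrete, here is a minimal sketch using only the Python standard library. It is not from the ASA statement. It assumes a two-sample z-test with a known, equal standard deviation of 1 and equal group sizes; the helper name `two_sample_z_pvalue` and all the numbers are hypothetical, picked only for illustration.

```python
import math
from statistics import NormalDist


def two_sample_z_pvalue(mean_diff, n_per_group, sd=1.0):
    """Two-sided p-value for a two-sample z-test.

    Assumes a known, equal standard deviation `sd` in both groups and
    equal group sizes (simplifying assumptions for this sketch).
    """
    se = sd * math.sqrt(2.0 / n_per_group)  # standard error of the mean difference
    z = mean_diff / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical scenarios, chosen only for illustration.
# With sd = 1 the mean difference is also the standardized effect size.
p_tiny_effect = two_sample_z_pvalue(mean_diff=0.01, n_per_group=200_000)
p_large_effect = two_sample_z_pvalue(mean_diff=0.8, n_per_group=10)

print(f"tiny effect, huge n:   p = {p_tiny_effect:.4f}")
print(f"large effect, small n: p = {p_large_effect:.4f}")
```

Under these assumptions the tiny effect gets the smaller p-value purely because of its sample size, so ranking results by p-value says nothing about which effect is larger.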
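I'm actually rather curious about the likelihood ratios mentioned in the statement. Is a likelihood ratio better suited as a measure of evidence because it is the ratio of the likelihoods under two hypotheses? That is unlike the usual hypothesis test, where the null and the alternative are complements of each other.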
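For reference, a minimal sketch of what a likelihood ratio between two simple hypotheses looks like, using only the Python standard library. It does not answer the question above; it only shows the quantity being discussed. The normal model with known standard deviation 1, the two candidate means, the data values, and the function names are all assumptions made up for illustration.

```python
import math
from statistics import NormalDist


def log_likelihood(data, mu, sd=1.0):
    """Log-likelihood of i.i.d. normal data with mean `mu` and known `sd`."""
    dist = NormalDist(mu, sd)
    return sum(math.log(dist.pdf(x)) for x in data)


def log_likelihood_ratio(data, mu0, mu1, sd=1.0):
    """log[L(mu1) / L(mu0)]: positive values favor mu1, negative favor mu0."""
    return log_likelihood(data, mu1, sd) - log_likelihood(data, mu0, sd)


# Hypothetical observations (sample mean is 0.5), for illustration only.
data = [0.1, 0.9, 0.4, 0.6, 0.5, 0.3, 0.7, 0.2, 0.8, 0.5]

log_lr = log_likelihood_ratio(data, mu0=0.0, mu1=0.5)
print(f"log LR (mu1 vs mu0) = {log_lr:.3f}, LR = {math.exp(log_lr):.2f}")
```

Both hypotheses are spelled out explicitly and treated the same way here: swapping `mu0` and `mu1` only flips the sign of the log ratio.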
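allen1985: My main point is that most articles now hold that non-significant simply means 03/11 08:11
allen1985: non-significant, and two non-significant p-values can't be compared directly, yet 03/11 08:11
allen1985: quite a few people still compare them 03/11 08:11
allen1985: I absolutely agree that many things can't be judged by the p-value alone; you need other measures and plots 03/11 08:12
allen1985: before you can draw a conclusion 03/11 08:12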
This is exactly the third principle: p-values should not be used for all-or-nothing inference, because a p-value alone simply cannot do that job.

Excerpt from the third principle: "Pragmatic considerations often require binary, “yes-no” decisions, but this does not mean that p-values alone can ensure that a decision is correct or incorrect. The widespread use of “statistical significance” (generally interpreted as “p <= 0.05”) as a license for making a claim of a scientific finding (or implied truth) leads to considerable distortion of the scientific process."
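KirinGuess: So the OP disagrees with the statement's first point? 03/12 19:22
KirinGuess: And thinks the first point conflicts with the rest of the statement? 03/12 19:22
I think the first point is very well put XD. The normality test is a common example of the misuse.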
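A minimal sketch of the normality-test misuse, using only the Python standard library. It swaps in a hand-written Jarque-Bera test with its asymptotic chi-square(2) p-value, which is only a rough approximation for small samples and is not necessarily the test anyone in this thread had in mind. The sample sizes, the uniform distribution, and the function names are assumptions chosen for illustration.

```python
import math
import random


def jarque_bera_pvalue(xs):
    """Asymptotic p-value of the Jarque-Bera normality test.

    Uses the chi-square(2) survival function exp(-x / 2); this
    approximation is rough for small samples.
    """
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    return math.exp(-jb / 2.0)


def share_not_rejected(n, trials=1000, alpha=0.1, seed=0):
    """Fraction of uniform(0, 1) samples of size n with p-value > alpha."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]  # clearly NOT normal data
        if jarque_bera_pvalue(sample) > alpha:
            kept += 1
    return kept / trials


print("share with p > 0.1, n = 20:  ", share_not_rejected(20))
print("share with p > 0.1, n = 2000:", share_not_rejected(2000, trials=50))
```

With small samples most runs should end with p > 0.1 even though the data are uniform, not normal. A large p-value here only means there is too little evidence against normality, which is the "absence of evidence" case from the post.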
milk0925: This post helped me a lot, thanks for sharing! 03/12 22:18
You're welcome
※ Edited: celestialgod (180.218.152.118), 03/12/2016 22:31:09
※ Edited: celestialgod (140.109.74.87), 04/19/2016 15:23:49