[問題] 使用R輔助基礎機率論教學

作者anovachen (　)

看板R_Language

標題[問題] 使用R輔助基礎機率論教學

時間Sat Dec 14 22:12:27 2013

[問題類型]: 程式諮詢(我想用R 做某件事情，但是我不知道要怎麼用R 寫出來) [軟體熟悉度]: 入門(寫過其他程式，只是對語法不熟悉) [問題敘述]: 弱大數法則我還勉強能用丟銅板來"實驗"給學生看... 但是CLT要怎麼做呢= = 所以想用R來做電腦輔助教學... 而且我想示範： 1.不是n>=30就叫做大樣本。所以除了常態分配和長得比較對稱的均勻分配以外，其他都選取很極端的機率分佈，結果n=30之下作Jarque Bera test都拒絕虛無假設。 2.期望值和變異數存在且有限才可適用CLT。所以我故意選了Cauchy分配來示範。 (抽自柯西分配母體的樣本其平均仍然是柯西分配) 不過，我最後用ks.test的結果怪怪的，是不是參數沒設定好？(難道沒辦法讓他自己抓常態分配的參數嗎= =) (用SAS和SPSS都不用調參數....是不是用R作KS檢定一定要指定參數?) [程式範例]: #請記得先安裝TSA package library(TSA) #X~N(0,1) set.seed(1) par(mfrow=c(1,2)) popnorm<-rnorm(n=100000,mean=0, sd=1) mean(popnorm) var(popnorm) plot(density(popnorm)) meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=30,replace=TRUE)) } mean(meannorm) var(meannorm) plot(density(meannorm)) ks.test(meannorm,"pnorm") jarque.bera.test(meannorm) 結果： > ks.test(meannorm,"pnorm") One-sample Kolmogorov-Smirnov test data: meannorm D = 0.3406, p-value < 2.2e-16 alternative hypothesis: two-sided > jarque.bera.test(meannorm) Jarque Bera Test data: meannorm X-squared = 4.9045, df = 2, p-value = 0.0861 最後改用這個： mean<-mean(meannorm) sd<-sqrt(var(meannorm)) ks.test(meannorm,"pnorm",mean,sd) 結果： One-sample Kolmogorov-Smirnov test data: meannorm D = 0.029, p-value = 0.3692 alternative hypothesis: two-sided 沒有違背預期的結果。 [關鍵字]: 電腦輔助教學, 機率, 中央極限定理附註：以下試驗其他機率分佈 http://ideone.com/UkhaJO 為節省版面，貼到網路上供參考。 -- ※ 發信站: 批踢踢實業坊(ptt.cc) ◆ From: 111.255.26.132

→ clickhere:忘了 sqrt(30) ? 12/15 08:05

→ clickhere:再把 CLT 看一遍. 12/15 08:07

→ andrew43:ks.test()若要檢驗分配是要指定參數沒錯。 12/15 11:50

→ andrew43:至於其它軟體不用指定參數，我猜是自動用樣本估的。 12/15 11:50

→ clickhere:no. sqrt(n) 才會給你正確的 CLT. n=30是猜不到的. 12/15 12:02

mean<-mean(meannorm) sd<-sqrt(var(meannorm)) ↑這裡的SD=sqrt(樣本變異數)=sqrt(母體變異數/30)=母體標準差/sqrt(30) ks.test(meannorm,"pnorm",mean,sd) 結果： One-sample Kolmogorov-Smirnov test data: meannorm D = 0.029, p-value = 0.3692 alternative hypothesis: two-sided 這個部份的sd應該不用再除sqrt(30) 因為樣本的變異數已經是母體變異數除以30了。 > mean(popnorm) [1] -0.002244083 > var(popnorm) [1] 1.007059 > mean(meannorm) [1] 0.002210398 > var(meannorm) [1] 0.03560481 其中抽出樣本的變異數約為母體變異數/30 > var(popnorm)/30 [1] 0.03356863 所以使用CLT的時候，標準差不用再除以sqrt(30)囉... 主要的問題是，不曉得有沒有用KS檢定卻不需指定參數的方法？類似SAS和SPSS的那種... 雖然用JB檢定是滿方便的，可是初等統計教科書大多沒寫這方法... 另外，抽樣的時候設定replace=TRUE應該沒問題吧？我想說這樣比較能確保每個樣本都iid於母體分配。 --- 程式又大幅度改寫： #X~N(0,1) set.seed(1) par(mfrow=c(1,2)) popnorm<-rnorm(n=100000,mean=0, sd=1) meanpopnorm<-mean(popnorm) varpopnorm<-var(popnorm) print(paste("The mean of the population is",round(meanpopnorm,5))) print(paste("The variance of the population is", round(varpopnorm,5))) plot(density(popnorm)) cltnorm<-function(n){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } print("----------------------------------------------------------") print(paste("The mean of samples is",round(mean(meannorm),5))) print(paste(", and the variance of samples is",round(var(meannorm),5))) print(paste("According to CLT, the theoretical mean is", round(meanpopnorm,5))) print(paste(", and the theoretical variance of samples is",round(varpopnorm/n,5))) print("----------------------------------------------------------") plot(density(meannorm)) jarque.bera.test(meannorm) } 接下來重覆五次試驗(n=30)： > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 0.00355" [1] ", and the variance of samples is 0.03627" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 4.4989, df = 2, p-value = 0.1055 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is -0.00778" [1] ", and the variance of samples is 0.03286" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 0.4475, df = 2, p-value = 0.7995 > > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 0.00405" [1] ", and the variance of samples is 0.03348" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 3.2785, df = 2, p-value = 0.1941 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is -0.0048" [1] ", and the variance of samples is 0.03026" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 3.2825, df = 2, p-value = 0.1937 > cltnorm(30) [1] "----------------------------------------------------------" [1] "The mean of samples is 5e-05" [1] ", and the variance of samples is 0.03214" [1] "According to CLT, the theoretical mean is -0.00224" [1] ", and the theoretical variance of samples is 0.03357" [1] "----------------------------------------------------------" Jarque Bera Test data: meannorm X-squared = 10.1117, df = 2, p-value = 0.006372 最後一次試驗錯誤的拒絕虛無假設... 我該怎麼解釋呢= =" --- 剛這樣寫： testing<-function(n, m){ set.seed(1) popnorm<-rnorm(n=100000,mean=0, sd=1) error<-array(0,dim=c(m)) for (i in 1:m){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } test<-jarque.bera.test(meannorm) print(test) p<-as.numeric(test$p.value) print(p) error[i]<-ifelse(p<0.05,1,0) print(error[i]) } return(list(matrix=error)) } 使用testing(30,50)測試，可是傳回的矩陣很怪... 變成有1000個元素的一維陣列，而且第五筆資料應該要存入1，卻反而是0。 ※ 編輯: anovachen 來自: 111.255.24.52 (12/15 23:48)

→ andrew43:可再抽出沒錯，不過你有大母體了也沒什麼影響。 12/16 08:49

set.seed(1) popnorm<-rnorm(n=100000,mean=0, sd=1) error<-array(0,dim=c(30)) testing<-function(n){ meannorm<-array(0,dim=c(1000)) for (i in 1:1000) { meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE)) } test<-jarque.bera.test(meannorm) print(test) p<-as.numeric(test$p.value) print(p) return(ifelse(p<0.05,1,0)) } for (i in 1:30){ error[i]<-testing(30) } 這樣寫就OK了。不曉得上面寫法問題出在哪?? >"< ---- 以下是某不願具名者提供的程式碼： n <- 30 ret <- NULL for(i in 1:100){ x <- rnorm(n) x.hat <- mean(x) * sqrt(n) # CLT stat. goes to N(0, 1) in d. ret[i] <- x.hat } ks.test(ret, "pnorm") # Test 100 x.hat's against N(0, 1) 他指出我原本的寫法是類似bootstrap，在有限的樣本(N=100000)自助重抽樣，以模擬從真正的母體裡抽樣的過程。而他的寫法則比較貼近我想表達的。依據他的寫法，我們可以如此改寫： n <- 100 ret <- NULL for(i in 1:100){ x <- runif(n,min=0,max=1) x.hat <- mean(x) ret[i] <- x.hat } hist(ret) ks.test(ret, "pnorm", mean=1/2, sd = sqrt(1/12/n)) 如此可驗證從U(0,1)中抽樣，n=100，則X-bar的分布是否會漸近N(1/2, 1/1200)。 ※ 編輯: anovachen 來自: 111.255.13.49 (12/19 00:47)

推 adgjlsfhk123:http://ccckmit.wikidot.com/st:centraltheorem 12/22 03:35

→ adgjlsfhk123:希望對你有幫助 12/22 03:36

→ anovachen:謝謝! 12/29 16:25