→ clickhere:忘了 sqrt(30) ? 12/15 08:05
→ clickhere:再把 CLT 看一遍. 12/15 08:07
→ andrew43:ks.test()若要檢驗分配是要指定參數沒錯。 12/15 11:50
→ andrew43:至於其它軟體不用指定參數,我猜是自動用樣本估的。 12/15 11:50
→ clickhere:no. sqrt(n) 才會給你正確的 CLT. n=30是猜不到的. 12/15 12:02
mean<-mean(meannorm)
sd<-sqrt(var(meannorm))
↑這裡的SD=sqrt(樣本變異數)=sqrt(母體變異數/30)=母體標準差/sqrt(30)
ks.test(meannorm,"pnorm",mean,sd)
結果:
One-sample Kolmogorov-Smirnov test
data: meannorm
D = 0.029, p-value = 0.3692
alternative hypothesis: two-sided
這個部份的sd應該不用再除sqrt(30)
因為樣本的變異數已經是母體變異數除以30了。
> mean(popnorm)
[1] -0.002244083
> var(popnorm)
[1] 1.007059
> mean(meannorm)
[1] 0.002210398
> var(meannorm)
[1] 0.03560481
其中抽出樣本的變異數約為母體變異數/30
> var(popnorm)/30
[1] 0.03356863
所以使用CLT的時候,標準差不用再除以sqrt(30)囉...
主要的問題是,不曉得有沒有用KS檢定卻不需指定參數的方法?
類似SAS和SPSS的那種...
雖然用JB檢定是滿方便的,可是初等統計教科書大多沒寫這方法...
另外,抽樣的時候設定replace=TRUE應該沒問題吧?
我想說這樣比較能確保每個樣本都iid於母體分配。
---
程式又大幅度改寫:
#X~N(0,1)
set.seed(1)
par(mfrow=c(1,2))
popnorm<-rnorm(n=100000,mean=0, sd=1)
meanpopnorm<-mean(popnorm)
varpopnorm<-var(popnorm)
print(paste("The mean of the population is",round(meanpopnorm,5)))
print(paste("The variance of the population is", round(varpopnorm,5)))
plot(density(popnorm))
cltnorm<-function(n){
meannorm<-array(0,dim=c(1000))
for (i in 1:1000) {
meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE))
}
print("----------------------------------------------------------")
print(paste("The mean of samples is",round(mean(meannorm),5)))
print(paste(", and the variance of samples is",round(var(meannorm),5)))
print(paste("According to CLT, the theoretical mean is",
round(meanpopnorm,5)))
print(paste(", and the theoretical variance of samples
is",round(varpopnorm/n,5)))
print("----------------------------------------------------------")
plot(density(meannorm))
jarque.bera.test(meannorm)
}
接下來重覆五次試驗(n=30):
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is 0.00355"
[1] ", and the variance of samples is 0.03627"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 4.4989, df = 2, p-value = 0.1055
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is -0.00778"
[1] ", and the variance of samples is 0.03286"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 0.4475, df = 2, p-value = 0.7995
>
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is 0.00405"
[1] ", and the variance of samples is 0.03348"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 3.2785, df = 2, p-value = 0.1941
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is -0.0048"
[1] ", and the variance of samples is 0.03026"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 3.2825, df = 2, p-value = 0.1937
> cltnorm(30)
[1] "----------------------------------------------------------"
[1] "The mean of samples is 5e-05"
[1] ", and the variance of samples is 0.03214"
[1] "According to CLT, the theoretical mean is -0.00224"
[1] ", and the theoretical variance of samples is 0.03357"
[1] "----------------------------------------------------------"
Jarque Bera Test
data: meannorm
X-squared = 10.1117, df = 2, p-value = 0.006372
最後一次試驗錯誤的拒絕虛無假設...
我該怎麼解釋呢= ="
---
剛這樣寫:
testing<-function(n, m){
set.seed(1)
popnorm<-rnorm(n=100000,mean=0, sd=1)
error<-array(0,dim=c(m))
for (i in 1:m){
meannorm<-array(0,dim=c(1000))
for (i in 1:1000) {
meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE))
}
test<-jarque.bera.test(meannorm)
print(test)
p<-as.numeric(test$p.value)
print(p)
error[i]<-ifelse(p<0.05,1,0)
print(error[i])
}
return(list(matrix=error))
}
使用testing(30,50)測試,
可是傳回的矩陣很怪...
變成有1000個元素的一維陣列,
而且第五筆資料應該要存入1,卻反而是0。
※ 編輯: anovachen 來自: 111.255.24.52 (12/15 23:48)
→ andrew43:可再抽出沒錯,不過你有大母體了也沒什麼影響。 12/16 08:49
set.seed(1)
popnorm<-rnorm(n=100000,mean=0, sd=1)
error<-array(0,dim=c(30))
testing<-function(n){
meannorm<-array(0,dim=c(1000))
for (i in 1:1000) {
meannorm[i]=mean(sample(popnorm,size=n,replace=TRUE))
}
test<-jarque.bera.test(meannorm)
print(test)
p<-as.numeric(test$p.value)
print(p)
return(ifelse(p<0.05,1,0))
}
for (i in 1:30){
error[i]<-testing(30)
}
這樣寫就OK了。
不曉得上面寫法問題出在哪?? >"<
----
以下是某不願具名者提供的程式碼:
n <- 30
ret <- NULL
for(i in 1:100){
x <- rnorm(n)
x.hat <- mean(x) * sqrt(n) # CLT stat. goes to N(0, 1) in d.
ret[i] <- x.hat
}
ks.test(ret, "pnorm")
# Test 100 x.hat's against N(0, 1)
他指出我原本的寫法是類似bootstrap,在有限的樣本(N=100000)自助重抽樣,
以模擬從真正的母體裡抽樣的過程。
而他的寫法則比較貼近我想表達的。
依據他的寫法,我們可以如此改寫:
n <- 100
ret <- NULL
for(i in 1:100){
x <- runif(n,min=0,max=1)
x.hat <- mean(x)
ret[i] <- x.hat
}
hist(ret)
ks.test(ret, "pnorm", mean=1/2, sd = sqrt(1/12/n))
如此可驗證從U(0,1)中抽樣,n=100,則X-bar的分布
是否會漸近N(1/2, 1/1200)。
※ 編輯: anovachen 來自: 111.255.13.49 (12/19 00:47)
→ adgjlsfhk123:希望對你有幫助 12/22 03:36
→ anovachen:謝謝! 12/29 16:25