Re: [Discussion] clustersample

Author: punchdrunk (小學生都比我有錢)

Board: NCCU08_SOCIO

Title: Re: [Discussion] clustersample

Time: Tue Oct 14 21:07:15 2008

※ Quoting 《thorpan3 (小白)》:
: http://support.sas.com/kb/24/555.html
: Same household
: Sample everyone in it
: If a listing of the entire target population is available and you want to
: carry out a cluster sample, then here is how it can be done using
: PROC SURVEYSELECT. The steps are to identify the individual clusters, select
: a random sample of clusters, and then collect all the original observations
: from each sampled cluster.

    proc sort data=all; by family; run;

First sort the full data set (all = 220,960 people) by the variable family,
then follow the web page step by step.

1. Define the clusters:

    proc freq data=all noprint;
      tables family / out=familylist(drop=count percent);
    run;

(The key seems to be how tables is used.)

2. Draw 1500 clusters:

    proc surveyselect data=familylist out=familysample
                      method=srs n=1500 noprint;
    run;

3. Merge the original observations, cluster by cluster, into a new data set (qqq):

    data qqq;
      merge familysample(in=sample) all(in=all);
      by family;
      if sample and all;
    run;

(The notable part is the use of in=; the names sample and all are ones you pick yourself.)

That should be enough to do the cluster sampling, right?!
There is a lot here I only half understand, and I don't know whether it contains mistakes.
If some expert could explain what in= and tables actually do, that would be great!!!

--
※ Origin: 批踢踢實業坊(ptt.cc) ◆ From: 123.193.145.57
※ Edited: punchdrunk from: 123.193.145.57 (10/14 21:09)

→ punchdrunk: tables seems to remove the duplicates; the stratified sampling problem can probably 10/14 21:19

→ punchdrunk: be done the same way 10/14 21:20

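→ diwawa: I don't really understand how tables is used here, but in seems to act as a kind of 10/14 22:10

→ diwawa: condition declaration, and the BY on the next line means the variable for that condition is temporarily turned into 10/14 22:12

→ diwawa: sample & all, and when both have the same family the row is kept~~ 10/14 22:13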

→ punchdrunk: I see, thank you!!! 10/15 10:06

Push kenshin528: that's not what tables means 10/15 17:15
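
Since the thread leaves the question about tables and in= open, here is a small sketch that makes both visible on toy data. Everything in it is made up for illustration: the five-person data set, the flag name inall, the copies from_sample and from_all, and the hand-picked "sample" of families 1 and 3 that stands in for the proc surveyselect draw. It is one way to inspect the behavior, not the method from the SAS note.

```sas
/* Toy data (made up): 5 people in 3 families, already sorted by family */
data all;
  input family person $;
  datalines;
1 a
1 b
2 c
3 d
3 e
;
run;

/* tables family / out= writes a frequency table to a data set:
   one row per distinct family value, plus COUNT and PERCENT columns.
   COUNT and PERCENT are kept here so they can be inspected; the post
   drops them, which leaves only the list of cluster IDs. */
proc freq data=all noprint;
  tables family / out=familylist;
run;

proc print data=familylist; run;

/* Stand-in for the surveyselect step: pretend families 1 and 3 were drawn */
data familysample;
  set familylist(drop=count percent);
  if family in (1, 3);
run;

/* in= names a temporary 0/1 flag for each input data set. The flag is 1
   when that data set contributed to the current BY group. The flags are
   not written to the output, so they are copied into ordinary variables
   here to make them visible. */
data check;
  merge familysample(in=sample) all(in=inall);
  by family;
  from_sample = sample;
  from_all    = inall;
run;

/* The family 2 row shows from_sample=0 and from_all=1; the rows for
   families 1 and 3 show 1 in both. The post's "if sample and all;"
   keeps only rows where both flags are 1. */
proc print data=check; run;
```

So tables is not deleting anything from all: it builds a separate table of counts, and dropping count and percent is what turns that table into a plain list of families. One practical note: adding a seed= value to proc surveyselect makes the draw of clusters reproducible from run to run.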