In Billion-scale semi-supervised learning for image classification (Facebook AI Research), the paper gives the following reason for not training the student model on D and D-hat combined:

Remark: It is possible to use a mixture of data in D and D̂ for training like in previous approaches [34]. However, this requires searching for optimal mixing parameters, which depend on other parameters. This is resource-intensive in the case of our large-scale training. Additionally, as shown later in our analysis, taking full advantage of large-scale unlabelled data requires adopting long pre-training schedules, which adds some complexity when mixing is involved.

I'm not sure what the first reason, "searching for mixing parameters", refers to. And for the second reason: isn't D + D-hat already prepared before training the student model? Why would mixing add complexity? Thanks, everyone.

--
※ Posted from: 批踢踢實業坊 (ptt.cc), from: 140.115.59.247 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/DataScience/M.1611575265.A.D9C.html
yiefaung: It should refer to the sampling ratio between D and D-hat; the paper they reference fixes it at 6:4 01/26 20:21
yiefaung: They didn't want to tune that hyperparameter 01/26 20:21
yiefaung: As for the second point, training on everything combined takes a very long time, so they simply skip it 01/26 20:24
wang19980531: Got it, thanks~ 01/26 21:20
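For reference, the "mixing parameters" in the quoted remark are typically the per-batch sampling ratio between the labelled set D and the pseudo-labelled set D̂ (and possibly a loss weight between the two). Below is a minimal PyTorch-style sketch of such fixed-ratio mixing, assuming the 6:4 split mentioned in the comments; the toy datasets and the ratio are illustrative, not taken from the paper.

```python
# A minimal sketch of mixing D (labelled) with D-hat (pseudo-labelled) at a fixed
# per-batch ratio. Illustrative only, not the paper's actual pipeline.
import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset, WeightedRandomSampler

# Toy stand-ins: D is the labelled set, D_hat is the teacher's pseudo-labelled set.
D = TensorDataset(torch.randn(1000, 128), torch.randint(0, 10, (1000,)))
D_hat = TensorDataset(torch.randn(5000, 128), torch.randint(0, 10, (5000,)))

mix_ratio = 0.6  # fraction of each batch drawn from D; this is the "mixing parameter" (6:4 assumed)

# Per-sample weights so that, in expectation, 60% of drawn samples come from D
# and 40% from D_hat, independent of the raw dataset sizes.
weights = torch.cat([
    torch.full((len(D),), mix_ratio / len(D)),
    torch.full((len(D_hat),), (1.0 - mix_ratio) / len(D_hat)),
])

sampler = WeightedRandomSampler(weights, num_samples=len(D) + len(D_hat), replacement=True)
loader = DataLoader(ConcatDataset([D, D_hat]), batch_size=64, sampler=sampler)

for features, labels in loader:
    # ... student forward/backward step would go here ...
    break
```

Searching over mix_ratio (and how it interacts with learning rate and schedule length) is presumably the extra hyperparameter search the remark refers to; pre-training the student only on D̂ and then fine-tuning on D, as the paper does, avoids that search entirely.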