Author: wang19980531 (中立評論員)
Board: DataScience
Title: [Discussion] Teacher-Student Model Semi-supervised Learning
Time: Mon Jan 25 19:47:43 2021
In "Billion-scale semi-supervised learning for image classification" (Facebook AI Research), the authors explain why the student model is not trained on D and D-hat mixed together:
Remark: It is possible to use a mixture of data in D and D̂ for training like in previous approaches [34]. However, this requires searching for optimal mixing parameters, which depend on other parameters. This is resource-intensive in the case of our large-scale training. Additionally, as shown later in our analysis, taking full advantage of large-scale unlabelled data requires adopting long pre-training schedules, which adds some complexity when mixing is involved.
I'm not sure what the first reason, "searching for optimal mixing parameters", refers to.
As for the second reason: isn't D + D-hat already prepared before the student model is trained? Why would mixing add complexity?
Thanks, everyone.
--
※ Posted from: PTT (ptt.cc), IP: 140.115.59.247 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/DataScience/M.1611575265.A.D9C.html
推 yiefaung: It probably means the sampling ratio between D and D-hat; the work they reference fixes it at 6:4, 01/26 20:21
→ yiefaung: and they didn't want to bother tuning that. 01/26 20:21
→ yiefaung: For the second one: training on all of it takes a very long time, so they just skip mixing altogether. 01/26 20:24
→ wang19980531: Got it, thanks! 01/26 21:20
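
To make the "mixing parameters" in the replies concrete, here is a minimal sketch (my own illustration, not code from the paper; the names mixed_batch and mix_ratio are hypothetical): each minibatch draws a fixed fraction of samples from the labeled set D and the rest from the pseudo-labeled set D-hat, and that fraction is exactly the kind of hyperparameter the quoted remark says would otherwise have to be searched for.

    import random

    def mixed_batch(D, D_hat, batch_size=8, mix_ratio=0.6):
        # mix_ratio is the "mixing parameter": the fraction of each batch
        # drawn from the labeled set D (the work referenced in the replies
        # fixes it at 6:4). It interacts with batch size, learning rate,
        # and schedule length, which is why searching for a good value at
        # billion-image scale is resource-intensive.
        n_labeled = round(batch_size * mix_ratio)
        batch = random.sample(D, n_labeled) \
              + random.sample(D_hat, batch_size - n_labeled)
        random.shuffle(batch)
        return batch

    # Toy data: D is small and human-labeled, D-hat is large and
    # teacher-labeled.
    D     = [("img_%d" % i, "true_label")   for i in range(100)]
    D_hat = [("img_%d" % i, "pseudo_label") for i in range(10000)]
    print(mixed_batch(D, D_hat))  # 5 labeled + 3 pseudo-labeled samples

The paper's remark sidesteps this entirely: the student is first pre-trained on D-hat alone with a long schedule and then fine-tuned on D, so no such ratio ever has to be tuned.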