I'm working on A3C deep reinforcement learning. Since I am too lazy to change the last layer of my NN to softmax, I apply a softmax filter so that the linear layer is trained directly against the softmax output. The algorithm works on my test cases for now, but it might go wrong when the reward is on a different scale. Can anyone help me check whether my implementation is correct? https://goo.gl/FV8sFu -- ※ Posted from: 批踢踢實業坊 (ptt.cc), from: 114.35.245.133 ※ Article URL: https://www.ptt.cc/bbs/Prob_Solve/M.1507429910.A.70E.html
longlyeagle: It turns out that the current test case makes 11/05 22:07
longlyeagle: the correct result target 1 after softmax 11/05 22:08
longlyeagle: and the wrong result target 0. 11/05 22:08
longlyeagle: That's why the reward works at its current 11/05 22:09
longlyeagle: scale. 11/05 22:09
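
To illustrate the scale concern raised in this thread, here is a minimal sketch of a softmax applied to the raw outputs of a linear layer. It is not the implementation behind the link above. The names `softmax`, `scaled_policy`, and `linear_out` are hypothetical, and treating the reward scale as a plain multiplier on the values fed to softmax is an assumption made only for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_policy(logits, scale):
    """Softmax of logits multiplied by a scale factor.

    `scale` is a hypothetical stand-in for the effect of a
    differently scaled reward on the linear layer's outputs.
    """
    return softmax([scale * x for x in logits])

# Hypothetical raw outputs of the final linear layer for three actions.
linear_out = [1.0, 0.0, -1.0]

for scale in (0.1, 1.0, 10.0):
    probs = scaled_policy(linear_out, scale)
    print(scale, [round(p, 4) for p in probs])
```

With a large scale the output approaches 1 for the largest entry and 0 for the others, which matches the 1/0 targets described in the comments. With a small scale the output is close to uniform, so a setup that works at one reward scale may behave quite differently at another.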