The problem is as follows:

For a pipelined processor, a multiplication operation has a latency of 3 clock cycles, any other ALU operation has a latency of 2 clock cycles, and any branch or load/store operation has a latency of 1 clock cycle. Let AR0 be an auxiliary register and R0, R1, and R2 be data registers. Consider the following C code:

    for(i=1;i<=256;i++){a+=f(i)*g(i);}

Let the associated assembly code be as follows:

    Loop: LOAD R0, 0(AR0)       ;R0=*(AR0)
          LOAD R1, 1024(AR0)    ;R1=*(AR0+1024)
          MPY  R0, R0, R1       ;R0=R0*R1
          ADD  R2, R2, R0       ;R2=R2+R0
          SUB  AR0, AR0, #1     ;AR0=AR0-1
          JNZ  AR0, Loop        ;Jump to Loop if AR0!=0

The initial conditions of the registers and the data arrangement are set so that they are suitable for executing the corresponding C code.

(1) How are stalls inserted into the above program if no scheduling is performed?
(2) Reschedule the above program so that the least number of clock cycles is required for the job.
(3) Find the number of clock cycles required based on your design in (2).

------------------------------------------------------------------------------------

(1)
    Loop: LOAD R0, 0(AR0)
          LOAD R1, 1024(AR0)
          stall
          stall
          stall
          MPY  R0, R0, R1
          stall
          stall
          stall
          ADD  R2, R2, R0
          SUB  AR0, AR0, #1
          stall
          stall
          stall
          JNZ  AR0, Loop

(2)
    Loop: LOAD R0, 0(AR0)
          LOAD R1, 1024(AR0)
          SUB  AR0, AR0, #1
          stall
          stall
          MPY  R0, R0, R1
          stall
          stall
          stall
          ADD  R2, R2, R0
          JNZ  AR0, Loop

(3) 15*256 clock cycles

The above is the answer I wrote myself. It differs from the answer given in the book, so I would like to ask the experts here to check whether I made a mistake somewhere... Thanks!!!

--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 223.143.226.251
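One way to cross-check stall counts like the ones in (1) and (2) is a small script that walks through one loop iteration and inserts stalls wherever a source register is not yet ready. The sketch below is only an illustration under stated assumptions, not the book's method: it assumes single-issue, in-order execution, and that a result with latency n forces n stall cycles before an immediately following dependent instruction (some textbooks use n-1 instead). It tracks register dependences inside one iteration only and ignores any penalty after the branch. The names `insert_stalls`, `UNSCHEDULED`, and `RESCHEDULED` are made up for this example.

```python
# Latencies from the problem statement, in clock cycles.
LATENCY = {"LOAD": 1, "MPY": 3, "ADD": 2, "SUB": 2, "JNZ": 1}

# Each instruction: (opcode, destination register or None, source registers).
UNSCHEDULED = [
    ("LOAD", "R0", ["AR0"]),
    ("LOAD", "R1", ["AR0"]),
    ("MPY", "R0", ["R0", "R1"]),
    ("ADD", "R2", ["R2", "R0"]),
    ("SUB", "AR0", ["AR0"]),
    ("JNZ", None, ["AR0"]),
]

# Same instructions with SUB moved up, as in answer (2) of the post.
RESCHEDULED = [
    ("LOAD", "R0", ["AR0"]),
    ("LOAD", "R1", ["AR0"]),
    ("SUB", "AR0", ["AR0"]),
    ("MPY", "R0", ["R0", "R1"]),
    ("ADD", "R2", ["R2", "R0"]),
    ("JNZ", None, ["AR0"]),
]


def insert_stalls(program, latency=LATENCY):
    """Return one iteration as a list of opcodes with 'stall' entries added.

    Assumption (not from the book): an instruction issued in cycle t with
    latency n makes its result readable from cycle t + 1 + n onward.
    """
    ready = {}    # register -> first cycle in which its new value is readable
    listing = []  # issue slots for this iteration, in order
    cycle = 0     # next free issue slot
    for op, dest, srcs in program:
        # Wait until the slot is free and every source register is ready.
        issue = max([cycle] + [ready.get(reg, 0) for reg in srcs])
        listing.extend(["stall"] * (issue - cycle))
        listing.append(op)
        if dest is not None:
            ready[dest] = issue + 1 + latency[op]
        cycle = issue + 1
    return listing


if __name__ == "__main__":
    for name, prog in (("unscheduled", UNSCHEDULED), ("rescheduled", RESCHEDULED)):
        slots = insert_stalls(prog)
        print(name, len(slots), "slots:", " ".join(slots))
```

Under these particular assumptions the unscheduled order needs 6 stalls per iteration and the reordered one needs 3. A different reading of "latency", or a branch penalty, changes the numbers, so this is a way to test a hypothesis about the book's convention rather than a definitive answer.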