[Exam] AY97 2nd semester, 楊佳玲, Computer Architecture, Midterm

Author: robertshih (施抄)

Board: NTU-Exam

Title: [Exam] AY97 2nd semester, 楊佳玲, Computer Architecture, Midterm

Time: Sat Apr 18 17:00:32 2009

Course name: Computer Architecture
Course type: Required
Instructor: 楊佳玲
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (Y/M/D): Apr 14 2009
Exam time limit (minutes): 10:20 ~ 13:10
Reward requested: Yes (not granted unless explicitly stated)

Questions:

1.(15 pts) Two important parameters control the performance of a processor:
  cycle time and cycles per instruction (CPI). There is an enduring trade-off
  between these two parameters in the design process of microprocessors.
  While some designers prefer to increase the processor frequency at the
  expense of a larger CPI, other designers follow a different school of
  thought in which reducing the CPI comes at the expense of a lower processor
  frequency. Consider the following machines, and compare their performance
  using the instruction mix in Table 1.

  M1: The multi-cycle implementation shown in Figure 2 with a 1GHz clock.
  M2: A multi-cycle implementation similar to Figure 2, except that register
      updates are done in the same clock cycle as a memory read or ALU
      operation. This machine has a 3.2GHz clock, since the register update
      increases the length of the critical path.
  M3: A machine like M2 except that the effective address calculation is done
      in the same clock cycle as the memory access. This machine has a 2.8GHz
      clock because of the long cycle created by combining address
      calculation and memory access.

  Find out which machine is fastest.

  ┌─────────────┬───────────┐
  │ Class       │ Frequency │
  ├─────────────┼───────────┤
  │ Load        │ 25%       │
  ├─────────────┼───────────┤
  │ Store       │ 13%       │
  ├─────────────┼───────────┤
  │ R-type      │ 47%       │
  ├─────────────┼───────────┤
  │ Branch/jump │ 15%       │
  └─────────────┴───────────┘
            Table 1

2.(5 pts) The designer at Intel claims that multimedia code sequences will
  see a 4.5 times (4.5X) speedup by using the MMX extensions. What fraction
  of the execution time must be multimedia code in order to achieve an
  overall speedup of 2.8X?

3.(10 pts) When designing a computer, it is important to consider its
  applications (or the programs which run on it). The textbook divides
  computing applications into the following three major classes:
  (1) Desktop, (2) Server, and (3) Embedded.
  (a) (5 pts) Which of these three classes accounts for the most processors
      sold today?
  (b) (5 pts) What would you emphasize if you were to design a processor for
      desktop PCs? (Name 3 items)

4.(10 pts) How does the register file size affect CPU performance?

5.(10 pts) What is the advantage of DLLs (Dynamically Linked Libraries) over
  statically linked libraries?

6.(10 pts) In the following C code segment, i and j are integer variables.

      if (i == j) {
          i += j;
          if (i == 0) goto Far;
      }
      j += 1;

  If the variables i and j correspond to the registers $t0 and $t1,
  respectively, the compiled MIPS code would be:

            bne  $t0, $t1, Next
            add  $t0, $t0, $t1
            beq  $t0, $zero, Far
      Next: addi $t1, $t1, 1

  Assume we place the code segment starting at location 256 in memory.
  (a)(5 pts) What is the value of the least significant 16-bit field of the
     bne instruction?
  (b)(5 pts) If an assembler has a problem directly generating the above
     MIPS code segment, what is the problem? Show how the assembler might
     rewrite the above code sequence to solve the problem.

      31     26     21     16                 0
      ┌──────┬──────┬──────┬─────────────────┐
      │  op  │  rs  │  rt  │    immediate    │
      └──────┴──────┴──────┴─────────────────┘
       6 bits 5 bits 5 bits      16 bits
               (bne instruction format)

7.(10 pts) Consider the C code segment:

      while (arr[i] == k)
          i += 1;

  Assume i and k correspond to registers $s3 and $s5, respectively, and the
  base address of the array arr is saved in $s6. One possible assembly
  translation of the above C code segment is:

      Loop: sll  $t1, $s3, 2
            add  $t1, $t1, $s6
            lw   $t0, 0($t1)
            bne  $t0, $s5, Exit
            addi $s3, $s3, 1
            j    Loop
      Exit:

  The above code segment uses both a conditional branch and an unconditional
  jump each time through the loop. Only poor compilers would produce code
  with this much loop overhead. Rewrite the MIPS code so that it uses at most
  one branch or jump each time through the loop. Note that the new code
  segment should be more efficient than the original one.
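As a way to sanity-check the arithmetic behind question 6(a), here is a minimal Python sketch that is not part of the original exam. It assumes the classic MIPS convention that the 16-bit branch field holds a signed word offset measured from the instruction after the branch (PC + 4). The helper names `branch_offset` and `encode_immediate` are hypothetical and exist only for illustration.

```python
def branch_offset(branch_addr: int, target_addr: int) -> int:
    """Signed word offset for a MIPS branch (assumed PC+4-relative)."""
    byte_distance = target_addr - (branch_addr + 4)
    if byte_distance % 4 != 0:
        raise ValueError("branch target must be word aligned")
    offset = byte_distance // 4
    # The immediate field is 16 bits wide, two's complement.
    if not -(1 << 15) <= offset < (1 << 15):
        raise ValueError("target is out of range for a 16-bit offset")
    return offset


def encode_immediate(offset: int) -> int:
    """Two's-complement encoding of the offset in the low 16 bits."""
    return offset & 0xFFFF


# Layout from question 6: four 4-byte instructions starting at address 256.
BASE = 256
addresses = {
    "bne": BASE,       # bne  $t0, $t1, Next
    "add": BASE + 4,   # add  $t0, $t0, $t1
    "beq": BASE + 8,   # beq  $t0, $zero, Far
    "Next": BASE + 12, # addi $t1, $t1, 1
}

bne_field = encode_immediate(branch_offset(addresses["bne"], addresses["Next"]))
print(f"bne immediate field: {bne_field} (0x{bne_field:04X})")
```

The range check in `branch_offset` is where a target that lies too far from the branch would be rejected, since only 16 bits are available for the offset.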
8.(15 pts) Consider the following idea: let's modify the instruction set
  architecture and remove the ability to specify an offset for memory access
  instructions. Specifically, all load and store instructions with nonzero
  offsets would become pseudo-instructions and would be implemented using
  two instructions. For example,

      addi $at, $t1, 104   # add the offset to a temporary
      lw   $t0, $at        # new way of doing lw $t0, 104($t1)

  What changes would you make to the single-cycle datapath and control
  (Figure 1) if this simplified architecture were to be used?

9.(15 pts) We wish to add a new instruction jm (jump memory) to the
  multicycle datapath shown in Figure 2. Its instruction format is similar to
  that of load word, except that the rt field is not used, because the data
  loaded from memory is put in the PC instead of the target register.
  (a)(7 pts) Add any necessary datapaths and control signals to Figure 2.
  (b)(8 pts) Revise the FSM shown in Figure 3 to include the new jm
     instruction.

Figure 1: Single-cycle implementation.
          http://tinyurl.com/clr863 (similar to this)
Figure 2: Multicycle implementation.
          http://tinyurl.com/d2cvc7
Figure 3: Multicycle finite state machine.
          http://tinyurl.com/cqyovf

--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 140.112.30.91