[試題] 99下 蔡欣穆 資料結構與演算法下 期中考+解答

作者rod24574575 (天然呆)

看板NTU-Exam

標題[試題] 99下蔡欣穆資料結構與演算法下期中考+解答

時間Sat Dec 27 12:12:13 2014

課程名稱︰資料結構與演算法下課程性質︰必修課程教師︰蔡欣穆開課學院：電資學院開課系所︰資工系考試日期（年月日）︰2011.4.22 考試時限（分鐘）：180 試題 : Data Structure and Algorithm II, Spring 2011 Midterm Examination 140 points Time: 9:10am-12:10pm (180 minutes), Friday, April 22, 2011 Problem 1. In each of the following question, please specify if the statement is true or false. If the statement is true, explain why it is true. If it is false, explain what the correct answer is and why. (20 points. For each question, 1 point for the true/false answer and 3 points for the explanations.) 1. n·log(n) is polynomially larger than n. 1 + 10^(-100) 2. n is polynomially larger than n·log(n). 3. As long as a problem exhitbits optimal substructure, it can be solved by using a greedy algorithm. 4. The solution of T(n) = 4T(n/2) + n^2 is T(n) = Θ(n·log(n)). 5. As long as a recurrence is of the form T(n) = a·T(n/b) + f(n), it can be solved by using the master method. Sol: 1. False. n·log n is not larger than n by a factor n^ε, for any ε > 0. 10^(-100) 2. True. Let f(n) = n and g(n) = log(n). We can see from log(n) = O(n^ε), for any ε > 0, that f(n) is polynomially larger than g(n). 3. False. Take the 0-1 knapsack problem as a counterexample. 4. False. By master theorem, the solution is Θ((n^2)·log(n)). 5. False. The recurrence of the form T(n) = 2·T(n/2) + n·log(n) can't be solved the master theorem, since n·log(n) is not polynomially larger than n. (However, this can be solved by using the extended version ofthe master theorem.) Problem 2. "Short answer" questions: (25 points) 1. What is the main job of a Quality Assurance (QA) engineer in a software company? (4 points) 2. Give a formal definition of "uniform random permutation". (4 points) 3. Derive an optimal Huffman code for the first n Fibonacci numbers, i.e., {c1: 1, c2: 1, c3: 2, c4: 3, c5: 5, c6: 8, c7: 13, c8: 21, ... cn: F(n)}. In other words, show the general form of the codeword for the i-th character ci (with F(i) as its frequency). (5 points) 4. Give two reasons for implementing the paper prototype instead of the "real" prototype in a software project. (4 points) 5. How do we roughly estimate the cost of a software product? (4 points) 6. Give two reasons for writing down the specifications of a software project before we start writing codes. (4 points) Sol: 1. Quality Assurance (QA) engineers perform tests on the software function to guarantee the stability of the product. They supervise and report the software utility to see if they reflects the quality requirements from the specification. 2. A uniform random permutation is the one in which each of the n! possible permutations is equally likely to occur. 3. The Huffman tree for the first n Fibonacci numbers is a full binary tree of height n with exactly one external node in all levels, except for the first level which contains only the root and the last level which contains two external nodes, since nodes are being merged in the increasing order of their frequency. We claim this by observing that Σ(i = 1 to n) F_i = F_(n+2) - 1, for all n ≧ 1, which can be verified by induction. Based on this formulation, the internal node with frequency Σ(i = 1 to k) Fi (which is smaller than F_(k+2)) and the external node representing c_k with frequency F_(k+1) are the two nodes with the least frequency at the k-th merging step. Therefore, the Huffman code for c_k is 1 ... 10, in which there are (n - k) 1s precedes 0, for k = 2 to n, and the code for c_1 is a bit string of length (n - 1) containing no 0 in it. 4. The main idea of using paper prototype is to keep down the cost of time spent on implementing a real prototype, so that the total producing time can be well managed. The productivity gains can be significant once the time is less consumed. 5. The cost of producing a software product consists mostly of human resources, we can roughly estimate on the number of members involved in and the time required to finish the project. The cost of marketing can be estimated based on the scale of market and channels of distribution. 6. Writing down the specifications would save lots of time to design. Communications between divisions would be more efficient, while the product content is not only recognized by software engineers, but also people from other departments. Problem 3. Given a list of n distinct numbers (not sorted), please derive a divide-and-conquer algorithm to return the first k smallest numbers in the list. Your algorithm should have a running time of O(n). Note that you cannot use the number selection algorithm taught in the class for k times, as the running time will be Θ(kn) = O(n^2). Sorting does not work either as it will take O(n·log(n)). In addition to the algorithm, please write down the recurrence representing the running time, solve the recurrence, and prove your solution by induction. (15 points, 7 points for the algorithm, 3 points for the recurrence, and 5 points for the proof) Sol: ──────────────────────────────────── Algorithm 1 QuicksortFindFirstK(list, left, right, k) ──────────────────────────────────── 1: if right > left then 2: select median pivot between left and right using median of median 3: newPivot ← Partition(list, left, right, pivot) 4: if newPivot > k then 5: QuicksortFindFirstK(list, left, newPivot - 1, k) 6: else if newPivot < k then 7: QuicksortFindFirstK(list, newPivot + 1, right, k) ──────────────────────────────────── The algorithm 1 proposes a method to find the first k smallest integers using divide and conquer method. At first, we divide the list into 5 groups and find median of median. This part costs T(n/5). The second part is the partition function of quicksort. It runs in O(n). Lastly, we use newPivot to call the function recursively. The running time of this part is the max length between left to newPivot - 1 and newPivot + 1 to right, max(T(3n/10), T(7n/10)). Thus, the running time of the last part is T(7n/10). The equation 1 represents for the total running time of the algorithm 1. The similar proof could be found from the course materials. Therefore, the total running time is T(n) = θ(n). T(n) = T(n/5) + θ(n) + T(7n/10) Problem 4. Use the recursion-tree method to derive an asymptotic tight upper bound for T(n) = T(n/2) + T(n/4) + T(n/8) + n. You can assume that T(n) is a constant when n is sufficiently small. Prove that your bound is correct by induction. (10 points, 5 points for the recursion-tree and 5 points for the proof) Sol: By observing that expanding the i-th level in the recursion tree costs ((7/8)^i)n, we claim that the cost of expanding all nodes in the recursion tree is no more than n + (7/8)n + ((7/8)^2)n + ... = Θ(n). We assume in our induction hypothesis that T(n) ≦ cn, for some positive constant c. Then, T(n) = T(n/2) + T(n/4) + T(n/8) + n ≦ c(n/2 + n/4 + n/8) + n = ((7/8)c + 1)n ≦ cn, as long as we pick c ≧ 8. Since it is obvious that n is the minimum requirement for T(n), the tight upper bound for T(n) is O(n). Problem 5. For bit strings X = x_1 ... x_m, Y = y_1 ... y_n and Z = z_1 ... z_(m+n), we say that Z is an interleaving of X and Y if it can be obtained by interleaving the bits in X and Y in a way that maintains the left-to-right order of the bits in X and Y . For example if X = 101 and Y = 01 then x_1 x_2 y_1 x_3 y_2 = 10011 is an interleaving of X and Y , whereas 11010 is not. (15 points) 1. Please come up with the definition of the subproblem. Use your definition and prove that this problem has optimal substructure. (5 points) 2. Give the most efficient algorithm you can to determine if Z is an interleaving of X and Y. (7 points) 3. Analyze the time complexity of your algorithm as a function m = |X| and n = |Y|. (3 points) Sol: Consider x = x_1 ... x_m, y = y_1 ... y_n, and z = z_1 ... z_(m+n). We claim that if z is the interleaving of x and y, it contains a subproblem z' = z_1 ... z_(m+n-1) and the last bit z_(m+n) must be x_m or y_n. We will have the following substructure and build an two-dimension array D[m,n] to check whether z is the interleaving of x and y. Since all cases D[i,j] comes from one of the answers of the subproblems, this problem exists the optimal substructure property. At the end of this problem, we present a T(n) = O(mn) dynamic programming algorithm 2 to solve this problem. ╭ true, if i ＝ j ＝ 0 │ false, if x_i ≠ z_(i+j) and y_j ≠ z_(i+j) D[i, j] = ┤ D[i-1, j], if x_i ＝ z_(i+j) and y_j ≠ z_(i+j) │ D[i, j-1], if x_i ≠ z_(i+j) and y_j ＝ z_(i+j) ╰ D[i-1, j] or D[i, j-1], if x_i ＝ z_(i+j) and y_j ＝ z_(i+j) Problem 6. Consider the problem of making change for n dollars using the fewest number of coins. Assume that each coin's value is an integer. The same coin can be used for any number of times in the change. (25 points) 1. Prove that the problem exhibits optimal substructure. (5 points) 2. Assume the following set of coins is available to you: 1, 5, 10, 50. Prove that under this condition, the problem has the greedy property. (5 points) 3. Describe a greedy algorithm to solve the problem with the coin set in 2. (5 points) 4. Give a set of coins for which the greedy algorithm in 3 does not yield an optimal solution. Your set should include a one-dollar coin so that there is a solution to every value of n. Explain the intuition behind your choice of coins in the set. (3 points) 5. Describe a dynamic programming algorithm which solve the problem with any coin set with k different coins, assuming that one of them is a one-dollar coin. (7 points) Sol: 1. The coin changing problem exhibits optimal substructure in the following explanation. Consider we have a coin set C = {c_1, ..., c_k}. Suppose we want to make change for n dollars, we divide n dollars into left-part (n - m) dollars and right-part m dollars. Therefore, the solution to (n - m) dollars is s_L = {c_1, c_1, c_2} and the solution to m dollars is s_R = {c_1, c_2, c_4}. We claim that the left-part of solution must be the optimal way using coin set C and the right-part of the solution must be the optimal way using coin set C so that the optimal solution to n dollars is s = s_L + s_R. Proof. By contradiction, suppose there exists a solution s'_R better than right-part solution s_R. The right-part solution could be replaced by the better solution s'_R, yielding a solution s' = s_L + s'_R which is better than s = s_L + s_R. This contracts to our previous assumption. Therefore, the optimal solution to coin changing problem is composed of smaller subproblems of optimal solutions. 2. It it clear that there are at most 4, 1, 4 coins with value 1, 5, 10, respectively, in an optimal solution. Observing that 4*1 + 1*5 + 4*10 = 49 < 50, there's no way to not choose 50 to have an optimal solution while n > 50. With the same reason, there's no way to not choose 10 to have an optimal solution while 50 > n > 10, and so on. Therefore, the optimality is guaranteed by the greedy approach, which selects the largest available coin at each step. 3. ───────────────────────── Algorithm 3 minNum = GreedyCoinChange(n) ───────────────────────── 1: numOfCoins ← 0 2: for c from c_k to c_1 do 3: if c > n then 4: numOfCoins ← numOfCoins + floor(n/c) 5: n ← n mod c 6: return numOfCoins ───────────────────────── Suppose the coin set has the greedy property, the algorithm 3 is developed to solve this problem. We assume that for all c ∈ C = {c_1, ... , c_k} if i < j then c_i < c_j. The main concept of this algorithm is always select the biggest possible coin in order to make the least number of the coins. 4. Consider the list {1, 6, 7}. If n = 12, greedy algorithm give the solution {7, 1, 1, 1, 1, 1}. However, the optimal solution for n = 12 is {6, 6}, which is much less than the previous solution. This is because this coin list does not contains greedy property in the coin changing problem. 5. ────────────────────────── Algorithm 4 DPCoinChange(n) ────────────────────────── 1: D[0] ← 0 2: S[0] ← 0 3: for p from 1 to n do 4: for c from c_1 to c_k do 5: if c ≧ p and 1 + D[p - c] ＜ D[p] then 6: D[p] ← 1 + D[p - c] 7: S[p] ← c ────────────────────────── Here we present a DP algorithm 4 to solve the coin changing problem. D[1..n] array stores the optimal solution using coin set C. The S[1..n] array is the solution to the combination of the coins. For certain D[p], it stores the optimal solution to number of coins for p dollars using coin set C = {c_1, ... , c_k}. The following equation give the concept of the our algorithm. ╭ 0, if p ＝ 0 D[p] = ┤ ╰ min {1 + D[p - c_i]}, if p ＞ 0 i:c_i≦p Problem 7. This problem examines three algorithms for searching for a value x in an unsorted array A consisting of n elements (x appears in A for k times, k ≧ 0): (20 points) ˙ Algorithm α: We pick a random index i into A. If A[i] = x, then we terminate; otherwise, we continue the search by picking a new random index into A. We continue picking random indices into A until we find an index j such that A[j] = x or until we have checked every element of A. Note that we pick from the whole set of indices each time, so that we may examine a given element more than once. ˙ Algorithm β: The algorithm searches A for x in order, considering A[1], A[2], ..., A[n] until either it finds A[i] = x or it reaches the end of the array. Assume that all possible permutations of the input array are equally likely. ˙ Algorithm γ: We uniformly and randomly permute the input array A and then run Algorithm β. 1. For each algorithm, derive the expected running time when (a) k ＝ 0, (b) k ≧ 1. (18 points, 3 points for each answer) 2. Which algorithm would you use? Please explain your answer. (2 points) Sol: 1. ˙ Algorithm α: (a) Let T be the time to collect all n elements in A, and t_i be the time to collect the i-th element after (i - 1) elements have been collected. Since the probability of picking a new element given (i - 1) elements already being selected is p_i = (n-i+1)/n, the expected value of the geometrically distributed random variable t_i is E(t_i) = 1/p_i = n/(n-i+1). Therefore, E(T) = Σ(i = 1 to n) E(t_i) = n·Σ(i = 1 to n) 1/(n-i+1) = n·H_n, where H_n is the n-th harmonic number. Thus the expected running time is approximately n·ln(n). (b) Let x = (n-k)/n. The expected running time for k ≧ 1 is ╭ k ╮ ╭ n-k ╮╭ k ╮ ╭ n-k ╮2╭ k ╮ 1│──│ + 2│───││──│ + 3│───│ │──│ + ... ╰ n ╯ ╰ n ╯╰ n ╯ ╰ n ╯ ╰ n ╯ k ∞ ╭ n-k ╮i k ╭ ∞ i ∞ i╮ =── Σ (i + 1)│───│ =──│ Σ (i)(x) + Σ (x) │ n i=0 ╰ n ╯ n ╰i=0 i=0 ╯ k ╭ x 1 ╮ n =──│─────+ ──│= ── n ╰ (1-x)^2 1-x ╯ k ˙ Algorithm β: (a) The expected running time is n when k = 0. (b) Since the search will end until the first x is read, the expected time to read x is 1. For every y in A, y ≠ x, the probability of y has been read before the first x appears is 1/(k+1), and there are (n - k) elements in A which is not equal to k, thus the expected time of reading these non-x elements is (n-k)/(k+1). Therefore, the expected running time is 1 + (n-k)/(k+1) = (n+1)/(k+1). ˙ Algorithm γ: (a) The expected running time is n + n = 2n. (b) The expected running time is n + (n+1)/(k+1). 2. If we do not have any information about the input data, Algorithm β seems better in overall evaluation, while the expected cost of both the cases are asymptotically linear to n and will never exceed n. Problem 8. This semester we made lots of changes in the course compared to last semester (homeworks, programming assignments, course content), and we are curious about how you feel about the current form of the course. If you have the power of changing 3 things in the course, what would these 3 things be and how would you change them? Your feedbacks are very important to the teaching team; we thank you for your valuable opinion. :) (10 points) Sol: Skipped. -- ※ 發信站: 批踢踢實業坊(ptt.cc), 來自: 114.24.74.245 ※ 文章網址: http://www.ptt.cc/bbs/NTU-Exam/M.1419653536.A.886.html