Re: [問題] Google Interview Question (4)

作者stimim (qqaa)

看板Prob_Solve

標題Re: [問題] Google Interview Question (4)

時間Fri Mar 8 16:51:34 2013

剛剛和 Leon 討論了一下，發現我們對題目的了解有一點不同原題目： Given a document and a query of K words, how do u find the smallest window that covers all the words at least once in that document? (given you know the inverted lists of all K words, that is, for each word, you have a list of all its occurrrences). This one is really hard. Could someone propose an algorithm in O(n)? 我們可以確定： 1. document is given 2. 知道我們關心的是哪 K 個字 3. 這 K 個字的 occurrence list 也是給定的問題是，n 指的是什麼？ 1. n = document size (in number of words) 在這種情況下， O(n) 是可能的，方法基本上就是 Leon 提出來的。補充一個 O(document size) 的做法 // all indices starts from 1 Given occ (list of list of occurrences) array = [-1, ..., -1] (size = document size) i = 0 for list in occ i = i + 1 for index in list array[index] = i count = [0, 0, ..., 0] (size = K) nKinds = 0 start = 1 end = 1 minWindowSize = INF while end <= document size while end <= document size and nKinds < K word = array[end] end = end + 1 if word == -1 // we are not interested continue count[word] = count[word] + 1 if count[word] == 1 nKinds = nKinds + 1 while nKinds == K word = array[start] if word == -1 // we are not interested start = start + 1 continue count[word] = count[word] - 1 if count[word] == 0 // check this window nKinds = nKinds - 1 if end - start < minWindowSize minWindowSize = end - start start = start + 1 return minWindowSizejk 2. n = 這 K 個字出現的總次數 Ex: K = 3, 要找出包含 {a, b, c} 的最小 window occurrences: a = { 1, 1000, 2500} b = {400, 1001, 2000} c = {500, 1002, 1500} document size = 2500 甚至更高 (其他的位置是其他的字) n = 3 + 3 + 3 = 9 而不是 2500 在這種情況下，已知可以做到 O(n logK) (RockLee 寫的方法) 不過，在這種狀況下，給定原本的 document 似乎就沒什麼用了而且在讀入 input 時，讀入原本的 document 會花掉 O(document size) 的時間所以在和 Leon 討論時，他對這種解釋方法有一點懷疑 -- ※ 發信站: 批踢踢實業坊(ptt.cc) ◆ From: 140.112.175.115 ※ 編輯: stimim 來自: 140.112.175.115 (03/08 17:09)

推 scwg:我 (ledia 應該也是) 自動採用 2. 的解釋, 因為 inverse 03/09 03:21

→ scwg:list 可以 offline 建好, 之後這個演算法可以重複多次 03/09 03:22