Course name: Information Retrieval
Course type: Required
Instructor: 唐牧群
College: College of Liberal Arts
Department: Department of Library and Information Science
Exam date (Y/M/D): 2013/12/25
Time limit (minutes): 180
Reward requested: Yes (no reward is given unless explicitly requested)

Questions:

1. Here is an imaginary database that contains the following 5 documents:

D1: "a dog barks at a cat and it fell"
D2: "a dog watches ants on the bark"
D3: "a dog watches another dog barks a cat"
D4: "a dog barks at a cat watches an ant"
D5: "an ant fell from the bark"

(Terms in the stop-word list have been grayed out.) Please:

1a. Create an inverted file for the database in which each cell contains the TF*IDF weight of a term in a document (treat singular and plural forms as the same word stem). When calculating IDF, simply use N/n without the logarithm. (10 points)

1b. (blank)

1c. Calculate relevance scores (use the inner product without document length normalization) and rank the documents accordingly after the user submits the query "dog bark cat". (10 points)

1d. After examining the results, the user marks D3 and D5 as relevant and marks no document as non-relevant. Produce the new ranking using Rocchio's method with α = 1.0, β = 1.0, γ = 1.0. (5 points)

1e. With the same query and relevance information, calculate the new query term weights for "dog", "bark" and "cat" according to the Robertson and Sparck Jones term weighting method (hint: first you need to decide the values of "N", "R", "n" and "r"). (5 points)

2. Unlike data retrieval, where perfect precision and recall are guaranteed, information retrieval is more of a probabilistic process in which the information conveyed in the retrieved documents might or might not answer the user's information needs. What are the possible causes behind the uncertainty of IR? (10 points)

3. Define the following concepts and explain how they are related to one another: "specificity", "precision" and "IDF (Inverse Document Frequency)"; "exhaustivity", "recall" and "TF (Term Frequency)". There is often a trade-off between precision and recall; is there also a trade-off between specificity and exhaustivity? (25 points)

4. Explain the three basic models in information retrieval: Boolean, Vector space and Probabilistic. (25 points)

5. Explain the rationale behind PageRank and the meaning of each component of the formula below. (10 points)

PR(A) = (1 - d) + d Σ_i PR(I_i) / C(I_i)

--
※ Origin: 批踢踢實業坊 (ptt.cc)
◆ From: 140.112.25.107
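As an illustration of the formula in question 5 (not part of the original exam), here is a minimal Python sketch that applies PR(A) = (1 - d) + d Σ PR(I_i) / C(I_i) repeatedly until the scores settle. The three-page link graph, the function name `pagerank`, the damping value d = 0.85, the starting score of 1.0 and the fixed iteration count are all assumptions chosen for illustration. The sketch also assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1 - d) + d * sum(PR(I_i) / C(I_i)).

    links maps each page to the list of pages it links to.
    I_i are the pages that link to A; C(I_i) is the number of
    outgoing links on page I_i. Assumes no page has zero outlinks.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # assumed starting score
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Each in-linking page shares its score equally among its outlinks.
            incoming = sum(
                pr[src] / len(links[src])
                for src in pages
                if page in links[src]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = pagerank(links)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy graph, C ends up ranked first because it receives all of B's score plus half of A's, while B receives only half of A's. With this unnormalized form of the formula and no dangling pages, the scores sum to the number of pages.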
Yokuu: Archived under the Department of Library and Information Science 01/02 11:22