Course name: Information Retrieval
Course type: Departmental required course
Instructor: 唐牧群
College: College of Liberal Arts
Department: Department of Library and Information Science
Exam date (Y/M/D): 97/12/02
Time limit (minutes): 3 hours
Reward requested: Yes (not awarded unless explicitly requested)

Exam questions:

1. Here is an imaginary database that contains the following 5 documents:

   D1: "a dog barks at a cat and it fell from a tree"
   D2: "a dog watches ants on the bark of a tree"
   D3: "a dog watches another dog watches a cat"
   D4: "a dog barks at a cat watches another cat"
   D5: "the bark fell from the tree as a cat watches"

   (Terms in the stop word list have been marked in grey.)

   Please:
   1. Calculate the document frequency (DF) and the IDF weight for each index term (simply use N/n, without the logarithm).
   2. Create an inverted file for the database in which each cell contains the TF*IDF weight of the term in the document.
   3. Give the ranking after the user submits the query "cat watch dog bark ant".
   4. After the first iteration, the user marks D1, D3, and D4 as relevant and D2 and D5 as non-relevant. What would the new ranking be using Rocchio's method with α=1.0, β=1.0, γ=1.0?

Answer 4 out of the following 5 questions.

2. Unlike data retrieval, where perfect precision and recall are guaranteed, information retrieval is more of a probabilistic process in which the information conveyed in the retrieved documents may or may not answer the user's information needs. What are the possible causes of this uncertainty in IR?

3. Define the following concepts and explain how they are related to one another: "specificity", "precision", and "IDF (Inverse Document Frequency)"; and "exhaustivity", "recall", and "TF (Term Frequency)".

4. Explain the three basic models in information retrieval: Boolean, vector space, and probabilistic.

5. Explain the rationale behind eliciting the user's relevance feedback and how it can improve search results. What are the two mechanisms by which relevant terms can be identified and extracted (hint: IQE and AQE)?

6. How does the interactive view of IR differ from the traditional view of IR? How does it propose to improve retrieval performance (i.e., what is the most crucial component of the IR process, and how can it be improved)? Can you think of a few (1 or 2) techniques or approaches we have gone through in class that aim at improving this particular component of IR?

--
※ Posted from: 批踢踢實業坊 (ptt.cc)
◆ From: 140.112.245.126
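
The calculation asked for in question 1 (DF, IDF as N/n, TF*IDF weights, ranking, and one round of Rocchio feedback) can be sketched in Python as below. This is only a rough illustration under several assumptions that the exam text does not settle: the grey stop-word marking did not survive in this post, so STOP_WORDS is a guess; STEMS is a hand-made mapping so that "barks", "watches", and "ants" match the query terms; documents are scored by a plain inner product; query weights are also TF*IDF; and Rocchio uses the centroid (mean) of each feedback set, with negative weights left as they are.

```python
from collections import Counter

# ASSUMPTION: guessed stop list; the original grey highlighting is not available.
STOP_WORDS = {"a", "at", "and", "it", "from", "on", "of", "the", "as", "another"}
# ASSUMPTION: hand-made normalisation so inflected forms match the query terms.
STEMS = {"barks": "bark", "watches": "watch", "ants": "ant"}

DOCS = {
    "D1": "a dog barks at a cat and it fell from a tree",
    "D2": "a dog watches ants on the bark of a tree",
    "D3": "a dog watches another dog watches a cat",
    "D4": "a dog barks at a cat watches another cat",
    "D5": "the bark fell from the tree as a cat watches",
}


def index_terms(text):
    """Lowercase, drop stop words, and map inflected forms to their stems."""
    return [STEMS.get(w, w) for w in text.lower().split() if w not in STOP_WORDS]


def tfidf_vector(text, idf):
    """Weight each index term by raw term frequency times IDF."""
    tf = Counter(index_terms(text))
    return {t: tf[t] * idf[t] for t in tf if t in idf}


def dot(u, v):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def rank(query_vec, doc_vecs):
    """Return document names by descending score (ties broken by name) and the scores."""
    scores = {d: dot(query_vec, v) for d, v in doc_vecs.items()}
    return sorted(scores, key=lambda d: (-scores[d], d)), scores


def centroid(names, doc_vecs):
    """Mean vector of the named documents."""
    total = {}
    for d in names:
        for t, w in doc_vecs[d].items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(names) for t, w in total.items()}


def rocchio(query_vec, relevant, non_relevant, doc_vecs,
            alpha=1.0, beta=1.0, gamma=1.0):
    """q' = alpha*q + beta*centroid(relevant) - gamma*centroid(non_relevant)."""
    pos = centroid(relevant, doc_vecs)
    neg = centroid(non_relevant, doc_vecs)
    terms = set(query_vec) | set(pos) | set(neg)
    return {t: alpha * query_vec.get(t, 0.0)
               + beta * pos.get(t, 0.0)
               - gamma * neg.get(t, 0.0)
            for t in terms}


N = len(DOCS)
# Document frequency: number of documents in which each index term occurs.
df = Counter(t for text in DOCS.values() for t in set(index_terms(text)))
idf = {t: N / n for t, n in df.items()}  # N/n, no logarithm
doc_vecs = {d: tfidf_vector(text, idf) for d, text in DOCS.items()}

query_vec = tfidf_vector("cat watch dog bark ant", idf)
first_ranking, first_scores = rank(query_vec, doc_vecs)

new_query = rocchio(query_vec, ["D1", "D3", "D4"], ["D2", "D5"], doc_vecs)
second_ranking, second_scores = rank(new_query, doc_vecs)

print("DF:", dict(df))
print("IDF:", idf)
print("First ranking:", first_ranking)
print("Ranking after Rocchio:", second_ranking)
```

Changing the stop list, the stemming, or the similarity measure (for example cosine instead of the inner product) can change both rankings, so the output should be read only as the result of these particular assumptions.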