Board: NTU-Exam
Course name: Information Retrieval and Extraction
Course type: Elective (Department of Computer Science and Information Engineering)
Instructor: 陳信希 (Hsin-Hsi Chen)
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (Y/M/D): 2021/11/11
Time limit (minutes): 180

Questions:

1. Term frequency and inverse document frequency are commonly used to measure the importance of a term in a document and in a query. We aim to select terms with discriminative power, both within a document and between documents, to represent a document. How do term frequency and inverse document frequency achieve this goal? (10 points)

2. A long document is usually composed of passages describing several topics. On the one hand, long documents are relatively easier to retrieve than short documents with a keyword-based approach. On the other hand, the representation of a long document tends to be vague when the average word (term) embedding approach is used for aggregation. Do you have any ideas for dealing with these issues in the keyword-based approach and in the term embedding-based approach? (10 points)

3. In language modeling, each individual document can be considered a document model for retrieval. In addition, a document collection can also be used to learn a collection model for smoothing in retrieval. Please describe the idea of integrating the document model and the collection model for IR. (10 points)

4. Modeling term-term relationships is important in information retrieval. Various methods, from the conventional counting-based approach to the current prediction-based approach, have been proposed. Please show one method from each approach for computing inter-term relationships. (10 points)

5. (a) What are typical similarities and topical similarities? (5 points)
   (b) Term representations learned from models based on different context sizes (e.g., a document, a short window, or a short context) may capture different similarities (typical similarities or topical similarities). Please explain this statement. (5 points)
   (c) Exact matching and embedding-space-based matching have different effects on retrieval. Please discuss this point. (5 points)

6. An IR model is a quadruple $[D, Q, F, R(q_i, d_j)]$, where $D$ is a set of logical views for the documents in the collection, $Q$ is a set of logical views for the user queries, $F$ is a framework for modeling documents and queries, and $R(q_i, d_j)$ is a ranking function. Please specify the framework $F$ and the ranking function $R$ for each of the following models. (15 points)
   (a) BM25 Model
   (b) Translation Model
   (c) Term Embedding Model

7. Query expansion aims to introduce new query terms into the original query. Please specify how query expansion is introduced into each of the following models. (15 points)
   (a) Vector Space Model
   (b) Language Model
   (c) Term Embedding Model

8. At SIGIR 2016, two tutorial speakers classified "Question Answering from Documents" as an "easy" problem in IR. In contrast, they regarded "Question Answering from Knowledge Base" as a "hard" problem in IR. Do you agree with this classification? Please explain your thoughts. (10 points)

9. Neural information retrieval systems typically use a chained pipeline. Are there any practical considerations? Please suggest a cascade pipeline to explain your idea. (10 points)

10. We often encounter mis-conception, mis-translation, and mis-formulation problems when transforming an information need into a query in ad hoc retrieval. You have learned the fundamentals of information retrieval during the first half of the semester. Please describe the lessons you would apply to deal with these problems. (10 points)

--
Episode 01: I feel like I heard this in class
Episode 02: That is truly hopeless
Episode 03: There is nothing left to hope for
Episode 04: Failing a course and flunking half your credits both really happen
Episode 05: There is no way I will pass everything
Episode 06: There is definitely something wrong with this exam
Episode 07: Can you face your real score?
Episode 08: I'm such an idiot
Episode 09: With grades like these, the professor will never let me pass
Episode 10: I will never rely on past exams again
Episode 11: The make-up exam left at the end
Episode 12: My favorite credits
--
※ Posted via: PTT (批踢踢實業坊, ptt.cc), from: 111.249.65.236 (Taiwan)
※ Article URL: https://www.ptt.cc/bbs/NTU-Exam/M.1767058471.A.939.html
rod24574575 : Added to the CSIE section of the board digest! 12/30 22:55