Board: NTU-Exam
Course name: Natural Language Processing
Course type: CSIE graduate elective
Instructor: 陳信希 (Hsin-Hsi Chen)
College: College of Electrical Engineering and Computer Science
Department: Graduate Institute of Computer Science and Information Engineering
Exam date (Y/M/D): 4/10
Time limit (minutes): 180
Reward requested: Yes (if not stated explicitly, no reward will be given)

Questions:

1. Opinion mining and sentiment analysis are very important NLP applications nowadays. A review is usually composed of some aspects of an opinion target and the opinion words expressing polarity about those aspects. The following review of the Howard Civil Service International House (福華文教會館) is selected from TripAdvisor. Please indicate which explicit aspects and opinion words appear in this review. (10 points)

"Our room was excellent. The hotel staff were very good, and there was always someone who could speak English to help us. They gave very good dining recommendations and made sure reliable taxi drivers were available for us. The location is well suited to business travel; restaurants, banks, and services are only a short walk away. The hotel buffet was decent, and so was the café. All in all, it was a pleasant stay."

2. Machine translation (MT) is another important NLP application. It aims to translate a document in one language into a document in another language. There are many challenging issues in designing MT systems. The following shows an English sentence and three Chinese sentences produced by Google Translate in 2008, 2012, and 2014, respectively. Please translate this English sentence into Chinese and analyze why MT is challenging based on this example. (10 points)

Source: Taiwan wins gold in woman's 75 kg powerlifting in Paralympics
2008: 台灣勝金在婦女的75公斤 powerlifting 在殘奧會
2012: 台灣勝在殘奧會舉重女子75公斤黃金
2014: 台灣勝金在女子75公斤級舉重殘奧會

3. Basically, an NLP system is a pipeline of four modules that deal with different problems at different linguistic levels. Please explain the function of each module. (12 points)

4. A blog post may be composed of sentences with emoticons. These non-verbal emotional expressions describe the author's feelings when s/he wrote the post. The following shows some typical examples. Given a collection of sentences, each containing an emoticon, we plan to learn an emotion dictionary with mutual information. The dictionary keeps the emotion tendency of each word. Please define mutual information (MI) first, and then discuss how you would achieve the goal with MI.
(10 points)

● Having dinner with you today; somehow I feel especially nervous :o
● Thank you for treating me to dinner, and for the gift too :目
● But I was still very happy when I received it :P
(The emoticons above were images in the original post; similar text symbols are shown here.)

5. The t-test is a useful hypothesis-testing tool. It can be used to learn multi-word expressions from a large corpus. Moreover, it can also be used to tell whether the performances of two models differ significantly. Please describe these TWO applications of the t-test in detail. (10 points)

6. A person found an old book inside a wall while renovating a historical building. He claimed that the book was written in the 16th century. Assume you have several book corpora written in the 15th, 16th, ..., 20th centuries, respectively. How would you verify the claim based on the book's content? The person further claimed that the book was written by William Shakespeare (1564-1616). Please design a method to verify whether the book is fake based on William Shakespeare's writing style. (10 points)

7. The following defines basic symbols for smoothing.
N: total occurrences of n-grams in a training dataset
B: total number of n-gram types
r: frequency of an n-gram
Nr: total number of n-grams of frequency r in the training dataset
Tr: total occurrences in a further (held-out) dataset of the n-grams whose training frequency is r
Please give a formula estimating the probability of an unseen n-gram for each of the following smoothing methods. (12 points)
(a) Add a small value λ to all types of n-grams.
(b) Subtract a constant δ from each non-zero count.
(c) Estimate with a held-out dataset.

8. What are the differences between the deleted interpolation and back-off models? Please take the computation of P(Wn|Wn-3,Wn-2,Wn-1) as an example. (10 points)

9. Given a model λ and an observation sequence O,
(a) find the probability of the sequence with the Backward algorithm. (8 points)
(b) find the best path with the Viterbi algorithm. (8 points)

10. Forward probability and backward probability are often used to determine the parameters of an HMM. Please show how this works.
(10 points)

--
※ Posted from: PTT (ptt.cc), from: 114.24.181.205
※ Article URL: http://www.ptt.cc/bbs/NTU-Exam/M.1397788775.A.EC1.html
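For Question 4, one standard definition, sketched here with notation that is not on the exam sheet itself: the (pointwise) mutual information between a word w and an emoticon e can be written as

```latex
% C(.) are (co-)occurrence counts over the sentence collection,
% and N is the number of sentences.
\mathrm{MI}(w, e) \;=\; \log \frac{P(w, e)}{P(w)\,P(e)}
\;\approx\; \log \frac{C(w, e)\cdot N}{C(w)\,C(e)}
```

A word's emotion tendency can then be recorded as the emoticon (or emotion class) that maximizes MI(w, e) over the collection.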
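For Question 5, the two uses of the t-test can be sketched as follows; these are standard textbook forms, not taken from the exam sheet:

```latex
% Collocation discovery: is bigram (w_1 w_2) more frequent than chance?
% \bar{x} is the observed bigram frequency, \mu its expectation under
% the independence null hypothesis, N the number of bigram positions.
t = \frac{\bar{x} - \mu}{\sqrt{s^2/N}}, \qquad
\bar{x} = \frac{C(w_1 w_2)}{N}, \quad
\mu = P(w_1)\,P(w_2), \quad
s^2 \approx \bar{x}\,(1 - \bar{x})

% Model comparison: paired t-test on the per-test-set score
% differences d_i of the two models over n test sets.
t = \frac{\bar{d}}{s_d / \sqrt{n}}
```

In both cases the null hypothesis is rejected when |t| exceeds the critical value for the chosen significance level.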
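For Question 7, one common form of each estimate, using the symbols defined on the sheet plus N' for the total n-gram occurrences in the held-out dataset (N' is an added symbol; N_0 is Nr with r = 0, i.e. the number of unseen types):

```latex
% (a) Add-lambda: every one of the B types receives an extra count \lambda
P(\text{unseen}) = \frac{\lambda}{N + B\lambda}

% (b) Absolute discounting: each of the B - N_0 seen types gives up \delta,
%     and the freed mass is shared equally by the N_0 unseen types
P(\text{unseen}) = \frac{\delta\,(B - N_0)}{N \cdot N_0}

% (c) Held-out estimation: T_0 is the held-out mass of the n-grams
%     unseen in training
P(\text{unseen}) = \frac{T_0}{N_0 \cdot N'}
```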
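For Question 8, the contrast can be sketched with standard formulations (deleted interpolation in the Jelinek-Mercer style, back-off in the Katz style; neither formula is from the exam sheet):

```latex
% Deleted interpolation: always mix ALL orders; the \lambda_i sum to 1
% and are estimated on held-out (deleted) data.
P(w_n \mid w_{n-3} w_{n-2} w_{n-1}) =
  \lambda_1 \hat{P}(w_n \mid w_{n-3} w_{n-2} w_{n-1})
+ \lambda_2 \hat{P}(w_n \mid w_{n-2} w_{n-1})
+ \lambda_3 \hat{P}(w_n \mid w_{n-1})
+ \lambda_4 \hat{P}(w_n)

% Back-off: trust the highest order that was actually seen; otherwise
% recurse to a lower order, scaled by a normalizing weight \alpha.
P_{bo}(w_n \mid w_{n-3} w_{n-2} w_{n-1}) =
\begin{cases}
P^{*}(w_n \mid w_{n-3} w_{n-2} w_{n-1}) & \text{if } C(w_{n-3}\ldots w_n) > 0\\[2pt]
\alpha(w_{n-3} w_{n-2} w_{n-1})\, P_{bo}(w_n \mid w_{n-2} w_{n-1}) & \text{otherwise}
\end{cases}
```

The key difference: interpolation always combines every order, while back-off consults lower orders only when the higher-order count is zero.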
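For Questions 9 and 10, a minimal runnable sketch of the backward and Viterbi algorithms. The two-state model, its probabilities, and the three-symbol observation alphabet are hypothetical, invented purely for illustration:

```python
# Hypothetical HMM: 2 hidden states, 3 observation symbols (0, 1, 2).
states = [0, 1]                               # e.g. 0 = Rainy, 1 = Sunny
pi     = [0.6, 0.4]                           # initial state probabilities
A      = [[0.7, 0.3], [0.4, 0.6]]             # transitions A[i][j] = P(j | i)
B      = [[0.1, 0.4, 0.5], [0.6, 0.3, 0.1]]   # emissions B[i][o] = P(o | i)

def backward_prob(obs):
    """P(O | model) via the backward algorithm (Question 9a)."""
    T = len(obs)
    beta = [[0.0] * len(states) for _ in range(T)]
    for i in states:                          # initialization: beta_T(i) = 1
        beta[T - 1][i] = 1.0
    for t in range(T - 2, -1, -1):            # induction, right to left
        for i in states:
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in states)
    # termination: fold in the initial distribution and first emission
    return sum(pi[i] * B[i][obs[0]] * beta[0][i] for i in states)

def viterbi(obs):
    """Most likely hidden-state path via Viterbi (Question 9b)."""
    T = len(obs)
    delta = [[0.0] * len(states) for _ in range(T)]  # best path scores
    psi   = [[0] * len(states) for _ in range(T)]    # backpointers
    for i in states:
        delta[0][i] = pi[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in states:
            best_i = max(states, key=lambda i: delta[t - 1][i] * A[i][j])
            psi[t][j] = best_i
            delta[t][j] = delta[t - 1][best_i] * A[best_i][j] * B[j][obs[t]]
    last = max(states, key=lambda i: delta[T - 1][i])
    path = [last]
    for t in range(T - 1, 0, -1):             # follow backpointers
        path.append(psi[t][path[-1]])
    return list(reversed(path))
```

The forward counterpart of `backward_prob` is symmetric (left to right), and Question 10's parameter re-estimation (Baum-Welch) combines exactly these forward and backward quantities to compute expected transition and emission counts.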
rod24574575: Archived to the CSIE board digest! 04/18 11:57