Course name: Natural Language Processing
Course type: Departmental elective
Instructor: 陳信希 (Hsin-Hsi Chen)
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date: 106/04/20 (ROC calendar, i.e., 2017/04/20)
Time limit (minutes): 180

Questions:

1. The following questions concern the resources used in natural language processing (NLP) research.
(a) The Annual Meeting of the Association for Computational Linguistics (ACL) and the International Conference on Computational Linguistics (COLING) are two top-tier/representative conferences in NLP. Please specify the largest NLP archive in the world, which keeps the major NLP conference proceedings. (5 points)
(b) If we need an English treebank to train a parser, please suggest an organization from which we can purchase the required treebank. (5 points)
(c) If we need a balanced Chinese corpus to develop a Chinese segmentation system, please suggest an organization from which we can get the required corpus. (5 points)

2. A pipelined NLP system can be composed of a morphological processing module, a syntactic analysis module, a semantic interpretation module, and a discourse analysis module. Please use the following sentence to describe any 5 operations in the pipelined system. The operations can be selected from the same module or from different modules. Please also indicate to which module each operation belongs. (20 points)
英國首相今天宣布提前大選,英鎊轉貶,但隨後重升。
(The British Prime Minister announced a snap election today; the pound fell, but then rebounded.)

3. The labelling/tagging operation plays an important role in natural language processing. Different labels are proposed at different analysis levels. For example, a set of part-of-speech (POS) tags is defined at the lexical level. A POS tagger aims at labelling each word in a sentence with a POS tag. Here tagging is a labelling operation. Please specify 3 other labelling (tagging) operations in NLP. (15 points)

4. The following shows a review of a hotel:
客房古老,面積不大,不過景觀很好,可以看見秦淮河。
(The rooms are old and not large, but the view is very good; you can see the Qinhuai River.)
The words "客房" (room), "面積" (size), and "景觀" (view) are aspect terms. In contrast, the words "古老" (old), "大" (large), and "好" (good) are opinion words, which modify aspect terms and show the polarity toward the aspect. In some cases, only opinion words are used in a review, while aspect terms are absent (i.e., implicit aspects).
In the sentence "這是千萬畫素裡最便宜的一台" ("This is the cheapest one among the ten-megapixel models"), we know the opinion word "便宜" (cheap) modifies an implicit aspect term "價錢" (price). Given a hotel review corpus, please propose a method to find the collocations of opinion words and aspect terms, and use the findings to deal with the implicit aspect problem. (10 points)

5. One application of a language model is to estimate the probability of the next word given the previous n-1 words. Please compare how the traditional language model and the neural probabilistic language model deal with this problem. (10 points)

6. In training an HMM, we need to compute the number of times each individual arc (link) is traversed for a training instance. How can we compute this number for each arc efficiently without enumerating all the paths? (10 points)

7. (a) What is the zero probability problem? (5 points)
(b) What is the major problem of Laplace smoothing? (5 points)
(c) How does Kneser-Ney smoothing work to deal with the zero probability problem? (10 points)
(d) In traditional language modeling, a smoothing technique is introduced to avoid the zero probability problem. In distributed representation, we associate each word in the vocabulary with a dense distributed vector. Similar words (semantically and syntactically) will be close in the embedding space. Is it necessary to introduce a smoothing technique into a neural probabilistic language model? Please explain why. You can use the following examples to explain your answer. (10 points)
The cat is walking in the bedroom.
A dog was running in a room.

8. In analogy analysis, two pairs of words which share a relation are given. We aim at identifying a hidden word based on the three other words. Word embeddings are shown to be powerful in this application. Please present three similarity computation methods to find the hidden word. (10 points)

--
※ Posted from: 批踢踢實業坊 (ptt.cc), from: 140.112.16.131
※ Article URL: https://www.ptt.cc/bbs/NTU-Exam/M.1492659691.A.0A2.html
※ Edited by: kevin1ptt (140.112.16.131), 04/20/2017 11:46:31
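One common way to approach the collocation method asked for in question 4 is pointwise mutual information (PMI) between aspect terms and opinion words. A minimal sketch, assuming a hypothetical list of (aspect, opinion) pairs already extracted from segmented reviews — the pair list and the extraction step are illustrative, not part of the exam:

```python
import math
from collections import Counter

# Hypothetical (aspect_term, opinion_word) pairs extracted from a mini-corpus
# of segmented hotel reviews, e.g. via POS/dependency patterns.
pairs = [
    ("客房", "古老"), ("面積", "大"), ("景觀", "好"),
    ("客房", "大"), ("價錢", "便宜"), ("價錢", "便宜"),
    ("景觀", "好"), ("客房", "乾淨"),
]

pair_count = Counter(pairs)
aspect_count = Counter(a for a, _ in pairs)
opinion_count = Counter(o for _, o in pairs)
total = len(pairs)

def pmi(aspect, opinion):
    """PMI(aspect, opinion) = log2 P(a, o) / (P(a) P(o))."""
    c = pair_count[(aspect, opinion)]
    if c == 0:
        return float("-inf")  # never collocated in the corpus
    return math.log2((c / total) /
                     ((aspect_count[aspect] / total) * (opinion_count[opinion] / total)))

def infer_aspect(opinion):
    """For an implicit aspect, back off to the aspect with highest PMI."""
    return max(aspect_count, key=lambda a: pmi(a, opinion))
```

With these toy counts, `infer_aspect("便宜")` recovers "價錢", matching the exam's example of the implicit aspect behind "便宜".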
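For questions 5 and 7(a)-(b), the zero probability problem and the drawback of Laplace smoothing show up concretely on the two example sentences given in 7(d). A sketch of a count-based bigram model (the sentence-boundary tokens and lowercasing are illustrative choices):

```python
from collections import Counter

# Toy corpus: the two example sentences from question 7(d).
corpus = [
    "<s> the cat is walking in the bedroom </s>".split(),
    "<s> a dog was running in a room </s>".split(),
]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(bg for sent in corpus for bg in zip(sent, sent[1:]))
V = len(unigrams)  # vocabulary size (13 types here)

def p_mle(w, prev):
    """Maximum-likelihood bigram estimate: count(prev w) / count(prev)."""
    return bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0

def p_laplace(w, prev):
    """Add-one (Laplace) smoothing: (count + 1) / (count(prev) + V)."""
    return (bigrams[(prev, w)] + 1) / (unigrams[prev] + V)

# Zero probability problem: "the dog" never occurs, so MLE assigns it 0,
# which zeroes out the probability of any sentence containing it.
# Laplace's major problem: the 11 unseen successors of "the" now absorb
# 11/15 of the mass, while the observed "the cat" drops from 1/2 to 2/15 —
# far too much probability is moved to unseen events.
```

A neural probabilistic language model needs no such fix: "dog" and "cat" get nearby embeddings, so contexts seen with one generalize to the other, which is the point of question 7(d).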
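Question 6 is the classic forward-backward (Baum-Welch E-step) computation: the expected number of times arc i→j is traversed is Σ_t α_t(i)·a_ij·b_j(o_{t+1})·β_{t+1}(j) / P(O), which avoids enumerating the N^T paths. A sketch on a hypothetical 2-state HMM (all parameter values are made up for illustration), checked against brute-force path enumeration:

```python
import itertools

# Hypothetical 2-state, 3-symbol HMM.
A  = [[0.7, 0.3], [0.4, 0.6]]            # transition probabilities a_ij
B  = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]  # emission probabilities b_j(k)
pi = [0.6, 0.4]                          # initial state distribution
obs = [0, 1, 2, 1]                       # one training instance
N, T = len(pi), len(obs)

# Forward variables: alpha[t][j] = P(o_1..o_t, state_t = j)
alpha = [[0.0] * N for _ in range(T)]
for j in range(N):
    alpha[0][j] = pi[j] * B[j][obs[0]]
for t in range(1, T):
    for j in range(N):
        alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]

# Backward variables: beta[t][i] = P(o_{t+1}..o_T | state_t = i)
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    for i in range(N):
        beta[t][i] = sum(A[i][j] * B[j][obs[t+1]] * beta[t+1][j] for j in range(N))

P_obs = sum(alpha[T-1][j] for j in range(N))  # P(O): total observation probability

# Expected traversal count of arc i -> j in O(N^2 T) time.
exp_count = [[sum(alpha[t][i] * A[i][j] * B[j][obs[t+1]] * beta[t+1][j]
                  for t in range(T - 1)) / P_obs
              for j in range(N)] for i in range(N)]

# Sanity check: the same quantity by enumerating all N**T state paths.
brute = [[0.0] * N for _ in range(N)]
for path in itertools.product(range(N), repeat=T):
    p = pi[path[0]] * B[path[0]][obs[0]]
    for t in range(1, T):
        p *= A[path[t-1]][path[t]] * B[path[t]][obs[t]]
    for t in range(1, T):
        brute[path[t-1]][path[t]] += p
brute = [[brute[i][j] / P_obs for j in range(N)] for i in range(N)]
```

The dynamic-programming version costs O(N²T) versus O(N^T) for enumeration, which is the efficiency the question asks about.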
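For question 8, three standard similarity computations for solving a : a* :: b : ? are 3CosAdd, 3CosMul, and PairDirection. A sketch on tiny hand-picked vectors — the embeddings below are fabricated for illustration; real ones would come from word2vec/GloVe training:

```python
import math

def cos(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v)))

# Toy embeddings, hand-picked so that woman - man ≈ queen - king.
emb = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.5, 0.9, 0.2],
    "woman":  [0.5, 0.2, 0.9],
    "prince": [0.8, 0.9, 0.2],
    "apple":  [0.1, 0.1, 0.1],
}

def candidates(a, a_star, b):
    # The three given words are excluded from the answer set.
    return [w for w in emb if w not in {a, a_star, b}]

def three_cos_add(a, a_star, b):
    """argmax_d cos(d, a* - a + b): nearest word to the offset vector."""
    target = [emb[a_star][k] - emb[a][k] + emb[b][k] for k in range(3)]
    return max(candidates(a, a_star, b), key=lambda w: cos(emb[w], target))

def three_cos_mul(a, a_star, b, eps=1e-3):
    """argmax_d cos(d, a*) * cos(d, b) / (cos(d, a) + eps), cosines shifted to [0, 1]."""
    s = lambda u, v: (cos(u, v) + 1) / 2
    return max(candidates(a, a_star, b),
               key=lambda w: s(emb[w], emb[a_star]) * s(emb[w], emb[b])
                             / (s(emb[w], emb[a]) + eps))

def pair_direction(a, a_star, b):
    """argmax_d cos(d - b, a* - a): compare the two pair offsets directly."""
    offset = [emb[a_star][k] - emb[a][k] for k in range(3)]
    return max(candidates(a, a_star, b),
               key=lambda w: cos([emb[w][k] - emb[b][k] for k in range(3)], offset))
```

On these vectors all three methods answer man : woman :: king : queen; on real embeddings the three rankings can differ, which is why they count as distinct similarity computations.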