Course name: Introduction to Digital Speech Processing
Course type: Elective
Instructor: 李琳山
College: College of Electrical Engineering and Computer Science
Departments: Electrical Engineering; Computer Science and Information Engineering
Exam date (yyyy.mm.dd): 2006.12.15
Time limit (minutes): 120
Posting reward requested: Yes (if not explicitly stated, no reward will be issued)

Exam:

Digital Speech Processing, Midterm
Dec. 15, 2006, 10:10-12:10

● OPEN EVERYTHING
● Except for technical terms, which may be written in English, all explanations must be given in Chinese; answers not written in Chinese receive no credit.
● Total points: 165
● Note that you don't need to be able to answer all the questions.

───────────────────────────────────────

1. (10) Explain the concept of "Corpus-based Text-to-Speech Synthesis", how it works and why it is good.

2. (25) Given an HMM λ = (A, B, π), an observation sequence Ō = o_1 o_2 ... o_t ... o_T and a state sequence q̄ = q_1 q_2 ... q_t ... q_T,

   (a) (10) Formulate and describe the forward algorithm to evaluate P(Ō | λ). Explain how it works.

   (b) (10) Formulate and describe the Viterbi algorithm to find the best state sequence q̄* = q_1* q_2* ... q_t* ... q_T* giving the highest probability P(q̄*, Ō | λ). Explain how it works.

   (c) (5) Now, in order to recognize L words w_1, w_2, ..., w_L, each with an HMM λ_1, λ_2, ..., λ_L respectively, it is well known that one can use either the forward algorithm or the Viterbi algorithm,

       arg max_k P(Ō | λ_k)    or    arg max_k P(q̄*, Ō | λ_k).

       Explain why, and discuss the difference between them.

3. (10) Write down the procedure of the LBG algorithm and discuss why and how it is better than the K-means algorithm.

4. (10) Explain how, in designing the decision tree for training tri-phone models, information theory is used to split a node n into two nodes a and b.

5. (10) In Classification and Regression Trees (CART), one can use composite questions instead of simple questions only. Write down what you know about this.

6. (10) The perplexity of a language source S is

       PP(S) = 2^{H(S)},   H(S) = -Σ_i p(x_i) log p(x_i),

   where x_i is a word in the language. Explain why PP(S) is an estimate of the branching factor of the language, assuming a "virtual vocabulary".

7. (10) Explain the detailed principles and process of Katz smoothing.

8. (10) Given a set of events {x_i, i = 1, 2, ..., M}, let {p(x_i), i = 1, 2, ..., M} and {q(x_i), i = 1, 2, ..., M} be two probability distributions. What is the Kullback-Leibler (KL) distance between p(x_i) and q(x_i), and what does it mean?

9. (10)
   (a) (5) What are voiced/unvoiced speech signals, and what are their time-domain waveform characteristics?
   (b) (5) What is pitch in speech signals, and how is it related to the tones of Mandarin Chinese?

10. (10) The Hamming window has much lower sidelobes but a wider mainlobe compared to the rectangular window. Why is it good for front-end feature extraction in speech recognition?

11. (10) For large-vocabulary continuous speech recognition, explain how the Viterbi algorithm can be performed such that knowledge from the acoustic models, the lexicon and the language model is efficiently integrated.

12. (15) Under what condition is a heuristic search admissible? Show or explain why.

13. (15)
    (a) (8) Explain why Maximum Likelihood Linear Regression (MLLR) approaches can adjust a set of speaker-independent acoustic models to a new speaker with a very limited quantity of adaptation data, but the performance saturates at a relatively low accuracy.
    (b) (7) Explain why tree-structured classes can be helpful here.
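A minimal NumPy sketch of the forward and Viterbi recursions asked about in question 2(a) and 2(b), on an invented toy discrete-observation HMM; the values of A, B, π and the observation sequence below are made up purely for illustration.

import numpy as np

# Toy discrete-observation HMM (all values invented for illustration).
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])          # A[i, j] = P(q_{t+1} = j | q_t = i)
B  = np.array([[0.5, 0.4, 0.1],
               [0.1, 0.3, 0.6]])     # B[i, k] = P(o_t = k | q_t = i)
pi = np.array([0.6, 0.4])            # initial state probabilities
O  = [0, 1, 2, 1]                    # observation sequence o_1 ... o_T

# Forward algorithm: alpha_t(i) = P(o_1 ... o_t, q_t = i | lambda)
alpha = pi * B[:, O[0]]
for o in O[1:]:
    alpha = (alpha @ A) * B[:, o]    # sum over previous states, then emit o
print("P(O|lambda) =", float(alpha.sum()))

# Viterbi algorithm: delta_t(i) = best single-path probability ending in state i
delta, backptr = pi * B[:, O[0]], []
for o in O[1:]:
    trans = delta[:, None] * A       # trans[i, j] = delta_{t-1}(i) * A[i, j]
    backptr.append(trans.argmax(axis=0))
    delta = trans.max(axis=0) * B[:, o]
q_star = [int(delta.argmax())]       # backtrack the best state sequence q*
for bp in reversed(backptr):
    q_star.append(int(bp[q_star[-1]]))
q_star.reverse()
print("q* =", q_star, " P(q*, O|lambda) =", float(delta.max()))

The forward pass sums over all state sequences while the Viterbi pass keeps only the best one, which is also the difference asked about in 2(c).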
14. (10) In Latent Semantic Analysis, the elements w_ij of the word-document matrix W̄ are

        w_ij = (1 - ε_i) · c_ij / n_j,

    where c_ij is the number of times the word w_i occurs in the document d_j, n_j is the total number of words in d_j, and

        ε_i = -(1 / log N) Σ_{j=1}^{N} (c_ij / t_i) log(c_ij / t_i),   t_i = Σ_{j=1}^{N} c_ij,

    where N is the total number of documents. Explain the meaning of all these parameters.
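A minimal NumPy sketch of the entropy weighting defined in question 14, computing t_i, n_j, ε_i and w_ij from an invented toy count matrix; the counts c_ij below are made up purely for illustration.

import numpy as np

# Toy word-document count matrix: c[i, j] = c_ij, count of word w_i in document d_j
# (all counts invented for illustration).
c = np.array([[3., 0., 1., 2.],
              [1., 1., 1., 1.],
              [0., 4., 0., 0.]])
M, N = c.shape                        # M words, N documents

t = c.sum(axis=1)                     # t_i = total count of word w_i in the whole corpus
n = c.sum(axis=0)                     # n_j = total number of words in document d_j

# Normalized entropy eps_i = -(1/log N) * sum_j (c_ij/t_i) log(c_ij/t_i), with 0 log 0 = 0
p   = c / t[:, None]
eps = -(p * np.log(np.where(p > 0, p, 1.0))).sum(axis=1) / np.log(N)

# Entropy-weighted, document-length-normalized matrix: w_ij = (1 - eps_i) * c_ij / n_j
W = (1.0 - eps)[:, None] * c / n[None, :]
print(np.round(W, 3))

A word concentrated in a few documents gets ε_i close to 0 and keeps most of its weight, while a word spread evenly over all documents gets ε_i close to 1 and is weighted down to nearly 0.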