Course name: Introduction to Digital Speech Processing (數位語音處理概論)
Course type: Elective
Instructor: Prof. Lin-shan Lee (李琳山)
College: College of Electrical Engineering and Computer Science
Departments: Electrical Engineering; Graduate Institute of Computer Science and Information Engineering; Graduate Institute of Networking and Multimedia
Exam date (Y/M/D): 2018/11/28
Time limit (minutes): 120

Exam questions:

Introduction to Digital Speech Processing, Midterm Exam
Nov. 28, 2018, 10:00-12:00

● OPEN Lecture Power Point (Printed Version) and Personal Notes
● You have to use CHINESE sentences to answer all the questions, but you can use English terminology
● Total points: 100

1. Take a look at the block diagram of a speech recognition system in Figure 1.

   https://i.imgur.com/9snxD8J.png
   Figure 1: A Speech Recognition System

   (a) In the block of front-end processing, why do we use the filter-bank? (4%)
   (b) Explain the roles of the acoustic models, lexicon, and language model in Figure 1. (12%)
   (c) Why do we need smoothing in the language model? (2%)
   (d) Which part includes the HMM-GMM? (2%)

2. Given an HMM $\lambda = (A, B, \pi)$ with $N$ states, an observation sequence $O = o_1 o_2 \ldots o_t \ldots o_T$ and a state sequence $q = q_1 q_2 \ldots q_t \ldots q_T$, define

   $\alpha_t(i) = \mathrm{Prob}[o_1 o_2 \ldots o_t, q_t = i \mid \lambda]$
   $\beta_t(i) = \mathrm{Prob}[o_{t+1} o_{t+2} \ldots o_T \mid q_t = i, \lambda]$

   (a) What is $\sum_{i=1}^{N} \alpha_t(i) \beta_t(i)$? Show your results. (4%)
   (b) What is $\frac{\alpha_t(i) \beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j) \beta_t(j)}$? Show your results. (4%)
   (c) What is $\alpha_t(i) \, a_{ij} \, b_j(o_{t+1}) \, \beta_{t+1}(j)$? Show your results. (4%)
   (d) Formulate and describe the Viterbi algorithm to find the best state sequence $q^* = q_1^* q_2^* \ldots q_t^* \ldots q_T^*$ giving the highest probability $\mathrm{Prob}[O, q^* \mid \lambda]$. Explain how it works and why backtracking is necessary. (4%)

3. Explain what a tree lexicon is and why it is useful in speech recognition. (8%)

4. (a) Given a discrete-valued random variable $X$ with probability distribution

       $\{p_i = \mathrm{Prob}(X = x_i),\ i = 1, 2, 3, \ldots, M\}$, $\sum_{i=1}^{M} p_i = 1$,

       explain the meaning of $H(X) = -\sum_{i=1}^{M} p_i [\log(p_i)]$. (4%)
   (b) Explain why and how $H(X)$ above can be used to select the criterion to split a node into two in developing a decision tree. (4%)

5. (a) What is the perplexity of a language source? (4%)
   (b) What is the perplexity of a language model with respect to a corpus? (4%)
   (c) How are they related to a "virtual vocabulary"? (4%)

6. Please answer the following questions.
   (a) Explain what a triphone is and why it is useful. (4%)
   (b) Explain why and how the unseen triphones can be trained using decision trees. (4%)

7. What is the prosody of speech signals? How is it related to text-to-speech synthesis? (6%)

8. Explain why and how beam search and two-pass search are useful in large vocabulary continuous speech recognition. (8%)

9. Briefly describe the LBG algorithm and the K-means algorithm. Which of the two usually performs better? (Explain your answer with descriptions, not formulas alone.) (8%)

10. Homework problems (You can choose either HW2-1 or HW2-2 to answer.)

    HW2-1
    (a) We added the sp and sil models in HW2-1. How can they be used in digit recognition? (2%)
    (b) Write down two methods to improve the baseline of the digit recognizer and explain the reason. (4%)

    HW2-2
    (a) Why do we use Right-Context-Dependent Initial/Final to label? (2%)
    (b) What characteristics can we use to help distinguish the Initials and Finals? (4%)

--
※ Posted from: 批踢踢實業坊 (ptt.cc), from: 140.112.249.5
※ Article URL: https://www.ptt.cc/bbs/NTU-Exam/M.1543726655.A.BED.html
rod24574575 : Added to the CSIE department digest! 12/02 13:30