Course name: Introduction to Digital Speech Processing (數位語音處理概論)
Course type: Elective
Instructor: 李琳山 (Lin-shan Lee)
College: College of Electrical Engineering and Computer Science
Department: Department of Computer Science and Information Engineering
Exam date (Y.M.D): 2005.12.2
Time limit (minutes): 120
Eligible for the exam-upload reward: Yes
(If not explicitly stated, no reward will be given.)
Exam questions:
#OPEN EVERYTHING
#Except for technical terms, which may be written in English, all explanations
 must be in Chinese; answers not written in Chinese receive no credit.
#Total points: 120, Time allocation: 1 point/min
-----------------------------------------------------------------------
1.(10)Explain the concept of "Corpus-based Text-to-Speech Synthesis": how it
works and why it is good.
2.(20)Given an HMM λ = (A, B, π), an observation sequence O = o_1 o_2 ... o_t ... o_T
and a state sequence q = q_1 q_2 ... q_t ... q_T, define
    α_t(i) = Prob[o_1 o_2 ... o_t, q_t = i | λ]
    β_t(i) = Prob[o_{t+1} o_{t+2} ... o_T | q_t = i, λ]
(a)(10)Let γ_t(i) = α_t(i)β_t(i) / Σ_{j=1..N} α_t(j)β_t(j). Explain what
γ_t(i) is, where N is the total number of states.
(b)(10)Formulate and describe the Viterbi algorithm to find the best state
sequence q* = q_1* q_2* ... q_t* ... q_T* giving the highest probability
Prob(q*, O | λ). Explain how it works.
3.(10)What is the LBG algorithm and why is it better than the K-means algorithm?
4.(10)Explain why and how the unseen triphones can be trained using decision
trees.
5.(10)Explain the meaning of the perplexity of a language model with respect
to a testing corpus.
6.(10)Explain the principles and procedures of estimating the probabilities
for unseen events in Katz smoothing.
7.(10)
(a)(5)What are voiced and unvoiced speech signals, and what are their
time-domain waveform characteristics?
(b)(5)What is pitch in speech signals and how is it related to the tones in
Mandarin Chinese?
8.(10)Write down what you know about the techniques for speech end point
detection.
9.(10)Explain how the tree lexicon can be used in the search algorithm for
large vocabulary continuous speech recognition and how it is helpful.
10.(20)Write down anything you learned about the following subjects which
were NOT mentioned in the class. Don't write anything mentioned in the class.
(a)Conversational interfaces
(b)Search problem/algorithms for large vocabulary continuous speech
recognition
(sol)
2-a
α_t(i) is the probability, given the model λ, of observing o_1 o_2 ... o_t and
being in state i at time t.
β_t(i) is the probability, given the model λ and that the state at time t is i,
of observing the remaining o_{t+1} o_{t+2} ... o_T. Then

    γ_t(i) = α_t(i)β_t(i) / Σ_{j=1..N} α_t(j)β_t(j) = P(O, q_t = i | λ) / P(O | λ)

so γ_t(i) is the probability of being in state i at time t, given the
observation sequence O = o_1 o_2 ... o_T and the model λ.
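The computation of γ_t(i) from the forward and backward variables can be
sketched as follows; the 2-state HMM parameters and the observation sequence
below are invented purely for illustration.

```python
A = [[0.7, 0.3],       # A[i][j] = P(q_{t+1}=j | q_t=i)
     [0.4, 0.6]]
B = [[0.9, 0.1],       # B[i][k] = P(o_t=k | q_t=i)
     [0.2, 0.8]]
pi = [0.6, 0.4]        # initial state distribution
O = [0, 1, 0]          # toy observation sequence
N, T = 2, len(O)

# Forward: alpha[t][i] = P(o_1..o_t, q_t=i | lambda)
alpha = [[pi[i] * B[i][O[0]] for i in range(N)]]
for t in range(1, T):
    alpha.append([sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
                  for j in range(N)])

# Backward: beta[t][i] = P(o_{t+1}..o_T | q_t=i, lambda); beta[T-1][i] = 1
beta = [[1.0] * N for _ in range(T)]
for t in range(T - 2, -1, -1):
    for i in range(N):
        beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j]
                         for j in range(N))

# gamma[t][i] = alpha[t][i]*beta[t][i] / sum_j alpha[t][j]*beta[t][j]
gamma = [[alpha[t][i] * beta[t][i] /
          sum(alpha[t][j] * beta[t][j] for j in range(N)) for i in range(N)]
         for t in range(T)]
for t in range(T):
    print(gamma[t])
```

Note that the denominator Σ_j α_t(j)β_t(j) equals P(O | λ) at every t, which is
why γ_t(i) sums to 1 over the states at each time step.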
2-b
Define a new variable δ_t(i) as the highest probability over all single paths
that reach state i at time t. Formally,

    δ_t(i) = max_{q_1, q_2, ..., q_{t-1}} P[q_1, q_2, ..., q_{t-1}, q_t = i, o_1, o_2, ..., o_t | λ]

which yields the recursion

    δ_{t+1}(j) = [max_i δ_t(i) a_ij] · b_j(o_{t+1})

The Viterbi algorithm applies this recursion with dynamic programming to find
the best score quickly, recording at each step which predecessor state achieved
the maximum, and then recovers the best path q* by backtracking through these
records.
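The recursion and backtracking above can be sketched as follows (the toy HMM
parameters and observation sequence are illustrative, not from the exam):

```python
A = [[0.7, 0.3], [0.4, 0.6]]   # transition probabilities a_ij
B = [[0.9, 0.1], [0.2, 0.8]]   # observation probabilities b_j(o)
pi = [0.6, 0.4]                # initial state distribution
O = [0, 1, 0]                  # toy observation sequence
N, T = 2, len(O)

# delta[t][j]: highest probability of any single path ending in state j at t
# psi[t][j]:  the argmax predecessor state, kept for backtracking
delta = [[pi[i] * B[i][O[0]] for i in range(N)]]
psi = [[0] * N]
for t in range(1, T):
    row_d, row_p = [], []
    for j in range(N):
        best_i = max(range(N), key=lambda i: delta[t-1][i] * A[i][j])
        row_p.append(best_i)
        row_d.append(delta[t-1][best_i] * A[best_i][j] * B[j][O[t]])
    delta.append(row_d)
    psi.append(row_p)

# Backtracking: start from the best final state, follow psi backwards
q = [max(range(N), key=lambda j: delta[T-1][j])]
for t in range(T - 1, 0, -1):
    q.append(psi[t][q[-1]])
q.reverse()
print(q, max(delta[T-1]))      # best state sequence and its probability
```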
4
A decision tree can classify data effectively by asking a sequence of questions
about features, using entropy as the splitting criterion.
In triphone modeling, many events never appear in the training data (unseen
events), which makes training difficult. Using acoustic-phonetic knowledge, we
can classify the various triphones so that triphones that are close in
articulation are grouped together; an unseen triphone can then obtain suitable
data from similar triphones to estimate its model.
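A minimal sketch of the entropy-based splitting criterion mentioned above; the
triphones, cluster labels, and the question `is_nasal_right` are all invented
for illustration (a real system would use a large set of phonetic questions):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

# Each item: (triphone in left-center+right notation, acoustic cluster label)
data = [("a-b+n", 0), ("a-b+m", 0), ("i-b+n", 0), ("u-b+a", 1), ("o-b+e", 1)]

def is_nasal_right(tri):
    # hypothetical question: is the right context a nasal (n or m)?
    return tri.split("+")[1] in ("n", "m")

yes = [lbl for tri, lbl in data if is_nasal_right(tri)]
no = [lbl for tri, lbl in data if not is_nasal_right(tri)]
parent = [lbl for _, lbl in data]

# Information gain = entropy before split - weighted entropy after split;
# the tree greedily picks the question with the largest gain at each node.
gain = entropy(parent) - (len(yes) * entropy(yes)
                          + len(no) * entropy(no)) / len(parent)
print(round(gain, 3))
```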
6
Katz smoothing is derived from Good-Turing smoothing. Good-Turing shifts
probability mass from higher-frequency events toward lower-frequency events,
but this implicitly assumes that events of even higher frequency always exist;
when that assumption fails in practice, the probability mass of the
highest-frequency events is lost. Moreover, high-frequency events are usually
reliable, so Katz smoothing only smooths events whose frequency is below a
chosen threshold and leaves events above the threshold unchanged. The counts of
events below the threshold are discounted, and the freed probability mass is
distributed to the unseen events following the Good-Turing estimate.
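A minimal numeric sketch of the Good-Turing count reestimate
r* = (r+1)·n_{r+1}/n_r that Katz smoothing applies only below a threshold k;
the toy corpus and k are invented, and as a simplification a count is left
unchanged when no higher-count class exists (exactly the failure case noted
above):

```python
from collections import Counter

corpus = "a a a a a b b b c c d d e f g h i j".split()
counts = Counter(corpus)         # raw word counts
n = Counter(counts.values())     # n[r] = number of word types seen r times
k = 3                            # Katz-style threshold: trust counts above k

def adjusted_count(r):
    """Good-Turing reestimate r* = (r+1) * n_{r+1} / n_r, below threshold."""
    if r > k or n.get(r + 1, 0) == 0:
        return float(r)          # trusted (or no higher-count class): keep r
    return (r + 1) * n[r + 1] / n[r]

for word in sorted(counts):
    print(word, counts[word], round(adjusted_count(counts[word]), 3))
```

The total count mass freed by the discounts (Σ_r n_r·(r − r*)) is what gets
redistributed over the unseen events.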
7-a
Voiced signals: sounds produced with vibration of the vocal folds. Their
time-domain waveform shows a roughly fixed shape repeating periodically.
Unvoiced signals: sounds produced without vocal-fold vibration. Their
time-domain waveform shows no repeating fixed shape.
7-b
Pitch is the perceived height of a sound: the higher the fundamental frequency,
the higher the pitch. In Mandarin Chinese, tones are produced by varying the
pitch: for example, tone 1 keeps the pitch level, tone 2 rises from low to
high, and tone 3 falls and then rises again.
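One common way to make the repeating period of a voiced frame visible is
autocorrelation pitch estimation; a sketch on a synthetic 200 Hz "voiced" frame
(the sample rate, F0, and search range are all illustrative):

```python
import math

sr = 8000                     # sample rate (Hz)
f0 = 200.0                    # true pitch of the synthetic frame
# Quasi-periodic "voiced" frame: fundamental plus one harmonic
frame = [math.sin(2 * math.pi * f0 * t / sr) +
         0.5 * math.sin(2 * math.pi * 2 * f0 * t / sr)
         for t in range(400)]

def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag."""
    return sum(x[t] * x[t + lag] for t in range(len(x) - lag))

# Search lags corresponding to 50-400 Hz; the peak gives the pitch period.
lo, hi = sr // 400, sr // 50
best_lag = max(range(lo, hi + 1), key=lambda lag: autocorr(frame, lag))
print(sr / best_lag)          # estimated F0 in Hz
```

An unvoiced (noise-like) frame would show no comparable peak in this lag
range, which is also why autocorrelation helps with voiced/unvoiced decisions.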
9
In a tree lexicon, each acoustic unit of each word is treated as a node, so
words sharing the same initial units share the same nodes. During recognition,
for each node we only need to keep the best path reaching it to know the
highest probability of arriving at that node. This greatly reduces memory
usage and speeds up the search; however, because part of the information is
discarded, the search is no longer guaranteed to find the optimal solution.
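The node sharing in a tree lexicon can be sketched with a simple trie; here
letters stand in for acoustic units and the word list is invented:

```python
words = ["star", "start", "stop", "spin"]

def build_trie(words):
    """Tree lexicon: words sharing a prefix of units share the same nodes."""
    root = {}
    for w in words:
        node = root
        for unit in w:
            node = node.setdefault(unit, {})
        node["#"] = w        # '#' marks a word end at this node
    return root

def count_nodes(node):
    """Number of unit nodes in the trie (word-end markers excluded)."""
    return sum(1 + count_nodes(child)
               for unit, child in node.items() if unit != "#")

trie = build_trie(words)
flat = sum(len(w) for w in words)   # nodes in a linear (flat) lexicon
print(flat, count_nodes(trie))      # the trie needs far fewer nodes
```

Because "star", "start", and "stop" share the prefix "st", the search evaluates
those unit nodes only once instead of once per word.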
--
※ Origin: PTT BBS (ptt.cc)
◆ From: 218.167.77.218