Course name: Machine Learning
Course type: Elective, Department of Electrical Engineering
Instructor: 吳沛遠
College: College of Electrical Engineering and Computer Science
Department: Department of Electrical Engineering
Exam date (Y/M/D): 2018/12/28
Exam questions:
The exam consists of 9 problems, for a total of 105 points plus 10 bonus points. In this
exam we denote
● Sigmoid function: $\sigma(z) = \frac1{1+e^{-z}}$. You may use the approximation
$\sigma(z) \approx \begin{cases}0,&\text{if }z\le-10,\\1,&\text{if }z\ge10.
\end{cases}$
● Sign function: $\operatorname{sgn}(z) = \begin{cases}1,&\text{if }z>0,\\0,
&\text{if }z=0,\\-1,&\text{if }z<0.\end{cases}$
● Unless otherwise specified, all logarithms refer to the natural log, i.e., $\log_e$.
Problem 1: (20 pts) Multiple Selection (wrong selections are penalized; the penalty will
not reduce the score of this problem below zero)
Please answer the following multiple selection questions. Wrong selections will
result in deducted points. No derivation required.
(1) Suppose you are using a hard margin linear SVM classifier on a 2-class classifi-
cation problem. You have been given the following data, in which some points are
circled with dashed lines, representing the support vectors.
https://i.imgur.com/0YlXylk.png
\begin{tikzpicture}
\def\ok{
-2.3/2.6, -1.8/1.9, -1.7/1, -1.6/2.4, -1.3/3.2,
-1.2/1.8, -1/3.9, -.9/2.5, -.7/1.4, -.5/2.2,
-.5/3.4, -.4/2.9, .2/2.6, .2/3.1, .7/3.3
}
\def\ng{
.7/0, .8/.6, 1.1/-.3, 1.2/.8, 1.3/.3,
1.6/1.1, 1.8/-.1, 1.9/.7, 2/1.7, 2.4/1.2,
2.4/2.1, 2.5/.6, 2.7/1.7, 3.1/2.2
}
\foreach \x/\y in \ok{
\draw (\x, \y) circle(1pt);
}
\foreach \x/\y in \ng{
\filldraw (\x, \y) circle(1pt);
}
\def\sx{{0, .2, 1.4}}
\def\sy{{1.7, .5, 1.8}}
\draw (\sx[0], \sy[0]) circle(1pt);
\draw[densely dotted] (\sx[0], \sy[0]) circle(2pt);
\foreach \i in {1, 2}{
\filldraw (\sx[\i], \sy[\i]) circle(1pt);
\draw[densely dotted] (\sx[\i], \sy[\i]) circle(2pt);
}
\foreach \i in {0, 1, 2}{
\draw[very thin, -stealth] (1.3, 3) -- (\sx[\i], \sy[\i]);
}
\draw (2, 3) node[fill=white]{\footnotesize support vectors};
\end{tikzpicture}
(A) Removing any dash-circled points from the data will change the decision
boundary.
(B) Removing any dash-circled points from the data will not change the de-
cision boundary.
(C) Removing any non-dash-circled points from the data will change the de-
cision boundary.
(D) Removing any non-dash-circled points from the data will not change the
decision boundary.
(E) Removing all non-dash-circled points from the data will not change the
decision boundary.
(2) If we increase the parameter C in a soft margin linear SVM classifier, what will
happen?
(A) The training error decreases.
(B) The training error increases.
(C) The margin decreases.
(D) The margin increases.
(E) The testing error decreases.
(3) Suppose you are using a kernel SVM on a 2-class classification problem, where
the data points are distributed on the x-y plane (i.e., data points are 2
dimensional). Suppose we choose the kernel function as $k((x, y), (x', y'))
= (xx'+yy')^2$. Which of the following decision boundaries, as described by
the equation f(x, y) = 0, are possible?
(A) f(x, y) = x + y.
(B) $f(x, y) = x^2 + y^2$.
(C) $f(x, y) = (x+y)^2$.
(D) $f(x, y) = (x-1)^2 + 3(y+2)^2$.
(E) $f(x, y) = x^2 - 4y$.
(4) Suppose you are using a kernel SVM on a 2-class classification problem, where
the data points are distributed on the x-y plane (i.e., data points are 2
dimensional). Suppose we choose the kernel function as $k((x, y), (x', y'))
= (1+xx'+yy')^2$. Which of the following decision boundaries, as described
by the equation f(x, y) = 0, are possible?
(A) f(x, y) = x + y.
(B) $f(x, y) = x^2 + y^2$.
(C) $f(x, y) = (x+y)^2$.
(D) $f(x, y) = (x-1)^2 + 3(y+2)^2$.
(E) $f(x, y) = x^2 - 4y$.
(5) Suppose an SVM classifier is trained from the data set $\{(\bm x_i, y_i)
\}_{i=1}^N$, where $y_i \in \{+1, -1\}$ denotes the labels, and the classi-
fier classifies $\bm x$ as positive if $f(\bm x) = \bm w^T\bm x+b \ge 0$.
The primal problem for solving w is given by
\begin{tabular}{ll}
Minimize & $\frac12\|\bm w\|^2+C\sum_{i=1}^N\xi_i$\\
Subject to & $y_i(\bm w^T\bm x_i+b)\ge1-\xi_i,\forall i=1,\ldots,N$\\
Variables & $\bm w\in\mathbb R^d,b\in\mathbb R,\xi_1,\ldots,\xi_N\ge0$
\end{tabular}
The dual problem for solving $\alpha_i$'s in $\bm w = \sum_{i=1}^N\alpha_i
y_i\bm x_i$ is given by
\begin{tabular}{ll}
Maximize & $\sum_{i=1}^N\alpha_i-\frac12\sum_{i=1}^N\sum_{j=1}^N
\alpha_i\alpha_jy_iy_j(\bm x_i^T\bm x_j)$\\
Subject to & $\sum_{i=1}^N\alpha_iy_i=0$\\
Variables & $0\le\alpha_i\le C$
\end{tabular}
Upon achieving optimality in both the primal and dual problems,
(A) If $\alpha_i > 0$ then $\xi_i > 0$.
(B) If $\xi_i > 0$ then $\alpha_i > 0$.
(C) If $\alpha_i = C$ then $\xi_i > 0$.
(D) If $\xi_i > 0$ then $\alpha_i = C$.
(E) If $\alpha_i = 0$ then $\xi_i = 0$.
(6) Suppose a neural network was trained with dropout rate p = 0.2, in the
sense that during training each neuron had probability p of passing only zero to
the subsequent neurons. After the neural network is trained, how should we modify
the weights in the neural network so that it can be applied without dropout?
(A) Multiply each weight by 0.2.
(B) Multiply each weight by 0.8.
(C) Multiply each weight by 1.2.
(D) Multiply each weight by 1.25.
(E) No modification is needed.
(7) Suppose you have an input volume of dimension 48×48×3. That is, the in-
puts are images of size 48×48 with 3 channels (RGB). How many parameters
would a single 5×5 convolutional filter have (not including bias)?
Note: Since there is just a single 5×5 convolutional filter, the output
has only 1 channel.
(A) 3
(B) 25
(C) 75
(D) 2304
(E) 6912
(8) In the context of ensemble methods, which of the following statements are
true?
(A) In bagging, the weak classifiers are independent of each other.
(B) In boosting, the weak classifiers are independent of each other.
(C) In case of under-fitting, we expect bagging to be a better remedy than
boosting.
(D) In case of over-fitting, we expect bagging to be a better remedy than boo-
sting.
(E) AdaBoost (Adaptive boosting) considers hinge loss.
(9) Which of the following are convex functions on $\mathbb R^2$?
(A) $f(u, v) = u^2 - 4uv + v^2$
(B) f(u, v) = u - 3v
(C) $f(u, v) = \log(u^2+1)$
(D) $f(u, v) = u^2 + v^2 + \max(-u, 0)$
(E) f(u, v) = sgn(u)
(10) Select all that belong to unsupervised learning algorithms.
(A) Deep auto-encoder
(B) Hierarchical Agglomerative Clustering
(C) K-means
(D) Linear regression
(E) Logistic regression
(F) Locally Linear Embedding (LLE)
(G) Principal Component Analysis (PCA)
(H) Random forest
(I) Support Vector Machine (SVM)
(J) t-Distributed Stochastic Neighbor Embedding (t-SNE)
Problem 2: (10 pts) Linear Regression
Consider the regression function $f_{\bm w}(x) = w_0 + w_1x + w_2x^2$ and the following
10 data points.
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}\hline
$i$ & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline
$x_i$ & 0.89 & 0.03 & 0.49 & 0.17 & 0.98 & 0.71 & 0.50 & 0.47 & 0.06
& 0.68\\\hline
$y_i$ & 3.03 & $-1.14$ & 0.96 & $-0.53$ & 3.90 & 2.21 & 1.09 & 0.78 & $-0.77$
& 1.97\\\hline
\end{tabular}
Find the values of $w_0, w_1, w_2$ that minimize the following loss function
\[L(\bm w) = \sum_{i=1}^{10}|y_i-f_{\bm w}(x_i)|^2\]
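A minimal numerical sketch of how this fit could be carried out, assuming NumPy is
available; the data are copied from the table above, and the fit solves the ordinary
least-squares problem for the quadratic design matrix:

import numpy as np

# Data from the table above.
x = np.array([0.89, 0.03, 0.49, 0.17, 0.98, 0.71, 0.50, 0.47, 0.06, 0.68])
y = np.array([3.03, -1.14, 0.96, -0.53, 3.90, 2.21, 1.09, 0.78, -0.77, 1.97])

# Design matrix for f_w(x) = w0 + w1*x + w2*x^2.
X = np.column_stack([np.ones_like(x), x, x**2])

# Least-squares solution minimizing sum_i |y_i - f_w(x_i)|^2.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # [w0, w1, w2]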
Problem 3: (12 pts) Fitting Single Gaussian Distribution
Denote $\mathcal N(\bm\mu, \bm\Sigma)$ as the Gaussian distribution with mean μ
and covariance matrix Σ.
(1) (2 pts) Write down the probability density function for $X \sim \mathcal N
\left(\begin{bmatrix}3\\-2\end{bmatrix}, \begin{bmatrix}2&-1\\-1&3
\end{bmatrix}\right)$.
(2) (10 pts) Suppose the following 10 data points
(4.48, 1.27), (2.36, 1.78), (4.21, -1.10), (5.42, 9.42), (3.48, -1.91),
(1.56, -2.39), (3.71, -2.97), (3.37, -1.13), (3.35, 1.04), (4.26, -1.65)
are independently generated from $\mathcal N(\bm\mu, \bm\Sigma)$. Find the
maximum likelihood estimator of the mean μ and covariance matrix Σ.
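A minimal sketch of the maximum likelihood estimates, assuming NumPy; note that the
MLE of the covariance uses the 1/N normalization rather than 1/(N-1):

import numpy as np

# Data points from the problem statement.
X = np.array([[4.48, 1.27], [2.36, 1.78], [4.21, -1.10], [5.42, 9.42], [3.48, -1.91],
              [1.56, -2.39], [3.71, -2.97], [3.37, -1.13], [3.35, 1.04], [4.26, -1.65]])

mu_hat = X.mean(axis=0)                         # MLE of the mean
Sigma_hat = np.cov(X, rowvar=False, bias=True)  # MLE of the covariance (1/N normalization)
print(mu_hat)
print(Sigma_hat)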
Problem 4: (3 pts) Cross-entropy
Let X = {braised pork rice, beef noodles, pineapple bun, sushi, vegetarian}. Consider two
probability distributions $P_X, Q_X$ as follows:
\begin{tabular}{|c|c|c|c|c|c|}\hline
$x$ & braised pork rice & beef noodles & pineapple bun & sushi & vegetarian\\\hline
$P_X(x)$ & 0.3 & 0.4 & 0.15 & 0.1 & 0.05\\\hline
$Q_X(x)$ & 0.2 & 0.1 & 0.05 & 0.25 & 0.4\\\hline
\end{tabular}
Find the cross entropy
\[H(P_X, Q_X) = \sum_{x\in X}P_X(x)\ln\left(\frac1{Q_X(x)}\right)\]
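A minimal sketch of the computation, assuming NumPy and using the natural log as
specified at the top of the exam:

import numpy as np

P = np.array([0.3, 0.4, 0.15, 0.1, 0.05])
Q = np.array([0.2, 0.1, 0.05, 0.25, 0.4])

# H(P, Q) = sum_x P(x) * ln(1 / Q(x)) = -sum_x P(x) * ln(Q(x))
H = np.sum(P * np.log(1.0 / Q))
print(H)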
Problem 5: (10 pts) Logistic regression
A group of 10 students spent various numbers of hours studying for the machine learning
(ML) exam. The following table shows the number of hours each student spent studying
and whether they passed (1) or failed (0).
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|}\hline
Hours ($X$) & 0.5 & 1 & 1.5 & 1.75 & 2.5 & 2.75 & 3.25 & 4 & 4.5 & 5\\\hline
Pass ($Y$) & 0 & 0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1\\\hline
\end{tabular}
Consider the logistic model that predicts the probability of passing the exam from the
hours spent studying:
\[P(Y = 1 \mid X = x) = \sigma(wx + b)\]
Find the cross entropy loss if we fit the data with the logistic model with parameters
w = 1.5, b = -4.
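A minimal sketch of the loss evaluation, assuming NumPy and assuming the cross entropy
is summed (not averaged) over the 10 students:

import numpy as np

hours = np.array([0.5, 1, 1.5, 1.75, 2.5, 2.75, 3.25, 4, 4.5, 5])
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

w, b = 1.5, -4.0
p = 1.0 / (1.0 + np.exp(-(w * hours + b)))  # P(Y = 1 | X = x) = sigma(wx + b)

# Binary cross entropy summed over the data points.
loss = -np.sum(passed * np.log(p) + (1 - passed) * np.log(1 - p))
print(loss)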
Problem 6: (5 pts + 10 pts Bonus) Gaussian Mixture Model and Expectation Maximi-
zation
Suppose we wish to fit the following 10 data points (distributed in 1-D space)
-12.72, -2.05, -6.56, 2.55, -1.77, 9.19, 8.85, -3.34, -3.74, 3.63
by the Gaussian mixture model $p_\theta(x)$ parameterized by $\theta = (\pi_1,
\mu_1, \sigma_1, \pi_2, \mu_2, \sigma_2)$, given as follows
\[p_\theta(x) = \pi_1(2\pi\sigma_1^2)^{-1/2}e^{-\frac{(x-\mu_1)^2}{2\sigma_1^2}}
+ \pi_2(2\pi\sigma_2^2)^{-1/2}e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}\]
Suppose the initial guess of parameter is $\theta^{(0)} = \left(\pi_1^{(0)},
\mu_1^{(0)}, \sigma_1^{(0)}, \pi_2^{(0)}, \mu_2^{(0)}, \sigma_2^{(0)}\right)
= (0.3, -1, 2, 0.7, 2, 3)$.
(a) (5 pts) Compute the log likelihood of parameter $\theta^{(0)}$.
(b) (10 pts Bonus) Apply the expectation maximization algorithm to find the next up-
date of the parameters
\[\theta^{(1)} = \left(\pi_1^{(1)}, \mu_1^{(1)}, \sigma_1^{(1)},
\pi_2^{(1)}, \mu_2^{(1)}, \sigma_2^{(1)}\right)\]
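A minimal sketch of part (a) and of one EM update for part (b), assuming NumPy and the
standard E-step/M-step formulas for a two-component 1-D Gaussian mixture:

import numpy as np

x = np.array([-12.72, -2.05, -6.56, 2.55, -1.77, 9.19, 8.85, -3.34, -3.74, 3.63])
pi1, mu1, s1, pi2, mu2, s2 = 0.3, -1.0, 2.0, 0.7, 2.0, 3.0  # theta^(0)

def gauss(x, mu, s):
    # 1-D Gaussian density with mean mu and standard deviation s.
    return np.exp(-(x - mu)**2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

# (a) Log likelihood of theta^(0).
p1, p2 = pi1 * gauss(x, mu1, s1), pi2 * gauss(x, mu2, s2)
print(np.sum(np.log(p1 + p2)))

# (b) One EM step: E-step computes responsibilities, M-step re-estimates the parameters.
r1 = p1 / (p1 + p2)
r2 = 1.0 - r1
pi1_new, pi2_new = r1.mean(), r2.mean()
mu1_new, mu2_new = np.sum(r1 * x) / np.sum(r1), np.sum(r2 * x) / np.sum(r2)
s1_new = np.sqrt(np.sum(r1 * (x - mu1_new)**2) / np.sum(r1))
s2_new = np.sqrt(np.sum(r2 * (x - mu2_new)**2) / np.sum(r2))
print(pi1_new, mu1_new, s1_new, pi2_new, mu2_new, s2_new)  # theta^(1)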
Problem 7: (14 pts) Principal Component Analysis
Consider the following 10 data points distributed in 2-D space:
(1.91, -0.11), (-2.24, -1.09), (1.36, -0.20), (0.33, 0.13), (-0.33, 0.37),
(0.00, -0.63), (-3.10, -0.47), (-0.34, 2.38), (2.43, -3.00), (-0.02, 2.62)
(a) (10 pts) Find the first and second principal axes.
(b) (4 pts) Find the first and second principal components for data point (0.96,
0.28).
Note: The data points have zero mean.
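A minimal sketch, assuming NumPy: since the data already have zero mean, the principal
axes are the eigenvectors of the sample covariance (the axes are the same for either the
1/N or 1/(N-1) normalization, up to sign), and the principal components of a point are
its projections onto those axes:

import numpy as np

X = np.array([[1.91, -0.11], [-2.24, -1.09], [1.36, -0.20], [0.33, 0.13], [-0.33, 0.37],
              [0.00, -0.63], [-3.10, -0.47], [-0.34, 2.38], [2.43, -3.00], [-0.02, 2.62]])

# (a) Eigen-decomposition of the covariance; sort eigenvectors by decreasing eigenvalue.
S = X.T @ X / len(X)              # no centering needed: the data are zero-mean
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
axes = eigvecs[:, order]          # columns = first and second principal axes (up to sign)
print(axes)

# (b) Principal components of (0.96, 0.28) = projections onto the axes.
print(axes.T @ np.array([0.96, 0.28]))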
Problem 8: (9 pts) LSTM
Consider an LSTM node as follows. Please fill out the following table in the ans-
wer sheet. No derivation required. (A simulation sketch follows the table below.)
https://i.imgur.com/YDzJH7M.png
\begin{tikzpicture}
\draw[very thick] (0, 0) circle(.5) node{\Large $c^{(t)}$};
\filldraw (0, -2) circle(.1);
\draw[-stealth] (0, -1.9) -- (0, -.5);
\draw (0, -4) circle(.5);
\draw (-.3, -4.3) -- (.3, -3.7);
\draw (.15, -4.15) node{\scriptsize $g$};
\draw (.3, -3.7) node[anchor=south west]{\scriptsize identity};
\draw[-stealth] (0, -3.5) -- (0, -2.1);
\draw (0, 2) circle(.5);
\draw (-.3, 1.7) -- (.3, 2.3);
\draw (.15, 1.85) node{\scriptsize $h$};
\draw (.3, 2.3) node[anchor=south west]{\scriptsize identity};
\draw[-stealth] (0, .5) -- (0, 1.5);
\filldraw (0, 4) circle(.1);
\draw[-stealth] (0, 2.5) -- (0, 3.9);
\draw (-3, -2) circle(.5);
\draw[domain=-6:6, samples=97, variable=\t]
plot({\t/20-3}, {.5/(1+exp(-\t))-2.25});
\draw (-2.85, -2.15) node{\scriptsize $f$};
\draw (-3, -2.5) node[anchor=north]{\scriptsize sigmoid};
\draw (-2.7, -1.7) node[anchor=south west]{\bf\scriptsize Input Gate};
\draw[-stealth] (-2.5, -2) -- (-.1, -2);
\filldraw (.9, 0) circle(.1);
\draw[-stealth] (.4, .3) to[out=0, in=120] (.84, .08);
\draw[-stealth] (.84, -.08) to[out=240, in=0] (.4, -.3);
\draw (3, 0) circle(.5);
\draw[domain=-6:6, samples=97, variable=\t]
plot({\t/20+3}, {.5/(1+exp(-\t))-.25});
\draw (3.15, -.15) node{\scriptsize $f$};
\draw (3, -.5) node[anchor=north]{\scriptsize sigmoid};
\draw (2.7, .3) node[anchor=south east]{\bf\scriptsize Forget Gate};
\draw[-stealth] (2.5, 0) -- (1, 0);
\draw (-3, 4) circle(.5);
\draw[domain=-6:6, samples=97, variable=\t]
plot({\t/20-3}, {.5/(1+exp(-\t))+3.75});
\draw (-2.85, 3.85) node{\scriptsize $f$};
\draw (-3, 3.5) node[anchor=north]{\scriptsize sigmoid};
\draw (-2.7, 3.7) node[anchor=north west]{\bf\scriptsize Output Gate};
\draw[-stealth] (-2.5, 4) -- (-.1, 4);
\draw[thick] (3.5, -4.5) node[anchor=south east]{\bf\scriptsize Block}
rectangle (-3.5, 4.5);
\filldraw[lightgray] (-.3, -5.8) rectangle (.3, -5.2);
\draw (0, -5.5) node{+};
\draw[-stealth] (0, -5.2) -- (0, -4.5);
\filldraw[lightgray] (-2.4, -7.8) rectangle (-1.8, -7.2);
\draw (-2.1, -7.5) node{$x_1^{(t)}$};
\draw[-stealth] (-2.1, -7.2) node[anchor=south]{\tiny 1} -- (0, -5.8);
\filldraw[lightgray] (-1, -7.8) rectangle (-.4, -7.2);
\draw (-.7, -7.5) node{$x_2^{(t)}$};
\draw[-stealth] (-.7, -7.2) node[anchor=south]{\tiny\color{gray}0}
-- (0, -5.8);
\filldraw[lightgray] (.4, -7.8) rectangle (1, -7.2);
\draw (.7, -7.5) node{$x_3^{(t)}$};
\draw[-stealth] (.7, -7.2) node[anchor=south]{\tiny\color{gray}0}
-- (0, -5.8);
\filldraw[gray] (1.8, -7.8) rectangle (2.4, -7.2);
\draw (2.1, -7.5) node{1};
\draw[-stealth] (2.1, -7.2) node[anchor=south]{\tiny\color{gray}0}
-- (0, -5.8);
\filldraw[lightgray] (-4.8, -2.3) rectangle (-4.2, -1.7);
\draw (-4.5, -2) node{+};
\draw[-stealth] (-4.2, -2) -- (-3.5, -2);
\filldraw[lightgray] (-6.8, -.2) rectangle (-6.2, .4);
\draw (-6.5, .1) node{$x_1^{(t)}$};
\draw[-stealth] (-6.2, .1) node[anchor=west]{\tiny\color{gray}0}
-- (-4.8, -2);
\filldraw[lightgray] (-6.8, -1.6) rectangle (-6.2, -1);
\draw (-6.5, -1.3) node{$x_2^{(t)}$};
\draw[-stealth] (-6.2, -1.3) node[anchor=west]{\tiny\color{gray}0}
-- (-4.8, -2);
\filldraw[lightgray] (-6.8, -3) rectangle (-6.2, -2.4);
\draw (-6.5, -2.7) node{$x_3^{(t)}$};
\draw[-stealth] (-6.2, -2.7) node[anchor=west]{\tiny $-100$} -- (-4.8, -2);
\filldraw[gray] (-6.8, -4.4) rectangle (-6.2, -3.8);
\draw (-6.5, -4.1) node{1};
\draw[-stealth] (-6.2, -4.1) node[anchor=west]{\tiny $-10$} -- (-4.8, -2);
\filldraw[lightgray] (4.2, -.3) rectangle (4.8, .3);
\draw (4.5, 0) node{+};
\draw[-stealth] (4.2, 0) -- (3.5, 0);
\filldraw[lightgray] (6.2, 1.8) rectangle (6.8, 2.4);
\draw (6.5, 2.1) node{$x_1^{(t)}$};
\draw[-stealth] (6.2, 2.1) node[anchor=east]{\tiny\color{gray}0} -- (4.8, 0);
\filldraw[lightgray] (6.2, .4) rectangle (6.8, 1);
\draw (6.5, .7) node{$x_2^{(t)}$};
\draw[-stealth] (6.2, .7) node[anchor=east]{\tiny\color{gray}0} -- (4.8, 0);
\filldraw[lightgray] (6.2, -1) rectangle (6.8, -.4);
\draw (6.5, -.7) node{$x_3^{(t)}$};
\draw[-stealth] (6.2, -.7) node[anchor=east]{\tiny $-100$} -- (4.8, 0);
\filldraw[gray] (6.2, -2.4) rectangle (6.8, -1.8);
\draw (6.5, -2.1) node{1};
\draw[-stealth] (6.2, -2.1) node[anchor=east]{\tiny 10} -- (4.8, 0);
\filldraw[lightgray] (-4.8, 3.7) rectangle (-4.2, 4.3);
\draw (-4.5, 4) node{+};
\draw[-stealth] (-4.2, 4) -- (-3.5, 4);
\filldraw[lightgray] (-6.8, 5.8) rectangle (-6.2, 6.4);
\draw (-6.5, 6.1) node{$x_1^{(t)}$};
\draw[-stealth] (-6.2, 6.1) node[anchor=west]{\tiny\color{gray}0}
-- (-4.8, 4);
\filldraw[lightgray] (-6.8, 4.4) rectangle (-6.2, 5);
\draw (-6.5, 4.7) node{$x_2^{(t)}$};
\draw[-stealth] (-6.2, 4.7) node[anchor=west]{\tiny $-100$} -- (-4.8, 4);
\filldraw[lightgray] (-6.8, 3) rectangle (-6.2, 3.6);
\draw (-6.5, 3.3) node{$x_3^{(t)}$};
\draw[-stealth] (-6.2, 3.3) node[anchor=west]{\tiny 100} -- (-4.8, 4);
\filldraw[gray] (-6.8, 1.6) rectangle (-6.2, 2.2);
\draw (-6.5, 1.9) node{1};
\draw[-stealth] (-6.2, 1.9) node[anchor=west]{\tiny $-10$} -- (-4.8, 4);
\draw[-stealth] (0, 4.1) -- (0, 5.5) node[anchor=south]{\Large $y^{(t)}$};
\end{tikzpicture}
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}\hline
Time & $t$ & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10\\\hline
\multirow{3}{*}{Input} & $x_1^{(t)}$ & 0 & 1 & 3 & 2 & 4 & $-3$ & 7 & 2 & 3
& $-5$ & 8\\\cline{2-13}
& $x_2^{(t)}$ & 0 & 0 & 1 & $-2$ & $-3$ & $-1$ & 0 & 0 & 0 & 0
& $-2$\\\cline{2-13}
& $x_3^{(t)}$ & 0 & 0 & $-1$ & $-1$ & $-1$ & 0 & 0 & 1 & 1 & $-1$
& $-1$\\\hline
Memory cell & $c^{(t)}$ & 0 &&&&&&&&&&\\\hline
Output & $y^{(t)}$ & 0 &&&&&&&&&&\\\hline
\end{tabular}
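A minimal simulation sketch, assuming NumPy. The gate weights and the update rule
c^(t) = g(z) * f(z_input) + c^(t-1) * f(z_forget), y^(t) = h(c^(t)) * f(z_output),
with g and h the identity and f the sigmoid, are read off the figure above; treat
them as assumptions of this transcription:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Inputs for t = 1..10 (t = 0 is the all-zero initial column of the table).
x1 = [1, 3, 2, 4, -3, 7, 2, 3, -5, 8]
x2 = [0, 1, -2, -3, -1, 0, 0, 0, 0, -2]
x3 = [0, -1, -1, -1, 0, 0, 1, 1, -1, -1]

c = 0.0
for t in range(10):
    z = 1 * x1[t]                                     # block input; g is the identity
    g_in  = sigmoid(-100 * x3[t] - 10)                # input gate
    g_fgt = sigmoid(-100 * x3[t] + 10)                # forget gate
    g_out = sigmoid(-100 * x2[t] + 100 * x3[t] - 10)  # output gate
    c = z * g_in + c * g_fgt                          # memory cell c^(t)
    y = c * g_out                                     # output y^(t); h is the identity
    print(t + 1, round(c, 3), round(y, 3))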
Problem 9: (22 pts) Feedforward and Back Propagation
Consider the following neural network
https://i.imgur.com/RzcU43s.png
\begin{tikzpicture}
\draw (0, 3) node{Input};
\filldraw[lightgray] (-.4, -.5) rectangle (.4, 2.5);
\filldraw[gray] (-.3, -.3) rectangle (.3, .3);
\draw (0, 0) node{$x_2$};
\filldraw[gray] (-.3, 1.7) rectangle (.3, 2.3);
\draw (0, 2) node{$x_1$};
\foreach \i in {1, ..., 4}{
\filldraw[gray] (2.7, {5.7-2*\i}) rectangle (3.3, {6.3-2*\i});
\draw (3, {6-2*\i}) node{+};
\draw[-stealth] (.3, 2)
-- (2.7, {6-2*\i}) node[anchor=south]{\tiny $w^1_{\i1}$};
\draw[-stealth] (.3, 0)
-- (2.7, {6-2*\i}) node[anchor=north]{\tiny $w^1_{\i2}$};
}
\filldraw[lightgray] (4.6, -2.5) rectangle (5.4, 4.5);
\filldraw[gray] (5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (4.7, {5.7-2*\i}) rectangle (5.3, {6.3-2*\i});
\draw[white] (5, {6-2*\i}) node{$z^1_\i$};
\draw[-stealth] (3.3, {6-2*\i}) -- (4.7, {6-2*\i});
}
\filldraw[gray] (7, 3) ellipse(.5 and .3);
\draw (7, 3) node{Max};
\draw[-stealth] (5.3, 4) -- (6.5, 3);
\draw[-stealth] (5.3, 2) -- (6.5, 3);
\filldraw[gray] (7, -1) ellipse(.5 and .3);
\draw (7, -1) node{Max};
\draw[-stealth] (5.3, 0) -- (6.5, -1);
\draw[-stealth] (5.3, -2) -- (6.5, -1);
\filldraw[lightgray] (8.1, -1.5) rectangle (8.9, 3.5);
\filldraw[darkgray] (8.2, 2.7) rectangle (8.8, 3.3);
\draw[white] (8.5, 3) node{$a_1$};
\draw[-stealth] (7.5, 3) -- (8.2, 3);
\filldraw[darkgray] (8.2, -1.3) rectangle (8.8, -.7);
\draw[white] (8.5, -1) node{$a_2$};
\draw[-stealth] (7.5, -1) -- (8.2, -1);
\foreach \i in {1, ..., 4}{
\filldraw[gray] (11.2, {5.7-2*\i}) rectangle (11.8, {6.3-2*\i});
\draw (11.5, {6-2*\i}) node{+};
\draw[-stealth] (8.8, 3)
-- (11.2, {6-2*\i}) node[anchor=south]{\tiny $w^2_{\i1}$};
\draw[-stealth] (8.8, -1)
-- (11.2, {6-2*\i}) node[anchor=north]{\tiny $w^2_{\i2}$};
}
\filldraw[lightgray] (13.1, -2.5) rectangle (13.9, 4.5);
\filldraw[gray] (13.5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (13.5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (13.2, {5.7-2*\i}) rectangle (13.8, {6.3-2*\i});
\draw[white] (13.5, {6-2*\i}) node{$z^2_\i$};
\draw[-stealth] (11.8, {6-2*\i}) -- (13.2, {6-2*\i});
}
\filldraw[gray] (15.5, 3) ellipse(.5 and .3);
\draw (15.5, 3) node{Max};
\draw[-stealth] (13.8, 4) -- (15, 3);
\draw[-stealth] (13.8, 2) -- (15, 3);
\filldraw[gray] (15.5, -1) ellipse(.5 and .3);
\draw (15.5, -1) node{Max};
\draw[-stealth] (13.8, 0) -- (15, -1);
\draw[-stealth] (13.8, -2) -- (15, -1);
\draw (17, 4) node{Output};
\filldraw[lightgray] (16.6, -1.5) rectangle (17.4, 3.5);
\filldraw[darkgray] (16.7, 2.7) rectangle (17.3, 3.3);
\draw[white] (17, 3) node{$y_1$};
\draw[-stealth] (16, 3) -- (16.7, 3);
\filldraw[darkgray] (16.7, -1.3) rectangle (17.3, -.7);
\draw[white] (17, -1) node{$y_2$};
\draw[-stealth] (16, -1) -- (16.7, -1);
\end{tikzpicture}
The above neural network can be represented as a function $f_\theta$, namely
\[\begin{bmatrix}y_1\\y_2\end{bmatrix}
= f_\theta\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right),\]
where parameter θ records all the weights $w^k_{ij}$.
(a) (6 pts) Suppose the weights are initialized as follows
https://i.imgur.com/gcHnMYA.png
\begin{tikzpicture}
\def\w{{1, 3, 2, -1, 0, 0, 4, -2, 5, 2, -2, -1, 0, 1, 1, -1}}
\draw (0, 3) node{Input};
\filldraw[lightgray] (-.4, -.5) rectangle (.4, 2.5);
\filldraw[gray] (-.3, -.3) rectangle (.3, .3);
\draw (0, 0) node{$x_2$};
\filldraw[gray] (-.3, 1.7) rectangle (.3, 2.3);
\draw (0, 2) node{$x_1$};
\foreach \i in {1, ..., 4}{
\filldraw[gray] (2.7, {5.7-2*\i}) rectangle (3.3, {6.3-2*\i});
\draw (3, {6-2*\i}) node{+};
\draw[-stealth] (.3, 2) -- (2.7, {6-2*\i}) node[anchor=south]{
\pgfmathparse{int(\w[2*\i-2])} $\pgfmathresult$};
\draw[-stealth] (.3, 0) -- (2.7, {6-2*\i}) node[anchor=north]{
\pgfmathparse{int(\w[2*\i-1])} $\pgfmathresult$};
}
\filldraw[lightgray] (4.6, -2.5) rectangle (5.4, 4.5);
\filldraw[gray] (5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (4.7, {5.7-2*\i}) rectangle (5.3, {6.3-2*\i});
\draw[white] (5, {6-2*\i}) node{$z^1_\i$};
\draw[-stealth] (3.3, {6-2*\i}) -- (4.7, {6-2*\i});
}
\filldraw[gray] (7, 3) ellipse(.5 and .3);
\draw (7, 3) node{Max};
\draw[-stealth] (5.3, 4) -- (6.5, 3);
\draw[-stealth] (5.3, 2) -- (6.5, 3);
\filldraw[gray] (7, -1) ellipse(.5 and .3);
\draw (7, -1) node{Max};
\draw[-stealth] (5.3, 0) -- (6.5, -1);
\draw[-stealth] (5.3, -2) -- (6.5, -1);
\filldraw[lightgray] (8.1, -1.5) rectangle (8.9, 3.5);
\filldraw[darkgray] (8.2, 2.7) rectangle (8.8, 3.3);
\draw[white] (8.5, 3) node{$a_1$};
\draw[-stealth] (7.5, 3) -- (8.2, 3);
\filldraw[darkgray] (8.2, -1.3) rectangle (8.8, -.7);
\draw[white] (8.5, -1) node{$a_2$};
\draw[-stealth] (7.5, -1) -- (8.2, -1);
\foreach \i in {1, ..., 4}{
\filldraw[gray] (11.2, {5.7-2*\i}) rectangle (11.8, {6.3-2*\i});
\draw (11.5, {6-2*\i}) node{+};
\draw[-stealth] (8.8, 3) -- (11.2, {6-2*\i}) node[anchor=south]{
\pgfmathparse{int(\w[2*\i+6])} $\pgfmathresult$};
\draw[-stealth] (8.8, -1) -- (11.2, {6-2*\i}) node[anchor=north]{
\pgfmathparse{int(\w[2*\i+7])} $\pgfmathresult$};
}
\filldraw[lightgray] (13.1, -2.5) rectangle (13.9, 4.5);
\filldraw[gray] (13.5, 3) ellipse(.7 and 1.6);
\filldraw[gray] (13.5, -1) ellipse(.7 and 1.6);
\foreach \i in {1, ..., 4}{
\filldraw[darkgray] (13.2, {5.7-2*\i}) rectangle (13.8, {6.3-2*\i});
\draw[white] (13.5, {6-2*\i}) node{$z^2_\i$};
\draw[-stealth] (11.8, {6-2*\i}) -- (13.2, {6-2*\i});
}
\filldraw[gray] (15.5, 3) ellipse(.5 and .3);
\draw (15.5, 3) node{Max};
\draw[-stealth] (13.8, 4) -- (15, 3);
\draw[-stealth] (13.8, 2) -- (15, 3);
\filldraw[gray] (15.5, -1) ellipse(.5 and .3);
\draw (15.5, -1) node{Max};
\draw[-stealth] (13.8, 0) -- (15, -1);
\draw[-stealth] (13.8, -2) -- (15, -1);
\draw (17, 4) node{Output};
\filldraw[lightgray] (16.6, -1.5) rectangle (17.4, 3.5);
\filldraw[darkgray] (16.7, 2.7) rectangle (17.3, 3.3);
\draw[white] (17, 3) node{$y_1$};
\draw[-stealth] (16, 3) -- (16.7, 3);
\filldraw[darkgray] (16.7, -1.3) rectangle (17.3, -.7);
\draw[white] (17, -1) node{$y_2$};
\draw[-stealth] (16, -1) -- (16.7, -1);
\end{tikzpicture}
If $(x_1, x_2) = (1, -1)$, please fill out the following table in the answer
sheet. No derivation required.
\begin{tabular}{|c|c|c|c|c|c|c|c|c|c|c|c|c|}\hline
Variable & $z^1_1$ & $z^1_2$ & $z^1_3$ & $z^1_4$ & $a_1$ & $a_2$
& $z^2_1$ & $z^2_2$ & $z^2_3$ & $z^2_4$ & $y_1$ & $y_2$\\\hline
Value &&&&&&&&&&&&\\\hline
\end{tabular}
(b) (16 pts) Continuing (a), if the ground truth is $(\hat y_1, \hat y_2)
= (-10, 7)$, and the loss function is defined as
\[L(\theta) = \left\|\begin{bmatrix}\hat y_1\\\hat y_2\end{bmatrix}
- f_\theta\left(\begin{bmatrix}x_1\\x_2\end{bmatrix}\right)\right\|^2\]
Perform back propagation and fill out the following table in the answer
sheet. No derivation required.
\begin{tabular}{|c|c|c|c|c|c|c|c|c|}\hline
Variable & $\frac{\partial L}{\partial w^1_{11}}$ & $\frac{\partial L}
{\partial w^1_{12}}$ & $\frac{\partial L}{\partial w^1_{21}}$
& $\frac{\partial L}{\partial w^1_{22}}$ & $\frac{\partial L}
{\partial w^1_{31}}$ & $\frac{\partial L}{\partial w^1_{32}}$
& $\frac{\partial L}{\partial w^1_{41}}$ & $\frac{\partial L}
{\partial w^1_{42}}$\\\hline
Value &&&&&&&&\\\hline
Variable & $\frac{\partial L}{\partial w^2_{11}}$ & $\frac{\partial L}
{\partial w^2_{12}}$ & $\frac{\partial L}{\partial w^2_{21}}$
& $\frac{\partial L}{\partial w^2_{22}}$ & $\frac{\partial L}
{\partial w^2_{31}}$ & $\frac{\partial L}{\partial w^2_{32}}$
& $\frac{\partial L}{\partial w^2_{41}}$ & $\frac{\partial L}
{\partial w^2_{42}}$\\\hline
Value &&&&&&&&\\\hline
\end{tabular}
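A minimal sketch of both parts, assuming NumPy. The maxout structure and the initial
weights are read off the figures above (treat them as assumptions of this transcription);
for part (b), only the winning input of each max operation receives a nonzero gradient:

import numpy as np

# Layer weights w^1_{ij} and w^2_{ij} from the initialization figure.
W1 = np.array([[1., 3.], [2., -1.], [0., 0.], [4., -2.]])
W2 = np.array([[5., 2.], [-2., -1.], [0., 1.], [1., -1.]])
x = np.array([1., -1.])
y_hat = np.array([-10., 7.])

# (a) Forward pass through the two max layers.
z1 = W1 @ x
a = np.array([max(z1[0], z1[1]), max(z1[2], z1[3])])
z2 = W2 @ a
y = np.array([max(z2[0], z2[1]), max(z2[2], z2[3])])
print(z1, a, z2, y)

# (b) Backward pass for L = ||y_hat - y||^2; each max routes the gradient to its winner.
dL_dy = -2.0 * (y_hat - y)
dL_dz2 = np.zeros(4)
dL_dz2[np.argmax(z2[:2])] = dL_dy[0]
dL_dz2[2 + np.argmax(z2[2:])] = dL_dy[1]
dL_dW2 = np.outer(dL_dz2, a)          # dL/dw^2_{ij} = (dL/dz^2_i) * a_j

dL_da = W2.T @ dL_dz2
dL_dz1 = np.zeros(4)
dL_dz1[np.argmax(z1[:2])] = dL_da[0]
dL_dz1[2 + np.argmax(z1[2:])] = dL_da[1]
dL_dW1 = np.outer(dL_dz1, x)          # dL/dw^1_{ij} = (dL/dz^1_i) * x_j
print(dL_dW1)
print(dL_dW2)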