Coursera-MachineLearning-Week5 quiz notes (combined)

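Numbering follows the course content: 1-1 is the first question encountered in the first major part, R stands for Review, and C stands for code that was run (Code).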

1-1.

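Answer: D

To use an advanced optimization algorithm, we must supply code that computes both the cost function J(Θ) and its partial derivatives with respect to every parameter Θ_ij^(l).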
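In the course, such a function is handed to an optimizer like Octave's fminunc, which calls it repeatedly. The Python sketch below is only an illustration under assumptions. The quadratic `toy_cost` is a hypothetical stand-in for the neural-network cost. `minimize` is a hand-written gradient descent with backtracking line search, standing in for an advanced optimizer. The sketch shows why both outputs are needed: the gradient picks the search direction, and J(Θ) decides whether a step is accepted.

```python
from typing import Callable, List, Tuple

# What the optimizer needs: theta -> (J(theta), gradient of J at theta).
CostFn = Callable[[List[float]], Tuple[float, List[float]]]


def toy_cost(theta: List[float]) -> Tuple[float, List[float]]:
    """Hypothetical stand-in cost: J(theta) = sum_j (theta_j - 3)^2."""
    j = sum((t - 3.0) ** 2 for t in theta)
    grad = [2.0 * (t - 3.0) for t in theta]
    return j, grad


def minimize(cost_fn: CostFn, theta0: List[float], iters: int = 50) -> List[float]:
    """Gradient descent with backtracking line search (a simple stand-in for
    fminunc / L-BFGS). It only sees the problem through cost_fn."""
    theta = list(theta0)
    for _ in range(iters):
        j, grad = cost_fn(theta)                 # needs J and the partials
        grad_sq = sum(g * g for g in grad)
        step = 1.0
        while True:
            candidate = [t - step * g for t, g in zip(theta, grad)]
            j_new, _ = cost_fn(candidate)
            # Accept the step once J has decreased enough (Armijo condition).
            if j_new <= j - 0.5 * step * grad_sq:
                break
            step *= 0.5
        theta = candidate
    return theta


theta_opt = minimize(toy_cost, [0.0, 10.0])
print(theta_opt)  # [3.0, 3.0]
```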

1-2.

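Answer: D

Run FP using x(1), then BP using y(1); then FP using x(2), then BP using y(2). Each training example gets its own forward pass followed by its own backward pass, and the resulting gradient contributions are accumulated.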
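The ordering can be sketched in Python for the simplest possible case. The sketch assumes a hypothetical one-unit "network": a single sigmoid unit with the logistic cost, so the backward pass reduces to the output error δ = a − y. `gradient` and the toy data are illustrative names, not course code. Each example gets its own forward pass with x(i) and its own backward pass with y(i), and the contributions are accumulated and then averaged.

```python
import math
from typing import List, Tuple

Example = Tuple[List[float], float]  # (x including the bias term 1, label y)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def gradient(theta: List[float], examples: List[Example]) -> List[float]:
    """Average logistic-cost gradient of one sigmoid unit, example by example."""
    accum = [0.0] * len(theta)             # plays the role of the Delta accumulator
    for x, y in examples:
        # FP using x(i): activation of the (single) output unit.
        a = sigmoid(sum(t * xj for t, xj in zip(theta, x)))
        # BP using y(i): output error, then add this example's contribution.
        delta = a - y
        for j, xj in enumerate(x):
            accum[j] += delta * xj
    return [d / len(examples) for d in accum]


examples = [([1.0, 2.0], 1.0), ([1.0, -1.0], 0.0)]  # hypothetical (x(1), y(1)), (x(2), y(2))
grad = gradient([0.0, 0.0], examples)
print(grad)  # [0.0, -0.75]
```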

1-3.

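Answer: D

Knowing that the given weights are 0 is not enough to pin down the δ term in question. Backpropagation computes δ^(l) from the next layer's δ^(l+1), the weights Θ^(l) connecting the two layers, and g'(z^(l)). The question therefore lacks the information needed to determine the value.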

2-1.

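Answer: C

The first matrix occupies elements 1 to 60 of the unrolled vector, so the second one occupies elements 61 to 71. Those 11 elements are reshaped into a 1×11 matrix.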
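Octave's `M(:)` unrolls a matrix column by column, and `reshape` refills it in the same column-major order. Below is a pure-Python sketch of both. It assumes a 10×6 first matrix (60 elements, consistent with the indices above) filled with hypothetical placeholder values. Octave's 1-based inclusive range 61:71 becomes the 0-based slice [60:71] in Python.

```python
from typing import List

Matrix = List[List[float]]


def unroll(m: Matrix) -> List[float]:
    """Like Octave's m(:): stack the columns of m (column-major order)."""
    rows, cols = len(m), len(m[0])
    return [m[i][j] for j in range(cols) for i in range(rows)]


def reshape(vec: List[float], rows: int, cols: int) -> Matrix:
    """Like Octave's reshape(vec, rows, cols): refill column by column."""
    if len(vec) != rows * cols:
        raise ValueError("vector length must equal rows * cols")
    return [[vec[j * rows + i] for j in range(cols)] for i in range(rows)]


D1 = [[float(10 * i + j) for j in range(6)] for i in range(10)]  # 10x6: 60 elements
D2 = [[float(100 + j) for j in range(11)]]                      # 1x11: 11 elements
delta_vec = unroll(D1) + unroll(D2)                             # 71 elements

# Octave indices 61:71 (1-based, inclusive) are the Python slice [60:71].
recovered = reshape(delta_vec[60:71], 1, 11)
print(recovered == D2)  # True
```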

2-2.

Answer: B

Plugging the values into the two-sided difference formula (J(θ + ε) − J(θ − ε)) / (2ε) gives 3.0001.
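Below is a minimal Python check of this arithmetic. It assumes J(θ) = θ³, θ = 1 and ε = 0.01; these values reproduce the 3.0001 above, and the exact derivative there is 3.

```python
from typing import Callable


def two_sided_difference(J: Callable[[float], float], theta: float, eps: float) -> float:
    """Approximate dJ/dtheta by (J(theta + eps) - J(theta - eps)) / (2 * eps)."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)


approx = two_sided_difference(lambda t: t ** 3, theta=1.0, eps=0.01)
print(round(approx, 6))  # 3.0001
```

For a cubic the error is exactly ε². Since ((θ + ε)³ − (θ − ε)³) / (2ε) = 3θ² + ε², the estimate at θ = 1 is 3 + 0.0001.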

2-3.

Answer: B

Computing the gradient numerically is much slower than computing it with backpropagation.
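The two-sided numerical gradient needs two evaluations of J(Θ) per parameter. For n parameters that is 2n full cost evaluations, and in a neural network each one is a forward pass over the whole training set. Backpropagation, by contrast, obtains all n partial derivatives from one forward and one backward pass per example. The sketch below simply counts cost evaluations; `counted_cost` is a hypothetical cheap stand-in for the real cost.

```python
from typing import Callable, List


def numerical_gradient(J: Callable[[List[float]], float], theta: List[float],
                       eps: float = 1e-4) -> List[float]:
    """Two-sided finite-difference gradient: perturbs one parameter at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        plus[i] += eps
        minus = list(theta)
        minus[i] -= eps
        grad.append((J(plus) - J(minus)) / (2 * eps))
    return grad


calls = [0]  # mutable counter updated inside counted_cost


def counted_cost(theta: List[float]) -> float:
    """Hypothetical stand-in cost; records how often it is evaluated."""
    calls[0] += 1
    return sum(t * t for t in theta)


numerical_gradient(counted_cost, [0.1] * 1000)
print(calls[0])  # 2000 cost evaluations for 1000 parameters
```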

2-4.

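Answer: D

No, because this fails to break symmetry. Every parameter receives the same value, so all hidden units in a layer compute the same function and receive identical updates.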
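Here is a small Python illustration. It assumes sigmoid hidden units, hypothetical sizes (4 hidden units, 3 inputs) and INIT_EPSILON = 0.12. Filling every weight with one random number r gives identical rows, so every hidden unit outputs the same activation. Drawing each weight independently from [−ε, ε] gives the units different starting points.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def hidden_activations(theta: Matrix, x: List[float]) -> List[float]:
    """One sigmoid unit per row of theta."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in theta]


def rand_initialize(rows: int, cols: int, init_epsilon: float,
                    rng: random.Random) -> Matrix:
    """Draw every weight independently from [-init_epsilon, init_epsilon]."""
    return [[rng.uniform(-init_epsilon, init_epsilon) for _ in range(cols)]
            for _ in range(rows)]


INIT_EPSILON = 0.12  # assumed value
rng = random.Random(0)
x = [1.0, 0.5, -2.0]  # hypothetical input

# One random r copied into every weight: all rows are identical.
r = rng.uniform(-INIT_EPSILON, INIT_EPSILON)
same = [[r] * 3 for _ in range(4)]
print(len(set(hidden_activations(same, x))))  # 1: all hidden units identical

# Independent draws give each hidden unit different weights.
indep = rand_initialize(4, 3, INIT_EPSILON, rng)
print(hidden_activations(indep, x))
```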

2-5.

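Answer: C

Plot the cost J(Θ) against the number of iterations to make sure it decreases (or at least does not increase) on every iteration.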
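Plotting requires a plotting library (for example matplotlib, not used here), so the sketch below checks the same property programmatically. It assumes a hypothetical one-feature linear-regression cost on three made-up points rather than a neural network. It records J at each iteration and verifies that the history never goes up; a small tolerance absorbs floating-point noise.

```python
from typing import List, Tuple

X = [1.0, 2.0, 3.0]  # hypothetical training inputs
Y = [2.0, 4.0, 6.0]  # hypothetical targets


def cost_and_grad(t0: float, t1: float) -> Tuple[float, float, float]:
    """J = (1/2m) * sum((t0 + t1*x - y)^2) and its two partial derivatives."""
    m = len(X)
    errors = [t0 + t1 * x - y for x, y in zip(X, Y)]
    j = sum(e * e for e in errors) / (2 * m)
    g0 = sum(errors) / m
    g1 = sum(e * x for e, x in zip(errors, X)) / m
    return j, g0, g1


def run_gradient_descent(alpha: float, iters: int) -> List[float]:
    t0, t1 = 0.0, 0.0
    history = []
    for _ in range(iters):
        j, g0, g1 = cost_and_grad(t0, t1)
        history.append(j)  # the values you would plot against the iteration
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return history


def is_non_increasing(values: List[float], tol: float = 1e-12) -> bool:
    return all(b <= a + tol for a, b in zip(values, values[1:]))


history = run_gradient_descent(alpha=0.1, iters=50)
print(is_non_increasing(history))  # True
```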

R1-1.

Answer: A

This version is correct, as it takes the "outer product" of the two vectors $\delta^{(3)}$ and $a^{(2)}$ (that is, $\delta^{(3)} (a^{(2)})^T$), which is a matrix whose $(i,j)$-th entry is $\delta^{(3)}_i (a^{(2)})_j$, as desired.

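In other words, first use forward propagation to compute the activations of every layer. Then compare the network's prediction with the training label to get the error of the output layer. Finally, propagate that error backward to compute the error terms of the earlier layers, down to layer 2 (the input layer has no error term).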
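The following pure-Python check shows that the vectorized update Δ^(2) := Δ^(2) + δ^(3)(a^(2))^T, an outer product, matches the element-wise loop over i and j. The values of δ^(3) and a^(2) are made up. a^(2) is assumed to include the bias unit a_0^(2) = 1, which is why Δ^(2) has one more column than there are hidden units.

```python
from typing import List

Matrix = List[List[float]]


def outer(u: List[float], v: List[float]) -> Matrix:
    """u * v^T: entry (i, j) is u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


delta3 = [0.5, -1.0]    # hypothetical output-layer errors (2 units)
a2 = [1.0, 0.25, 0.75]  # hypothetical layer-2 activations, bias unit first

Delta2 = [[0.0] * len(a2) for _ in delta3]  # same shape as Theta2: 2 x 3

# Element-wise update: Delta2[i][j] += delta3[i] * a2[j] for every i, j.
loop = [row[:] for row in Delta2]
for i in range(len(delta3)):
    for j in range(len(a2)):
        loop[i][j] += delta3[i] * a2[j]

# Vectorized update: Delta2 := Delta2 + delta3 * a2^T.
vectorized = add(Delta2, outer(delta3, a2))

print(vectorized == loop)  # True
```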

R1-2.

Answer: A

This choice is correct, since Theta1 has 15 elements, so Theta2 begins at index 16 and ends at index 16 + 24 - 1 = 39.

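Theta2 therefore occupies elements 16 through 39 of the unrolled vector, and these 24 elements are reshaped into a 4×6 matrix.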
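The same index arithmetic works for any number of unrolled matrices. The hypothetical helper below returns the 1-based, inclusive, Octave-style range that each matrix occupies in the unrolled vector, given the matrices' shapes. The 15-element Theta1 is taken as 5×3 here; only the element count matters.

```python
from typing import List, Tuple


def unrolled_ranges(shapes: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) of each matrix inside [M1(:); M2(:); ...]."""
    ranges = []
    start = 1
    for rows, cols in shapes:
        end = start + rows * cols - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


print(unrolled_ranges([(5, 3), (4, 6)]))  # [(1, 15), (16, 39)]
```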

R1-3.

Answer: B

We compute $\frac{(3(1.01)^3 + 2) - (3(0.99)^3 + 2)}{2(0.01)} = 9.0003$.

Substituting the values into the formula gives 9.0003.
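Here is the same computation written out step by step. It assumes J(θ) = 3θ³ + 2, as implied by the formula above, so the exact derivative at θ = 1 is 9θ² = 9. The last line shows that the 0.0003 gap is exactly 3ε².

```latex
\begin{aligned}
J(1.01) &= 3(1.030301) + 2 = 5.090903 \\
J(0.99) &= 3(0.970299) + 2 = 4.910897 \\
\frac{J(1.01) - J(0.99)}{2(0.01)} &= \frac{0.180006}{0.02} = 9.0003 \\
\frac{3(\theta+\epsilon)^3 - 3(\theta-\epsilon)^3}{2\epsilon} &= 9\theta^2 + 3\epsilon^2 = 9 + 0.0003
\end{aligned}
```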

R1-4.

Answer: A, B

Checking the gradient numerically is a debugging tool: it helps ensure a correct implementation, but it is too slow to use as a method for actually computing gradients.

If the gradient computed by backpropagation is the same as one computed numerically with gradient checking, this is very strong evidence that you have a correct implementation of backpropagation.

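A: True. For efficiency, once gradient checking has confirmed that backpropagation is correct, turn it off before using backpropagation to train the network.

B: True. Gradient checking can be used to verify that backpropagation is implemented correctly.

C: False. Gradient checking is not useful only with gradient descent. It verifies the gradient computation itself, so it is just as useful with advanced optimization methods.

D: False. Computing the gradient numerically with gradient checking is not as efficient as backpropagation; it is much slower.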
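Below is a minimal Python sketch of statements A and B, under stated assumptions. `cost` and `analytic_grad` are hypothetical stand-ins for a neural-network cost and the gradient backpropagation would return. The relative-difference measure ‖a − b‖ / (‖a‖ + ‖b‖) with a 1e-7 threshold is one common choice, not necessarily the course's. The check runs once before training and is then skipped, because every numerical gradient costs two cost evaluations per parameter.

```python
import math
from typing import Callable, List

Vector = List[float]


def cost(theta: Vector) -> float:
    """Hypothetical stand-in cost."""
    return sum(math.sin(t) + t * t for t in theta)


def analytic_grad(theta: Vector) -> Vector:
    """Stand-in for the gradient backpropagation would return."""
    return [math.cos(t) + 2 * t for t in theta]


def numerical_grad(J: Callable[[Vector], float], theta: Vector,
                   eps: float = 1e-4) -> Vector:
    grad = []
    for i in range(len(theta)):
        plus, minus = list(theta), list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((J(plus) - J(minus)) / (2 * eps))
    return grad


def relative_difference(a: Vector, b: Vector) -> float:
    diff = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scale = math.sqrt(sum(x * x for x in a)) + math.sqrt(sum(y * y for y in b))
    return diff / scale if scale > 0 else diff


def train(theta: Vector, grad_fn: Callable[[Vector], Vector], alpha: float = 0.1,
          iters: int = 100, check_gradients: bool = True) -> Vector:
    if check_gradients:
        # Debugging step, done once: compare against the slow numerical gradient.
        diff = relative_difference(grad_fn(theta), numerical_grad(cost, theta))
        if diff > 1e-7:
            raise ValueError(f"gradient check failed: relative difference {diff:.2e}")
    # Training itself uses only the fast analytic gradient.
    for _ in range(iters):
        g = grad_fn(theta)
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
    return theta


theta0 = [0.5, -1.0, 2.0]
theta = train(theta0, analytic_grad)  # check passes, then trains without it
print(relative_difference(analytic_grad(theta0), numerical_grad(cost, theta0)))
```

A buggy gradient, such as one that drops the factor 2, makes `train` raise before any training step runs.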

R1-5.

Answer: B, C

A: False. Even if the parameter matrix is square, replacing it with its transpose changes the function the network computes.

B: True. Since gradient descent uses the gradient to take a step toward parameters with lower cost (i.e., lower J(Θ)), the value of J(Θ) should stay the same or decrease at each iteration if the gradient computation is correct and the learning rate is set properly.

C: True. If the learning rate is too large, the cost function can diverge during gradient descent. Thus, you should select a smaller value of α.

D: False. Choosing a large α does not necessarily speed up convergence. It can make J(Θ) diverge, so that gradient descent never reaches a minimum.
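Below is a minimal Python sketch of statements C and D. It assumes the one-parameter cost J(θ) = θ², with gradient 2θ, rather than a neural network. Each gradient-descent step multiplies θ by (1 − 2α). With α = 0.1, J shrinks on every iteration. With α = 1.1, |1 − 2α| = 1.2 > 1, so J grows without bound instead of converging faster.

```python
from typing import List


def cost_history(alpha: float, iters: int = 10, theta: float = 1.0) -> List[float]:
    """Run gradient descent on J(theta) = theta^2 and record J at each step."""
    history = []
    for _ in range(iters):
        history.append(theta ** 2)        # J(theta)
        theta -= alpha * 2.0 * theta      # dJ/dtheta = 2 * theta
    return history


small = cost_history(alpha=0.1)
large = cost_history(alpha=1.1)
print(small[-1] < small[0])  # True: J decreases
print(large[-1] > large[0])  # True: J increases (diverges)
```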