
Professional English (5. Backpropagation)


Computational graphs(计算图)

[figure: a computational graph]

Convolutional network(卷积网络)

[figure: a convolutional network]

Neural Turing Machine(神经图灵机)

[figure: the Neural Turing Machine]

Why do we need backpropagation?

Problem statement

The core problem studied in this section is the following: we are given some
function f(x), where x is a vector of inputs, and we are interested in
computing the gradient of f at x (i.e. $\nabla f(x)$).
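
For instance (an illustrative example, not from the original notes): if $f(x_1, x_2) = x_1 x_2$, then $\nabla f = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}\right) = (x_2, x_1)$.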

Motivation

In the specific case of neural networks, f(x) will correspond to the loss function (L), and the inputs x will consist of the training data and the neural network weights. The training data is given and fixed, while the parameters (e.g. W and b) are the variables we control. We need to compute the gradient with respect to all of the parameters so that we can use it to perform a parameter update.

Backpropagation: a simple example

[figure: a simple backpropagation example on a small computational graph]
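
A tiny worked example in the spirit of this section (the function f(x, y, z) = (x + y) * z and the input values are illustrative):

x, y, z = -2, 5, -4

# forward pass
q = x + y              # add gate:  q = 3
f = q * z              # mul gate:  f = -12

# backward pass, in reverse order, applying the chain rule
dfdq = z               # df/dq = z = -4
dfdz = q               # df/dz = q = 3
dfdx = 1 * dfdq        # the add gate passes the gradient through: df/dx = -4
dfdy = 1 * dfdq        # df/dy = -4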

  • Q1: What is an add gate?

  • Q2: What is a max gate?

  • Q3: What is a mul gate?

  • A1: add gate: gradient distributor (梯度分配器)

  • A2: max gate: gradient router (梯度路由器)

  • A3: mul gate: gradient switcher (梯度切换器); a numeric sketch of all three gates follows this list
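
A minimal numeric sketch of the three gate behaviours during the backward pass (the variable names and values here are illustrative):

x, y = 3.0, -4.0
upstream = 2.0                          # assumed upstream gradient dL/d(gate output)

# add gate: distributes the upstream gradient unchanged to both inputs
dx_add, dy_add = upstream, upstream     # (2.0, 2.0)

# max gate: routes the upstream gradient to the larger input only
dx_max = upstream if x > y else 0.0     # 2.0
dy_max = upstream if y > x else 0.0     # 0.0

# mul gate: "switches" the inputs, scaling the upstream gradient by the other input
dx_mul = y * upstream                   # -8.0
dy_mul = x * upstream                   # 6.0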

Another example:

[figures: step-by-step gradient computation on the computational graph of the example]

What about autograd?

  • Deep learning frameworks can automatically perform backprop!
  • Problems might surface related to the underlying gradients when
    debugging your models (see the sketch after this list).
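
As a concrete illustration of framework autograd (this sketch assumes PyTorch, which the original slides do not specifically name), the gradient of the worked example used later in this section can be obtained without writing any backward code by hand:

import torch

# f(x, y) = (x + sigmoid(y)) / (sigmoid(x) + (x + y)**2)
x = torch.tensor(3.0, requires_grad=True)
y = torch.tensor(-4.0, requires_grad=True)

f = (x + torch.sigmoid(y)) / (torch.sigmoid(x) + (x + y) ** 2)
f.backward()              # autograd records the graph during the forward pass and runs backprop

print(x.grad, y.grad)     # df/dx and df/dy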

Modularized implementation: forward / backward API(模块化实现:前向/后向API)

[figures: forward / backward API pseudocode]
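
A minimal sketch of what one such modular node might look like (the class name MultiplyGate and its caching strategy are illustrative, not the original slide code):

class MultiplyGate:
    """One node in a computational graph, exposing the forward/backward API."""

    def forward(self, x, y):
        # compute the output and cache the inputs needed later for the gradient
        self.x, self.y = x, y
        return x * y

    def backward(self, dz):
        # chain rule: local gradient of each input times the upstream gradient dz
        dx = self.y * dz
        dy = self.x * dz
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)      # forward pass:  z = -12.0
dx, dy = gate.backward(1.0)      # backward pass: dx = -4.0, dy = 3.0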


For example:

$$f(x,y)=\frac{x+\sigma(y)}{\sigma(x)+(x+y)^2},\qquad \text{where } \sigma(x)=\frac{1}{1+e^{-x}}$$

Code pattern for building the forward pass:
import math

x = 3    # example values
y = -4

# forward pass
sigy = 1.0 / (1 + math.exp(-y))   # sigmoid in the numerator        #(1)
num = x + sigy                    # numerator                       #(2)
sigx = 1.0 / (1 + math.exp(-x))   # sigmoid in the denominator      #(3)
xpy = x + y                                                         #(4)
xpysqr = xpy**2                                                     #(5)
den = sigx + xpysqr               # denominator                     #(6)
invden = 1.0 / den                                                  #(7)
f = num * invden                  # done!                           #(8)
Code pattern for building the backward pass:
# backprop f = num * invden
dnum = invden                      # gradient of the numerator                              #(8)
dinvden = num                                                                               #(8)
# backprop invden = 1.0 / den
dden = (-1.0 / (den**2)) * dinvden                                                          #(7)
# backprop den = sigx + xpysqr
dsigx = (1) * dden                                                                          #(6)
dxpysqr = (1) * dden                                                                        #(6)
# backprop xpysqr = xpy**2
dxpy = (2 * xpy) * dxpysqr                                                                  #(5)
# backprop xpy = x + y
dx = (1) * dxpy                                                                             #(4)
dy = (1) * dxpy                                                                             #(4)
# backprop sigx = 1.0 / (1 + math.exp(-x))
dx += ((1 - sigx) * sigx) * dsigx  # notice +=: gradients from different branches add up    #(3)
# backprop num = x + sigy
dx += (1) * dnum                                                                            #(2)
dsigy = (1) * dnum                                                                          #(2)
# backprop sigy = 1.0 / (1 + math.exp(-y))
dy += ((1 - sigy) * sigy) * dsigy                                                           #(1)
# done!
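
A quick sanity check of the hand-written gradients against centred finite differences (this check and the helper name f_eval are additions for illustration, not part of the original notes):

def f_eval(x, y):
    # evaluate f(x, y) directly, mirroring the forward pass above
    return (x + 1.0 / (1 + math.exp(-y))) / (1.0 / (1 + math.exp(-x)) + (x + y) ** 2)

h = 1e-5
dx_num = (f_eval(x + h, y) - f_eval(x - h, y)) / (2 * h)
dy_num = (f_eval(x, y + h) - f_eval(x, y - h)) / (2 * h)
print(dx, dx_num)   # the analytic and numerical gradients should agree closely
print(dy, dy_num)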

  • Forward pass: compute from the inputs through to the output (shown in green).
  • Backward pass: starting from the end, recursively compute the gradients with the chain rule (shown in red), all the way back to the inputs of the network.

$\frac{\partial C}{\partial w}$ can be decomposed into two steps (here z denotes the weighted input of the neuron that the weight w feeds into):

1.

$\frac{\partial z}{\partial w}$

This is the forward pass part (it is actually the last step of the differentiation, and it can be obtained directly while moving from front to back through the network).

2.

$\frac{\partial C}{\partial z}$

This is the backward pass part (it cannot be obtained directly; it must be computed recursively from the back towards the front). The decomposition is written out after this list.
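
Written out (the notation below, with $a=\sigma(z)$ the neuron's activation and $z_k$ the weighted inputs of the next-layer neurons reached through weights $w_k$, follows the usual neuron convention and is not taken from the original notes):

$$\frac{\partial C}{\partial w}=\frac{\partial z}{\partial w}\cdot\frac{\partial C}{\partial z},\qquad \frac{\partial z}{\partial w}=x \;(\text{the input that } w \text{ multiplies}),\qquad \frac{\partial C}{\partial z}=\sigma'(z)\sum_{k} w_k\,\frac{\partial C}{\partial z_k}$$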

Summary so far…

  • neural nets will be very large: it is impractical to write down the gradient formula
    by hand for all parameters
  • backpropagation = recursive application of the chain rule along a
    computational graph to compute the gradients of all
    inputs/parameters/intermediates
  • implementations maintain a graph structure, where the nodes implement a forward() / backward() API
  • forward: compute the result of an operation and save any intermediates
    needed for gradient computation in memory
  • backward: apply the chain rule to compute the gradient of the loss
    function with respect to the inputs
