
Problems with the GAN loss function

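Generator: $G(z)$
Discriminator: $D(x)$
$r$ denotes the real data, with distribution $p_r$
$z$ is the noise, with distribution $p_z$
$g$ denotes the generator's distribution, $p_g$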
Discriminator loss
Call this Equation (1):
$$
\begin{aligned}
&\quad loss_{D}(x) \\
&= -E_{x \sim p_{r}(x)}\left(\log\left(D(x)\right)\right) - E_{z \sim p_{z}(z)}\left(\log\left(1 - D\left(G(z)\right)\right)\right) \\
&= -E_{x \sim p_{r}(x)}\left(\log\left(D(x)\right)\right) - E_{x \sim p_{g}(x)}\left(\log\left(1 - D(x)\right)\right)
\end{aligned}
$$
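To make Equation (1) concrete, here is a minimal sketch that estimates it from samples. The function name `discriminator_loss` and the example scores are made up for illustration; the inputs are assumed to be discriminator outputs strictly inside $(0, 1)$.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Monte Carlo estimate of Equation (1).

    d_real: values D(x) for samples x ~ p_r
    d_fake: values D(G(z)) for samples z ~ p_z
    Both lists are assumed to contain values strictly inside (0, 1).
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return -real_term - fake_term

# Hypothetical scores: a discriminator that separates real from fake well,
# and one that outputs 0.5 everywhere.
confident = discriminator_loss([0.9, 0.8], [0.1, 0.2])
undecided = discriminator_loss([0.5, 0.5], [0.5, 0.5])
```

A discriminator that outputs $0.5$ everywhere gets a loss of $2\log 2$, and a discriminator that separates the two sample sets gets a lower value.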
Generator loss
First form
Call this Equation (2):
$$
\begin{aligned}
&\quad loss_{G}(x) \\
&= E_{z \sim p_{z}(z)}\left(\log\left(1 - D\left(G(z)\right)\right)\right) \\
&= E_{x \sim p_{g}(x)}\left(\log\left(1 - D(x)\right)\right)
\end{aligned}
$$
Second form
Call this Equation (3):
$$
\begin{aligned}
&\quad loss_{G}(x) \\
&= E_{z \sim p_{z}(z)}\left(-\log\left(D\left(G(z)\right)\right)\right) \\
&= E_{x \sim p_{g}(x)}\left(-\log\left(D(x)\right)\right)
\end{aligned}
$$
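The two generator losses can be sketched side by side as sample averages. The function names `generator_loss_eq2` and `generator_loss_eq3` and the example scores are illustrative assumptions, not part of the original derivation.

```python
import math

def generator_loss_eq2(d_fake):
    """Equation (2): sample mean of log(1 - D(G(z)))."""
    return sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)

def generator_loss_eq3(d_fake):
    """Equation (3): sample mean of -log(D(G(z)))."""
    return sum(-math.log(d) for d in d_fake) / len(d_fake)

# Hypothetical discriminator scores on generated samples.
fooled = [0.9, 0.8]   # D believes the fakes are real
caught = [0.1, 0.2]   # D rejects the fakes

eq2_fooled, eq2_caught = generator_loss_eq2(fooled), generator_loss_eq2(caught)
eq3_fooled, eq3_caught = generator_loss_eq3(fooled), generator_loss_eq3(caught)
```

Both forms decrease as $D(G(z))$ grows, so both reward the generator for fooling the discriminator. They differ in which divergence they end up optimizing.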
Optimal discriminator
Differentiate $loss_{D}(x)$ with respect to $D(x)$ and set the derivative to $0$. Pointwise in $x$, the integrand is $-p_{r}(x)\log D(x) - p_{g}(x)\log\left(1 - D(x)\right)$, and its derivative $-\frac{p_{r}(x)}{D(x)} + \frac{p_{g}(x)}{1 - D(x)}$ vanishes at
$$
D^{*}(x) = \frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)}
$$
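As a sanity check on the optimal discriminator, the sketch below minimizes the pointwise integrand $-p_r\log D - p_g\log(1-D)$ over a grid of candidate values and compares the result with $p_r/(p_r+p_g)$. The density values $0.3$ and $0.1$ are arbitrary illustrative numbers.

```python
import math

def pointwise_loss(d, p_r, p_g):
    """Integrand of Equation (1) at a single point x, as a function of D(x) = d."""
    return -p_r * math.log(d) - p_g * math.log(1.0 - d)

def optimal_d(p_r, p_g):
    """Closed-form optimal discriminator value at a single point x."""
    return p_r / (p_r + p_g)

# Hypothetical density values at some point x.
p_r, p_g = 0.3, 0.1
d_star = optimal_d(p_r, p_g)

# Brute-force search over D values in (0, 1) with step 0.001.
grid = [i / 1000 for i in range(1, 1000)]
best_on_grid = min(grid, key=lambda d: pointwise_loss(d, p_r, p_g))
```

The grid minimizer agrees with the closed-form value, as expected for this convex function of $D(x)$.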
For Equation (2), add the term $E_{x \sim p_{r}(x)}\left(\log\left(D(x)\right)\right)$, which does not depend on the generator $g$ when the discriminator is held fixed:
$$
E_{x \sim p_{r}(x)}\left(\log\left(D(x)\right)\right) + E_{x \sim p_{g}(x)}\left(\log\left(1 - D(x)\right)\right)
$$
Substitute the optimal discriminator:
$$
\begin{aligned}
&\quad E_{x \sim p_{r}(x)}\left(\log\left(D^{*}(x)\right)\right) + E_{x \sim p_{g}(x)}\left(\log\left(1 - D^{*}(x)\right)\right) \\
&= E_{p_{r}}\log\frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)} + E_{p_{g}}\log\left(1 - \frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)}\right) \\
&= E_{p_{r}}\log\frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)} + E_{p_{g}}\log\left(\frac{p_{g}(x)}{p_{r}(x) + p_{g}(x)}\right)
\end{aligned}
$$
This is equivalent to
$$
\begin{aligned}
&\quad E_{p_{r}}\log\frac{p_{r}(x)}{\frac{p_{r}(x) + p_{g}(x)}{2}} + E_{p_{g}}\log\frac{p_{g}(x)}{\frac{p_{r}(x) + p_{g}(x)}{2}} - 2\log 2 \\
&= D_{KL}\left(p_{r} \,||\, \frac{p_{r}(x) + p_{g}(x)}{2}\right) + D_{KL}\left(p_{g} \,||\, \frac{p_{r}(x) + p_{g}(x)}{2}\right) - 2\log 2 \\
&= 2JSD\left(p_{r} \,||\, p_{g}\right) - 2\log\left(2\right)
\end{aligned}
$$
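The identity $E_{p_r}\log D^{*} + E_{p_g}\log(1 - D^{*}) = 2JSD(p_r || p_g) - 2\log 2$ can be checked numerically on a small discrete example. The two three-point distributions below are made up for illustration, and `kl` and `jsd` are helper names introduced here.

```python
import math

def kl(p, q):
    """KL divergence D_KL(p || q) for discrete distributions (0 * log 0 treated as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical discrete distributions over three outcomes.
p_r = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]

# Optimal discriminator at each outcome.
d_star = [r / (r + g) for r, g in zip(p_r, p_g)]

lhs = (sum(r * math.log(d) for r, d in zip(p_r, d_star))
       + sum(g * math.log(1.0 - d) for g, d in zip(p_g, d_star)))
rhs = 2 * jsd(p_r, p_g) - 2 * math.log(2)
```

Up to floating-point error, `lhs` and `rhs` coincide.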
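So the more the discriminator is trained, the closer it gets to the optimal discriminator.
Minimizing the generator loss then amounts to minimizing the JS divergence, which brings the distribution of $G$ closer to the real distribution.
But if the distribution of $G$ and the real distribution have almost no overlap,
the generator loss tends to a constant.
Proof:
  Because there is almost no overlap, for any $x$: where $p_{r} \neq 0$ we have $p_{g} \to 0$, and where $p_{g} \neq 0$ we have $p_{r} \to 0$.
  Each of the two KL terms then tends to $\log 2$, so the JS divergence tends to $\log 2$ and $2JSD(p_{r} || p_{g}) - 2\log 2$ tends to the constant $0$. The gradient is $0$, so the generator can no longer be trained.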
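A small numeric check of the non-overlapping case, under the assumption of discrete distributions with disjoint supports. The distributions `near` and `far` are invented for illustration; a direct computation gives $JSD = \log 2$ for both, so $2JSD - 2\log 2$ is the same constant however far apart the two distributions are.

```python
import math

def kl(p, q):
    """KL divergence D_KL(p || q) for discrete distributions (0 * log 0 treated as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Real distribution on the first two bins; two generator distributions
# whose supports do not overlap with it.
p_r  = [0.5, 0.5, 0.0, 0.0, 0.0, 0.0]
near = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]
far  = [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]

loss_near = 2 * jsd(p_r, near) - 2 * math.log(2)
loss_far = 2 * jsd(p_r, far) - 2 * math.log(2)
```

Moving the generator from `far` to `near` does not change the value, which is why this loss gives the generator no signal in this regime.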
For Equation (3)
$$
\begin{aligned}
&\quad D_{KL}\left(p_{g} \,||\, p_{r}\right) \\
&= E_{p_{g}}\left(\log\frac{p_{g}}{p_{r}}\right) \\
&= E_{p_{g}}\left(\log\frac{\frac{p_{g}}{p_{r}(x) + p_{g}(x)}}{\frac{p_{r}}{p_{r}(x) + p_{g}(x)}}\right) \\
&= E_{p_{g}}\left(\log\frac{1 - D^{*}(x)}{D^{*}(x)}\right) \\
&= E_{p_{g}}\left(\log\left(1 - D^{*}(x)\right)\right) - E_{p_{g}}\log\left(D^{*}(x)\right)
\end{aligned}
$$
Therefore
$$
\begin{aligned}
&\quad E_{p_{g}}\left(-\log\left(D^{*}(x)\right)\right) \\
&= D_{KL}\left(p_{g} \,||\, p_{r}\right) - E_{p_{g}}\left(\log\left(1 - D^{*}(x)\right)\right) \\
&= D_{KL}\left(p_{g} \,||\, p_{r}\right) - 2JSD\left(p_{r} \,||\, p_{g}\right) + 2\log\left(2\right) + E_{p_{r}}\left(\log\left(D^{*}(x)\right)\right)
\end{aligned}
$$
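The decomposition of $E_{p_g}(-\log D^{*}(x))$ can also be verified numerically. This is only a sketch on made-up discrete distributions (both strictly positive so that $D_{KL}(p_g || p_r)$ is finite); `kl` and `jsd` are helper names introduced here.

```python
import math

def kl(p, q):
    """KL divergence D_KL(p || q) for discrete distributions (0 * log 0 treated as 0)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL to the mixture (p + q) / 2."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical discrete distributions over three outcomes.
p_r = [0.5, 0.3, 0.2]
p_g = [0.2, 0.2, 0.6]
d_star = [r / (r + g) for r, g in zip(p_r, p_g)]

# Left-hand side: Equation (3) evaluated at the optimal discriminator.
lhs = sum(g * -math.log(d) for g, d in zip(p_g, d_star))

# Right-hand side: KL - 2 JSD + 2 log 2 + E_{p_r} log D*.
e_pr_log_d = sum(r * math.log(d) for r, d in zip(p_r, d_star))
rhs = kl(p_g, p_r) - 2 * jsd(p_r, p_g) + 2 * math.log(2) + e_pr_log_d
```

The two sides agree up to floating-point error.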
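The last two terms are effectively constants when training the generator,
so the loss is equivalent to the first two terms, $D_{KL}(p_{g} || p_{r}) - 2JSD(p_{r} || p_{g})$.
Because the JS divergence enters with a minus sign, minimizing this objective shrinks the KL divergence while enlarging the JS divergence.
This is contradictory: the objective asks to make the two distributions similar and to push them apart at the same time.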

Reference
https://blog.csdn.net/Invokar/article/details/88917214
