Generator: $G(x)$
Discriminator: $D(x)$
$r$: the real data
$z$: the noise
$g$: the generator's distribution
Discriminator loss function
Denote it as Eq. (1):
$$
\begin{aligned}
loss_{D}(x) &= -E_{x \sim p_{r}(x)}\left(\log D(x)\right) - E_{z \sim p_{z}(z)}\left(\log\left(1 - D(G(z))\right)\right) \\
&= -E_{x \sim p_{r}(x)}\left(\log D(x)\right) - E_{x \sim p_{g}(x)}\left(\log\left(1 - D(x)\right)\right)
\end{aligned}
$$
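A minimal sketch of Eq. (1): the two expectations are replaced by sample means over a batch. The function name `discriminator_loss` and its inputs `d_real` (values of $D(x)$ on real samples) and `d_fake` (values of $D(G(z))$ on generated samples) are assumptions for illustration, not part of the original notes.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Monte Carlo estimate of Eq. (1) (hypothetical helper).

    d_real: outputs D(x) for x ~ p_r, each strictly inside (0, 1)
    d_fake: outputs D(G(z)) for z ~ p_z, each strictly inside (0, 1)
    """
    # -E_{x~p_r}[log D(x)], estimated by the sample mean
    real_term = -sum(math.log(d) for d in d_real) / len(d_real)
    # -E_{z~p_z}[log(1 - D(G(z)))], estimated by the sample mean
    fake_term = -sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that outputs 0.5 everywhere gives 2 log 2 (about 1.386)
print(discriminator_loss([0.5, 0.5], [0.5, 0.5]))
# A discriminator that separates real from fake well gives a much smaller loss
print(discriminator_loss([0.9, 0.95], [0.1, 0.05]))
```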
Generator loss function
First form, denoted as Eq. (2):
$$
\begin{aligned}
loss_{G}(x) &= E_{z \sim p_{z}(z)}\left(\log\left(1 - D(G(z))\right)\right) \\
&= E_{x \sim p_{g}(x)}\left(\log\left(1 - D(x)\right)\right)
\end{aligned}
$$
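A minimal sketch of Eq. (2) in a hypothetical 1-D setup: noise $z \sim N(0, 1)$, a generator $G(z) = \mu + z$ and a logistic discriminator $D(x) = \sigma(wx + b)$. These choices and the names `toy_generator`, `toy_discriminator` and `generator_loss_eq2` are illustrative assumptions. Sampling $z$ and pushing it through $G$ is the same as sampling $x \sim p_g$, which is why both lines of Eq. (2) describe one expectation.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def toy_generator(z, mu=2.0):
    # Hypothetical 1-D generator: shifts the noise by mu
    return mu + z


def toy_discriminator(x, w=-1.5, b=0.5):
    # Hypothetical logistic discriminator D(x) = sigmoid(w * x + b)
    return sigmoid(w * x + b)


def generator_loss_eq2(n_samples=10_000, seed=0):
    """Monte Carlo estimate of Eq. (2): E_{z~p_z}[log(1 - D(G(z)))]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)   # z ~ p_z = N(0, 1) (assumed)
        x = toy_generator(z)      # x = G(z), so x ~ p_g
        total += math.log(1.0 - toy_discriminator(x))
    return total / n_samples


print(generator_loss_eq2())
```

The estimate is always negative, because $1 - D(G(z)) < 1$; the generator tries to push it even lower.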
Second form, denoted as Eq. (3):
$$
\begin{aligned}
loss_{G}(x) &= E_{z \sim p_{z}(z)}\left(-\log D(G(z))\right) \\
&= E_{x \sim p_{g}(x)}\left(-\log D(x)\right)
\end{aligned}
$$
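To compare Eq. (2) and Eq. (3) on a single generated sample, the sketch below evaluates both losses and their derivatives with respect to the discriminator's logit $a$. It assumes $D(G(z)) = \sigma(a)$ with $\sigma$ the logistic sigmoid; the notes do not specify the discriminator's output layer. Using $\sigma'(a) = \sigma(a)(1 - \sigma(a))$, the derivatives are $\frac{d}{da}\log(1 - \sigma(a)) = -\sigma(a)$ and $\frac{d}{da}\left(-\log\sigma(a)\right) = -(1 - \sigma(a))$.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def losses_and_grads(a):
    """Eq. (2) and Eq. (3) for one sample, assuming D(G(z)) = sigmoid(a)."""
    d = sigmoid(a)
    loss_eq2 = math.log(1.0 - d)   # log(1 - D(G(z)))
    loss_eq3 = -math.log(d)        # -log D(G(z))
    grad_eq2 = -d                  # d/da log(1 - sigmoid(a))
    grad_eq3 = -(1.0 - d)          # d/da (-log sigmoid(a))
    return loss_eq2, loss_eq3, grad_eq2, grad_eq3


# When the discriminator confidently rejects the sample (a very negative,
# so D(G(z)) is close to 0), the Eq. (2) gradient nearly vanishes while the
# Eq. (3) gradient stays close to -1.
for a in (-8.0, 0.0, 8.0):
    _, _, g2, g3 = losses_and_grads(a)
    print(f"a={a:+.1f}  D={sigmoid(a):.4f}  grad_eq2={g2:+.4f}  grad_eq3={g3:+.4f}")
```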
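Optimal discriminator
Write $loss_{D}(x)$ as $-\int \left(p_{r}(x)\log D(x) + p_{g}(x)\log\left(1 - D(x)\right)\right)dx$, take the derivative with respect to $D(x)$ pointwise, and set it to $0$: $-\frac{p_{r}(x)}{D(x)} + \frac{p_{g}(x)}{1 - D(x)} = 0$. This gives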
$$
D^{*}(x) = \frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)}
$$
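The closed form can be sanity-checked at a single point $x$: a grid search over $D \in (0, 1)$ of the integrand $p_r \log D + p_g \log(1 - D)$ (the negative of the $loss_D$ integrand) should land on $D^*$. The density values `pr` and `pg` below are arbitrary assumptions, and `pointwise_objective` and `optimal_d` are hypothetical helper names.

```python
import math


def pointwise_objective(d, pr, pg):
    # Negative of the loss_D integrand at one point x: p_r log D + p_g log(1 - D)
    return pr * math.log(d) + pg * math.log(1.0 - d)


def optimal_d(pr, pg):
    # Closed form D*(x) = p_r(x) / (p_r(x) + p_g(x))
    return pr / (pr + pg)


pr, pg = 0.75, 0.25                     # example density values at one x (assumed)
grid = [i / 1000 for i in range(1, 1000)]
best = max(grid, key=lambda d: pointwise_objective(d, pr, pg))
print(best, optimal_d(pr, pg))          # expected: 0.75 0.75
```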
For Eq. (2), add the term $E_{x \sim p_{r}(x)}\left(\log D(x)\right)$, which does not depend on $g$. The objective becomes
$$
E_{x \sim p_{r}(x)}\left(\log D(x)\right) + E_{x \sim p_{g}(x)}\left(\log\left(1 - D(x)\right)\right)
$$
Substituting the optimal discriminator:
$$
\begin{aligned}
&\quad E_{x \sim p_{r}(x)}\left(\log D^{*}(x)\right) + E_{x \sim p_{g}(x)}\left(\log\left(1 - D^{*}(x)\right)\right) \\
&= E_{p_{r}} \log \frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)} + E_{p_{g}} \log\left(1 - \frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)}\right) \\
&= E_{p_{r}} \log \frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)} + E_{p_{g}} \log \frac{p_{g}(x)}{p_{r}(x) + p_{g}(x)}
\end{aligned}
$$
This can be rewritten as:
$$
\begin{aligned}
&\quad E_{p_{r}} \log \frac{p_{r}(x)}{\frac{p_{r}(x) + p_{g}(x)}{2}} + E_{p_{g}} \log \frac{p_{g}(x)}{\frac{p_{r}(x) + p_{g}(x)}{2}} - 2\log 2 \\
&= D_{KL}\left(p_{r} || \frac{p_{r}(x) + p_{g}(x)}{2}\right) + D_{KL}\left(p_{g} || \frac{p_{r}(x) + p_{g}(x)}{2}\right) - 2\log 2 \\
&= 2\,JSD\left(p_{r} || p_{g}\right) - 2\log 2
\end{aligned}
$$
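So the more the discriminator is trained, the closer it gets to the optimal discriminator, and minimizing the generator loss then minimizes the JS divergence, bringing the distribution of $G$ closer to the real distribution.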
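The identity $E_{p_r}(\log D^*(x)) + E_{p_g}(\log(1 - D^*(x))) = 2\,JSD(p_r || p_g) - 2\log 2$ can be checked numerically on discrete distributions, where the expectations become finite sums. The two example distributions and the helper names `kl`, `jsd` and `value_at_optimal_d` are assumptions made for this sketch.

```python
import math


def kl(p, q):
    # D_KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    # JSD(p || q) = 1/2 KL(p || m) + 1/2 KL(q || m), with m = (p + q) / 2
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def value_at_optimal_d(pr, pg):
    # E_{p_r}[log D*] + E_{p_g}[log(1 - D*)] with D* = p_r / (p_r + p_g)
    total = 0.0
    for a, b in zip(pr, pg):
        if a > 0:
            total += a * math.log(a / (a + b))
        if b > 0:
            total += b * math.log(b / (a + b))
    return total


pr = [0.5, 0.3, 0.2]   # example p_r (assumed)
pg = [0.2, 0.3, 0.5]   # example p_g (assumed)
print(value_at_optimal_d(pr, pg), 2 * jsd(pr, pg) - 2 * math.log(2))
# When p_g = p_r the JS divergence is 0 and the value reaches its minimum -2 log 2
print(value_at_optimal_d(pr, pr), -2 * math.log(2))
```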
However, if the distribution of $G$ has almost no overlap with the real distribution, the generator loss tends to a constant.
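Proof:
Because there is almost no overlap, wherever $p_{r}(x) \neq 0$ we have $p_{g}(x) \to 0$, and wherever $p_{g}(x) \neq 0$ we have $p_{r}(x) \to 0$.
On the support of $p_{r}$ this gives $\frac{p_{r}(x)}{\left(p_{r}(x) + p_{g}(x)\right)/2} \to 2$, and likewise on the support of $p_{g}$, so each KL term above tends to $\log 2$.
Hence the JS divergence is the constant $\log 2$, the generator loss $2\,JSD(p_{r} || p_{g}) - 2\log 2$ is the constant $0$, its gradient is $0$, and the generator can no longer be trained.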
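A small discrete illustration of this failure mode: put all of $p_r$ on bin 0 of a 10-bin grid and all of $p_g$ on bin $k$. For every $k \geq 1$ the supports are disjoint, so the JS divergence stays at $\log 2$ no matter how far $p_g$ sits from $p_r$, and the loss gives no signal about which direction to move. The grid, the helper `shifted` and the chosen bins are assumptions for this sketch.

```python
import math


def kl(p, q):
    # D_KL(p || q); terms with p_i = 0 contribute 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def shifted(k, n=10):
    # Hypothetical point-mass distribution on bin k of an n-bin grid
    return [1.0 if i == k else 0.0 for i in range(n)]


pr = shifted(0)
for k in (1, 3, 9):
    pg = shifted(k)   # disjoint from p_r for every k >= 1
    d = jsd(pr, pg)
    # JSD equals log 2 and the loss 2 JSD - 2 log 2 equals 0 for every k
    print(k, d, 2 * d - 2 * math.log(2))
```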
For Eq. (3):
$$
\begin{aligned}
D_{KL}(p_{g} || p_{r}) &= E_{p_{g}}\left(\log \frac{p_{g}}{p_{r}}\right) \\
&= E_{p_{g}}\left(\log \frac{\frac{p_{g}(x)}{p_{r}(x) + p_{g}(x)}}{\frac{p_{r}(x)}{p_{r}(x) + p_{g}(x)}}\right) \\
&= E_{p_{g}}\left(\log \frac{1 - D^{*}(x)}{D^{*}(x)}\right) \\
&= E_{p_{g}}\left(\log\left(1 - D^{*}(x)\right)\right) - E_{p_{g}}\left(\log D^{*}(x)\right)
\end{aligned}
$$
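This identity can be checked on discrete distributions with $p_r > 0$ wherever $p_g > 0$, so that $D_{KL}(p_g || p_r)$ is finite. The example distributions are assumptions for this sketch.

```python
import math


def kl(p, q):
    # D_KL(p || q); assumes q_i > 0 wherever p_i > 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


pr = [0.4, 0.4, 0.2]   # example p_r (assumed)
pg = [0.1, 0.3, 0.6]   # example p_g (assumed)
d_star = [a / (a + b) for a, b in zip(pr, pg)]   # D*(x) = p_r / (p_r + p_g)

lhs = kl(pg, pr)
# E_{p_g}[log(1 - D*)] - E_{p_g}[log D*]
rhs = sum(b * (math.log(1 - d) - math.log(d)) for b, d in zip(pg, d_star))
print(lhs, rhs)   # equal up to floating-point rounding
```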
Therefore:
$$
\begin{aligned}
E_{p_{g}}\left(-\log D^{*}(x)\right) &= D_{KL}(p_{g} || p_{r}) - E_{p_{g}}\left(\log\left(1 - D^{*}(x)\right)\right) \\
&= D_{KL}(p_{g} || p_{r}) - 2\,JSD(p_{r} || p_{g}) + 2\log 2 + E_{p_{r}}\left(\log D^{*}(x)\right)
\end{aligned}
$$
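When training the generator, the last two terms, $2\log 2$ and $E_{p_{r}}\left(\log D^{*}(x)\right)$, act as constants (the discriminator is held fixed while the generator is updated), so minimizing Eq. (3) is equivalent to minimizing the first two terms, $D_{KL}(p_{g} || p_{r}) - 2\,JSD(p_{r} || p_{g})$.
Minimizing this means decreasing the KL divergence while increasing the JS divergence.
This is contradictory: the objective tries to make the two distributions similar and, at the same time, push them apart.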
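The full decomposition of $E_{p_g}(-\log D^*(x))$ can be verified numerically on toy discrete distributions (assumed, not from the notes). The sketch also prints the effective objective $D_{KL}(p_g || p_r) - 2\,JSD(p_r || p_g)$, in which the JS divergence enters with a minus sign.

```python
import math


def kl(p, q):
    # D_KL(p || q); assumes q_i > 0 wherever p_i > 0
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


pr = [0.4, 0.4, 0.2]   # example p_r (assumed)
pg = [0.1, 0.3, 0.6]   # example p_g (assumed)
d_star = [a / (a + b) for a, b in zip(pr, pg)]

# Left-hand side: E_{p_g}[-log D*]
lhs = sum(b * -math.log(d) for b, d in zip(pg, d_star))
# Right-hand side: KL(p_g || p_r) - 2 JSD + 2 log 2 + E_{p_r}[log D*]
rhs = (kl(pg, pr) - 2 * jsd(pr, pg) + 2 * math.log(2)
       + sum(a * math.log(d) for a, d in zip(pr, d_star)))
effective = kl(pg, pr) - 2 * jsd(pr, pg)   # the part that depends on p_g

print(lhs, rhs, effective)
```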