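iLQR: Principles and Derivation
Note: much of what follows reflects my own understanding. If you spot errors or omissions, corrections and discussion are welcome.
Preface
LQR (Linear Quadratic Regulator) and iLQR (iterative Linear Quadratic Regulator) are both optimal control algorithms, but they differ significantly in the class of systems they handle and in how they are implemented.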
LQR (Linear Quadratic Regulator)
- Linear system: LQR assumes that the system dynamics are linear:
$$x_{t+1} = Ax_t + Bu_t$$
where $x$ is the state vector, $u$ is the control input, and $A$ and $B$ are the system matrices.
- Quadratic performance index (written as a sum over time steps to match the discrete-time dynamics):
$$J = \sum_{t=0}^{\infty}\left(x_t^T Q x_t + u_t^T R u_t\right)$$
where $Q$ and $R$ are weight matrices that trade off state deviation against control effort.
iLQR (Iterative Linear Quadratic Regulator)
- Nonlinear system: iLQR applies to nonlinear systems of the form:
$$x_{t+1} = f(x_t,u_t)$$
where $f$ is the nonlinear system function.
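Summary
- LQR applies to linear systems and admits a closed-form solution that can be computed directly.
- iLQR applies to nonlinear systems and approaches the optimum iteratively; each iteration linearizes the system and solves an LQR subproblem.
In other words, iLQR turns the nonlinear problem into a sequence of linear-quadratic problems and solves those, so we first look at how to solve the LQR problem.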
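The "linearize the system" step can be illustrated numerically. The snippet below is only a sketch under stated assumptions: the pendulum model `pendulum_step`, its parameters, and the helper `linearize` are hypothetical names introduced here for illustration and do not come from the original post. It approximates the Jacobians $A=\partial f/\partial x$ and $B=\partial f/\partial u$ around a reference point by central differences.

```python
import numpy as np

def pendulum_step(x, u, dt=0.05, g=9.81, length=1.0):
    """Hypothetical nonlinear dynamics x_{t+1} = f(x_t, u_t) of a torque-driven pendulum."""
    theta, omega = x
    omega_next = omega + dt * (-(g / length) * np.sin(theta) + u[0])
    theta_next = theta + dt * omega_next
    return np.array([theta_next, omega_next])

def linearize(f, x, u, eps=1e-6):
    """Central-difference Jacobians A = df/dx and B = df/du evaluated at (x, u)."""
    n, m = x.size, u.size
    A = np.zeros((n, n))
    B = np.zeros((n, m))
    for i in range(n):
        dx = np.zeros(n)
        dx[i] = eps
        A[:, i] = (f(x + dx, u) - f(x - dx, u)) / (2 * eps)
    for j in range(m):
        du = np.zeros(m)
        du[j] = eps
        B[:, j] = (f(x, u + du) - f(x, u - du)) / (2 * eps)
    return A, B

# Reference point chosen arbitrarily for the illustration.
x_ref = np.array([0.3, 0.0])
u_ref = np.array([0.0])
A, B = linearize(pendulum_step, x_ref, u_ref)
```

Around the reference point the nonlinear model then behaves approximately like $x_{t+1}\approx A x_t + B u_t$ plus a constant offset, which is the form LQR can handle.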
I. Solving the LQR Problem
1. Problem definition
$$\min_{u_1,\ldots,u_T,\,x_1,\ldots,x_T}\ \sum_{t=1}^{T}c(x_t,u_t)\quad\text{s.t. } x_t=f(x_{t-1},u_{t-1}) \tag{1}$$
where $c(x_t,u_t)$ is the cost function and $f$ is the system (dynamics) function. Substituting the dynamics into the cost eliminates the states, so equation (1) can be written as an unconstrained problem over the controls alone:
$$\min_{u_1,\dots,u_T}\ c(x_1,u_1)+c(f(x_1,u_1),u_2)+\dots+c(f(f(\dots)\dots),u_T) \tag{2}$$
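Equation (2) says that the total cost depends only on the initial state and the control sequence. A minimal sketch of that idea follows; the function name `total_cost` and the double-integrator example are assumptions made for illustration, not part of the original post.

```python
import numpy as np

def total_cost(x1, controls, f, c):
    """Evaluate equation (2): roll the dynamics forward and sum the stage costs."""
    x = np.asarray(x1, dtype=float)
    cost = 0.0
    for u in controls:
        cost += c(x, u)   # stage cost c(x_t, u_t)
        x = f(x, u)       # next state x_{t+1} = f(x_t, u_t)
    return cost

# Hypothetical example: double integrator with a quadratic stage cost.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
f = lambda x, u: A @ x + B @ u
c = lambda x, u: float(x @ x + 0.1 * u @ u)

x1 = np.array([1.0, 0.0])
controls = [np.zeros(1), np.zeros(1), np.zeros(1)]
J = total_cost(x1, controls, f, c)
```

With zero controls and zero initial velocity the state stays at `[1, 0]`, so each of the three stages contributes a cost of 1.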
LQR is the linear-quadratic regulator: its dynamics are linear and its cost function is quadratic, so:
$$f(\mathbf{x}_t,\mathbf{u}_t)=\mathbf{F}_t\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]+\mathbf{f}_t \tag{3}$$
$$c(\mathbf{x}_t,\mathbf{u}_t)=\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]^T\mathbf{C}_t\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]^T\mathbf{c}_t \tag{4}$$
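To connect equations (3) and (4) with the more familiar $A$, $B$, $Q$, $R$ notation, the sketch below stacks them into $\mathbf{F}_t$ and $\mathbf{C}_t$. All numeric values and the names `Qx`, `Ru`, `f_off`, `c_lin` are assumptions chosen for illustration.

```python
import numpy as np

n, m = 2, 1                       # state and control dimensions (illustrative)
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Qx = np.eye(n)                    # state weight (assumed)
Ru = 0.1 * np.eye(m)              # control weight (assumed)

F = np.hstack([A, B])             # F_t = [A  B], shape (n, n+m)
f_off = np.zeros(n)               # f_t, the constant offset of the dynamics
C = np.block([[Qx, np.zeros((n, m))],
              [np.zeros((m, n)), Ru]])   # C_t, shape (n+m, n+m)
c_lin = np.zeros(n + m)           # c_t, the linear cost term

def dynamics(x, u):
    """Equation (3): f(x_t, u_t) = F_t [x_t; u_t] + f_t."""
    return F @ np.concatenate([x, u]) + f_off

def cost(x, u):
    """Equation (4): 0.5 z^T C_t z + z^T c_t with z = [x_t; u_t]."""
    z = np.concatenate([x, u])
    return 0.5 * z @ C @ z + z @ c_lin

x = np.array([1.0, 2.0])
u = np.array([3.0])
x_next = dynamics(x, u)
stage = cost(x, u)
```

With a block-diagonal $\mathbf{C}_t$ as above, equation (4) reduces to $\frac{1}{2}(x^TQx+u^TRu)$; off-diagonal blocks would add state-control cross terms.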
To simplify the derivation that follows, we first introduce two definitions:
- $Q(x_t,u_t)$ is the cost of taking action $u_t$ in state $x_t$ plus the optimal cost of all subsequent time steps; that is, it assumes every future decision is optimal. $Q(x_t,u_t)=\ell(x_t,u_t)+V(x_{t+1})$, where $\ell(x_t,u_t)$ is the cost of the current stage (the stage cost written $c(x_t,u_t)$ in equation (4)).
- $V(x_t)$ is the cost of the optimal policy from state $x_t$, i.e. the smallest $Q(x_t,u_t)$ over all possible actions.
The two are therefore related by
$$V(x_t)=\min_{u_t}Q(x_t,u_t)$$
4. Derivation
Start from the last control $u_T$ and work backward in time. At the final step there is no cost-to-go, so $Q(x_T,u_T)$ is just the stage cost:
$$Q(\mathbf{x}_T,\mathbf{u}_T)=\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_T\\\mathbf{u}_T\end{array}\right]^T\mathbf{C}_T\left[\begin{array}{c}\mathbf{x}_T\\\mathbf{u}_T\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_T\\\mathbf{u}_T\end{array}\right]^T\mathbf{c}_T \tag{5}$$
Here the cost matrix and cost vector are partitioned into state and control blocks:
$$\mathbf{C}_T=\left[\begin{array}{cc}\mathbf{C}_{\mathbf{x}_T,\mathbf{x}_T}&\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\\\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}&\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\end{array}\right],\qquad \mathbf{c}_T=\left[\begin{array}{c}\mathbf{c}_{\mathbf{x}_T}\\\mathbf{c}_{\mathbf{u}_T}\end{array}\right]$$
To find the optimal control $u_T$, differentiate $Q(x_T,u_T)$ with respect to $u_T$ and set the gradient to $0$:
$$\nabla_{\mathbf{u}_T}Q(\mathbf{x}_T,\mathbf{u}_T)=\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}\mathbf{x}_T+\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{u}_T+\mathbf{c}_{\mathbf{u}_T}=0 \tag{6}$$
Solving for the control gives $\mathbf{u}_T=-\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}^{-1}\left(\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}\mathbf{x}_T+\mathbf{c}_{\mathbf{u}_T}\right)$. Defining $\mathbf{K}_T=-\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}^{-1}\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}$ and $\mathbf{k}_T=-\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}^{-1}\mathbf{c}_{\mathbf{u}_T}$, the optimal control $u_T$ can be written as:
$$\mathbf{u}_T=\mathbf{K}_T\mathbf{x}_T+\mathbf{k}_T \tag{7}$$
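The gains in equation (7) are easy to compute and check numerically. The following sketch uses a made-up $3\times 3$ cost matrix (two states, one control); the function name `terminal_gains` and all numbers are illustrative assumptions. It solves a linear system instead of forming the inverse explicitly, which is a common numerical choice rather than something the derivation requires.

```python
import numpy as np

def terminal_gains(C, c, n):
    """Equation (7): split C_T, c_T into blocks and return K_T, k_T."""
    C_ux, C_uu = C[n:, :n], C[n:, n:]
    c_u = c[n:]
    K = -np.linalg.solve(C_uu, C_ux)   # K_T = -C_uu^{-1} C_ux
    k = -np.linalg.solve(C_uu, c_u)    # k_T = -C_uu^{-1} c_u
    return K, k

n = 2                                  # number of states (assumed)
C = np.array([[2.0, 0.0, 0.5],
              [0.0, 1.0, 0.2],
              [0.5, 0.2, 1.0]])        # symmetric C_T, made up for the example
c = np.array([0.1, -0.3, 0.4])         # c_T, made up for the example
K, k = terminal_gains(C, c, n)

# Check equation (6): the gradient with respect to u vanishes at u = K x + k.
x = np.array([1.0, -2.0])
u = K @ x + k
grad_u = C[n:, :n] @ x + C[n:, n:] @ u + c[n:]
```

For this example `grad_u` is zero up to rounding, confirming that the control produced by the gains satisfies the stationarity condition.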
Recall the definition $V(x_t)=\min_{u_t}Q(x_t,u_t)$. Substituting the optimal $u_T$ for the control in $Q(x_T,u_T)$ therefore gives $V(x_T)$:
$$V\left(\mathbf{x}_{T}\right)=\text{const}+\frac{1}{2}\left[\begin{array}{c} \mathbf{x}_{T} \\ \mathbf{K}_{T} \mathbf{x}_{T}+\mathbf{k}_{T} \end{array}\right]^{T} \mathbf{C}_{T}\left[\begin{array}{c} \mathbf{x}_{T} \\ \mathbf{K}_{T} \mathbf{x}_{T}+\mathbf{k}_{T} \end{array}\right]+\left[\begin{array}{c} \mathbf{x}_{T} \\ \mathbf{K}_{T} \mathbf{x}_{T}+\mathbf{k}_{T} \end{array}\right]^{T} \mathbf{c}_{T} \tag{8}$$
Expanding equation (8), using the symmetry $\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}=\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}^T$ and collecting all terms that do not depend on $\mathbf{x}_T$ into the constant, gives:
$$\begin{aligned}V(\mathbf{x}_{T})=\ &\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{C}_{\mathbf{x}_{T},\mathbf{x}_{T}}\mathbf{x}_{T}+\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{C}_{\mathbf{x}_{T},\mathbf{u}_{T}}\mathbf{K}_{T}\mathbf{x}_{T}+\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{C}_{\mathbf{u}_{T},\mathbf{x}_{T}}\mathbf{x}_{T}+\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{C}_{\mathbf{u}_{T},\mathbf{u}_{T}}\mathbf{K}_{T}\mathbf{x}_{T}\\&+\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{C}_{\mathbf{u}_{T},\mathbf{u}_{T}}\mathbf{k}_{T}+\mathbf{x}_{T}^{T}\mathbf{C}_{\mathbf{x}_{T},\mathbf{u}_{T}}\mathbf{k}_{T}+\mathbf{x}_{T}^{T}\mathbf{c}_{\mathbf{x}_{T}}+\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{c}_{\mathbf{u}_{T}}+\mathrm{const}\end{aligned} \tag{9}$$
Let $\mathbf{V}_T=\mathbf{C}_{\mathbf{x}_T,\mathbf{x}_T}+\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\mathbf{K}_T+\mathbf{K}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}+\mathbf{K}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{K}_T$ and $\mathbf{v}_T=\mathbf{c}_{\mathbf{x}_T}+\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\mathbf{k}_T+\mathbf{K}_T^T\mathbf{c}_{\mathbf{u}_T}+\mathbf{K}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{k}_T$. Equation (9) then simplifies to:
$$V(\mathbf{x}_T)=\mathrm{const}+\frac{1}{2}\mathbf{x}_T^T\mathbf{V}_T\mathbf{x}_T+\mathbf{x}_T^T\mathbf{v}_T \tag{10}$$
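Equation (10) can be sanity-checked numerically: after substituting $u_T=K_Tx_T+k_T$, the original quadratic and the compact form should differ only by a constant that does not depend on $x_T$. The sketch below does this with made-up numbers; the helper names `value_terms` and `q_value` are assumptions introduced for the illustration.

```python
import numpy as np

def value_terms(C, c, K, k, n):
    """V_T and v_T as defined just before equation (10)."""
    C_xx, C_xu = C[:n, :n], C[:n, n:]
    C_ux, C_uu = C[n:, :n], C[n:, n:]
    c_x, c_u = c[:n], c[n:]
    V = C_xx + C_xu @ K + K.T @ C_ux + K.T @ C_uu @ K
    v = c_x + C_xu @ k + K.T @ c_u + K.T @ C_uu @ k
    return V, v

def q_value(C, c, x, u):
    """Equation (5): 0.5 z^T C z + z^T c with z = [x; u]."""
    z = np.concatenate([x, u])
    return 0.5 * z @ C @ z + z @ c

n = 2
C = np.array([[2.0, 0.0, 0.5],
              [0.0, 1.0, 0.2],
              [0.5, 0.2, 1.0]])        # made-up symmetric cost matrix
c = np.array([0.1, -0.3, 0.4])         # made-up linear cost term
K = -np.linalg.solve(C[n:, n:], C[n:, :n])
k = -np.linalg.solve(C[n:, n:], c[n:])
V, v = value_terms(C, c, K, k, n)

def residual(x):
    """Difference between Q(x, Kx + k) and the x-dependent part of equation (10)."""
    return q_value(C, c, x, K @ x + k) - (0.5 * x @ V @ x + x @ v)

r1 = residual(np.array([1.0, -2.0]))
r2 = residual(np.array([-0.3, 0.7]))
```

Both residuals equal the same constant, $\frac{1}{2}\mathbf{k}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{k}_T+\mathbf{k}_T^T\mathbf{c}_{\mathbf{u}_T}$, which is the "const" in equation (10).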
Now step back to time $T-1$, where:
$$Q(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\text{const}+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{C}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{c}_{T-1}+V(f(\mathbf{x}_{T-1},\mathbf{u}_{T-1})) \tag{11}$$
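Here the quadratic and linear terms in $\mathbf{x}_{T-1},\mathbf{u}_{T-1}$ are the cost of the current stage, and the last term is the optimal cost-to-go, i.e. the accumulated cost from the next state $x_T$ onward. Since the dynamics give $x_T=f(x_{T-1},u_{T-1})$, this last term is exactly equation (10).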
From the earlier definition, the dynamics are $f(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\mathbf{x}_T=\mathbf{F}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\mathbf{f}_{T-1}$, so $V(x_T)$ can be expressed in terms of $\mathbf{x}_{T-1}$ and $\mathbf{u}_{T-1}$ as:
$$\begin{aligned}V(\mathbf{x}_T)=\ &\mathrm{const}+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{F}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{f}_{T-1}\\&+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{F}_{T-1}^T\mathbf{v}_T\end{aligned} \tag{12}$$
Equation (11) can therefore be written as:
$$Q(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\mathrm{const}+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{Q}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{q}_{T-1}$$
where $\mathbf{Q}_{T-1}=\mathbf{C}_{T-1}+\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{F}_{T-1}$ and $\mathbf{q}_{T-1}=\mathbf{c}_{T-1}+\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{f}_{T-1}+\mathbf{F}_{T-1}^T\mathbf{v}_T$.
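The step from equations (11) and (12) to $\mathbf{Q}_{T-1}$ and $\mathbf{q}_{T-1}$ can be verified numerically. In the sketch below all matrices are made-up values and `q_function_terms` is a hypothetical helper name; it compares the direct evaluation "stage cost plus $V$ of the next state" with the compact quadratic form.

```python
import numpy as np

def q_function_terms(C, c, F, f, V_next, v_next):
    """Q_t = C_t + F_t^T V_{t+1} F_t and q_t = c_t + F_t^T V_{t+1} f_t + F_t^T v_{t+1}."""
    Qm = C + F.T @ V_next @ F
    q = c + F.T @ V_next @ f + F.T @ v_next
    return Qm, q

# Made-up problem data: two states, one control.
F = np.array([[1.0, 0.1, 0.0],
              [0.0, 1.0, 0.1]])
f = np.array([0.0, 0.05])
C = np.diag([1.0, 1.0, 0.1])
c = np.array([0.2, 0.0, -0.1])
V_next = np.array([[1.75, -0.1], [-0.1, 0.96]])
v_next = np.array([-0.1, -0.38])

Qm, q = q_function_terms(C, c, F, f, V_next, v_next)

z = np.array([1.0, -2.0, 0.5])                 # stacked [x_{T-1}; u_{T-1}]
x_next = F @ z + f                             # x_T from the dynamics
direct = (0.5 * z @ C @ z + z @ c
          + 0.5 * x_next @ V_next @ x_next + x_next @ v_next)
compact = 0.5 * z @ Qm @ z + z @ q
const = 0.5 * f @ V_next @ f + f @ v_next      # terms independent of z
```

`direct` and `compact + const` agree, which shows that the only part dropped into "const" is the piece that depends on neither the state nor the control.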
We have now expressed $Q(x_{T-1},u_{T-1})$ entirely in terms of $x_{T-1}$ and $u_{T-1}$. Differentiating $Q(x_{T-1},u_{T-1})$ with respect to $u_{T-1}$ and setting the gradient to $0$ gives:
$$\nabla_{\mathbf{u}_{T-1}}Q(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{x}_{T-1}}\mathbf{x}_{T-1}+\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{u}_{T-1}}\mathbf{u}_{T-1}+\mathbf{q}_{\mathbf{u}_{T-1}}=0 \tag{13}$$
Solving this gives the optimal control $\mathbf{u}_{T-1}=\mathbf{K}_{T-1}\mathbf{x}_{T-1}+\mathbf{k}_{T-1}$, where $\mathbf{K}_{T-1}=-\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{u}_{T-1}}^{-1}\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{x}_{T-1}}$ and $\mathbf{k}_{T-1}=-\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{u}_{T-1}}^{-1}\mathbf{q}_{\mathbf{u}_{T-1}}$.
Repeating this backward recursion down to $t=1$ gives the full backward pass. It starts from $\mathbf{V}_{T+1}=0$ and $\mathbf{v}_{T+1}=0$, so that the step $t=T$ reproduces equations (5) to (10):
$$\begin{aligned}
&\text{for } t=T \text{ to } 1:\\
&\quad\mathbf{Q}_{t}=\mathbf{C}_{t}+\mathbf{F}_{t}^{T}\mathbf{V}_{t+1}\mathbf{F}_{t}\\
&\quad\mathbf{q}_{t}=\mathbf{c}_{t}+\mathbf{F}_{t}^{T}\mathbf{V}_{t+1}\mathbf{f}_{t}+\mathbf{F}_{t}^{T}\mathbf{v}_{t+1}\\
&\quad Q(\mathbf{x}_{t},\mathbf{u}_{t})=\mathrm{const}+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{t}\\\mathbf{u}_{t}\end{array}\right]^{T}\mathbf{Q}_{t}\left[\begin{array}{c}\mathbf{x}_{t}\\\mathbf{u}_{t}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{t}\\\mathbf{u}_{t}\end{array}\right]^{T}\mathbf{q}_{t}\\
&\quad\mathbf{u}_{t}\leftarrow\arg\min_{\mathbf{u}_{t}}Q(\mathbf{x}_{t},\mathbf{u}_{t})=\mathbf{K}_{t}\mathbf{x}_{t}+\mathbf{k}_{t}\\
&\quad\mathbf{K}_{t}=-\mathbf{Q}_{\mathbf{u}_{t},\mathbf{u}_{t}}^{-1}\mathbf{Q}_{\mathbf{u}_{t},\mathbf{x}_{t}}\\
&\quad\mathbf{k}_{t}=-\mathbf{Q}_{\mathbf{u}_{t},\mathbf{u}_{t}}^{-1}\mathbf{q}_{\mathbf{u}_{t}}\\
&\quad\mathbf{V}_{t}=\mathbf{Q}_{\mathbf{x}_{t},\mathbf{x}_{t}}+\mathbf{Q}_{\mathbf{x}_{t},\mathbf{u}_{t}}\mathbf{K}_{t}+\mathbf{K}_{t}^{T}\mathbf{Q}_{\mathbf{u}_{t},\mathbf{x}_{t}}+\mathbf{K}_{t}^{T}\mathbf{Q}_{\mathbf{u}_{t},\mathbf{u}_{t}}\mathbf{K}_{t}\\
&\quad\mathbf{v}_{t}=\mathbf{q}_{\mathbf{x}_{t}}+\mathbf{Q}_{\mathbf{x}_{t},\mathbf{u}_{t}}\mathbf{k}_{t}+\mathbf{K}_{t}^{T}\mathbf{q}_{\mathbf{u}_{t}}+\mathbf{K}_{t}^{T}\mathbf{Q}_{\mathbf{u}_{t},\mathbf{u}_{t}}\mathbf{k}_{t}\\
&\quad V(\mathbf{x}_{t})=\mathrm{const}+\frac{1}{2}\mathbf{x}_{t}^{T}\mathbf{V}_{t}\mathbf{x}_{t}+\mathbf{x}_{t}^{T}\mathbf{v}_{t}
\end{aligned} \tag{14}$$
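The backward recursion in equation (14) translates almost line by line into code. The following is a minimal sketch, not a reference implementation: the function name `lqr_backward`, the list-based interface, and the scalar test problem are all assumptions made for illustration. It initializes $\mathbf{V}_{T+1}$ and $\mathbf{v}_{T+1}$ to zero, which makes the first loop iteration equal to the terminal step.

```python
import numpy as np

def lqr_backward(F_list, f_list, C_list, c_list, n):
    """Backward recursion of equation (14); returns the gains K_t, k_t for every step."""
    T = len(F_list)
    V = np.zeros((n, n))          # V_{T+1} = 0: no cost beyond the horizon
    v = np.zeros(n)               # v_{T+1} = 0
    K_list, k_list = [None] * T, [None] * T
    for t in reversed(range(T)):
        F, f, C, c = F_list[t], f_list[t], C_list[t], c_list[t]
        Qm = C + F.T @ V @ F
        q = c + F.T @ V @ f + F.T @ v
        Q_xx, Q_xu = Qm[:n, :n], Qm[:n, n:]
        Q_ux, Q_uu = Qm[n:, :n], Qm[n:, n:]
        q_x, q_u = q[:n], q[n:]
        K = -np.linalg.solve(Q_uu, Q_ux)
        k = -np.linalg.solve(Q_uu, q_u)
        V = Q_xx + Q_xu @ K + K.T @ Q_ux + K.T @ Q_uu @ K
        v = q_x + Q_xu @ k + K.T @ q_u + K.T @ Q_uu @ k
        K_list[t], k_list[t] = K, k
    return K_list, k_list

# Illustrative scalar problem: x_{t+1} = x_t + u_t, stage cost 0.5 * (x^2 + u^2), T = 3.
T, n = 3, 1
F_list = [np.array([[1.0, 1.0]])] * T
f_list = [np.zeros(1)] * T
C_list = [np.eye(2)] * T
c_list = [np.zeros(2)] * T
K_list, k_list = lqr_backward(F_list, f_list, C_list, c_list, n)
```

For this toy problem the last gain is zero, because the final control only adds cost, and the earlier gains become more aggressive the further they are from the end of the horizon.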
At this point we have the optimal feedback law for $u_t$ at every stage. Since the initial state $x_1$ is known, we can now iterate forward to obtain every state and control:
$$\begin{aligned}
&\text{for } t=1 \text{ to } T:\\
&\quad\mathbf{u}_{t}=\mathbf{K}_{t}\mathbf{x}_{t}+\mathbf{k}_{t}\\
&\quad\mathbf{x}_{t+1}=f(\mathbf{x}_{t},\mathbf{u}_{t})
\end{aligned} \tag{15}$$
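The forward pass of equation (15) is a plain rollout. The sketch below assumes the gains are already available (for example from a backward pass); the function name `lqr_forward`, the scalar system, and the hand-picked gain values are illustrative assumptions.

```python
import numpy as np

def lqr_forward(x1, K_list, k_list, f):
    """Forward pass of equation (15): apply u_t = K_t x_t + k_t and roll out the dynamics."""
    xs = [np.asarray(x1, dtype=float)]
    us = []
    for K, k in zip(K_list, k_list):
        u = K @ xs[-1] + k
        us.append(u)
        xs.append(f(xs[-1], u))
    return xs, us

# Illustrative scalar system x_{t+1} = x_t + u_t with assumed gains.
f = lambda x, u: x + u
K_list = [np.array([[-0.6]]), np.array([[-0.5]]), np.array([[0.0]])]
k_list = [np.zeros(1), np.zeros(1), np.zeros(1)]
xs, us = lqr_forward(np.array([1.0]), K_list, k_list, f)
```

The rollout returns one more state than controls, because the last control still produces a successor state.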
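5. Personal notes
That completes the LQR solution. When I first read the derivation, however, I did not understand the concrete form of $Q_t$, nor what the blocks $Q_{u_t,u_t}$, $Q_{x_t,x_t}$, and $Q_{x_t,u_t}$ actually are. After asking Dr. X, who recently joined my company, many times, I finally worked it out.
We start with the following definitions: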
$$q_x=\frac{\partial Q(x_t,u_t)}{\partial x_t}$$
$$q_u=\frac{\partial Q(x_t,u_t)}{\partial u_t}$$
$$q_{xx}=\frac{\partial^2 Q(x_t,u_t)}{\partial x_t^2}$$
$$q_{uu}=\frac{\partial^2 Q(x_t,u_t)}{\partial u_t^2}$$
$$q_{xu}=\frac{\partial^2 Q(x_t,u_t)}{\partial x_t\,\partial u_t}$$
Below I derive only $q_{xu}$ in detail, as an example. Start from the definition of $Q$:
$$\begin{aligned} Q(x_t,u_t)& =\ell(x_t,u_t)+V(x_{t+1}) \\ & =\ell(x_t,u_t)+V(f(x_t,u_t)) \end{aligned}$$
By the chain rule:
$$\begin{aligned} q_x&=\frac{\partial Q(x_t,u_t)}{\partial x_t} \\ &= \frac{\partial\ell(x_t,u_t)}{\partial x_t}+\frac{\partial V(f(x_t,u_t))}{\partial x_t} \\ &= \ell_x+\frac{\partial V}{\partial f}\cdot\frac{\partial f(x_t,u_t)}{\partial x_t} \end{aligned}$$
Here $\frac{\partial V}{\partial f}$ is the gradient of $V$ with respect to its argument $f$, i.e. $v_x$ evaluated at the next state $x_{t+1}=f(x_t,u_t)$, and $\frac{\partial f(x_t,u_t)}{\partial x_t}$ is the Jacobian of $f$ with respect to $x_t$, i.e. $f_x$.
Therefore:
$$q_x = \ell_x+f_x^{T}v_x$$
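The chain-rule result $q_x=\ell_x+f_x^Tv_x$ can be checked against a finite-difference gradient of $Q$. The sketch below uses a made-up nonlinear $f$, a quadratic stage cost, and a quadratic $V$; every function and number in it is an assumption chosen for illustration.

```python
import numpy as np

def f(x, u):
    """Hypothetical nonlinear dynamics x_{t+1} = f(x_t, u_t)."""
    return np.array([x[0] + 0.1 * x[1],
                     x[1] + 0.1 * (np.sin(x[0]) + u[0])])

def f_x(x, u):
    """Analytic Jacobian of f with respect to x."""
    return np.array([[1.0, 0.1],
                     [0.1 * np.cos(x[0]), 1.0]])

def stage_cost(x, u):
    """Assumed stage cost l(x, u); its gradient with respect to x is simply x."""
    return 0.5 * x @ x + 0.05 * u @ u

P = np.array([[2.0, 0.3], [0.3, 1.0]])   # assumed Hessian of V
p = np.array([0.1, -0.2])                # assumed linear term of V

def V(x):
    return 0.5 * x @ P @ x + p @ x

def Q(x, u):
    return stage_cost(x, u) + V(f(x, u))

x = np.array([0.4, -0.7])
u = np.array([0.2])
v_x = P @ f(x, u) + p                    # gradient of V evaluated at x_{t+1}
q_x = x + f_x(x, u).T @ v_x              # l_x + f_x^T v_x

# Central-difference gradient of Q with respect to x for comparison.
eps = 1e-6
q_x_fd = np.array([(Q(x + eps * e, u) - Q(x - eps * e, u)) / (2 * eps)
                   for e in np.eye(2)])
```

The analytic and finite-difference gradients agree to within the numerical error of the difference scheme, even though $f$ is nonlinear, because the first-order chain rule is exact.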
Next, differentiate $q_x$ with respect to $u_t$. Since $v_x$ is evaluated at $x_{t+1}=f(x_t,u_t)$, its derivative with respect to $u_t$ is $v_{xx}f_u$; the second derivatives of $f$ are neglected, as is standard in iLQR:
$$\begin{aligned} q_{xu}&=\frac{\partial}{\partial u_t}( \ell_x+f_x^{T}v_x) \\ & =\ell_{xu}+\mathbf{f}_x^T\mathbf{v}_{xx}\mathbf{f}_u \end{aligned}$$
Its transpose, $q_{ux}=q_{xu}^T=\ell_{ux}+\mathbf{f}_u^T\mathbf{v}_{xx}\mathbf{f}_x$, is the block that appears in the gain and value-function formulas.
The complete set of results is:
$$\begin{aligned} &1.\ q_{x}=\ell_{x}+\mathbf{f}_{x}^{T}\mathbf{v}_{x} \\ &2.\ q_{u}=\ell_{u}+\mathbf{f}_{u}^{T}\mathbf{v}_{x} \\ &3.\ q_{xx}=\ell_{xx}+\mathbf{f}_{x}^{T}\mathbf{v}_{xx}\mathbf{f}_{x} \\ &4.\ q_{uu}=\ell_{uu}+\mathbf{f}_{u}^{T}\mathbf{v}_{xx}\mathbf{f}_{u}+\mu\mathbf{I} \\ &5.\ q_{ux}=q_{xu}^{T}=\ell_{ux}+\mathbf{f}_u^T\mathbf{v}_{xx}\mathbf{f}_x \end{aligned}$$
On the right-hand sides, $\mathbf{v}_x$ and $\mathbf{v}_{xx}$ are the value-function derivatives of the next time step $t+1$, and $\mu\mathbf{I}$ is a regularization term that keeps $q_{uu}$ positive definite ($\mu=0$ gives the unregularized update).
The value-function derivatives at the current step $t$, which are passed on to step $t-1$, are:
$$\begin{aligned}\mathbf{v}_{x}&=q_{x}+K_{t}^{T}q_{uu}k_{t}+K_{t}^{T}q_{u}+q_{ux}^{T}k_{t}\\\mathbf{v}_{xx}&=q_{xx}+K_{t}^{T}q_{uu}K_{t}+K_{t}^{T}q_{ux}+q_{ux}^{T}K_{t}\end{aligned}$$
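Putting the five $q$ terms and the two value updates together gives one step of the backward pass in derivative form. The following is a minimal sketch under stated assumptions: the function name `ilqr_backward_step` and its argument list are hypothetical, second derivatives of $f$ are ignored, and the regularized $q_{uu}$ (including $\mu\mathbf{I}$) is used both for the gains and for the value update, following the formulas above.

```python
import numpy as np

def ilqr_backward_step(l_x, l_u, l_xx, l_uu, l_ux, f_x, f_u,
                       v_x_next, v_xx_next, mu=0.0):
    """One backward step: q terms, gains K_t and k_t, and the updated v_x, v_xx."""
    q_x = l_x + f_x.T @ v_x_next
    q_u = l_u + f_u.T @ v_x_next
    q_xx = l_xx + f_x.T @ v_xx_next @ f_x
    q_uu = l_uu + f_u.T @ v_xx_next @ f_u + mu * np.eye(l_uu.shape[0])
    q_ux = l_ux + f_u.T @ v_xx_next @ f_x
    K = -np.linalg.solve(q_uu, q_ux)     # feedback gain
    k = -np.linalg.solve(q_uu, q_u)      # feedforward term
    v_x = q_x + K.T @ q_uu @ k + K.T @ q_u + q_ux.T @ k
    v_xx = q_xx + K.T @ q_uu @ K + K.T @ q_ux + q_ux.T @ K
    return K, k, v_x, v_xx

# Made-up derivatives for a problem with two states and one control.
K, k, v_x, v_xx = ilqr_backward_step(
    l_x=np.array([0.4, -0.7]), l_u=np.array([0.02]),
    l_xx=np.eye(2), l_uu=np.array([[0.1]]), l_ux=np.zeros((1, 2)),
    f_x=np.array([[1.0, 0.1], [0.09, 1.0]]), f_u=np.array([[0.0], [0.1]]),
    v_x_next=np.array([0.5, -0.3]),
    v_xx_next=np.array([[2.0, 0.3], [0.3, 1.0]]),
    mu=1e-3,
)
```

Calling this function from $t=T$ down to $t=1$, each time feeding in the `v_x` and `v_xx` returned by the previous call, yields the gains for the whole horizon.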
Summary
This post is not finished yet. A ROS-based iLQR path-planning implementation will follow, so stay tuned!