
iLQR: principles and formula derivation (work in progress)



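Note: what follows contains a lot of personal interpretation. If you find errors or omissions, discussion and corrections are welcome.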

Preface

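LQR (linear quadratic regulator) and iLQR (iterative linear quadratic regulator) are both algorithms for optimal control, but they differ significantly in the systems they apply to and in how they are implemented.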

LQR(Linear Quadratic Regulator)

  1. Linear system: LQR assumes that the system dynamics are linear:

$$x_{t+1} = Ax_t+Bu_t$$
where $x$ is the state vector, $u$ is the control input, and $A$, $B$ are the system matrices.

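  2. Quadratic performance index (written in discrete time to match the dynamics above):
    $$J = \sum_{t=0}^{\infty} \left( x_t^T Q x_t + u_t^T R u_t \right)$$
    where $Q$ (positive semidefinite) and $R$ (positive definite) are weight matrices that balance state deviation against control effort.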
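As a minimal sketch of the two ingredients above, the snippet below rolls out $x_{t+1}=Ax_t+Bu_t$ for a given control sequence and evaluates a finite-horizon, discrete-time truncation of the cost $J$. The double-integrator matrices, the weights and the 2-step horizon are hypothetical values chosen only for illustration.

```python
import numpy as np

def rollout_linear(A, B, x0, us):
    """Roll out x_{t+1} = A x_t + B u_t; returns [x_0, ..., x_T]."""
    xs = [np.asarray(x0, dtype=float)]
    for u in us:
        xs.append(A @ xs[-1] + B @ np.asarray(u, dtype=float))
    return xs

def quadratic_cost(xs, us, Q, R):
    """Finite-horizon version of J: sum over t of x_t^T Q x_t + u_t^T R u_t.

    zip() pairs x_t with u_t for t = 0..T-1, so the final state x_T is not costed.
    """
    return float(sum(x @ Q @ x + u @ R @ u for x, u in zip(xs, us)))

# Hypothetical double integrator (position, velocity) with a 2-step horizon.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
us = [np.array([-0.5]), np.array([0.0])]
xs = rollout_linear(A, B, [1.0, 0.0], us)
J = quadratic_cost(xs, us, Q, R)
```

Here `xs` is `[[1, 0], [1, -0.5], [0.5, -0.5]]` and `J` is 2.275; the infinite-horizon LQR problem is usually solved through a Riccati equation rather than by summing a rollout.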

iLQR(Iterative Linear Quadratic Regulator)

  1. Nonlinear system: iLQR applies to nonlinear systems, described by:
    $$x_{t+1} = f(x_t,u_t)$$
    where $f$ is the nonlinear system function.

Summary

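  • LQR applies to linear systems, and its solution can be obtained directly in closed form.
  • iLQR applies to nonlinear systems and approaches the optimal solution iteratively: each iteration linearizes the system (and approximates the cost quadratically) and solves an LQR subproblem.
  • In other words, the nonlinear problem is turned into a sequence of linear problems, which is why we first introduce how to solve the LQR problem.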
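The linearization step can be sketched with finite differences. The unicycle model `f`, the step `DT` and the helper `linearize` below are hypothetical illustrations, not part of the original text; practical implementations often use analytic or automatic derivatives instead.

```python
import numpy as np

DT = 0.1  # hypothetical integration step

def f(x, u):
    """Hypothetical unicycle: state x = [px, py, theta], control u = [v, omega]."""
    px, py, th = x
    v, om = u
    return np.array([px + DT * v * np.cos(th),
                     py + DT * v * np.sin(th),
                     th + DT * om])

def linearize(f, x, u, eps=1e-6):
    """Central finite-difference Jacobians f_x = df/dx and f_u = df/du at (x, u)."""
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    fx = np.zeros((len(x), len(x)))
    fu = np.zeros((len(x), len(u)))
    for i in range(len(x)):
        d = np.zeros_like(x)
        d[i] = eps
        fx[:, i] = (f(x + d, u) - f(x - d, u)) / (2 * eps)
    for j in range(len(u)):
        d = np.zeros_like(u)
        d[j] = eps
        fu[:, j] = (f(x, u + d) - f(x, u - d)) / (2 * eps)
    return fx, fu

x_bar = np.array([0.0, 0.0, 0.5])
u_bar = np.array([1.0, 0.2])
fx, fu = linearize(f, x_bar, u_bar)
# Affine model around (x_bar, u_bar): f(x, u) ≈ f(x_bar, u_bar) + fx (x - x_bar) + fu (u - u_bar)
```

Stacking `F = np.hstack([fx, fu])` and setting `f_t = f(x_bar, u_bar) - F @ np.concatenate([x_bar, u_bar])` puts this local model into the affine form $F_t[x_t;u_t]+f_t$ used for LQR.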

I. Solving the LQR problem

1. Problem definition

$$\min_{u_1,\ldots,u_T,x_1,\ldots,x_T}\sum_{t=1}^Tc(x_t,u_t)\quad\text{s.t. }x_t=f(x_{t-1},u_{t-1}) \tag{1}$$
where $c(x_t,u_t)$ is the cost function and $f$ is the system function. Substituting the system function into the objective, (1) becomes a problem over the controls only:
$$\min_{u_1,\dots,u_T}c(x_1,u_1)+c(f(x_1,u_1),u_2)+\dots+c(f(f(\dots)\dots),u_T) \tag{2}$$
For LQR (the linear quadratic regulator), the system equation is linear and the cost function is quadratic, so:
$$f(\mathbf{x}_t,\mathbf{u}_t)=\mathbf{F}_t\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]+\mathbf{f}_t \tag{3}$$

$$c(\mathbf{x}_t,\mathbf{u}_t)=\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]^T\mathbf{C}_t\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_t\\\mathbf{u}_t\end{array}\right]^T\mathbf{c}_t \tag{4}$$
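Equations (3) and (4) both act on the stacked vector $[\mathbf{x}_t;\mathbf{u}_t]$. A minimal numpy sketch of the two, using hypothetical one-dimensional numbers:

```python
import numpy as np

def dynamics(F, f, x, u):
    """Equation (3): x_{t+1} = F_t [x_t; u_t] + f_t."""
    z = np.concatenate([x, u])
    return F @ z + f

def stage_cost(C, c, x, u):
    """Equation (4): 0.5 [x; u]^T C_t [x; u] + [x; u]^T c_t."""
    z = np.concatenate([x, u])
    return 0.5 * z @ C @ z + z @ c

# Hypothetical scalar state and scalar control.
F = np.array([[1.0, 1.0]])
f_t = np.array([0.5])
C = np.diag([2.0, 1.0])
c = np.array([1.0, 0.0])
x, u = np.array([2.0]), np.array([-1.0])
x_next = dynamics(F, f_t, x, u)   # 2 - 1 + 0.5 = 1.5
cost = stage_cost(C, c, x, u)     # 0.5 * (2*4 + 1*1) + 2*1 = 6.5
```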
To simplify the derivation that follows, we first define a few concepts:

  1. $Q(x_t,u_t)$ is the cost incurred in state $x_t$ with action $u_t$, plus the sum of the optimal costs over all subsequent time steps, assuming the policy from then on is optimal: $Q(x_t,u_t)=\ell(x_t,u_t)+V(x_{t+1})$, where $\ell(x_t,u_t)$ is the cost of the current stage (the $c(x_t,u_t)$ above).
  2. $V(x_t)$ is the cost of the optimal policy from state $x_t$ onward, i.e. the smallest $Q(x_t,u_t)$ over all possible actions.
  3. Hence:

$$V(x_t)= \min_{u_t}Q(x_t,u_t)$$
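A scalar sanity check of $V(x_t)=\min_{u_t}Q(x_t,u_t)$, under hypothetical assumptions: stage cost $\ell=x^2+ru^2$, dynamics $x_{t+1}=ax+bu$, and next-step value $V(x_{t+1})=p\,x_{t+1}^2$. Minimizing $Q$ over a grid of controls should agree with the closed-form minimizer $u^*=-\frac{pab}{r+pb^2}x$. All constants are illustrative.

```python
import numpy as np

a, b, r, p = 1.0, 1.0, 1.0, 2.0   # hypothetical model and cost constants

def Q(x, u):
    """Q(x, u) = l(x, u) + V(f(x, u)) with l = x^2 + r u^2 and V(y) = p y^2."""
    return x**2 + r * u**2 + p * (a * x + b * u)**2

x = 3.0
us = np.linspace(-5.0, 5.0, 100001)      # brute-force grid over controls
vals = Q(x, us)
u_grid, V_grid = us[np.argmin(vals)], vals.min()

u_star = -p * a * b * x / (r + p * b**2)  # closed-form minimizer (here -2)
V_star = Q(x, u_star)                     # V(x) = min_u Q(x, u) (here 15)
```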

4. Derivation

u T u_T uT开始向后迭代,此时 Q ( x t , u t ) Q(x_t,u_t) Q(xt,ut)为:
Q ( x T , u T ) = 1 2 [ x T u T ] T C T [ x T u T ] + [ x T u T ] T c T (5) Q(\mathbf{x}_T,\mathbf{u}_T)=\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_T\\\mathbf{u}_T\end{array}\right]^T\mathbf{C}_T\left[\begin{array}{c}\mathbf{x}_T\\\mathbf{u}_T\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_T\\\mathbf{u}_T\end{array}\right]^T\mathbf{c}_T \tag{5} Q(xT,uT)=21[xTuT]TCT[xTuT]+[xTuT]TcT(5)
此时的代价矩阵 C T = [ C x T , x T C x T , u T C u T , x T C u T , u T ] \mathbf{C}_T=\left[\begin{array}{cc}\mathbf{C}_{\mathbf{x}_T,\mathbf{x}_T}&\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\\\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}&\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\end{array}\right] CT=[CxT,xTCuT,xTCxT,uTCuT,uT], c T = ⌊ c x T c u T ⌋ \mathbf{c}_T=\left\lfloor\begin{array}{c}\mathbf{c}_{\mathbf{x}_T}\\\mathbf{c}_{\mathbf{u}_T}\end{array}\right\rfloor cT=cxTcuT

To find the optimal control $u_T$, take the derivative of $Q(x_T,u_T)$ with respect to $u_T$ and set it to $0$:
$$\nabla_{\mathbf{u}_T}Q(\mathbf{x}_T,\mathbf{u}_T)=\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}\mathbf{x}_T+\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{u}_T+\mathbf{c}_{\mathbf{u}_T}=0 \tag{6}$$
Assuming $\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}$ is invertible, this gives $\mathbf{u}_T=-\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}^{-1}\left(\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}\mathbf{x}_T+\mathbf{c}_{\mathbf{u}_T}\right)$. Letting $\mathbf{K}_T=-\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}^{-1}\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}$ and $\mathbf{k}_T=-\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}^{-1}\mathbf{c}_{\mathbf{u}_T}$, the optimal control $u_T$ can be written as:
$$\mathbf u_T=\mathbf K_T\mathbf x_T+\mathbf k_T \tag{7}$$
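Equations (6) and (7) translate directly into code. The sketch below (the function name `terminal_gains` and the random test matrices are hypothetical) splits $\mathbf{C}_T$, $\mathbf{c}_T$ into blocks and solves the linear systems with `np.linalg.solve` instead of forming $\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}^{-1}$ explicitly, which is usually the numerically safer choice.

```python
import numpy as np

def terminal_gains(C, c, n):
    """Return K_T, k_T from (7); n is the state dimension."""
    Cux, Cuu = C[n:, :n], C[n:, n:]
    cu = c[n:]
    K = -np.linalg.solve(Cuu, Cux)   # K_T = -C_uu^{-1} C_ux
    k = -np.linalg.solve(Cuu, cu)    # k_T = -C_uu^{-1} c_u
    return K, k

rng = np.random.default_rng(0)
n, m = 3, 2
M = rng.standard_normal((n + m, n + m))
C = M @ M.T + (n + m) * np.eye(n + m)   # symmetric positive definite cost matrix
c = rng.standard_normal(n + m)
K, k = terminal_gains(C, c, n)

# The gradient (6) should vanish at u_T = K_T x_T + k_T for any x_T.
x = rng.standard_normal(n)
u = K @ x + k
grad_u = C[n:, :n] @ x + C[n:, n:] @ u + c[n:]
```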
By the definition $V(x_t)=\min_{u_t}Q(x_t,u_t)$, substituting the optimal $u_T$ into $Q(x_T,u_T)$ gives $V(x_T)$:
$$V\left(\mathbf{x}_{T}\right)=\text{const}+\frac{1}{2}\left[\begin{array}{c} \mathbf{x}_{T} \\ \mathbf{K}_{T} \mathbf{x}_{T}+\mathbf{k}_{T} \end{array}\right]^{T} \mathbf{C}_{T}\left[\begin{array}{c} \mathbf{x}_{T} \\ \mathbf{K}_{T} \mathbf{x}_{T}+\mathbf{k}_{T} \end{array}\right]+\left[\begin{array}{c} \mathbf{x}_{T} \\ \mathbf{K}_{T} \mathbf{x}_{T}+\mathbf{k}_{T} \end{array}\right]^{T} \mathbf{c}_{T} \tag{8}$$
Expanding (8) gives:
$$\begin{aligned}V(\mathbf{x}_{T})=&\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{C}_{\mathbf{x}_{T},\mathbf{x}_{T}}\mathbf{x}_{T}+\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{C}_{\mathbf{x}_{T},\mathbf{u}_{T}}\mathbf{K}_{T}\mathbf{x}_{T}+\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{C}_{\mathbf{u}_{T},\mathbf{x}_{T}}\mathbf{x}_{T}+\frac{1}{2}\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{C}_{\mathbf{u}_{T},\mathbf{u}_{T}}\mathbf{K}_{T}\mathbf{x}_{T}\\&+\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{C}_{\mathbf{u}_{T},\mathbf{u}_{T}}\mathbf{k}_{T}+\mathbf{x}_{T}^{T}\mathbf{C}_{\mathbf{x}_{T},\mathbf{u}_{T}}\mathbf{k}_{T}+\mathbf{x}_{T}^{T}\mathbf{c}_{\mathbf{x}_{T}}+\mathbf{x}_{T}^{T}\mathbf{K}_{T}^{T}\mathbf{c}_{\mathbf{u}_{T}}+\mathrm{const}\end{aligned} \tag{9}$$
The two cross terms $\frac{1}{2}\mathbf{x}_T^T\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\mathbf{k}_T$ and $\frac{1}{2}\mathbf{k}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}\mathbf{x}_T$ are equal and combine into a single term with coefficient 1; everything that does not contain $\mathbf{x}_T$ is absorbed into const.
Letting $\mathbf{V}_T=\mathbf{C}_{\mathbf{x}_T,\mathbf{x}_T}+\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\mathbf{K}_T+\mathbf{K}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{x}_T}+\mathbf{K}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{K}_T$ and $\mathbf{v}_T=\mathbf{c}_{\mathbf{x}_T}+\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\mathbf{k}_T+\mathbf{K}_T^T\mathbf{c}_{\mathbf{u}_T}+\mathbf{K}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{k}_T$, (9) simplifies to:
$$V(\mathbf{x}_T)=\mathrm{const}+\frac{1}{2}\mathbf{x}_T^T\mathbf{V}_T\mathbf{x}_T+\mathbf{x}_T^T\mathbf{v}_T \tag{10}$$
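To check (9) and (10) numerically, the sketch below builds $\mathbf{V}_T$ and $\mathbf{v}_T$ from the blocks and verifies that $Q(\mathbf{x}_T,\mathbf{K}_T\mathbf{x}_T+\mathbf{k}_T)$ and $\frac12\mathbf{x}_T^T\mathbf{V}_T\mathbf{x}_T+\mathbf{x}_T^T\mathbf{v}_T$ differ only by a constant. The helper names and random matrices are hypothetical.

```python
import numpy as np

def terminal_value(C, c, n):
    """Return K_T, k_T, V_T, v_T for the quadratic terminal cost (5)."""
    Cxx, Cxu, Cux, Cuu = C[:n, :n], C[:n, n:], C[n:, :n], C[n:, n:]
    cx, cu = c[:n], c[n:]
    K = -np.linalg.solve(Cuu, Cux)
    k = -np.linalg.solve(Cuu, cu)
    V = Cxx + Cxu @ K + K.T @ Cux + K.T @ Cuu @ K
    v = cx + Cxu @ k + K.T @ cu + K.T @ Cuu @ k
    return K, k, V, v

rng = np.random.default_rng(1)
n, m = 2, 2
M = rng.standard_normal((n + m, n + m))
C = M @ M.T + np.eye(n + m)            # symmetric positive definite
c = rng.standard_normal(n + m)
K, k, V, v = terminal_value(C, c, n)

def q_at_opt(x):
    """Q(x_T, u_T) from (5) with u_T replaced by the optimal K_T x_T + k_T."""
    z = np.concatenate([x, K @ x + k])
    return 0.5 * z @ C @ z + z @ c

xs = rng.standard_normal((5, n))
# (8) minus the quadratic part of (10): should be the same constant for every x.
diffs = np.array([q_at_opt(x) - (0.5 * x @ V @ x + x @ v) for x in xs])
```

The constant equals $\frac12\mathbf{k}_T^T\mathbf{C}_{\mathbf{u}_T,\mathbf{u}_T}\mathbf{k}_T+\mathbf{k}_T^T\mathbf{c}_{\mathbf{u}_T}$; if the $\mathbf{x}_T^T\mathbf{C}_{\mathbf{x}_T,\mathbf{u}_T}\mathbf{k}_T$ term in $\mathbf{v}_T$ had coefficient $\frac12$, the differences would no longer be constant.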
Continue stepping backward to step $T-1$, where:
$$Q(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\text{const}+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{C}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{c}_{T-1}+V(f(\mathbf{x}_{T-1},\mathbf{u}_{T-1})) \tag{11}$$
The second and third terms are the cost of the current stage, and the last term is the optimal cost-to-go, i.e. the accumulated cost from $x_T$ to the end of the horizon. Since $x_T=f(x_{T-1},u_{T-1})$, this last term is exactly (10).
From the description above, the system function is $f(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\mathbf{x}_T=\mathbf{F}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\mathbf{f}_{T-1}$, so substituting it into (10), $V(x_T)$ can be written as:
$$\begin{aligned}V(\mathbf{x}_T)=\mathrm{const}&+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{F}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{f}_{T-1}\\&+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{F}_{T-1}^T\mathbf{v}_T\end{aligned} \tag{12}$$
Therefore (11) can be written as:
$$Q(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\mathrm{const}+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{Q}_{T-1}\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{T-1}\\\mathbf{u}_{T-1}\end{array}\right]^T\mathbf{q}_{T-1}$$
where $\mathbf{Q}_{T-1}=\mathbf{C}_{T-1}+\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{F}_{T-1}$ and $\mathbf{q}_{T-1}=\mathbf{c}_{T-1}+\mathbf{F}_{T-1}^T\mathbf{V}_T\mathbf{f}_{T-1}+\mathbf{F}_{T-1}^T\mathbf{v}_T$.
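The step from (11) to the $\mathbf{Q}_{T-1}$, $\mathbf{q}_{T-1}$ expressions can be verified the same way: evaluate (11) directly and compare it with the compact quadratic. In this sketch `q_terms` is a hypothetical helper, the matrices are random, and $\mathbf{V}_T$ is taken symmetric as in the derivation.

```python
import numpy as np

def q_terms(C, c, F, f, V_next, v_next):
    """Q_{T-1} = C + F^T V F and q_{T-1} = c + F^T V f + F^T v."""
    Q = C + F.T @ V_next @ F
    q = c + F.T @ V_next @ f + F.T @ v_next
    return Q, q

rng = np.random.default_rng(2)
n, m = 2, 1
M = rng.standard_normal((n + m, n + m))
C = M @ M.T + np.eye(n + m)
c = rng.standard_normal(n + m)
F = rng.standard_normal((n, n + m))
f = rng.standard_normal(n)
P = rng.standard_normal((n, n))
V_next = P @ P.T + np.eye(n)           # symmetric V_T
v_next = rng.standard_normal(n)
Q, q = q_terms(C, c, F, f, V_next, v_next)

def direct(z):
    """(11): stage cost plus V(x_T) with x_T = F z + f (constant of V(x_T) dropped)."""
    x_next = F @ z + f
    return 0.5 * z @ C @ z + z @ c + 0.5 * x_next @ V_next @ x_next + x_next @ v_next

zs = rng.standard_normal((5, n + m))
# Leftover constant should be 0.5 f^T V f + f^T v for every z.
diffs = np.array([direct(z) - (0.5 * z @ Q @ z + z @ q) for z in zs])
```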
Now $Q(x_{T-1},u_{T-1})$ is expressed as a quadratic function of $x_{T-1}$ and $u_{T-1}$ with the same form as (5). Taking its derivative with respect to $u_{T-1}$ and setting it to 0 gives:
$$\nabla_{\mathbf{u}_{T-1}}Q(\mathbf{x}_{T-1},\mathbf{u}_{T-1})=\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{x}_{T-1}}\mathbf{x}_{T-1}+\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{u}_{T-1}}\mathbf{u}_{T-1}+\mathbf{q}_{\mathbf{u}_{T-1}}=0 \tag{13}$$
Solving gives the optimal control $\mathbf u_{T-1}=\mathbf K_{T-1}\mathbf x_{T-1}+\mathbf k_{T-1}$, where $\mathbf{K}_{T-1}=-\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{u}_{T-1}}^{-1}\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{x}_{T-1}}$ and $\mathbf{k}_{T-1}=-\mathbf{Q}_{\mathbf{u}_{T-1},\mathbf{u}_{T-1}}^{-1}\mathbf{q}_{\mathbf{u}_{T-1}}$.

Keep iterating backward until $t=1$, starting from $\mathbf{V}_{T+1}=\mathbf{0}$ and $\mathbf{v}_{T+1}=\mathbf{0}$ so that the first iteration reproduces the step at $T$ above:
$$\begin{aligned} \mathrm{for}\ & t=T\text{ to }1\text{:} \\ &\mathbf{Q}_{t}=\mathbf{C}_{t}+\mathbf{F}_{t}^{T}\mathbf{V}_{t+1}\mathbf{F}_{t} \\ &\mathbf{q}_t=\mathbf{c}_t+\mathbf{F}_t^T\mathbf{V}_{t+1}\mathbf{f}_t+\mathbf{F}_t^T\mathbf{v}_{t+1} \\ &Q(\mathbf{x}_{t},\mathbf{u}_{t})=\mathrm{const}+\frac{1}{2}\left[\begin{array}{c}\mathbf{x}_{t}\\\mathbf{u}_{t}\end{array}\right]^{T}\mathbf{Q}_{t}\left[\begin{array}{c}\mathbf{x}_{t}\\\mathbf{u}_{t}\end{array}\right]+\left[\begin{array}{c}\mathbf{x}_{t}\\\mathbf{u}_{t}\end{array}\right]^{T}\mathbf{q}_{t} \\ &\mathbf{u}_{t}\leftarrow\arg\min_{\mathbf{u}_{t}}Q(\mathbf{x}_{t},\mathbf{u}_{t})=\mathbf{K}_{t}\mathbf{x}_{t}+\mathbf{k}_{t} \\ &\mathbf{K}_t=-\mathbf{Q}_{\mathbf{u}_t,\mathbf{u}_t}^{-1}\mathbf{Q}_{\mathbf{u}_t,\mathbf{x}_t} \\ &\mathbf{k}_{t}=-\mathbf{Q}_{\mathbf{u}_{t},\mathbf{u}_{t}}^{-1}\mathbf{q}_{\mathbf{u}_{t}} \\ &\mathbf{V}_{t}=\mathbf{Q}_{\mathbf{x}_{t},\mathbf{x}_{t}}+\mathbf{Q}_{\mathbf{x}_{t},\mathbf{u}_{t}}\mathbf{K}_{t}+\mathbf{K}_{t}^{T}\mathbf{Q}_{\mathbf{u}_{t},\mathbf{x}_{t}}+\mathbf{K}_{t}^{T}\mathbf{Q}_{\mathbf{u}_{t},\mathbf{u}_{t}}\mathbf{K}_{t} \\ &\mathbf{v}_{t}=\mathbf{q}_{\mathbf{x}_{t}}+\mathbf{Q}_{\mathbf{x}_{t},\mathbf{u}_{t}}\mathbf{k}_{t}+\mathbf{K}_{t}^{T}\mathbf{q}_{\mathbf{u}_{t}}+\mathbf{K}_{t}^{T}\mathbf{Q}_{\mathbf{u}_{t},\mathbf{u}_{t}}\mathbf{k}_{t} \\ &V(\mathbf{x}_{t})=\mathrm{const}+\frac{1}{2}\mathbf{x}_{t}^{T}\mathbf{V}_{t}\mathbf{x}_{t}+\mathbf{x}_{t}^{T}\mathbf{v}_{t} \end{aligned} \tag{14}$$
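Recursion (14) as a minimal numpy sketch. It assumes the per-step matrices are supplied as Python lists indexed from 0 (so list index `t` holds step $t+1$ of the text) and that every $\mathbf{Q}_{\mathbf{u}_t,\mathbf{u}_t}$ is invertible. The names `lqr_backward`, `Cs`, `cs`, `Fs`, `fs` are hypothetical.

```python
import numpy as np

def lqr_backward(Cs, cs, Fs, fs, n):
    """Backward recursion (14); list index t (0-based) holds step t+1 of the text."""
    T = len(Cs)
    V = np.zeros((n, n))   # V_{T+1} = 0: no cost after the last step
    v = np.zeros(n)        # v_{T+1} = 0
    Ks, ks = [None] * T, [None] * T
    for t in reversed(range(T)):
        Q = Cs[t] + Fs[t].T @ V @ Fs[t]
        q = cs[t] + Fs[t].T @ V @ fs[t] + Fs[t].T @ v
        Qxx, Qxu, Qux, Quu = Q[:n, :n], Q[:n, n:], Q[n:, :n], Q[n:, n:]
        qx, qu = q[:n], q[n:]
        K = -np.linalg.solve(Quu, Qux)
        k = -np.linalg.solve(Quu, qu)
        V = Qxx + Qxu @ K + K.T @ Qux + K.T @ Quu @ K
        v = qx + Qxu @ k + K.T @ qu + K.T @ Quu @ k
        Ks[t], ks[t] = K, k
    return Ks, ks

# Hypothetical scalar example: x_{t+1} = x_t + u_t, cost 0.5 (x_t^2 + u_t^2), T = 2.
T = 2
Cs = [np.eye(2)] * T
cs = [np.zeros(2)] * T
Fs = [np.array([[1.0, 1.0]])] * T
fs = [np.zeros(1)] * T
Ks, ks = lqr_backward(Cs, cs, Fs, fs, n=1)
```

By hand: at the last step $\mathbf{Q}=\mathbf{I}$, so $K=0$ and $V=1$; one step earlier $\mathbf{Q}=\left[\begin{array}{cc}2&1\\1&2\end{array}\right]$, so $K=-0.5$.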
We now have the optimal control law $u_t=K_tx_t+k_t$ for every stage. Since the initial state $x_1$ is known, we can then iterate forward to obtain every state $x_t$:
$$\begin{aligned} \text{for }t & =1\text{ to }T{:}\\ \mathbf{u}_{t} & =\mathbf{K}_{t}\mathbf{x}_{t}+\mathbf{k}_{t}\\ \mathbf{x}_{t+1} & =f(\mathbf{x}_{t},\mathbf{u}_{t}) \end{aligned} \tag{15}$$
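The forward pass (15) only needs the gains and the dynamics. The sketch below assumes a hypothetical scalar problem $x_{t+1}=x_t+u_t$ with stage cost $\frac12(x_t^2+u_t^2)$ and horizon $T=2$ (i.e. $\mathbf{C}_t=\mathbf{I}$, $\mathbf{F}_t=[1\ \ 1]$, $\mathbf{f}_t=0$), for which recursion (14) gives $K_1=-0.5$, $K_2=0$, $k_1=k_2=0$. The function names are illustrative.

```python
import numpy as np

def forward_pass(x1, Ks, ks, dynamics):
    """(15): u_t = K_t x_t + k_t, x_{t+1} = f(x_t, u_t), starting from the known x_1."""
    xs, us = [np.asarray(x1, dtype=float)], []
    for K, k in zip(Ks, ks):
        u = K @ xs[-1] + k
        us.append(u)
        xs.append(dynamics(xs[-1], u))
    return xs, us

def dynamics(x, u):
    return x + u                      # x_{t+1} = x_t + u_t

def total_cost(xs, us):
    # sum over t = 1..T of 0.5 (x_t^2 + u_t^2); the final state x_{T+1} is not costed
    return float(sum(0.5 * (x @ x + u @ u) for x, u in zip(xs, us)))

Ks = [np.array([[-0.5]]), np.array([[0.0]])]
ks = [np.zeros(1), np.zeros(1)]
xs, us = forward_pass([2.0], Ks, ks, dynamics)
J_opt = total_cost(xs, us)

# Shifting the first control by eps (and keeping the feedback law afterwards) should not lower the cost.
eps = 0.1
xs_p, us_p = forward_pass([2.0], Ks, [np.array([eps]), np.zeros(1)], dynamics)
J_pert = total_cost(xs_p, us_p)
```

For this problem the perturbed cost works out to $3+\epsilon^2$, so the unperturbed rollout ($J=3$) is indeed the minimum.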

5. Personal understanding

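That completes the LQR solution. However, when I first read the derivation I did not understand the concrete form of $Q_t$, nor what $Q_{u_t,u_t}$, $Q_{x_t,x_t}$ and $Q_{x_t,u_t}$ actually are. After asking Dr. X, a new colleague at my company, several times, I finally figured it out.
First, define: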
$$q_x=\frac{\partial Q(x_t,u_t)}{\partial x_t}$$
$$q_u=\frac{\partial Q(x_t,u_t)}{\partial u_t}$$
$$q_{xx}=\frac{\partial^2 Q(x_t,u_t)}{\partial x_t^2}$$
$$q_{uu}=\frac{\partial^2 Q(x_t,u_t)}{\partial u_t^2}$$
$$q_{xu}=\frac{\partial^2 Q(x_t,u_t)}{\partial x_t\partial u_t}$$
Below I derive $q_{xu}$ in detail as an example:
$$\begin{aligned} Q(x_t,u_t)& =\ell(x_t,u_t)+V(x_{t+1}) \\ & =\ell(x_t,u_t)+V(f(x_t,u_t)) \end{aligned}$$
By the chain rule:
$$\begin{aligned} q_x&=\frac{\partial Q(x_t,u_t)}{\partial x_t} \\ &= \frac{\partial\ell(x_t,u_t)}{\partial x_t}+\frac{\partial V(f(x_t,u_t))}{\partial x_t} \\ &= \ell_x+\frac{\partial V}{\partial f}\cdot\frac{\partial f(x_t,u_t)}{\partial x_t} \end{aligned}$$
where $\frac{\partial V}{\partial f}$ is the gradient of $V$ with respect to its argument $f$ (i.e. $x_{t+1}$), denoted $v_x$, and $\frac{\partial f(x_t,u_t)}{\partial x_t}$ is the Jacobian of $f$ with respect to $x_t$, denoted $f_x$. Therefore, writing gradients as column vectors:
$$q_x = \ell_x+f_x^{T}v_x$$
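Then, differentiating $q_x$ once more, this time with respect to $u_t$:
$$\begin{aligned} q_{xu}&=\frac{\partial}{\partial u_t}( \ell_x+f_x^{T}v_x) \\ & \approx\ell_{xu}+\mathbf{f}_x^T\mathbf{v}_{xx}\mathbf{f}_u \end{aligned}$$
Since $v_x$ is evaluated at $x_{t+1}=f(x_t,u_t)$, its derivative with respect to $u_t$ is $\mathbf{v}_{xx}\mathbf{f}_u$. The term involving the second derivative of $f$ (the derivative of $f_x$ itself with respect to $u_t$) is dropped; this is the iLQR approximation, whereas DDP keeps that term. The transpose $q_{ux}=q_{xu}^T=\ell_{ux}+\mathbf{f}_u^T\mathbf{v}_{xx}\mathbf{f}_x$ has the shape of $\mathbf{Q}_{\mathbf{u}_t,\mathbf{x}_t}$ and is the block that appears in the gain $K_t$.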
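The chain-rule results for $q_x$ and $q_{ux}$ can be checked with finite differences. The sketch assumes a hypothetical pendulum-like model $f$ whose Jacobian $f_x$ does not depend on $u$, so the dropped second-derivative term is exactly zero and the check should match to numerical precision. It also assumes the stage cost $\ell=\frac12(x^Tx+u^Tu)$ (so $\ell_x=x$, $\ell_{ux}=0$) and a quadratic next-step value $V(y)=\frac12y^TPy+p^Ty$. All of these are illustrative choices.

```python
import numpy as np

DT = 0.1  # hypothetical step

def f(x, u):
    """Hypothetical pendulum-like model: x = [theta, omega], u = [torque]."""
    return np.array([x[0] + DT * x[1], x[1] + DT * (-np.sin(x[0]) + u[0])])

def f_x(x, u):
    return np.array([[1.0, DT], [-DT * np.cos(x[0]), 1.0]])

def f_u(x, u):
    return np.array([[0.0], [DT]])

P = np.array([[2.0, 0.3], [0.3, 1.0]])   # v_xx of V at x_{t+1}
p = np.array([0.5, -0.2])

def Q(x, u):
    """Q(x, u) = l(x, u) + V(f(x, u))."""
    y = f(x, u)
    return 0.5 * (x @ x + u @ u) + 0.5 * y @ P @ y + p @ y

x, u = np.array([0.4, -0.3]), np.array([0.7])
v_x = P @ f(x, u) + p                      # gradient of V at x_{t+1}
q_x = x + f_x(x, u).T @ v_x                # l_x + f_x^T v_x
q_ux = f_u(x, u).T @ P @ f_x(x, u)         # l_ux (= 0 here) + f_u^T v_xx f_x

eps = 1e-4
E = np.eye(2)
du = np.array([eps])
q_x_fd = np.array([(Q(x + eps * E[i], u) - Q(x - eps * E[i], u)) / (2 * eps)
                   for i in range(2)])
q_ux_fd = np.array([[(Q(x + eps * E[i], u + du) - Q(x + eps * E[i], u - du)
                      - Q(x - eps * E[i], u + du) + Q(x - eps * E[i], u - du)) / (4 * eps**2)
                     for i in range(2)]])
```

For a model whose $f_x$ does depend on $u$ (for example the unicycle, where $v$ multiplies $\sin\theta$ and $\cos\theta$), the finite-difference mixed derivative would differ from `q_ux` by exactly the dropped term.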
So the complete set of results is:
$$\begin{aligned} &1.\ q_{x}=\ell_{x}+\mathbf{f}_{x}^{T}\mathbf{v}_{x} \\ &2.\ q_{u}=\ell_{u}+\mathbf{f}_{u}^{T}\mathbf{v}_{x} \\ &3.\ q_{xx}=\ell_{xx}+\mathbf{f}_{x}^{T}\mathbf{v}_{xx}\mathbf{f}_{x} \\ &4.\ q_{uu}=\ell_{uu}+\mathbf{f}_{u}^{T}\mathbf{v}_{xx}\mathbf{f}_{u}+\mu\mathbf{I} \\ &5.\ q_{ux}=\ell_{ux}+\mathbf{f}_u^T\mathbf{v}_{xx}\mathbf{f}_x \end{aligned}$$
Here $\mathbf{v}_x$ and $\mathbf{v}_{xx}$ are the gradient and Hessian of $V$ at step $t+1$. The $\mu\mathbf{I}$ term ($\mu\ge 0$) in item 4 is a regularization that keeps $q_{uu}$ positive definite, and therefore invertible, when $\ell_{uu}+\mathbf{f}_u^T\mathbf{v}_{xx}\mathbf{f}_u$ alone is not.
With $K_t=-q_{uu}^{-1}q_{ux}$ and $k_t=-q_{uu}^{-1}q_u$, the value-function derivatives for step $t$ are then updated as follows, and serve as $\mathbf{v}_x$, $\mathbf{v}_{xx}$ in the next backward step:
$$\begin{aligned}\mathbf{v}_{x}&=q_{x}+K_{t}^{T}q_{uu}k_{t}+K_{t}^{T}q_{u}+q_{ux}^{T}k_{t}\\\mathbf{v}_{xx}&=q_{xx}+K_{t}^{T}q_{uu}K_{t}+K_{t}^{T}q_{ux}+q_{ux}^{T}K_{t}\end{aligned}$$
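Putting items 1 to 5 and the value update together gives one backward step of iLQR. This is a minimal sketch: the function name `ilqr_backward_step` and its argument names are hypothetical, `vx_next`/`vxx_next` stand for $\mathbf{v}_x$, $\mathbf{v}_{xx}$ of step $t+1$, and the derivatives of $\ell$ and $f$ are assumed to be already evaluated along the current trajectory.

```python
import numpy as np

def ilqr_backward_step(lx, lu, lxx, luu, lux, fx, fu, vx_next, vxx_next, mu=0.0):
    """One backward step: q-terms (items 1-5), gains, and updated v_x, v_xx."""
    qx = lx + fx.T @ vx_next
    qu = lu + fu.T @ vx_next
    qxx = lxx + fx.T @ vxx_next @ fx
    quu = luu + fu.T @ vxx_next @ fu + mu * np.eye(len(lu))  # mu*I regularization
    qux = lux + fu.T @ vxx_next @ fx
    K = -np.linalg.solve(quu, qux)   # K_t = -q_uu^{-1} q_ux
    k = -np.linalg.solve(quu, qu)    # k_t = -q_uu^{-1} q_u
    vx = qx + K.T @ quu @ k + K.T @ qu + qux.T @ k
    vxx = qxx + K.T @ quu @ K + K.T @ qux + qux.T @ K
    return K, k, vx, vxx

# Hypothetical scalar step: l = 0.5 (x^2 + u^2) at the origin, x_{t+1} = x_t + u_t,
# next-step value V = 0.5 x^2 (so v_x = 0, v_xx = 1).
one = np.ones((1, 1))
args = dict(lx=np.zeros(1), lu=np.zeros(1), lxx=one, luu=one, lux=np.zeros((1, 1)),
            fx=one, fu=one, vx_next=np.zeros(1), vxx_next=one)
K0, k0, vx0, vxx0 = ilqr_backward_step(**args)          # mu = 0
K1, k1, vx1, vxx1 = ilqr_backward_step(**args, mu=1.0)  # regularized
```

With $\mu=0$ and this linear model the step reproduces what recursion (14) gives for the same scalar problem ($K=-0.5$, $\mathbf{V}_t=1.5$); with $\mu=1$ the gain shrinks to $-1/3$, which is how the regularization makes the update more conservative.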


Summary

This post is not finished yet; a ROS-based iLQR path-planning implementation will follow. Stay tuned!
