

Decentralized Stabilization for a Class of Continuous-Time Nonlinear Interconnected Systems Using Online Learning Optimal Control Approach


Neural-network-based Online Learning Optimal Control

Decentralized Control Strategy

  1. Cost functions (critic neural networks) – local optimal controllers
  2. Feedback gains added to the optimal control policies – decentralized control strategy

Optimal Control Problem (Stabilization)

Hamilton-Jacobi-Bellman (HJB) Equations

  • Apply Online Policy Iteration Algorithm (construct and train critic neural networks) to solve HJB Equations.

Decentralized control has been the control of choice for large-scale systems because it is computationally efficient to formulate control laws that use only locally available subsystem states or outputs.

Though dynamic programming is a useful technique for solving optimization and optimal control problems, in many cases it is computationally difficult to apply because of the curse of dimensionality.


Considering the effectiveness of ADP (adaptive dynamic programming) and reinforcement learning techniques in solving the nonlinear optimal control problem, the decentralized control approach established here is natural and convenient.


Notation

i=1,2,...,N. : ith subsystem.

{\color{Blue} x}_i(t)\in \mathbb{R}^{n_i} : state vector of the i th subsystem.

{\color{Blue} x}_1,{\color{Blue} x}_2, ...,{\color{Blue} x}_N : local states .

{\color{Red} \bar u}_i({\color{Blue} x}_i(t)) \in \mathbb{R}^{m_i} : control vector of the ith subsystem.

{\color{Red} \bar u}_1({\color{Blue} x}_1) , {\color{Red} \bar u}_2({\color{Blue} x}_2) , ..., {\color{Red} \bar u}_N({\color{Blue} x}_N) : local controls .

{\color{Red} u}_i({\color{Blue} x}_i), i=1,2,...,N : control policies .

{\color{Golden} f}_i({\color{Blue} x}_i) : nonlinear internal dynamics .

{\color{Magenta} g}_i({\color{Blue} x}_i) : input gain matrix .

{\color{Magenta} g}_i(x_i){\color{Magenta} \bar Z}_i(x) : interconnection term. Note that the argument of {\color{Magenta} \bar Z}_i is the full state x , not just x_i .

{\color{Golden} R}_i \in \mathbb{R}^{m_i \times m_i}, i=1,2,...,N. : symmetric positive definite matrices .

\rho_{ij} : nonnegative constants.

{\color{Orange} h}_{ij}(x_j) : positive semidefinite functions.

Q_i(x_i), i=1,2,...,N : positive definite functions satisfying {\color{Orange} h}_i(x_i) \leq Q_i(x_i), i=1,2,...,N.

{\color{Red} \mu}_i({\color{Blue} x}_i) : control policy .

\Omega _i : {\color{Golden} f}_i+{\color{Magenta} g}_i {\color{Red} u}_i is Lipschitz continuous on a set \Omega _i in \mathbb{R} ^{n_i} containing the origin, and the subsystem is controllable in the sense that there exists a continuous control policy on \Omega _i that asymptotically stabilizes the subsystem.


Decentralized Control Problem of the Large-Scale System

The paper studies a class of continuous-time nonlinear large-scale systems composed of N interconnected subsystems described by

\begin{align*} \dot{{\color{Blue} x}}_i(t)&={\color{Golden} f}_i \left ( {\color{Blue} x}_i(t) \right ) + {\color{Magenta} g}_i\left ( {\color{Blue} x}_i(t) \right ) \left ( {\color{Red} \bar u}_i ({\color{Blue} x}_i(t)) + {\color{Magenta} \bar Z}_i ({\color{Blue} x}(t))\right )\\ i &=1,2,...,N \end{align*}   (1)

{\color{Blue} x}_i(0)={\color{Blue} x}_{i0} : initial state of the ith subsystem,

Assumption 1 : {\color{Blue} x}_i=0 is the equilibrium of the ith subsystem.

Assumption 2 : {\color{Golden} f}_i({\color{Blue} x}_i) and {\color{Magenta} g}_i({\color{Blue} x}_i) are differentiable in their arguments with {\color{Golden} f}_i({\color{Blue} 0})=0 .

Assumption 3 : When {\color{Blue} x}_i=0 , the feedback control vector {\color{Red} \bar u}_i ({\color{Blue} x}_i) =0 .

{\color{Magenta} Z}_i(x)={\color{Golden} R}_i^{1/2} {\color{Magenta} \bar Z}_i(x)

where

{\color{Golden} R}_i \in \mathbb{R}^{m_i \times m_i}, i=1,2,...,N. : symmetric positive definite matrices .

{\color{Magenta} Z}_i(x) \in \mathbb{R} ^{m_i},i=1,2,...,N,

are bounded as follows:

\begin{align*} \left \| {\color{Magenta} Z}_i(x) \right \| &\leq \sum_{j=1}^N \rho _{ij} {\color{Orange} h}_{ij}(x_j), \\ i&=1,2,...,N. \end{align*}   (2)

Define

h_{\color{DarkGreen} i}(x_i)=\max\left \{ h_{{\color{Red} 1}{\color{DarkGreen} i}}(x_i) , h_{{\color{Red} 2}{\color{DarkGreen} i}}(x_i),...,h_{{\color{Red} N}{\color{DarkGreen} i}}(x_i)\right \}

then (2) can be formulated as

\begin{align*} \left \| Z_i(x) \right \| &\leq \sum_{j=1}^N {\color{Blue} \lambda_{ij}}{\color{Orange} h_j(x_j)},\ i=1,2,...,N. \end{align*}

where

{\color{Blue} \lambda_{ij}} \geq \frac{\rho_{ij}h_{ij}(x_j)}{{\color{Orange} h_j(x_j)}}


C1 – Optimal Control of Isolated Subsystems (Framework of HJB Equations)

C2 – Decentralized Control Strategy

Consider the N isolated subsystems corresponding to (1)

\begin{align*} \dot{{\color{Blue} x}}_i(t)&={\color{Golden} f}_i \left ( {\color{Blue} x}_i(t) \right ) + {\color{Magenta} g}_i\left ( {\color{Blue} x}_i(t) \right ) \left ( {\color{Red} u}_i ({\color{Blue} x}_i(t)) \right )\\ i &=1,2,...,N \end{align*} (4)

Find the control policies {\color{Red} u}_i({\color{Blue} x}_i), i=1,2,...,N which minimize the local cost functions

\begin{align*} {\color{Blue} J}_i({\color{Blue} x}_{i0})&=\int_{0}^{\infty} \left \{ {\color{DarkGreen} Q}_i^2({\color{Blue} x}_i(\tau ))+{\color{Red} u}_i^T({\color{Blue} x}_i(\tau)){\color{Golden} R}_i {\color{Red} u}_i({\color{Blue} x}_i(\tau)) \right \}d\tau \\ i&=1,2,...,N \end{align*} (5)

( How is equation (5) obtained? Do Q and R here play the roles of Q and P in the Lyapunov equation? )

These cost functions are designed to deal with the infinite-horizon optimal control problem.

where

Q_i(x_i), i=1,2,...,N. : positive definite functions satisfying

{\color{Orange} h}_i(x_i) \leq Q_i(x_i), i=1,2,...,N. (6)

Based on optimal control theory, the feedback controls ( control policies ) must be admissible , i.e., they must stabilize the subsystems on \Omega _i and guarantee that the cost functions (5) are finite.

Admissible Control

Definition 1

Consider the isolated subsystem i,

\begin{align*} {\color{Red} \mu}_i &\in \Psi_i( \Omega_i)\\ {\color{Red} \mu}_i(0) &=0 \\ {\color{Red} u}_i(x_i)&={\color{Red} \mu}_i(x_i)\\ \end{align*}

For any set of admissible control policies {\color{Red} \mu}_i \in \Psi_i(\Omega_i), i=1,2,...,N, if the associated cost functions

\begin{align*} {\color{Blue} V}_i(x_{i0})&=\int_{0}^{\infty} \left \{ Q_i^2(x_i(\tau)) + {\color{Red} \mu}_i^T (x_i(\tau)) R_i {\color{Red} \mu}_i(x_i(\tau))\right \}d\tau \\ i&=1,2,...,N. \end{align*}

(7)

are continuously differentiable, then the infinitesimal versions of (7) are the so-called nonlinear Lyapunov equations

0=Q^2_i(x_i)+{\color{Golden} \mu_i^T(x_i)R_i\mu_i(x_i)}+ ( \bigtriangledown {\color{Blue} V}_i(x_i))^T \left({\color{Golden} f}_i(x_i)+{\color{Magenta} g}_i(x_i){\color{Red} \mu}_i(x_i) \right ) (8)

( How is equation (8) obtained? Do Q and R here play the roles of Q and P in the Lyapunov equation? )

where

\begin{align*} {\color{Blue} V}_i(0) &=0 \\ \bigtriangledown {\color{Blue} V}_i(x_i)&=\frac{\partial {\color{Blue} V}_i(x_i)}{\partial x_i}\\ i&=1,2,...,N. \end{align*}
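As a quick sanity check of the nonlinear Lyapunov equation (8), here is a minimal sketch with a hypothetical scalar subsystem; the choices of f, g, Q, R, and the policy μ below are illustrative assumptions, not from the paper. With f(x) = -x, g(x) = 1, Q(x) = |x|, R = 1 and the admissible policy μ(x) = -x, the closed loop is ẋ = -2x, the cost integrand is Q² + μRμ = 2x², and V(x₀) = x₀²/2, which satisfies (8) identically:

```python
# Hypothetical scalar subsystem (illustrative, not from the paper):
# f(x) = -x, g(x) = 1, Q(x) = |x|, R = 1, admissible policy mu(x) = -x.
# Candidate cost V(x) = x^2 / 2 should make the residual of equation (8)
#   0 = Q^2(x) + mu^T R mu + dV/dx * (f(x) + g(x) mu(x))
# vanish for every x.
f = lambda x: -x
g = lambda x: 1.0
Q = lambda x: abs(x)
R = 1.0
mu = lambda x: -x
V = lambda x: 0.5 * x**2      # candidate cost function
gradV = lambda x: x           # dV/dx

for x in (-2.0, 0.5, 3.0):
    residual = Q(x)**2 + mu(x) * R * mu(x) + gradV(x) * (f(x) + g(x) * mu(x))
    print(abs(residual) < 1e-12)   # → True at every sample point
```

Here the residual is x² + x² + x·(-2x) = 0 exactly, confirming that V(x) = x²/2 is the cost associated with this admissible policy.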

———————————-

Lyapunov Equation

Linear Quadratic Lyapunov Theory

Linear Quadratic Lyapunov Theory Notes

Lyapunov Equation

We assume {\color{Blue} A} \in \mathbb{R}^{n \times n} and {\color{Golden} P}={\color{Golden} P}^T \in \mathbb{R}^{n \times n} ; it follows that {\color{Magenta} Q}={\color{Magenta} Q}^T \in \mathbb{R}^{n \times n} . For the continuous-time linear system \dot x={\color{Blue} A}x with V(z)=z^T{\color{Golden} P}z , we have \dot V (z)=-z^T{\color{Magenta} Q}z when P and Q satisfy the (continuous-time) Lyapunov equation below. If P>0 and Q>0 , then the system is (globally asymptotically) stable. If P>0 , Q \geq 0 , and ( Q , A ) is observable, then the system is likewise (globally asymptotically) stable.

{\color{Blue} A}^T{\color{Golden} P}+{\color{Golden} P}{\color{Blue} A}+{\color{Magenta} Q}=0

where A , P , Q \in \mathbb{R}^{n \times n} , and P , Q are symmetric

interpretation : for linear system

\dot{x}={\color{Blue} A}x

if

V(z)=z^T {\color{Golden} P}z


then

\dot{V}(z)=({\color{Blue} A}z)^T{\color{Golden} P}z+z^T{\color{Golden} P}({\color{Blue} A}z)=-z^T{\color{Magenta} Q}z


i.e., if{\color{Golden} z^TPz} is the (generalized) energy , then{\color{Magenta} z^TQz} is the associated (generalized) dissipation
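The energy/dissipation identity above can be checked numerically. A minimal sketch, with illustrative matrices A and Q (assumptions, not from the notes): solve the Lyapunov equation by Kronecker vectorization, then verify that \dot V(z) = -z^T Q z along trajectories of ẋ = Az.

```python
# Solve A^T P + P A + Q = 0 by vectorization, then check the dissipation
# identity Vdot(z) = -z^T Q z. A and Q are illustrative choices.
import numpy as np

def solve_lyapunov(A, Q):
    """Solve A^T P + P A + Q = 0 for P via Kronecker products:
    vec(A^T P) = (I (x) A^T) vec(P),  vec(P A) = (A^T (x) I) vec(P),
    using column-major (Fortran-order) vectorization."""
    n = A.shape[0]
    L = np.kron(np.eye(n), A.T) + np.kron(A.T, np.eye(n))
    vecP = np.linalg.solve(L, -Q.flatten(order="F"))
    return vecP.reshape(n, n, order="F")

A = np.array([[-1.0, 2.0],
              [ 0.0, -3.0]])   # Hurwitz (eigenvalues -1 and -3)
Q = np.eye(2)                  # Q > 0

P = solve_lyapunov(A, Q)
# For V(z) = z^T P z, the derivative along x' = A x is
# Vdot(z) = (Az)^T P z + z^T P (Az), which should equal -z^T Q z:
z = np.array([1.0, -2.0])
Vdot = (A @ z) @ P @ z + z @ P @ (A @ z)
print(np.allclose(Vdot, -z @ Q @ z))   # → True
```

Since A is stable and Q > 0, the resulting P is symmetric positive definite, matching the stability statement above.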

Lyapunov Integral

If A is stable there is an explicit formula for solution of Lyapunov equation :

{\color{Golden} P}=\int_{0}^{\infty} e ^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}}dt

to see this, we note that

\begin{align*} {\color{Blue} A}^T{\color{Golden} P}+{\color{Golden} P}{\color{Blue} A} &=\int_{0}^{\infty}\left ( {\color{Blue} A}^Te^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}} +e^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}} {\color{Blue} A} \right )dt \\ &=\int_{0}^{\infty} \left ( \frac{\mathrm{d} }{\mathrm{d} t}e^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}}\right )dt \\ &=e^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}}\Big|_{0}^{\infty} \\ &=-{\color{Magenta} Q} \end{align*}

Interpretation as cost-to-go

If A is stable and P is the (unique) solution of

{\color{Blue} A}^T{\color{Golden} P}+{\color{Golden} P}{\color{Blue} A}+{\color{Magenta} Q}=0

then

\begin{align*} V(z) &=z^T {\color{Golden} P}z \\ &=z^T \left ( \int_{0}^{\infty} e^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}} dt \right )z \\ &=\int_{0}^{\infty} x(t)^T{\color{Magenta} Q}x(t)dt\\ \end{align*}\\ where \ \dot{x}={\color{Blue} A}x,{\color{Red} x(0)=z}

thus V(z) is the cost-to-go from point z (with no input) under the integral quadratic cost function with matrix Q

If A is stable and Q>0 , then for each t , e^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}}>0 , so

{\color{Golden} P}=\int_{0}^{\infty} e ^{t{\color{Blue} A}^T}{\color{Magenta} Q}e^{t{\color{Blue} A}}dt>0

meaning: if A is stable and Q>0 , the Lyapunov equation yields P>0 .

In particular: a linear system is stable if and only if there is a quadratic Lyapunov function that proves it.

Evaluating Quadratic Integrals

Suppose \dot x ={\color{Blue} A}x is stable, and define

J=\int_{0}^{\infty}x(t)^T{\color{Magenta} Q}x(t)dt

to find J , we solve Lyapunov equation

{\color{Blue} A}^T{\color{Golden} P}+{\color{Golden} P}{\color{Blue} A}+{\color{Magenta} Q}=0

for P then,

J=x(0)^T{\color{Golden} P}x(0)

In other words: we can evaluate the quadratic integral exactly by solving a set of linear equations, without even computing a matrix exponential.
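The steps above can be sketched numerically. In this minimal example (the matrices and initial state are illustrative assumptions), J = x(0)^T P x(0) from the Lyapunov solve is compared against a direct RK4 simulation of ẋ = Ax accumulating the running cost x^T Q x:

```python
# Evaluate J = integral of x^T Q x exactly via the Lyapunov equation,
# then cross-check by simulating x' = A x with RK4. Illustrative matrices.
import numpy as np

A = np.array([[ 0.0,  1.0],
              [-2.0, -3.0]])   # Hurwitz companion form (eigenvalues -1, -2)
Q = np.eye(2)
n = A.shape[0]

# Solve A^T P + P A + Q = 0 by Kronecker vectorization.
L = np.kron(np.eye(n), A.T) + np.kron(A.T, np.eye(n))
P = np.linalg.solve(L, -Q.flatten(order="F")).reshape(n, n, order="F")

x0 = np.array([1.0, 0.0])
J_exact = x0 @ P @ x0          # closed-form quadratic-integral value

# Integrate x' = A x with RK4, accumulating the running cost x^T Q x.
dt, T = 1e-3, 20.0
x, J_num = x0.copy(), 0.0
for _ in range(int(T / dt)):
    J_num += (x @ Q @ x) * dt
    k1 = A @ x
    k2 = A @ (x + 0.5 * dt * k1)
    k3 = A @ (x + 0.5 * dt * k2)
    k4 = A @ (x + dt * k3)
    x = x + (dt / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

print(abs(J_exact - J_num) < 1e-2)   # the two values agree to integration accuracy
```

For this particular A and Q one can solve the Lyapunov equation by hand: P = [[5/4, 1/4], [1/4, 1/4]], so J_exact = 1.25, with no matrix exponential ever computed.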

———————————-


Online Policy Iteration Algorithm (Critic Networks)

Solve HJB Equations
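The paper's online algorithm trains critic neural networks to solve the HJB equations; in the linear-quadratic special case, the same policy-iteration structure reduces to the classical model-based iteration (Kleinman's algorithm): policy evaluation is a Lyapunov equation, policy improvement is a gain update. A hedged sketch with illustrative matrices, not the paper's method:

```python
# Policy iteration in the linear-quadratic special case (Kleinman's
# algorithm). The paper's online, neural-network version replaces the
# model-based Lyapunov solve below with critic-network training.
import numpy as np

def lyap(A, Q):
    """Solve A^T P + P A + Q = 0 via Kronecker vectorization."""
    n = A.shape[0]
    L = np.kron(np.eye(n), A.T) + np.kron(A.T, np.eye(n))
    return np.linalg.solve(L, -Q.flatten(order="F")).reshape(n, n, order="F")

A = np.array([[0.0, 1.0], [1.0, -1.0]])   # open-loop unstable (illustrative)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

K = np.array([[3.0, 3.0]])  # initial admissible (stabilizing) gain
for _ in range(10):
    Ak = A - B @ K                       # closed loop under current policy
    P = lyap(Ak, Q + K.T @ R @ K)        # policy evaluation: Lyapunov equation
    K = np.linalg.solve(R, B.T @ P)      # policy improvement: gain update
# At convergence, P solves the algebraic Riccati equation:
res = A.T @ P + P @ A - P @ B @ np.linalg.solve(R, B.T @ P) + Q
print(np.linalg.norm(res) < 1e-8)   # → True
```

Each evaluation step is exactly the Lyapunov machinery reviewed above; starting from an admissible gain, the iterates remain stabilizing and converge to the optimal policy.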



Source: https://blogs.cuit.columbia.edu/p/decentralized_stabilization_for_a_class_of_continuous-time_nonlinear_interconnected_systems_using_online_learning_optimal_control_approach/
