Contents
1 Introduction 1
1.1 Optimal Control 1
1.1.1 Continuous-Time LQR 1
1.1.2 Discrete-Time LQR 2
1.2 Adaptive Dynamic Programming 3
1.3 Review of Matrix Algebra 5
References 6
2 Neural-Network-Based Approach for Finite-Time Optimal Control 7
2.1 Introduction 7
2.2 Problem Formulation and Motivation 9
2.3 The Data-Based Identifier 9
2.4 Derivation of the Iterative ADP Algorithm with Convergence Analysis 11
2.5 Neural Network Implementation of the Iterative Control Algorithm 17
2.6 Simulation Study 18
2.7 Conclusion 20
References 22
3 Nearly Finite-Horizon Optimal Control for Nonaffine Time-Delay Nonlinear Systems 25
3.1 Introduction 25
3.2 Problem Statement 26
3.3 The Iteration ADP Algorithm and Its Convergence 30
3.3.1 The Novel ADP Iteration Algorithm 30
3.3.2 Convergence Analysis of the Improved Iteration Algorithm 33
3.3.3 Neural Network Implementation of the Iteration ADP Algorithm 38
3.4 Simulation Study 40
3.5 Conclusion 48
References 48
4 Multi-objective Optimal Control for Time-Delay Systems 49
4.1 Introduction 49
4.2 Problem Formulation 50
4.3 Derivation of the ADP Algorithm for Time-Delay Systems 51
4.4 Neural Network Implementation for the Multi-objective Optimal Control Problem of Time-Delay Systems 54
4.5 Simulation Study 55
4.6 Conclusion 61
References 62
5 Multiple Actor-Critic Optimal Control via ADP 63
5.1 Introduction 63
5.2 Problem Statement 65
5.3 SIANN Architecture-Based Classification 66
5.4 Optimal Control Based on ADP 69
5.4.1 Model Neural Network 70
5.4.2 Critic Network and Action Network 74
5.5 Simulation Study 82
5.6 Conclusion 91
References 91
6 Optimal Control for a Class of Complex-Valued Nonlinear Systems 95
6.1 Introduction 95
6.2 Motivations and Preliminaries 96
6.3 ADP-Based Optimal Control Design 99
6.3.1 Critic Network 99
6.3.2 Action Network 101
6.3.3 Design of the Compensation Controller 102
6.3.4 Stability Analysis 103
6.4 Simulation Study 107
6.5 Conclusion 110
References 110
7 Off-Policy Neuro-Optimal Control for Unknown Complex-Valued Nonlinear Systems 113
7.1 Introduction 113
7.2 Problem Statement 114
7.3 Off-Policy Optimal Control Method 115
7.3.1 Convergence Analysis of Off-Policy PI Algorithm 117
7.3.2 Implementation Method of Off-Policy Iteration Algorithm 119
7.3.3 Implementation Process 122
7.4 Simulation Study 122
7.5 Conclusion 125
References 125
8 Approximation-Error-ADP-Based Optimal Tracking Control for Chaotic Systems 127
8.1 Introduction 127
8.2 Problem Formulation and Preliminaries 128
8.3 Optimal Tracking Control Scheme Based on Approximation-Error ADP Algorithm 130
8.3.1 Description of Approximation-Error ADP Algorithm 130
8.3.2 Convergence Analysis of the Iterative ADP Algorithm 132
8.4 Simulation Study 136
8.5 Conclusion 144
References 144
9 Off-Policy Actor-Critic Structure for Optimal Control of Unknown Systems with Disturbances 147
9.1 Introduction 147
9.2 Problem Statement 148
9.3 Off-Policy Actor-Critic Integral Reinforcement Learning 151
9.3.1 On-Policy IRL for Nonzero Disturbance 151
9.3.2 Off-Policy IRL for Nonzero Disturbance 152
9.3.3 NN Approximation for Actor-Critic Structure 154
9.4 Disturbance Compensation Redesign and Stability Analysis 157
9.4.1 Disturbance Compensation Off-Policy Controller Design 157
9.4.2 Stability Analysis 158
9.5 Simulation Study 161
9.6 Conclusion 163
References 163
10 An Iterative ADP Method to Solve for a Class of Nonlinear Zero-Sum Differential Games 165
10.1 Introduction 165
10.2 Preliminaries and Assumptions 166
10.3 Iterative Approximate Dynamic Programming Method for ZS Differential Games 169
10.3.1 Derivation of the Iterative ADP Method 169
10.3.2 The Procedure of the Method 174
10.3.3 The Properties of the Iterative ADP Method 176
10.4 Neural Network Implementation 190
10.4.1 The Model Network 191
10.4.2 The Critic Network 192
10.4.3 The Action Network 193
10.5 Simulation Study 195
10.6 Conclusion 204
References 204
11 Neural-Network-Based Synchronous Iteration Learning Method for Multi-player Zero-Sum Games 207
11.1 Introduction 207
11.2 Motivations and Preliminaries 208
11.3 Synchronous Solution of Multi-player ZS Games 213
11.3.1 Derivation of Off-Policy Algorithm 213
11.3.2 Implementation Method for Off-Policy Algorithm 214
11.3.3 Stability Analysis 218
11.4 Simulation Study 219
11.5 Conclusion 224
References 224
12 Off-Policy Integral Reinforcement Learning Method for Multi-player Non-Zero-Sum Games 227
12.1 Introduction 227
12.2 Problem Statement 228
12.3 Multi-player Learning PI Solution for NZS Games 229
12.4 Off-Policy Integral Reinforcement Learning Method 234
12.4.1 Derivation of Off-Policy Algorithm 234
12.4.2 Implementation Method for Off-Policy Algorith