Citation: | YIN Chang-ming, WHANG Han-xing, ZHAO Fei. Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion[J]. Applied Mathematics and Mechanics, 2007, 28(3): 369-378. |
[1] | Sutton R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44. |
[2] | Sutton R S. Open theoretical questions in reinforcement learning[A]. In: Proc of EuroCOLT'99 (Computational Learning Theory)[C]. Cambridge, MA: MIT Press, 1999, 11-17. |
[3] | Sutton R S, Barto A G. Reinforcement Learning: An Introduction[M]. Massachusetts: MIT Press, 1998, 20-300. |
[4] | Watkins C J C H, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3/4): 279-292. |
[5] | Watkins C J C H. Learning from delayed rewards[D]. England: University of Cambridge, 1989. |
[6] | Bertsekas D P, Tsitsiklis J N. Parallel and Distributed Computation: Numerical Methods[M]. Englewood Cliffs, New Jersey: Prentice-Hall, 1989, 10-109. |
[7] | YIN Chang-ming, CHEN Huan-wen, XIE Li-juan. A Relative Value Iteration Q-learning Algorithm and its Convergence Based on Finite Samples[J]. Journal of Computer Research and Development, 2002, 39(9): 1064-1070. |
[8] | YIN Chang-ming, CHEN Huan-wen, XIE Li-juan. Optimality cost relative value iteration Q-learning algorithm based on finite samples[J]. Journal of Computer Engineering and Applications, 2002, 38(11): 65-67. |
[9] | Wiering M, Schmidhuber J. Speeding up Q-learning[A]. In: Proc of the 10th European Conf on Machine Learning[C]. Germany: Springer-Verlag, 1998, 352-363. |
[10] | Singh S. Soft dynamic programming algorithms: convergence proofs[A]. In: Proceedings of Workshop on Computational Learning and Natural Learning (CLNL)[C]. Massachusetts: Town of Provincetown. University of Massachusetts, 1993. |
[11] | Cavazos-Cadena R, Montes-de-Oca R. The value iteration algorithm in risk-sensitive average Markov decision chains with finite state[J]. Mathematics of Operations Research, 2003, 28(4): 752-776. doi: 10.1287/moor.28.4.752.20515 |
[12] | Peng J, Williams R. Incremental multi-step Q-learning[J]. Machine Learning, 1996, 22(4): 283-290. |
[13] | Singh S. Reinforcement learning algorithm for average-payoff Markovian decision processes[A]. In: Proceedings of the 12th National Conference on Artificial Intelligence[C]. Tahoe City, CA: Morgan Kaufmann, 1994, 1: 700-705. |