Citation: | YIN Chang-ming, WHANG Han-xing, ZHAO Fei. Risk-Sensitive Reinforcement Learning Algorithms With Generalized Average Criterion[J]. Applied Mathematics and Mechanics, 2007, 28(3): 369-378. |
[1] |
Sutton R S.Learning to predict by the methods of temporal differences[J].Machine Learning,1988,3(1):9-44.
|
[2] |
Sutton R S. Open theoretical questions in reinforcement learning[A].In:Proc of EuroCOLT'99(Computational Learning Theory)[C].Cambridge, MA: MIT Press,1999,11-17.
|
[3] |
Sutton R S,Barto A G.Reinforcement Learning: An Introduction[M].Massachusetts: MIT Press, 1998, 20-300.
|
[4] |
Watkins C J C H,Dayan P.Q-learning[J].Machine Learning,1992,8(3-4):279-292.
|
[5] |
Watkins C J C H. Learning from delayed rewards[D].England:University of Cambridge,1989.
|
[6] |
Bertsekas D P,Tsitsiklis J N.Parallel and Distributed Computation: Numerical Methods[M].Englewood Cliffs, New Jersey: Prentice-Hall,1989,10-109.
|
[7] |
YIN Chang-ming,CHEN Huan-wen,XIE Li-juan. A Relative Value Iteration Q-learning Algorithm and its Convergence Based on Finite Samples[J].Journal of Computer Research and Development,2002,39(9):1064-1070.
|
[8] |
YIN Chang-ming,CHEN Huan-wen,XIE Li-juan.Optimality cost relative value iteration Q-learning algorithm based on finite samples[J].Journal of Computer Engineering and Applications,2002,38(11):65-67.
|
[9] |
Wiering M, Schmidhuber J.Speeding up Q-learning[A].In:Proc of the 10th European Conf on Machine Learning[C].Germany:Springer-Verlag,1998,352-363.
|
[10] |
Singh S.Soft dynamic programming algorithms: convergence proofs[A].In:Proceedings of Workshop on Computational Learning and Natural Learning (CLNL)[C].Massachusetts:Town of Provincetown.University of Massachusetts,1993.
|
[11] |
Cavazos-Cadena R,Montes-de-Oca R.The value iteration algorithm in risk-sensitive average Markov decision chains with finite state[J].Mathematics of Operations Research,2003,28(4):752-776. doi: 10.1287/moor.28.4.752.20515
|
[12] |
Peng J,Williams R.Incremental multi-step Q-learning[J].Machine Learning,1996,22(4):283-290.
|
[13] |
Singh S. Reinforcement learning algorithms for average-payoff Markovian decision processes[A].In:Proceedings of the 12th National Conference on Artificial Intelligence[C].Tahoe City, CA: Morgan Kaufmann,1994,1:700-705.
|