Renewal Monte Carlo: Renewal Theory-Based Reinforcement Learning
Jayakumar Subramanian, Aditya Mahajan
IEEE Transactions on Automatic Control, vol. 65, no. 8, 2020. DOI: 10.1109/TAC.2019.2953089
Keywords: Markov decision processes (MDPs), Monte Carlo methods, policy gradient, renewal theory, reinforcement learning, stochastic approximation