UdeM-McGill-MITACS Machine Learning Seminar

A Stochastic Algorithm for Partially Observable Markov Decision Processes

Francois Laviolette
Departement d'Informatique, Universite Laval

March 4, 2008 at 10:00 AM
George Zames Room MC437

We introduce a new backup operator for point-based POMDP algorithms that performs a look-ahead search at depth greater than one. We use this operator in a new algorithm called Stochastic Search Value Iteration (SSVI). SSVI relies on stochastic exploration of the environment to update the value function. The underlying ideas are very similar to those of temporal-difference learning algorithms for MDPs. In particular, SSVI takes advantage of a softmax action-selection function and of the random character of the environment itself. This contrasts with existing point-based POMDP algorithms. Empirical results show that our algorithm is very competitive on the usual benchmark problems, which suggests that stochastic algorithms are a viable alternative for solving large POMDPs.
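
The abstract does not spell out the selection rule that SSVI uses, so the following is only a minimal sketch of generic softmax (Boltzmann) action selection as commonly used in reinforcement learning. The names `softmax_probs`, `select_action`, `q_values`, and `temperature` are assumptions made for illustration and do not come from the talk.

```python
import math
import random


def softmax_probs(q_values, temperature=1.0):
    """Turn action-value estimates into Boltzmann selection probabilities.

    `temperature` is a hypothetical parameter: high values give
    near-uniform exploration, low values approach greedy selection.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    best = max(q_values)
    weights = [math.exp((q - best) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

Under this rule every action keeps a nonzero probability of being tried, while actions with higher estimated value are sampled more often. That is one way exploration can be driven stochastically.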