Efficient spatial exploration is a key aspect of search and rescue. In this paper, we present a search algorithm that generates efficient trajectories optimizing the rate at which probability mass is covered by a searcher, allowing an autonomous vehicle to find one or more lost targets as rapidly as possible. We do this by performing non-uniform sampling of the search region. The generated path minimizes the expected time to locate the missing target by visiting high-probability regions, using non-myopic path generation based on reinforcement learning. We model the target probability distribution as a classic mixture of Gaussians whose means and mixture coefficients are tuned according to the locations and times of sightings of the lost target. Key features of our search algorithm are its ability to employ a very general non-deterministic action model and to generate action plans for any new probability distribution using the parameters learned on other, similar-looking distributions. One of the key contributions of this paper is the use of non-uniform state aggregation for policy search in the context of robotics. We compare the paths generated by our algorithm with other accepted spatial coverage techniques, such as distribution-independent boustrophedonic coverage and model-dependent spiral search. We also present a proof that rewarding the clearing of probability mass, rather than the locating of the target, does not bias the objective function. Our experiments show that the learned policy outperforms several well-known baselines, even in scenarios different from the one on which it was trained.
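
To make the target model concrete, the following is a minimal sketch (not the paper's implementation) of how a mixture-of-Gaussians belief over a discretized search region could be built from sightings. The function names, the exponential time-decay of the mixture weights, and the isotropic covariance are illustrative assumptions; the abstract only states that the means and mixture coefficients are tuned from the location and time of sightings.

```python
import numpy as np

def sighting_mixture(sightings, now, decay=0.1, sigma=5.0):
    """Build mixture means and weights from (x, y, t) sightings.

    Weights decay exponentially with the age of each sighting, so more
    recent sightings dominate the belief (an illustrative choice).
    """
    means = np.array([[x, y] for x, y, _ in sightings], dtype=float)
    ages = np.array([now - t for _, _, t in sightings], dtype=float)
    weights = np.exp(-decay * ages)
    weights /= weights.sum()
    return means, weights, sigma

def probability_mass_grid(means, weights, sigma, size=50):
    """Evaluate the isotropic Gaussian mixture on a size x size grid and
    normalize so the cells sum to one: the probability mass a searcher
    'clears' by visiting each cell."""
    xs, ys = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    grid = np.zeros((size, size))
    for (mx, my), w in zip(means, weights):
        grid += w * np.exp(-((xs - mx) ** 2 + (ys - my) ** 2) / (2 * sigma ** 2))
    return grid / grid.sum()

# Example: two sightings, with the more recent one weighted more heavily.
means, weights, sigma = sighting_mixture([(10, 12, 0.0), (35, 40, 8.0)], now=10.0)
belief = probability_mass_grid(means, weights, sigma)
print(belief.shape, belief.sum())
```

A planner such as the one described above would then score candidate trajectories by the probability mass they accumulate over this grid per unit time, rather than by whether the target is actually found along the way.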