Assignment #2: Optimum Policies for Toy Car Driving

Due: February 23 by 23:59

This assignment is to be done individually.

The questions for this assignment have been updated significantly since February 9. Please review the entire section.

Academic Integrity

The following is offered with apologies to the vast majority of students who do their work honestly and take their university learning seriously:

Your instructor takes academic integrity seriously and has no tolerance for plagiarism or any other form of academic misconduct. Failure to respect these guidelines will result in you receiving a grade of zero on this assignment.

Acceptable collaboration between students, provided it is acknowledged explicitly in your report and code, might include:

  1. discussing some aspect of the assignment specifications in attempting to understand a particular point
  2. discussing high-level aspect of the Bellman equation
  3. discussing a problem you encountered while implementing the value iteration algorithm as described in the course text
Sharing of any computer code between students, or re-using any code from a third party (e.g., open source) is acceptable, provided that you indicate this explicitly at the start of your report and (as comments) in the source code. In this case, only the portion of work that you did individually will be considered toward your grade.

Unacceptable collaboration and violations of academic integrity include, but are not limited to:

  1. including any code that was not your own and failing to indicate so
  2. copying parts of the solutions to any question from other students

If you are uncertain about any of these guidelines, please discuss with your instructor as soon as possible.

The Car World

Consider the one-dimensional world shown in the picture below.

An intelligent toy racing car is designed to drive autonomously along a straight 20 m path. The car's motor can draw power at two levels. The first level consumes j units of power and overcomes the friction of the path, allowing the car to maintain its current speed. The second level consumes 2j units of power and accelerates the car at 2 m/s2. If no power is applied, friction causes the car to decelerate by -2 m/s2 until it comes to rest. The car's maximum speed is 4 m/s. Any effort invested to overcome this maximum will be lost to heat.

All values of time, distance, and speed are discretized to units of 1 s, 1 m, and 1 m/s.

The transition model follows the laws of motion you learned in high school:

Questions

  1. For each of the scenarios below, design the state space, which includes:

    State any assumptions you make. The design might not be unique for certain cases. The scenarios are as follows:

    1. The racing car needs to reach the finish line as fast as possible.
    2. The racing car needs to reach the finish line as fast as possible without hitting a barrier there. The control circuit, however, is faulty. As a result, the motor responds correctly to the signal coming from the microcontroller with probability P, and fails to respond with probability (1-P). In the latter case, no power is supplied to the motor.
    3. The racing car needs to reach the finish line as fast as possible, subject to a total budget of 25 units of power. If this is not possible, the car should not attempt the trip. The motor can be assumed always to operate correctly. Hint: the reward R(s) in Bellman's update should become dependant on the action and next state R(s,a,s'), and the update rule should be modified accordingly.
  2. Bonus: The state space is sparse. Minimize it by eliminating states that will never be visited.

  3. Implement both policy iteration and value iteration to find the optimum policy for each scenario. Although the initial policy should not change the solution, for consistency of results, all scenarios should use an initial policy of "always attempt to apply the maximum power level".

    Your program should accept a command line option of -p for the use of policy iteration and -v for the use of value iteration, one of the letters a, b, or c to specify the desired scenario from question 1, and possibly an additional parameter, if needed for the particular scenario. Following UNIX convention, the command line syntax is:

    racecar {-p | -v} a
    racecar {-p | -v} b P
    racecar {-p | -v} c j

    where, for scenario (b), P is a real number between 0 and 1, indicating the probability that the motor responds correctly, and for scenario (c), j is a positive integer, denoting the number of energy units consumed at the first power level of the car.

  4. As output, your program should display:

    In your report, include the final utilities and policies as tables in an appendix. The state space is large, so you should dedicate a full page to each scenario.

    For scenario (b), solve for P = 0 (the control circuit is not faulty), P = 0.4, and P = 0.8 (highly faulty control circuit).

    For scenario (c), solve for both j = 3 and j = 7.

  5. Assume that the car wants to optimize its driving policy in terms of minimizing both time and power consumption. You are not asked to implement this in your code, but describe what you would change in the design of your state space to account for this requirement. Provide an abstract of your design.

Additional notes

You may program your assignment in any computer language of your preference, provided it can be compiled and run on the Trottier Engineering Linux machines (TR5130GU-<x>.ece.mcgill.ca where x ∈ [1,16]). To login, use your McGill UEA (first.last@mail.mcgill.ca) and your regular MCF password.

Submitting your assignment

The hardcopy should explain the design decisions you made and address the items noted above. Remember that your report is the main method of communicating what you have accomplished to the reader. Therefore, make sure that it is well organized and well written; you will lose marks for spelling errors and poor grammar. The report should be a maximum of three pages in length, exclusive of any appendices as relevant. You are welcome to include illustrations as useful to elucidate the text. These are not considered as part of the three-page limit. No printout of program code is required.

In addition to the hardcopy submission, you must submit, through myCourses/WebCT (yes, we are using myCourses, but only for this purpose), an electronic version of your assignment in zip or UNIX tar format, which includes:

If you are not handing in a hardcopy of your report until after the Study Week, then make sure also to include a PDF of your report with the electronic submission. In either case, do not include with your submission any other items, such as object code (*.o) files or your executable program. or marks will be deducted.

The hardcopy and electronic versions may be submitted at different times prior to the assigned deadline. Assignments will not be considered complete until both components have been received.

Marking scheme

Component Weight
Questions 115 pts (5 for each scenario)
Questions 215 pts (2.5 for each scenario)
Questions 34 pts
program documentation6 pts

Last updated on 22 February 2011
by Mohannad Elhamod and Jeremy Cooperstock