# Assignment #3: Wall Street

## Due date: March 8 by 16:00 (extended)

This assignment is to be done individually.

## Introduction

Financial market analysis and forecasting is one of the major consumers of statistical and AI research, given the obvious profit motive: if you can predict, with reasonable success, how the exchange rate between two currencies or the value of a particular stock will behave tomorrow, next week, or next month, you stand to gain a rather nice improvement to your standard of living. Such forecasting is a typical example of a time series analysis problem.

To give you an appreciation of the potential of AI methods, in particular a neural network approach, for making such predictions, this assignment deals with predicting the adjusted close price of the S&P/TSX Composite Index. Note that the only data you should use is this set of adjusted close figures.

Your task is to experiment with a standard feed-forward neural network with a single hidden layer, trained by backpropagation, to see how well you can predict the adjusted close for various days in the future, given training data describing the market's recent behaviour.

In addition to varying the learning rate, α, the momentum, γ, the number of hidden layer units, and the size of the training set, you will also want to experiment with the length of each training vector, in other words, the number of inputs to your network. You can think of this as corresponding to how many days' worth of adjusted closes are considered relevant to predicting the adjusted close on some future date.
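Below is a C sketch of turning the series of adjusted closes into (input, target) training pairs. It uses a training-vector length `n` and a look-ahead of `h` days, and it makes a few assumptions: the series is stored oldest first, `x[0]` is the most recent day in the window, and all helper names are hypothetical. It also includes a min-max scaling helper. Index values run into the thousands, so the scaling bounds should be taken from the training set only, never from the test set.

```c
#include <assert.h>
#include <stddef.h>

/* Build the pair ending at day t (0-based, series stored oldest first):
   inputs x[i] = adj_close[t - i] for i = 0..n-1 (x[0] is the latest day),
   target = adj_close[t + h], the adjusted close h days after day t.
   Returns 1 on success, 0 if the window or target falls outside the series. */
int make_window(const double *adj_close, size_t len, size_t t,
                size_t n, size_t h, double *x, double *target) {
    assert(n > 0 && h > 0);
    if (t + 1 < n || t + h >= len)
        return 0;
    for (size_t i = 0; i < n; i++)
        x[i] = adj_close[t - i];
    *target = adj_close[t + h];
    return 1;
}

/* Number of complete (input, target) pairs a series of length len yields. */
size_t count_windows(size_t len, size_t n, size_t h) {
    return (len >= n + h) ? len - n - h + 1 : 0;
}

/* Min-max scale v into [0, 1] using bounds lo/hi from the training set. */
double scale_minmax(double v, double lo, double hi) {
    assert(hi > lo);
    return (v - lo) / (hi - lo);
}
```

For each combination of training-vector length (5, 25, 50, 100) and look-ahead, this sketch would generate a fresh set of training pairs. Longer vectors leave fewer usable pairs, as `count_windows` shows.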

Try varying the meaning of each input to your network, for example:

• input $x_i$ corresponds to the adjusted close $i$ days in the past
• input $x_i$ corresponds to the gain or loss in adjusted closes between $i$ and $i+1$ days in the past
• input $x_i$ corresponds to the ratio of adjusted closes between $i$ and $i+K$ days in the past, where $K$ is another parameter to vary
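The three encodings can be sketched in C as a single function. The sign and direction conventions are assumptions: the gain is taken as the close $i$ days ago minus the close $i+1$ days ago, and the ratio as the close $i$ days ago divided by the close $i+K$ days ago. The names `Encoding` and `encode` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ENC_CLOSE, ENC_DIFF, ENC_RATIO } Encoding;

/* Fill x[0..n-1] for day t (series stored oldest first) with one of the
   three encodings; K is used only by ENC_RATIO. Returns 0 if the encoding
   would need data from before day 0, 1 otherwise. */
int encode(const double *adj_close, size_t t, size_t n, Encoding enc,
           size_t K, double *x) {
    assert(n > 0 && (enc != ENC_RATIO || K > 0));
    /* Oldest day index each encoding reads, relative to t. */
    size_t lookback = (enc == ENC_CLOSE) ? n - 1
                    : (enc == ENC_DIFF)  ? n
                    : n - 1 + K;
    if (t < lookback)
        return 0;
    for (size_t i = 0; i < n; i++) {
        switch (enc) {
        case ENC_CLOSE: /* adjusted close i days before day t */
            x[i] = adj_close[t - i];
            break;
        case ENC_DIFF:  /* gain (or loss) from day t-i-1 to day t-i */
            x[i] = adj_close[t - i] - adj_close[t - i - 1];
            break;
        case ENC_RATIO: /* ratio of close at t-i to close at t-i-K */
            x[i] = adj_close[t - i] / adj_close[t - i - K];
            break;
        }
    }
    return 1;
}
```

The difference and ratio encodings both remove the overall price level from the inputs, which may help the network generalise from earlier training dates to later test dates. Whether it actually does is one of the things your experiments should determine.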

You may wish to use the sample C code provided here or the (far more complete) Matlab Neural Network toolbox for your assignment, although you are equally encouraged to write your own code for this purpose.

## Questions

• Describe how you split your data between training and test vectors.
• Describe how you chose the number of hidden layer units.
• How many training iterations or epochs does it take for your network to converge? How is this influenced by your choice of α and γ? Provide sample results in a graphical format, with separate graphs showing performance on the training and test sets. Multiple plots may be overlaid on the same graph for different values of α and γ.
• For training vectors of lengths 5, 25, 50, and 100, discuss how well your network predicts the adjusted close for successive periods, for example, 10 days into the future or beyond. Illustrate your response with graphs showing accuracy as a function of how many days ahead the network is predicting. Measure the quality of prediction for any particular look-ahead period (e.g. 5 days in the future) by the mean squared error over the test set.
• What input encoding provided the best prediction performance for 1 day in the future? For 5 days? For 50 days? (Bonus marks will be allotted to the student with the highest prediction performance in these categories.)
• If you were to conduct financial trades based on predictions made by your neural network, what sort of profit would you expect to make?
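The C helpers below sketch one possible approach to the data-split, accuracy and profit questions. The chronological split is an assumption, not a requirement of the assignment: the network trains on earlier dates and is tested on strictly later ones, so it never sees the future it is asked to predict. Calling `mse` once for each look-ahead gives one point on the accuracy-versus-days-ahead curve. The trading rule in `simulate_long_only` is a hypothetical toy. It buys at today's close and sells at tomorrow's close whenever the 1-day prediction exceeds today's close, and it ignores transaction costs and slippage.

```c
#include <assert.h>
#include <stddef.h>

/* Chronological split: days [0, k) train the network, days [k, n_days)
   form the test set, where k = floor(train_frac * n_days). */
size_t split_index(size_t n_days, double train_frac) {
    assert(train_frac > 0.0 && train_frac < 1.0);
    return (size_t)(train_frac * (double)n_days);
}

/* Mean squared error between predictions and actual adjusted closes. */
double mse(const double *pred, const double *actual, size_t n) {
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        double e = pred[i] - actual[i];
        sum += e * e;
    }
    return sum / (double)n;
}

/* Toy one-day trading rule (no transaction costs): pred[t] is the network's
   forecast, made on day t, of adj_close[t + 1]. When it exceeds adj_close[t],
   hold the index from day t to day t + 1; otherwise hold cash. Returns the
   final value of 1 unit of starting capital. */
double simulate_long_only(const double *adj_close, const double *pred,
                          size_t n) {
    double capital = 1.0;
    for (size_t t = 0; t + 1 < n; t++)
        if (pred[t] > adj_close[t])
            capital *= adj_close[t + 1] / adj_close[t];
    return capital;
}
```

A fair comparison for the profit question is to set the result of `simulate_long_only` on the test period against simply buying and holding the index over the same days.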

You must submit, at the start of class on the due date, a brief hardcopy report that answers the questions above and explains the design decisions you made, as well as any interesting results you observed.

Remember that your report is the main method of communicating what you have accomplished to the reader. Therefore, make sure that it is well organized and well written; you will lose marks for spelling errors and poor grammar. The report should be a maximum of three pages in length; anything beyond the three-page limit will not be read. You are welcome to include illustrations to elucidate the text. A hardcopy of your source code and a brief set of testing results should be included as an appendix. (The appendix does not count as part of the three-page limit.)

In addition to the hardcopy submission, you must submit, through WebCT, an electronic version of your assignment in UNIX tar format, which includes:

• all of the source code, ready to be compiled on the Trottier Engineering Linux machines
• a Makefile for compiling your source code

Do not include any other items with your submission, such as a copy of your report, object code (`*.o`) files, or your executable program; otherwise, marks will be deducted.

Assignments will not be considered complete if they have not been submitted both in hardcopy and electronically or if the two versions of the source code differ. The hardcopy and electronic versions may be submitted at different times prior to the assigned deadline.

## Marking scheme

| Component | Weight |
|---|---|
| Relevant experiments and discussion | 6 |
| Answers to questions | 10 |
| Actual code, reproduction of results, network performance | 4 |

Last updated on 6 March 2007
Assignment suggested and tested by Samuel Audet