CSCI 4150: Introduction to Artificial Intelligence, Fall 2003
This web tester runs two sets of tests. The first set uses the a7example.scm file that you have; the second set uses randomly generated utilities, rewards, and transition probabilities. The code you submit should not have any discount factor or exploration function in place: it should simply pick the action with the greatest expected utility.
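For concreteness, here is a minimal sketch of that greedy selection. The procedures get-actions, get-next-states, get-transition-prob, and get-utility are hypothetical placeholders for whatever model tables your own code maintains; they are not part of the a7header.scm interface.

    ;; Expected utility of taking ACTION in STATE under the learned model.
    (define (expected-utility state action)
      (apply + (map (lambda (next)
                      (* (get-transition-prob state action next)
                         (get-utility next)))
                    (get-next-states state action))))

    ;; Pick the action with the greatest expected utility (no discount
    ;; factor, no exploration).  Assumes STATE has at least one action.
    (define (best-action state)
      (let ((actions (get-actions state)))
        (let loop ((rest (cdr actions))
                   (best (car actions))
                   (best-eu (expected-utility state (car actions))))
          (if (null? rest)
              best
              (let ((eu (expected-utility state (car rest))))
                (if (> eu best-eu)
                    (loop (cdr rest) (car rest) eu)
                    (loop (cdr rest) best best-eu)))))))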
After running the random-player for a while to learn a model of the transition probabilities and rewards, we should be able to just call your value-iteration procedure. When it finishes (hopefully without taking too much time!), it should have learned the hit utilities.
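Purely as an illustration (not the required interface), an undiscounted value iteration over the learned model might look like the sketch below. It reuses the placeholder accessors from the sketch above, plus hypothetical all-states, get-reward, and set-utility! procedures.

    ;; One Bellman backup: U(s) <- R(s) + max_a sum_s' P(s'|s,a) U(s').
    (define (bellman-backup s)
      (let ((acts (get-actions s)))
        (if (null? acts)
            (get-reward s)              ; terminal state: utility = reward
            (+ (get-reward s)
               (apply max
                      (map (lambda (a) (expected-utility s a)) acts))))))

    ;; Sweep all states, updating utilities in place, until the largest
    ;; change in any utility falls below EPSILON.
    (define (value-iteration epsilon)
      (let sweep ()
        (let ((delta 0))
          (for-each (lambda (s)
                      (let ((new (bellman-backup s)))
                        (set! delta (max delta (abs (- new (get-utility s)))))
                        (set-utility! s new)))
                    (all-states))
          (if (> delta epsilon)
              (sweep)))))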
After running the random-player for a while to learn a model of the transition probabilities and rewards, we should be able to just call your policy-iteration procedure. When it finishes (hopefully without taking too much time!), it should have learned the hit utilities.
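Again only as a sketch under the same placeholder interface, policy iteration could alternate a simplified (iterative) policy evaluation with greedy improvement, keeping the policy as an association list from states to actions.

    ;; Simplified policy evaluation: with each state's action fixed by
    ;; the policy, sweep U(s) <- R(s) + sum_s' P(s'|s,a) U(s') until the
    ;; largest change falls below EPSILON.
    (define (evaluate-policy! policy epsilon)
      (let sweep ()
        (let ((delta 0))
          (for-each
           (lambda (entry)
             (let* ((s (car entry))
                    (a (cdr entry))
                    (new (if a
                             (+ (get-reward s) (expected-utility s a))
                             (get-reward s))))    ; terminal state
               (set! delta (max delta (abs (- new (get-utility s)))))
               (set-utility! s new)))
           policy)
          (if (> delta epsilon)
              (sweep)))))

    ;; Alternate evaluation and greedy improvement until no action changes.
    (define (policy-iteration epsilon)
      (let ((policy (map (lambda (s)
                           (let ((acts (get-actions s)))
                             (cons s (and (pair? acts) (car acts)))))
                         (all-states))))
        (let improve ()
          (evaluate-policy! policy epsilon)
          (let ((changed? #f))
            (for-each
             (lambda (entry)
               (if (cdr entry)                    ; skip terminal states
                   (let ((greedy (best-action (car entry))))
                     (if (not (equal? greedy (cdr entry)))
                         (begin (set-cdr! entry greedy)
                                (set! changed? #t))))))
             policy)
            (if changed? (improve) policy)))))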
The rl-strategy procedure will be modified from what you turned in for problem 1 so that it uses your exploration function. The td-player procedure is a procedure of zero arguments that defines a player, akin to random-player in the a7header.scm file (see the sketch after the next paragraph).
We should be able to initialize the tables and then call (play-match 10000 (td-player)), and it should play the game and learn utilities simultaneously.
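In other words, the test session might look like the sketch below. Here make-player and initialize-tables! are hypothetical names standing in for however a7header.scm actually constructs players and however your own code resets its tables.

    ;; A zero-argument player constructor in the style of random-player.
    ;; make-player is a placeholder; mirror the structure that
    ;; random-player actually uses in a7header.scm.
    (define (td-player)
      (make-player 'td-player rl-strategy))

    ;; Typical test session:
    ;; (initialize-tables!)               ; reset your utility/model tables
    ;; (play-match 10000 (td-player))     ; plays and learns at the same time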
This will simply check that everything the save-learning procedure saves is still there. You may also add a modified rl-strategy procedure to this file if you want us to use it to test how well your learned player performs.
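If you do include one, a purely greedy evaluation version might be as simple as the sketch below, assuming the same hypothetical single-argument signature used in the sketches above (your actual problem 1 signature may differ).

    ;; Evaluation-only strategy: exploit the learned utilities, with no
    ;; exploration and no further learning updates.
    (define (rl-strategy state)
      (best-action state))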