CSCI 4150: Introduction to Artificial Intelligence, Fall 2005
I think the original "stubs" file had these names correct.
This will enable the support code to automatically set the utility of terminal states to be the average reward for that state.
The optional second argument util-init may be a number (the initial utility value for all nonterminal states) or a procedure of one argument (which is called for each nonterminal state, is passed the number of that state, and must return the initial utility value for that state). If omitted, all initial nonterminal utilities will be set to 0.0.
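As a small illustration, here is one possible util-init procedure (the name my-util-init is just a placeholder; passing a plain number such as 0.5 instead would start every nonterminal state at that value):
; util-init procedure: nonterminal state s starts with utility 0.01 * s
(define (my-util-init s)
  (* 0.01 s))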
If a procedure is given, this procedure is called before each hand is played (including the first hand). If it returns #t, then the hand is not played and the play-match procedure returns.
You can use this mechanism to keep playing hands until some condition is met, such as the maximum change in any utility value falling below some threshold.
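For example, here is a minimal sketch of a stopping procedure that simply ends the match after a fixed number of hands (this sketch assumes the procedure is called with no arguments; a convergence test would instead compare the current utilities against a saved copy and return #t once the largest change falls below your threshold):
(define (make-hand-limit n)
  ;; Returns a stopping procedure: it lets n hands be played,
  ;; then returns #t so that play-match stops.
  (let ((hands-played 0))
    (lambda ()
      (set! hands-played (+ hands-played 1))
      (> hands-played n))))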
(save-tables fname) — this procedure writes the variables used to store the information in the tables to a file. BE CAREFUL: this procedure will probably OVERWRITE any existing file with the name that you give it.
To load tables that you have saved, you can just load the file like you load any Scheme file. You should not call init-tables after doing so, lest your newly loaded information be erased.
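For example (the filename here is just a placeholder):
(load "my-saved-tables.scm")   ; restores the saved table variables; do not call init-tables after this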
A few general things:
Here are the parts for problem 5:
Give a brief explanation of why you transformed the game state into a reinforcement learning state this way.
Please note that I want you to report the amount won/lost and the total amount wagered for each of these three steps in your writeup, so make sure you record this information!
I suggest you save the tables to a file after this step:
(save-tables "a7p5b-model.scm")so that you can try different things in part B-2 without doing this step again.
(define enable-table-updates #f)
This will keep the transition probabilities and average rewards from changing while you are learning the utilities.
Learn utilities by playing blackjack with the following player:
(define (td-player) (list "TD-player" (create-exploring-rl-strategy R+ Ne) (create-td-learning alpha-fn)))
You will need to decide upon values for R+ and Ne and what your alpha-fn function should be. You will also have to figure out when to stop learning.
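Purely as an illustration (these particular values are not required, and this sketch assumes alpha-fn is passed the number of times the state has been visited), a starting point might look like:
(define R+ 2)     ; optimistic reward used while a state is still underexplored
(define Ne 5)     ; visit each state at least this many times before trusting its utility
(define (alpha-fn n)
  (/ 60.0 (+ 59.0 n)))   ; learning rate that decays as the visit count n grows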
Save the tables after this step and upload this file to the webtester.
(save-tables "a7p5b-utilities.scm")
(define (utility-player) (list "Your name here" basic-rl-strategy non-learning-procedure))
Make sure that you have disabled the table updates (as in the previous step).
Please note that I want you to report the amount won/lost and the total amount wagered for each of these two steps in your writeup, so make sure you record this information!
Here are the steps you should follow:
Make sure you have re-enabled table updates if you disabled them.
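For example:
(define enable-table-updates #t)   ; let the transition probabilities and average rewards change again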
Save the tables after this step and upload this file to the webtester.
(save-tables "a7p5c-utilities.scm")