; Decision Tree learning example 11/08/2001 (revised 11/20) ; Data set is from "Artificial Intelligence: a modern approach" ; by Russell and Norvig ; ; first, highlights from a few manual steps... (define (log2 x) (/ (log x) (log 2))) ; information required to classifiy the entire training set (tally-tdata restaurant-data) ;Value: ((no 6) (yes 6)) (+ (* -6/12 (log2 6/12)) ; 6 yes's (-6/12 is a "rational" number) (* -6/12 (log2 6/12))) ; 6 no's ;Value: 1. (split-tdata restaurant-data restaurant-names 'type) ;Value: ((italian ((no (yes yes yes yes full $$$ no yes italian upto30)) ; (yes (no yes no yes some $$ yes yes italian upto10)))) ; (burger ((yes (yes yes yes yes full $ no no burger upto60)) ; (no (no yes yes no full $ yes no burger above60)) ; (no (no yes no no none $ yes no burger upto10)) ; (yes (no yes no no some $ no no burger upto10)))) ; (thai ((no (no no no no none $ no no thai upto10)) ; (yes (no no no yes some $$ yes yes thai upto10)) ; (yes (yes no yes yes full $ no no thai upto30)) ; (no (yes no no yes full $ no no thai upto60)))) ; (french ((no (yes no yes no full $$$ no yes french above60)) ; (yes (yes no no yes some $$$ no yes french upto10))))) (+ (* (/ 2 12) 1) (* (/ 4 12) 1) (* (/ 4 12) 1) (* (/ 2 12) 1)) ;Value: 1 (split-tdata restaurant-data restaurant-names 'patrons) ;Value: ((none ((no (no no no no none $ no no thai upto10)) ; (no (no yes no no none $ yes no burger upto10)))) ; (full ((yes (yes yes yes yes full $ no no burger upto60)) ; (no (yes yes yes yes full $$$ no yes italian upto30)) ; (no (no yes yes no full $ yes no burger above60)) ; (no (yes no yes no full $$$ no yes french above60)) ; (yes (yes no yes yes full $ no no thai upto30)) ; (no (yes no no yes full $ no no thai upto60)))) ; (some ((yes (no no no yes some $$ yes yes thai upto10)) ; (yes (no yes no yes some $$ yes yes italian upto10)) ; (yes (no yes no no some $ no no burger upto10)) ; (yes (yes no no yes some $$$ no yes french upto10))))) (+ (* (/ 2 12) 0) (* (/ 6 12) (+ (* -2/6 (log2 2/6)) (* -4/6 (log2 4/6)))) (* (/ 4 12) 0)) ;Value: .4591479170272448 (define full-split (second (second (split-tdata restaurant-data restaurant-names 'patrons)))) (split-tdata full-split restaurant-names 'rain) ;Value: ((yes ((no (no yes yes no full $ yes no burger above60)))) (no ((no (yes no no yes full $ no no thai upto60)) (yes (yes no yes yes full $ no no thai upto30)) (no (yes no yes no full $$$ no yes french above60)) (no (yes yes yes yes full $$$ no yes italian upto30)) (yes (yes yes yes yes full $ no no burger upto60))))) (* -3/5 (log2 3/5)) ;Value: .44217935649972373 (split-tdata full-split restaurant-names 'hungry) ;Value: ((no ((no (yes no yes no full $$$ no yes french above60)) ; (no (no yes yes no full $ yes no burger above60)))) ; (yes ((no (yes no no yes full $ no no thai upto60)) ; (yes (yes no yes yes full $ no no thai upto30)) ; (no (yes yes yes yes full $$$ no yes italian upto30)) ; (yes (yes yes yes yes full $ no no burger upto60))))) (* -4/6 (log2 4/6)) ;Value: .38997500048077083 (define hungry=yes:split (second (second (split-tdata full-split restaurant-names 'hungry)))) ;Value: hungry=yes:split ; Now, a "narration" from the executing the entire algorithm ; >(learn-dtree restaurant-data restaurant-names) Training data have mixed classification; splitting on some attribute. The information required after splitting on wait-estimate is .792481250360578. The information required after splitting on type is .9999999999999999. The information required after splitting on reservation is .9792791603760922. The information required after splitting on rain is 1.. The information required after splitting on price is .8042903712002692. The information required after splitting on patrons is .4591479170272448. The information required after splitting on hungry is .8042903712002691. The information required after splitting on fri is .9792791603760922. The information required after splitting on bar is 1.. The information required after splitting on alternate is 1.. The attribute with greatest information gain is patrons. Now working on the split for the patrons="some" with examples: ((yes (no no no yes some $$ yes yes thai upto10)) (yes (no yes no yes some $$ yes yes italian upto10)) (yes (no yes no no some $ no no burger upto10)) (yes (yes no no yes some $$$ no yes french upto10))) All data have the same classification of: yes. Now working on the split for the patrons="full" with examples: ((yes (yes yes yes yes full $ no no burger upto60)) (no (yes yes yes yes full $$$ no yes italian upto30)) (no (no yes yes no full $ yes no burger above60)) (no (yes no yes no full $$$ no yes french above60)) (yes (yes no yes yes full $ no no thai upto30)) (no (yes no no yes full $ no no thai upto60))) Training data have mixed classification; splitting on some attribute. The information required after splitting on wait-estimate is .6666666666666666. The information required after splitting on type is .6666666666666666. The information required after splitting on reservation is .6666666666666666. The information required after splitting on rain is .8091254953788906. The information required after splitting on price is .6666666666666666. The information required after splitting on hungry is .6666666666666666. The information required after splitting on fri is .8091254953788906. The information required after splitting on bar is .9182958340544896. The information required after splitting on alternate is .8091254953788906. The attribute with greatest information gain is hungry. Now working on the split for the hungry="yes" with examples: ((no (yes no no yes full $ no no thai upto60)) (yes (yes no yes yes full $ no no thai upto30)) (no (yes yes yes yes full $$$ no yes italian upto30)) (yes (yes yes yes yes full $ no no burger upto60))) Training data have mixed classification; splitting on some attribute. The information required after splitting on wait-estimate is 1.. The information required after splitting on type is .5. The information required after splitting on reservation is .6887218755408672. The information required after splitting on rain is 1.. The information required after splitting on price is .6887218755408672. The information required after splitting on fri is .6887218755408672. The information required after splitting on bar is 1.. The information required after splitting on alternate is 1.. The attribute with greatest information gain is type. Now working on the split for the type="thai" with examples: ((yes (yes no yes yes full $ no no thai upto30)) (no (yes no no yes full $ no no thai upto60))) Training data have mixed classification; splitting on some attribute. The information required after splitting on wait-estimate is 0. The information required after splitting on reservation is 1.. The information required after splitting on rain is 1.. The information required after splitting on price is 1.. The information required after splitting on fri is 0. The information required after splitting on bar is 1.. The information required after splitting on alternate is 1.. The attribute with greatest information gain is fri. Now working on the split for the fri="yes" with examples: ((yes (yes no yes yes full $ no no thai upto30))) All data have the same classification of: yes. Now working on the split for the fri="no" with examples: ((no (yes no no yes full $ no no thai upto60))) All data have the same classification of: no. Now working on the split for the type="italian" with examples: ((no (yes yes yes yes full $$$ no yes italian upto30))) All data have the same classification of: no. Now working on the split for the type="burger" with examples: ((yes (yes yes yes yes full $ no no burger upto60))) All data have the same classification of: yes. Now working on the split for the hungry="no" with examples: ((no (yes no yes no full $$$ no yes french above60)) (no (no yes yes no full $ yes no burger above60))) All data have the same classification of: no. Now working on the split for the patrons="none" with examples: ((no (no no no no none $ no no thai upto10)) (no (no yes no no none $ yes no burger upto10))) All data have the same classification of: no. ;Value: (patrons (none no) ; (full (hungry (no no) ; (yes (type (burger yes) ; (italian no) ; (thai (fri (no no) ; (yes yes))))))) ; (some yes))