; annotated output produced by my assignment 6 solutions

(learn-dtree snorkel-data snorkel-names)

Node 0 ----------------------------------------------------------------
Training data have mixed classification; splitting on some attribute.
8 positive examples, 4 negative examples; Information required = 0.92 bits

Splitting on time yields:
(morning afternoon)
(((no 1) (yes 4)) ((yes 4) (no 3)))
The information on this split is:
(.7219280948873623 .9852281360342516) * (5/12 7/12) = 0.88 bits

Splitting on waves yields:
(small medium large)
(((no 1) (yes 3)) ((no 1) (yes 3)) ((no 2) (yes 2)))
The information on this split is:
(.8112781244591328 .8112781244591328 1.) * (4/12 4/12 4/12) = 0.87 bits

Splitting on skies yields:
(sunny cloudy)
(((no 4) (yes 4)) ((yes 4)))
The information on this split is:
(1. 0) * (8/12 4/12) = 0.67 bits

Splitting on tide yields:
(high low)
(((no 2) (yes 5)) ((yes 3) (no 2)))
The information on this split is:
(.863120568566631 .9709505944546686) * (7/12 5/12) = 0.91 bits

The attribute with greatest information gain is skies.

Node 1A ----------------------------------------------------------------
Now working on the split for the skies="cloudy" with examples:
((yes (high cloudy medium afternoon))
 (yes (high cloudy large afternoon))
 (yes (low cloudy small afternoon))
 (yes (low cloudy large afternoon)))
All data have the same classification of: yes.

Node 1B ----------------------------------------------------------------
Now working on the split for the skies="sunny" with examples:
((yes (high sunny small morning))
 (no (high sunny small afternoon))
 (yes (high sunny medium morning))
 (yes (high sunny small morning))
 (no (high sunny large afternoon))
 (no (low sunny medium morning))
 (no (low sunny large afternoon))
 (yes (low sunny medium morning)))
Training data have mixed classification; splitting on some attribute.
4 positive examples, 4 negative examples; Information required = 1 bit

Splitting on time yields:
(afternoon morning)
(((no 3)) ((no 1) (yes 4)))
The information on this split is:
(0 .7219280948873623) * (3/8 5/8) = 0.45 bits

Splitting on waves yields:
(large medium small)
(((no 2)) ((no 1) (yes 2)) ((no 1) (yes 2)))
The information on this split is:
(0 .9182958340544896 .9182958340544896) * (2/8 3/8 3/8) = 0.69 bits

Splitting on tide yields:
(low high)
(((no 2) (yes 1)) ((yes 3) (no 2)))
The information on this split is:
(.9182958340544896 .9709505944546686) * (3/8 5/8) = 0.95 bits

The attribute with greatest information gain is time.

Node 2AA ----------------------------------------------------------------
Now working on the split for the time="morning" with examples:
((yes (low sunny medium morning))
 (no (low sunny medium morning))
 (yes (high sunny small morning))
 (yes (high sunny medium morning))
 (yes (high sunny small morning)))
Training data have mixed classification; splitting on some attribute.
4 positive examples, 1 negative example; Information required = 0.72 bits

Splitting on waves yields:
(small medium)
(((yes 2)) ((no 1) (yes 2)))
The information on this split is:
(0 .9182958340544896) * (2/5 3/5) = 0.55 bits

Splitting on tide yields:
(high low)
(((yes 3)) ((yes 1) (no 1)))
The information on this split is:
(0 1.) * (3/5 2/5) = 0.40 bits

The attribute with greatest information gain is tide.

Node 3AAA ----------------------------------------------------------------
Now working on the split for the tide="low" with examples:
((no (low sunny medium morning))
 (yes (low sunny medium morning)))
Training data have mixed classification; splitting on some attribute.
1 positive example, 1 negative example; Information required = 1 bit

Splitting on waves yields:
(medium)
(((no 1) (yes 1)))
The information on this split is:
(1.) * (2/2) = 1 bit

The attribute with greatest information gain is waves.

Node 4AAAA ----------------------------------------------------------------
Now working on the split for the waves="medium" with examples:
((yes (low sunny medium morning))
 (no (low sunny medium morning)))
No more attributes to split upon
Taking the majority which is: no

Node 3AAB ----------------------------------------------------------------
Now working on the split for the tide="high" with examples:
((yes (high sunny small morning))
 (yes (high sunny medium morning))
 (yes (high sunny small morning)))
All data have the same classification of: yes.

Node 2AB ----------------------------------------------------------------
Now working on the split for the time="afternoon" with examples:
((no (low sunny large afternoon))
 (no (high sunny large afternoon))
 (no (high sunny small afternoon)))
All data have the same classification of: no.

; (skies (cloudy yes)
;        (sunny (time (morning (tide (low (waves (medium no)))
;                                    (high yes)))
;                     (afternoon no))))
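
; A minimal Python sketch (not the assignment's Scheme code) that re-derives
; the Node 0 figures from the log. It assumes the usual ID3 definitions: the
; information required is the entropy of the class counts, and the information
; on a split is the size-weighted average entropy of its branches. The names
; entropy, remainder, and node0_splits are illustrative only; the [yes, no]
; counts are copied from the Node 0 output.

```python
from math import log2


def entropy(counts):
    """Shannon entropy, in bits, of a list of class counts."""
    total = sum(counts)
    # Zero counts contribute nothing (and would break log2), so skip them.
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)


def remainder(branches):
    """Size-weighted average entropy over the branches of a split."""
    total = sum(sum(branch) for branch in branches)
    return sum(sum(branch) / total * entropy(branch) for branch in branches)


# Class counts per branch at Node 0, written as [yes, no].
node0_splits = {
    "time": [[4, 1], [4, 3]],           # morning, afternoon
    "waves": [[3, 1], [3, 1], [2, 2]],  # small, medium, large
    "skies": [[4, 4], [4, 0]],          # sunny, cloudy
    "tide": [[5, 2], [3, 2]],           # high, low
}

required = entropy([8, 4])  # 8 positive, 4 negative examples
scores = {attr: remainder(b) for attr, b in node0_splits.items()}

# Greatest information gain is the same as least information left after the split.
best = min(scores, key=scores.get)

print(f"Information required = {required:.2f} bits")
for attr, bits in scores.items():
    print(f"{attr}: {bits:.2f} bits")
print("best attribute:", best)
```

; Rounded to two places this gives 0.92 bits required and 0.88, 0.87, 0.67,
; and 0.91 bits for time, waves, skies, and tide, so skies leaves the least
; information and has the greatest gain, matching Node 0.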