Answer: You can make a call just like the ones being used to test your code for Problem 2. For example:
(test-perceptron (learn-perceptron bc-data-l1 bc-data-l2) bc-data-l2)
(test-perceptron (learn-perceptron bc-data-l1 bc-data-l2) bc-data-l3)
The first example above tests the perceptron against the testing data. With the synthetic data, you'd have to use this option since I've only provided two training/testing sets. The second tests the perceptron against an independent set of data (one not used at all in learning the perceptron). If you wanted to do this with the synthetic data, you could generate your own data sets using the code provided on the A7 page.
You can additionally show how the perceptron is doing on the training data. (And the perceptron should be doing better on its training data than on other data sets.) For example:
(test-perceptron (learn-perceptron bc-data-l1 bc-data-l2) bc-data-l1)
Answer: When you learn weights for a perceptron, the number of inputs to that perceptron is fixed. You cannot use it with a data set with a different number of inputs. Therefore, if you train a perceptron with two dimensional data sets, you can only test it against two dimensional data sets. Your procedure, however, should be able to handle learning and testing perceptrons that take different numbers of inputs.
Answer: No, they are both scalars. The output of the perceptron procedure is a scalar (1 or -1), and so is the correct output in each training example.
Answer: With the perceptron learning rule, for sufficiently small alpha (and assuming no problems with numerical accuracy), the weights of the perceptron should converge to values that perfectly separate the training data, assuming the training data are linearly separable. It is not possible to guarantee that they will perfectly classify other data drawn from the same distribution. For example, suppose you have a training data set consisting of the points in input space (0,0) and (1,1) with outputs -1 and 1 respectively. There are many different hyperplanes that can separate these two points, not all of which will produce the same output for the input point (0,1) or (1,0). If your perceptron has settled to the minimum of an error function defined over the entire set of training data, then the hyperplane chosen should be "halfway" between these two points.
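The (0,0)/(1,1) example above can be made concrete with a small Python sketch. The perceptron function below is a stand-in for the Scheme procedure, using the assignment's convention of a prepended -1 threshold input; the particular weight vectors are chosen for illustration.

```python
# Two weight vectors that both separate the training points
# (0,0) -> -1 and (1,1) -> 1, yet disagree on the unseen point (0,1).

def perceptron_output(weights, inputs):
    x = [-1.0] + list(inputs)   # weights[0] is the threshold weight
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else -1

w_a = [0.5, 1.0, 1.0]   # hyperplane x1 + x2 = 0.5
w_b = [0.5, 1.0, 0.0]   # hyperplane x1 = 0.5

# Both classify the training data perfectly:
for w in (w_a, w_b):
    assert perceptron_output(w, [0, 0]) == -1
    assert perceptron_output(w, [1, 1]) == 1

# ...but they disagree on a point not in the training set:
print(perceptron_output(w_a, [0, 1]))   # 1
print(perceptron_output(w_b, [0, 1]))   # -1
```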
Answer: the correct output is what you are given with the training example. (How else would you get the correct output?)
Answer: The threshold value is one of the weights, so it gets updated along with all the other weights.
Answer: If you do, then the weights and the input vector are not the same length.
Answer: The training example consists of a list of N inputs and a single output (either 1 or -1). You take the N inputs and prepend a -1 to them to form the "actual" input vector. You can then take your vector of N+1 weights and the input vector and give them to the perceptron procedure, which will compute the output of a perceptron for that input. Please examine this procedure in the support code.
You then compute the error, which is the correct output value minus the value that the perceptron returns. You then update the weights by adding alpha (the learning rate) times the error times the input vector to the weight vector. One epoch consists of doing this for all examples in a training data set.
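The single-example update described above can be sketched in Python (the assignment is in Scheme; the function name and data layout here are illustrative, not the support code):

```python
def update(weights, inputs, target, alpha):
    """Apply the perceptron learning rule for one training example."""
    x = [-1.0] + list(inputs)                      # prepend -1 for the threshold weight
    s = sum(w * xi for w, xi in zip(weights, x))
    output = 1 if s >= 0 else -1
    error = target - output                        # always 0, 2, or -2
    return [w + alpha * error * xi for w, xi in zip(weights, x)]

# Misclassified example: with all-zero weights the output for (0,0) is 1,
# but the target is -1, so the threshold weight moves.
print(update([0.0, 0.0, 0.0], [0, 0], -1, 0.1))   # [0.2, 0.0, 0.0]
```

Note that when the perceptron already classifies the example correctly, the error is 0 and the weights are unchanged.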
Answer: Your perceptron-epoch procedure should return a single list of weights which are the current weights after one pass through the training data. If there are N inputs in the training data, both the start-weights and the list of weights you return should have (N+1) elements.
Answer: They are the same sort of thing, but you should use one for testing and the other for training.
Answer: Read the first bullet under "Notes, Conventions, and suggestions" in the assignment handout.
Answer: You can only take a dot product of two vectors. Alpha (the learning rate) is a scalar, and so is the error. So what the perceptron learning rule says is to update the weights by adding those scalars times the input vector.
Answer: The perceptron-epoch function should go through all the examples in the training data and apply the perceptron learning rule for each one. Be sure you go through the training examples in order because the web tester assumes this.
Answer: For each example, you use the current weights to calculate the output of the perceptron for the example inputs. The error is then the difference between the correct output value and the actual output value you computed. You then update the weights with the perceptron learning rule and go on to the next example.
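The loop just described can be sketched in Python (again, the real assignment is in Scheme; names and the (inputs, target) data layout are assumptions for illustration):

```python
def perceptron_epoch(start_weights, data, alpha):
    """One pass, in order, through data (a list of (inputs, target) pairs);
    returns the updated list of N+1 weights."""
    weights = list(start_weights)
    for inputs, target in data:
        x = [-1.0] + list(inputs)                  # threshold input
        s = sum(w * xi for w, xi in zip(weights, x))
        output = 1 if s >= 0 else -1
        error = target - output
        weights = [w + alpha * error * xi for w, xi in zip(weights, x)]
    return weights
```

Note that each example is classified with the weights as updated by the examples before it, which is why the order of traversal matters.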
Answer: See the first bullet under the "Notes, Conventions, and suggestions" section of the assignment handout. Since the threshold value is incorporated into the weights, it is learned by the perceptron along with the other weights and is therefore part of the weight vector returned by the perceptron-epoch procedure.
No, I don't think I will be providing a specific example. I would rather encourage students to make sure that they understand the algorithm they are implementing and to test it themselves. One way to test your procedure is, of course, to learn a perceptron with it; its performance should improve! You could also test your procedure using the ideas we covered in class on Thursday: see whether the value of an error function actually decreases as you take these steps in the direction of the negative gradient.