Computer Vision
Fall 2024
Skidmore College
Instructor: Michael Eckmann

Title: Pseudocode for a 4-layer Neural Network with bias units, and NumPy hints

Pseudocode for a neural network of 4 layers with a bias unit on the input and hidden layers. This could be generalized to a network of any number of layers.

s1 = input layer size (# of input units)
s2 = 1st hidden layer size (# of hidden units)
s3 = 2nd hidden layer size
s4 = output layer size (# of output units)

Note: the s1, s2 and s3 sizes exclude the bias unit.

x is assumed to be a column vector that already has the bias unit of 1 as its first element.

Do all of the following for some number of epochs (that is, the pseudocode below is for 1 epoch):

for each batch:
    all Deltas <- 0
    for each training example x in the batch:
        # start forward propagation
        a1 = x            (size: (s1+1) x 1)
        z2 = w1 @ a1      (w1 is s2 x (s1+1) and a1 is (s1+1) x 1, so z2 is s2 x 1)
        a2 = g(z2)        (g(z2) is s2 x 1 --- must then add a0 = 1 to the top of a2 to make a2 be (s2+1) x 1)
        z3 = w2 @ a2      (w2 is s3 x (s2+1) and a2 is (s2+1) x 1, so z3 is s3 x 1)
        a3 = g(z3)        (a3 is s3 x 1 --- must then add a0 = 1 to the top of a3 to make a3 be (s3+1) x 1)
        z4 = w3 @ a3      (w3 is s4 x (s3+1) and a3 is (s3+1) x 1, so z4 is s4 x 1)
        a4 = g(z4)        (a4 is s4 x 1)
        # end forward propagation

        # start backprop
        d4 = a4 - ytrain_for_that_example      (size: s4 x 1)
        w3wob = w3 with its 1st column removed, making w3wob be s4 x s3
        w3wobT = w3wob transpose, which is s3 x s4
        d3 = (w3wobT @ d4) * g'(z3)
            (w3wobT is s3 x s4 and d4 is s4 x 1, so the @ results in s3 x 1; g'(z3) is s3 x 1 and the elementwise * results in d3 of size s3 x 1)
        w2wob = w2 with its 1st column removed, making w2wob be s3 x s2
        w2wobT = w2wob transpose, which is s2 x s3
        d2 = (w2wobT @ d3) * g'(z2)
            (w2wobT is s2 x s3 and d3 is s3 x 1, so the @ results in s2 x 1; g'(z2) is s2 x 1 and the elementwise * results in d2 of size s2 x 1)
        # end of backprop

        # accumulate the partial derivatives of the weights
        # note: a1T is a1 transpose, and similarly for a2T and a3T
        Delta1 += d2 @ a1T      (s2 x 1 @ 1 x (s1+1) results in s2 x (s1+1))
        Delta2 += d3 @ a2T      (s3 x 1 @ 1 x (s2+1) results in s3 x (s2+1))
        Delta3 += d4 @ a3T      (s4 x 1 @ 1 x (s3+1) results in s4 x (s3+1))
        # end accumulate

    # gradient descent update of the weights for 1 batch
    w1 = w1 - learnRate * (Delta1 / batchSize)
    w2 = w2 - learnRate * (Delta2 / batchSize)
    w3 = w3 - learnRate * (Delta3 / batchSize)

(A runnable NumPy sketch of this whole loop is given at the end of this handout.)

=====

Some NumPy things that will help with the programming assignment

# all of the following assumes we did:
# import numpy as np

1. Suppose mat is a 5x10 matrix, e.g. mat = np.zeros((5,10)).
   mat[:,1:] is the 5x9 matrix that is mat with its first column removed.

2. To add a 1 to the top of a column vector v (note: v is 2-dimensional, but its number of columns is 1),
   say v = np.zeros((5,1)), then do:
       v = np.vstack((np.array([1]), v))
   Now v has shape (6,1) with a 1 in the top spot.

3. transpose() gets the transpose of a 2-dimensional array (e.g. a matrix or a 2-d column vector).
   Example: v.transpose() is 1x6, and mat.transpose() is a 10x5 matrix (mat is 5x10).

4. To raise the number e to a power:
   Example: np.exp(2) is e squared.

5. To use a boolean condition as an index into a NumPy array, suppose v is:
       array([[1.],
              [0.],
              [0.],
              [0.],
              [0.],
              [0.]])
   Then v[v == 0] = 42 makes v become:
       array([[ 1.],
              [42.],
              [42.],
              [42.],
              [42.],
              [42.]])

6. @ is matrix-matrix multiplication, matrix-vector multiplication, or a column vector times a row vector (which results in a matrix).

7. * does elementwise multiplication of arrays (they must be the same shape).
   * also does scalar multiplication of an array by a scalar.
   + does elementwise addition of arrays (they must be the same shape).
   + also adds a scalar to each element of an array.
   - and / behave similarly.

(A short runnable demo of these hints is also given at the end.)
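=====

Here is a minimal NumPy sketch of the epoch/batch loop from the pseudocode above. It assumes g is the sigmoid (the handout does not pin down g); the names g_prime, add_bias, train_one_epoch, Xtrain and Ytrain are hypothetical helpers invented for this sketch, not names required by the assignment. Shapes follow the handout exactly.

import numpy as np

def g(z):
    # sigmoid activation (an assumption; the handout leaves g unspecified)
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    # derivative of the sigmoid, evaluated at z
    return g(z) * (1.0 - g(z))

def add_bias(a):
    # put a 1 on top of a column vector (hint 2)
    return np.vstack((np.array([[1.0]]), a))

def train_one_epoch(w1, w2, w3, Xtrain, Ytrain, batchSize, learnRate):
    # Xtrain: list of (s1+1) x 1 column vectors (bias 1 already on top)
    # Ytrain: list of s4 x 1 column vectors
    for start in range(0, len(Xtrain), batchSize):
        Delta1 = np.zeros_like(w1)      # s2 x (s1+1)
        Delta2 = np.zeros_like(w2)      # s3 x (s2+1)
        Delta3 = np.zeros_like(w3)      # s4 x (s3+1)
        for x, y in zip(Xtrain[start:start+batchSize], Ytrain[start:start+batchSize]):
            # forward propagation
            a1 = x                      # (s1+1) x 1
            z2 = w1 @ a1                # s2 x 1
            a2 = add_bias(g(z2))        # (s2+1) x 1
            z3 = w2 @ a2                # s3 x 1
            a3 = add_bias(g(z3))        # (s3+1) x 1
            z4 = w3 @ a3                # s4 x 1
            a4 = g(z4)                  # s4 x 1
            # backprop; w3[:, 1:] is w3wob (first column removed, hint 1)
            d4 = a4 - y                                         # s4 x 1
            d3 = (w3[:, 1:].transpose() @ d4) * g_prime(z3)     # s3 x 1
            d2 = (w2[:, 1:].transpose() @ d3) * g_prime(z2)     # s2 x 1
            # accumulate the partial derivatives of the weights
            Delta1 += d2 @ a1.transpose()
            Delta2 += d3 @ a2.transpose()
            Delta3 += d4 @ a3.transpose()
        # gradient descent update of the weights for this batch
        w1 = w1 - learnRate * (Delta1 / batchSize)
        w2 = w2 - learnRate * (Delta2 / batchSize)
        w3 = w3 - learnRate * (Delta3 / batchSize)
    return w1, w2, w3

Note that, like the pseudocode, this divides every batch's accumulated Deltas by batchSize, even a final batch that happens to be smaller.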
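And here is a quick demo of NumPy hints 1-7, runnable as-is; the variable names (mat, v, col, row) are just for illustration.

import numpy as np

mat = np.zeros((5, 10))
print(mat[:, 1:].shape)            # (5, 9): first column removed (hint 1)

v = np.zeros((5, 1))
v = np.vstack((np.array([1]), v))
print(v.shape)                     # (6, 1): a 1 added on top (hint 2)

print(v.transpose().shape)         # (1, 6) (hint 3)
print(np.exp(2))                   # e squared, about 7.389 (hint 4)

v[v == 0] = 42                     # boolean condition as index (hint 5)
print(v.ravel())                   # [ 1. 42. 42. 42. 42. 42.]

col = np.ones((3, 1))
row = np.ones((1, 4))
print((col @ row).shape)           # (3, 4): column times row is a matrix (hint 6)
print((col * 2 + 1).ravel())       # [3. 3. 3.]: scalar * and + on an array (hint 7)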