Computer Vision
Fall 2024
Skidmore College
Instructor: Michael Eckmann

Title: Pseudocode for a 4-layer Neural Network with bias units, and NumPy hints

Pseudocode for a neural network of 4 layers with a bias unit on the input and hidden layers. This could be generalized to a network of any number of layers.

s1 = input layer size (# of input units)
s2 = 1st hidden layer size (# of hidden units)
s3 = 2nd hidden layer size
s4 = output layer size (# of output units)

Note: the s1, s2 and s3 sizes exclude the bias unit.

x is assumed to be a column vector that already has the bias unit of 1 as its first element.

Do all of the following for some number of epochs (that is, the pseudocode below is for 1 epoch):

for each batch:
    all Deltas <- 0
    for each training example x in the batch:
        # start forward propagation
        a1 = x            (size: (s1+1) x 1)
        z2 = w1 @ a1      (w1 is s2 x (s1+1) and a1 is (s1+1) x 1, so z2 is s2 x 1)
        a2 = g(z2)        (g(z2) is s2 x 1 --- must then add a0 = 1 to the top of a2 to make a2 be (s2+1) x 1)
        z3 = w2 @ a2      (w2 is s3 x (s2+1) and a2 is (s2+1) x 1, so z3 is s3 x 1)
        a3 = g(z3)        (a3 is s3 x 1 --- must then add a0 = 1 to the top of a3 to make a3 be (s3+1) x 1)
        z4 = w3 @ a3      (w3 is s4 x (s3+1) and a3 is (s3+1) x 1, so z4 is s4 x 1)
        a4 = g(z4)        (a4 is s4 x 1)
        # end forward propagation

        # start backprop
        d4 = a4 - ytrain_for_that_example      (size: s4 x 1)
        w3wob = w3 with its 1st column removed, making w3wob be s4 x s3
        w3wobT = w3wob transpose, which is s3 x s4
        d3 = (w3wobT @ d4) * g'(z3)
            (w3wobT is s3 x s4 and d4 is s4 x 1, so the @ results in s3 x 1; g'(z3) is s3 x 1 and the elementwise * results in d3 of size s3 x 1)
        w2wob = w2 with its 1st column removed, making w2wob be s3 x s2
        w2wobT = w2wob transpose, which is s2 x s3
        d2 = (w2wobT @ d3) * g'(z2)
            (w2wobT is s2 x s3 and d3 is s3 x 1, so the @ results in s2 x 1; g'(z2) is s2 x 1 and the elementwise * results in d2 of size s2 x 1)
        # end of backprop

        # accumulate the partial derivatives of the weights
        # note: a1T is a1 transpose, and similarly for a2T and a3T
        Delta1 += d2 @ a1T      (s2 x 1 @ 1 x (s1+1) results in s2 x (s1+1))
        Delta2 += d3 @ a2T      (s3 x 1 @ 1 x (s2+1) results in s3 x (s2+1))
        Delta3 += d4 @ a3T      (s4 x 1 @ 1 x (s3+1) results in s4 x (s3+1))
        # end accumulate

    # gradient descent update of the weights for 1 batch
    w1 = w1 - learnRate * (Delta1 / batchSize)
    w2 = w2 - learnRate * (Delta2 / batchSize)
    w3 = w3 - learnRate * (Delta3 / batchSize)

(A runnable NumPy sketch of this whole loop is given at the end of this handout.)

=====

Some NumPy things that will help with the programming assignment

# all of the following assumes we did:
# import numpy as np

1. Suppose mat is a 5x10 matrix, e.g. mat = np.zeros((5,10)).
   mat[:,1:] is the 5x9 matrix that is mat with its first column removed.

2. To add a 1 to the top of a column vector v (note: v is 2-dimensional, but its number of columns is 1),
   say v = np.zeros((5,1)), then do:
       v = np.vstack((np.array([1]), v))
   Now v has shape (6,1) with a 1 in the top spot.

3. transpose() gets the transpose of a 2-dimensional array (e.g. a matrix or a 2-d column vector).
   Example: v.transpose() is 1x6, and mat.transpose() is a 10x5 matrix (mat is 5x10).

4. To raise the number e to a power:
   Example: np.exp(2) is e squared.

5. To use a boolean condition as an index into a NumPy array, suppose v is:
       array([[1.],
              [0.],
              [0.],
              [0.],
              [0.],
              [0.]])
   Then v[v == 0] = 42 makes v become:
       array([[ 1.],
              [42.],
              [42.],
              [42.],
              [42.],
              [42.]])

6. @ is matrix-matrix multiplication, matrix-vector multiplication, or a column vector times a row vector (which results in a matrix).

7. * does elementwise multiplication of arrays (they must be the same shape).
   * also does scalar multiplication of an array by a scalar.
   + does elementwise addition of arrays (they must be the same shape).
   + also adds a scalar to each element of an array.
   - and / behave similarly.

(A short runnable demo of these hints is also given at the end.)
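=====

Here is a minimal NumPy sketch of the epoch/batch loop from the pseudocode above. It assumes g is the sigmoid (the handout does not pin down g); the names g_prime, add_bias, train_one_epoch, Xtrain and Ytrain are hypothetical helpers invented for this sketch, not names required by the assignment. Shapes follow the handout exactly.

import numpy as np

def g(z):
    # sigmoid activation (an assumption; the handout leaves g unspecified)
    return 1.0 / (1.0 + np.exp(-z))

def g_prime(z):
    # derivative of the sigmoid, evaluated at z
    return g(z) * (1.0 - g(z))

def add_bias(a):
    # put a 1 on top of a column vector (hint 2)
    return np.vstack((np.array([[1.0]]), a))

def train_one_epoch(w1, w2, w3, Xtrain, Ytrain, batchSize, learnRate):
    # Xtrain: list of (s1+1) x 1 column vectors (bias 1 already on top)
    # Ytrain: list of s4 x 1 column vectors
    for start in range(0, len(Xtrain), batchSize):
        Delta1 = np.zeros_like(w1)      # s2 x (s1+1)
        Delta2 = np.zeros_like(w2)      # s3 x (s2+1)
        Delta3 = np.zeros_like(w3)      # s4 x (s3+1)
        for x, y in zip(Xtrain[start:start+batchSize], Ytrain[start:start+batchSize]):
            # forward propagation
            a1 = x                      # (s1+1) x 1
            z2 = w1 @ a1                # s2 x 1
            a2 = add_bias(g(z2))        # (s2+1) x 1
            z3 = w2 @ a2                # s3 x 1
            a3 = add_bias(g(z3))        # (s3+1) x 1
            z4 = w3 @ a3                # s4 x 1
            a4 = g(z4)                  # s4 x 1
            # backprop; w3[:, 1:] is w3wob (first column removed, hint 1)
            d4 = a4 - y                                         # s4 x 1
            d3 = (w3[:, 1:].transpose() @ d4) * g_prime(z3)     # s3 x 1
            d2 = (w2[:, 1:].transpose() @ d3) * g_prime(z2)     # s2 x 1
            # accumulate the partial derivatives of the weights
            Delta1 += d2 @ a1.transpose()
            Delta2 += d3 @ a2.transpose()
            Delta3 += d4 @ a3.transpose()
        # gradient descent update of the weights for this batch
        w1 = w1 - learnRate * (Delta1 / batchSize)
        w2 = w2 - learnRate * (Delta2 / batchSize)
        w3 = w3 - learnRate * (Delta3 / batchSize)
    return w1, w2, w3

Note that, like the pseudocode, this divides every batch's accumulated Deltas by batchSize, even a final batch that happens to be smaller.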
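And here is a quick demo of NumPy hints 1-7, runnable as-is; the variable names (mat, v, col, row) are just for illustration.

import numpy as np

mat = np.zeros((5, 10))
print(mat[:, 1:].shape)            # (5, 9): first column removed (hint 1)

v = np.zeros((5, 1))
v = np.vstack((np.array([1]), v))
print(v.shape)                     # (6, 1): a 1 added on top (hint 2)

print(v.transpose().shape)         # (1, 6) (hint 3)
print(np.exp(2))                   # e squared, about 7.389 (hint 4)

v[v == 0] = 42                     # boolean condition as index (hint 5)
print(v.ravel())                   # [ 1. 42. 42. 42. 42. 42.]

col = np.ones((3, 1))
row = np.ones((1, 4))
print((col @ row).shape)           # (3, 4): column times row is a matrix (hint 6)
print((col * 2 + 1).ravel())       # [3. 3. 3.]: scalar * and + on an array (hint 7)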