CNN


Drawings

Brief Description:

illustrates a convolutional neural network 100 in accordance with one embodiment.

Detailed Description:

Figure 1 illustrates an exemplary convolutional neural network 100. The convolutional neural network 100 arranges its neurons in three dimensions (width, height, depth), as visualized in convolutional layer 104. Every layer of the convolutional neural network 100 transforms a 3D volume of inputs to a 3D output volume of neuron activations. In this example, the input layer 102 encodes the image, so its width and height would be the dimensions of the image, and the depth would be 3 (Red, Green, Blue channels). The convolutional layer 104 further transforms the outputs of the input layer 102, and the output layer 106 transforms the outputs of the convolutional layer 104 into one or more classifications of the image content.
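A rough sketch of these 3D volumes, using NumPy. Only the input depth of 3 (RGB) comes from the description above; the other layer sizes and the class count are assumptions for illustration.

```python
import numpy as np

# Input layer 102: width and height match the image, depth is 3 (R, G, B).
input_volume = np.zeros((32, 32, 3))   # 32x32 is an assumed image size

# Convolutional layer 104: still a 3D volume of neuron activations.
# Here we assume 5 filters shrinking the spatial size to 28x28.
conv_output = np.zeros((28, 28, 5))

# Output layer 106: one score per classification of the image content,
# assuming 10 hypothetical classes.
class_scores = np.zeros((1, 1, 10))

print(input_volume.shape, conv_output.shape, class_scores.shape)
```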

Brief Description:

illustrates convolutional neural network layers 200 in accordance with one embodiment.

Detailed Description:

Figure 2 illustrates exemplary convolutional neural network layers 200 in more detail. An example subregion of the input layer region 204, within an input layer region 202 of an image, is analyzed by a convolutional layer subregion 208 in the convolutional layer 206. The input layer region 202 is 32×32 neurons long and wide (e.g., 32×32 pixels) and three neurons deep (e.g., three color channels per pixel). Each neuron in the convolutional layer 206 is connected only to a local region of the input layer region 202 spatially (in height and width), but to the full depth (i.e., all color channels if the input is an image). Note that there are multiple neurons (five in this example) along the depth of the convolutional layer subregion 208 that analyze the subregion of the input layer region 204 of the input layer region 202, and each neuron of the convolutional layer subregion 208 may receive inputs from every neuron of the subregion of the input layer region 204.
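The local-spatial, full-depth connectivity can be sketched as follows, assuming NumPy, random weights, and a 5×5 spatial patch (all variable names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Input layer region 202: 32x32 neurons long and wide, three deep.
input_region = rng.standard_normal((32, 32, 3))

# Subregion 204: a local 5x5 spatial patch, but the FULL depth of 3.
patch = input_region[0:5, 0:5, :]            # shape (5, 5, 3)

# Convolutional layer subregion 208: five neurons along the depth,
# each with its own weights over the same patch.
weights = rng.standard_normal((5, 5, 5, 3))  # 5 neurons x (5x5x3) weights

# Each neuron receives inputs from every neuron of the subregion 204.
activations = np.array([np.sum(w * patch) for w in weights])
print(activations.shape)  # one activation per depth neuron
```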

Brief Description:

illustrates a VGG net 300 in accordance with one embodiment.

Detailed Description:

Figure 3 illustrates a popular form of a CNN known as a VGG net 300. The initial convolution layer 302 stores the raw image pixels and the final pooling layer 320 determines the class scores. Each of the intermediate convolution layers (convolution layer 306, convolution layer 312, and convolution layer 316), rectifier activations (RELU layer 304, RELU layer 308, RELU layer 314, and RELU layer 318), and intermediate pooling layers (pooling layer 310, pooling layer 320) along the processing path is shown as a column.

The VGG net 300 replaces the large single-layer filters of basic CNNs with multiple 3×3 filters in series. For a given receptive field (the effective area of the input image on which an output depends), multiple stacked smaller filters may perform better at image feature classification than a single layer with a larger filter size, because the multiple non-linear layers increase the depth of the network, which enables it to learn more complex features. In a VGG net 300, each pooling layer may be only 2×2.
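The trade-off behind stacking 3×3 filters can be checked with a short calculation. This is a sketch that ignores biases and assumes an equal channel count across layers (the helper function and the channel count of 64 are illustrative, not from the figure):

```python
# Two stacked 3x3 convolutions see a 5x5 region of their input, and
# three stacked 3x3 convolutions see a 7x7 region, matching the
# receptive field of a single larger filter.

def stacked_receptive_field(filter_size, num_layers, stride=1):
    """Receptive field of `num_layers` stacked convolutions."""
    rf = 1
    for _ in range(num_layers):
        rf = rf + (filter_size - 1) * stride
    return rf

C = 64  # assumed channel count, kept equal across layers

# Weight counts (biases ignored): each layer has k*k*C*C weights.
params_two_3x3 = 2 * (3 * 3 * C * C)   # 18 * C^2
params_one_5x5 = 1 * (5 * 5 * C * C)   # 25 * C^2

print(stacked_receptive_field(3, 2))    # same field as one 5x5 filter
print(params_two_3x3 < params_one_5x5)  # fewer weights, more non-linearity
```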

Brief Description:

illustrates a convolution layer filtering 400 in accordance with one embodiment.

Detailed Description:

Figure 4 illustrates a convolution layer filtering 400 that connects the outputs from groups of neurons in a convolution layer 402 to neurons in a next layer 406. A receptive field is defined for the convolution layer 402, in this example sets of 5×5 neurons. The collective outputs of each neuron in the receptive field are weighted and mapped to a single neuron in the next layer 406. This weighted mapping is referred to as the filter 404 for the convolution layer 402 (or sometimes as the kernel of the convolution layer 402). The filter 404 depth is not illustrated in this example (i.e., the filter 404 is actually a cubic volume of neurons in the convolution layer 402, not a square as illustrated). Thus, what is shown is a “slice” of the full filter 404. The filter 404 is slid, or convolved, around the input image, each time mapping to a different neuron in the next layer 406. For example, Figure 4 shows how the filter 404 is stepped to the right by one unit (the “stride”), creating a slightly offset receptive field from the top one, and mapping its output to the next neuron in the next layer 406. The stride can be, and often is, a number other than one; larger strides reduce the overlap between receptive fields and hence further reduce the size of the next layer 406. Every unique receptive field in the convolution layer 402 that can be defined in this stepwise manner maps to a different neuron in the next layer 406. Thus, if the convolution layer 402 is 32×32×3 neurons per slice, the next layer 406 need only be 28×28×1 neurons to cover all the receptive fields of the convolution layer 402. This is referred to as an activation map or feature map. There is thus a reduction in layer complexity from the filtering. There are 784 different ways that a 5×5 filter can uniquely fit on a 32×32 convolution layer 402, so the next layer 406 need only be 28×28. The depth of the convolution layer 402 is also reduced from 3 to 1 in the next layer 406.

The total number of layers to use in a CNN, the number of convolution layers, the filter sizes, and the stride values at each layer are examples of “hyperparameters” of the CNN.
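The stepwise mapping of Figure 4, and how the filter size and stride hyperparameters determine the activation-map size, can be sketched as follows. This is a minimal single-filter illustration assuming NumPy, no padding, and no bias term; the function and variable names are hypothetical:

```python
import numpy as np

def conv_output_size(input_size, filter_size, stride=1):
    """Spatial size of the activation map produced by sliding a filter."""
    return (input_size - filter_size) // stride + 1

# Figure 4's numbers: a 5x5 filter slid with stride 1 over a 32x32 layer.
n = conv_output_size(32, 5, stride=1)
print(n, n * n)   # 28 unique positions per row/column, 784 in total

# Sliding one filter over a 32x32x3 volume yields a 28x28x1 feature map.
rng = np.random.default_rng(0)
layer = rng.standard_normal((32, 32, 3))
filt = rng.standard_normal((5, 5, 3))  # local in space, full depth

feature_map = np.empty((n, n))
for i in range(n):
    for j in range(n):
        receptive_field = layer[i:i + 5, j:j + 5, :]   # one 5x5x3 patch
        feature_map[i, j] = np.sum(receptive_field * filt)

print(feature_map.shape)  # depth reduced from 3 to 1
```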

Brief Description:

illustrates a pooling layer function 500 in accordance with one embodiment.

Detailed Description:

Figure 5 illustrates a pooling layer function 500 with a 2×2 receptive field and a stride of two. The pooling layer function 500 is an example of the maxpool pooling technique: the outputs of all the neurons in a particular receptive field of the input layer 502 are replaced by the maximum-valued one of those outputs in the pooling layer 504. Other options for pooling layers are average pooling and L2-norm pooling. The reason to use a pooling layer is that once a specific feature is recognized in the original input volume (there will be a high activation value), its exact location is not as important as its location relative to the other features. Pooling layers can drastically reduce the spatial dimensions of the input layer 502 from that point forward in the neural network (the length and width change, but not the depth). This serves two main purposes. The first is that the number of parameters or weights is greatly reduced, thus lessening the computational cost. The second is that it helps control overfitting. Overfitting refers to a model so tuned to the training examples that it is not able to generalize well when applied to live data sets.
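The maxpool operation of Figure 5 can be sketched as follows, assuming NumPy; note that the 2×2 window with stride two halves the length and width while leaving the depth unchanged (the function and variable names are illustrative):

```python
import numpy as np

def maxpool(x, size=2, stride=2):
    """Max pooling over each (size x size) receptive field, per depth slice."""
    h, w, d = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w, d))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size,
                       j * stride:j * stride + size, :]
            out[i, j, :] = window.max(axis=(0, 1))  # keep only the max
    return out

# A tiny 4x4x2 input: pooling halves width and height, depth stays 2.
x = np.arange(32, dtype=float).reshape(4, 4, 2)
pooled = maxpool(x)
print(pooled.shape)
```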


Parts List

100

convolutional neural network

102

input layer

104

convolutional layer

106

output layer

200

convolutional neural network layers

202

input layer region

204

subregion of the input layer region

206

convolutional layer

208

convolutional layer subregion

300

VGG net

302

convolution layer

304

RELU layer

306

convolution layer

308

RELU layer

310

pooling layer

312

convolution layer

314

RELU layer

316

convolution layer

318

RELU layer

320

pooling layer

400

convolution layer filtering

402

convolution layer

404

filter

406

next layer

500

pooling layer function

502

input layer

504

pooling layer


Terms/Definitions