Convolutional neural networks (CNNs) are particularly well suited to classifying features in data sets modelled in two or three dimensions. This makes CNNs popular for image classification, because images can be represented in computer memories in three dimensions (two dimensions for width and height, and a third dimension for pixel features like color components and intensity). For example a color JEG image of size 480 x 480 pixels can be modelled in computer memory using an array that is 480 x 480 x 3, where each of the values of the third dimension is a red, green, or blue color component intensity for the pixel ranging from 0 to 255. Inputting this array of numbers to a trained CNN will generate outputs that describe the probability of the image being a certain class (.80 for cat, .15 for dog, .05 for bird, etc). Image classification is the task of taking an input image and outputting a class (a cat, dog, etc) or a probability of classes that best describes the image.

Fundamentally, CNNs input the data set, pass it through a series of convolutional transformations, nonlinear activation functions (e.g., RELU), and pooling operations (downsampling, e.g., maxpool), and an output layer (e.g., softmax) to generate the classifications.