Variables and Derivatives¶
In the example code of this tutorial, we assume for simplicity that the following symbols are already imported.
import numpy as np
import chainer
from chainer.backends import cuda
from chainer import Function, gradient_check, report, training, utils, Variable
from chainer import datasets, iterators, optimizers, serializers
from chainer import Link, Chain, ChainList
import chainer.functions as F
import chainer.links as L
from chainer.training import extensions
As described previously, Chainer uses the “Define-by-Run” scheme, so forward computation itself defines the network.
In order to start forward computation, we have to set the input array to a Variable
object.
Here we start with a simple ndarray
with only one element:
>>> x_data = np.array([5], dtype=np.float32)
>>> x = Variable(x_data)
A Variable object has basic arithmetic operators. In order to compute \(y = x^2 - 2x + 1\), just write:
>>> y = x**2 - 2 * x + 1
The resulting y
is also a Variable object, whose value can be extracted by accessing the data
attribute:
>>> y.data
array([16.], dtype=float32)
What y
holds is not only the result value.
It also holds the history of computation (or computational graph), which enables us to compute its derivative.
This is done by calling its backward()
method:
>>> y.backward()
This runs error backpropagation (a.k.a. backprop or reverse-mode automatic differentiation).
Then, the gradient is computed and stored in the grad
attribute of the input variable x
:
>>> x.grad
array([8.], dtype=float32)
Also we can compute gradients of intermediate variables.
Note that Chainer, by default, releases the gradient arrays of intermediate variables for memory efficiency.
In order to preserve gradient information, pass the retain_grad
argument to the backward method:
>>> z = 2*x
>>> y = x**2 - z + 1
>>> y.backward(retain_grad=True)
>>> z.grad
array([-1.], dtype=float32)
All these computations are can be generalized to a multi-element array input.
While single-element arrays are automatically initialized to [1]
, to start backward computation from a variable holding a multi-element array, we must set the initial error manually.
This is done simply by setting the grad
attribute of the output variable:
>>> x = Variable(np.array([[1, 2, 3], [4, 5, 6]], dtype=np.float32))
>>> y = x**2 - 2*x + 1
>>> y.grad = np.ones((2, 3), dtype=np.float32)
>>> y.backward()
>>> x.grad
array([[ 0., 2., 4.],
[ 6., 8., 10.]], dtype=float32)
Note
Many functions taking Variable
object(s) are defined in the functions
module.
You can combine them to realize complicated functions with automatic backward computation.