activation

class GLU(size, initialiser=init_xavier, *args, **kwargs)[source]

Bases: Layer

Gated Linear Unit (GLU) activation function.

This layer applies the GLU function to the input tensor. GLU(x) = x_left * sigmoid(x_right)

Parameters:
  • size (int) – Size of the input tensor.

  • initialiser (callable) – Function to initialize the weights. Defaults to init_xavier.

forward(x)[source]

Apply the GLU function to the input tensor.

Parameters:

x (Tensor) – Input tensor.

Returns:

Output tensor after applying GLU.

Return type:

Tensor

from_gpu()[source]

Move the layer parameters from GPU to CPU memory.

linear: Dense

to_gpu()[source]

Move the layer parameters to GPU memory.

update(optimiser)[source]

Update the layer parameters using the given optimizer.

Parameters:

optimiser (Optimiser) – The optimizer to use for updating parameters.

zero_grad()[source]

Reset the gradients of the layer parameters to zero.
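
Example (a minimal NumPy sketch of the gating rule only; the split-in-half convention is an assumption, and the layer's own Dense sub-layer linear is not modelled here)::

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def glu_reference(x):
        # GLU(x) = x_left * sigmoid(x_right), with the two halves taken
        # from splitting the last axis of the input in two.
        x_left, x_right = np.split(x, 2, axis=-1)
        return x_left * sigmoid(x_right)

    x = np.array([[1.0, -2.0, 0.5, 3.0]])  # last axis must have even length
    print(glu_reference(x))                # shape (1, 2)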

class GeLU(*args, approximate=False, **kwargs)[source]

Bases: Layer

Gaussian Error Linear Unit (GELU) activation function.

This layer applies the GELU function element-wise to the input tensor. GELU(x) ≈ 0.5x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x^3)))

Parameters:

approximate (bool) – Whether to use the approximate version of GELU. Defaults to False.

CONST_1 = 0.7978845608028654

CONST_2 = 0.044715

backward(grad)[source]

Compute the gradient of the GELU function.

Parameters:

grad (Tensor) – Upstream gradient.

Returns:

Gradient with respect to the input.

Return type:

Tensor

forward(tensor)[source]

Apply the GELU function to the input tensor.

Parameters:

tensor (Tensor) – Input tensor.

Returns:

Output tensor after applying GELU.

Return type:

Tensor
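
Example (a standalone NumPy check of the tanh approximation above, not the layer itself; CONST_1 is sqrt(2/π) and CONST_2 is the cubic coefficient 0.044715)::

    import numpy as np
    from math import erf

    CONST_1 = np.sqrt(2.0 / np.pi)  # 0.7978845608028654, matches CONST_1 above
    CONST_2 = 0.044715              # matches CONST_2 above

    def gelu_tanh(x):
        # Approximate GELU: 0.5 * x * (1 + tanh(CONST_1 * (x + CONST_2 * x^3)))
        return 0.5 * x * (1.0 + np.tanh(CONST_1 * (x + CONST_2 * x ** 3)))

    def gelu_exact(x):
        # Exact GELU: x * Phi(x), with Phi the standard normal CDF (via erf)
        return x * 0.5 * (1.0 + np.vectorize(erf)(x / np.sqrt(2.0)))

    x = np.linspace(-3.0, 3.0, 7)
    print(np.max(np.abs(gelu_tanh(x) - gelu_exact(x))))  # below 1e-3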

class ReLU[source]

Bases: Layer

Rectified Linear Unit (ReLU) activation function.

This layer applies the ReLU function element-wise to the input tensor. ReLU(x) = max(0, x)

forward(x)[source]

Apply the ReLU function to the input tensor.

Parameters:

x (Tensor) – Input tensor.

Returns:

Output tensor after applying ReLU.

Return type:

Tensor
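
Example (a standalone NumPy illustration of the element-wise rule, not the layer itself)::

    import numpy as np

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(np.maximum(0.0, x))  # ReLU(x) = max(0, x) -> [0.  0.  0.  1.5 3. ]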

class Swish[source]

Bases: Layer

Swish activation function.

This layer applies the Swish function element-wise to the input tensor. Swish(x) = x * sigmoid(x)

Note: This implementation omits the learnable scaling parameter β (equivalently, it fixes β = 1 in Swish(x) = x * sigmoid(β * x)), which makes it equivalent to the SiLU activation function.

backward(grad)[source]

Compute the gradient of the Swish function.

Parameters:

grad (Tensor) – Upstream gradient.

Returns:

Gradient with respect to the input.

Return type:

Tensor

forward(tensor)[source]

Apply the Swish function to the input tensor.

Parameters:

tensor (Tensor) – Input tensor.

Returns:

Output tensor after applying Swish.

Return type:

Tensor
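
Example (a self-contained NumPy sketch of the forward rule and of the analytic derivative a backward pass would use, obtained from the product rule; an illustration, not the layer's internals)::

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def swish_forward(x):
        # Swish/SiLU forward: x * sigmoid(x)
        return x * sigmoid(x)

    def swish_backward(x, grad):
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        s = sigmoid(x)
        return grad * s * (1.0 + x * (1.0 - s))

    # Finite-difference check of the analytic derivative at a few points
    x = np.array([-1.0, 0.0, 2.0])
    eps = 1e-6
    numeric = (swish_forward(x + eps) - swish_forward(x - eps)) / (2 * eps)
    print(np.allclose(numeric, swish_backward(x, np.ones_like(x)), atol=1e-6))  # True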