activation

class GLU(size, initialiser=init_xavier, *args, **kwargs)[source]

Bases: Layer

Gated Linear Unit (GLU) activation function.

This layer applies the GLU function to the input tensor. GLU(x) = x_left * sigmoid(x_right)

Parameters:
  • size (int) – Size of the input tensor.

  • initialiser (callable) – Function to initialize the weights. Defaults to init_xavier.

forward(x)[source]

Apply the GLU function to the input tensor.

Parameters:

x (Tensor) – Input tensor.

Returns:

Output tensor after applying GLU.

Return type:

Tensor

from_gpu()[source]

Move the layer parameters from GPU to CPU memory.

linear: Dense

to_gpu()[source]

Move the layer parameters to GPU memory.

update(optimiser)[source]

Update the layer parameters using the given optimizer.

Parameters:

optimiser (Optimiser) – The optimizer to use for updating parameters.

zero_grad()[source]

Reset the gradients of the layer parameters to zero.
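
Example (a minimal NumPy sketch of the gating rule only; the split-in-half convention is an assumption, and the layer's own Dense sub-layer linear is not modelled here)::

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def glu_reference(x):
        # GLU(x) = x_left * sigmoid(x_right), with the two halves taken
        # from splitting the last axis of the input in two.
        x_left, x_right = np.split(x, 2, axis=-1)
        return x_left * sigmoid(x_right)

    x = np.array([[1.0, -2.0, 0.5, 3.0]])  # last axis must have even length
    print(glu_reference(x))                # shape (1, 2)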

class GeLU(*args, approximate=False, **kwargs)[source]

Bases: Layer

Gaussian Error Linear Unit (GELU) activation function.

This layer applies the GELU function element-wise to the input tensor. GELU(x) ≈ 0.5x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x^3)))

Parameters:

approximate (bool) – Whether to use the approximate version of GELU. Defaults to False.

CONST_1 = 0.7978845608028654

CONST_2 = 0.044715

backward(grad)[source]

Compute the gradient of the GELU function.

Parameters:

grad (Tensor) – Upstream gradient.

Returns:

Gradient with respect to the input.

Return type:

Tensor

forward(tensor)[source]

Apply the GELU function to the input tensor.

Parameters:

tensor (Tensor) – Input tensor.

Returns:

Output tensor after applying GELU.

Return type:

Tensor
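
Example (a standalone NumPy check of the tanh approximation above, not the layer itself; CONST_1 is sqrt(2/π) and CONST_2 is the cubic coefficient 0.044715)::

    import numpy as np
    from math import erf

    CONST_1 = np.sqrt(2.0 / np.pi)  # 0.7978845608028654, matches CONST_1 above
    CONST_2 = 0.044715              # matches CONST_2 above

    def gelu_tanh(x):
        # Approximate GELU: 0.5 * x * (1 + tanh(CONST_1 * (x + CONST_2 * x^3)))
        return 0.5 * x * (1.0 + np.tanh(CONST_1 * (x + CONST_2 * x ** 3)))

    def gelu_exact(x):
        # Exact GELU: x * Phi(x), with Phi the standard normal CDF (via erf)
        return x * 0.5 * (1.0 + np.vectorize(erf)(x / np.sqrt(2.0)))

    x = np.linspace(-3.0, 3.0, 7)
    print(np.max(np.abs(gelu_tanh(x) - gelu_exact(x))))  # below 1e-3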

class ReLU[source]

Bases: Layer

Rectified Linear Unit (ReLU) activation function.

This layer applies the ReLU function element-wise to the input tensor. ReLU(x) = max(0, x)

forward(x)[source]

Apply the ReLU function to the input tensor.

Parameters:

x (Tensor) – Input tensor.

Returns:

Output tensor after applying ReLU.

Return type:

Tensor
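
Example (a standalone NumPy illustration of the element-wise rule, not the layer itself)::

    import numpy as np

    x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
    print(np.maximum(0.0, x))  # ReLU(x) = max(0, x) -> [0.  0.  0.  1.5 3. ]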

class Swish[source]

Bases: Layer

Swish activation function.

This layer applies the Swish function element-wise to the input tensor. Swish(x) = x * sigmoid(x)

Note: This implementation omits the learnable scaling parameter β (equivalently, it fixes β = 1 in Swish(x) = x * sigmoid(β * x)), which makes it equivalent to the SiLU activation function.

backward(grad)[source]

Compute the gradient of the Swish function.

Parameters:

grad (Tensor) – Upstream gradient.

Returns:

Gradient with respect to the input.

Return type:

Tensor

forward(tensor)[source]

Apply the Swish function to the input tensor.

Parameters:

tensor (Tensor) – Input tensor.

Returns:

Output tensor after applying Swish.

Return type:

Tensor
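
Example (a self-contained NumPy sketch of the forward rule and of the analytic derivative a backward pass would use, obtained from the product rule; an illustration, not the layer's internals)::

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def swish_forward(x):
        # Swish/SiLU forward: x * sigmoid(x)
        return x * sigmoid(x)

    def swish_backward(x, grad):
        # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
        s = sigmoid(x)
        return grad * s * (1.0 + x * (1.0 - s))

    # Finite-difference check of the analytic derivative at a few points
    x = np.array([-1.0, 0.0, 2.0])
    eps = 1e-6
    numeric = (swish_forward(x + eps) - swish_forward(x - eps)) / (2 * eps)
    print(np.allclose(numeric, swish_backward(x, np.ones_like(x)), atol=1e-6))  # True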