activation¶
- class GLU(size, initialiser=init_xavier, *args, **kwargs)[source]¶
Bases:
Layer
Gated Linear Unit (GLU) activation function.
This layer applies the GLU function to the input tensor: the input is split into two halves, x_left and x_right, and GLU(x) = x_left * sigmoid(x_right)
- Parameters:
size (int) – Size of the input tensor.
initialiser (callable) – Function to initialize the weights. Defaults to init_xavier.
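A minimal NumPy sketch of the formula above (illustration only; the actual layer also holds learnable weights set up by initialiser, which this sketch omits):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glu(x):
    # Split the last axis into two equal halves and gate the first with the
    # sigmoid of the second: GLU(x) = x_left * sigmoid(x_right)
    x_left, x_right = np.split(x, 2, axis=-1)
    return x_left * sigmoid(x_right)

out = glu(np.random.randn(4, 8))  # input shape (4, 8) -> output shape (4, 4)
```

Note that the output has half the width of the (post-projection) input, since one half acts purely as the gate.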
- class GeLU(*args, approximate=False, **kwargs)[source]¶
Bases:
Layer
Gaussian Error Linear Unit (GELU) activation function.
This layer applies the GELU function element-wise to the input tensor. GELU(x) = x * Φ(x), where Φ is the standard normal CDF. When approximate=True, the tanh approximation GELU(x) ≈ 0.5x * (1 + tanh(sqrt(2/π) * (x + 0.044715 * x^3))) is used.
- Parameters:
approximate (bool) – Whether to use the approximate version of GELU. Defaults to False.
- CONST_1 = 0.7978845608028654¶
sqrt(2/π), the leading constant in the tanh approximation.
- CONST_2 = 0.044715¶
Coefficient of the cubic term in the tanh approximation.
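A minimal NumPy sketch of both branches (illustration only, not the layer's implementation); CONST_1 and CONST_2 below mirror the class constants:

```python
import math
import numpy as np

CONST_1 = 0.7978845608028654  # sqrt(2 / pi)
CONST_2 = 0.044715            # coefficient of the cubic term

def gelu(x, approximate=False):
    x = np.asarray(x, dtype=float)
    if approximate:
        # tanh approximation of GELU
        return 0.5 * x * (1.0 + np.tanh(CONST_1 * (x + CONST_2 * x**3)))
    # exact form: x * Phi(x), with Phi the standard normal CDF
    return x * 0.5 * (1.0 + np.vectorize(math.erf)(x / math.sqrt(2.0)))

x = np.linspace(-3.0, 3.0, 7)
print(np.max(np.abs(gelu(x) - gelu(x, approximate=True))))  # small approximation error
```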
- class ReLU[source]¶
Bases:
Layer
Rectified Linear Unit (ReLU) activation function.
This layer applies the ReLU function element-wise to the input tensor. ReLU(x) = max(0, x)
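For reference, the element-wise operation is simply (NumPy sketch, not the layer itself):

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5])))  # [0.  0.  0.  1.5]
```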
- class Swish[source]¶
Bases:
Layer
Swish activation function.
This layer applies the Swish function element-wise to the input tensor. Swish(x) = x * sigmoid(x)
Note: This implementation is equivalent to the SiLU activation function, since the scaling parameter β of Swish (Swish(x) = x * sigmoid(β * x)) is fixed at 1 here.
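A minimal NumPy sketch of the formula above (illustration only, not the layer's implementation):

```python
import numpy as np

def swish(x):
    # Swish(x) = x * sigmoid(x); with beta fixed at 1 this coincides with SiLU
    return x / (1.0 + np.exp(-x))

print(swish(np.array([-2.0, 0.0, 2.0])))  # ≈ [-0.238  0.     1.762]
```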