layers
- class Dense(from_size, to_size, initialiser=<function init_xavier>, name=None)
Bases:
Layer
A dense (fully connected) layer.
- Parameters:
from_size (int) – Input size.
to_size (int) – Output size.
name (str | None) – Optional name for the layer.
- from_size
Input size.
- Type:
int
- to_size
Output size.
- Type:
int
- name
Optional name for the layer.
- Type:
str | None
- from_size: int
- to_gpu(device=0)
Move the layer to GPU.
- Parameters:
device (int) – The GPU device number. Defaults to 0.
- Returns:
The layer itself.
- Return type:
- to_size: int
- update(optimiser)
Update the weights using the given optimiser.
- Parameters:
optimiser (Optimiser) – The optimiser to use for updating weights.
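As an illustration of what a dense layer computes, the following is a minimal pure-Python sketch, not this library's implementation. It assumes the forward pass is a plain matrix product with no bias term and that `init_xavier` follows the usual Xavier/Glorot uniform scheme; `xavier_uniform` and `dense_forward` are hypothetical helper names.

```python
import math
import random


def xavier_uniform(from_size, to_size, seed=0):
    """Hypothetical stand-in for a Xavier/Glorot uniform initialiser."""
    rng = random.Random(seed)
    # Glorot uniform bound: sqrt(6 / (fan_in + fan_out)).
    limit = math.sqrt(6.0 / (from_size + to_size))
    return [[rng.uniform(-limit, limit) for _ in range(to_size)]
            for _ in range(from_size)]


def dense_forward(x, weights):
    """Multiply a batch of row vectors (batch x from_size) by the weights."""
    to_size = len(weights[0])
    return [[sum(row[i] * weights[i][j] for i in range(len(row)))
             for j in range(to_size)]
            for row in x]


# Assumed shapes: weights are from_size x to_size, inputs are batch x from_size.
weights = xavier_uniform(from_size=3, to_size=2)
outputs = dense_forward([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]], weights)
```

Each input row of length `from_size` becomes an output row of length `to_size`; an all-zero input maps to an all-zero output because the sketch has no bias.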
- class Dropout(probability)
Bases:
Layer
A dropout layer for regularization.
- Parameters:
probability (float) – The probability of dropping out a unit.
- probability
The probability of dropping out a unit.
- Type:
float
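To make the role of `probability` concrete, here is a small sketch of dropout on a flat list of activations. It assumes the common "inverted dropout" variant, in which surviving units are scaled by 1 / (1 - probability); whether this library scales the same way is not stated above. The `dropout` function and its `training` and `seed` arguments are hypothetical.

```python
import random


def dropout(values, probability, training=True, seed=None):
    """Inverted dropout over a flat list of activations (illustrative only)."""
    if not 0.0 <= probability < 1.0:
        raise ValueError("probability must be in [0, 1)")
    if not training or probability == 0.0:
        # Dropout is normally disabled outside of training.
        return list(values)
    rng = random.Random(seed)
    keep_scale = 1.0 / (1.0 - probability)
    # Zero each unit with the given probability; scale the survivors so the
    # expected value of every activation is unchanged.
    return [0.0 if rng.random() < probability else v * keep_scale
            for v in values]


activations = [1.0] * 10_000
dropped = dropout(activations, probability=0.25, seed=0)
```

With `probability=0.25`, roughly a quarter of the outputs are zero and the rest are scaled up by 1 / 0.75.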
- class Embedding(from_size, to_size, name=None, initialiser=<function init_xavier>)
Bases:
Layer
An embedding layer that converts indices to dense vectors.
Each input index is looked up in a table of embeddings and replaced by the corresponding dense vector.
- Parameters:
from_size (int)
to_size (int)
name (str | None)
- vocab_size
Size of the vocabulary (number of embeddings).
- Type:
int
- from_gpu()
Moves the embedding weights from GPU to CPU.
- Returns:
The embedding layer with weights moved to CPU.
- Return type:
- to_gpu(device=0)
Moves the embedding weights to the GPU.
- Parameters:
device (int) – The GPU device number. Defaults to 0.
- Returns:
The embedding layer with weights moved to GPU.
- Return type:
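A lookup-based embedding can be sketched in a few lines. The example below is illustrative and not taken from this library; it assumes the weights form a table with one row per vocabulary entry (presumably `from_size` rows of length `to_size`), and `embedding_lookup` is a hypothetical name.

```python
def embedding_lookup(table, indices):
    """Return a copy of one row of `table` for each index in `indices`."""
    return [list(table[i]) for i in indices]


# A toy table: vocabulary of 3 entries, each mapped to a vector of size 2.
table = [
    [0.0, 0.1],
    [1.0, 1.1],
    [2.0, 2.1],
]
vectors = embedding_lookup(table, [2, 0, 2])
```

Repeated indices return the same vector, and no matrix multiplication is involved: the index simply selects a row.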
- class Layer
Bases:
ABC
A generic Layer object, representing a single operation in a neural network.
- abstract forward(tensor)
Perform the forward pass of the layer.
- Parameters:
tensor (Tensor) – Input tensor.
- Raises:
NotImplementedError – If a subclass does not implement this method.
- to_gpu(device=0)
Move the layer to GPU.
- Parameters:
device (int) – The GPU device number. Defaults to 0.
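The abstract-base-class pattern described here can be sketched with Python's standard `abc` module. The `Layer` class below is a simplified stand-in written for this example, not the library's class, and `Scale` is a hypothetical subclass that operates on plain lists instead of `Tensor` objects.

```python
from abc import ABC, abstractmethod


class Layer(ABC):
    """Simplified stand-in for an abstract layer base class."""

    @abstractmethod
    def forward(self, tensor):
        """Subclasses must override this with the layer's computation."""
        raise NotImplementedError


class Scale(Layer):
    """Hypothetical example layer: multiplies every element by a constant."""

    def __init__(self, factor):
        self.factor = factor

    def forward(self, tensor):
        return [value * self.factor for value in tensor]


doubled = Scale(2.0).forward([1.0, 2.0, 3.0])
```

Because `forward` is marked abstract, Python refuses to instantiate `Layer` directly (it raises `TypeError`), so only subclasses that define `forward` can be used in a network.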
- class LayerNorm(embedding_dim, eps=1e-05)
Bases:
Layer
A Layer Normalization layer.
- Parameters:
embedding_dim (int)
- eps
A small value added for numerical stability.
- Type:
float
- to_gpu(device=0)
Move the layer to GPU.
- Parameters:
device (int) – The GPU device number. Defaults to 0.
- Returns:
The layer itself.
- Return type:
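For reference, layer normalization of a single vector can be sketched as below. This is a generic illustration rather than this library's code: it assumes the biased variance, places `eps` inside the square root, and omits any learned gain or bias. `layer_norm` is a hypothetical name.

```python
import math


def layer_norm(vector, eps=1e-5):
    """Normalize a vector to (approximately) zero mean and unit variance."""
    mean = sum(vector) / len(vector)
    variance = sum((v - mean) ** 2 for v in vector) / len(vector)
    # eps keeps the denominator away from zero for constant inputs.
    return [(v - mean) / math.sqrt(variance + eps) for v in vector]


normalised = layer_norm([1.0, 2.0, 3.0, 4.0])
```

The `eps` term matters most when every element is identical: the variance is then zero, and without `eps` the division would fail.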
- class RMSNorm(embedding_dim, REALLY_SMALL_NUMBER=0.0001)
Bases:
Layer
Root Mean Square Layer Normalization.
RMSNorm normalizes its input by dividing it by the root mean square of the input values.
- Parameters:
embedding_dim (int) – The size of the input’s last dimension.
- embedding_dim
The size of the input’s last dimension.
- Type:
int
- REALLY_SMALL_NUMBER
A small constant to avoid division by zero.
- Type:
float
- from_gpu()
Moves the layer’s parameters from GPU to CPU.
- Returns:
The layer with parameters moved to CPU.
- Return type:
- to_gpu(device=0)
Moves the layer’s parameters to the GPU.
- Parameters:
device (int) – The GPU device number. Defaults to 0.
- Returns:
The layer with parameters moved to GPU.
- Return type:
- update(optimiser)
Updates the layer’s parameters using the given optimiser.
- Parameters:
optimiser (Optimiser) – The optimiser to use for updating parameters.
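The core RMSNorm computation can be sketched as follows. This is an illustration under assumptions, not the library's implementation: the small constant is added inside the square root, and the learned parameters that `update` would adjust are omitted. `rms_norm` and its `really_small_number` argument are hypothetical names mirroring `REALLY_SMALL_NUMBER`.

```python
import math


def rms_norm(vector, really_small_number=1e-4):
    """Divide a vector by its root mean square (illustrative only)."""
    mean_square = sum(v * v for v in vector) / len(vector)
    # The small constant avoids division by zero for an all-zero input.
    rms = math.sqrt(mean_square + really_small_number)
    return [v / rms for v in vector]


scaled = rms_norm([3.0, 4.0])
```

Unlike layer normalization, this does not subtract the mean; it only rescales, so the output keeps the direction of the input vector.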
- class RotaryEncode(embedding_dim, n_heads, context_window, theta=None)
Bases:
Layer
Applies rotary positional encoding to a key and query.
This layer implements the Rotary Position Embedding (RoPE) technique for transformer models.
- Parameters:
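embedding_dim (int) – The size of the embedding dimension.
n_heads (int) – The number of attention heads.
context_window (int) – The size of the context window.
theta (float) – The base value for frequency calculation.
- embedding_dim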
The size of the embedding dimension.
- Type:
int
- n_heads
The number of attention heads.
- Type:
int
- context_window
The size of the context window.
- Type:
int
- theta
The base value for frequency calculation.
- Type:
float
- head_size
The size of each attention head.
- Type:
int
- freqs_cos
Precomputed cosine of frequencies.
- Type:
ArrayLike
- freqs_sin
Precomputed sine of frequencies.
- Type:
ArrayLike
- context_window: int
- embedding_dim: int
- n_heads: int
- precompute_constants()
Precomputes the cosine and sine of frequencies for rotary encoding.
- Returns:
Precomputed cosine and sine values.
- Return type:
tuple[ArrayLike, ArrayLike]
- theta: float = 10000.0
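The precomputation can be illustrated with the standard RoPE frequency formula. The sketch below is not the library's code. It assumes `head_size = embedding_dim // n_heads`, one frequency per pair of dimensions given by theta^(-2i / head_size), and nested lists in place of `ArrayLike`; `precompute_rope_constants` and `rotate_pair` are hypothetical names.

```python
import math


def precompute_rope_constants(embedding_dim, n_heads, context_window,
                              theta=10000.0):
    """Return (freqs_cos, freqs_sin), each context_window x (head_size // 2)."""
    head_size = embedding_dim // n_heads
    # One rotation frequency per pair of dimensions within a head.
    freqs = [1.0 / theta ** (2 * i / head_size) for i in range(head_size // 2)]
    freqs_cos = [[math.cos(pos * f) for f in freqs]
                 for pos in range(context_window)]
    freqs_sin = [[math.sin(pos * f) for f in freqs]
                 for pos in range(context_window)]
    return freqs_cos, freqs_sin


def rotate_pair(x_even, x_odd, cos, sin):
    """Rotate one (even, odd) pair of query or key values by a position angle."""
    return x_even * cos - x_odd * sin, x_even * sin + x_odd * cos


freqs_cos, freqs_sin = precompute_rope_constants(
    embedding_dim=8, n_heads=2, context_window=4
)
```

Position 0 has angle zero for every frequency, so its values pass through unrotated. Because each pair is transformed by a pure rotation, the norm of the query or key vector is preserved at every position.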
- class Sequential(*layers)
Bases:
Layer
A sequential container of layers.
It chains layers so that the output of each layer is fed as input to the next.
- Parameters:
layers (Sequence[Layer]) – The layers to chain, in order.
- layers
A tuple of Layer objects in the sequential chain.
- Type:
tuple
- to_gpu(device=0)
Moves all layers to the GPU.
- Parameters:
device (int) – The GPU device number. Defaults to 0.
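The chaining behaviour of a sequential container can be sketched as follows. The `Sequential` class here is a simplified stand-in written for this example, not the library's class, and `AddOne` and `Double` are hypothetical layers that operate on plain lists.

```python
class AddOne:
    """Hypothetical layer: adds 1 to every element."""

    def forward(self, tensor):
        return [value + 1.0 for value in tensor]


class Double:
    """Hypothetical layer: multiplies every element by 2."""

    def forward(self, tensor):
        return [value * 2.0 for value in tensor]


class Sequential:
    """Simplified stand-in: applies its layers one after another."""

    def __init__(self, *layers):
        # *layers collects the positional arguments into a tuple.
        self.layers = layers

    def forward(self, tensor):
        for layer in self.layers:
            tensor = layer.forward(tensor)
        return tensor


model = Sequential(AddOne(), Double())
result = model.forward([1.0, 2.0])
```

Order matters: `Sequential(AddOne(), Double())` computes `(x + 1) * 2`, whereas swapping the two layers computes `x * 2 + 1`.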