models

GPT model implementation using the Tricycle framework.

This module defines the GPT class, which implements a GPT-style transformer model using components from the Tricycle framework.

class GPT(config)

Bases: Layer

Generative Pre-trained Transformer (GPT) model implementation.

This class implements a GPT-style transformer model using components from the Tricycle framework. It includes token and position embeddings, an input dropout layer, a stack of transformer blocks, a final normalization layer, and a dense output head.

Parameters:

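config (GPTConfig) – Configuration object that sets the model's hyperparameters, such as the embedding dimension and context window.

Attributes:

embedding_dim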

Dimension of the embedding space.

Type:

int

context_window

Size of the context window for position embeddings.

Type:

int

token_embedding

Embedding layer for input tokens.

Type:

Embedding

position_embedding

Embedding layer for positional information.

Type:

Embedding

input_dropout

Dropout layer applied to the input embeddings.

Type:

Dropout

blocks

List of GPT2TransformerBlock instances.

Type:

list

head

Final dense layer that projects each position's embedding to the output logits.

Type:

Dense

norm

Final normalization layer, applied after the transformer blocks and before the head.

Type:

LayerNorm or RMSNorm

layers

List of all layers in the model.

Type:

list
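The attributes above compose into a single forward pipeline. The following plain-Python sketch shows that flow under stated assumptions: it is not Tricycle's implementation, `toy_forward`, `identity_block`, and the weight tables are hypothetical names, the vocabulary size and number of blocks are arbitrary, the blocks are identity placeholders, a LayerNorm without learned scale or shift stands in for `norm`, and it assumes the usual GPT-2 ordering (token and position embeddings summed, dropout, blocks, final norm, head). For brevity it takes token ids directly rather than one-hot rows.

```python
import math
import random

# Toy sizes chosen for illustration; VOCAB_SIZE and the number of blocks are assumptions.
VOCAB_SIZE = 5
EMBEDDING_DIM = 4
CONTEXT_WINDOW = 3

random.seed(0)


def rand_matrix(rows, cols):
    """Small random weights, standing in for learned parameters."""
    return [[random.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


token_table = rand_matrix(VOCAB_SIZE, EMBEDDING_DIM)         # plays the role of token_embedding
position_table = rand_matrix(CONTEXT_WINDOW, EMBEDDING_DIM)  # plays the role of position_embedding
head_weights = rand_matrix(EMBEDDING_DIM, VOCAB_SIZE)        # plays the role of head


def layer_norm(vec, eps=1e-5):
    """Normalise one vector to zero mean and unit variance (no learned scale/shift)."""
    mean = sum(vec) / len(vec)
    var = sum((x - mean) ** 2 for x in vec) / len(vec)
    return [(x - mean) / math.sqrt(var + eps) for x in vec]


def dense(vec, weights):
    """Multiply an (in,) vector by an (in, out) matrix."""
    return [sum(v * weights[i][j] for i, v in enumerate(vec)) for j in range(len(weights[0]))]


def identity_block(sequence):
    """Placeholder for a transformer block; a real block attends across positions."""
    return sequence


def toy_forward(token_ids, blocks=(identity_block, identity_block)):
    # Same precondition as the documented forward(): one token per context position.
    assert len(token_ids) == CONTEXT_WINDOW
    # Token embedding plus position embedding, position by position.
    hidden = [
        [t + p for t, p in zip(token_table[tok], position_table[pos])]
        for pos, tok in enumerate(token_ids)
    ]
    # Dropout would be applied here during training; it is omitted in this inference sketch.
    for block in blocks:
        hidden = block(hidden)
    # Final normalisation, then the output projection, for every position.
    return [dense(layer_norm(h), head_weights) for h in hidden]


logits = toy_forward([1, 4, 2])
print(len(logits), len(logits[0]))  # 3 5
```

The output has one row of logits per context position, each as wide as the (assumed) vocabulary.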

display()

Prints a string representation of the model.

forward(tensor)

Performs a forward pass through the GPT model.

Parameters:

tensor (Tensor) – Input tensor, expected to be one-hot encoded.

Returns:

Output tensor after passing through the model.

Return type:

Tensor

Raises:

AssertionError – If the number of tokens in the input tensor does not equal the model's context window.
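The input requirements above can be illustrated with plain lists. This is a hedged sketch: `one_hot`, `validate_input`, and `embed` are hypothetical helpers, not Tricycle functions, and the sizes are arbitrary. It builds one-hot rows, performs a length check analogous to the documented `AssertionError`, and shows why one-hot input pairs naturally with an embedding layer: multiplying a one-hot row by an embedding table selects exactly one row of that table.

```python
# Hypothetical sizes and helpers for illustration; none of these names come from Tricycle.
VOCAB_SIZE = 4
CONTEXT_WINDOW = 3


def one_hot(token_ids, vocab_size):
    """Encode each token id as a row with a single 1 at that id's index."""
    return [[1 if i == tok else 0 for i in range(vocab_size)] for tok in token_ids]


def validate_input(rows, context_window):
    """Check the conditions the forward pass is documented to expect."""
    # One row per position, analogous to the documented context-window AssertionError.
    assert len(rows) == context_window, f"expected {context_window} tokens, got {len(rows)}"
    for row in rows:
        assert sum(row) == 1 and all(v in (0, 1) for v in row), "row is not one-hot"
    return rows


def embed(row, table):
    """A one-hot row times an embedding table selects exactly one row of the table."""
    return [sum(row[i] * table[i][j] for i in range(len(row))) for j in range(len(table[0]))]


table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]  # toy (VOCAB_SIZE, 2) embedding table
rows = validate_input(one_hot([2, 0, 3], VOCAB_SIZE), CONTEXT_WINDOW)
print([embed(row, table) for row in rows])  # [[0.5, 0.6], [0.1, 0.2], [0.7, 0.8]]

# A sequence shorter than the context window is rejected.
try:
    validate_input(one_hot([1, 2], VOCAB_SIZE), CONTEXT_WINDOW)
    error_message = None
except AssertionError as err:
    error_message = str(err)
print(error_message)  # expected 3 tokens, got 2
```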

from_gpu()

Moves all layers of the model from GPU back to CPU.

Returns:

The current GPT instance.

Return type:

GPT

to_gpu(device=0)

Moves all layers of the model to the specified GPU device.

Parameters:

device (int, optional) – The GPU device number. Defaults to 0.

Returns:

The current GPT instance.

Return type:

GPT
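Because `to_gpu` and `from_gpu` both return the model itself, device moves can be chained onto construction. The sketch below illustrates the delegation pattern these descriptions suggest (move every entry in `layers`, then return `self`). `ToyLayer` and `ToyModel` are hypothetical stand-ins that only record a device label; they do not touch a real GPU and do not use Tricycle's API.

```python
class ToyLayer:
    """Stand-in for a layer; it only records which device label it is on."""

    def __init__(self, name):
        self.name = name
        self.device = "cpu"

    def to_gpu(self, device=0):
        self.device = f"gpu:{device}"
        return self

    def from_gpu(self):
        self.device = "cpu"
        return self


class ToyModel:
    """Hypothetical model that forwards device moves to every entry in `layers`."""

    def __init__(self, n_blocks=2):
        self.layers = [ToyLayer("token_embedding"), ToyLayer("position_embedding"), ToyLayer("input_dropout")]
        self.layers += [ToyLayer(f"block_{i}") for i in range(n_blocks)]
        self.layers += [ToyLayer("norm"), ToyLayer("head")]

    def to_gpu(self, device=0):
        for layer in self.layers:
            layer.to_gpu(device)
        return self  # returning self is what makes chaining possible

    def from_gpu(self):
        for layer in self.layers:
            layer.from_gpu()
        return self


model = ToyModel().to_gpu(1)  # construct and move in one expression
print({layer.device for layer in model.layers})  # {'gpu:1'}
model.from_gpu()
print({layer.device for layer in model.layers})  # {'cpu'}
```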

update(optimiser)

Updates all layers in the model using the provided optimiser.

Parameters:

optimiser (Optimiser) – The optimiser to use for updating model parameters.

Returns:

The current GPT instance.

Return type:

GPT

zero_grad()

Zeroes out the gradients of all layers in the model.

Returns:

The current GPT instance.

Return type:

GPT
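Since `update` and `zero_grad` both return the model, a training step can chain them. A minimal sketch, assuming gradients accumulate across backward passes (which is why they are zeroed after each update): `ToyParam`, `ToySGD`, and `ToyModel` are hypothetical, the "backward pass" is the hand-written gradient of loss = w², and the optimiser is plain gradient descent rather than any particular Tricycle optimiser.

```python
class ToyParam:
    """A single scalar weight with an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Hypothetical optimiser: one plain gradient-descent step per call."""

    def __init__(self, lr):
        self.lr = lr

    def __call__(self, param):
        param.value -= self.lr * param.grad


class ToyModel:
    """Hypothetical model exposing chainable update / zero_grad methods."""

    def __init__(self):
        self.params = [ToyParam(3.0)]

    def update(self, optimiser):
        for param in self.params:
            optimiser(param)
        return self

    def zero_grad(self):
        for param in self.params:
            param.grad = 0.0
        return self


# Minimise loss = w**2, whose gradient is 2 * w.
model = ToyModel()
optimiser = ToySGD(lr=0.25)
for _ in range(3):
    w = model.params[0]
    w.grad += 2 * w.value  # stand-in for a backward pass; gradients accumulate
    model.update(optimiser).zero_grad()
print(model.params[0].value)  # 0.375
```

Each step halves w (w ← w − 0.25 · 2w), giving 3.0 → 1.5 → 0.75 → 0.375. Without the `zero_grad()` call, the previous step's gradient would be added into the next one and the updates would overshoot.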