models

GPT model implementation using the Tricycle framework.

This module defines the GPT class, a GPT-style transformer model built from Tricycle components.

class GPT(config)

Bases: Layer

Generative Pre-trained Transformer (GPT) model implementation.

The model combines token and position embeddings, input dropout, a stack of transformer blocks, a normalization layer, and a final dense output layer.

Parameters: config (GPTConfig) – Model configuration.

embedding_dim

Dimension of the embedding space.

Type: int

context_window

Size of the context window for position embeddings.

Type: int

token_embedding

Embedding layer for input tokens.

Type: Embedding

position_embedding

Embedding layer for positional information.

Type: Embedding

input_dropout

Dropout layer applied to the input embeddings.

Type: Dropout

blocks

List of GPT2TransformerBlock instances.

Type: list

head

Final dense layer for output.

Type: Dense

norm

Normalization layer.

Type: LayerNorm or RMSNorm

layers

List of all layers in the model.

Type: list
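
To make the relationship between these attributes concrete, the following is a minimal pure-Python sketch, not Tricycle's implementation. `ToyGPT`, `make_table`, `project`, and `identity` are hypothetical names introduced for illustration. The order in which the pieces are applied (embeddings summed, then dropout, blocks, norm, head) is the conventional GPT-2 layout and is assumed here rather than taken from the source. For brevity the toy takes integer token ids, and dropout, the transformer blocks, and normalization are identity placeholders.

```python
import random


def make_table(rows, cols, rng):
    """Build a rows x cols table of small random floats."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def project(x, weights):
    """Multiply x (n x d_in) by weights (d_in x d_out)."""
    columns = list(zip(*weights))
    return [[sum(a * b for a, b in zip(row, col)) for col in columns] for row in x]


def identity(x):
    """Placeholder for dropout, transformer blocks, and normalization."""
    return x


class ToyGPT:
    """Hypothetical stand-in that mirrors the attribute names listed above."""

    def __init__(self, vocab_size, context_window, embedding_dim, n_blocks=2, seed=0):
        rng = random.Random(seed)
        self.embedding_dim = embedding_dim
        self.context_window = context_window
        self.token_embedding = make_table(vocab_size, embedding_dim, rng)
        self.position_embedding = make_table(context_window, embedding_dim, rng)
        self.input_dropout = identity
        self.blocks = [identity for _ in range(n_blocks)]
        self.norm = identity
        self.head = make_table(embedding_dim, vocab_size, rng)

    def forward(self, token_ids):
        assert len(token_ids) == self.context_window
        # Assumed order: sum the token and position embeddings per position.
        x = [
            [t + p for t, p in zip(self.token_embedding[tok], self.position_embedding[pos])]
            for pos, tok in enumerate(token_ids)
        ]
        x = self.input_dropout(x)
        for block in self.blocks:
            x = block(x)
        x = self.norm(x)
        # The head maps each position back to vocabulary-sized scores.
        return project(x, self.head)


model = ToyGPT(vocab_size=10, context_window=4, embedding_dim=8)
logits = model.forward([1, 2, 3, 4])
print(len(logits), len(logits[0]))
```

The result has one row of scores per input position, and each row has one entry per vocabulary item.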

display()

Prints a string representation of the model.

forward(tensor)

Performs a forward pass through the GPT model.

Parameters: tensor (Tensor) – Input tensor, expected to be one-hot encoded.
Returns: Output tensor after passing through the model.
Return type: Tensor
Raises: AssertionError – If the input tensor doesn’t match the expected context window size.
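
As an illustration of the input described above, here is a hedged sketch of one-hot encoding a token sequence and checking its length against the context window. `one_hot` and `check_context_window` are hypothetical helpers written for this example, not Tricycle functions, and treating the first axis as the token axis is an assumption.

```python
def one_hot(token_ids, vocab_size):
    """Return one row per token, with a single 1.0 at the token's index."""
    encoded = []
    for token in token_ids:
        if not 0 <= token < vocab_size:
            raise ValueError(f"token {token} is outside the vocabulary")
        row = [0.0] * vocab_size
        row[token] = 1.0
        encoded.append(row)
    return encoded


def check_context_window(encoded, context_window):
    """Mimic the documented AssertionError on a length mismatch."""
    n_tokens = len(encoded)
    assert n_tokens == context_window, (
        f"Expected {context_window} tokens, got {n_tokens}"
    )
    return encoded


encoded = one_hot([2, 0, 1], vocab_size=4)
check_context_window(encoded, context_window=3)  # lengths agree, so this passes

try:
    check_context_window(encoded, context_window=8)
except AssertionError as err:
    print(err)
```

In practice this means shorter sequences have to be padded or otherwise brought to exactly `context_window` tokens before calling `forward`.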

from_gpu()

Moves all layers of the model from GPU back to CPU.

Returns: The current GPT instance.
Return type: GPT

to_gpu(device=0)

Moves all layers of the model to the specified GPU device.

Parameters: device (int, optional) – The GPU device number. Defaults to 0.
Returns: The current GPT instance.
Return type: GPT
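
The delegation pattern behind `to_gpu` and `from_gpu` can be sketched without any GPU at all. The classes below are hypothetical stand-ins (`ToyLayer`, `ToyModel`) that only record a device label. They are meant to show how a container might forward the call to each layer and return itself, and they do not reproduce how Tricycle actually transfers memory.

```python
class ToyLayer:
    """Hypothetical layer that only tracks where its data would live."""

    def __init__(self, name):
        self.name = name
        self.device = "cpu"

    def to_gpu(self, device=0):
        self.device = f"gpu:{device}"
        return self

    def from_gpu(self):
        self.device = "cpu"
        return self


class ToyModel:
    """Hypothetical container that forwards device moves to every layer."""

    def __init__(self, layers):
        self.layers = layers

    def to_gpu(self, device=0):
        for layer in self.layers:
            layer.to_gpu(device)
        return self  # returning self allows call chaining

    def from_gpu(self):
        for layer in self.layers:
            layer.from_gpu()
        return self


model = ToyModel([ToyLayer("token_embedding"), ToyLayer("head")])
same = model.to_gpu(1)
devices_on_gpu = [layer.device for layer in model.layers]
model.from_gpu()
devices_on_cpu = [layer.device for layer in model.layers]
```

Because both methods return the instance, a call such as `model.to_gpu(1)` can be used directly in an assignment or chained with further calls.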

update(optimiser)

Updates all layers in the model using the provided optimiser.

Parameters: optimiser (Optimiser) – The optimiser to use for updating model parameters.
Returns: The current GPT instance.
Return type: GPT

zero_grad()

Zeroes out the gradients of all layers in the model.

Returns: The current GPT instance.
Return type: GPT
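
Since `update` and `zero_grad` both return the model, the end of a training step can be written as one chained expression. The sketch below uses hypothetical stand-ins (`ToyParam`, `ToySGD`, `ToyModel`) with scalar weights. Calling the optimiser on each layer is an assumption about the interface made for this example, not a confirmed description of Tricycle's `Optimiser`.

```python
class ToyParam:
    """A scalar weight with an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


class ToySGD:
    """Hypothetical optimiser: plain gradient descent on one parameter."""

    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def __call__(self, param):
        param.value -= self.learning_rate * param.grad
        return param


class ToyModel:
    """Hypothetical container whose 'layers' are single parameters."""

    def __init__(self, params):
        self.layers = params

    def update(self, optimiser):
        for layer in self.layers:
            optimiser(layer)
        return self

    def zero_grad(self):
        for layer in self.layers:
            layer.grad = 0.0
        return self


model = ToyModel([ToyParam(1.0), ToyParam(-2.0)])
optimiser = ToySGD(learning_rate=0.5)

# Pretend a backward pass filled in these gradients.
model.layers[0].grad = 2.0
model.layers[1].grad = -1.0

# Apply the step, then clear gradients ready for the next batch.
model.update(optimiser).zero_grad()
values = [p.value for p in model.layers]
grads = [p.grad for p in model.layers]
```

Clearing gradients after every update matters when gradients accumulate across backward passes, as they do in this toy.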