models
GPT model implementation using the Tricycle framework.
This module defines the GPT class, which implements a GPT-style transformer model using components from the Tricycle framework.
- class GPT(config)
Bases: Layer
Generative Pre-trained Transformer (GPT) model implementation.
This class implements a GPT-style transformer model using components from the Tricycle framework. It includes token and position embeddings, multiple transformer blocks, and a final output layer.
- Parameters:
config (GPTConfig) – Configuration for the model.
- embedding_dim
Dimension of the embedding space.
- Type:
int
- context_window
Size of the context window for position embeddings.
- Type:
int
- blocks
List of GPT2TransformerBlock instances.
- Type:
list
- layers
List of all layers in the model.
- Type:
list
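The attributes above can be illustrated with a minimal, framework-free sketch. This is not Tricycle's code: `ToyConfig`, `ToyBlock`, and `ToyGPT` are hypothetical stand-ins, the config field names are assumptions, and the ordering of `layers` (embeddings first, then blocks) is assumed for illustration only. The final output layer is omitted.

```python
from dataclasses import dataclass


@dataclass
class ToyConfig:
    # Hypothetical fields; the real GPTConfig may differ.
    vocab_size: int = 16
    embedding_dim: int = 4
    context_window: int = 8
    n_layers: int = 2


class ToyBlock:
    """Placeholder for a transformer block; it returns its input unchanged."""

    def __call__(self, x):
        return x


class ToyGPT:
    def __init__(self, config):
        self.embedding_dim = config.embedding_dim
        self.context_window = config.context_window
        # One embedding row per token id and one per position.
        self.token_embedding = [
            [0.01 * (t + d) for d in range(config.embedding_dim)]
            for t in range(config.vocab_size)
        ]
        self.position_embedding = [
            [0.1 * p for _ in range(config.embedding_dim)]
            for p in range(config.context_window)
        ]
        self.blocks = [ToyBlock() for _ in range(config.n_layers)]
        # Assumed ordering: embeddings, then blocks (output head omitted).
        self.layers = [self.token_embedding, self.position_embedding, *self.blocks]

    def forward(self, token_ids):
        if len(token_ids) > self.context_window:
            raise ValueError("sequence is longer than the context window")
        # Sum the token and position vectors element-wise for each position.
        x = [
            [t + p for t, p in zip(self.token_embedding[tok], self.position_embedding[pos])]
            for pos, tok in enumerate(token_ids)
        ]
        # Pass the embedded sequence through each block in turn.
        for block in self.blocks:
            x = block(x)
        return x


model = ToyGPT(ToyConfig())
hidden = model.forward([3, 1, 2])
```

In this sketch `context_window` bounds the sequence length because there is exactly one position-embedding row per allowed position, and `embedding_dim` is the width of every vector that flows through the blocks.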
- from_gpu()
Moves all layers of the model from GPU back to CPU.
- Returns:
The current GPT instance.
- Return type:
GPT
- to_gpu(device=0)
Moves all layers of the model to the specified GPU device.
- Parameters:
device (int, optional) – The GPU device number. Defaults to 0.
- Returns:
The current GPT instance.
- Return type:
GPT
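Because both `to_gpu` and `from_gpu` are documented as returning the current instance, calls can be chained. The sketch below mimics that pattern without importing Tricycle: `ToyLayer` and `ToyModel` are hypothetical stand-ins that only record a device label, and delegating to each entry of `layers` is an assumption about how such a method might work, not a description of the library's internals.

```python
class ToyLayer:
    """Hypothetical layer that only records which device it lives on."""

    def __init__(self):
        self.device = "cpu"

    def to_gpu(self, device=0):
        self.device = f"gpu:{device}"
        return self

    def from_gpu(self):
        self.device = "cpu"
        return self


class ToyModel:
    def __init__(self, n_layers=3):
        self.layers = [ToyLayer() for _ in range(n_layers)]

    def to_gpu(self, device=0):
        # Delegate to every layer, then return self so calls can be chained.
        for layer in self.layers:
            layer.to_gpu(device)
        return self

    def from_gpu(self):
        # Move every layer back to the CPU and return self.
        for layer in self.layers:
            layer.from_gpu()
        return self


# Construct and move in one expression, since to_gpu returns the model.
model = ToyModel().to_gpu(1)
devices_on_gpu = [layer.device for layer in model.layers]
devices_on_cpu = [layer.device for layer in model.from_gpu().layers]
```

Returning the instance keeps the common "build, then move" step on a single line and makes the round trip back to the CPU symmetric.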