Index

_

_dict (WeakSet attribute)
_grad
    (Attention attribute)
    (BinarySubtract attribute)
    (CrossEntropy attribute)
    (Sigmoid attribute)
    (Softmax attribute)
_grad_1
    (BinaryMax attribute)
    (BinaryMin attribute)
_grad_2
    (BinaryMax attribute)
    (BinaryMin attribute)
_id (Tensor attribute)
_idx (CausalLMDataset attribute)
_index (Dataset attribute)
_indices (Dataset attribute)
_is_bigger_1 (BinaryMax attribute)
_is_bigger_2 (BinaryMax attribute)
_is_smaller_1 (BinaryMin attribute)
_is_smaller_2 (BinaryMin attribute)
_log_softmax_pred (CrossEntropy attribute)
_out
    (BinaryAdd attribute)
    (BinaryMax attribute)
    (BinaryMin attribute)
    (BinarySubtract attribute)
    (CrossEntropy attribute)
    (Sigmoid attribute)
    (Softmax attribute)
_parents (Tensor attribute)
_to_tensor (InfiniteBatchDataset attribute)
_y_true (CrossEntropy attribute)

A

activation_fn
    (DebugConfig attribute)
    (FeedForward attribute), [1]
    (GPT2TransformerBlock attribute), [1]
    (GPTConfig attribute), [1]
    (MLPBlock attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
AdamW (class in tricycle.optimisers)
add() (WeakSet method)
args (Tensor attribute)
array (Tensor attribute)
as_tensor (CausalLMDataset attribute)
Attention (class in tricycle.attention)

B

back_fn()
    (Batch method)
    (Embedding method)
    (LayerNorm method)
    (Reshape method)
    (RMSNorm method)
    (Softmax method)
    (Split method)
    (UnaryCos method)
    (UnaryExp method)
    (UnaryLog method)
    (UnaryMask method)
    (UnaryMax method)
    (UnaryMin method)
    (UnaryMultiply method)
    (UnaryPower method)
    (UnarySin method)
    (UnarySum method)
    (Unbatch method)
back_fn_1()
    (BinaryMax method)
    (BinaryMin method)
back_fn_2()
    (BinaryMax method)
    (BinaryMin method)
    (BinarySubtract method)
back_fns (Tensor attribute)
backward()
    (Attention method)
    (CrossEntropy method)
    (GeLU method)
    (Mean method)
    (MeanSquaredError method)
    (RotaryEncode method)
    (Sigmoid method)
    (Swish method)
    (Tensor method)
Batch (class in tricycle.unary)
batch()
    (CausalLMDataset method)
    (in module tricycle.unary)
batch_indices (CausalLMDataset attribute)
batch_size
    (CausalLMDataset attribute)
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (InfiniteBatchDataset attribute)
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
beta (LayerNorm attribute)
beta1
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
beta2
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
beta_back_fn() (LayerNorm method)
betas (AdamW attribute)
BinaryAdd (class in tricycle.binary)
BinaryDivide (class in tricycle.binary)
BinaryMax (class in tricycle.binary)
BinaryMin (class in tricycle.binary)
BinaryMultiply (class in tricycle.binary)
BinarySubtract (class in tricycle.binary)
blocks (GPT attribute)
BPETokeniser (class in tricycle.tokeniser)
build_mask()
    (in module tricycle.attention)
    (in module tricycle.blocks)

C

CausalLMDataset (class in tricycle.dataset)
chars (ShakespeareChar attribute), [1]
close_to() (Tensor method)
CodeParrot (class in tricycle_datasets.codeparrot)
coef (CosineSchedule attribute)
combined_subscript (EinsumBackOp attribute)
CONST_1 (GeLU attribute)
CONST_2 (GeLU attribute)
context_window
    (Attention attribute)
    (CausalLMDataset attribute)
    (DebugConfig attribute)
    (GPT attribute)
    (GPTConfig attribute), [1]
    (MultiHeadSelfAttention attribute), [1]
    (RotaryEncode attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
copy() (Dataset method)
CosineSchedule (class in tricycle.scheduler)
count_pairs() (in module tricycle.tokeniser)
CrossEntropy (class in tricycle.loss)

D

Dataset
    (class in tricycle.dataset)
    (class in tricycle.utils)
DebugConfig (class in tricycle.configs)
decode()
    (BPETokeniser method)
    (CodeParrot method)
    (FineWeb method)
    (Shakespeare method)
    (ShakespeareChar method)
Dense (class in tricycle.layers)
device (CausalLMDataset attribute)
device_idx
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
dict() (GPTConfig method)
diff (MeanSquaredError attribute)
discard() (WeakSet method)
display() (GPT method)
divisor (MeanSquaredError attribute)
download()
    (Shakespeare method)
    (ShakespeareChar method)
Dropout (class in tricycle.layers)
dropout_prob
    (FeedForward attribute), [1]
    (MLPBlock attribute), [1]
dtype (Tensor property)

E

Einsum (class in tricycle.einsum)
einsum() (Tensor method)
EinsumBackOp (class in tricycle.einsum)
Embedding (class in tricycle.layers)
embedding_dim
    (Attention attribute)
    (DebugConfig attribute)
    (FeedForward attribute), [1]
    (GPT attribute)
    (GPT2TransformerBlock attribute), [1]
    (GPTConfig attribute), [1]
    (MLPBlock attribute), [1]
    (MultiHeadSelfAttention attribute), [1]
    (RMSNorm attribute)
    (RotaryEncode attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
encode()
    (BPETokeniser method)
    (CodeParrot method)
    (FineWeb method)
    (Shakespeare method)
    (ShakespeareChar method)
eps
    (AdamW attribute)
    (LayerNorm attribute)
eval_interval
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
eval_steps
    (DebugConfig attribute)
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
expansion_ratio
    (DebugConfig attribute)
    (FeedForward attribute), [1]
    (GPT2TransformerBlock attribute), [1]
    (GPTConfig attribute), [1]
    (MLPBlock attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)

F

FeedForward (class in tricycle.blocks)
FineWeb (class in tricycle_datasets.fineweb)
forward()
    (Attention method)
    (Batch method)
    (BinaryAdd method)
    (BinaryDivide method)
    (BinaryMax method)
    (BinaryMin method)
    (BinaryMultiply method)
    (BinarySubtract method)
    (CrossEntropy method)
    (Dense method)
    (Dropout method)
    (Embedding method)
    (FeedForward method)
    (GeLU method)
    (GLU method)
    (GPT method)
    (GPT2TransformerBlock method)
    (Layer method)
    (LayerNorm method)
    (Mean method)
    (MeanSquaredError method)
    (MLPBlock method)
    (MultiHeadSelfAttention method)
    (Op method)
    (ReLU method)
    (Repeat method)
    (Reshape method)
    (RMSNorm method)
    (RotaryEncode method)
    (Sequential method)
    (Sigmoid method)
    (Softmax method)
    (Split method)
    (Swish method)
    (UnaryAdd method)
    (UnaryCos method)
    (UnaryDivide method)
    (UnaryExp method)
    (UnaryLog method)
    (UnaryMask method)
    (UnaryMax method)
    (UnaryMin method)
    (UnaryMultiply method)
    (UnaryPower method)
    (UnarySin method)
    (UnarySquareRoot method)
    (UnarySubtract method)
    (UnarySum method)
    (Unbatch method)
freqs_cos (RotaryEncode attribute)
freqs_sin (RotaryEncode attribute)
from_batched() (Tensor method)
from_gpu()
    (Attention method)
    (CausalLMDataset method)
    (Dense method)
    (Embedding method)
    (FeedForward method)
    (GLU method)
    (GPT method)
    (GPT2TransformerBlock method)
    (Layer method)
    (LayerNorm method)
    (MLPBlock method)
    (MultiHeadSelfAttention method)
    (RMSNorm method)
    (Sequential method)
    (Tensor method)
from_size (Dense attribute), [1]
from_split() (Subscript class method)

G

gamma (LayerNorm attribute)
gamma_back_fn() (LayerNorm method)
GeLU (class in tricycle.activation)
generate()
    (Shakespeare method)
    (ShakespeareChar method)
GLU (class in tricycle.activation)
GPT (class in tricycle.models)
GPT2TransformerBlock (class in tricycle.blocks)
GPTConfig (class in tricycle.configs)
GPUDisabledException
grad (Tensor attribute)
grad_back_fn() (Dense method)
gradient_accumulation_steps
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)

H

head (GPT attribute)
head_size (RotaryEncode attribute)

I

idx (EinsumBackOp attribute)
InfiniteBatchDataset (class in tricycle.dataset)
init_xavier() (in module tricycle.initialisers)
input_dropout (GPT attribute)
input_dropout_prob
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
inputs
    (Dataset attribute)
    (Subscript attribute), [1]
is_batch (CausalLMDataset attribute)
is_batched
    (Batch attribute)
    (InfiniteBatchDataset attribute), [1]
    (Tensor attribute)
    (Unbatch attribute)
is_infinite (InfiniteBatchDataset attribute), [1]

J

join() (Subscript static method)

L

Layer (class in tricycle.layers)
LayerNorm (class in tricycle.layers)
layers
    (GPT attribute)
    (Layer attribute), [1]
    (Sequential attribute)
learning_rate
    (AdamW attribute)
    (StochasticGradientDescent attribute)
left_tensors (EinsumBackOp attribute)
linear (GLU attribute)
linear_1
    (FeedForward attribute), [1]
    (MLPBlock attribute), [1]
linear_2
    (FeedForward attribute), [1]
    (MLPBlock attribute), [1]
linear_dropout_prob
    (DebugConfig attribute)
    (GPT2TransformerBlock attribute), [1]
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
linear_schedule() (in module tricycle.scheduler)
load() (BPETokeniser class method)
log_memory_and_time() (in module tricycle.utils)
log_softmax() (CrossEntropy method)
logger (StochasticGradientDescent attribute)
loss_scale_factor (TricycleContext attribute), [1]

M

mask (Attention attribute)
masked_fill() (in module tricycle.blocks)
max_learning_rate
    (CosineSchedule attribute)
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
Mean (class in tricycle.ops)
mean() (Tensor method)
MeanSquaredError (class in tricycle.loss)
merges (BPETokeniser attribute)
min_learning_rate
    (CosineSchedule attribute)
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
MIN_TOKENS (BPETokeniser attribute)
mlflow_enabled
    (DebugConfig attribute)
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
mlflow_experiment_name (GPTConfig attribute), [1]
mlflow_tracking_uri
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
MLPBlock (class in tricycle.blocks)
module
    tricycle.activation
    tricycle.attention
    tricycle.binary
    tricycle.blocks
    tricycle.configs
    tricycle.context
    tricycle.dataset
    tricycle.einsum
    tricycle.exceptions
    tricycle.functions
    tricycle.initialisers
    tricycle.layers
    tricycle.loss
    tricycle.models
    tricycle.ops
    tricycle.optimisers
    tricycle.reduce
    tricycle.scheduler
    tricycle.tensor
    tricycle.tokeniser
    tricycle.unary
    tricycle.utils
    tricycle.weakset
    tricycle_datasets.codeparrot
    tricycle_datasets.fineweb
    tricycle_datasets.shakespeare
momentum
    (AdamW attribute)
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
    (StochasticGradientDescent attribute)
momentum_store (StochasticGradientDescent attribute)
most_common_pair() (BPETokeniser method)
MultiHeadSelfAttention (class in tricycle.blocks)

N

n_heads
    (Attention attribute)
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (MultiHeadSelfAttention attribute), [1]
    (RotaryEncode attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
n_layers
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
n_steps (CosineSchedule attribute)
n_tokens_to_generate (SmolGPTConfig attribute)
name
    (Dense attribute), [1]
    (Tensor attribute)
ndim (Tensor property)
None (GPUDisabledException attribute)
norm (GPT attribute)
norm_fn
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
nothing() (in module tricycle.unary)
numpy() (Tensor method)

O

on_gpu (Tensor property)
Op (class in tricycle.ops)
optimal_n_tokens() (in module tricycle.utils)
Optimiser (class in tricycle.optimisers)
output (Subscript attribute), [1]
outputs (Dataset attribute)

P

pairs (BPETokeniser attribute)
position_embedding (GPT attribute)
precompute_constants() (RotaryEncode method)
prepare_data()
    (in module tricycle_datasets.codeparrot)
    (in module tricycle_datasets.fineweb)
probability (Dropout attribute)

R

r_squared() (in module tricycle.utils)
raw_data_path
    (Shakespeare attribute), [1]
    (ShakespeareChar attribute), [1]
REALLY_SMALL_NUMBER
    (RMSNorm attribute)
    (UnaryLog attribute)
ReduceMax (class in tricycle.reduce)
ReduceMin (class in tricycle.reduce)
ReLU (class in tricycle.activation)
Repeat (class in tricycle.ops)
repeat() (Tensor method)
replace_pair()
    (BPETokeniser method)
    (in module tricycle.tokeniser)
requires_grad (Tensor attribute)
reset() (Dataset method)
Reshape (class in tricycle.ops)
reshape() (Tensor method)
residual_dropout_prob
    (DebugConfig attribute)
    (GPT2TransformerBlock attribute), [1]
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
right_tensors (EinsumBackOp attribute)
RMSNorm (class in tricycle.layers)
RotaryEncode (class in tricycle.layers)

S

sample_size
    (DebugConfig attribute)
    (ShakespeareConfig attribute)
save() (BPETokeniser method)
select_backend() (in module tricycle.tensor)
Sequential (class in tricycle.layers)
Shakespeare (class in tricycle_datasets.shakespeare)
ShakespeareChar (class in tricycle_datasets.shakespeare)
ShakespeareConfig (class in tricycle.configs)
shape (Tensor property)
shapes_match() (in module tricycle.utils)
should_one_hot_encode (CausalLMDataset attribute)
shuffle()
    (CausalLMDataset method)
    (Dataset method)
Sigmoid (class in tricycle.functions)
SmolGPTConfig (class in tricycle.configs)
smooth() (in module tricycle.utils)
Softmax (class in tricycle.functions)
Split (class in tricycle.ops)
split()
    (Subscript method)
    (Tensor method)
square_momentum (AdamW attribute)
step()
    (AdamW method)
    (CosineSchedule method)
steps
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
StochasticGradientDescent (class in tricycle.optimisers)
Subscript (class in tricycle.einsum)
subscript
    (Einsum attribute), [1]
    (EinsumBackOp attribute)
    (Subscript attribute), [1]
sum() (Tensor method)
Swish (class in tricycle.activation)

T

Tensor (class in tricycle.tensor)
tensors
    (EinsumBackOp attribute)
    (Layer attribute), [1]
theta (RotaryEncode attribute), [1]
timestep (AdamW attribute)
to_batched() (Tensor method)
to_gpu()
    (Attention method)
    (CausalLMDataset method)
    (Dense method)
    (Embedding method)
    (FeedForward method)
    (GLU method)
    (GPT method)
    (GPT2TransformerBlock method)
    (Layer method)
    (LayerNorm method)
    (MLPBlock method)
    (MultiHeadSelfAttention method)
    (RMSNorm method)
    (Sequential method)
    (Tensor method)
to_size (Dense attribute), [1]
to_tensor()
    (CausalLMDataset method)
    (Dataset method)
    (InfiniteBatchDataset method)
token_embedding (GPT attribute)
token_path
    (CodeParrot attribute), [1]
    (FineWeb attribute), [1]
    (Shakespeare attribute), [1]
tokenise_document()
    (in module tricycle_datasets.codeparrot)
    (in module tricycle_datasets.fineweb)
tokenise_ints() (BPETokeniser method)
tokeniser (Shakespeare attribute)
tokeniser_string
    (CodeParrot attribute), [1]
    (FineWeb attribute), [1]
tokens
    (CausalLMDataset attribute)
    (CodeParrot attribute), [1]
    (FineWeb attribute), [1]
    (Shakespeare attribute), [1]
total_steps (CosineSchedule attribute)
train() (BPETokeniser method)
train_ints() (BPETokeniser method)
tricycle.activation
    module
tricycle.attention
    module
tricycle.binary
    module
tricycle.blocks
    module
tricycle.configs
    module
tricycle.context
    module
tricycle.dataset
    module
tricycle.einsum
    module
tricycle.exceptions
    module
tricycle.functions
    module
tricycle.initialisers
    module
tricycle.layers
    module
tricycle.loss
    module
tricycle.models
    module
tricycle.ops
    module
tricycle.optimisers
    module
tricycle.reduce
    module
tricycle.scheduler
    module
tricycle.tensor
    module
tricycle.tokeniser
    module
tricycle.unary
    module
tricycle.utils
    module
tricycle.weakset
    module
tricycle_datasets.codeparrot
    module
tricycle_datasets.fineweb
    module
tricycle_datasets.shakespeare
    module
TricycleContext (class in tricycle.context)
type_ (BPETokeniser attribute)

U

UnaryAdd (class in tricycle.unary)
UnaryCos (class in tricycle.unary)
UnaryDivide (class in tricycle.unary)
UnaryExp (class in tricycle.unary)
UnaryLog (class in tricycle.unary)
UnaryMask (class in tricycle.unary)
UnaryMax (class in tricycle.unary)
UnaryMin (class in tricycle.unary)
UnaryMultiply (class in tricycle.unary)
UnaryPower (class in tricycle.unary)
UnarySin (class in tricycle.unary)
UnarySquareRoot (class in tricycle.unary)
UnarySubtract (class in tricycle.unary)
UnarySum (class in tricycle.unary)
Unbatch (class in tricycle.unary)
unbatch()
    (CausalLMDataset method)
    (in module tricycle.unary)
unique_input_indices (Subscript property)
update()
    (Dense method)
    (Embedding method)
    (FeedForward method)
    (GLU method)
    (GPT method)
    (GPT2TransformerBlock method)
    (Layer method)
    (LayerNorm method)
    (MLPBlock method)
    (MultiHeadSelfAttention method)
    (RMSNorm method)
    (Sequential method)
update_weight()
    (AdamW method)
    (StochasticGradientDescent method)
url
    (CodeParrot attribute), [1]
    (Shakespeare attribute), [1]
    (ShakespeareChar attribute), [1]
use_mixed_precision (TricycleContext attribute), [1]
UseMixedPrecision (class in tricycle.utils)

V

vocab (BPETokeniser attribute)
vocab_size
    (BPETokeniser attribute)
    (CausalLMDataset attribute)
    (CodeParrot attribute), [1]
    (DebugConfig attribute)
    (Embedding attribute)
    (FineWeb attribute), [1]
    (GPTConfig attribute), [1]
    (Shakespeare attribute), [1]
    (ShakespeareChar attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)

W

warmup_steps
    (CosineSchedule attribute)
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
WeakSet (class in tricycle.weakset)
weight_back_fn()
    (Dense method)
    (RMSNorm method)
weight_decay
    (AdamW attribute)
    (DebugConfig attribute)
    (GPTConfig attribute), [1]
    (ShakespeareConfig attribute)
    (SmolGPTConfig attribute)
    (StochasticGradientDescent attribute)
weights
    (Dense attribute), [1]
    (Embedding attribute)
    (RMSNorm attribute)

X

xp (Tensor property)

Z

zero_grad()
    (Dense method)
    (Embedding method)
    (FeedForward method)
    (GLU method)
    (GPT method)
    (GPT2TransformerBlock method)
    (Layer method)
    (LayerNorm method)
    (MLPBlock method)
    (MultiHeadSelfAttention method)
    (RMSNorm method)
    (Sequential method)
    (Tensor method)