transformer

Transformer model based on "Attention Is All You Need" (Vaswani et al., 2017).

Classes

Decoder(*args, **kwargs)

Transformer Decoder that supports both one-time and distributed decoding strategies.

DecoderLayer(*args, **kwargs)

Encoder(*args, **kwargs)

Transformer([predict_sequence_length, config])

Transformer model, configured through predict_sequence_length and config.

TransformerBlock(*args, **kwargs)

Basic Transformer block with attention and feed-forward layers.

TransformerConfig([hidden_size, num_layers, ...])

Initializes the configuration for the Transformer model with the specified parameters.
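
As a rough illustration of what "attention and feed-forward layers" means for a block like TransformerBlock, the sketch below implements single-head scaled dot-product self-attention followed by a position-wise feed-forward layer, each wrapped in a residual connection. It is not this library's implementation: the names toy_transformer_block, self_attention, and feed_forward are hypothetical, and learned query/key/value projections, multiple heads, layer normalization, and dropout are all left out to keep the example in plain Python.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def add(a, b):
    """Element-wise sum of two equally shaped matrices (residual connection)."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Assumption for brevity: queries, keys, and values are all x itself,
    i.e. there are no learned projection matrices.
    """
    d = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)  # (seq_len, seq_len) similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, x), weights


def feed_forward(x, w1, w2):
    """Position-wise feed-forward layer: ReLU(x @ w1) @ w2, without biases."""
    hidden = [[max(0.0, v) for v in row] for row in matmul(x, w1)]
    return matmul(hidden, w2)


def toy_transformer_block(x, w1, w2):
    """Attention sub-layer, then feed-forward sub-layer, each with a residual."""
    attn_out, weights = self_attention(x)
    h = add(x, attn_out)
    out = add(h, feed_forward(h, w1, w2))
    return out, weights


if __name__ == "__main__":
    # Three time steps with a hidden size of 2; the weights are arbitrary.
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    w1 = [[0.5, -0.5, 0.1, 0.0], [0.2, 0.3, -0.1, 0.4]]
    w2 = [[0.1, 0.0], [0.0, 0.1], [0.2, -0.2], [0.0, 0.3]]
    out, weights = toy_transformer_block(x, w1, w2)
    print(len(out), len(out[0]))  # output keeps the input shape
```

The block returns a matrix with the same shape as its input, which is what allows several such blocks to be stacked, as a num_layers setting would suggest.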