GPTConfig
- class tfts.models.gpt.GPTConfig(hidden_size: int = 64, num_layers: int = 2, num_attention_heads: int = 4, ffn_intermediate_size: int = 256, hidden_act: str = 'gelu', hidden_dropout_prob: float = 0.0, attention_probs_dropout_prob: float = 0.0, max_position_embeddings: int = 512, type_vocab_size: int = 2, initializer_range: float = 0.02, layer_norm_eps: float = 1e-12, pad_token_id: int = 0, positional_type: str = 'absolute', use_cache: bool = True, dense_units: Tuple[int] = (512, 1024), classifier_dropout: float | None = None, **kwargs: Dict[str, object])
Bases: BaseConfig
Configuration class for the GPT decoder model, inheriting from BaseConfig.
- Parameters:
hidden_size – The size of the hidden layers. Default is 64.
num_layers – The number of hidden layers in the transformer decoder. Default is 2.
num_attention_heads – The number of attention heads in each attention layer. Default is 4.
ffn_intermediate_size – The size of the intermediate (feed-forward) layer. Default is 256.
hidden_act – The activation function for hidden layers. Default is “gelu”.
hidden_dropout_prob – The dropout probability for hidden layers. Default is 0.0.
attention_probs_dropout_prob – The dropout probability for attention probabilities. Default is 0.0.
max_position_embeddings – The maximum length of the input sequences. Default is 512.
type_vocab_size – The vocabulary size for token types (usually 2). Default is 2.
initializer_range – The standard deviation for weight initialization. Default is 0.02.
layer_norm_eps – The epsilon value for layer normalization. Default is 1e-12.
pad_token_id – The ID for the padding token. Default is 0.
positional_type – The type of position embedding (“absolute” or “relative”). Default is “absolute”.
use_cache – Whether to use the cache during inference. Default is True.
dense_units – The number of units in each dense layer. Default is (512, 1024).
classifier_dropout – Dropout probability for the classifier layer. Default is None.
**kwargs – Additional keyword arguments passed to the parent BaseConfig class.
Methods
from_dict(config_dict)
from_json(json_file)
from_pretrained(pretrained_model_name_or_path)
save_pretrained(save_directory)
to_dict()
to_json(json_file)
update(config_dict)
Attributes
attribute_map
model_type
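The defaults in the signature above can be illustrated with a small stand-in. This is a minimal sketch, not the tfts implementation: `GPTConfigSketch` and its `head_size` helper are hypothetical names introduced here, and the even split of `hidden_size` across heads is an assumption borrowed from standard multi-head attention rather than something this page states. With tfts installed, the real class would presumably be imported from `tfts.models.gpt` instead.

```python
from dataclasses import dataclass, asdict
from typing import Optional, Tuple


@dataclass
class GPTConfigSketch:
    """Hypothetical stand-in that mirrors the documented GPTConfig signature."""

    hidden_size: int = 64
    num_layers: int = 2
    num_attention_heads: int = 4
    ffn_intermediate_size: int = 256
    hidden_act: str = "gelu"
    hidden_dropout_prob: float = 0.0
    attention_probs_dropout_prob: float = 0.0
    max_position_embeddings: int = 512
    type_vocab_size: int = 2
    initializer_range: float = 0.02
    layer_norm_eps: float = 1e-12
    pad_token_id: int = 0
    positional_type: str = "absolute"
    use_cache: bool = True
    dense_units: Tuple[int, ...] = (512, 1024)
    classifier_dropout: Optional[float] = None

    def head_size(self) -> int:
        # Assumption: each head receives an equal slice of hidden_size,
        # as in standard multi-head attention.
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        return self.hidden_size // self.num_attention_heads


# Defaults taken from the documented signature.
default_cfg = GPTConfigSketch()

# Override a few fields by keyword, leaving the rest at their defaults.
custom_cfg = GPTConfigSketch(hidden_size=128, num_layers=4, num_attention_heads=8)

# Plain-dict view of the configuration (the stand-in's analogue of to_dict()).
default_dict = asdict(default_cfg)
```

Keeping every field as a keyword argument with a default means a configuration can be created with no arguments and adjusted one field at a time, which matches how the signature above is laid out.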