Prompt
Prompt is the core class that combines a prompt format and a prompt provider. It takes care of all the internal modifications that need to be applied to the model and tokenizer to insert the prompt provider and make it trainable. In particular, when you call Prompt.patch, the following happens:

1. The underlying prompt format is initialized using the tokenizer. At this step, the `<P>Initial manual prompt</P>` patterns are tokenized, the prompt format is compiled, and the initialization tokens and their positions are identified to be passed to the prompt provider.
2. The prompt provider is initialized: it receives the initialization tokens from step 1 and, after regular weight initialization, overrides the corresponding weights with the embeddings of the passed initialization tokens.
3. The special tokens needed by the prompt format are added to the tokenizer, and their ids are stored.
4. A special drop-in `torch.nn.Embedding` replacement is initialized with the prompt provider and the prompt token ids from step 3.
5. The smart embedding layer from step 4 replaces the default embedding layer in the model (a rough sketch of this layer follows the note below).
Note

Steps 1 and 2 are skipped if the prompt was created with Prompt.from_pretrained.
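For intuition, below is a minimal, purely illustrative sketch of such a drop-in embedding replacement. It is not the library's actual implementation; the class name and constructor arguments (PromptEmbeddingSketch, prompt_token_ids) are hypothetical.

import torch
from torch import nn

class PromptEmbeddingSketch(nn.Module):
    """Illustrative drop-in replacement for the model's input embedding layer."""

    def __init__(self, base: nn.Embedding, prompt_token_ids: list):
        super().__init__()
        self.base = base  # the model's original embedding matrix
        self.prompt_token_ids = prompt_token_ids
        # one trainable vector per prompt token
        self.prompt_weight = nn.Parameter(torch.randn(len(prompt_token_ids), base.embedding_dim))

    def forward(self, input_ids: torch.LongTensor) -> torch.Tensor:
        # clamp so the newly added prompt token ids don't index past the base matrix
        out = self.base(input_ids.clamp(max=self.base.num_embeddings - 1)).clone()
        # overwrite positions occupied by prompt tokens with the trainable embeddings
        for pos, token_id in enumerate(self.prompt_token_ids):
            out[input_ids == token_id] = self.prompt_weight[pos]
        return out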
The Prompt class also implements sharing methods:

- Prompt.save_pretrained - saves the trained prompt to disk or pushes it to HF Hub
- Prompt.push_to_hub - pushes the trained prompt to HF Hub
- Prompt.from_pretrained - loads the prompt from disk or HF Hub
class
ruprompts.prompt.Prompt
(format: BasePromptFormat, provider: BasePromptProvider, config: Optional[ruprompts.prompt.PromptConfig] = None)
Core class combining PromptFormat
and BasePromptProvider
.
Implements saving/loading methods and HF hub integration.
Examples:
>>> p = Prompt(PromptFormat("<P*50>"), TensorPromptProvider())
>>> p.patch(model, tokenizer)
>>> trainer.train()
>>> p.save_pretrained("./checkpoint/path")
>>> p.push_to_hub("konodyuk/prompt_rugpt3large_detox")
>>> p = Prompt.from_pretrained("./checkpoint/path")
>>> p.patch(model, tokenizer)
>>> a = p(toxic="...")
>>> b = p.format(toxic="...")
>>> assert a == b
>>> p = Prompt.from_pretrained("konodyuk/prompt_rugpt3large_detox")
>>> ppln = pipeline("text-generation", model=model, tokenizer=tokenizer)
>>> ppln_prompt = pipeline("text-generation-with-prompt", prompt=p, model=model, tokenizer=tokenizer)
>>> a = ppln(p("text"))
>>> b = ppln_prompt("text")
>>> assert a == b
Parameters:
Name | Type | Description | Default |
---|---|---|---|
format | BasePromptFormat | Format used to format text for training and inference and for adding special tokens to the tokenizer. | required |
provider | BasePromptProvider | Provider used to insert trainable embeddings into the positions defined by the prompt format. | required |
classmethod
from_pretrained
(pretrained_prompt_name_or_path: Union[str, os.PathLike], as_safe: bool = False) -> Prompt
Loads a pretrained prompt from disk or HF Hub.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pretrained_prompt_name_or_path | Union[str, os.PathLike] | Either a HF Hub identifier (e.g. konodyuk/prompt_rugpt3large_detox) or a path to a directory containing a prompt saved with Prompt.save_pretrained. | required |
as_safe | bool | Whether to load the prompt format as PromptFormat (False) or PromptFormatSafe (True). | False |
Returns:
Type | Description |
---|---|
Prompt | Pretrained prompt instance. |
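A brief usage sketch (the checkpoint path below is a placeholder; the Hub id matches the example above):

>>> # load from a local checkpoint directory, using the PromptFormatSafe format implementation
>>> p = Prompt.from_pretrained("./checkpoint/path", as_safe=True)
>>> # or load directly from the HF Hub by repository id
>>> p = Prompt.from_pretrained("konodyuk/prompt_rugpt3large_detox")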
Source code in ruprompts/prompt.py
@classmethod
def from_pretrained(
    cls, pretrained_prompt_name_or_path: Union[str, os.PathLike], as_safe: bool = False
) -> "Prompt":
    """Loads a pretrained prompt from disk or HF Hub.

    Args:
        pretrained_prompt_name_or_path: Either a HF Hub identifier (`konodyuk/prompt_rugpt3large_detox`)
            or path to a directory containing a prompt saved with `ruprompts.prompt.Prompt.save_pretrained`.
        as_safe: Whether to load the prompt format as `ruprompts.prompt_format.PromptFormat`
            or `ruprompts.prompt_format.PromptFormatSafe`.

    Returns:
        `ruprompts.prompt.Prompt`: Pretrained prompt instance.
    """
    if os.path.isdir(pretrained_prompt_name_or_path):
        prompt_file = os.path.join(pretrained_prompt_name_or_path, PROMPT_FILE_NAME)
        prompt_provider_file = os.path.join(
            pretrained_prompt_name_or_path, PROMPT_PROVIDER_FILE_NAME
        )
    else:
        prompt_file = _resolve_file(pretrained_prompt_name_or_path, PROMPT_FILE_NAME)
        prompt_provider_file = _resolve_file(
            pretrained_prompt_name_or_path, PROMPT_PROVIDER_FILE_NAME
        )

    with open(prompt_file, "r") as f:
        prompt_dict = json.load(f)

    prompt_format_cls = PromptFormat
    if as_safe:
        prompt_format_cls = PromptFormatSafe
    prompt_format = prompt_format_cls(**prompt_dict["format"])
    prompt_config = PromptConfig(**prompt_dict.get("config", {}))

    with open(prompt_provider_file, "rb") as f:
        prompt_provider_weights = torch.load(f)
    prompt_provider = TensorPromptProvider.from_pretrained(prompt_provider_weights)

    prompt = cls(format=prompt_format, provider=prompt_provider, config=prompt_config)
    prompt.ctx.is_initialized = True
    return prompt
patch
(model: PreTrainedModel, tokenizer: PreTrainedTokenizerBase)
Applies the prompt to model and tokenizer.
Injects the prompt by adding special prompt tokens to the tokenizer and replacing the input embedding layer of the model with a prompt embedding layer that inserts embeddings from the prompt provider at the positions defined by the special tokens of the prompt format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model | PreTrainedModel | Model to patch. | required |
tokenizer | PreTrainedTokenizerBase | Tokenizer to patch. | required |
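A short usage sketch; the base model id here is illustrative (any model/tokenizer pair supported by the library would do):

>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> model = AutoModelForCausalLM.from_pretrained("sberbank-ai/rugpt3large_based_on_gpt2")
>>> tokenizer = AutoTokenizer.from_pretrained("sberbank-ai/rugpt3large_based_on_gpt2")
>>> p = Prompt(PromptFormat("<P*50>"), TensorPromptProvider())
>>> p.patch(model, tokenizer)
>>> # after patching, the model's input embedding layer has been replaced
>>> model.get_input_embeddings()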
Source code in ruprompts/prompt.py
def patch(self, model: PreTrainedModel, tokenizer: PreTrainedTokenizerBase):
"""Applies the prompt to model and tokenizer.
Injects the prompt by adding special prompt tokens to the tokenizer
and switching input embedding layer of the model with prompt embedding
layer that inserts embeddings from prompt provider into the positions
defined by special tokens specified in prompt format.
Args:
model: Model to patch.
tokenizer: Tokenizer to patch.
"""
self.initialize(model, tokenizer)
self.attach(model, tokenizer)
save_pretrained
(save_directory: Union[str, os.PathLike], push_to_hub: bool = False, **kwargs)
Save a prompt to a directory, so that it can be re-loaded using the Prompt.from_pretrained class method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
save_directory | Union[str, os.PathLike] | Directory to which to save. Will be created if it doesn't exist. | required |
push_to_hub | bool | Whether or not to push your model to the Hugging Face model hub after saving it. | False |
**kwargs | | Additional keyword arguments passed along to the PushToHubMixin.push_to_hub method. | {} |
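A usage sketch (the path and commit message are placeholders):

>>> p.save_pretrained("./checkpoint/path")
>>> # or save and push to the Hub in one call
>>> p.save_pretrained("./checkpoint/path", push_to_hub=True, commit_message="add trained prompt")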
Source code in ruprompts/prompt.py
def save_pretrained(
    self, save_directory: Union[str, os.PathLike], push_to_hub: bool = False, **kwargs
):
    """
    Save a prompt to a directory, so that it can be re-loaded using the
    `ruprompts.prompt.Prompt.from_pretrained` class method.

    Args:
        save_directory: Directory to which to save. Will be created if it doesn't exist.
        push_to_hub: Whether or not to push your model to the Hugging Face model hub after saving it.

            !!! warning
                Using `push_to_hub=True` will synchronize the repository you are pushing to with
                `save_directory`, which requires `save_directory` to be a local clone of the repo you are
                pushing to if it's an existing folder. Pass along `temp_dir=True` to use a temporary directory
                instead.
        **kwargs: Additional keyword arguments passed along to the
            [`PushToHubMixin.push_to_hub`](https://huggingface.co/docs/transformers/main_classes/model#transformers.file_utils.PushToHubMixin.push_to_hub) method.
    """
    if not self.is_initialized:
        raise UserWarning("Prompt should be initialized to be saved")

    if os.path.isfile(save_directory):
        raise UserWarning("save_directory should be a directory, got file instead")

    if push_to_hub:
        commit_message = kwargs.pop("commit_message", None)
        repo = self._create_or_get_repo(save_directory, **kwargs)

    os.makedirs(save_directory, exist_ok=True)

    output_prompt_provider_file = os.path.join(save_directory, PROMPT_PROVIDER_FILE_NAME)
    self.provider.save_pretrained(output_prompt_provider_file)

    output_prompt_file = os.path.join(save_directory, PROMPT_FILE_NAME)
    json.dump(self.as_dict(), open(output_prompt_file, "w", encoding="utf-8"))

    if push_to_hub:
        self._push_to_hub(repo, commit_message=commit_message)
class
ruprompts.prompt.MultiPrompt
(prompts: Optional[Dict[str, Union[ruprompts.prompt.Prompt, str]]] = None)
Implements serving multiple prompts with one model.
Receives a dict of pretrained prompts with string keys. These keys are used to switch formats.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
prompts | Dict[str, Union[Prompt, str]] | Dict with string keys and values that are either Prompt instances or identifiers of pretrained prompts (local paths or HF Hub ids). | required |
Examples:
>>> mp = MultiPrompt({
... "key1": "path/to/pretrained/prompt1",
... "key2": "hfhub/prompt_id",
... "key3": Prompt.from_pretrained("another_hfhub/prompt_id"),
...     "key4": Prompt.from_pretrained("/path/to/another/checkpoint")
... })
>>> mp.patch(model, tokenizer)
>>> ppln = pipeline("text-generation", model=model, tokenizer=tokenizer)
>>> ppln(mp("Text for second prompt", key="key2"))
>>> ppln(mp(key="key3", text="Text for third prompt"))
>>> ppln(mp(key="key4", keyword="Keyword for fourth prompt"))