tooluniverse.tool_finder_embedding module¶
- class tooluniverse.tool_finder_embedding.SentenceTransformer(model_name_or_path: str | None = None, modules: Iterable[Module] | None = None, device: str | None = None, prompts: dict[str, str] | None = None, default_prompt_name: str | None = None, similarity_fn_name: str | SimilarityFunction | None = None, cache_folder: str | None = None, trust_remote_code: bool = False, revision: str | None = None, local_files_only: bool = False, token: str | bool | None = None, use_auth_token: str | bool | None = None, truncate_dim: int | None = None, model_kwargs: dict[str, Any] | None = None, tokenizer_kwargs: dict[str, Any] | None = None, config_kwargs: dict[str, Any] | None = None, model_card_data: SentenceTransformerModelCardData | None = None, backend: Literal['torch', 'onnx', 'openvino'] = 'torch')[source][source]¶
Bases: Sequential, FitMixin, PeftAdapterMixin
Loads or creates a SentenceTransformer model that can be used to map sentences / text to embeddings.
- Parameters:
model_name_or_path (str, optional) – If it is a filepath on disk, it loads the model from that path. If it is not a path, it first tries to download a pre-trained SentenceTransformer model. If that fails, tries to construct a model from the Hugging Face Hub with that name.
modules (Iterable[nn.Module], optional) – A list of torch Modules that should be called sequentially, can be used to create custom SentenceTransformer models from scratch.
device (str, optional) – Device (like “cuda”, “cpu”, “mps”, “npu”) that should be used for computation. If None, checks if a GPU can be used.
prompts (Dict[str, str], optional) – A dictionary with prompts for the model. The key is the prompt name, the value is the prompt text. The prompt text will be prepended before any text to encode. For example:
{"query": "query: ", "passage": "passage: "}
or {"clustering": "Identify the main category based on the titles in "}.
default_prompt_name (str, optional) – The name of the prompt that should be used by default. If not set, no prompt will be applied.
similarity_fn_name (str or SimilarityFunction, optional) – The name of the similarity function to use. Valid options are "cosine", "dot", "euclidean", and "manhattan". If not set, it is automatically set to "cosine" if similarity or similarity_pairwise are called while model.similarity_fn_name is still None.
cache_folder (str, optional) – Path to store models. Can also be set by the SENTENCE_TRANSFORMERS_HOME environment variable.
trust_remote_code (bool, optional) – Whether or not to allow for custom models defined on the Hub in their own modeling files. This option should only be set to True for repositories you trust and in which you have read the code, as it will execute code present on the Hub on your local machine.
revision (str, optional) – The specific model version to use. It can be a branch name, a tag name, or a commit id, for a stored model on Hugging Face.
local_files_only (bool, optional) – Whether or not to only look at local files (i.e., do not try to download the model).
token (bool or str, optional) – Hugging Face authentication token to download private models.
use_auth_token (bool or str, optional) – Deprecated argument. Please use token instead.
truncate_dim (int, optional) – The dimension to truncate sentence embeddings to. Defaults to None.
model_kwargs (Dict[str, Any], optional) –
Additional model configuration parameters to be passed to the Hugging Face Transformers model. Particularly useful options are:
torch_dtype: Override the default torch.dtype and load the model under a specific dtype. The different options are:
1. torch.float16, torch.bfloat16 or torch.float: load in the specified dtype, ignoring the model's config.torch_dtype if one exists. If not specified, the model is loaded in torch.float (fp32).
2. "auto": the torch_dtype entry in the model's config.json file is used if present. If that entry isn't found, the dtype of the first floating-point weight in the checkpoint is used instead. This loads the model in the dtype it was saved in at the end of training; it cannot be used as an indicator of how the model was trained, since a model may be trained in a half-precision dtype but saved in fp32.
attn_implementation: The attention implementation to use in the model (if relevant). Can be any of "eager" (manual implementation of the attention), "sdpa" (using F.scaled_dot_product_attention), or "flash_attention_2" (using Dao-AILab/flash-attention, https://github.com/Dao-AILab/flash-attention). By default, if available, SDPA is used for torch>=2.1.1. Otherwise the default is the manual "eager" implementation.
provider: If backend is "onnx", this is the provider to use for inference, for example "CPUExecutionProvider", "CUDAExecutionProvider", etc. See https://onnxruntime.ai/docs/execution-providers/ for all ONNX execution providers.
file_name: If backend is "onnx" or "openvino", this is the file name to load, useful for loading optimized or quantized ONNX or OpenVINO models.
export: If backend is "onnx" or "openvino", this is a boolean flag specifying whether this model should be exported to the backend. If not specified, the model will be exported only if the model repository or directory does not already contain an exported model.
See the PreTrainedModel.from_pretrained documentation for more details.
tokenizer_kwargs (Dict[str, Any], optional) – Additional tokenizer configuration parameters to be passed to the Hugging Face Transformers tokenizer. See the AutoTokenizer.from_pretrained documentation for more details.
config_kwargs (Dict[str, Any], optional) – Additional model configuration parameters to be passed to the Hugging Face Transformers config. See the AutoConfig.from_pretrained documentation for more details.
model_card_data (SentenceTransformerModelCardData, optional) – A model card data object that contains information about the model. This is used to generate a model card when saving the model. If not set, a default model card data object is created.
backend (str) – The backend to use for inference. Can be one of "torch" (default), "onnx", or "openvino". See https://sbert.net/docs/sentence_transformer/usage/efficiency.html for benchmarking information on the different backends.
Example
from sentence_transformers import SentenceTransformer

# Load a pre-trained SentenceTransformer model
model = SentenceTransformer('all-mpnet-base-v2')

# Encode some texts
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores between all sentences
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.6817, 0.0492],
#         [0.6817, 1.0000, 0.0421],
#         [0.0492, 0.0421, 1.0000]])
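The keyword arguments above compose; for instance, a minimal sketch of loading with a specific dtype via model_kwargs (assumes a CUDA device is available; the dtype choice is illustrative):
import torch
from sentence_transformers import SentenceTransformer

# Load the model on GPU in half precision via model_kwargs
model = SentenceTransformer(
    "all-mpnet-base-v2",
    device="cuda",
    model_kwargs={"torch_dtype": torch.float16},
)
embeddings = model.encode(["A sentence encoded in fp16."])
print(embeddings.shape)  # (1, 768)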
- __init__(model_name_or_path: str | None = None, modules: Iterable[Module] | None = None, device: str | None = None, prompts: dict[str, str] | None = None, default_prompt_name: str | None = None, similarity_fn_name: str | SimilarityFunction | None = None, cache_folder: str | None = None, trust_remote_code: bool = False, revision: str | None = None, local_files_only: bool = False, token: str | bool | None = None, use_auth_token: str | bool | None = None, truncate_dim: int | None = None, model_kwargs: dict[str, Any] | None = None, tokenizer_kwargs: dict[str, Any] | None = None, config_kwargs: dict[str, Any] | None = None, model_card_data: SentenceTransformerModelCardData | None = None, backend: Literal['torch', 'onnx', 'openvino'] = 'torch') None [source][source]¶
Initialize internal Module state, shared by both nn.Module and ScriptModule.
- get_backend() Literal['torch', 'onnx', 'openvino'] [source][source]¶
Return the backend used for inference, which can be one of “torch”, “onnx”, or “openvino”.
- Returns:
The backend used for inference.
- Return type:
str
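For instance:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
print(model.get_backend())
# => 'torch'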
- get_model_kwargs() list[str] [source][source]¶
Get the keyword arguments specific to this model for the encode, encode_query, or encode_document methods.
Example
>>> from sentence_transformers import SentenceTransformer, SparseEncoder
>>> SentenceTransformer("all-MiniLM-L6-v2").get_model_kwargs()
[]
>>> SentenceTransformer("jinaai/jina-embeddings-v4", trust_remote_code=True).get_model_kwargs()
['task', 'truncate_dim']
>>> SparseEncoder("opensearch-project/opensearch-neural-sparse-encoding-doc-v3-distill").get_model_kwargs()
['task']
- encode_query(sentences: str | list[str] | ndarray, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] | None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) list[Tensor] | ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]] [source][source]¶
Computes sentence embeddings specifically optimized for query representation.
This method is a specialized version of encode that differs in exactly two ways:
1. If no prompt_name or prompt is provided, it uses a predefined "query" prompt, if available in the model's prompts dictionary.
2. It sets the task to "query". If the model has a Router module, it will use the "query" task type to route the input through the appropriate submodules.
Tip
If you are unsure whether you should use encode, encode_query, or encode_document, your best bet is to use encode_query and encode_document for Information Retrieval tasks with clear query and document/passage distinction, and use encode for all other tasks.
Note that encode is the most general method and can be used for any task, including Information Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three methods will return identical embeddings.
- Parameters:
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary, which is either set in the constructor or loaded from the model configuration. For example, if prompt_name is "query" and prompts is {"query": "query: ", …}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is also set, this argument is ignored. Defaults to None.
prompt (Optional[str], optional) – The prompt to use for encoding. For example, if the prompt is "query: ", then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is set, prompt_name is ignored. Defaults to None.
batch_size (int, optional) – The batch size used for the computation. Defaults to 32.
show_progress_bar (bool, optional) – Whether to output a progress bar when encoding sentences. Defaults to None.
output_value (Optional[Literal["sentence_embedding", "token_embeddings"]], optional) – The type of embeddings to return: "sentence_embedding" to get sentence embeddings, "token_embeddings" to get wordpiece token embeddings, and None to get all output values. Defaults to "sentence_embedding".
precision (Literal["float32", "int8", "uint8", "binary", "ubinary"], optional) – The precision to use for the embeddings. Can be "float32", "int8", "uint8", "binary", or "ubinary". All non-float32 precisions are quantized embeddings. Quantized embeddings are smaller in size and faster to compute, but may have a lower accuracy. They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks. Defaults to "float32".
convert_to_numpy (bool, optional) – Whether the output should be a list of numpy vectors. If False, it is a list of PyTorch tensors. Defaults to True.
convert_to_tensor (bool, optional) – Whether the output should be one large tensor. Overwrites convert_to_numpy. Defaults to False.
device (Union[str, List[str], None], optional) – Device(s) to use for computation. Can be:
A single device string (e.g., "cuda:0", "cpu") for single-process encoding
A list of device strings (e.g., ["cuda:0", "cuda:1"], ["cpu", "cpu", "cpu", "cpu"]) to distribute encoding across multiple processes
None to auto-detect the available device for single-process encoding
If a list is provided, multi-process encoding will be used. Defaults to None.
normalize_embeddings (bool, optional) – Whether to normalize returned vectors to have length 1. In that case, the faster dot-product (util.dot_score) can be used instead of cosine similarity. Defaults to False.
truncate_dim (int, optional) – The dimension to truncate sentence embeddings to. Truncation is especially interesting for Matryoshka models (https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html), i.e. models that are trained to still produce useful embeddings even if the embedding dimension is reduced. Truncated embeddings require less memory and are faster to perform retrieval with, but note that inference is just as fast, and the embedding performance is worse than with the full embeddings. If None, the truncate_dim from the model initialization is used. Defaults to None.
pool (Dict[Literal["input", "output", "processes"], Any], optional) – A pool created by start_multi_process_pool() for multi-process encoding. If provided, the encoding will be distributed across multiple processes. This is recommended for large datasets and when multiple GPUs are available. Defaults to None.
chunk_size (int, optional) – Size of chunks for multi-process encoding. Only used with multiprocessing, i.e. when pool is not None or device is a list. If None, a sensible default is calculated. Defaults to None.
- Returns:
By default, a 2d numpy array with shape [num_inputs, output_dimension] is returned. If only one string input is provided, then the output is a 1d array with shape [output_dimension]. If convert_to_tensor, a torch Tensor is returned instead. If self.truncate_dim <= output_dimension, then output_dimension is self.truncate_dim.
- Return type:
Union[List[Tensor], ndarray, Tensor]
Example
from sentence_transformers import SentenceTransformer

# Load a pre-trained SentenceTransformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# Encode some queries
queries = [
    "What are the effects of climate change?",
    "History of artificial intelligence",
    "Technical specifications product XYZ",
]

# Using query-specific encoding
embeddings = model.encode_query(queries)
print(embeddings.shape)
# (3, 1024)
- encode_document(sentences: str | list[str] | ndarray, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] | None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) list[Tensor] | ndarray | Tensor | dict[str, Tensor] | list[dict[str, Tensor]] [source][source]¶
Computes sentence embeddings specifically optimized for document/passage representation.
This method is a specialized version of encode that differs in exactly two ways:
1. If no prompt_name or prompt is provided, it uses a predefined "document" prompt, if available in the model's prompts dictionary.
2. It sets the task to "document". If the model has a Router module, it will use the "document" task type to route the input through the appropriate submodules.
Tip
If you are unsure whether you should use encode, encode_query, or encode_document, your best bet is to use encode_query and encode_document for Information Retrieval tasks with clear query and document/passage distinction, and use encode for all other tasks.
Note that encode is the most general method and can be used for any task, including Information Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three methods will return identical embeddings.
- Parameters:
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary, which is either set in the constructor or loaded from the model configuration. For example, if prompt_name is "query" and prompts is {"query": "query: ", …}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is also set, this argument is ignored. Defaults to None.
prompt (Optional[str], optional) – The prompt to use for encoding. For example, if the prompt is "query: ", then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is set, prompt_name is ignored. Defaults to None.
batch_size (int, optional) – The batch size used for the computation. Defaults to 32.
show_progress_bar (bool, optional) – Whether to output a progress bar when encoding sentences. Defaults to None.
output_value (Optional[Literal["sentence_embedding", "token_embeddings"]], optional) – The type of embeddings to return: "sentence_embedding" to get sentence embeddings, "token_embeddings" to get wordpiece token embeddings, and None to get all output values. Defaults to "sentence_embedding".
precision (Literal["float32", "int8", "uint8", "binary", "ubinary"], optional) – The precision to use for the embeddings. Can be "float32", "int8", "uint8", "binary", or "ubinary". All non-float32 precisions are quantized embeddings. Quantized embeddings are smaller in size and faster to compute, but may have a lower accuracy. They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks. Defaults to "float32".
convert_to_numpy (bool, optional) – Whether the output should be a list of numpy vectors. If False, it is a list of PyTorch tensors. Defaults to True.
convert_to_tensor (bool, optional) – Whether the output should be one large tensor. Overwrites convert_to_numpy. Defaults to False.
device (Union[str, List[str], None], optional) – Device(s) to use for computation. Can be:
A single device string (e.g., "cuda:0", "cpu") for single-process encoding
A list of device strings (e.g., ["cuda:0", "cuda:1"], ["cpu", "cpu", "cpu", "cpu"]) to distribute encoding across multiple processes
None to auto-detect the available device for single-process encoding
If a list is provided, multi-process encoding will be used. Defaults to None.
normalize_embeddings (bool, optional) – Whether to normalize returned vectors to have length 1. In that case, the faster dot-product (util.dot_score) can be used instead of cosine similarity. Defaults to False.
truncate_dim (int, optional) – The dimension to truncate sentence embeddings to. Truncation is especially interesting for Matryoshka models (https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html), i.e. models that are trained to still produce useful embeddings even if the embedding dimension is reduced. Truncated embeddings require less memory and are faster to perform retrieval with, but note that inference is just as fast, and the embedding performance is worse than with the full embeddings. If None, the truncate_dim from the model initialization is used. Defaults to None.
pool (Dict[Literal["input", "output", "processes"], Any], optional) – A pool created by start_multi_process_pool() for multi-process encoding. If provided, the encoding will be distributed across multiple processes. This is recommended for large datasets and when multiple GPUs are available. Defaults to None.
chunk_size (int, optional) – Size of chunks for multi-process encoding. Only used with multiprocessing, i.e. when pool is not None or device is a list. If None, a sensible default is calculated. Defaults to None.
- Returns:
By default, a 2d numpy array with shape [num_inputs, output_dimension] is returned. If only one string input is provided, then the output is a 1d array with shape [output_dimension]. If convert_to_tensor, a torch Tensor is returned instead. If self.truncate_dim <= output_dimension, then output_dimension is self.truncate_dim.
- Return type:
Union[List[Tensor], ndarray, Tensor]
Example
from sentence_transformers import SentenceTransformer

# Load a pre-trained SentenceTransformer model
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1")

# Encode some documents
documents = [
    "This research paper discusses the effects of climate change on marine life.",
    "The article explores the history of artificial intelligence development.",
    "This document contains technical specifications for the new product line.",
]

# Using document-specific encoding
embeddings = model.encode_document(documents)
print(embeddings.shape)
# (3, 1024)
- encode(sentences: str, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: Literal[False] = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) Tensor [source][source]¶
- encode(sentences: str | list[str] | np.ndarray, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: Literal[True] = True, convert_to_tensor: Literal[False] = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) np.ndarray
- encode(sentences: str | list[str] | np.ndarray, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: Literal[True] = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) Tensor
- encode(sentences: list[str] | np.ndarray, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['sentence_embedding', 'token_embeddings'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) list[Tensor]
- encode(sentences: list[str] | np.ndarray, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) list[dict[str, Tensor]]
- encode(sentences: str, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: None = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) dict[str, Tensor]
- encode(sentences: str, prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, show_progress_bar: bool | None = None, output_value: Literal['token_embeddings'] = 'sentence_embedding', precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', convert_to_numpy: bool = True, convert_to_tensor: bool = False, device: str | list[str | torch.device] | None = None, normalize_embeddings: bool = False, truncate_dim: int | None = None, pool: dict[Literal['input', 'output', 'processes'], Any] | None = None, chunk_size: int | None = None, **kwargs) Tensor
Computes sentence embeddings.
Tip
If you are unsure whether you should use encode, encode_query, or encode_document, your best bet is to use encode_query and encode_document for Information Retrieval tasks with clear query and document/passage distinction, and use encode for all other tasks.
Note that encode is the most general method and can be used for any task, including Information Retrieval, and that if the model was not trained with predefined prompts and/or task types, then all three methods will return identical embeddings.
- Parameters:
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary, which is either set in the constructor or loaded from the model configuration. For example, if prompt_name is "query" and prompts is {"query": "query: ", …}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is also set, this argument is ignored. Defaults to None.
prompt (Optional[str], optional) – The prompt to use for encoding. For example, if the prompt is "query: ", then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is set, prompt_name is ignored. Defaults to None.
batch_size (int, optional) – The batch size used for the computation. Defaults to 32.
show_progress_bar (bool, optional) – Whether to output a progress bar when encoding sentences. Defaults to None.
output_value (Optional[Literal["sentence_embedding", "token_embeddings"]], optional) – The type of embeddings to return: "sentence_embedding" to get sentence embeddings, "token_embeddings" to get wordpiece token embeddings, and None to get all output values. Defaults to "sentence_embedding".
precision (Literal["float32", "int8", "uint8", "binary", "ubinary"], optional) – The precision to use for the embeddings. Can be "float32", "int8", "uint8", "binary", or "ubinary". All non-float32 precisions are quantized embeddings. Quantized embeddings are smaller in size and faster to compute, but may have a lower accuracy. They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks. Defaults to "float32".
convert_to_numpy (bool, optional) – Whether the output should be a list of numpy vectors. If False, it is a list of PyTorch tensors. Defaults to True.
convert_to_tensor (bool, optional) – Whether the output should be one large tensor. Overwrites convert_to_numpy. Defaults to False.
device (Union[str, List[str], None], optional) – Device(s) to use for computation. Can be:
A single device string (e.g., "cuda:0", "cpu") for single-process encoding
A list of device strings (e.g., ["cuda:0", "cuda:1"], ["cpu", "cpu", "cpu", "cpu"]) to distribute encoding across multiple processes
None to auto-detect the available device for single-process encoding
If a list is provided, multi-process encoding will be used. Defaults to None.
normalize_embeddings (bool, optional) – Whether to normalize returned vectors to have length 1. In that case, the faster dot-product (util.dot_score) can be used instead of cosine similarity. Defaults to False.
truncate_dim (int, optional) – The dimension to truncate sentence embeddings to. Truncation is especially interesting for Matryoshka models (https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html), i.e. models that are trained to still produce useful embeddings even if the embedding dimension is reduced. Truncated embeddings require less memory and are faster to perform retrieval with, but note that inference is just as fast, and the embedding performance is worse than with the full embeddings. If None, the truncate_dim from the model initialization is used. Defaults to None.
pool (Dict[Literal["input", "output", "processes"], Any], optional) – A pool created by start_multi_process_pool() for multi-process encoding. If provided, the encoding will be distributed across multiple processes. This is recommended for large datasets and when multiple GPUs are available. Defaults to None.
chunk_size (int, optional) – Size of chunks for multi-process encoding. Only used with multiprocessing, i.e. when pool is not None or device is a list. If None, a sensible default is calculated. Defaults to None.
- Returns:
By default, a 2d numpy array with shape [num_inputs, output_dimension] is returned. If only one string input is provided, then the output is a 1d array with shape [output_dimension]. If convert_to_tensor, a torch Tensor is returned instead. If self.truncate_dim <= output_dimension, then output_dimension is self.truncate_dim.
- Return type:
Union[List[Tensor], ndarray, Tensor]
Example
from sentence_transformers import SentenceTransformer

# Load a pre-trained SentenceTransformer model
model = SentenceTransformer("all-mpnet-base-v2")

# Encode some texts
sentences = [
    "The weather is lovely today.",
    "It's so sunny outside!",
    "He drove to the stadium.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)
- forward(input: dict[str, Tensor], **kwargs) dict[str, Tensor] [source][source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
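As a rough sketch (loading on CPU so the tokenized features and the model share a device), calling the instance runs the full module sequence and returns a feature dictionary:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2", device="cpu")
features = model.tokenize(["The weather is lovely today."])
output = model(features)  # call the instance, not forward() directly
print(output["sentence_embedding"].shape)
# => torch.Size([1, 384])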
- property similarity_fn_name: Literal['cosine', 'dot', 'euclidean', 'manhattan'][source]¶
Return the name of the similarity function used by SentenceTransformer.similarity and SentenceTransformer.similarity_pairwise.
- Returns:
The name of the similarity function. Can be None if not set, in which case it will default to "cosine" when first called.
- Return type:
Optional[str]
Example
>>> model = SentenceTransformer("multi-qa-mpnet-base-dot-v1")
>>> model.similarity_fn_name
'dot'
- property similarity: Callable[[Tensor | ndarray[tuple[int, ...], dtype[float32]], Tensor | ndarray[tuple[int, ...], dtype[float32]]], Tensor][source]¶
Compute the similarity between two collections of embeddings. The output will be a matrix with the similarity scores between all embeddings from the first parameter and all embeddings from the second parameter. This differs from similarity_pairwise, which computes the similarity between each pair of embeddings. This method supports only embeddings with fp32 precision and does not accommodate quantized embeddings.
- Parameters:
embeddings1 (Union[Tensor, ndarray]) – [num_embeddings_1, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
embeddings2 (Union[Tensor, ndarray]) – [num_embeddings_2, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
- Returns:
A [num_embeddings_1, num_embeddings_2]-shaped torch tensor with similarity scores.
- Return type:
Tensor
Example
>>> model = SentenceTransformer("all-mpnet-base-v2")
>>> sentences = [
...     "The weather is so nice!",
...     "It's so sunny outside.",
...     "He's driving to the movie theater.",
...     "She's going to the cinema.",
... ]
>>> embeddings = model.encode(sentences, normalize_embeddings=True)
>>> model.similarity(embeddings, embeddings)
tensor([[1.0000, 0.7235, 0.0290, 0.1309],
        [0.7235, 1.0000, 0.0613, 0.1129],
        [0.0290, 0.0613, 1.0000, 0.5027],
        [0.1309, 0.1129, 0.5027, 1.0000]])
>>> model.similarity_fn_name
"cosine"
>>> model.similarity_fn_name = "euclidean"
>>> model.similarity(embeddings, embeddings)
tensor([[-0.0000, -0.7437, -1.3935, -1.3184],
        [-0.7437, -0.0000, -1.3702, -1.3320],
        [-1.3935, -1.3702, -0.0000, -0.9973],
        [-1.3184, -1.3320, -0.9973, -0.0000]])
- property similarity_pairwise: Callable[[Tensor | ndarray[tuple[int, ...], dtype[float32]], Tensor | ndarray[tuple[int, ...], dtype[float32]]], Tensor][source]¶
Compute the similarity between two collections of embeddings. The output will be a vector with the similarity scores between each pair of embeddings. This method supports only embeddings with fp32 precision and does not accommodate quantized embeddings.
- Parameters:
embeddings1 (Union[Tensor, ndarray]) – [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
embeddings2 (Union[Tensor, ndarray]) – [num_embeddings, embedding_dim] or [embedding_dim]-shaped numpy array or torch tensor.
- Returns:
A [num_embeddings]-shaped torch tensor with pairwise similarity scores.
- Return type:
Tensor
Example
>>> model = SentenceTransformer("all-mpnet-base-v2")
>>> sentences = [
...     "The weather is so nice!",
...     "It's so sunny outside.",
...     "He's driving to the movie theater.",
...     "She's going to the cinema.",
... ]
>>> embeddings = model.encode(sentences, normalize_embeddings=True)
>>> model.similarity_pairwise(embeddings[::2], embeddings[1::2])
tensor([0.7235, 0.5027])
>>> model.similarity_fn_name
"cosine"
>>> model.similarity_fn_name = "euclidean"
>>> model.similarity_pairwise(embeddings[::2], embeddings[1::2])
tensor([-0.7437, -0.9973])
- start_multi_process_pool(target_devices: list[str] | None = None) dict[Literal['input', 'output', 'processes'], Any] [source][source]¶
Starts a multi-process pool to process the encoding with several independent processes via SentenceTransformer.encode_multi_process.
This method is recommended if you want to encode on multiple GPUs or CPUs. It is advised to start only one process per GPU. This method works together with encode_multi_process and stop_multi_process_pool.
- Parameters:
target_devices (List[str], optional) – PyTorch target devices, e.g. [“cuda:0”, “cuda:1”, …], [“npu:0”, “npu:1”, …], or [“cpu”, “cpu”, “cpu”, “cpu”]. If target_devices is None and CUDA/NPU is available, then all available CUDA/NPU devices will be used. If target_devices is None and CUDA/NPU is not available, then 4 CPU devices will be used.
- Returns:
A dictionary with the target processes, an input queue, and an output queue.
- Return type:
Dict[str, Any]
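A minimal sketch of the pool workflow, passing the pool to encode as suggested by the deprecation note on encode_multi_process below (the two-CPU device list is only illustrative):
from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("all-MiniLM-L6-v2")
    sentences = ["The weather is lovely today.", "It's so sunny outside!"] * 500

    pool = model.start_multi_process_pool(target_devices=["cpu", "cpu"])
    embeddings = model.encode(sentences, pool=pool)
    model.stop_multi_process_pool(pool)
    print(embeddings.shape)  # => (1000, 384)

if __name__ == "__main__":
    main()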
- static stop_multi_process_pool(pool: dict[Literal['input', 'output', 'processes'], Any]) None [source][source]¶
Stops all processes started with start_multi_process_pool.
- encode_multi_process(sentences: list[str], pool: dict[Literal['input', 'output', 'processes'], Any], prompt_name: str | None = None, prompt: str | None = None, batch_size: int = 32, chunk_size: int | None = None, show_progress_bar: bool | None = None, precision: Literal['float32', 'int8', 'uint8', 'binary', 'ubinary'] = 'float32', normalize_embeddings: bool = False, truncate_dim: int | None = None) ndarray [source][source]¶
Warning
This method is deprecated. You can now call SentenceTransformer.encode with the same parameters instead, which will automatically handle multi-process encoding using the provided pool.
Encodes a list of sentences using multiple processes and GPUs via SentenceTransformer.encode. The sentences are chunked into smaller packages and sent to individual processes, which encode them on different GPUs or CPUs. This method is only suitable for encoding large sets of sentences.
- Parameters:
sentences (List[str]) – List of sentences to encode.
pool (Dict[Literal["input", "output", "processes"], Any]) – A pool of workers started with SentenceTransformer.start_multi_process_pool.
prompt_name (Optional[str], optional) – The name of the prompt to use for encoding. Must be a key in the prompts dictionary, which is either set in the constructor or loaded from the model configuration. For example, if prompt_name is "query" and prompts is {"query": "query: ", …}, then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is also set, this argument is ignored. Defaults to None.
prompt (Optional[str], optional) – The prompt to use for encoding. For example, if the prompt is "query: ", then the sentence "What is the capital of France?" will be encoded as "query: What is the capital of France?" because the sentence is appended to the prompt. If prompt is set, prompt_name is ignored. Defaults to None.
batch_size (int) – Encode sentences with batch size. Defaults to 32.
chunk_size (int) – Sentences are chunked and sent to the individual processes. If None, it determines a sensible size. Defaults to None.
show_progress_bar (bool, optional) – Whether to output a progress bar when encoding sentences. Defaults to None.
precision (Literal["float32", "int8", "uint8", "binary", "ubinary"]) – The precision to use for the embeddings. Can be “float32”, “int8”, “uint8”, “binary”, or “ubinary”. All non-float32 precisions are quantized embeddings. Quantized embeddings are smaller in size and faster to compute, but may have lower accuracy. They are useful for reducing the size of the embeddings of a corpus for semantic search, among other tasks. Defaults to “float32”.
normalize_embeddings (bool) – Whether to normalize returned vectors to have length 1. In that case, the faster dot-product (util.dot_score) instead of cosine similarity can be used. Defaults to False.
truncate_dim (int, optional) – The dimension to truncate sentence embeddings to. Truncation is especially interesting for Matryoshka models (https://sbert.net/examples/sentence_transformer/training/matryoshka/README.html), i.e. models that are trained to still produce useful embeddings even if the embedding dimension is reduced. Truncated embeddings require less memory and are faster to perform retrieval with, but note that inference is just as fast, and the embedding performance is worse than with the full embeddings. If None, the truncate_dim from the model initialization is used. Defaults to None.
- Returns:
A 2D numpy array with shape [num_inputs, output_dimension].
- Return type:
np.ndarray
Example
from sentence_transformers import SentenceTransformer

def main():
    model = SentenceTransformer("all-mpnet-base-v2")
    sentences = [
        "The weather is so nice!",
        "It's so sunny outside.",
        "He's driving to the movie theater.",
        "She's going to the cinema.",
    ] * 1000

    pool = model.start_multi_process_pool()
    embeddings = model.encode_multi_process(sentences, pool)
    model.stop_multi_process_pool(pool)

    print(embeddings.shape)
    # => (4000, 768)

if __name__ == "__main__":
    main()
- set_pooling_include_prompt(include_prompt: bool) None [source][source]¶
Sets the include_prompt attribute in the pooling layer in the model, if there is one.
This is useful for INSTRUCTOR models, as the prompt should be excluded from the pooling strategy for these models.
- Parameters:
include_prompt (bool) – Whether to include the prompt in the pooling layer.
- Returns:
None
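A brief sketch (the INSTRUCTOR-style checkpoint name is illustrative):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("hkunlp/instructor-base")
# Exclude the instruction prompt tokens from mean pooling
model.set_pooling_include_prompt(include_prompt=False)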
- get_max_seq_length() int | None [source][source]¶
Returns the maximal sequence length that the model accepts. Longer inputs will be truncated.
- Returns:
The maximal sequence length that the model accepts, or None if it is not defined.
- Return type:
Optional[int]
- tokenize(texts: list[str] | list[dict] | list[tuple[str, str]], **kwargs) dict[str, Tensor] [source][source]¶
Tokenizes the texts.
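A short sketch of the returned features (the exact keys and padded length depend on the underlying tokenizer):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
features = model.tokenize(["The weather is lovely today.", "It's so sunny outside!"])
print(sorted(features.keys()))
# e.g. ['attention_mask', 'input_ids', 'token_type_ids']
print(features["input_ids"].shape)
# e.g. torch.Size([2, 9])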
- get_sentence_embedding_dimension() int | None [source][source]¶
Returns the number of dimensions in the output of SentenceTransformer.encode.
- Returns:
The number of dimensions in the output of encode. If it's not known, it's None.
- Return type:
Optional[int]
- truncate_sentence_embeddings(truncate_dim: int | None) Iterator[None] [source][source]¶
In this context, SentenceTransformer.encode outputs sentence embeddings truncated at dimension truncate_dim.
This may be useful when you are using the same model for different applications where different dimensions are needed.
- Parameters:
truncate_dim (int, optional) – The dimension to truncate sentence embeddings to. None does no truncation.
Example
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

with model.truncate_sentence_embeddings(truncate_dim=16):
    embeddings_truncated = model.encode(["hello there", "hiya"])
assert embeddings_truncated.shape[-1] == 16
- save(path: str, model_name: str | None = None, create_model_card: bool = True, train_datasets: list[str] | None = None, safe_serialization: bool = True) None [source][source]¶
Saves a model and its configuration files to a directory, so that it can be loaded with SentenceTransformer(path) again.
- Parameters:
path (str) – Path on disk where the model will be saved.
model_name (str, optional) – Optional model name.
create_model_card (bool, optional) – If True, create a README.md with basic information about this model.
train_datasets (List[str], optional) – Optional list with the names of the datasets used to train the model.
safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save the model the traditional (but unsafe) PyTorch way.
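For instance:
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
model.save("output/my-model")

# The saved directory can be loaded again like any other model
reloaded = SentenceTransformer("output/my-model")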
- save_pretrained(path: str, model_name: str | None = None, create_model_card: bool = True, train_datasets: list[str] | None = None, safe_serialization: bool = True) None [source][source]¶
Saves a model and its configuration files to a directory, so that it can be loaded with SentenceTransformer(path) again.
- Parameters:
path (str) – Path on disk where the model will be saved.
model_name (str, optional) – Optional model name.
create_model_card (bool, optional) – If True, create a README.md with basic information about this model.
train_datasets (List[str], optional) – Optional list with the names of the datasets used to train the model.
safe_serialization (bool, optional) – If True, save the model using safetensors. If False, save the model the traditional (but unsafe) PyTorch way.
- save_to_hub(repo_id: str, organization: str | None = None, token: str | None = None, private: bool | None = None, safe_serialization: bool = True, commit_message: str = 'Add new SentenceTransformer model.', local_model_path: str | None = None, exist_ok: bool = False, replace_model_card: bool = False, train_datasets: list[str] | None = None) str [source][source]¶
DEPRECATED, use push_to_hub instead.
Uploads all elements of this Sentence Transformer to a new HuggingFace Hub repository.
- Parameters:
repo_id (str) – Repository name for your model in the Hub, including the user or organization.
token (str, optional) – An authentication token (See https://huggingface.co/settings/token)
private (bool, optional) – Set to true, for hosting a private model
safe_serialization (bool, optional) – If true, save the model using safetensors. If false, save the model the traditional PyTorch way
commit_message (str, optional) – Message to commit while pushing.
local_model_path (str, optional) – Path of the model locally. If set, this file path will be uploaded. Otherwise, the current model will be uploaded
exist_ok (bool, optional) – If true, saving to an existing repository is OK. If false, saving only to a new repository is possible
replace_model_card (bool, optional) – If true, replace an existing model card in the hub with the automatically created model card
train_datasets (List[str], optional) – Datasets used to train the model. If set, the datasets will be added to the model card in the Hub.
- Returns:
The url of the commit of your model in the repository on the Hugging Face Hub.
- Return type:
str
- push_to_hub(repo_id: str, token: str | None = None, private: bool | None = None, safe_serialization: bool = True, commit_message: str | None = None, local_model_path: str | None = None, exist_ok: bool = False, replace_model_card: bool = False, train_datasets: list[str] | None = None, revision: str | None = None, create_pr: bool = False) str [source][source]¶
Uploads all elements of this Sentence Transformer to a new HuggingFace Hub repository.
- Parameters:
repo_id (str) – Repository name for your model in the Hub, including the user or organization.
token (str, optional) – An authentication token (See https://huggingface.co/settings/token)
private (bool, optional) – Set to true, for hosting a private model
safe_serialization (bool, optional) – If true, save the model using safetensors. If false, save the model the traditional PyTorch way
commit_message (str, optional) – Message to commit while pushing.
local_model_path (str, optional) – Path of the model locally. If set, this file path will be uploaded. Otherwise, the current model will be uploaded
exist_ok (bool, optional) – If true, saving to an existing repository is OK. If false, saving only to a new repository is possible
replace_model_card (bool, optional) – If true, replace an existing model card in the hub with the automatically created model card
train_datasets (List[str], optional) – Datasets used to train the model. If set, the datasets will be added to the model card in the Hub.
revision (str, optional) – Branch to push the uploaded files to
create_pr (bool, optional) – If True, create a pull request instead of pushing directly to the main branch
- Returns:
The url of the commit of your model in the repository on the Hugging Face Hub.
- Return type:
str
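A minimal sketch (assumes you are authenticated with the Hugging Face Hub; the repository id is illustrative):
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
url = model.push_to_hub("my-username/my-finetuned-model", private=True)
print(url)  # URL of the commit on the Hub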
- evaluate(evaluator: SentenceEvaluator, output_path: str | None = None) dict[str, float] | float [source][source]¶
Evaluate the model based on an evaluator
- Parameters:
evaluator (SentenceEvaluator) – The evaluator used to evaluate the model.
output_path (str, optional) – The path where the evaluator can write the results. Defaults to None.
- Returns:
The evaluation results.
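A small sketch with one of the built-in evaluators (the sentence pairs and scores are made up):
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator

model = SentenceTransformer("all-MiniLM-L6-v2")
evaluator = EmbeddingSimilarityEvaluator(
    sentences1=["The cat sits outside", "A man is playing guitar"],
    sentences2=["The dog plays in the garden", "A woman watches TV"],
    scores=[0.3, 0.1],
    name="toy-sts",
)
results = model.evaluate(evaluator)
print(results)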
- static load(input_path) SentenceTransformer [source][source]¶
- property device: device[source]¶
Get torch.device from module, assuming that the whole module has one device. In case there are no PyTorch parameters, fall back to CPU.
- property max_seq_length: int[source]¶
Returns the maximal input sequence length for the model. Longer inputs will be truncated.
- Returns:
The maximal input sequence length.
- Return type:
int
Example
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")
print(model.max_seq_length)
# => 384
- property transformers_model: PreTrainedModel | None[source]¶
Property to get the underlying transformers PreTrainedModel instance, if it exists. Note that it’s possible for a model to have multiple underlying transformers models, but this property will return the first one it finds in the module hierarchy.
- Returns:
The underlying transformers model or None if not found.
- Return type:
PreTrainedModel or None
Example
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

# You can now access the underlying transformers model
transformers_model = model.transformers_model
print(type(transformers_model))
# => <class 'transformers.models.mpnet.modeling_mpnet.MPNetModel'>
- class tooluniverse.tool_finder_embedding.BaseTool(tool_config)[source][source]¶
Bases: object
- classmethod get_default_config_file()[source][source]¶
Get the path to the default configuration file for this tool type.
This method uses a robust path resolution strategy that works across different installation scenarios:
Installed packages: Uses importlib.resources for proper package resource access
Development mode: Falls back to file-based path resolution
Legacy Python: Handles importlib.resources and importlib_resources
Override this method in subclasses to specify a custom defaults file.
- Returns:
Path or resource object pointing to the defaults file
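As a rough illustration of overriding this hook in a subclass (the file name and layout are assumptions, not part of the documented API):
from pathlib import Path

from tooluniverse.tool_finder_embedding import BaseTool

class MyTool(BaseTool):
    @classmethod
    def get_default_config_file(cls):
        # Point at a defaults file shipped next to this module (hypothetical layout)
        return Path(__file__).parent / "data" / "my_tool_defaults.json"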
- tooluniverse.tool_finder_embedding.register_tool(tool_type_name=None, config=None)[source][source]¶
Decorator to automatically register tool classes and their configs.
- Usage:
@register_tool('CustomToolName', config={…})
class MyTool:
    pass
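A slightly fuller sketch combining the decorator with BaseTool (the tool name, config keys, and run behaviour are purely illustrative):
from tooluniverse.tool_finder_embedding import BaseTool, register_tool

@register_tool('EchoTool', config={"name": "EchoTool", "description": "Echoes its input back."})
class EchoTool(BaseTool):
    def run(self, arguments):
        # A real tool would do useful work here; this one just echoes the query.
        return arguments.get("query", "")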
- class tooluniverse.tool_finder_embedding.ToolFinderEmbedding(tool_config, tooluniverse)[source][source]¶
Bases: BaseTool
A tool finder model that uses RAG (Retrieval-Augmented Generation) to find relevant tools based on user queries using semantic similarity search.
This class leverages sentence transformers to encode tool descriptions and find the most relevant tools for a given query through embedding-based similarity matching.
- __init__(tool_config, tooluniverse)[source][source]¶
Initialize the ToolFinderEmbedding with configuration and RAG model.
- Parameters:
tool_config (dict) – Configuration dictionary for the tool
- load_rag_model()[source][source]¶
Load the sentence transformer model for RAG-based tool retrieval.
Configures the model with appropriate sequence length and tokenizer settings for optimal performance in tool description encoding.
- load_tool_desc_embedding(tooluniverse, include_names=None, exclude_names=None, include_categories=None, exclude_categories=None)[source][source]¶
Load or generate embeddings for tool descriptions from the tool universe.
This method either loads cached embeddings from disk or generates new ones by encoding all tool descriptions. Embeddings are cached to disk for faster subsequent loads. Memory is properly cleaned up after embedding generation to avoid OOM issues.
- Parameters:
tooluniverse – ToolUniverse instance containing all available tools
include_names (list, optional) – Specific tool names to include
exclude_names (list, optional) – Tool names to exclude
include_categories (list, optional) – Tool categories to include
exclude_categories (list, optional) – Tool categories to exclude
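For orientation, a hypothetical setup might look like the following; the ToolUniverse import, its load_tools() call, the config keys, and the category name are assumptions rather than documented API:
from tooluniverse import ToolUniverse
from tooluniverse.tool_finder_embedding import ToolFinderEmbedding

tu = ToolUniverse()
tu.load_tools()  # hypothetical: populate the tool universe

finder = ToolFinderEmbedding(tool_config={"name": "ToolFinderEmbedding"}, tooluniverse=tu)
finder.load_tool_desc_embedding(tu, exclude_categories=["special_tools"])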
- rag_infer(query, top_k=5)[source][source]¶
Perform RAG inference to find the most relevant tools for a given query.
Uses semantic similarity between the query embedding and pre-computed tool embeddings to identify the most relevant tools.
- Parameters:
query (str) – Query text to find relevant tools for.
top_k (int, optional) – Number of top tools to return. Defaults to 5.
- Returns:
List of top-k tool names ranked by relevance to the query
- Return type:
list
- Raises:
SystemExit – If tool_desc_embedding is not loaded
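Continuing the hypothetical finder from the sketch above, a retrieval call could look like (the query is illustrative):
top_tools = finder.rag_infer("Find adverse event reports for a given drug", top_k=5)
print(top_tools)
# e.g. ['tool_a', 'tool_b', ...]  (names depend on the loaded tool universe)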
- find_tools(message=None, picked_tool_names=None, rag_num=5, return_call_result=False, categories=None)[source][source]¶
Find relevant tools based on a message or pre-selected tool names.
This method either uses RAG inference to find tools based on a message or processes a list of pre-selected tool names. It filters out special tools and returns tool prompts suitable for use in agent workflows.
- Parameters:
message (str, optional) – Query message to find tools for. Required if picked_tool_names is None.
picked_tool_names (list, optional) – Pre-selected tool names to process. Required if message is None.
rag_num (int, optional) – Number of tools to return after filtering. Defaults to 5.
return_call_result (bool, optional) – If True, returns both prompts and tool names. Defaults to False.
categories (list, optional) – List of tool categories to filter by. Currently not implemented for embedding-based search.
- Returns:
If return_call_result is False: Tool prompts as a formatted string
If return_call_result is True: Tuple of (tool_prompts, tool_names)
- Return type:
str or tuple
- Raises:
AssertionError – If both message and picked_tool_names are None
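Again continuing the hypothetical finder, two usage sketches (tool names are illustrative):
# RAG-based lookup from a natural-language message
prompts = finder.find_tools(message="I need tools for drug safety information", rag_num=3)

# Format prompts for tool names that were already selected elsewhere
prompts, names = finder.find_tools(
    picked_tool_names=["tool_a", "tool_b"],
    return_call_result=True,
)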
- run(arguments)[source][source]¶
Run the tool finder with given arguments following the standard tool interface.
This is the main entry point for using ToolFinderEmbedding as a standard tool. It extracts parameters from the arguments dictionary and delegates to find_tools().
- Parameters:
arguments (dict) – Dictionary containing:
- description (str, optional): Query message to find tools for (maps to 'message')
- limit (int, optional): Number of tools to return (maps to 'rag_num'). Defaults to 5.
- picked_tool_names (list, optional): Pre-selected tool names to process
- return_call_result (bool, optional): Whether to return both prompts and names. Defaults to False.
- categories (list, optional): List of tool categories to filter by
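A final sketch of the standard tool interface, using the argument keys listed above (the description text is illustrative):
result = finder.run({
    "description": "Find tools that retrieve adverse event reports for a drug",
    "limit": 5,
})
print(result)  # formatted tool prompts as a string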