tooluniverse.embedding_database module

Embedding Database Tool for ToolUniverse

A unified tool for managing embedding databases with FAISS vector search and SQLite metadata storage. Supports creating databases from documents, adding documents, searching, and loading existing databases. Uses OpenAI’s embedding models for text-to-vector conversion, with support for Azure OpenAI.
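The pairing described above, a vector index for similarity search plus SQLite for document metadata, can be illustrated with a small standard-library sketch. This is not the module's implementation: a brute-force cosine scan stands in for FAISS, hand-written three-dimensional vectors stand in for OpenAI embeddings, and the names `docs`, `add_doc`, and `search` are hypothetical.

```python
import math
import sqlite3

# In-memory SQLite table holding document text, keyed by row id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT)")

# Vectors live outside SQLite, keyed by the same row id (FAISS would hold these).
vectors = {}


def add_doc(text, vector):
    """Store the text in SQLite and the vector in the in-memory index."""
    cur = conn.execute("INSERT INTO docs (text) VALUES (?)", (text,))
    vectors[cur.lastrowid] = vector
    return cur.lastrowid


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def search(query_vector, top_k=1):
    """Rank stored vectors by similarity, then fetch their text from SQLite."""
    ranked = sorted(
        vectors, key=lambda i: cosine(query_vector, vectors[i]), reverse=True
    )
    results = []
    for doc_id in ranked[:top_k]:
        row = conn.execute(
            "SELECT text FROM docs WHERE id = ?", (doc_id,)
        ).fetchone()
        results.append((doc_id, row[0]))
    return results


add_doc("aspirin dosage", [1.0, 0.0, 0.0])
add_doc("protein folding", [0.0, 1.0, 0.0])
hits = search([0.9, 0.1, 0.0])
```

Keeping the vectors and the metadata under a shared integer id is what lets a similarity hit be mapped back to its stored text.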

class tooluniverse.embedding_database.Path(*args, **kwargs)[source][source]

Bases: PurePath

PurePath subclass that can make system calls.

Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.

classmethod cwd()[source][source]

Return a new path pointing to the current working directory (as returned by os.getcwd()).

classmethod home()[source][source]

Return a new path pointing to the user’s home directory (as returned by os.path.expanduser('~')).

samefile(other_path)[source][source]

Return whether other_path refers to the same file as this path (as returned by os.path.samefile()).

iterdir()[source][source]

Iterate over the files in this directory. Does not yield any result for the special paths '.' and '..'.

glob(pattern)[source][source]

Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.

rglob(pattern)[source][source]

Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
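The difference between glob() and rglob() can be seen in a short sketch. It assumes, as the docstrings here suggest, that this class is the standard library's pathlib.Path; the file and directory names are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("top level")
    (root / "sub" / "b.txt").write_text("nested")

    # glob() matches the pattern relative to root, so only the top-level file.
    top_only = sorted(p.name for p in root.glob("*.txt"))
    # rglob() also searches every directory below root.
    everywhere = sorted(p.name for p in root.rglob("*.txt"))
```

Both methods return lazy iterators, which is why the results are collected inside the `with` block before the temporary directory is removed.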

absolute()[source][source]

Return an absolute version of this path. This function works even if the path doesn’t point to anything.

No normalization is done, i.e. all '.' and '..' components are kept. Use resolve() to get the canonical path to a file.

resolve(strict=False)[source][source]

Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
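A brief sketch of how absolute() and resolve() differ on a path containing '..', again assuming this is the standard pathlib.Path. The directory name `data` is hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Resolve the base first so that symlinked temp directories do not
    # affect the comparison below.
    base = Path(tmp).resolve()
    (base / "data").mkdir()

    messy = base / "data" / ".." / "data"
    kept = messy.absolute()      # '..' stays in the path
    canonical = messy.resolve()  # '..' is collapsed, symlinks followed
```

Here `kept` still contains the '..' component, while `canonical` equals `base / "data"`.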

stat(*, follow_symlinks=True)[source][source]

Return the result of the stat() system call on this path, like os.stat() does.

owner()[source][source]

Return the login name of the file owner.

group()[source][source]

Return the group name of the file gid.

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)[source][source]

Open the file pointed to by this path and return a file object, as the built-in open() function does.

read_bytes()[source][source]

Open the file in bytes mode, read it, and close the file.

read_text(encoding=None, errors=None)[source][source]

Open the file in text mode, read it, and close the file.

write_bytes(data)[source][source]

Open the file in bytes mode, write to it, and close the file.

write_text(data, encoding=None, errors=None, newline=None)[source][source]

Open the file in text mode, write to it, and close the file.
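The four read/write helpers above each open, use, and close the file in a single call. A minimal round trip, assuming the standard pathlib.Path and a made-up file name, might look like this:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note.txt"

    # write_text() returns the number of characters written.
    written = note.write_text("h\u00e9llo", encoding="utf-8")
    text = note.read_text(encoding="utf-8")
    # read_bytes() returns the encoded form, so the accented letter takes 2 bytes.
    raw = note.read_bytes()
```

Passing `encoding` explicitly avoids depending on the platform's default text encoding.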

readlink()

Return the path to which the symbolic link points.

touch(mode=438, exist_ok=True)[source][source]

Create this file with the given access mode, if it doesn’t exist.

mkdir(mode=511, parents=False, exist_ok=False)[source][source]

Create a new directory at this given path.
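The decimal defaults shown in the touch() and mkdir() signatures are the familiar octal permission masks. The sketch below checks that conversion and shows the `parents` and `exist_ok` flags in use, assuming the standard pathlib.Path; the directory and file names are hypothetical.

```python
import tempfile
from pathlib import Path

# The decimal defaults in the signatures correspond to octal permission masks.
touch_default = oct(438)
mkdir_default = oct(511)

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "a" / "b" / "c"

    # parents=True creates the missing intermediate directories;
    # exist_ok=True makes a repeated call a no-op instead of an error.
    nested.mkdir(parents=True, exist_ok=True)
    nested.mkdir(parents=True, exist_ok=True)

    marker = nested / "done.flag"
    marker.touch()
    marker.touch()  # touch() defaults to exist_ok=True

    created = nested.is_dir() and marker.is_file()
```

The permissions actually applied are further restricted by the process umask.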

chmod(mode, *, follow_symlinks=True)[source][source]

Change the permissions of the path, like os.chmod().

lchmod(mode)[source][source]

Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.

unlink(missing_ok=False)

Remove this file or link. If the path is a directory, use rmdir() instead.

rmdir()[source][source]

Remove this directory. The directory must be empty.

lstat()[source][source]

Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.

rename(target)[source][source]

Rename this path to the target path.

The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.

Returns the new Path instance pointing to the target path.

replace(target)[source][source]

Rename this path to the target path, overwriting if that path exists.

The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.

Returns the new Path instance pointing to the target path.
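Because relative targets are interpreted against the current working directory, one common pattern is to build the target from the source path's own parent. The sketch below does that with replace(), assuming the standard pathlib.Path and made-up file names.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "draft.txt"
    # Build the target from src's parent so the result does not depend on the cwd.
    dst = src.parent / "final.txt"

    src.write_text("new")
    dst.write_text("old")

    # replace() overwrites an existing target and returns the new path.
    moved = src.replace(dst)
    result = (moved == dst, src.exists(), dst.read_text())
```

rename() takes the same arguments, but whether it overwrites an existing target depends on the platform.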

symlink_to(target, target_is_directory=False)

Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.

hardlink_to(target)

Make this path a hard link pointing to the same file as target.

Note the order of arguments (self, target) is the reverse of os.link’s.

link_to(target)

Make the target path a hard link pointing to this path.

Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.

Deprecated since Python 3.10 and scheduled for removal in Python 3.12. Use hardlink_to() instead.

exists()[source][source]

Whether this path exists.

is_dir()[source][source]

Whether this path is a directory.

is_file()[source][source]

Whether this path is a regular file (also True for symlinks pointing to regular files).

is_mount()[source][source]

Check if this path is a POSIX mount point.

is_symlink()

Whether this path is a symbolic link.

is_block_device()[source][source]

Whether this path is a block device.

is_char_device()[source][source]

Whether this path is a character device.

is_fifo()[source][source]

Whether this path is a FIFO.

is_socket()[source][source]

Whether this path is a socket.

expanduser()[source][source]

Return a new path with expanded ~ and ~user constructs (as returned by os.path.expanduser).

class tooluniverse.embedding_database.OpenAI(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, base_url: str | URL | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = 2, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)[source]

Bases: SyncAPIClient

__init__(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, base_url: str | URL | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = 2, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) None[source][source]

Construct a new synchronous OpenAI client instance.

This automatically infers the following arguments from their corresponding environment variables if they are not provided:

  • api_key from OPENAI_API_KEY

  • organization from OPENAI_ORG_ID

  • project from OPENAI_PROJECT_ID

  • webhook_secret from OPENAI_WEBHOOK_SECRET
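The fallback described above (an explicit argument wins, otherwise the environment variable is used) can be sketched without the client library. The helper `resolve_option` is hypothetical and is not part of the OpenAI package; the key values are placeholders.

```python
import os


def resolve_option(explicit, env_name, env=None):
    """Return the explicit value if given, else the named environment variable."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit
    return env.get(env_name)  # None when the variable is unset


# A plain dict stands in for the process environment in this demo.
demo_env = {"OPENAI_API_KEY": "sk-from-env", "OPENAI_ORG_ID": "org-demo"}

from_env = resolve_option(None, "OPENAI_API_KEY", demo_env)
overridden = resolve_option("sk-explicit", "OPENAI_API_KEY", demo_env)
missing = resolve_option(None, "OPENAI_PROJECT_ID", demo_env)
```

In practice this means a client can usually be constructed with no arguments once the relevant variables are exported.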

property audio: Audio[source]
property auth_headers: dict[str, str][source]
property batches: Batches[source]
property beta: Beta[source]
property chat: Chat[source]
property completions: Completions[source]
property containers: Containers[source]
property conversations: Conversations[source]
copy(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) Self[source][source]

Create a new client instance re-using the same options given to the current client with optional overriding.

property default_headers: dict[str, str | Omit][source]
property embeddings: Embeddings[source]
property evals: Evals[source]
property files: Files[source]
property fine_tuning: FineTuning[source]
property images: Images[source]
property models: Models[source]
property moderations: Moderations[source]
property qs: Querystring[source]
property realtime: Realtime[source]
property responses: Responses[source]
property uploads: Uploads[source]
property vector_stores: VectorStores[source]
property webhooks: Webhooks[source]
with_options(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) Self[source]

Create a new client instance re-using the same options given to the current client with optional overriding.

property with_raw_response: OpenAIWithRawResponse[source]
property with_streaming_response: OpenAIWithStreamedResponse[source]
api_key: str[source]
organization: str | None[source]
project: str | None[source]
webhook_secret: str | None[source]
websocket_base_url: str | httpx.URL | None[source]
class tooluniverse.embedding_database.AzureOpenAI(*, azure_endpoint: str, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)[source][source]
class tooluniverse.embedding_database.AzureOpenAI(*, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)
class tooluniverse.embedding_database.AzureOpenAI(*, base_url: str, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)

Bases: BaseAzureClient[Client, Stream[Any]], OpenAI

__init__(*, azure_endpoint: str, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) None[source][source]
__init__(*, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) None
__init__(*, base_url: str, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) None

Construct a new synchronous Azure OpenAI client instance.

This automatically infers the following arguments from their corresponding environment variables if they are not provided:

  • api_key from AZURE_OPENAI_API_KEY

  • organization from OPENAI_ORG_ID

  • project from OPENAI_PROJECT_ID

  • azure_ad_token from AZURE_OPENAI_AD_TOKEN

  • api_version from OPENAI_API_VERSION

  • azure_endpoint from AZURE_OPENAI_ENDPOINT

Parameters:
  • azure_endpoint – Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/

  • azure_ad_token – Your Azure Active Directory (Microsoft Entra ID) token; see https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id

  • azure_ad_token_provider – A function that returns an Azure Active Directory token; it is invoked on every request.

  • azure_deployment – A model deployment, if given with azure_endpoint, sets the base client URL to include /deployments/{azure_deployment}. Not supported with Assistants APIs.
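How azure_endpoint and azure_deployment might combine into a base URL can be sketched as follows. The helper `azure_base_url` is hypothetical, the `/openai` path prefix is an assumption not stated in the text above, and `my-embeddings` is a made-up deployment name; only the `/deployments/{azure_deployment}` segment comes from the parameter description.

```python
def azure_base_url(azure_endpoint, azure_deployment=None):
    """Compose a base URL from an endpoint and an optional deployment name."""
    # Assumption: requests are served under an "/openai" prefix.
    root = azure_endpoint.rstrip("/") + "/openai"
    if azure_deployment is None:
        return root
    # Documented behaviour: the deployment adds /deployments/{azure_deployment}.
    return f"{root}/deployments/{azure_deployment}"


with_deployment = azure_base_url(
    "https://example-resource.azure.openai.com/", "my-embeddings"
)
without_deployment = azure_base_url("https://example-resource.azure.openai.com/")
```

Stripping the trailing slash first avoids a doubled `//` when the endpoint is written with one, as in the example endpoint above.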

copy(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, api_version: str | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) Self[source][source]

Create a new client instance re-using the same options given to the current client with optional overriding.

with_options(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, api_version: str | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) Self[source]

Create a new client instance re-using the same options given to the current client with optional overriding.

class tooluniverse.embedding_database.BaseTool(tool_config)[source][source]

Bases: object

__init__(tool_config)[source][source]
classmethod get_default_config_file()[source][source]

Get the path to the default configuration file for this tool type.

This method uses a robust path resolution strategy that works across different installation scenarios:

  1. Installed packages: Uses importlib.resources for proper package resource access

  2. Development mode: Falls back to file-based path resolution

  3. Legacy Python: Handles importlib.resources and importlib_resources

Override this method in subclasses to specify a custom defaults file.

Returns:

Path or resource object pointing to the defaults file

classmethod load_defaults_from_file()[source][source]

Load defaults from the configuration file.

run(arguments=None)[source][source]

Execute the tool.

The default BaseTool implementation accepts an optional arguments mapping to align with most concrete tool implementations which expect a dictionary of inputs.

check_function_call(function_call_json)[source][source]
get_required_parameters()[source][source]

Retrieve required parameters from the endpoint definition.

Returns:

List of required parameters for the given endpoint

Return type:

list

tooluniverse.embedding_database.register_tool(tool_type_name=None, config=None)[source][source]

Decorator to automatically register tool classes and their configs.

Usage:

    @register_tool('CustomToolName', config={...})
    class MyTool:
        pass
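To show what a registering decorator of this shape typically does, here is a minimal stand-in. It is not ToolUniverse's implementation: the `TOOL_REGISTRY` and `TOOL_CONFIGS` dictionaries are hypothetical, and falling back to the class name when no name is given is an assumption.

```python
# Hypothetical module-level registries; the real storage is not shown in this page.
TOOL_REGISTRY = {}
TOOL_CONFIGS = {}


def register_tool(tool_type_name=None, config=None):
    """Return a class decorator that records the class and its config."""

    def decorator(cls):
        # Assumption: fall back to the class name when no name is supplied.
        name = tool_type_name or cls.__name__
        TOOL_REGISTRY[name] = cls
        if config is not None:
            TOOL_CONFIGS[name] = config
        return cls  # the class itself is returned unchanged

    return decorator


@register_tool("CustomToolName", config={"description": "demo tool"})
class MyTool:
    pass


@register_tool()
class OtherTool:
    pass
```

Returning the class unchanged keeps the decorated class usable exactly as written, with registration happening as a side effect at import time.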

tooluniverse.embedding_database.get_logger(name: str | None = None) Logger[source][source]

Get a logger instance.

Parameters:

name (str, optional) – Logger name (usually __name__)

Returns:

Logger instance

Return type:

logging.Logger

class tooluniverse.embedding_database.EmbeddingDatabase(tool_config)[source][source]

Bases: BaseTool

Unified embedding database tool supporting multiple operations:

  • create_from_docs: Create new database from documents

  • add_docs: Add documents to existing database

  • search: Search for similar documents

  • load_database: Load existing database from path

__init__(tool_config)[source][source]
run(arguments)[source][source]

Main entry point for the tool.
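A single run() entry point serving several named operations is usually a dispatch on one field of the arguments. The stand-in class below sketches that pattern only. The `action` key, the other argument keys, and the returned dictionaries are all assumptions, and the handlers are stubs rather than real database operations.

```python
class OperationDispatcher:
    """Stand-in showing one way run() can route to the four named operations."""

    OPERATIONS = ("create_from_docs", "add_docs", "search", "load_database")

    def run(self, arguments):
        action = arguments.get("action")  # assumption: the key is called "action"
        if action not in self.OPERATIONS:
            return {"error": f"Unknown action: {action}"}
        handler = getattr(self, f"_{action}")
        return handler(arguments)

    # Stub handlers: each only reports what it was asked to do.
    def _create_from_docs(self, arguments):
        return {"status": "created", "count": len(arguments.get("documents", []))}

    def _add_docs(self, arguments):
        return {"status": "added", "count": len(arguments.get("documents", []))}

    def _search(self, arguments):
        return {"status": "searched", "query": arguments.get("query")}

    def _load_database(self, arguments):
        return {"status": "loaded", "path": arguments.get("database_path")}


tool = OperationDispatcher()
search_result = tool.run({"action": "search", "query": "kinase inhibitors"})
unknown_result = tool.run({"action": "drop_everything"})
```

Returning an error dictionary for an unknown action, rather than raising, keeps the result serializable for the caller; whether EmbeddingDatabase does the same is not stated here.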