tooluniverse.embedding_database module
Embedding Database Tool for ToolUniverse
A unified tool for managing embedding databases with FAISS vector search and SQLite metadata storage. Supports creating databases from documents, adding documents, searching, and loading existing databases. Uses OpenAI’s embedding models for text-to-vector conversion, with support for Azure OpenAI.
- class tooluniverse.embedding_database.Path(*args, **kwargs)
Bases: PurePath
PurePath subclass that can make system calls.
Path represents a filesystem path but unlike PurePath, also offers methods to do system calls on path objects. Depending on your system, instantiating a Path will return either a PosixPath or a WindowsPath object. You can also instantiate a PosixPath or WindowsPath directly, but cannot instantiate a WindowsPath on a POSIX system or vice versa.
- classmethod cwd()
Return a new path pointing to the current working directory (as returned by os.getcwd()).
- classmethod home()
Return a new path pointing to the user’s home directory (as returned by os.path.expanduser('~')).
- samefile(other_path)
Return whether other_path refers to the same file as this path (as determined by os.path.samefile()).
- iterdir()
Iterate over the files in this directory. Does not yield any result for the special paths '.' and '..'.
- glob(pattern)
Iterate over this subtree and yield all existing files (of any kind, including directories) matching the given relative pattern.
- rglob(pattern)
Recursively yield all existing files (of any kind, including directories) matching the given relative pattern, anywhere in this subtree.
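A minimal sketch of the difference between glob() and rglob(), using a throwaway directory. The file and directory names are made up for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.txt").write_text("top level")
    (root / "sub" / "b.txt").write_text("nested")

    # glob("*.txt") matches only direct children of root.
    top_level = sorted(p.name for p in root.glob("*.txt"))
    # rglob("*.txt") matches anywhere in the subtree.
    everywhere = sorted(p.name for p in root.rglob("*.txt"))
```

Both methods return lazy iterators, so the results are collected before the temporary directory is removed.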
- absolute()
Return an absolute version of this path. This function works even if the path doesn’t point to anything.
No normalization is done, i.e. all '.' and '..' components are kept. Use resolve() to get the canonical path to a file.
- resolve(strict=False)
Make the path absolute, resolving all symlinks on the way and also normalizing it (for example turning slashes into backslashes under Windows).
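The contrast between absolute() and resolve() can be seen on a path that contains a '..' component. This is a small sketch; the path "data/../index.faiss" is a hypothetical name and does not need to exist, because resolve() is called with its default strict=False.

```python
from pathlib import Path

relative = Path("data") / ".." / "index.faiss"  # hypothetical, need not exist

kept = relative.absolute()      # prefixes the cwd, leaves '..' in place
canonical = relative.resolve()  # absolute, '..' collapsed, symlinks followed
```

Prefer resolve() when two paths need to be compared for equality, and absolute() when the original spelling of the path should be preserved.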
- stat(*, follow_symlinks=True)
Return the result of the stat() system call on this path, like os.stat() does.
- open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)
Open the file pointed to by this path and return a file object, as the built-in open() function does.
- read_text(encoding=None, errors=None)
Open the file in text mode, read it, and close the file.
- write_text(data, encoding=None, errors=None, newline=None)
Open the file in text mode, write to it, and close the file.
- touch(mode=438, exist_ok=True)
Create this file with the given access mode, if it doesn’t exist.
- mkdir(mode=511, parents=False, exist_ok=False)
Create a new directory at this given path.
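The default modes in the touch() and mkdir() signatures are printed in decimal; they are the familiar octal permission values. The sketch below checks that and shows how parents and exist_ok make directory creation repeatable. The directory names are illustrative only.

```python
import tempfile
from pathlib import Path

# Decimal defaults from the signatures, written as octal permission bits.
assert 438 == 0o666  # touch(): rw-rw-rw- before the umask is applied
assert 511 == 0o777  # mkdir(): rwxrwxrwx before the umask is applied

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "db" / "indexes"
    nested.mkdir(parents=True, exist_ok=True)  # creates "db" as well
    nested.mkdir(parents=True, exist_ok=True)  # second call is a no-op
    marker = nested / "ready"
    marker.touch()  # creates the file
    marker.touch()  # exist_ok=True by default, so this only updates the mtime
    created = nested.is_dir() and marker.is_file()
```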
- chmod(mode, *, follow_symlinks=True)
Change the permissions of the path, like os.chmod().
- lchmod(mode)
Like chmod(), except if the path points to a symlink, the symlink’s permissions are changed, rather than its target’s.
- unlink(missing_ok=False)
Remove this file or link. If the path is a directory, use rmdir() instead.
- lstat()
Like stat(), except if the path points to a symlink, the symlink’s status information is returned, rather than its target’s.
- rename(target)
Rename this path to the target path.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
- replace(target)
Rename this path to the target path, overwriting if that path exists.
The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, not the directory of the Path object.
Returns the new Path instance pointing to the target path.
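Because relative targets are resolved against the current working directory, it is usually safer to build the target from the source's own directory. A short sketch with made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    draft = root / "draft.txt"
    draft.write_text("v1")

    # Pass a full target path so the result does not depend on the cwd.
    final = draft.rename(root / "final.txt")
    renamed_ok = final.exists() and not draft.exists()

    # replace() overwrites an existing target.
    newer = root / "newer.txt"
    newer.write_text("v2")
    final = newer.replace(final)
    content = final.read_text()
```

Both methods return a new Path for the target, so the result can be reassigned and used directly.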
- symlink_to(target, target_is_directory=False)
Make this path a symlink pointing to the target path. Note the order of arguments (link, target) is the reverse of os.symlink.
- hardlink_to(target)
Make this path a hard link pointing to the same file as target.
Note the order of arguments (self, target) is the reverse of os.link’s.
- link_to(target)
Make the target path a hard link pointing to this path.
Note this function does not make this path a hard link to target, despite the implication of the function and argument names. The order of arguments (target, link) is the reverse of Path.symlink_to, but matches that of os.link.
Deprecated since Python 3.10 and scheduled for removal in Python 3.12. Use hardlink_to() instead.
- class tooluniverse.embedding_database.OpenAI(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, base_url: str | URL | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = 2, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)
Bases: SyncAPIClient
- __init__(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, base_url: str | URL | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = 2, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) -> None
Construct a new synchronous OpenAI client instance.
This automatically infers the following arguments from their corresponding environment variables if they are not provided:
  - api_key from OPENAI_API_KEY
  - organization from OPENAI_ORG_ID
  - project from OPENAI_PROJECT_ID
  - webhook_secret from OPENAI_WEBHOOK_SECRET
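The "explicit argument first, environment variable second" rule described above can be illustrated with a few lines of standard-library Python. This is only a sketch of the lookup order, not the client's actual code; resolve_setting is a hypothetical helper and the key values are placeholders.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[str],
    env_var: str,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Hypothetical helper: prefer the explicit value, else read env_var."""
    if explicit is not None:
        return explicit
    return env.get(env_var)


# A plain dict stands in for the process environment in this demo.
fake_env = {"OPENAI_API_KEY": "sk-from-env"}

from_env = resolve_setting(None, "OPENAI_API_KEY", fake_env)
overridden = resolve_setting("sk-explicit", "OPENAI_API_KEY", fake_env)
missing = resolve_setting(None, "OPENAI_ORG_ID", fake_env)
```

In practice this means a client can be constructed with no arguments when the environment variables are set, and any argument passed explicitly takes precedence.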
- copy(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) -> Self
Create a new client instance re-using the same options given to the current client with optional overriding.
- with_options(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) -> Self
Create a new client instance re-using the same options given to the current client with optional overriding.
- class tooluniverse.embedding_database.AzureOpenAI(*, azure_endpoint: str, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)
- class tooluniverse.embedding_database.AzureOpenAI(*, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)
- class tooluniverse.embedding_database.AzureOpenAI(*, base_url: str, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False)
Bases: BaseAzureClient[Client, Stream[Any]], OpenAI
- __init__(*, azure_endpoint: str, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) -> None
- __init__(*, azure_deployment: str | None = None, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) -> None
- __init__(*, base_url: str, api_version: str | None = None, api_key: str | Callable[[], str] | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, organization: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, max_retries: int = DEFAULT_MAX_RETRIES, default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, http_client: Client | None = None, _strict_response_validation: bool = False) -> None
Construct a new synchronous Azure OpenAI client instance.
This automatically infers the following arguments from their corresponding environment variables if they are not provided:
  - api_key from AZURE_OPENAI_API_KEY
  - organization from OPENAI_ORG_ID
  - project from OPENAI_PROJECT_ID
  - azure_ad_token from AZURE_OPENAI_AD_TOKEN
  - api_version from OPENAI_API_VERSION
  - azure_endpoint from AZURE_OPENAI_ENDPOINT
- Parameters:
azure_endpoint – Your Azure endpoint, including the resource, e.g. https://example-resource.azure.openai.com/
azure_ad_token – Your Azure Active Directory token, https://www.microsoft.com/en-us/security/business/identity-access/microsoft-entra-id
azure_ad_token_provider – A function that returns an Azure Active Directory token; it is invoked on every request.
azure_deployment – A model deployment. If given with azure_endpoint, sets the base client URL to include /deployments/{azure_deployment}. Not supported with Assistants APIs.
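The effect of azure_deployment on the base URL can be sketched as a small pure function. azure_base_url is a hypothetical helper written for illustration, and the "/openai" path prefix is an assumption that is not stated in the parameter description above; only the "/deployments/{azure_deployment}" suffix comes from it. The deployment name is made up.

```python
from typing import Optional


def azure_base_url(azure_endpoint: str, azure_deployment: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented URL rule."""
    # Assumption: requests are routed under an "/openai" prefix.
    base = azure_endpoint.rstrip("/") + "/openai"
    if azure_deployment is not None:
        # Documented behaviour: the deployment is appended to the base URL.
        base += f"/deployments/{azure_deployment}"
    return base


plain = azure_base_url("https://example-resource.azure.openai.com/")
scoped = azure_base_url(
    "https://example-resource.azure.openai.com/", "my-embeddings"
)
```

Pinning a deployment this way routes every request through that deployment's URL, which fits with the note that it is not supported with the Assistants APIs.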
- copy(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, api_version: str | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) -> Self
Create a new client instance re-using the same options given to the current client with optional overriding.
- with_options(*, api_key: str | None | Callable[[], str] = None, organization: str | None = None, project: str | None = None, webhook_secret: str | None = None, websocket_base_url: str | URL | None = None, api_version: str | None = None, azure_ad_token: str | None = None, azure_ad_token_provider: Callable[[], str] | None = None, base_url: str | URL | None = None, timeout: float | Timeout | None | NotGiven = NOT_GIVEN, http_client: Client | None = None, max_retries: int | NotGiven = NOT_GIVEN, default_headers: Mapping[str, str] | None = None, set_default_headers: Mapping[str, str] | None = None, default_query: Mapping[str, object] | None = None, set_default_query: Mapping[str, object] | None = None, _extra_kwargs: Mapping[str, Any] = {}) -> Self
Create a new client instance re-using the same options given to the current client with optional overriding.
- class tooluniverse.embedding_database.BaseTool(tool_config)
Bases: object
- classmethod get_default_config_file()
Get the path to the default configuration file for this tool type.
This method uses a robust path resolution strategy that works across different installation scenarios:
  - Installed packages: uses importlib.resources for proper package resource access
  - Development mode: falls back to file-based path resolution
  - Legacy Python: handles both importlib.resources and importlib_resources
Override this method in subclasses to specify a custom defaults file.
- Returns:
Path or resource object pointing to the defaults file
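The resolution strategy described above (resource API first, file-based fallback second) might look roughly like the following. This is a sketch under stated assumptions, not the method's actual implementation: find_package_file is a hypothetical helper, and the standard-library json package stands in for the tool package.

```python
import importlib
import importlib.resources
from pathlib import Path


def find_package_file(package: str, filename: str):
    """Hypothetical helper: locate filename inside an importable package."""
    try:
        # Preferred route: the resource API (Python 3.9+).
        resource = importlib.resources.files(package) / filename
        if resource.is_file():
            return resource
    except (AttributeError, ModuleNotFoundError, TypeError):
        # Older Python or an unusual loader: fall through to the file path.
        pass
    # Fallback: resolve relative to the package's source directory.
    module = importlib.import_module(package)
    return Path(module.__file__).parent / filename


found = find_package_file("json", "__init__.py")
```

A subclass that overrides get_default_config_file() could return the result of a lookup like this for its own defaults file.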
- tooluniverse.embedding_database.register_tool(tool_type_name=None, config=None)
Decorator to automatically register tool classes and their configs.
- Usage:
    @register_tool('CustomToolName', config={…})
    class MyTool:
        pass
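For readers unfamiliar with registration decorators, the general pattern can be sketched as follows. This is not the ToolUniverse implementation: register_tool_sketch, TOOL_REGISTRY, and TOOL_CONFIGS are hypothetical names, and the config contents are placeholders.

```python
TOOL_REGISTRY = {}  # hypothetical: tool type name -> class
TOOL_CONFIGS = {}   # hypothetical: tool type name -> config dict


def register_tool_sketch(tool_type_name=None, config=None):
    """Return a class decorator that records the class and its config."""
    def decorator(cls):
        # Fall back to the class name when no explicit name is given.
        name = tool_type_name or cls.__name__
        TOOL_REGISTRY[name] = cls
        if config is not None:
            TOOL_CONFIGS[name] = config
        return cls  # the class itself is returned unchanged
    return decorator


@register_tool_sketch("CustomToolName", config={"description": "demo"})
class MyTool:
    pass


@register_tool_sketch()
class UnnamedTool:
    pass
```

The decorator runs at import time, so importing a module that defines a decorated class is enough to make the tool discoverable by name.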
- tooluniverse.embedding_database.get_logger(name: str | None = None) -> Logger
Get a logger instance.
- Parameters:
name (str, optional) – Logger name (usually __name__)
- Returns:
Logger instance
- Return type:
Logger
- class tooluniverse.embedding_database.EmbeddingDatabase(tool_config)
Bases: BaseTool
Unified embedding database tool supporting multiple operations:
  - create_from_docs: Create new database from documents
  - add_docs: Add documents to existing database
  - search: Search for similar documents
  - load_database: Load existing database from path
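The division of labour behind these operations (a vector index for similarity, SQLite for document metadata) can be illustrated with a standard-library toy. Everything here is an assumption made for illustration: TinyEmbeddingStore and toy_embed are hypothetical, a letter-frequency vector stands in for an OpenAI embedding, and a brute-force cosine scan stands in for FAISS. It does not reflect the tool's real argument schema or storage layout.

```python
import math
import sqlite3


def toy_embed(text):
    """Stand-in for an embedding model: normalized letter-frequency vector."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]


class TinyEmbeddingStore:
    def __init__(self):
        # SQLite holds the document text and metadata.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT)")
        self.vectors = {}  # doc id -> vector; stands in for the vector index

    def add_docs(self, texts):
        for text in texts:
            cur = self.db.execute("INSERT INTO docs (text) VALUES (?)", (text,))
            self.vectors[cur.lastrowid] = toy_embed(text)

    def search(self, query, top_k=1):
        q = toy_embed(query)
        # Vectors are unit length, so the dot product is the cosine similarity.
        scored = sorted(
            ((sum(a * b for a, b in zip(q, v)), doc_id)
             for doc_id, v in self.vectors.items()),
            reverse=True,
        )
        results = []
        for score, doc_id in scored[:top_k]:
            row = self.db.execute(
                "SELECT text FROM docs WHERE id = ?", (doc_id,)
            ).fetchone()
            results.append((row[0], score))
        return results


store = TinyEmbeddingStore()
store.add_docs(["aspirin dosage", "zebra habitat"])
best_text, best_score = store.search("aspirin")[0]
```

The key design point is that the index only returns ids and scores; the human-readable text is looked up in SQLite afterwards, which keeps the vector store compact.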