tooluniverse.odphp_tool module¶
- class tooluniverse.odphp_tool.BaseTool(tool_config)[source][source]¶
Bases:
object
- classmethod get_default_config_file()[source][source]¶
Get the path to the default configuration file for this tool type.
This method uses a robust path resolution strategy that works across different installation scenarios:
Installed packages: Uses importlib.resources for proper package resource access
Development mode: Falls back to file-based path resolution
Legacy Python: Handles both importlib.resources and the importlib_resources backport
Override this method in subclasses to specify a custom defaults file.
- Returns:
Path or resource object pointing to the defaults file
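A minimal sketch of the resolution order described above, for orientation only; the package name ("tooluniverse.data") and file name ("odphp_defaults.json") below are placeholders, not the module's actual defaults:

    from pathlib import Path

    def resolve_defaults_file(package="tooluniverse.data", filename="odphp_defaults.json"):
        """Illustrative resolution order: installed package -> backport -> source tree."""
        try:
            # Installed packages: the importlib.resources API (Python 3.9+)
            from importlib.resources import files
            return files(package) / filename
        except ImportError:
            pass
        try:
            # Legacy Python: the importlib_resources backport
            from importlib_resources import files
            return files(package) / filename
        except ImportError:
            # Development mode: fall back to file-based path resolution
            return Path(__file__).resolve().parent / "data" / filename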
- tooluniverse.odphp_tool.register_tool(tool_type_name=None, config=None)[source][source]¶
Decorator to automatically register tool classes and their configs.
- Usage:
@register_tool("CustomToolName", config={…})
class MyTool:
    pass
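A slightly fuller sketch of the same pattern; the tool type name, the config keys, and the run(arguments) entry point below are illustrative assumptions, not part of this module's documented API:

    from tooluniverse.odphp_tool import BaseTool, register_tool

    # Register a hypothetical tool class together with a default config.
    @register_tool("ODPHPExampleTool", config={"description": "Example ODPHP helper"})
    class ODPHPExampleTool(BaseTool):
        def run(self, arguments):
            # Assumed entry point; echo the arguments back for illustration.
            return {"echo": arguments}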
- class tooluniverse.odphp_tool.BeautifulSoup(markup: str | bytes | IO[str] | IO[bytes] = '', features: str | Sequence[str] | None = None, builder: TreeBuilder | Type[TreeBuilder] | None = None, parse_only: SoupStrainer | None = None, from_encoding: str | None = None, exclude_encodings: Iterable[str] | None = None, element_classes: Dict[Type[PageElement], Type[PageElement]] | None = None, **kwargs: Any)[source][source]¶
Bases:
Tag
A data structure representing a parsed HTML or XML document.
Most of the methods you'll call on a BeautifulSoup object are inherited from PageElement or Tag.
Internally, this class defines the basic interface called by the tree builders when converting an HTML/XML document into a data structure. The interface abstracts away the differences between parsers. To write a new tree builder, you'll need to understand these methods as a whole.
- These methods will be called by the BeautifulSoup constructor:
reset()
feed(markup)
- The tree builder may call these methods from its feed() implementation:
handle_starttag(name, attrs) # See note about return value
handle_endtag(name)
handle_data(data) # Appends to the current data node
endData(containerClass) # Ends the current data node
No matter how complicated the underlying parser is, you should be able to build a tree using "start tag" events, "end tag" events, "data" events, and "done with data" events.
If you encounter an empty-element tag (aka a self-closing tag, like HTML's <br> tag), call handle_starttag and then handle_endtag.
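For orientation, a minimal parse with the standard library's html.parser builder; internally the builder drives exactly the start-tag, end-tag, and data events listed above, including the paired events for the empty-element <br> tag:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p>Hello<br>world</p>", "html.parser")
    print(soup.p.get_text())                       # Helloworld
    print([tag.name for tag in soup.find_all()])   # ['p', 'br']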
- ROOT_TAG_NAME: str = '[document]'[source]¶
Since BeautifulSoup subclasses Tag, it's possible to treat it as a Tag with a Tag.name. However, this name makes it clear the BeautifulSoup object isn't a real markup tag.
- DEFAULT_BUILDER_FEATURES: Sequence[str] = ['html', 'fast'][source]¶
If the end-user gives no indication which tree builder they want, look for one with these features.
- ASCII_SPACES: str = ' \n\t\x0c\r'[source]¶
A string containing all ASCII whitespace characters, used during parsing to detect data chunks that seem "empty".
- original_encoding: str | None[source]¶
Beautiful Soup's best guess as to the character encoding of the original document.
- declared_html_encoding: str | None[source]¶
The character encoding, if any, that was explicitly defined in the original document. This may or may not match
BeautifulSoup.original_encoding
.
- contains_replacement_characters: bool[source]¶
This is True if the markup that was parsed contains U+FFFD REPLACEMENT_CHARACTER characters which were not present in the original markup. These mark character sequences that could not be represented in Unicode.
- __init__(markup: str | bytes | IO[str] | IO[bytes] = '', features: str | Sequence[str] | None = None, builder: TreeBuilder | Type[TreeBuilder] | None = None, parse_only: SoupStrainer | None = None, from_encoding: str | None = None, exclude_encodings: Iterable[str] | None = None, element_classes: Dict[Type[PageElement], Type[PageElement]] | None = None, **kwargs: Any)[source][source]¶
Constructor.
- Parameters:
markup – A string or a file-like object representing markup to be parsed.
features – Desirable features of the parser to be used. This may be the name of a specific parser ("lxml", "lxml-xml", "html.parser", or "html5lib") or it may be the type of markup to be used ("html", "html5", "xml"). It's recommended that you name a specific parser, so that Beautiful Soup gives you the same results across platforms and virtual environments.
builder – A TreeBuilder subclass to instantiate (or instance to use) instead of looking one up based on features. You only need to use this if you've implemented a custom TreeBuilder.
parse_only – A SoupStrainer. Only parts of the document matching the SoupStrainer will be considered. This is useful when parsing part of a document that would otherwise be too large to fit into memory.
from_encoding – A string indicating the encoding of the document to be parsed. Pass this in if Beautiful Soup is guessing wrongly about the document's encoding.
exclude_encodings – A list of strings indicating encodings known to be wrong. Pass this in if you don't know the document's encoding but you know Beautiful Soup's guess is wrong.
element_classes – A dictionary mapping BeautifulSoup classes like Tag and NavigableString to other classes you'd like to be instantiated instead as the parse tree is built. This is useful for subclassing Tag or NavigableString to modify default behavior.
kwargs – For backwards compatibility purposes, the constructor accepts certain keyword arguments used in Beautiful Soup 3. None of these arguments do anything in Beautiful Soup 4; they will result in a warning and then be ignored.
Apart from this, any keyword arguments passed into the BeautifulSoup constructor are propagated to the TreeBuilder constructor. This makes it possible to configure a TreeBuilder by passing in arguments, not just by saying which one to use.
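A small constructor example combining several of the parameters above; the latin-1 markup is contrived for illustration:

    from bs4 import BeautifulSoup

    html_bytes = "<html><body><p>café</p></body></html>".encode("latin-1")

    # Name a specific parser for reproducible results, and override the
    # encoding guess when the document's encoding is already known.
    soup = BeautifulSoup(html_bytes, features="html.parser", from_encoding="latin-1")
    print(soup.p.string)            # café
    print(soup.original_encoding)   # typically "latin-1", the encoding supplied above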
- copy_self() BeautifulSoup [source][source]¶
Create a new BeautifulSoup object with the same TreeBuilder, but not associated with any markup.
This is the first step of the deepcopy process.
- reset() None [source][source]¶
Reset this object to a state as though it had never parsed any markup.
- new_tag(name: str, namespace: str | None = None, nsprefix: str | None = None, attrs: Mapping[str | NamespacedAttribute, _RawAttributeValue] | None = None, sourceline: int | None = None, sourcepos: int | None = None, string: str | None = None, **kwattrs: str) Tag [source][source]¶
Create a new Tag associated with this BeautifulSoup object.
- Parameters:
name – The name of the new Tag.
namespace – The URI of the new Tag's XML namespace, if any.
nsprefix – The prefix for the new Tag's XML namespace, if any.
attrs – A dictionary of this Tag's attribute values; can be used instead of `kwattrs` for attributes like "class" that are reserved words in Python.
sourceline – The line number where this tag was (purportedly) found in its source document.
sourcepos – The character position within `sourceline` where this tag was (purportedly) found.
string – String content for the new Tag, if any.
kwattrs – Keyword arguments for the new Tag's attribute values.
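A short usage example; the attribute names and URL are illustrative:

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<div></div>", "html.parser")

    # attrs= covers attribute names that cannot be passed as Python keyword
    # arguments (reserved words like "class", or hyphenated names).
    link = soup.new_tag("a", href="https://example.org", attrs={"data-kind": "external"})
    link.string = "example"
    soup.div.append(link)
    print(soup.div.a["href"])   # https://example.org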
- string_container(base_class: Type[NavigableString] | None = None) Type[NavigableString] [source][source]¶
Find the class that should be instantiated to hold a given kind of string.
This may be a built-in Beautiful Soup class or a custom class passed in to the BeautifulSoup constructor.
- new_string(s: str, subclass: Type[NavigableString] | None = None) NavigableString [source][source]¶
Create a new NavigableString associated with this BeautifulSoup object.
- Parameters:
s – The string content of the NavigableString.
subclass – The subclass of NavigableString, if any, to use. If a document is being processed, an appropriate subclass for the current location in the document will be determined automatically.
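For example, requesting a specific NavigableString subclass such as Comment changes how the string is rendered:

    from bs4 import BeautifulSoup, Comment

    soup = BeautifulSoup("<p></p>", "html.parser")
    soup.p.append(soup.new_string("plain text"))
    soup.p.append(soup.new_string("a comment", Comment))
    print(soup.p)   # <p>plain text<!--a comment--></p>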
- insert_before(*args: PageElement | str) List[PageElement] [source][source]¶
This method is part of the PageElement API, but BeautifulSoup doesn't implement it because there is nothing before or after it in the parse tree.
- insert_after(*args: PageElement | str) List[PageElement] [source][source]¶
This method is part of the PageElement API, but BeautifulSoup doesn't implement it because there is nothing before or after it in the parse tree.
- decode(indent_level: int | None = None, eventual_encoding: str = 'utf-8', formatter: Formatter | str = 'minimal', iterator: Iterator[PageElement] | None = None, **kwargs: Any) str [source][source]¶
- Returns a string representation of the parse tree as a full HTML or XML document.
- Parameters:
indent_level – Each line of the rendering will be indented this many levels. (The `formatter` decides what a "level" means, in terms of spaces or other characters output.) This is used internally in recursive calls while pretty-printing.
eventual_encoding – The encoding of the final document. If this is None, the document will be a Unicode string.
formatter – Either a Formatter object, or a string naming one of the standard formatters.
iterator – The iterator to use when navigating over the parse tree. This is only used by Tag.decode_contents and you probably won't need to use it.
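A brief illustration of the formatter parameter: "minimal" (the default) escapes only what HTML requires, while "html" also converts non-ASCII characters to named entities.

    from bs4 import BeautifulSoup

    soup = BeautifulSoup("<p>café &amp; more</p>", "html.parser")
    print(soup.decode(formatter="minimal"))   # <p>café &amp; more</p>
    print(soup.decode(formatter="html"))      # <p>caf&eacute; &amp; more</p>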
- class tooluniverse.odphp_tool.ODPHPRESTTool(tool_config)[source][source]¶
Bases:
BaseTool
Base class for ODPHP (MyHealthfinder) REST API tools.
- class tooluniverse.odphp_tool.ODPHPMyHealthfinder(tool_config)[source][source]¶
Bases:
ODPHPRESTTool
Search for demographic-specific health recommendations (MyHealthfinder).
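A hypothetical invocation, shown only to indicate intent: the config fields and the run(arguments) entry point follow common ToolUniverse conventions, and the age/sex argument names mirror the public MyHealthfinder query parameters; none of this is confirmed by this page.

    from tooluniverse.odphp_tool import ODPHPMyHealthfinder

    # Assumed config schema and entry point, for illustration only.
    tool_config = {"name": "ODPHP_myhealthfinder", "description": "Demographic-specific health recommendations"}
    tool = ODPHPMyHealthfinder(tool_config)
    result = tool.run({"age": 35, "sex": "female"})  # assumed argument names
    print(result)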
- class tooluniverse.odphp_tool.ODPHPItemList(tool_config)[source][source]¶
Bases:
ODPHPRESTTool
Retrieve list of topics or categories.
- class tooluniverse.odphp_tool.ODPHPTopicSearch(tool_config)[source][source]¶
Bases:
ODPHPRESTTool
Search for health topics by ID, category, or keyword.
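Hypothetical usage of the two classes above, under the same assumptions (a run(arguments) entry point and argument names modeled on the public itemlist/topicsearch endpoints; none confirmed by this page):

    from tooluniverse.odphp_tool import ODPHPItemList, ODPHPTopicSearch

    topics = ODPHPItemList({"name": "ODPHP_itemlist"}).run({"type": "topic"})                     # assumed
    results = ODPHPTopicSearch({"name": "ODPHP_topicsearch"}).run({"keyword": "folic acid"})      # assumed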
- class tooluniverse.odphp_tool.ODPHPOutlinkFetch(tool_config)[source][source]¶
Bases:
BaseTool
Fetch article pages referenced by AccessibleVersion / RelatedItems.Url and return readable text.
- HTML: extracts main/article/body text; strips nav/aside/footer/script/style.
- PDF or non-HTML: returns metadata + URL so the agent can surface it.
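A rough sketch of the HTML branch described above, assuming a plain requests + BeautifulSoup pipeline; the element choices mirror the description, but the real implementation may differ:

    import requests
    from bs4 import BeautifulSoup

    def fetch_readable_text(url: str) -> str:
        """Illustrative only: download a page and return its main readable text."""
        html = requests.get(url, timeout=30).text
        soup = BeautifulSoup(html, "html.parser")
        # Strip navigation and boilerplate elements.
        for junk in soup(["nav", "aside", "footer", "script", "style"]):
            junk.decompose()
        # Prefer <main>/<article>, then fall back to <body>.
        main = soup.find("main") or soup.find("article") or soup.body or soup
        return main.get_text(separator="\n", strip=True)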