arjuna.tpi.parser.html module

Classes to assist in HTML Parsing.

class arjuna.tpi.parser.html.Html

Bases: object

Helper class to create HtmlNode objects.

classmethod from_file(file_path: str, partial=False) arjuna.tpi.parser.html.HtmlNode

Create an HtmlNode from a file.

Parameters

file_path – Absolute path of the HTML file.

Keyword Arguments

partial – If True, the file content is treated as a fragment of HTML when parsing.

Returns

Arjuna’s HtmlNode object
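As an illustration of from_file, the sketch below writes a small HTML document to a directory and loads it by absolute path. The helper names write_sample_html and load_html_file and the sample markup are hypothetical and not part of Arjuna. The import is guarded so the file-writing part still runs where Arjuna is not installed.

```python
from pathlib import Path

try:
    from arjuna.tpi.parser.html import Html
except ImportError:  # Arjuna is not installed in this environment
    Html = None

# Hypothetical sample markup used only for this illustration.
SAMPLE_HTML = "<html><body><p>Hello</p></body></html>"


def write_sample_html(directory: str) -> str:
    """Write SAMPLE_HTML to sample.html in directory; return its absolute path."""
    path = Path(directory).resolve() / "sample.html"
    path.write_text(SAMPLE_HTML, encoding="utf-8")
    return str(path)


def load_html_file(file_path: str):
    """Return an HtmlNode for the file, or None when Arjuna is unavailable."""
    if Html is None:
        return None
    return Html.from_file(file_path)
```

With Arjuna installed, load_html_file(write_sample_html(some_dir)) should return an HtmlNode for the whole document, whose inner_html property can then be inspected.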

classmethod from_lxml_element(element, clone=False) arjuna.tpi.parser.html.HtmlNode

Create an HtmlNode from an lxml element.

Parameters

element – lxml element

classmethod from_str(html_str, partial=False) arjuna.tpi.parser.html.HtmlNode

Create an HtmlNode from a string.

Keyword Arguments

partial – If True, the provided string is treated as a fragment of HTML when parsing.
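A minimal sketch of how from_str might be called for a complete document and for a fragment. The wrapper parse_html and the constants FRAGMENT and DOCUMENT are hypothetical names introduced for this example. The import is guarded so the snippet loads even where Arjuna is not installed.

```python
try:
    from arjuna.tpi.parser.html import Html
except ImportError:  # Arjuna is not installed in this environment
    Html = None

# Hypothetical sample markup used only for this illustration.
FRAGMENT = "<ul><li>One</li><li>Two</li></ul>"
DOCUMENT = f"<html><body>{FRAGMENT}</body></html>"


def parse_html(html_str: str, partial: bool = False):
    """Return an HtmlNode for html_str, or None when Arjuna is unavailable."""
    if Html is None:
        return None
    # partial=True asks the parser to treat html_str as a piece of HTML
    # rather than as a complete document.
    return Html.from_str(html_str, partial=partial)
```

Here parse_html(DOCUMENT) would parse the full document, while parse_html(FRAGMENT, partial=True) would parse only the list markup.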

class arjuna.tpi.parser.html.HtmlNode(node)

Bases: arjuna.tpi.parser.xml.XmlNode

Represents a single node in a parsed HTML document.

Parameters

node – lxml Element object.

clone() arjuna.tpi.parser.html.HtmlNode

Create a clone of this HtmlNode object.

property inner_html: str

Unaltered inner HTML of this node.

property normalized_inner_html: str

Normalized inner HTML of this node, with empty lines between child nodes removed.
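To make "empty lines removed" concrete, here is a standard-library approximation of that kind of normalization. It is not Arjuna's implementation, which may differ in detail. The function name strip_empty_lines is hypothetical.

```python
def strip_empty_lines(markup: str) -> str:
    """Drop lines that are empty or contain only whitespace."""
    return "\n".join(line for line in markup.splitlines() if line.strip())


# Two child nodes separated by a blank line and a whitespace-only line.
raw = "<li>One</li>\n\n   \n<li>Two</li>"
print(strip_empty_lines(raw))
# prints:
# <li>One</li>
# <li>Two</li>
```

The inner_html property would correspond to the raw string in this sketch, and normalized_inner_html to the cleaned-up result.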