arjuna.tpi.parser.html module

Classes to assist in HTML parsing.

class arjuna.tpi.parser.html.Html

Bases: object

Helper class to create HtmlNode objects.

classmethod from_file(file_path: str, partial=False) → arjuna.tpi.parser.html.HtmlNode

Create an HtmlNode from a file.

Parameters: file_path – Absolute path of the HTML file.
Keyword Arguments:
 partial – If True, the file content is treated as an HTML fragment rather than a complete document.
Returns: Arjuna’s HtmlNode object
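A minimal usage sketch for from_file, assuming Arjuna is installed and that Html can be imported directly from arjuna.tpi.parser.html as the module path above suggests. The sample markup, the temporary file, and the helper name load_sample_node are illustrative and not part of Arjuna. The import is guarded so the snippet does nothing when Arjuna is unavailable.

```python
import os
import tempfile

try:
    from arjuna.tpi.parser.html import Html
except ImportError:
    Html = None  # Arjuna is not installed; the example below is skipped.

# Illustrative markup, not taken from the Arjuna documentation.
SAMPLE_HTML = "<html><body><div id='main'><p>Hello</p></div></body></html>"


def load_sample_node():
    """Write SAMPLE_HTML to a temporary file and parse it with Html.from_file."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(SAMPLE_HTML)
        file_path = handle.name
    try:
        # from_file expects an absolute path; partial defaults to False.
        return Html.from_file(os.path.abspath(file_path))
    finally:
        os.remove(file_path)


if Html is not None:
    node = load_sample_node()
    print(node.inner_html)
```

Pass partial=True instead when the file holds only a fragment of a page.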
classmethod from_lxml_element(element, clone=False) → arjuna.tpi.parser.html.HtmlNode

Create an HtmlNode from an lxml element.

Parameters: element – lxml element.
classmethod from_str(html_str, partial=False) → arjuna.tpi.parser.html.HtmlNode

Create an HtmlNode from a string.

Keyword Arguments:
 partial – If True, the provided string is treated as an HTML fragment rather than a complete document.
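The difference between parsing a complete document and a fragment can be sketched as follows, assuming Arjuna is installed. The two sample strings are illustrative. What inner_html prints depends on Arjuna's parser, so no expected output is shown.

```python
try:
    from arjuna.tpi.parser.html import Html
except ImportError:
    Html = None  # Arjuna is not installed; the example below is skipped.

# Illustrative inputs: a complete page and a standalone fragment.
FULL_DOCUMENT = "<html><body><p>Complete page</p></body></html>"
FRAGMENT = "<ul><li>First</li><li>Second</li></ul>"

if Html is not None:
    # Default: the string is parsed as a complete HTML document.
    page = Html.from_str(FULL_DOCUMENT)

    # partial=True: the string is parsed as a piece of a larger page.
    snippet = Html.from_str(FRAGMENT, partial=True)

    print(page.inner_html)
    print(snippet.inner_html)
```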
class arjuna.tpi.parser.html.HtmlNode(node)

Bases: arjuna.tpi.parser.xml.XmlNode

Represents a single node in a parsed HTML document.

Parameters: node – lxml Element object.
clone() → arjuna.tpi.parser.html.HtmlNode

Create a clone of this HtmlNode object.

inner_html

Unaltered inner HTML of this node.

normalized_inner_html

Normalized inner HTML of this node, with empty lines removed between child nodes.
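To illustrate the idea behind the two properties, the sketch below removes blank lines from a markup string using only the standard library. It is not Arjuna's implementation, and Arjuna's exact normalization rules may differ. The function name drop_empty_lines and the sample string are hypothetical.

```python
def drop_empty_lines(markup: str) -> str:
    """Return markup with blank and whitespace-only lines removed."""
    return "\n".join(line for line in markup.splitlines() if line.strip())


# Hypothetical raw inner HTML with blank lines between child elements.
raw_inner = "\n<li>First</li>\n\n   \n<li>Second</li>\n"

print(drop_empty_lines(raw_inner))
# <li>First</li>
# <li>Second</li>
```

The unaltered form is useful when whitespace matters, for example when comparing against the page source. The normalized form is easier to read in logs and reports.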