arjuna.tpi.parser.html module
Classes to assist in HTML parsing.
- class arjuna.tpi.parser.html.Html
Bases: object
Helper class to create HtmlNode objects.
- classmethod from_file(file_path: str, partial=False) -> arjuna.tpi.parser.html.HtmlNode
Create an HtmlNode from a file.
- Parameters
file_path – Absolute path of the HTML file.
- Keyword Arguments
partial – If True, the file content is treated as a fragment of HTML rather than a complete document.
- Returns
Arjuna’s HtmlNode object
- classmethod from_lxml_element(element, clone=False) -> arjuna.tpi.parser.html.HtmlNode
Create an HtmlNode from an lxml element.
- Parameters
element – lxml element
- classmethod from_str(html_str, partial=False) -> arjuna.tpi.parser.html.HtmlNode
Create an HtmlNode from a string.
- Keyword Arguments
partial – If True, the provided string is treated as a fragment of HTML rather than a complete document.
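A minimal usage sketch for `from_str`, assuming the Arjuna package is installed and importable. The fragment string and the helper name `parse_fragment` are illustrative only, and the import is guarded so that the snippet still runs (and does nothing) where Arjuna is missing. No output is shown because the exact serialization depends on the parser.

```python
# Assumption: Arjuna is installed. The guard keeps the sketch runnable otherwise.
try:
    from arjuna.tpi.parser.html import Html
except ImportError:
    Html = None

# Hypothetical fragment used only for illustration.
FRAGMENT = "<ul><li>first</li><li>second</li></ul>"


def parse_fragment(markup):
    """Return an HtmlNode for an HTML fragment, or None if Arjuna is unavailable."""
    if Html is None:
        return None
    # partial=True marks the string as a piece of HTML, not a full document.
    return Html.from_str(markup, partial=True)


node = parse_fragment(FRAGMENT)
if node is not None:
    # inner_html is documented as the unaltered inner HTML of the node.
    print(node.inner_html)
```

`from_file` takes the same `partial` keyword, with a file path in place of the string.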
- class arjuna.tpi.parser.html.HtmlNode(node)
Bases: arjuna.tpi.parser.xml.XmlNode
Represents a single node in a parsed HTML document.
- Parameters
node – lxml Element object.
- clone() -> arjuna.tpi.parser.html.HtmlNode
Create a clone of this HtmlNode object.
- property inner_html: str
Unaltered inner HTML of this node.
- property normalized_inner_html: str
Normalized inner HTML of this node, with empty lines between child nodes removed.
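To make "empty lines removed" concrete, here is a standard-library sketch of that kind of normalization. It is an illustration of the idea only, not Arjuna's implementation; the function name `strip_empty_lines` is hypothetical, and the real property may treat whitespace differently.

```python
def strip_empty_lines(markup: str) -> str:
    """Drop lines that are empty or whitespace-only, keeping all other lines as-is."""
    kept = [line for line in markup.splitlines() if line.strip()]
    return "\n".join(kept)


raw_inner = "<li>a</li>\n\n  \n<li>b</li>"
normalized = strip_empty_lines(raw_inner)
print(normalized)
```

Under this sketch, the two `<li>` elements end up on adjacent lines, while the content of each non-empty line is left untouched.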