arjuna.tpi.parser.html module
Classes to assist in HTML parsing.
-
class arjuna.tpi.parser.html.Html
Bases: object
Helper class to create HtmlNode objects.
-
classmethod from_file(file_path: str, partial=False) → arjuna.tpi.parser.html.HtmlNode
Creates an HtmlNode from a file.
Parameters: file_path – Absolute path of the HTML file.
Keyword Arguments: partial – If True, the file content is treated as a fragment of HTML rather than a complete document.
Returns: Arjuna's HtmlNode object
-
classmethod from_lxml_element(element, clone=False) → arjuna.tpi.parser.html.HtmlNode
Creates an HtmlNode from an lxml element.
Parameters: element – lxml element
-
classmethod from_str(html_str, partial=False) → arjuna.tpi.parser.html.HtmlNode
Creates an HtmlNode from a string.
Keyword Arguments: partial – If True, the provided string is treated as a fragment of HTML rather than a complete document.
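The sketch below shows one common way a `partial` flag can work: the fragment is wrapped in a container element before parsing, so several sibling tags without a single root can still form one tree. This is not Arjuna's implementation. It assumes well-formed markup and uses the standard library's xml.etree.ElementTree instead of lxml's HTML parser. `parse_html_str` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def parse_html_str(html_str: str, partial: bool = False) -> ET.Element:
    """Hypothetical helper: parse well-formed HTML markup into an Element.

    With partial=True the input is treated as a fragment (for example a few
    sibling tags with no single root), so it is wrapped in a <div> first.
    """
    if partial:
        # A fragment may have several top-level elements; a container root
        # makes it parseable as a single tree.
        return ET.fromstring(f"<div>{html_str}</div>")
    # A complete document must already have exactly one root element.
    return ET.fromstring(html_str)


root = parse_html_str("<p>One</p><p>Two</p>", partial=True)
print([p.text for p in root.iter("p")])  # ['One', 'Two']
```

If you parse the same two-paragraph string without `partial=True`, it fails with `ET.ParseError` because it has no single root element.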
-
class arjuna.tpi.parser.html.HtmlNode(node)
Bases: arjuna.tpi.parser.xml.XmlNode
Represents a single node in a parsed HTML document.
Parameters: node – lxml Element object.
-
clone() → arjuna.tpi.parser.html.HtmlNode
Creates a clone of this HtmlNode object.
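A clone is usually a deep copy, so changing the copy leaves the original tree untouched. The sketch below shows that behaviour by calling copy.deepcopy on a standard-library ElementTree element. It is an assumption that Arjuna's clone() behaves this way for the underlying lxml element.

```python
import copy
import xml.etree.ElementTree as ET

original = ET.fromstring("<ul><li>a</li></ul>")

# Deep copy: the clone gets its own copies of all descendants.
clone = copy.deepcopy(original)
ET.SubElement(clone, "li").text = "b"

# len() on an element counts its direct children.
print(len(original), len(clone))  # 1 2
```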
-
inner_html
Unaltered inner HTML of this node.
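Inner HTML is the markup between a node's opening and closing tags. It consists of the node's leading text, followed by each child serialized together with the text that trails it. The following is a minimal sketch using the standard library's ElementTree. `inner_html` here is a hypothetical function, not Arjuna's property implementation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_html(element: ET.Element) -> str:
    """Hypothetical helper: return the markup inside `element`'s tags."""
    # Text before the first child, re-escaped so "&", "<" and ">" stay valid markup.
    parts = [escape(element.text or "")]
    for child in element:
        # ET.tostring serializes the child together with its tail text,
        # i.e. the text that follows the child's closing tag.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


node = ET.fromstring("<div>Hi <b>there</b>, friend</div>")
print(inner_html(node))  # Hi <b>there</b>, friend
```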
-
normalized_inner_html
Normalized inner HTML of this node, with empty lines between child nodes removed.
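The exact normalization rules are not spelled out here. A plausible reading of "empty lines removed" is that lines which are empty or contain only whitespace are dropped from the inner markup. The plain-Python sketch below assumes that reading, and `remove_empty_lines` is a hypothetical helper.

```python
def remove_empty_lines(markup: str) -> str:
    """Hypothetical helper: drop lines that are empty or whitespace-only."""
    return "\n".join(line for line in markup.splitlines() if line.strip())


raw = "\n  <li>a</li>\n\n   \n  <li>b</li>\n"
# Prints the two <li> lines with no blank lines between them.
print(remove_empty_lines(raw))
```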
-