Parsing Text, JSON, YAML, XML, HTML Files and Strings¶
Introduction¶
Text, JSON, YAML, XML and HTML are widely used and handled formats in test automation.
Accordingly, parsing these formats is a very common need for test automation engineers.
Arjuna provides its own objects to easily handle these content types through helper classes in its Tester Programming Interface. The corresponding objects are also returned by its other objects.
Text¶
Arjuna’s Text class provides various factory methods to easily create a Text file object to read content in various formats:
- file_content: Returns content as string.
- file_lines: Returns TextFileAsLines object to read file line by line.
- delimited_file: Returns DelimTextFileWithLineAsMap or DelimTextFileWithLineAsSeq object to read file line by line, parsed based on delimiter.
Following sections show the usage.
Reading Text File in One Go¶
Reading complete content of a text file is pretty simple:
content = Text.file_content('/some/file/path/abc.text')
Reading Text File Line by Line¶
Quite often you need to read a text file line by line rather than as a single text blob:
file = Text.file_lines('/some/file/path/abc.text')
for line in file: # line is a Python **str** object.
    # Do something about the line
    print(line)
file.close()
What Are Delimited Text Files?¶
Delimited files are in widespread use in test automation.
These files contain line-wise content where different parts of a line are separated by a delimiter. For example:
Tab Delimited File
CSV File (Comma as the delimiter/separator)
In the above examples, note that the first line is a header line which tells what each corresponding part of a line contains in subsequent lines.
The delimited files can also be created without the header line. For example:
Although omitting the header line is not recommended, at times you consume such files from an external source and do not have much of an option.
Arjuna provides features to handle all of the above situations.
Reading Delimited Text File with Header¶
Consider the following tab-delimited file (let’s name it abc.txt):
To read the above file, you can use the following Python code:
file = Text.delimited_file('/some/file/path/abc.txt')
for line in file: # line is a Python **dict** object e.g. {'Left' : '1', 'Right': 2, 'Sum' : 3}
    # Do something about the line
    print(line)
file.close()
Tab is the default delimiter. If any other delimiter is used, then it needs to be specified by passing the delimiter argument.
For example, consider the following CSV file (let’s call it abc.csv):
To read the above file, you can use the following Python code:
file = Text.delimited_file('/some/file/path/abc.csv', delimiter=',')
for line in file: # line is a Python **dict** object e.g. {'Left' : '1', 'Right': 2, 'Sum' : 3}
    # Do something about the line
    print(line)
file.close()
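To make the header-to-dictionary mapping concrete without depending on Arjuna, the following minimal sketch reproduces the idea with Python's standard csv module. It is not Arjuna's implementation; the helper name read_with_header and the sample content (Left, Right and Sum columns, borrowed from the comment above) are assumptions for illustration only.

```python
import csv
import io


def read_with_header(text, delimiter="\t"):
    """Yield each data line as a dict keyed by the header line's column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for row in reader:
        yield dict(row)


# Hypothetical sample content: the same data with two different delimiters.
tab_content = "Left\tRight\tSum\n1\t2\t3\n4\t5\t9\n"
csv_content = "Left,Right,Sum\n1,2,3\n4,5,9\n"

tab_rows = list(read_with_header(tab_content))
csv_rows = list(read_with_header(csv_content, delimiter=","))

for row in csv_rows:
    print(row)
```

In this standard-library sketch every value is a string. Whether Arjuna converts values to other types is not stated here.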
Reading Delimited Text File WITHOUT Header¶
If the input file has no header line, specify this by passing header_line_present as False. In this case, each line is returned as a Python tuple object instead of a dictionary object.
Consider the following tab-delimited file without header line (let’s name it abc.txt):
To read the above file, you can use the following Python code:
file = Text.delimited_file('/some/file/path/abc.txt', header_line_present=False)
for line in file: # line is a Python **tuple** object e.g. (1,2,3)
    # Do something about the line
    print(line)
file.close()
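The headerless case can be sketched in the same way with the standard csv module. Again, this is only an illustration of the idea, not Arjuna's code; read_without_header and the sample content are hypothetical.

```python
import csv
import io


def read_without_header(text, delimiter="\t"):
    """Yield each line as a tuple of its delimited parts."""
    for parts in csv.reader(io.StringIO(text), delimiter=delimiter):
        yield tuple(parts)


# Hypothetical tab-delimited content with no header line.
content = "1\t2\t3\n4\t5\t9\n"
rows = list(read_without_header(content))

for row in rows:
    print(row)
```

Because there are no column names, the parts can only be addressed by position, which is why a header line is preferable whenever you control the file format.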
JSON (JavaScript Object Notation)¶
JSON is a popular format used in RESTful services and configurations.
Creating JSON Objects¶
Arjuna’s Json class provides various helper methods to easily create a Json object from various sources:
- from_file: Load Json from a file.
- from_str: Load Json from a string.
- from_map: Load Json from a mapping type object.
- from_iter: Load Json from an iterable.
- from_object: Load Json from a Python built-in data type object.
Json Class Assertions¶
Json class provides the following assertions:
- assert_list_type: Validate that the object is a JsonList or Python list
- assert_dict_type: Validate that the object is a JsonDict or Python dict
Automatic Json Schema Extraction¶
Given a Json object, you can extract its schema automatically:
Json.extract_schema(jsonobject_or_str)
This schema can be used for schema validation for another Json object.
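To give a feel for what automatic schema extraction involves, here is a deliberately naive sketch that infers a JSON-Schema-style description from a decoded JSON value. It is not the algorithm used by Json.extract_schema; the function name infer_schema, the sample data, and the simplification of describing array items from the first element only are all assumptions.

```python
import json


def infer_schema(value):
    """Return a JSON-Schema-style description of a decoded JSON value."""
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    if isinstance(value, list):
        schema = {"type": "array"}
        if value:
            # Simplification: describe items using the first element only.
            schema["items"] = infer_schema(value[0])
        return schema
    if isinstance(value, bool):  # bool must be checked before int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    return {"type": "null"}  # None, i.e. JSON null


# Hypothetical sample document.
sample = json.loads('{"name": "abc", "tags": ["x", "y"], "count": 2}')
schema = infer_schema(sample)
print(json.dumps(schema, indent=2))
```

A schema produced this way marks every key as required, which is why being able to relax it afterwards is useful.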
JsonDict Object¶
JsonDict encapsulates the Json dictionary and provides higher level methods for interaction.
- It has the following properties:
- raw_object: The underlying dictionary
- size: Number of keys in the JsonDict
- schema: The Json schema of this JsonDict (as a JsonSchema object)
Finding Json elements in a JsonDict Object¶
You can find Json elements in JsonDict by using a key name or by creating a more involved JsonPath query.
- find: Find first match using a key or JsonPath
- findall: Find all matches using a JsonPath
Matching Schema of a JsonDict object¶
You can use a custom Json schema dictionary or a JsonSchema object to validate schema of a JsonDict object.
json_dict.matches_schema(schema)
It returns True/False depending on the match success.
Asserting JsonDict Object¶
JsonDict object provides various assertions to validate its contents:
- assert_contents: Validate arbitrary key-value pairs in its root.
- assert_keys_present: Validate arbitrary keys
- assert_match: Assert if it matches another Python dict or JsonDict.
- assert_schema: Assert if it matches provided schema dict or JsonSchema.
- assert_match_schema: Assert if it has the same schema as that of the provided dict or JsonDict.
JsonList Object¶
JsonList encapsulates the Json list and provides higher level methods for interaction.
- It has the following properties:
- raw_object: The underlying list
- size: Number of items in the JsonList
== Operator with JsonDict and JsonList Objects¶
== operator is overridden for JsonDict and JsonList objects.
JsonDict supports comparison with a JsonDict or Python dict.
JsonList supports comparison with a JsonList or Python list.
json_dict_1 == json_dict_2
json_dict_1 == py_dict
json_list_1 == json_list_2
json_list_1 == py_list
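The general technique behind such comparisons is overriding __eq__ on the wrapper so that it accepts either another wrapper or the raw built-in type. The sketch below shows the pattern with a hypothetical DictWrapper class; it is not Arjuna's JsonDict, only an illustration of how this kind of equality is typically implemented in Python.

```python
class DictWrapper:
    """Hypothetical stand-in for a wrapper such as JsonDict."""

    def __init__(self, raw):
        self._raw = dict(raw)

    @property
    def raw_object(self):
        return self._raw

    def __eq__(self, other):
        # Compare against another wrapper or against a plain dict.
        if isinstance(other, DictWrapper):
            return self._raw == other._raw
        if isinstance(other, dict):
            return self._raw == other
        # Let Python try the other operand or fall back to identity.
        return NotImplemented


wrapped_1 = DictWrapper({"a": 1})
wrapped_2 = DictWrapper({"a": 1})
py_dict = {"a": 1}

print(wrapped_1 == wrapped_2)
print(wrapped_1 == py_dict)
```

Returning NotImplemented for unsupported types keeps the comparison symmetric: Python then tries the reflected comparison on the other operand before deciding the objects are unequal.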
Modifying a JsonSchema object¶
A JsonSchema object is primarily meant to be created through auto-extraction using Json.extract_schema.
You can currently make two modifications to the JsonSchema once created:
- mark_optional: Mark arbitrary keys as optional in the root of the schema.
- allow_null: Allow null value for the arbitrary keys.
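In JSON Schema terms, these two modifications usually correspond to removing a key from the "required" list and adding "null" to the key's allowed types. The sketch below shows that on a plain dictionary schema. The function names mirror the methods listed above, but they are hypothetical helpers working on plain dicts; how JsonSchema implements them internally is an assumption.

```python
import copy


def mark_optional(schema, *keys):
    """Return a copy of schema with the given root keys removed from 'required'."""
    result = copy.deepcopy(schema)
    result["required"] = [k for k in result.get("required", []) if k not in keys]
    return result


def allow_null(schema, *keys):
    """Return a copy of schema in which the given root keys also accept null."""
    result = copy.deepcopy(schema)
    for key in keys:
        prop = result["properties"][key]
        types = prop["type"] if isinstance(prop["type"], list) else [prop["type"]]
        if "null" not in types:
            types = types + ["null"]
        prop["type"] = types
    return result


# Hypothetical schema in which both keys start out as required strings.
schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "nick": {"type": "string"}},
    "required": ["name", "nick"],
}
relaxed = allow_null(mark_optional(schema, "nick"), "nick")
print(relaxed)
```

The helpers return modified copies so that the original schema stays available for strict validation.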
YAML¶
YAML is a popular format used in configurations. It is also the default format for Arjuna configuration and definition files.
Creating YAML Objects¶
Arjuna’s Yaml class provides various helper methods to easily create a YAML object from various sources:
- from_file: Load YAML from a file.
- from_str: Load YAML from a string.
- from_object: Load YAML from a Python built-in data type object.
YamlDict Object¶
YamlDict encapsulates the YAML dictionary and provides higher level methods for interaction.
- It has the following properties:
- raw_object: The underlying dictionary
- size: Number of keys in the YamlDict
YamlList Object¶
YamlList encapsulates the YAML list and provides higher level methods for interaction.
- It has the following properties:
- raw_object: The underlying list
- size: Number of items in the YamlList
== Operator with YamlDict and YamlList Objects¶
== operator is overridden for YamlDict and YamlList objects.
YamlDict supports comparison with a YamlDict or Python dict.
YamlList supports comparison with a YamlList or Python list.
yaml_dict_1 == yaml_dict_2
yaml_dict_1 == py_dict
yaml_list_1 == yaml_list_2
yaml_list_1 == py_list
Using !join construct¶
Arjuna provides !join construct to easily construct strings by concatenating the provided list. For example:
root: &BASE /path/to/root
patha: !join [*BASE, a]
pathb: !join [*BASE, b]
Once loaded this YAML is equivalent to the following Python dictionary:
{
    'root': '/path/to/root',
    'patha': '/path/to/roota',
    'pathb': '/path/to/rootb'
}
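As the resulting dictionary shows, the items are concatenated as-is, with no path separator inserted between them. The plain-Python sketch below emulates that behaviour. It does not use a YAML parser, and the join function here is only an assumed equivalent of what the !join construct does.

```python
def join(parts):
    """Concatenate the parts as strings, with no separator inserted."""
    return "".join(str(part) for part in parts)


base = "/path/to/root"  # plays the role of the &BASE anchor
loaded = {
    "root": base,
    "patha": join([base, "a"]),
    "pathb": join([base, "b"]),
}
print(loaded)
```

If a separator is wanted in the result, it has to be part of one of the joined items.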
XML¶
XML is another popular format used for data exchange.
Creating an XmlNode Object¶
A loaded full Xml or a part of it is represented using an XmlNode object.
Arjuna’s Xml class provides various helper methods to easily create an XmlNode object from various sources:
- from_file: Load XmlNode from a file.
- from_str: Load XmlNode from a string.
- from_lxml_element: From an lxml element.
The loaded object is returned as an XmlNode.
Inquiring an XmlNode Object¶
XmlNode object provides the following properties for inquiry:
- node: The underlying lxml element.
- text: Unaltered text content. Text of all children is combined.
- normalized_text: Text of this node with empty lines removed and individual lines trimmed.
- texts: Texts returned as a sequence.
- inner_xml: Xml of children.
- normalized_inner_xml: Normalized inner XML of this node, with empty lines removed between children nodes.
- source: String representation of this node’s XML.
- normalized_source: String representation of this node with all new lines removed and more than one consecutive space converted to a single space.
- tag: Tag name
- children: All children of this node as a tuple of XmlNodes
- parent: Parent XmlNode
- preceding_sibling: The XmlNode before this node at the same hierarchical level.
- following_sibling: The XmlNode after this node at the same hierarchical level.
- attrs: All attributes as a mapping.
- value: Content of value attribute.
- Following inquiry methods are available:
- attr: Get value of an attribute by name.
- has_attr: Check presence of an attribute.
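To illustrate a few of these inquiries, the sketch below uses the standard library's xml.etree.ElementTree, whose element API is similar to lxml's, rather than Arjuna itself. The normalized_text function is a hypothetical helper that follows the description given above (empty lines removed, individual lines trimmed); Arjuna's exact normalization rules may differ.

```python
import xml.etree.ElementTree as ET


def normalized_text(element):
    """Join all text under element, trimming each line and dropping empty ones."""
    raw = "".join(element.itertext())
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


# Hypothetical sample node.
root = ET.fromstring("<div id='main'>\n  Hello\n\n  <b> World </b>\n</div>")

print(root.tag)               # tag name
print(root.attrib)            # all attributes as a mapping
print(root.get("id"))         # value of one attribute by name
print("id" in root.attrib)    # presence check for an attribute
print(normalized_text(root))  # combined text, normalized
```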
Cloning an XmlNode object¶
You can clone an XmlNode by calling its clone method.
Finding XmlNodes in an XmlNode Object using XPath¶
You can find XmlNodes in a given XmlNode object using XPath:
- find_with_xpath: Find first match using XPath
- findall_with_xpath: Find all matches using XPath
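For readers new to XPath-based lookups, the sketch below shows the first-match versus all-matches distinction using the standard library's xml.etree.ElementTree. It does not call Arjuna's methods, and ElementTree supports only a subset of XPath, so treat it as an illustration of the idea. The sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<root>"
    "<item id='1'>first</item>"
    "<group><item id='2'>second</item></group>"
    "</root>"
)

first = doc.find(".//item")                # first match, or None if absent
all_items = doc.findall(".//item")         # every match, in document order
by_attr = doc.findall(".//item[@id='2']")  # matches filtered by attribute

print(first.text)
print([item.get("id") for item in all_items])
print([item.text for item in by_attr])
```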
Finding XmlNodes in an XmlNode Object using Xml.node_locator¶
Arjuna’s NodeLocator object helps you easily define locating criteria.
# XmlNode with tag input
locator = Xml.node_locator(tags='input')
# XmlNode with attr 'a' with value 1
locator = Xml.node_locator(a=1)
# XmlNode with tag input and attr 'a' with value 1
locator = Xml.node_locator(tags='input', a=1)
Note
‘tags’ can be provided as:
- A string containing a single tag
- A string containing multiple tags
- A list/tuple containing multiple tags.
When multiple tags are provided, they are treated as sequential descendant tags.
# XmlNode with tag input that is a descendant of a form tag, and attr 'a' with value 1
locator = Xml.node_locator(tags='form input', a=1)
locator = Xml.node_locator(tags=('form', 'input'), a=1)
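One way to picture "sequential descendant tags" is as an XPath in which each tag is searched among the descendants of the previous one. The sketch below builds such an XPath and runs it with the standard library's ElementTree. The helper locator_to_xpath is hypothetical and is not how NodeLocator is implemented; it only illustrates the matching rule described above.

```python
import xml.etree.ElementTree as ET


def locator_to_xpath(tags=None, **attrs):
    """Build an ElementTree-compatible XPath from tags and attribute values."""
    if tags is None:
        tag_list = ["*"]           # any tag
    elif isinstance(tags, str):
        tag_list = tags.split()    # 'form input' -> ['form', 'input']
    else:
        tag_list = list(tags)      # list/tuple of tags
    path = "." + "".join(f"//{tag}" for tag in tag_list)
    predicates = "".join(f"[@{name}='{value}']" for name, value in attrs.items())
    return path + predicates


# Hypothetical page: one input inside a form, one outside it.
page = ET.fromstring(
    "<html><body>"
    "<form><div><input a='1'/></div></form>"
    "<input a='1'/>"
    "</body></html>"
)

xpath = locator_to_xpath(tags="form input", a=1)
print(xpath)
print(len(page.findall(xpath)))                            # inputs under a form
print(len(page.findall(locator_to_xpath("input", a=1))))   # all matching inputs
```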
You can search for all XmlNodes using this locator in an XmlNode:
locator.search_node(node=some_xml_node)
For finer control, you can use finder methods in XmlNode object itself and provide the locator:
- find: Find first match using the provided locator
- findall: Find all matches using the provided locator
node.findall(locator)
# Returns None if not found
node.find(locator)
# Raises an exception if not found
node.find(locator, strict=True)
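The difference between the lenient and strict forms of find can be sketched as follows, using ElementTree and an XPath string in place of Arjuna's node and locator. The standalone find function and the LookupError it raises are placeholders; the exception type that Arjuna raises is not stated here.

```python
import xml.etree.ElementTree as ET


def find(node, xpath, strict=False):
    """Return the first match; if none, return None or raise when strict is True."""
    match = node.find(xpath)
    if match is None and strict:
        raise LookupError(f"No node found for: {xpath}")
    return match


# Hypothetical sample document.
doc = ET.fromstring("<r><a/><b/></r>")

print(find(doc, ".//a").tag)   # found
print(find(doc, ".//zzz"))     # not found, lenient: None

try:
    find(doc, ".//zzz", strict=True)
except LookupError as error:
    print(error)               # not found, strict: exception
```

The strict form is convenient in tests, where a missing node should fail immediately rather than surface later as an error on None.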
Providing Alternative NodeLocators (OR Relationship)¶
In some situations, you might want to find XmlNode(s) which match any of the provided locators.
You can provide any number of locators in XmlNode finder methods.
node.find(locator1, locator2, locator3)
node.findall(locator1, locator2, locator3)
Exiting XmlNode.findall on First Matched Locator¶
You can stop findall logic at first matched locator by setting stop_when_matched to True:
node.findall(locator1, locator2, locator3, stop_when_matched=True)
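The following sketch shows one plausible reading of these semantics, with XPath strings standing in for locators and ElementTree standing in for XmlNode. The function findall_any is hypothetical, and it assumes that without stop_when_matched the matches of all locators are combined in locator order; Arjuna's actual behaviour may differ in details such as duplicate handling.

```python
import xml.etree.ElementTree as ET


def findall_any(node, *xpaths, stop_when_matched=False):
    """Collect matches for each XPath in order (OR relationship)."""
    found = []
    for xpath in xpaths:
        matches = node.findall(xpath)
        if matches and stop_when_matched:
            # Return only the matches of the first locator that finds anything.
            return matches
        found.extend(matches)
    return found


# Hypothetical sample document.
doc = ET.fromstring("<r><a/><b/><b/><c/></r>")

everything = findall_any(doc, ".//x", ".//b", ".//c")
first_hit = findall_any(doc, ".//x", ".//b", ".//c", stop_when_matched=True)

print([e.tag for e in everything])
print([e.tag for e in first_hit])
```

Stopping at the first matched locator is useful when the locators are ordered by preference, for example a precise locator followed by broader fallbacks.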
HTML¶
In Web UI automation and HTTP automation, extracting data from HTML and matching it against expectations are common needs.
Creating an HtmlNode Object¶
A loaded full HTML or a part of it is represented using an HtmlNode object.
Arjuna’s Html class provides various helper methods to easily create an HtmlNode object from various sources:
- from_file: Load HtmlNode from a file.
- from_str: Load HtmlNode from a string.
- from_lxml_element: Load HtmlNode from an lxml element.
Arjuna uses the BeautifulSoup-based lxml parser to fix broken HTML while loading.
Loading Partial HTML¶
While using the from_file or from_str methods of the Html class, you can pass partial HTML content to be loaded as an HtmlNode.
For this, provide partial=True as a keyword argument.
node = Html.from_str(partial_html_str, partial=True)
An HtmlNode is an XmlNode¶
As HtmlNode inherits from XmlNode, it supports all the properties, methods and flexibility discussed above for the XmlNode object.
Additionally, it has the following properties:
- inner_html: HTML of children.
- normalized_inner_html: Normalized inner HTML of this node, with empty lines removed between children nodes.