Parsing Text, JSON, YAML, XML, HTML Files and Strings

Introduction

Text, JSON, YAML, XML and HTML are widely used formats in test automation.

Accordingly, parsing these formats is a common need for test automation engineers.

Arjuna provides its own objects to easily handle these content types through helper classes in its Tester Programming Interface. The corresponding objects are also returned by other Arjuna objects.

Text

Arjuna’s Text class provides various factory methods to easily create a Text file object to read content in various formats. The following sections show the usage.

Reading Text File in One Go

Reading complete content of a text file is pretty simple:

content = Text.file_content('/some/file/path/abc.text')

Reading Text File Line by Line

Quite often you need to read a text file line by line rather than as a single text blob:

file = Text.file_lines('/some/file/path/abc.text')
for line in file:  # line is a Python str object.
    # Do something about the line
    print(line)
file.close()

What Are Delimited Text Files?

Delimited files are in widespread use in test automation.

These files contain line-wise content where different parts of a line are separated by a delimiter. For example:

Tab Delimited File

CSV File (Comma as the delimiter/separator)

In the above examples, note that the first line is a header line, which names what each part of the subsequent lines contains.

The delimited files can also be created without the header line. For example:

Although this is not recommended, at times you consume such files from an external source and do not have much of an option.

Arjuna provides features to handle all of the above situations.

Reading Delimited Text File with Header

Consider the following tab-delimited file (let’s name it abc.txt):

To read the above file, you can use the following Python code:

file = Text.delimited_file('/some/file/path/abc.txt')
for line in file:  # line is a Python dict object, e.g. {'Left': '1', 'Right': '2', 'Sum': '3'}
    # Do something about the line
    print(line)
file.close()

Tab is the default delimiter. If any other delimiter is used, then it needs to be specified by passing the delimiter argument.

For example, consider the following CSV file (let’s call it abc.csv):

To read the above file, you can use the following Python code:

file = Text.delimited_file('/some/file/path/abc.csv', delimiter=',')
for line in file:  # line is a Python dict object, e.g. {'Left': '1', 'Right': '2', 'Sum': '3'}
    # Do something about the line
    print(line)
file.close()
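
As a point of comparison, the sketch below reproduces the header-based behaviour with Python's standard csv module instead of Arjuna. The Left, Right and Sum columns come from the dictionary in the comments above; the file contents and the read_with_header helper are made up for illustration.

```python
import csv
import io

# Hypothetical file contents: a header line followed by data lines.
tab_text = "Left\tRight\tSum\n1\t2\t3\n4\t5\t9\n"
csv_text = "Left,Right,Sum\n1,2,3\n4,5,9\n"


def read_with_header(text, delimiter="\t"):
    """Return each data line as a dict keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


tab_rows = read_with_header(tab_text)
csv_rows = read_with_header(csv_text, delimiter=",")

print(tab_rows[0])  # {'Left': '1', 'Right': '2', 'Sum': '3'}
```

With the csv module every value is read as a string; convert values explicitly if you need numbers.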

Reading Delimited Text File WITHOUT Header

If the input file has no header line, specify this by passing header_line_present as False. In this case, each line is returned as a Python tuple object instead of a dictionary object.

Consider the following tab-delimited file without header line (let’s name it abc.txt):

To read the above file, you can use the following Python code:

file = Text.delimited_file('/some/file/path/abc.txt', header_line_present=False)
for line in file:  # line is a Python tuple object, e.g. ('1', '2', '3')
    # Do something about the line
    print(line)
file.close()
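
The header-less case can be sketched the same way with the standard csv module. This is an illustration only, not Arjuna's implementation; the sample content and the read_without_header helper are hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited content with no header line.
text = "1\t2\t3\n4\t5\t9\n"


def read_without_header(text, delimiter="\t"):
    """Return each line as a tuple of its delimited parts."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [tuple(row) for row in reader]


rows = read_without_header(text)
print(rows)  # [('1', '2', '3'), ('4', '5', '9')]
```

Without a header there are no names to key on, so each part is addressed by position.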

JSON (JavaScript Object Notation)

JSON is a popular format used in RESTful services and configurations.

Creating JSON Objects

Arjuna’s Json class provides various helper methods to easily create a Json object from various sources:

  • from_file: Load Json from a file.
  • from_str: Load Json from a string.
  • from_map: Load Json from a mapping type object.
  • from_iter: Load Json from an iterable.
  • from_object: Load Json from a Python built-in data type object.

The loaded object is returned as one of the following:

  • JsonDict
  • JsonList
  • The same object as passed, if it is not a mapping or iterable and allow_any is set to True. This applies to from_file, from_str and from_object calls.
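
The return-type rule above can be pictured with the standard json module. This is a minimal sketch, not Arjuna's code: load_json is a hypothetical helper, and raising TypeError for a scalar when allow_any is False is an assumption, since the text does not say what happens in that case.

```python
import json


def load_json(text, allow_any=False):
    """Parse JSON text; reject scalars unless allow_any is True.

    Hypothetical helper for illustration, not Arjuna's implementation.
    """
    obj = json.loads(text)
    if isinstance(obj, (dict, list)) or allow_any:
        return obj
    # Assumption: a non-container value is an error when allow_any is False.
    raise TypeError(f"Expected a JSON object or array, got {type(obj).__name__}")


as_dict = load_json('{"a": 1}')              # mapping (JsonDict in Arjuna)
as_list = load_json('[1, 2, 3]')             # sequence (JsonList in Arjuna)
as_scalar = load_json('42', allow_any=True)  # returned as-is
```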

Json Class Assertions

Json class provides the following assertions:

  • assert_list_type: Validate that the object is a JsonList or Python list
  • assert_dict_type: Validate that the object is a JsonDict or Python dict

Automatic Json Schema Extraction

Given a Json object, you can extract its schema automatically:

Json.extract_schema(jsonobject_or_str)

This schema can be used for schema validation for another Json object.
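
To give a feel for what schema extraction involves, here is a minimal sketch that infers a JSON-Schema-style dictionary from a parsed JSON value. The extract_schema function below is a hypothetical stand-in written for this illustration; the schema that Arjuna actually produces may be structured differently.

```python
def extract_schema(obj):
    """Infer a small JSON-Schema-style dict from a parsed JSON value."""
    if isinstance(obj, dict):
        return {
            "type": "object",
            "properties": {k: extract_schema(v) for k, v in obj.items()},
            "required": sorted(obj),
        }
    if isinstance(obj, list):
        schema = {"type": "array"}
        if obj:
            # Simplification: describe items using the first element only.
            schema["items"] = extract_schema(obj[0])
        return schema
    if isinstance(obj, bool):  # bool must be checked before int
        return {"type": "boolean"}
    if isinstance(obj, int):
        return {"type": "integer"}
    if isinstance(obj, float):
        return {"type": "number"}
    if obj is None:
        return {"type": "null"}
    return {"type": "string"}


schema = extract_schema({"name": "abc", "ids": [1, 2]})
```

Here every key found in the sample object is marked as required, which is why an extracted schema usually needs to be relaxed before it is reused on other objects.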

JsonDict Object

JsonDict encapsulates the Json dictionary and provides higher level methods for interaction.

It has the following properties:

  • raw_object: The underlying dictionary
  • size: Number of keys in the JsonDict
  • schema: The Json schema of this JsonDict (as a JsonSchema object)

Finding Json elements in a JsonDict Object

You can find Json elements in JsonDict by using a key name or by creating a more involved JsonPath query.

  • find: Find first match using a key or JsonPath
  • findall: Find all matches using a JsonPath

Matching Schema of a JsonDict object

You can use a custom Json schema dictionary or a JsonSchema object to validate schema of a JsonDict object.

json_dict.matches_schema(schema)

It returns True if the schema matches and False otherwise.

Asserting JsonDict Object

JsonDict object provides various assertions to validate its contents:

  • assert_contents: Validate arbitrary key-value pairs in its root.
  • assert_keys_present: Validate that arbitrary keys are present.
  • assert_match: Assert that it matches another Python dict or JsonDict.
  • assert_schema: Assert that it matches the provided schema dict or JsonSchema.
  • assert_match_schema: Assert that it has the same schema as the provided dict or JsonDict.

JsonList Object

JsonList encapsulates the Json list and provides higher level methods for interaction.

It has the following properties:

  • raw_object: The underlying list
  • size: Number of items in the JsonList

== Operator with JsonDict and JsonList Objects

The == operator is overridden for JsonDict and JsonList objects.

JsonDict supports comparison with a JsonDict or Python dict.

JsonList supports comparison with a JsonList or Python list.

json_dict_1 == json_dict_2
json_dict_1 == py_dict

json_list_1 == json_list_2
json_list_1 == py_list
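
The general technique behind such comparisons is overriding __eq__ on a wrapper class. The sketch below shows one way this can be done; DictWrapper is a hypothetical class for illustration and is not Arjuna's JsonDict.

```python
class DictWrapper:
    """Hypothetical wrapper that compares equal to wrappers and plain dicts."""

    def __init__(self, raw):
        self.raw_object = raw

    def __eq__(self, other):
        if isinstance(other, DictWrapper):
            return self.raw_object == other.raw_object
        if isinstance(other, dict):
            return self.raw_object == other
        # Let Python try the reflected comparison or fall back to identity.
        return NotImplemented


wrapped_1 = DictWrapper({"a": 1})
wrapped_2 = DictWrapper({"a": 1})

print(wrapped_1 == wrapped_2)  # True
print(wrapped_1 == {"a": 1})   # True
```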

Modifying a JsonSchema object

A JsonSchema object is primarily meant to be created through auto-extraction using Json.extract_schema.

You can currently make two modifications to the JsonSchema once created:

  • mark_optional: Mark arbitrary keys as optional in the root of the schema.
  • allow_null: Allow null values for arbitrary keys.
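
In standard JSON Schema terms, these two modifications roughly correspond to removing a key from the "required" list and adding "null" to a property's "type". The sketch below applies both to a plain schema dictionary. The function names mirror the bullets above, but the implementations are hypothetical, and Arjuna's internal representation may differ.

```python
import copy


def mark_optional(schema, *keys):
    """Return a copy of schema with the given root keys no longer required."""
    result = copy.deepcopy(schema)
    result["required"] = [k for k in result.get("required", []) if k not in keys]
    return result


def allow_null(schema, *keys):
    """Return a copy of schema where the given root keys also accept null."""
    result = copy.deepcopy(schema)
    for key in keys:
        prop = result["properties"][key]
        types = prop["type"] if isinstance(prop["type"], list) else [prop["type"]]
        if "null" not in types:
            types = types + ["null"]
        prop["type"] = types
    return result


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["age", "name"],
}
relaxed = allow_null(mark_optional(schema, "age"), "name")
```

After these calls, "age" may be absent and "name" may be null, while the original schema dictionary is left unchanged.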

YAML

YAML is a popular format used in configurations. It is also the default format for Arjuna configuration and definition files.

Creating YAML Objects

Arjuna’s Yaml class provides various helper methods to easily create a YAML object from various sources:

  • from_file: Load YAML from a file.
  • from_str: Load YAML from a string.
  • from_object: Load YAML from a Python built-in data type object.

The loaded object is returned as one of the following:

  • YamlDict
  • YamlList
  • The same object as passed, if it is not a mapping or iterable and allow_any is set to True. This applies to from_file, from_str and from_object calls.

YamlDict Object

YamlDict encapsulates the YAML dictionary and provides higher level methods for interaction.

It has the following properties:

  • raw_object: The underlying dictionary
  • size: Number of keys in the YamlDict

YamlList Object

YamlList encapsulates the YAML list and provides higher level methods for interaction.

It has the following properties:

  • raw_object: The underlying list
  • size: Number of items in the YamlList

== Operator with YamlDict and YamlList Objects

The == operator is overridden for YamlDict and YamlList objects.

YamlDict supports comparison with a YamlDict or Python dict.

YamlList supports comparison with a YamlList or Python list.

yaml_dict_1 == yaml_dict_2
yaml_dict_1 == py_dict

yaml_list_1 == yaml_list_2
yaml_list_1 == py_list

Using !join construct

Arjuna provides the !join construct to easily build strings by concatenating the items of the provided list. For example:

root: &BASE /path/to/root
patha: !join [*BASE, a]
pathb: !join [*BASE, b]

Once loaded, this YAML is equivalent to the following Python dictionary:

{
    'root': '/path/to/root',
    'patha': '/path/to/roota',
    'pathb': '/path/to/rootb'
}
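
Note from the result that the list items are concatenated as-is, without any separator. In plain Python terms, the !join entries behave like the following sketch (an analogy for the result shown above, not Arjuna's implementation):

```python
base = "/path/to/root"

# The list items are concatenated directly; no separator is inserted.
patha = "".join([base, "a"])
pathb = "".join([base, "b"])

print(patha)  # /path/to/roota
```

If a path separator is needed, it has to be one of the list items or part of an item.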

XML

XML is another popular format used for data exchange.

Creating an XmlNode Object

A fully loaded XML document, or a part of it, is represented by an XmlNode object.

Arjuna’s Xml class provides various helper methods to easily create an XmlNode object from various sources:

  • from_file: Load XmlNode from a file.
  • from_str: Load XmlNode from a string.
  • from_lxml_element: From an lxml element.

The loaded object is returned as an XmlNode.

Inquiring an XmlNode Object

XmlNode object provides the following properties for inquiry:

  • node: The underlying lxml element.
  • text: Unaltered text content. Text of all children is combined.
  • normalized_text: Text of this node with empty lines removed and individual lines trimmed.
  • texts: Texts returned as a sequence.
  • inner_xml: Xml of children.
  • normalized_inner_xml: Normalized inner XML of this node, with empty lines removed between children nodes.
  • source: String representation of this node’s XML.
  • normalized_source: String representation of this node with all new lines removed and more than one consecutive space converted to a single space.
  • tag: Tag name
  • children: All children of this node as a tuple of XmlNodes
  • parent: Parent XmlNode
  • preceding_sibling: The XmlNode before this node at the same hierarchical level.
  • following_sibling: The XmlNode after this node at the same hierarchical level.
  • attrs: All attributes as a mapping.
  • value: Content of value attribute.

The following inquiry methods are available:

  • attr: Get value of an attribute by name.
  • has_attr: Check presence of an attribute.

Cloning an XmlNode object

You can clone an XmlNode by calling its clone method.

Finding XmlNodes in an XmlNode Object using XPath

You can find XmlNodes in a given XmlNode object using XPath:

  • find_with_xpath: Find first match using XPath
  • findall_with_xpath: Find all matches using XPath
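
The difference between a first-match and an all-matches search can be illustrated with the standard library's ElementTree, which supports a subset of XPath. This sketch does not use XmlNode; the sample document is made up.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<root><item id='1'/><group><item id='2'/></group></root>")

first = root.find(".//item")      # first match in document order
every = root.findall(".//item")   # all matches as a list
missing = root.find(".//absent")  # None when nothing matches

print(first.get("id"), [e.get("id") for e in every], missing)
```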

Finding XmlNodes in an XmlNode Object using Xml.node_locator

Arjuna’s NodeLocator object helps you easily define locating criteria.

# XmlNode with tag input
locator = Xml.node_locator(tags='input')

# XmlNode with attr 'a' with value 1
locator = Xml.node_locator(a=1)

# XmlNode with tag input and attr 'a' with value 1
locator = Xml.node_locator(tags='input', a=1)

Note

‘tags’ can be provided as:

  • A string containing a single tag
  • A string containing multiple tags
  • A list/tuple containing multiple tags.

When multiple tags are provided, they are treated as sequential descendant tags.

# XmlNode with tag input that is a descendant of form, and attr 'a' with value 1
locator = Xml.node_locator(tags='form input', a=1)
locator = Xml.node_locator(tags=('form', 'input'), a=1)
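
One way to read "sequential descendant tags" is as a chain of descendant steps in an XPath. The sketch below builds such an XPath from the same kind of arguments and runs it with the standard library's ElementTree. How Arjuna translates a locator internally is not stated here, so build_xpath is a hypothetical helper and only one plausible interpretation.

```python
import xml.etree.ElementTree as ET


def build_xpath(tags=None, **attrs):
    """Build an XPath for descendant tags plus attribute conditions (hypothetical)."""
    if tags is None:
        parts = ["*"]
    elif isinstance(tags, str):
        parts = tags.split()
    else:
        parts = list(tags)
    xpath = "." + "".join(f"//{part}" for part in parts)
    for name, value in attrs.items():
        xpath += f"[@{name}='{value}']"
    return xpath


doc = ET.fromstring(
    "<page><form><div><input a='1'/></div></form><input a='1'/><input a='2'/></page>"
)

in_form = doc.findall(build_xpath(tags="form input", a=1))
anywhere = doc.findall(build_xpath(a=1))

print(build_xpath(tags=("form", "input"), a=1))  # .//form//input[@a='1']
```

In this reading, the input does not have to be a direct child of form; any depth of nesting matches.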

You can search for all XmlNodes using this locator in an XmlNode:

locator.search_node(node=some_xml_node)

For finer control, you can use finder methods in XmlNode object itself and provide the locator:

  • find: Find first match using the locator
  • findall: Find all matches using the locator

node.findall(locator)

# Returns None if not found
node.find(locator)

# Raise Exception if not found
node.find(locator, strict=True)

Providing Alternative NodeLocators (OR Relationship)

In some situations, you might want to find XmlNode(s) which match any of the provided locators.

You can provide any number of locators in XmlNode finder methods.

node.find(locator1, locator2, locator3)
node.findall(locator1, locator2, locator3)

Exiting XmlNode.findall on First Matched Locator

You can make findall stop at the first locator that produces a match by setting stop_when_matched to True:

node.findall(locator1, locator2, locator3, stop_when_matched=True)
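
The OR relationship and the effect of stop_when_matched can be sketched with the standard library's ElementTree, using XPath strings in place of locators. The findall_any helper is hypothetical, and removing duplicates across locators is an assumption made for this illustration.

```python
import xml.etree.ElementTree as ET


def findall_any(node, *xpaths, stop_when_matched=False):
    """Collect matches for each XPath in order (hypothetical helper)."""
    results = []
    for xpath in xpaths:
        for match in node.findall(xpath):
            if match not in results:  # assumption: avoid duplicates across locators
                results.append(match)
        if stop_when_matched and results:
            break  # the first locator that matched decides the result
    return results


doc = ET.fromstring("<root><a/><b/><b/><c/></root>")

all_matches = findall_any(doc, ".//x", ".//b", ".//c")
first_only = findall_any(doc, ".//x", ".//b", ".//c", stop_when_matched=True)

print([e.tag for e in all_matches])  # ['b', 'b', 'c']
print([e.tag for e in first_only])   # ['b', 'b']
```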

HTML

In Web UI automation and HTTP automation, extracting data from HTML and matching it are common needs.

Creating an HtmlNode Object

A fully loaded HTML document, or a part of it, is represented by an HtmlNode object.

Arjuna’s Html class provides various helper methods to easily create an HtmlNode object from various sources:

  • from_file: Load HtmlNode from a file.
  • from_str: Load HtmlNode from a string.
  • from_lxml_element: Load HtmlNode from an lxml element.

Arjuna uses a BeautifulSoup-based lxml parser to fix broken HTML while loading.

Loading Partial HTML

While using the from_file or from_str methods of the Html class, you can pass partial HTML content to be loaded as an HtmlNode.

For this, provide partial=True as a keyword argument.

node = Html.from_str(partial_html_str, partial=True)

An HtmlNode is an XmlNode

As HtmlNode inherits from XmlNode, it supports all the properties, methods and flexibility discussed above for the XmlNode object.

Additionally, it has the following properties:

  • inner_html: HTML of children.
  • normalized_inner_html: Normalized inner HTML of this node, with empty lines removed between children nodes.