d1_test package¶

DataONE Test Utilities.

The DataONE Test Utilities package contains various utilities for testing DataONE infrastructure components and clients. These include:

Instance Generator: Used for creating randomized System Metadata documents

Stress Tester: Used for stress testing of Member Node implementations. The stress_tester creates a configurable number of concurrent connections to a Member Node and populates the MN with randomly generated objects while running queries and object retrievals.

Utilities: Misc test utilities.

Although this directory is not a package, this __init__.py file is required for pytest to be able to reach test directories below this directory.

Subpackages¶

Submodules¶

d1_test.d1_test_case module¶

d1_test.pycharm module¶

d1_test.sample module¶

d1_test.slender_node_test_client module¶

class d1_test.slender_node_test_client.SlenderNodeTestClient(sciobj_store_path='./sciobj_store', keep_existing=False, *_args, **_kwargs)¶

Bases: object

A simple drop-in replacement for a MN client, for use when developing and testing SlenderNode scripts.

MN is simulated to the bare minimum required by SN scripts
Objects are stored in local files instead of on an MN
SID to PID dict is held in memory and dumped to file
Most args are simply ignored
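The storage approach described in the list above can be illustrated with a small, self-contained sketch. This is not the SlenderNodeTestClient implementation: the class name FileBackedStore, the sid argument, the file name sid_to_pid.json and the URL-quoting of PIDs into file names are all assumptions made for illustration.

```python
import io
import json
import pathlib
import tempfile
import urllib.parse


class FileBackedStore:
    """Hypothetical stand-in; not the SlenderNodeTestClient implementation."""

    def __init__(self, store_path):
        self._root = pathlib.Path(store_path)
        # Create the store directory if it does not exist.
        self._root.mkdir(parents=True, exist_ok=True)
        # SID to PID mapping is held in memory and dumped to file on change.
        self._sid_to_pid = {}

    def _path(self, pid):
        # PIDs can contain characters that are unsafe in file names (assumption:
        # URL-quoting is used here; the real client may use another scheme).
        return self._root / urllib.parse.quote(pid, safe="")

    def create(self, pid, sciobj_file, sid=None):
        # Store the object bytes in a local file instead of on an MN.
        self._path(pid).write_bytes(sciobj_file.read())
        if sid is not None:
            self._sid_to_pid[sid] = pid
            self._dump()

    def get(self, did):
        # Resolve a SID to its PID; an identifier that is not a known SID is
        # treated as a PID.
        pid = self._sid_to_pid.get(did, did)
        return io.BytesIO(self._path(pid).read_bytes())

    def _dump(self):
        (self._root / "sid_to_pid.json").write_text(json.dumps(self._sid_to_pid))


with tempfile.TemporaryDirectory() as tmp_dir:
    store = FileBackedStore(tmp_dir)
    store.create("pid/1", io.BytesIO(b"abc"), sid="series-1")
    sciobj_bytes = store.get("series-1").read()
    dumped = json.loads((pathlib.Path(tmp_dir) / "sid_to_pid.json").read_text())
```
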

__init__(sciobj_store_path='./sciobj_store', keep_existing=False, *_args, **_kwargs)¶

Create the test client.

Store the sciobj and sysmeta in sciobj_store_path
sciobj_store_path is created if it does not exist
Unless keep_existing is True, any existing files in sciobj_store_path are deleted

create(pid, sciobj_file, sysmeta_pyxb, *args, **kwargs)¶

update(old_pid, sciobj_file, new_pid, new_sysmeta_pyxb, *args, **kwargs)¶

get(did, *args, **kwargs)¶: Return a file-like object with the sciobj bytes.

getSystemMetadata(did, *args, **kwargs)¶: Return sysmeta_pyxb.

d1_test.test_files module¶

Utilities for loading test files.

d1_test.test_files.get_abs_test_file_path(rel_path)¶

d1_test.test_files.load_bin(rel_path)¶

d1_test.test_files.load_utf8_to_str(rel_path)¶: Load file, decode from UTF-8 and return as str.

d1_test.test_files.load_xml_to_pyxb(filename)¶

d1_test.test_files.load_xml_to_str(filename)¶

d1_test.test_files.load_xml_to_bytes(filename)¶

d1_test.test_files.load_json(filename)¶

d1_test.test_files.load_cert(filename)¶

d1_test.test_files.load_jwt(filename)¶

d1_test.test_files.save(obj_str, rel_path, encoding='utf-8')¶

d1_test.xml_normalize module¶

Generate a str that contains a normalized representation of an XML document.

For unit testing, we want to be able to store and compare samples representing XML documents that are guaranteed to be stable.

Often, XML docs have various sections containing unordered sets of elements where there are no semantics associated with the order in which they appear in the doc. The same is true for element attributes. For DataONE, typical examples are lists of subjects, permission rules and services.

Since the sources for such elements are often dict- and set-based containers that do not themselves provide deterministic ordering, serializing a group of such objects can generate a large number of possible XML docs that, while semantically identical, cannot be directly compared as text or in the DOM.

Normalizing the formatting can be done with a single deserialize to DOM and back to XML, but that will not normalize the ordering of the elements. Without a schema, automated tools cannot rearrange elements in an XML doc, since it is not known if the order is significant. However, for generating and comparing XML doc samples, a stable document that contains all the information from the XML doc is sufficient.
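The limitation can be seen in a short sketch. It assumes the standard library's xml.etree.ElementTree as the parser and serializer, which is not necessarily what this module uses; the roundtrip helper and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def roundtrip(xml_str):
    """Parse to an element tree and serialize back to a str."""
    return ET.tostring(ET.fromstring(xml_str), encoding="unicode")


# Same content, different formatting inside the tags: the round trip makes
# the two documents identical.
doc_a = "<subjects><subject>alice</subject><subject>bob</subject></subjects>"
doc_b = "<subjects ><subject >alice</subject\n><subject>bob</subject></subjects >"
formatting_normalized = roundtrip(doc_a) == roundtrip(doc_b)

# Same content, different sibling order: the round trip preserves the
# difference, so the documents still do not compare as equal.
doc_c = "<subjects><subject>bob</subject><subject>alice</subject></subjects>"
ordering_normalized = roundtrip(doc_a) == roundtrip(doc_c)
```

Here formatting_normalized is True while ordering_normalized is False, which is why a separate sorting step is needed.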

The strategy for generating a stable representation of an XML doc is as follows:

All sibling XML elements must be sorted regardless of where they are in the tree.
Each element is the root of a branch of the node tree. Sorting, of course, is based on comparing individual elements in order to determine their relative orderings. If the information in the elements themselves is identical, it is necessary to break the tie by recursively comparing their descendants until either a difference is found, or the two elements are determined to be the roots of two identical branches.
To enable the sort algorithm to compare the branches, sort keys that hold all information in the branch are generated and passed to the sort. For comparisons to properly compare elements in the most to least significant order, each node in the branch must be in a single list item. So the key is a nested list of lists.
Finally, since the sort keys are generated from the descendants, siblings in a given element can only be sorted after all their descendants in the tree have been sorted. So the tree must be traversed depth first, and the sort performed as the algorithm is stepping up from a completed level.
To avoid having to build a new tree depth first, the siblings are sorted in place.
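The strategy above can be sketched roughly as follows. This is an illustration only, not the module's StableNode implementation: it works directly on xml.etree.ElementTree elements, the function names are hypothetical, and text that follows a child element (mixed content) is ignored for brevity.

```python
import xml.etree.ElementTree as ET


def sort_key(el):
    """Return a nested list holding all information in the branch rooted at el.

    Lists compare item by item, so the tag is most significant, then the
    attributes, then the text, then the keys of the children.
    """
    return [
        el.tag,
        sorted(el.attrib.items()),
        (el.text or "").strip(),
        [sort_key(child) for child in el],
    ]


def sort_in_place(el):
    """Sort siblings depth first: descendants are sorted before their parents."""
    for child in el:
        sort_in_place(child)
    # All children are now internally sorted, so their keys are stable and
    # the children themselves can be sorted without building a new tree.
    el[:] = sorted(el, key=sort_key)


def normalized(xml_str):
    """Return a stable nested-list representation of an XML document."""
    root = ET.fromstring(xml_str)
    sort_in_place(root)
    return sort_key(root)


doc_a = '<rule><subject>bob</subject><subject>alice</subject><perm a="1" b="2"/></rule>'
doc_b = '<rule><perm b="2" a="1"/><subject>alice</subject><subject>bob</subject></rule>'
same = normalized(doc_a) == normalized(doc_b)
```

The two sample documents differ in sibling order and attribute order only, so same is True. Removing or changing any element produces a different representation.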

Notes

# RDF-XML

Although the hierarchical structure of elements is almost always significant in XML, there are instances where semantically identical XML docs can have different hierarchies. This often occurs when generating RDF-XML docs from RDF.

This module only normalizes the ordering of sibling elements and attributes. Parent-child relationships are never changed. So RDF-XML docs generated in such a way that parent-child relationships may differ without change in semantics are not supported.

## Background

RDF is an unordered set of subject-predicate-object triples. Triples cannot share values, so when there are multiple triples for a subject, each triple must contain a copy of the subject.

RDF-XML supports expressing triples with less redundancy by factoring shared values out to parent elements. E.g., a set of triples for a subject can be expressed as a series of predicate-object children with a single subject parent.

When generating RDF-XML from RDF that contains many triples that share values, the same set of triples can be represented by many different hierarchies. The hierarchy that is actually generated depends on the algorithm and may also depend on the order in which the triples are processed. If the triples are retrieved from an unordered set, the processing order is pseudo-random, causing pseudo-random variations in the generated hierarchy.
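A small example may make the problem concrete. The two hypothetical RDF-XML docs below express the same two triples, once with both predicates under a single subject element and once with the subject repeated. The triples helper is an assumption for illustration and handles only literal-valued properties directly under rdf:Description elements; it is not part of this module.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Both predicates factored out under one subject element.
GROUPED = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms/">
  <rdf:Description rdf:about="http://example.org/obj1">
    <ex:title>Sample</ex:title>
    <ex:creator>alice</ex:creator>
  </rdf:Description>
</rdf:RDF>
"""

# The same triples, with the subject repeated for each predicate.
FLAT = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms/">
  <rdf:Description rdf:about="http://example.org/obj1">
    <ex:title>Sample</ex:title>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/obj1">
    <ex:creator>alice</ex:creator>
  </rdf:Description>
</rdf:RDF>
"""


def triples(rdf_xml):
    """Return the set of (subject, predicate, object) triples in the doc."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{RDF_NS}}}about"
    return {
        (desc.get(about), prop.tag, prop.text)
        for desc in root
        for prop in desc
    }


same_triples = triples(GROUPED) == triples(FLAT)
same_hierarchy = len(ET.fromstring(GROUPED)) == len(ET.fromstring(FLAT))
```

Here same_triples is True but same_hierarchy is False: the root has one child in the first doc and two in the second. Sorting siblings cannot reconcile the two, since that would require changing parent-child relationships.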

d1_test.xml_normalize.get_normalized_xml_representation(xml)¶: Return a str that contains a normalized representation of an XML document.

d1_test.xml_normalize.xml_to_stabletree(xml)¶: Return a StableTree that contains a normalized representation of an XML document.

d1_test.xml_normalize.etree_to_stable_tree(et_node)¶

Convert an ElementTree to a StableTree.

Node attributes become @key:string
Text elements become @text:string
name is the name of the XML element

class d1_test.xml_normalize.StableNode(name, child_node=None)¶

Bases: object

Tree structure that uses lists instead of dicts, as lists have deterministic ordering.

__init__(name, child_node=None)¶: child is E or str.

add_child(e)¶

get_str(s, indent)¶

get_sort_key_()¶

sort(p=None)¶

d1_test.xml_normalize.StableTree¶: alias of d1_test.xml_normalize.StableNode
