d1_common.wrap package

DataONE API Type wrappers.

Although this directory is not a package, this __init__.py file is required for pytest to be able to reach test directories below this directory.

Submodules

d1_common.wrap.access_policy module

Context manager for working with the DataONE AccessPolicy type.

Examples

Perform multiple operations on an AccessPolicy:

# Wrap a SystemMetadata PyXB object to modify its AccessPolicy section
with d1_common.wrap.access_policy.wrap_sysmeta_pyxb(sysmeta_pyxb) as ap:

  # Print the subjects that have the changePermission access level
  print(ap.get_subjects_with_equal_or_higher_perm('changePermission'))

  # Clear any existing rules in the access policy
  ap.clear()

  # Add a new rule
  ap.add_perm('subj1', 'read')

# Exit the context manager scope to write the changes that were made back to the
# wrapped SystemMetadata.

If only a single operation is to be performed, use one of the module level functions:

# Add public read permission to an AccessPolicy. This adds an allow rule with
# a "read" permission for the symbolic subject, "public". It is a no-op if any of
# the existing rules already provide "read" or better to "public".
add_public_read(access_pyxb)

Notes

Overview:

Each science object in DataONE has an associated SystemMetadata document in which there is an AccessPolicy element. The AccessPolicy contains rules assigning permissions to subjects. The supported permissions are read, write and changePermission.

write implicitly includes read, and changePermission implicitly includes read and write. So, only a single permission needs to be assigned to a subject in order to determine all permissions for the subject.

A policy can contain multiple rules, and each rule can contain multiple subjects and permissions. The same subject can therefore be specified multiple times, in the same rule or in different rules, each time with a different set of permissions, and each permission also implicitly includes the lower permissions.

Due to this, the same permissions can be expressed in many different ways. This wrapper hides the variations, exposing a single canonical set of rules that can be read, modified and written. That is, the wrapper allows working with any set of permissions in terms of the simplest possible representation that covers the resulting effective permissions.

E.g., the following two access policies are equivalent. The latter represents the canonical representation of the former.

<accessPolicy>
  <allow>
    <subject>subj2</subject>
    <subject>subj1</subject>
    <perm>read</perm>
  </allow>
  <allow>
    <subject>subj4</subject>
    <perm>read</perm>
    <perm>changePermission</perm>
  </allow>
  <allow>
    <subject>subj2</subject>
    <subject>subj3</subject>
    <perm>read</perm>
    <perm>write</perm>
  </allow>
  <allow>
    <subject>subj5</subject>
    <perm>read</perm>
    <perm>write</perm>
  </allow>
</accessPolicy>

and

<accessPolicy>
  <allow>
    <subject>subj1</subject>
    <perm>read</perm>
  </allow>
  <allow>
    <subject>subj2</subject>
    <subject>subj3</subject>
    <subject>subj5</subject>
    <perm>write</perm>
  </allow>
  <allow>
    <subject>subj4</subject>
    <perm>changePermission</perm>
  </allow>
</accessPolicy>
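The equivalence of the two policies above can be checked by reducing each one to the highest permission held by each subject. The following is a minimal sketch using only the standard library, not the wrapper's actual implementation. The names `ORDERED_PERMS` and `highest_perm_by_subject` are assumptions made for illustration, and the XML is written without namespaces, as in the examples above.

```python
import xml.etree.ElementTree as ET

# Permissions ordered from lowest to highest.
ORDERED_PERMS = ["read", "write", "changePermission"]


def highest_perm_by_subject(access_policy_xml):
    """Map each subject to the highest permission granted by any allow rule."""
    highest = {}
    for allow_el in ET.fromstring(access_policy_xml).iter("allow"):
        # The highest perm in a rule implies all lower perms in that rule.
        rule_rank = max(ORDERED_PERMS.index(p.text) for p in allow_el.iter("perm"))
        for subj_el in allow_el.iter("subject"):
            prev_rank = highest.get(subj_el.text, -1)
            highest[subj_el.text] = max(prev_rank, rule_rank)
    return {subj: ORDERED_PERMS[rank] for subj, rank in highest.items()}


verbose_xml = """
<accessPolicy>
  <allow><subject>subj2</subject><subject>subj1</subject><perm>read</perm></allow>
  <allow><subject>subj4</subject><perm>read</perm><perm>changePermission</perm></allow>
  <allow><subject>subj2</subject><subject>subj3</subject><perm>read</perm><perm>write</perm></allow>
  <allow><subject>subj5</subject><perm>read</perm><perm>write</perm></allow>
</accessPolicy>
"""

canonical_xml = """
<accessPolicy>
  <allow><subject>subj1</subject><perm>read</perm></allow>
  <allow><subject>subj2</subject><subject>subj3</subject><subject>subj5</subject><perm>write</perm></allow>
  <allow><subject>subj4</subject><perm>changePermission</perm></allow>
</accessPolicy>
"""

# Both documents reduce to the same subject-to-highest-perm mapping.
same = highest_perm_by_subject(verbose_xml) == highest_perm_by_subject(canonical_xml)
```

Here subj2 appears in two rules of the first policy, once with read and once with write, and ends up with write as its highest permission in both reductions.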

Representations of rules, permissions and subjects:

subj_dict maps each subj to the perms that the subj has specifically been given. It holds the perms as they were read from PyXB. Duplicates caused by the same subj being given the same perm in multiple ways are filtered out.

{
  'subj1': { 'read' },
  'subj2': { 'read', 'write' },
  'subj3': { 'read', 'write' },
  'subj4': { 'changePermission', 'read' },
  'subj5': { 'read', 'write' }
}

perm_dict maps each perm that a subj has specifically been given, to the subj. If the AccessPolicy contains multiple allow elements, and they each give different perms to a subj, those show up as additional mappings. Duplicates caused by the same subj being given the same perm in multiple ways are filtered out. Calls such as add_perm() also cause extra mappings to be added here, as long as they’re not exact duplicates. Whenever this dict is used for generating PyXB or making comparisons, it is first normalized to a norm_perm_list.

{
  'read': { 'subj1', 'subj2' },
  'write': { 'subj3' },
  'changePermission': { 'subj2' },
}

subj_highest_dict maps each subj to the highest perm the subj has. The dict has the same number of keys as there are subj.

{
  'subj1': 'write',
  'subj2': 'changePermission',
  'subj3': 'write',
}

highest_perm_dict maps the highest perm a subj has, to the subj. The dict can have at most 3 keys:

{
  'changePermission': { 'subj2', 'subj3', 'subj5', 'subj6' },
  'read': { 'public' },
  'write': { 'subj1', 'subj4' }
}

norm_perm_list is a minimal, ordered and hashable list of lists. The top level has up to 3 lists, one for each perm that is in use. Each of the lists then has a list of subj for which that perm is the highest perm. norm_perm_list is the shortest way that the required permissions can be expressed, and is used for comparing access policies and creating uniform PyXB objects:

[
  ['read', ['public']],
  ['write', ['subj1', 'subj4']],
  ['changePermission', ['subj2', 'subj3', 'subj5', 'subj6']]
]
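
The chain of representations above can be sketched as a single reduction from a subj_dict to a norm_perm_list. This is an illustrative sketch only; the function name `to_norm_perm_list` and the `ORDERED_PERMS` constant are assumptions, and the wrapper's internal code may differ.

```python
# Permissions ordered from lowest to highest.
ORDERED_PERMS = ["read", "write", "changePermission"]


def to_norm_perm_list(subj_dict):
    """Reduce a {subject: {perm, ...}} dict to a minimal, ordered list of lists."""
    # Step 1: keep only the highest perm for each subject (subj_highest_dict).
    subj_highest_dict = {
        subj: max(perms, key=ORDERED_PERMS.index)
        for subj, perms in subj_dict.items()
        if perms
    }
    # Step 2: invert so each highest perm maps to its subjects (highest_perm_dict).
    highest_perm_dict = {}
    for subj, perm in subj_highest_dict.items():
        highest_perm_dict.setdefault(perm, set()).add(subj)
    # Step 3: emit perms from lowest to highest, with subjects sorted.
    return [
        [perm, sorted(highest_perm_dict[perm])]
        for perm in ORDERED_PERMS
        if perm in highest_perm_dict
    ]


subj_dict = {
    "subj1": {"read"},
    "subj2": {"read", "write"},
    "subj3": {"read", "write"},
    "subj4": {"changePermission", "read"},
    "subj5": {"read", "write"},
}
norm_perm_list = to_norm_perm_list(subj_dict)
```

Because perms and subjects are both emitted in a fixed order, two policies that grant the same effective permissions produce identical lists, which is what makes the result usable for comparisons.
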
d1_common.wrap.access_policy.wrap(access_pyxb, pyxb_binding=None, read_only=False)

Work with an AccessPolicy PyXB object.

Parameters
  • access_pyxb – AccessPolicy PyXB object The AccessPolicy to modify.

  • read_only – bool Do not update the wrapped AccessPolicy.

When only a single AccessPolicy operation is needed, there’s no need to use this context manager. Instead, use the generated context manager wrappers.

d1_common.wrap.access_policy.wrap_sysmeta_pyxb(sysmeta_pyxb, pyxb_binding=None, read_only=False)

Work with the AccessPolicy in a SystemMetadata PyXB object.

Parameters
  • sysmeta_pyxb – SystemMetadata PyXB object SystemMetadata containing the AccessPolicy to modify.

  • read_only – bool Do not update the wrapped AccessPolicy.

When only a single AccessPolicy operation is needed, there’s no need to use this context manager. Instead, use the generated context manager wrappers.

There is no clean way in Python to make a context manager that allows client code to replace the object that is passed out of the manager. The AccessPolicy schema does not allow the AccessPolicy element to be empty. However, the SystemMetadata schema specifies the AccessPolicy as optional. By wrapping the SystemMetadata instead of the AccessPolicy when working with AccessPolicy that is within SystemMetadata, the wrapper can handle the situation of empty AccessPolicy by instead dropping the AccessPolicy from the SystemMetadata.
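
The reasoning above can be illustrated with a small context manager that mutates the container it was given and drops the optional child when nothing is left to write. This is a sketch under stated assumptions: `FakeSysMeta`, `wrap_sysmeta` and the dict-based policy are hypothetical stand-ins, not the PyXB types or the actual wrapper.

```python
import contextlib


class FakeSysMeta:
    """Hypothetical stand-in for a SystemMetadata object."""

    def __init__(self, access_policy=None):
        # None models an absent (optional) AccessPolicy element.
        self.access_policy = access_policy


@contextlib.contextmanager
def wrap_sysmeta(sysmeta):
    """Yield a mutable {subject: perm} dict and write it back on exit."""
    rules = dict(sysmeta.access_policy or {})
    yield rules
    # An empty policy is not valid, so drop the element instead of writing it.
    sysmeta.access_policy = rules or None


sysmeta = FakeSysMeta({"subj1": "read"})
with wrap_sysmeta(sysmeta) as rules:
    rules.clear()
# The container was updated in place; the policy is now absent, not empty.
```

The caller's reference to `sysmeta` stays valid throughout, which is the point of wrapping the container rather than the policy itself.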

class d1_common.wrap.access_policy.AccessPolicyWrapper(access_pyxb, pyxb_binding=None)

Bases: object

Wrap an AccessPolicy and provide convenient methods to read, write and update it.

Parameters

access_pyxb – AccessPolicy PyXB object The AccessPolicy to modify.

update()

Update the wrapped AccessPolicy PyXB object with normalized and minimal rules representing current state.

get_normalized_pyxb()

Returns:

AccessPolicy PyXB object : Current state of the wrapper as the minimal rules required for correctly representing the perms.

get_normalized_perm_list()

Returns:

A minimal, ordered, hashable list of subjects and permissions that represents the current state of the wrapper.

get_highest_perm_str(subj_str)
Parameters

subj_str – str Subject for which to retrieve the highest permission.

Returns

The highest permission for subject or None if subject does not have any permissions.

get_effective_perm_list(subj_str)
Parameters

subj_str – str Subject for which to retrieve the effective permissions.

Returns

List of permissions up to and including the highest permission for subject, ordered lower to higher, or empty list if subject does not have any permissions.

E.g.: If ‘write’ is highest permission for subject, return [‘read’, ‘write’].

Return type

list of str
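
As an illustration of the return value described above, the effective permissions can be derived from the highest permission by slicing an ordered list. This is a minimal sketch; `ORDERED_PERMS` and `effective_perm_list` are names assumed for the example and do not come from the library.

```python
# Permissions ordered from lowest to highest.
ORDERED_PERMS = ["read", "write", "changePermission"]


def effective_perm_list(highest_perm_str):
    """Return all perms up to and including the highest, lowest first."""
    if highest_perm_str is None:
        # The subject has no permissions at all.
        return []
    return ORDERED_PERMS[: ORDERED_PERMS.index(highest_perm_str) + 1]


write_perms = effective_perm_list("write")
no_perms = effective_perm_list(None)
```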

get_subjects_with_equal_or_higher_perm(perm_str)
Parameters

perm_str – str Permission, read, write or changePermission.

Returns

Subj that have perm equal or higher than perm_str.

Since the lowest permission a subject can have is read, passing read will return all subjects.

Return type

set of str
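
The selection described above can be sketched as a rank comparison over a subject-to-highest-permission mapping. The function below is an illustrative assumption operating on a plain dict, not the wrapper method itself.

```python
# Permissions ordered from lowest to highest.
ORDERED_PERMS = ["read", "write", "changePermission"]


def subjects_with_equal_or_higher_perm(subj_highest_dict, perm_str):
    """Return the subjects whose highest perm ranks at or above perm_str."""
    min_rank = ORDERED_PERMS.index(perm_str)
    return {
        subj
        for subj, perm in subj_highest_dict.items()
        if ORDERED_PERMS.index(perm) >= min_rank
    }


subj_highest_dict = {"public": "read", "subj1": "write", "subj2": "changePermission"}
readers = subjects_with_equal_or_higher_perm(subj_highest_dict, "read")
writers = subjects_with_equal_or_higher_perm(subj_highest_dict, "write")
```

In this example `readers` contains every subject, since read is the lowest permission a subject can have.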

dump()

Dump the current state to debug level log.

is_public()

Returns:

bool: True if AccessPolicy allows public read.

is_private()

Returns:

bool: True if AccessPolicy does not grant access to any subjects.

is_empty()

Returns:

bool: True if AccessPolicy does not grant access to any subjects.

are_equivalent_pyxb(access_pyxb)
Parameters

access_pyxb – AccessPolicy PyXB object with which to compare.

Returns

True if access_pyxb grants the exact same permissions as the wrapped AccessPolicy.

Differences in how the permissions are represented in the XML docs are handled by transforming to normalized lists before comparison.

Return type

bool

are_equivalent_xml(access_xml)
Parameters

access_xml – AccessPolicy XML doc with which to compare.

Returns

True if access_xml grants the exact same permissions as the wrapped AccessPolicy.

Differences in how the permissions are represented in the XML docs are handled by transforming to normalized lists before comparison.

Return type

bool

subj_has_perm(subj_str, perm_str)

Returns:

bool: True if subj_str has perm equal to or higher than perm_str.

clear()

Remove AccessPolicy.

Only the rightsHolder set in the SystemMetadata will be able to access the object unless new perms are added after calling this method.

add_public_read()

Add public read perm.

Add an allow rule with a read permission for the symbolic subject, public. It is a no-op if any of the existing rules already provide read or higher to public.

add_authenticated_read()

Add read perm for all authenticated subj.

Public read is removed if present.

add_verified_read()

Add read perm for all verified subj.

Public read is removed if present.

add_perm(subj_str, perm_str)

Add a permission for a subject.

Parameters
  • subj_str – str Subject for which to add permission(s)

  • perm_str – str Permission to add. Implicitly adds all lower permissions. E.g., write will also add read.

remove_perm(subj_str, perm_str)

Remove permission from a subject.

Parameters
  • subj_str – str Subject for which to remove permission(s)

  • perm_str – str Permission to remove. Implicitly removes all higher permissions. E.g., write will also remove changePermission if previously granted.
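
One way to read the removal semantics above is that the subject is left with the permission just below the removed one, if any. The sketch below models that reading on a plain dict. It is an assumption about the behavior, not the library's code, and the function and constant names are hypothetical.

```python
# Permissions ordered from lowest to highest.
ORDERED_PERMS = ["read", "write", "changePermission"]


def remove_perm(subj_highest_dict, subj_str, perm_str):
    """Remove perm_str and all higher perms from subj_str, in place."""
    current_perm = subj_highest_dict.get(subj_str)
    if current_perm is None:
        return
    remove_rank = ORDERED_PERMS.index(perm_str)
    if ORDERED_PERMS.index(current_perm) < remove_rank:
        # The subject never had perm_str, so there is nothing to remove.
        return
    if remove_rank == 0:
        # Removing read leaves the subject with no perms at all.
        del subj_highest_dict[subj_str]
    else:
        subj_highest_dict[subj_str] = ORDERED_PERMS[remove_rank - 1]


perms = {"subj1": "changePermission", "subj2": "read"}
remove_perm(perms, "subj1", "write")  # subj1 keeps read only
remove_perm(perms, "subj2", "write")  # no change, subj2 only had read
```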

remove_subj(subj_str)

Remove all permissions for subject.

Parameters

subj_str – str Subject for which to remove all permissions. Since subjects can only be present in the AccessPolicy when they have one or more permissions, this removes the subject itself as well.

The subject may still have access to the obj. E.g.:

  • The obj has public access.

  • The subj has indirect access by being in a group which has access.

  • The subj has an equivalent subj that has access.

  • The subj is set as the rightsHolder for the object.

d1_common.wrap.access_policy.update(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.get_normalized_pyxb(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.get_normalized_perm_list(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.get_highest_perm_str(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.get_effective_perm_list(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.get_subjects_with_equal_or_higher_perm(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.dump(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.is_public(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.is_private(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.is_empty(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.are_equivalent_pyxb(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.are_equivalent_xml(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.subj_has_perm(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.clear(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.add_public_read(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.add_authenticated_read(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.add_verified_read(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.add_perm(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.remove_perm(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.remove_subj(access_pyxb, *args, **kwargs)
d1_common.wrap.access_policy.mk_func(func_name)
d1_common.wrap.access_policy.method_obj(self)

Update the wrapped AccessPolicy PyXB object with normalized and minimal rules representing current state.
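
The listing above suggests that the module-level functions are generated from the wrapper methods. The following is a minimal sketch of how such generation could work, not the library's actual code. The `Wrapper` class, the dict-based `rules` argument and the body of `mk_func` are all assumptions made for illustration.

```python
import contextlib


class Wrapper:
    """Hypothetical stand-in for AccessPolicyWrapper."""

    def __init__(self, rules):
        self.rules = rules

    def add_perm(self, subj_str, perm_str):
        self.rules[subj_str] = perm_str

    def is_empty(self):
        return not self.rules


@contextlib.contextmanager
def wrap(rules):
    """Hypothetical stand-in for the wrap() context manager."""
    yield Wrapper(rules)


def mk_func(func_name):
    """Create a function that wraps, calls a single method, and unwraps."""

    def func(rules, *args, **kwargs):
        with wrap(rules) as wrapper:
            return getattr(wrapper, func_name)(*args, **kwargs)

    func.__name__ = func_name
    return func


# One module-level function per wrapper method.
add_perm = mk_func("add_perm")
is_empty = mk_func("is_empty")

rules = {}
add_perm(rules, "subj1", "read")
```

Each generated function takes the object to wrap as its first argument and forwards the remaining arguments to the method, which matches the `(access_pyxb, *args, **kwargs)` signatures shown in the listing.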

d1_common.wrap.simple_xml module

Context manager for simple XML processing.

Example

with d1_common.wrap.simple_xml.wrap(my_xml_str) as xml_wrapper:
  # Read, modify and write the text in an XML element
  text_str = xml_wrapper.get_element_text('my_el')
  xml_wrapper.set_element_text('my_el', '{} more text'.format(text_str))
  # Discard the wrapped XML and replace it with the modified XML. Calling get_xml()
  # is required because context managers cannot replace the object that was passed
  # to the manager, and strings are immutable. If the wrapped XML is needed later,
  # just store another reference to it.
  my_xml_str = xml_wrapper.get_xml()

Notes

Typically, the DataONE Python stack, and any apps based on the stack, process XML using the PyXB bindings for the DataONE XML types. However, in some rare cases, it is necessary to process XML without using PyXB, and this wrapper provides some basic methods for such processing.

Uses include:

  • Process XML that is not DataONE types, and so does not have PyXB binding.

  • Process XML that is invalid in such a way that PyXB cannot parse or generate it.

  • Process XML without causing xs:dateTime fields to be normalized to the UTC time zone (PyXB is based on the XML DOM, which requires such normalization).

  • Generate intentionally invalid XML for DataONE types in order to test how MNs, CNs and other components of the DataONE architecture handle and recover from invalid input.

  • Speed up simple processing, when the performance overhead of converting the documents to and from PyXB objects, with the schema validation and other processing that it entails, would be considered too high.

Usage:

  • Methods that take el_name and el_idx operate on the element with index el_idx of elements with name el_name. If el_idx is higher than the number of elements with name el_name, SimpleXMLWrapperException is raised.

  • Though this wrapper does not require XML to validate against the DataONE schemas, it does require that the wrapped XML is well formed and it will only generate well formed XML.

  • If it’s necessary to process XML that is not well formed, a library such as BeautifulSoup may be required.

  • In some cases, it may be possible to read or write XML that is not well formed by manipulating the XML directly as a string before wrapping or after generating.

  • This wrapper is based on the ElementTree module.
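
The el_name and el_idx selection described above can be sketched directly with ElementTree. This is an illustrative sketch, not the wrapper's implementation. `ElementIndexError` is a hypothetical stand-in for SimpleXMLWrapperException, and the standalone function takes the root element as an extra argument.

```python
import xml.etree.ElementTree as ET


class ElementIndexError(Exception):
    """Hypothetical stand-in for SimpleXMLWrapperException."""


def get_element_by_name(root_el, el_name, el_idx=0):
    """Return the element at index el_idx among descendants named el_name."""
    # findall() returns matching descendants in document order.
    el_list = root_el.findall(".//{}".format(el_name))
    try:
        return el_list[el_idx]
    except IndexError:
        raise ElementIndexError(
            'Element not found. name="{}" idx={}'.format(el_name, el_idx)
        )


root_el = ET.fromstring("<root><item>a</item><item>b</item></root>")
second_text = get_element_by_name(root_el, "item", 1).text
```

Asking for index 2 in this document raises the exception, since only two item elements exist.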

d1_common.wrap.simple_xml.wrap(xml_str)

Simple processing of XML.

class d1_common.wrap.simple_xml.SimpleXMLWrapper(xml_str)

Bases: object

Wrap an XML document and provide convenient methods for performing simple processing on it.

Parameters

xml_str – str XML document to read, write or modify.

parse_xml(xml_str)
get_xml(encoding='unicode')

Returns:

str : Current state of the wrapper as XML

get_pretty_xml(encoding='unicode')

Returns:

str : Current state of the wrapper as a pretty printed XML string.

get_xml_below_element(el_name, el_idx=0, encoding='unicode')
Parameters
  • el_name – str Name of element that is the base of the branch to retrieve.

  • el_idx – int Index of element to use as base in the event that there are multiple sibling elements with the same name.

Returns

XML fragment rooted at el.

Return type

str

get_element_list_by_name(el_name, namespaces=None)
Parameters

el_name – str Name of element for which to search.

Returns

List of elements with name el_name.

If there are no matching elements, an empty list is returned.

Return type

list

get_element_list_by_attr_key(attr_key, namespaces=None)
Parameters

attr_key – str Name of attribute for which to search

Returns

List of elements containing an attribute key named attr_key.

If there are no matching elements, an empty list is returned.

Return type

list

get_element_by_xpath(xpath_str, namespaces=None)
Parameters

xpath_str – str XPath matching the elements for which to search.

Returns

List of elements matching xpath_str.

If there are no matching elements, an empty list is returned.

Return type

list

get_element_by_name(el_name, el_idx=0)
Parameters
  • el_name – str Name of element to get.

  • el_idx – int Index of element to use as base in the event that there are multiple sibling elements with the same name.

Returns

The selected element.

Return type

element

get_element_by_attr_key(attr_key, el_idx=0)
Parameters
  • attr_key – str Name of attribute for which to search

  • el_idx – int Index of element to use as base in the event that there are multiple sibling elements with the same name.

Returns

Element containing an attribute key named attr_key.

get_element_text(el_name, el_idx=0)
Parameters
  • el_name – str Name of element to use.

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

Returns

Text of the selected element.

Return type

str

set_element_text(el_name, el_text, el_idx=0)
Parameters
  • el_name – str Name of element to update.

  • el_text – str Text to set for element.

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

get_element_text_by_attr_key(attr_key, el_idx=0)
Parameters
  • attr_key – str Name of attribute for which to search

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

Returns

Text of the selected element.

Return type

str

set_element_text_by_attr_key(attr_key, el_text, el_idx=0)
Parameters
  • attr_key – str Name of attribute for which to search

  • el_text – str Text to set for element.

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

get_attr_value(attr_key, el_idx=0)

Return the value of the selected attribute in the selected element.

Parameters
  • attr_key – str Name of attribute for which to search

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

Returns

Value of the selected attribute in the selected element.

Return type

str

set_attr_text(attr_key, attr_val, el_idx=0)

Set the value of the selected attribute of the selected element.

Parameters
  • attr_key – str Name of attribute for which to search

  • attr_val – str Text to set for the attribute.

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

get_element_dt(el_name, tz=None, el_idx=0)

Return the text of the selected element as a datetime.datetime object.

The element text must be an ISO 8601 formatted datetime.

Parameters
  • el_name – str Name of element to use.

  • tz – datetime.tzinfo Timezone in which to return the datetime.

    • Without a timezone, other contextual information is required in order to determine the exact represented time.

    • If dt has timezone: The tz parameter is ignored.

    • If dt is naive (without timezone): The timezone is set to tz.

    • tz=None: Prevent naive dt from being set to a timezone. Without a timezone, other contextual information is required in order to determine the exact represented time.

    • tz=d1_common.date_time.UTC(): Set naive dt to UTC.

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

Returns

datetime.datetime
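
The handling of the tz parameter described above can be sketched with the standard library. This is a minimal sketch, not the wrapper's code. The function name `parse_iso8601` is an assumption, `datetime.timezone.utc` is used in place of d1_common.date_time.UTC(), and `datetime.fromisoformat()` accepts only a subset of ISO 8601 on Python versions before 3.11.

```python
import datetime


def parse_iso8601(dt_str, tz=None):
    """Parse an ISO 8601 string, applying tz only if the result is naive."""
    dt = datetime.datetime.fromisoformat(dt_str)
    if dt.tzinfo is None and tz is not None:
        # Naive datetime: attach the requested timezone without shifting the time.
        dt = dt.replace(tzinfo=tz)
    return dt


utc = datetime.timezone.utc
naive_dt = parse_iso8601("2020-01-02T03:04:05")  # stays naive
utc_dt = parse_iso8601("2020-01-02T03:04:05", tz=utc)  # naive, so set to UTC
aware_dt = parse_iso8601("2020-01-02T03:04:05+02:00", tz=utc)  # tz is ignored
```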

set_element_dt(el_name, dt, tz=None, el_idx=0)

Set the text of the selected element to an ISO8601 formatted datetime.

Parameters
  • el_name – str Name of element to update.

  • dt – datetime.datetime Date and time to set

  • tz – datetime.tzinfo Timezone to set

    • Without a timezone, other contextual information is required in order to determine the exact represented time.

    • If dt has timezone: The tz parameter is ignored.

    • If dt is naive (without timezone): The timezone is set to tz.

    • tz=None: Prevent naive dt from being set to a timezone. Without a timezone, other contextual information is required in order to determine the exact represented time.

    • tz=d1_common.date_time.UTC(): Set naive dt to UTC.

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

remove_children(el_name, el_idx=0)

Remove any child elements from element.

Parameters
  • el_name – str Name of element to update.

  • el_idx – int Index of element to use in the event that there are multiple sibling elements with the same name.

replace_by_etree(root_el, el_idx=0)

Replace element.

Select the element that has the same name as root_el, then replace the selected element with root_el.

root_el can be a single element or the root of an element tree.

Parameters

root_el – element New element that will replace the existing element.
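
Replacing an element in place with ElementTree requires locating its parent, since elements do not keep a reference to their parent. The sketch below shows one way to do this. It is an assumption about the approach, not the wrapper's actual code, and the standalone function takes the document root as an extra, hypothetical `tree_root` argument.

```python
import xml.etree.ElementTree as ET


def replace_by_etree(tree_root, new_el, el_idx=0):
    """Replace the element at index el_idx among those sharing new_el's tag."""
    # ElementTree elements have no parent pointer, so build a child-to-parent map.
    parent_map = {child: parent for parent in tree_root.iter() for child in parent}
    old_el = tree_root.findall(".//{}".format(new_el.tag))[el_idx]
    parent_el = parent_map[old_el]
    # Swap the new element into the position held by the old one.
    pos = list(parent_el).index(old_el)
    parent_el[pos] = new_el


doc_root = ET.fromstring("<root><a>1</a><b>2</b></root>")
replace_by_etree(doc_root, ET.fromstring("<a><c>3</c></a>"))
result_xml = ET.tostring(doc_root, encoding="unicode")
```

Note that this sketch cannot replace the document root itself, because the root has no parent in the map.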

replace_by_xml(xml_str, el_idx=0)

Replace element.

Select the element that has the same name as the root element of xml_str, then replace the selected element with xml_str.

  • xml_str must have a single element in the root.

  • The root element in xml_str can have an arbitrary number of children.

Parameters

xml_str – str New element that will replace the existing element.

exception d1_common.wrap.simple_xml.SimpleXMLWrapperException

Bases: Exception