d1_common package¶
DataONE Common Library.
Although this directory is not a package, this __init__.py file is required for pytest to be able to reach test directories below this directory.
Subpackages¶
- d1_common.cert package
- d1_common.ext package
- d1_common.iter package
- d1_common.types package
- d1_common.utils package
- d1_common.wrap package
Submodules¶
d1_common.bagit module¶
Create and validate BagIt Data Packages / zip file archives.
See also
-
d1_common.bagit.
validate_bagit_file
(bagit_path)¶ Check if a BagIt file is valid.
- Raises
ServiceFailure – If the BagIt zip archive file fails any of the following checks:
- Is a valid zip file.
- The tag and manifest files are correctly formatted.
- Contains all the files listed in the manifests.
- The file checksums match the manifests.
-
d1_common.bagit.
create_bagit_stream
(dir_name, payload_info_list)¶ Create a stream containing a BagIt zip archive.
- Parameters
dir_name – str The name of the root directory in the zip file, under which all the files are placed (avoids “zip bombs”).
payload_info_list – list List of payload_info_dict, each dict describing a file.
keys: pid, filename, iter, checksum, checksum_algorithm
If the filename is None, the pid is used for the filename.
d1_common.checksum module¶
Utilities for handling checksums.
Warning
The MD5 checksum algorithm is not cryptographically secure. It’s possible to craft a sequence of bytes that yields a predetermined checksum.
-
d1_common.checksum.
create_checksum_object_from_stream
(f, algorithm='SHA-1')¶ Calculate the checksum of a stream.
- Parameters
f – file-like object Only requirement is a read() method that returns bytes.
algorithm – str Checksum algorithm, MD5 or SHA1 / SHA-1.
- Returns
Populated Checksum PyXB object.
-
d1_common.checksum.
create_checksum_object_from_iterator
(itr, algorithm='SHA-1')¶ Calculate the checksum of an iterator.
- Parameters
itr – iterable Object which supports the iterator protocol.
algorithm – str Checksum algorithm, MD5 or SHA1 / SHA-1.
- Returns
Populated Checksum PyXB object.
-
d1_common.checksum.
create_checksum_object_from_bytes
(b, algorithm='SHA-1')¶ Calculate the checksum of bytes.
Warning
This method requires the entire object to be buffered in (virtual) memory, which should normally be avoided in production code.
- Parameters
b – bytes Raw bytes
algorithm – str Checksum algorithm, MD5 or SHA1 / SHA-1.
- Returns
Populated PyXB Checksum object.
-
d1_common.checksum.
calculate_checksum_on_stream
(f, algorithm='SHA-1', chunk_size=1048576)¶ Calculate the checksum of a stream.
- Parameters
f – file-like object Only requirement is a read() method that returns bytes.
algorithm – str Checksum algorithm, MD5 or SHA1 / SHA-1.
chunk_size – int Number of bytes to read from the file and add to the checksum at a time.
- Returns
Checksum as a hexadecimal string, with length decided by the algorithm.
- Return type
str
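The chunked read described above can be sketched with the standard library alone. This is an illustration, not the d1_common implementation: the name checksum_on_stream and the way the DataONE algorithm name is mapped to a hashlib name are assumptions.

```python
import hashlib
import io


def checksum_on_stream(f, algorithm="SHA-1", chunk_size=1024 * 1024):
    """Return the hex digest of a file-like object, reading it chunk by chunk."""
    # Assumption: "SHA-1" / "SHA1" / "MD5" map to hashlib names "sha1" / "md5".
    calculator = hashlib.new(algorithm.replace("-", "").lower())
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        calculator.update(chunk)
    return calculator.hexdigest()


payload = b"DataONE science object bytes"
# A small chunk size forces several read() calls for the demonstration.
hex_digest = checksum_on_stream(io.BytesIO(payload), "SHA-1", chunk_size=8)
```

Because only chunk_size bytes are held at a time, memory use stays bounded regardless of the size of the stream.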
-
d1_common.checksum.
calculate_checksum_on_iterator
(itr, algorithm='SHA-1')¶ Calculate the checksum of an iterator.
- Parameters
itr – iterable Object which supports the iterator protocol.
algorithm – str Checksum algorithm, MD5 or SHA1 / SHA-1.
- Returns
Checksum as a hexadecimal string, with length decided by the algorithm.
- Return type
str
-
d1_common.checksum.
calculate_checksum_on_bytes
(b, algorithm='SHA-1')¶ Calculate the checksum of bytes.
Warning
This method requires the entire object to be buffered in (virtual) memory, which should normally be avoided in production code.
- Parameters
b – bytes Raw bytes
algorithm – str Checksum algorithm, MD5 or SHA1 / SHA-1.
- Returns
Checksum as a hexadecimal string, with length decided by the algorithm.
- Return type
str
-
d1_common.checksum.
are_checksums_equal
(checksum_a_pyxb, checksum_b_pyxb)¶ Determine if checksums are equal.
- Parameters
checksum_a_pyxb, checksum_b_pyxb – PyXB Checksum objects to compare.
- Returns
- bool
True: The checksums contain the same hexadecimal values calculated with the same algorithm. Identical checksums guarantee (for all practical purposes) that the checksums were calculated from the same sequence of bytes.
False: The checksums were calculated with the same algorithm but the hexadecimal values are different.
- Raises
ValueError – The checksums were calculated with different algorithms, hence cannot be compared.
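The comparison rules above can be illustrated without PyXB. In this sketch, Checksum is a hypothetical stand-in for the PyXB Checksum type, the function name is made up, and the case-insensitive comparison of the hexadecimal values is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-in for a PyXB Checksum object.
Checksum = namedtuple("Checksum", ["algorithm", "value"])


def checksums_equal_sketch(checksum_a, checksum_b):
    """Compare two checksums, refusing to compare across algorithms."""
    if checksum_a.algorithm != checksum_b.algorithm:
        raise ValueError(
            "Cannot compare checksums calculated with different algorithms: "
            "{} and {}".format(checksum_a.algorithm, checksum_b.algorithm)
        )
    # Assumption: hex digits are compared case-insensitively.
    return checksum_a.value.lower() == checksum_b.value.lower()


sha1_a = Checksum("SHA-1", "da39a3ee5e6b4b0d3255bfef95601890afd80709")
sha1_b = Checksum("SHA-1", "DA39A3EE5E6B4B0D3255BFEF95601890AFD80709")
md5_a = Checksum("MD5", "d41d8cd98f00b204e9800998ecf8427e")
```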
-
d1_common.checksum.
get_checksum_calculator_by_dataone_designator
(dataone_algorithm_name)¶ Get a checksum calculator.
- Parameters
dataone_algorithm_name – str Checksum algorithm, MD5 or SHA1 / SHA-1.
- Returns
Checksum calculator from the hashlib library
Object that supports update(arg), digest(), hexdigest() and copy().
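The calculator interface listed above is the one exposed by hashlib objects. The sketch below assumes a simple lookup table from DataONE algorithm names to hashlib constructors; the table and the helper name are illustrative, not taken from d1_common.

```python
import hashlib

# Assumed mapping from DataONE algorithm designators to hashlib constructors.
CALCULATOR_BY_DESIGNATOR = {
    "MD5": hashlib.md5,
    "SHA1": hashlib.sha1,
    "SHA-1": hashlib.sha1,
}


def get_calculator_sketch(dataone_algorithm_name):
    """Return a new hashlib calculator for the given DataONE algorithm name."""
    return CALCULATOR_BY_DESIGNATOR[dataone_algorithm_name]()


calculator = get_calculator_sketch("SHA-1")
calculator.update(b"first chunk, ")
# copy() captures the state so far; later updates do not affect the copy.
snapshot = calculator.copy()
calculator.update(b"second chunk")
```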
-
d1_common.checksum.
get_default_checksum_algorithm
()¶ Get the default checksum algorithm.
- Returns
Checksum algorithm that is supported by DataONE, the DataONE Python stack and is in common use within the DataONE federation. Currently, SHA-1.
The returned string can be passed as the algorithm_str to the functions in this module.
- Return type
str
-
d1_common.checksum.
is_supported_algorithm
(algorithm_str)¶ Determine if string is the name of a supported checksum algorithm.
- Parameters
algorithm_str – str String that may or may not contain the name of a supported algorithm.
- Returns
- bool
True: The string contains the name of a supported algorithm and can be passed as the algorithm_str to the functions in this module.
False: The string is not a supported algorithm.
-
d1_common.checksum.
get_supported_algorithms
()¶ Get a list of the checksum algorithms that are supported by the DataONE Python stack.
- Returns
List of algorithms that are supported by the DataONE Python stack and can be passed as the algorithm_str to the functions in this module.
- Return type
list
-
d1_common.checksum.
format_checksum
(checksum_pyxb)¶ Create string representation of a PyXB Checksum object.
- Parameters
checksum_pyxb – PyXB Checksum object
- Returns
Combined hexadecimal value and algorithm name.
- Return type
str
d1_common.const module¶
System wide constants for the Python DataONE stack.
d1_common.date_time module¶
Utilities for handling date-times in DataONE.
Timezones (tz):
A datetime object can be tz-naive or tz-aware.
tz-naive: The datetime does not include timezone information. As such, it does not by itself fully specify an absolute point in time. The exact point in time depends on the timezone in which the time is specified, and that information may not be accessible to the end user. However, as timezones go from GMT-12 to GMT+14, and when including a possible daylight saving offset of 1 hour, a tz-naive datetime will always be within 14 hours of the real time.
tz-aware: The datetime includes a timezone, specified as an abbreviation or as an hour and minute offset. It specifies an exact point in time.
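The distinction can be seen directly with the standard datetime module. The helper name is_tz_aware below is illustrative only.

```python
from datetime import datetime, timedelta, timezone


def is_tz_aware(dt):
    """True if dt carries a tzinfo that yields a UTC offset."""
    return dt.tzinfo is not None and dt.tzinfo.utcoffset(dt) is not None


# Same wall-clock values, but only the second one names an exact point in time.
naive_dt = datetime(2020, 5, 17, 12, 0, 0)
aware_dt = datetime(2020, 5, 17, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))

# An aware datetime can be expressed in UTC without changing the instant.
same_instant_utc = aware_dt.astimezone(timezone.utc)
```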
-
class
d1_common.date_time.
UTC
¶ Bases:
datetime.tzinfo
datetime.tzinfo based class that represents the UTC timezone.
Date-times in DataONE should have timezone information that is fixed to UTC. A naive Python datetime can be fixed to UTC by attaching it to this datetime.tzinfo based class.
-
utcoffset
(dt)¶ Returns:
UTC offset of zero
-
tzname
(dt=None)¶ Returns:
str: “UTC”
-
dst
(dt=None)¶ Args: dt: Ignored.
Returns: timedelta(0), meaning that daylight saving is never in effect.
-
-
class
d1_common.date_time.
FixedOffset
(name, offset_hours=0, offset_minutes=0)¶ Bases:
datetime.tzinfo
datetime.tzinfo derived class that represents any timezone as fixed offset in minutes east of UTC.
Date-times in DataONE should have timezone information that is fixed to UTC. A naive Python datetime can be fixed to UTC by attaching it to this datetime.tzinfo based class.
See the UTC class for representing timezone in UTC.
-
__init__
(name, offset_hours=0, offset_minutes=0)¶ Args: name: str Name of the timezone this offset represents.
- offset_hours:
Number of hours offset from UTC.
- offset_minutes:
Number of minutes offset from UTC.
-
utcoffset
(dt)¶ Args: dt: Ignored.
- Returns
The time offset from UTC.
- Return type
datetime.timedelta
-
tzname
(dt)¶ Args: dt: Ignored.
Returns: Name of the timezone this offset represents.
-
dst
(dt=None)¶ Args: dt: Ignored.
Returns: timedelta(0), meaning that daylight saving is never in effect.
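A fixed-offset tzinfo of the kind described above needs only three methods. The class below is a minimal sketch that mirrors the documented constructor arguments; it is not the d1_common source, and the class name is made up.

```python
import datetime


class FixedOffsetSketch(datetime.tzinfo):
    """Timezone with a constant offset east of UTC and no daylight saving."""

    def __init__(self, name, offset_hours=0, offset_minutes=0):
        self._name = name
        self._offset = datetime.timedelta(hours=offset_hours, minutes=offset_minutes)

    def utcoffset(self, dt):
        return self._offset

    def tzname(self, dt):
        return self._name

    def dst(self, dt=None):
        # Daylight saving is never in effect for a fixed offset.
        return datetime.timedelta(0)


tz = FixedOffsetSketch("UTC+02:30", offset_hours=2, offset_minutes=30)
local_dt = datetime.datetime(2006, 10, 20, 15, 34, 56, tzinfo=tz)
utc_dt = local_dt.astimezone(datetime.timezone.utc)
```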
-
d1_common.date_time.
is_valid_iso8601
(iso8601_str)¶ Determine if string is a valid ISO 8601 date, time, or datetime.
- Parameters
iso8601_str – str String to check.
- Returns
True if string is a valid ISO 8601 date, time, or datetime.
- Return type
bool
-
d1_common.date_time.
has_tz
(dt)¶ Determine if datetime has timezone (is not naive)
- Parameters
dt – datetime
- Returns
- bool
True: datetime is tz-aware.
False: datetime is tz-naive.
-
d1_common.date_time.
is_utc
(dt)¶ Determine if datetime has timezone and the timezone is in UTC.
- Parameters
dt – datetime
- Returns
True if datetime has timezone and the timezone is in UTC.
- Return type
bool
-
d1_common.date_time.
are_equal
(a_dt, b_dt, round_sec=1)¶ Determine if two datetimes are equal with fuzz factor.
A naive datetime (no timezone information) is assumed to be in UTC.
- Parameters
a_dt – datetime Timestamp to compare.
b_dt – datetime Timestamp to compare.
round_sec – int or float Round the timestamps to the closest second divisible by this value before comparing them.
E.g.:
round_sec = 0.1: nearest 10th of a second.
round_sec = 1: nearest second.
round_sec = 30: nearest half minute.
Timestamps may lose resolution or otherwise change slightly as they go through various transformations and storage systems. This again may cause timestamps that have been processed in different systems to fail an exact equality compare even if they were initially the same timestamp. This rounding avoids such problems as long as the error introduced to the original timestamp is not higher than the rounding value. Of course, the rounding also causes a loss in resolution in the values compared, so should be kept as low as possible. The default value of 1 second should be a good tradeoff in most cases.
- Returns
- bool
True: If the two datetimes are equal after being rounded by round_sec.
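The rounding comparison can be sketched as follows. This is one possible approach, assuming naive datetimes are in UTC as stated above; the helper names and the epoch-based rounding are assumptions, not the library code.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def round_utc_sketch(dt, round_sec=1):
    """Round dt to the nearest multiple of round_sec seconds since the epoch."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # naive is assumed to be UTC
    elapsed_sec = (dt - EPOCH).total_seconds()
    rounded_sec = round(elapsed_sec / round_sec) * round_sec
    return EPOCH + timedelta(seconds=rounded_sec)


def are_equal_sketch(a_dt, b_dt, round_sec=1):
    """Compare two datetimes after rounding both to round_sec."""
    return round_utc_sketch(a_dt, round_sec) == round_utc_sketch(b_dt, round_sec)


# The same timestamp after passing through two systems with different precision.
stored = datetime(2020, 1, 1, 12, 0, 0, 400000)  # naive, assumed UTC
returned = datetime(2020, 1, 1, 12, 0, 0, 100000, tzinfo=timezone.utc)
```

Note that two values straddling a rounding boundary compare as different even when they are very close, which is one reason to keep round_sec as small as the expected error allows.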
-
d1_common.date_time.
ts_from_dt
(dt)¶ Convert datetime to POSIX timestamp.
- Parameters
dt – datetime
Timezone aware datetime: The tz is included and adjusted to UTC (since timestamp is always in UTC).
Naive datetime (no timezone information): Assumed to be in UTC.
- Returns
- int or float
The number of seconds since Midnight, January 1st, 1970, UTC.
If dt contains sub-second values, the returned value will be a float with fraction.
See also
dt_from_ts() for the reverse operation.
-
d1_common.date_time.
dt_from_ts
(ts, tz=None)¶ Convert POSIX timestamp to a timezone aware datetime.
- Parameters
ts – int or float, optionally with fraction The number of seconds since Midnight, January 1st, 1970, UTC.
tz – datetime.tzinfo
If supplied: The dt is adjusted to that tz before being returned. It does not affect the ts, which is always in UTC.
If not supplied: the dt is returned in UTC.
- Returns
- datetime
Timezone aware datetime, in UTC.
See also
ts_from_dt() for the reverse operation.
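The two conversions are inverses of each other. A standard-library sketch, with illustrative function names and with naive datetimes treated as UTC as described above:

```python
from datetime import datetime, timezone


def ts_from_dt_sketch(dt):
    """Seconds since 1970-01-01T00:00:00 UTC; a naive dt is assumed to be UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


def dt_from_ts_sketch(ts, tz=None):
    """Timezone aware datetime for a POSIX timestamp, in tz if given, else UTC."""
    return datetime.fromtimestamp(ts, tz or timezone.utc)


millennium_ts = ts_from_dt_sketch(datetime(2000, 1, 1))
round_trip_dt = dt_from_ts_sketch(millennium_ts)
```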
-
d1_common.date_time.
http_datetime_str_from_dt
(dt)¶ Format datetime to HTTP Full Date format.
- Parameters
dt – datetime
tz-aware: Used in the formatted string.
tz-naive: Assumed to be in UTC.
- Returns
- str
The returned format is a fixed-length subset of that defined by RFC 1123 and is the preferred format for use in the HTTP Date header. E.g.:
Sat, 02 Jan 1999 03:04:05 GMT
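The same fixed-length format can be produced with email.utils.format_datetime from the standard library. The wrapper name below is hypothetical, and this is not necessarily how d1_common formats the string.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def http_datetime_str_sketch(dt):
    """Format dt as an HTTP Full Date; a naive dt is assumed to be UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # usegmt=True requires a UTC datetime and emits the literal "GMT" zone.
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


http_date = http_datetime_str_sketch(datetime(1999, 1, 2, 3, 4, 5))
```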
-
d1_common.date_time.
xsd_datetime_str_from_dt
(dt)¶ Format datetime to a xs:dateTime string.
- Parameters
dt – datetime
tz-aware: Used in the formatted string.
tz-naive: Assumed to be in UTC.
- Returns
- str
The returned format can be used as the date in xs:dateTime XML elements. It will be on the form YYYY-MM-DDTHH:MM:SS.mmm+00:00.
-
d1_common.date_time.
dt_from_http_datetime_str
(http_full_datetime)¶ Parse HTTP Full Date formats and return as datetime.
- Parameters
http_full_datetime – str Each of the allowed formats is supported:
Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov 6 08:49:37 1994 ; ANSI C’s asctime() format
HTTP Full Dates are always in UTC.
- Returns
- datetime
The returned datetime is always timezone aware and in UTC.
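For the RFC 1123 form, the standard library's email.utils.parsedate_to_datetime gives a comparable result. The sketch below exercises only that first format; the wrapper name is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def dt_from_http_date_sketch(http_full_datetime):
    """Parse an RFC 1123 HTTP date into a timezone aware UTC datetime."""
    dt = parsedate_to_datetime(http_full_datetime)
    if dt.tzinfo is None:
        # HTTP Full Dates are always in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


parsed_dt = dt_from_http_date_sketch("Sun, 06 Nov 1994 08:49:37 GMT")
```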
-
d1_common.date_time.
dt_from_iso8601_str
(iso8601_str)¶ Parse ISO8601 formatted datetime string.
- Parameters
iso8601_str – str ISO 8601 formatted datetime.
tz-aware: Used in the formatted string.
tz-naive: Assumed to be in UTC.
Partial strings are accepted as long as they’re on the general form. Everything from just 2014 to 2006-10-20T15:34:56.123+02:30 will work. The sections that are not present in the string are set to zero in the returned datetime.
See test_iso8601.py in the iso8601 package for examples.
- Returns
- datetime
The returned datetime is always timezone aware and in UTC.
- Raises
d1_common.date_time.iso8601.ParseError – If iso8601_str is not on the general form of ISO 8601.
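The "always timezone aware and in UTC" behavior can be illustrated with datetime.fromisoformat. This stand-in is stricter than the function documented here: it does not accept partial strings such as a bare year, and the function name is made up.

```python
from datetime import datetime, timezone


def utc_dt_from_iso8601_sketch(iso8601_str):
    """Parse a full ISO 8601 datetime string and return it adjusted to UTC."""
    dt = datetime.fromisoformat(iso8601_str)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # tz-naive is assumed to be UTC
    return dt.astimezone(timezone.utc)


with_offset = utc_dt_from_iso8601_sketch("2006-10-20T15:34:56.123+02:30")
without_offset = utc_dt_from_iso8601_sketch("2006-10-20T15:34:56")
```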
-
d1_common.date_time.
normalize_datetime_to_utc
(dt)¶ Adjust datetime to UTC.
Apply the timezone offset to the datetime and set the timezone to UTC.
This is a no-op if the datetime is already in UTC.
- Parameters
dt – datetime
tz-aware: Used in the formatted string.
tz-naive: Assumed to be in UTC.
- Returns
- datetime
The returned datetime is always timezone aware and in UTC.
Notes
This forces a new object to be returned, which fixes an issue with serialization to XML in PyXB. PyXB uses a mixin together with datetime to handle the XML xs:dateTime. That type keeps track of timezone information included in the original XML doc, which conflicts if we return it here as part of a datetime mixin.
See also
cast_naive_datetime_to_tz()
-
d1_common.date_time.
cast_naive_datetime_to_tz
(dt, tz=UTC)¶ If datetime is tz-naive, set it to tz. If datetime is tz-aware, return it unmodified.
- Parameters
dt – datetime tz-naive or tz-aware datetime.
tz – datetime.tzinfo The timezone to which to adjust tz-naive datetime.
- Returns
- datetime
tz-aware datetime.
Warning
This will change the actual moment in time that is represented if the datetime is naive and represents a date and time not in tz.
See also
normalize_datetime_to_utc()
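The difference between casting and normalizing is easiest to see side by side. In this standard-library sketch (helper names are illustrative), casting only attaches a timezone to a naive value, while normalizing converts an aware value so that it shows the same instant in UTC.

```python
from datetime import datetime, timedelta, timezone


def cast_naive_sketch(dt, tz=timezone.utc):
    """Attach tz to a naive datetime; return an aware datetime unmodified."""
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=tz)


def normalize_to_utc_sketch(dt):
    """Apply the offset and set the timezone to UTC; naive is assumed UTC."""
    return cast_naive_sketch(dt).astimezone(timezone.utc)


plus_two = timezone(timedelta(hours=2))
naive = datetime(2020, 6, 1, 12, 0)
aware = datetime(2020, 6, 1, 12, 0, tzinfo=plus_two)

cast = cast_naive_sketch(naive, plus_two)  # wall clock kept, tz attached
normalized = normalize_to_utc_sketch(aware)  # same instant, shown in UTC
```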
-
d1_common.date_time.
strip_timezone
(dt)¶ Make datetime tz-naive by stripping away any timezone information.
- Parameters
dt – datetime
- tz-aware – Used in the formatted string.
- tz-naive – Returned unchanged.
- Returns
- datetime
tz-naive datetime.
-
d1_common.date_time.
utc_now
()¶ Returns: tz-aware datetime: The current local date and time adjusted to the UTC timezone.
Notes
Local time is retrieved from the local machine clock.
Relies on correctly set timezone on the local machine.
Relies on current tables for Daylight Saving periods.
Local machine timezone can be checked with: $ date +'%z %Z'.
-
d1_common.date_time.
date_utc_now_iso
()¶ Returns:
str : The current local date as an ISO 8601 string in the UTC timezone
Does not include the time.
-
d1_common.date_time.
local_now
()¶ Returns:
tz-aware datetime : The current local date and time in the local timezone
-
d1_common.date_time.
local_now_iso
()¶ Returns:
str : The current local date and time as an ISO 8601 string in the local timezone
-
d1_common.date_time.
to_iso8601_utc
(dt)¶ Args: dt: datetime.
Returns: str: ISO 8601 string in the UTC timezone
-
d1_common.date_time.
create_utc_datetime
(*datetime_parts)¶ Create a datetime with timezone set to UTC.
- Parameters
tuple of int – year, month, day, hour, minute, second, microsecond
- Returns
datetime
-
d1_common.date_time.
round_to_nearest
(dt, n_round_sec=1.0)¶ Round datetime up or down to nearest divisor.
Round datetime up or down to nearest number of seconds that divides evenly by the divisor.
Any timezone is preserved but ignored in the rounding.
- Parameters
dt – datetime
n_round_sec – int or float Divisor for rounding
Examples
n_round_sec = 0.1: nearest 10th of a second.
n_round_sec = 1: nearest second.
n_round_sec = 30: nearest half minute.
d1_common.env module¶
Utilities for handling DataONE environments.
-
d1_common.env.
get_d1_env_keys
()¶ Get the DataONE env dict keys in preferred order.
- Returns
DataONE env dict keys
- Return type
list
-
d1_common.env.
get_d1_env
(env_key)¶ Get the values required in order to connect to a DataONE environment.
- Returns
Values required in order to connect to a DataONE environment.
- Return type
dict
-
d1_common.env.
get_d1_env_by_base_url
(cn_base_url)¶ Given the BaseURL for a CN, return the DataONE environment dict for the CN’s environment.
d1_common.logging_context module¶
Context manager that enables temporary changes in logging level.
Source: https://docs.python.org/2/howto/logging-cookbook.html
-
class
d1_common.logging_context.
LoggingContext
(logger, level=None, handler=None, close=True)¶ Bases:
object
Logging Context Manager.
-
__init__
(logger, level=None, handler=None, close=True)¶ Args: logger: logger Logger for which to change the logging level.
- level:
Temporary logging level.
- handler:
Optional logging handler to use. Supplying a new handler allows temporarily changing the logging format as well.
- close:
Automatically close handler (if supplied).
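A context manager with this signature can be written in a few lines, following the pattern in the Python logging cookbook cited above. The class below is a sketch of that pattern, not a copy of the d1_common source.

```python
import logging


class LoggingContextSketch:
    """Temporarily change a logger's level and/or add a handler."""

    def __init__(self, logger, level=None, handler=None, close=True):
        self.logger = logger
        self.level = level
        self.handler = handler
        self.close = close

    def __enter__(self):
        if self.level is not None:
            self.old_level = self.logger.level
            self.logger.setLevel(self.level)
        if self.handler is not None:
            self.logger.addHandler(self.handler)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the previous state even if the block raised.
        if self.level is not None:
            self.logger.setLevel(self.old_level)
        if self.handler is not None:
            self.logger.removeHandler(self.handler)
            if self.close:
                self.handler.close()


logger = logging.getLogger("logging_context_sketch")
logger.setLevel(logging.WARNING)
with LoggingContextSketch(logger, level=logging.DEBUG):
    level_inside = logger.level
level_after = logger.level
```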
-
d1_common.multipart module¶
Utilities for handling MIME Multipart documents.
-
d1_common.multipart.
parse_response
(response, encoding='utf-8')¶ Parse a multipart Requests.Response into a tuple of BodyPart objects.
- Parameters
response – Requests.Response
encoding – The parser will assume that any text in the HTML body is encoded with this encoding when decoding it for use in the text attribute.
- Returns
- tuple of BodyPart
Members: headers (CaseInsensitiveDict), content (bytes), text (Unicode), encoding (str).
-
d1_common.multipart.
parse_str
(mmp_bytes, content_type, encoding='utf-8')¶ Parse multipart document bytes into a tuple of BodyPart objects.
- Parameters
mmp_bytes – bytes Multipart document.
content_type – str Must be on the form, multipart/form-data; boundary=<BOUNDARY>, where <BOUNDARY> is the string that separates the parts of the multipart document in mmp_bytes. In HTTP requests and responses, it is passed in the Content-Type header.
encoding – str The coding used for the text in the HTML body.
- Returns
- tuple of BodyPart
Members: headers (CaseInsensitiveDict), content (bytes), text (Unicode), encoding (str).
-
d1_common.multipart.
normalize
(body_part_tup)¶ Normalize a tuple of BodyPart objects to a string.
Normalization is done by sorting the body_parts by the Content-Disposition headers, which are typically on the form, form-data; name="name_of_part".
-
d1_common.multipart.
is_multipart
(header_dict)¶
- Parameters
header_dict – CaseInsensitiveDict
- Returns
True if header_dict has a Content-Type key (case insensitive) with value that begins with ‘multipart’.
- Return type
bool
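The check described above amounts to a case-insensitive header lookup followed by a prefix test. The sketch below works on a plain dict rather than a CaseInsensitiveDict, so it lowercases the keys itself; the function name and the case-insensitive match on the value are assumptions.

```python
def is_multipart_sketch(header_dict):
    """True if a Content-Type header is present and starts with 'multipart'."""
    for key, value in header_dict.items():
        if key.lower() == "content-type":
            # Assumption: the media type is matched case-insensitively.
            return value.strip().lower().startswith("multipart")
    return False


multipart_headers = {"content-type": "multipart/form-data; boundary=abc123"}
xml_headers = {"Content-Type": "text/xml"}
```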
d1_common.node module¶
Utilities for handling the DataONE Node and NodeList types.
-
d1_common.node.
pyxb_to_dict
(node_list_pyxb)¶
- Returns
Representation of node_list_pyxb, keyed on the Node identifier (urn:node:*).
- Return type
dict
Example:
{
    u'urn:node:ARCTIC': {
        'base_url': u'https://arcticdata.io/metacat/d1/mn',
        'description': u'The US National Science Foundation...',
        'name': u'Arctic Data Center',
        'ping': None,
        'replicate': 0,
        'state': u'up',
        'synchronize': 1,
        'type': u'mn'
    },
    u'urn:node:BCODMO': {
        'base_url': u'https://www.bco-dmo.org/d1/mn',
        'description': u'Biological and Chemical Oceanography Data...',
        'name': u'Biological and Chemical Oceanography Data...',
        'ping': None,
        'replicate': 0,
        'state': u'up',
        'synchronize': 1,
        'type': u'mn'
    },
}
d1_common.object_format_cache module¶
Local cache of the DataONE ObjectFormatList for a given DataONE environment.
As part of the metadata for a science object, DataONE stores a type identifier called an ObjectFormatID. The ObjectFormatList allows mapping ObjectFormatIDs to filename extensions and content type.
The cache is stored in a file and is automatically updated periodically.
Simple methods for looking up elements of the ObjectFormatList are provided.
Examples
Section of an ObjectFormatList:
{
    '-//ecoinformatics.org//eml-access-2.0.0beta4//EN': {
        'extension': 'xml',
        'format_name': 'Ecological Metadata Language, Access module, version 2.0.0beta4',
        'format_type': 'METADATA',
        'media_type': {
            'name': 'text/xml',
            'property_list': []
        }
    },
    '-//ecoinformatics.org//eml-access-2.0.0beta6//EN': {
        'extension': 'xml',
        'format_name': 'Ecological Metadata Language, Access module, version 2.0.0beta6',
        'format_type': 'METADATA',
        'media_type': {
            'name': 'text/xml',
            'property_list': []
        }
    },
}
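Given a dict of this shape, the lookups offered by the cache reduce to nested key access with a fallback. The functions below are a sketch over a hard-coded dict; the real class reads the dict from the cache file, and the error handling shown here is an assumption.

```python
OBJECT_FORMAT_DICT = {
    "-//ecoinformatics.org//eml-access-2.0.0beta4//EN": {
        "extension": "xml",
        "format_name": "Ecological Metadata Language, Access module, version 2.0.0beta4",
        "format_type": "METADATA",
        "media_type": {"name": "text/xml", "property_list": []},
    },
}


def get_content_type_sketch(format_id, default=None):
    """Media type name for a formatId, or default if the formatId is unknown."""
    try:
        return OBJECT_FORMAT_DICT[format_id]["media_type"]["name"]
    except KeyError:
        return default


def get_filename_extension_sketch(format_id, default=None):
    """Filename extension for a formatId, or default if unknown."""
    try:
        return OBJECT_FORMAT_DICT[format_id]["extension"]
    except KeyError:
        return default


EML_ACCESS = "-//ecoinformatics.org//eml-access-2.0.0beta4//EN"
```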
-
class
d1_common.object_format_cache.
Singleton
¶ Bases:
object
-
class
d1_common.object_format_cache.
ObjectFormatListCache
(cn_base_url='https://cn.dataone.org/cn', object_format_cache_path='/home/docs/checkouts/readthedocs.org/user_builds/dataone-python/checkouts/latest/lib_common/src/d1_common/object_format_cache.json', cache_refresh_period=datetime.timedelta(days=30), lock_file_path='/tmp/object_format_cache.lock')¶ Bases:
d1_common.object_format_cache.Singleton
-
__init__
(cn_base_url='https://cn.dataone.org/cn', object_format_cache_path='/home/docs/checkouts/readthedocs.org/user_builds/dataone-python/checkouts/latest/lib_common/src/d1_common/object_format_cache.json', cache_refresh_period=datetime.timedelta(days=30), lock_file_path='/tmp/object_format_cache.lock')¶ - Parameters
cn_base_url – str: BaseURL for a CN in the DataONE Environment being targeted.
This can usually be left at the production root, even if running in other environments.
object_format_cache_path – str Path to a file in which the cached ObjectFormatList is or will be stored.
By default, the path is set to a cache file that is distributed together with this module.
The directories must exist. The file is created if it doesn’t exist. The file is recreated whenever needed. Paths under “/tmp” will typically cause the file to have to be recreated after reboot while paths under “/var/tmp/” typically persist over reboot.
cache_refresh_period – datetime.timedelta or None Period of time in which to use the cached ObjectFormatList before refreshing it by downloading a new copy from the CN. The ObjectFormatList does not change often, so a month is probably a sensible default.
Set to None to disable refresh. When refresh is disabled, object_format_cache_path must point to an existing file.
-
property
object_format_dict
¶ Direct access to a native Python dict representing cached ObjectFormatList.
-
get_content_type
(format_id, default=None)¶
-
get_filename_extension
(format_id, default=None)¶
-
refresh_cache
()¶ Force a refresh of the local cached version of the ObjectFormatList.
This is typically not required, as the cache is refreshed automatically after the configured cache_refresh_period.
-
is_valid_format_id
(format_id)¶
-
d1_common.replication_policy module¶
Utilities for handling the DataONE ReplicationPolicy type.
The Replication Policy is an optional section of the System Metadata which may be used to enable or disable replication, set the desired number of replicas and specify remote MNs to either prefer or block as replication targets.
Examples:
ReplicationPolicy:
<replicationPolicy replicationAllowed="true" numberReplicas="3">
<!--Zero or more repetitions:-->
<preferredMemberNode>node1</preferredMemberNode>
<preferredMemberNode>node2</preferredMemberNode>
<preferredMemberNode>node3</preferredMemberNode>
<!--Zero or more repetitions:-->
<blockedMemberNode>node4</blockedMemberNode>
<blockedMemberNode>node5</blockedMemberNode>
</replicationPolicy>
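To show how the XML maps to plain values, the example document can be read with xml.etree.ElementTree. This sketch uses the un-namespaced XML exactly as shown above and builds a dict whose keys are chosen for illustration; it does not use PyXB or d1_common.

```python
import xml.etree.ElementTree as ET

RP_XML = """<replicationPolicy replicationAllowed="true" numberReplicas="3">
  <!--Zero or more repetitions:-->
  <preferredMemberNode>node1</preferredMemberNode>
  <preferredMemberNode>node2</preferredMemberNode>
  <preferredMemberNode>node3</preferredMemberNode>
  <!--Zero or more repetitions:-->
  <blockedMemberNode>node4</blockedMemberNode>
  <blockedMemberNode>node5</blockedMemberNode>
</replicationPolicy>"""

root = ET.fromstring(RP_XML)
policy = {
    # Attributes arrive as strings and are converted to native types here.
    "allowed": root.get("replicationAllowed") == "true",
    "num": int(root.get("numberReplicas")),
    "preferredMemberNode": {el.text for el in root.findall("preferredMemberNode")},
    "blockedMemberNode": {el.text for el in root.findall("blockedMemberNode")},
}
```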
-
d1_common.replication_policy.
has_replication_policy
(sysmeta_pyxb)¶ Args: sysmeta_pyxb: SystemMetadata PyXB object.
Returns: bool: True if SystemMetadata includes the optional ReplicationPolicy section.
-
d1_common.replication_policy.
sysmeta_add_preferred
(sysmeta_pyxb, node_urn)¶ Add a remote Member Node to the list of preferred replication targets to this System Metadata object.
Also remove the target MN from the list of blocked Member Nodes if present.
If the target MN is already in the preferred list and not in the blocked list, this function is a no-op.
- Parameters
sysmeta_pyxb – SystemMetadata PyXB object. System Metadata in which to add the preferred replication target.
If the System Metadata does not already have a Replication Policy, a default replication policy which enables replication is added and populated with the preferred replication target.
node_urn – str Node URN of the remote MN that will be added. On the form urn:node:MyMemberNode.
-
d1_common.replication_policy.
sysmeta_add_blocked
(sysmeta_pyxb, node_urn)¶ Add a remote Member Node to the list of blocked replication targets to this System Metadata object.
The blocked node will not be considered a possible replication target for the associated System Metadata.
Also remove the target MN from the list of preferred Member Nodes if present.
If the target MN is already in the blocked list and not in the preferred list, this function is a no-op.
- Parameters
sysmeta_pyxb – SystemMetadata PyXB object. System Metadata in which to add the blocked replication target.
If the System Metadata does not already have a Replication Policy, a default replication policy which enables replication is added and then populated with the blocked replication target.
node_urn – str Node URN of the remote MN that will be added. On the form urn:node:MyMemberNode.
-
d1_common.replication_policy.
sysmeta_set_default_rp
(sysmeta_pyxb)¶ Set a default, empty, Replication Policy.
This will clear any existing Replication Policy in the System Metadata.
The default Replication Policy disables replication and sets number of replicas to 0.
- Parameters
sysmeta_pyxb – SystemMetadata PyXB object. System Metadata in which to set a default Replication Policy.
-
d1_common.replication_policy.
normalize
(rp_pyxb)¶ Normalize a ReplicationPolicy PyXB type in place.
The preferred and blocked lists are sorted alphabetically. As blocked nodes override preferred nodes, any node present in both lists is removed from the preferred list.
- Parameters
rp_pyxb – ReplicationPolicy PyXB object The object will be normalized in place.
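The normalization rule can be shown on a plain dict standing in for the PyXB object. The dict keys and function names below are hypothetical; only the rule itself (sort both lists, let blocked override preferred) comes from the description above.

```python
def normalize_sketch(rp_dict):
    """Sort both node lists in place and drop blocked nodes from preferred."""
    blocked = sorted(set(rp_dict["blocked"]))
    preferred = sorted(set(rp_dict["preferred"]) - set(blocked))
    rp_dict["blocked"] = blocked
    rp_dict["preferred"] = preferred


def is_preferred_sketch(rp_dict, node_urn):
    """Blocked nodes override preferred nodes."""
    return node_urn in rp_dict["preferred"] and node_urn not in rp_dict["blocked"]


rp = {
    "preferred": ["urn:node:C", "urn:node:A", "urn:node:B"],
    "blocked": ["urn:node:D", "urn:node:B"],
}
blocked_wins_before_normalize = not is_preferred_sketch(rp, "urn:node:B")
normalize_sketch(rp)
```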
-
d1_common.replication_policy.
is_preferred
(rp_pyxb, node_urn)¶
- Parameters
rp_pyxb – ReplicationPolicy PyXB object The object will be normalized in place.
node_urn – str Node URN of the remote MN for which to check preference.
- Returns
True if node_urn is a preferred replica target.
As blocked nodes override preferred nodes, return False if node_urn is in both lists.
- Return type
bool
-
d1_common.replication_policy.
is_blocked
(rp_pyxb, node_urn)¶
- Parameters
rp_pyxb – ReplicationPolicy PyXB object The object will be normalized in place.
node_urn – str Node URN of the remote MN for which to check preference.
- Returns
True if node_urn is a blocked replica target.
As blocked nodes override preferred nodes, return True if node_urn is in both lists.
- Return type
bool
-
d1_common.replication_policy.
are_equivalent_pyxb
(a_pyxb, b_pyxb)¶ Check if two ReplicationPolicy objects are semantically equivalent.
The ReplicationPolicy objects are normalized before comparison.
- Parameters
a_pyxb, b_pyxb – ReplicationPolicy PyXB objects to compare
- Returns
True if the resulting policies for the two objects are semantically equivalent.
- Return type
bool
-
d1_common.replication_policy.
are_equivalent_xml
(a_xml, b_xml)¶ Check if two ReplicationPolicy XML docs are semantically equivalent.
The ReplicationPolicy XML docs are normalized before comparison.
- Parameters
a_xml, b_xml – ReplicationPolicy XML docs to compare
- Returns
True if the resulting policies for the two objects are semantically equivalent.
- Return type
bool
-
d1_common.replication_policy.
add_preferred
(rp_pyxb, node_urn)¶ Add a remote Member Node to the list of preferred replication targets.
Also remove the target MN from the list of blocked Member Nodes if present.
If the target MN is already in the preferred list and not in the blocked list, this function is a no-op.
- Parameters
rp_pyxb – SystemMetadata PyXB object. Replication Policy in which to add the preferred replication target.
node_urn – str Node URN of the remote MN that will be added. On the form urn:node:MyMemberNode.
-
d1_common.replication_policy.
add_blocked
(rp_pyxb, node_urn)¶ Add a remote Member Node to the list of blocked replication targets.
Also remove the target MN from the list of preferred Member Nodes if present.
If the target MN is already in the blocked list and not in the preferred list, this function is a no-op.
- Parameters
rp_pyxb – SystemMetadata PyXB object. Replication Policy in which to add the blocked replication target.
node_urn – str Node URN of the remote MN that will be added. On the form urn:node:MyMemberNode.
-
d1_common.replication_policy.
pyxb_to_dict
(rp_pyxb)¶ Convert ReplicationPolicy PyXB object to a normalized dict.
- Parameters
rp_pyxb – ReplicationPolicy to convert.
- Returns
Replication Policy as normalized dict.
- Return type
dict
Example:
{
    'allowed': True,
    'num': 3,
    'blockedMemberNode': {'urn:node:NODE1', 'urn:node:NODE2', 'urn:node:NODE3'},
    'preferredMemberNode': {'urn:node:NODE4', 'urn:node:NODE5'},
}
-
d1_common.replication_policy.
dict_to_pyxb
(rp_dict)¶ Convert dict to ReplicationPolicy PyXB object.
- Parameters
rp_dict – Native Python structure representing a Replication Policy.
Example:
{
    'allowed': True,
    'num': 3,
    'blockedMemberNode': {'urn:node:NODE1', 'urn:node:NODE2', 'urn:node:NODE3'},
    'preferredMemberNode': {'urn:node:NODE4', 'urn:node:NODE5'},
}
- Returns
ReplicationPolicy PyXB object.
d1_common.resource_map module¶
Read and write DataONE OAI-ORE Resource Maps.
DataONE supports a system that allows relationships between Science Objects to be described. These relationships are stored in OAI-ORE Resource Maps.
This module provides functionality for the most common use cases when parsing and generating Resource Maps for use in DataONE.
For more information about how Resource Maps are used in DataONE, see:
https://releases.dataone.org/online/api-documentation-v2.0.1/design/DataPackage.html
Common RDF-XML namespaces:
dc: <http://purl.org/dc/elements/1.1/>
foaf: <http://xmlns.com/foaf/0.1/>
rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
rdfs1: <http://www.w3.org/2001/01/rdf-schema#>
ore: <http://www.openarchives.org/ore/terms/>
dcterms: <http://purl.org/dc/terms/>
cito: <http://purl.org/spar/cito/>
Note
In order for Resource Maps to be recognized and indexed by DataONE, they must be created with formatId set to http://www.openarchives.org/ore/terms.
-
d1_common.resource_map.
createSimpleResourceMap
(ore_pid, scimeta_pid, sciobj_pid_list)¶ Create a simple OAI-ORE Resource Map with one Science Metadata document and any number of Science Data objects.
This creates a document that establishes an association between a Science Metadata object and any number of Science Data objects. The Science Metadata object contains information that is indexed by DataONE, allowing both the Science Metadata and the Science Data objects to be discoverable in DataONE Search. In search results, the objects will appear together and can be downloaded as a single package.
- Parameters
ore_pid – str Persistent Identifier (PID) to use for the new Resource Map
scimeta_pid – str PID for an object that will be listed as the Science Metadata that is describing the Science Data objects.
sciobj_pid_list – list of str List of PIDs that will be listed as the Science Data objects that are being described by the Science Metadata.
- Returns
OAI-ORE Resource Map
- Return type
ResourceMap
-
d1_common.resource_map.
createResourceMapFromStream
(in_stream, base_url='https://cn.dataone.org/cn')¶ Create a simple OAI-ORE Resource Map with one Science Metadata document and any number of Science Data objects, using a stream of PIDs.
- Parameters
in_stream – The first non-blank line is the PID of the resource map itself. Second line is the science metadata PID and remaining lines are science data PIDs.
Example stream contents:
PID_ORE_value
sci_meta_pid_value
data_pid_1
data_pid_2
data_pid_3
base_url – str Root of the DataONE environment in which the Resource Map will be used.
- Returns
OAI-ORE Resource Map
- Return type
-
class
d1_common.resource_map.
ResourceMap
(ore_pid=None, scimeta_pid=None, scidata_pid_list=None, base_url='https://cn.dataone.org/cn', api_major=2, ore_software_id='DataONE.org Python ITK 3.4.5', *args, **kwargs)¶ Bases:
rdflib.graph.ConjunctiveGraph
OAI-ORE Resource Map.
-
__init__
(ore_pid=None, scimeta_pid=None, scidata_pid_list=None, base_url='https://cn.dataone.org/cn', api_major=2, ore_software_id='DataONE.org Python ITK 3.4.5', *args, **kwargs)¶ Create an OAI-ORE Resource Map.
- Parameters
ore_pid – str Persistent Identifier (PID) to use for the new Resource Map
scimeta_pid – str PID for an object that will be listed as the Science Metadata that is describing the Science Data objects.
scidata_pid_list – list of str List of PIDs that will be listed as the Science Data objects that are being described by the Science Metadata.
base_url – str Root of the DataONE environment in which the Resource Map will be used.
api_major – The DataONE API version to use for the DataONE Resolve API. Clients call the Resolve API to get a list of download locations for the objects in the Resource Map.
ore_software_id – str Optional string which identifies the software that was used for creating the Resource Map. If specified, it should be in the form of a UserAgent string.
args and kwargs – Optional arguments forwarded to rdflib.ConjunctiveGraph.__init__().
-
initialize
(pid, ore_software_id='DataONE.org Python ITK 3.4.5')¶ Create the basic ORE document structure.
-
serialize_to_transport
(doc_format='xml', *args, **kwargs)¶ Serialize ResourceMap to UTF-8 encoded XML document.
- Parameters
doc_format – str One of: xml, n3, turtle, nt, pretty-xml, trix, trig and nquads.
args and kwargs – Optional arguments forwarded to rdflib.ConjunctiveGraph.serialize().
- Returns
UTF-8 encoded XML doc.
- Return type
bytes
Note
Only the default, “xml”, is automatically indexed by DataONE.
-
serialize_to_display
(doc_format='pretty-xml', *args, **kwargs)¶ Serialize ResourceMap to an XML doc that is pretty printed for display.
- Parameters
doc_format – str One of: xml, n3, turtle, nt, pretty-xml, trix, trig and nquads.
args and kwargs – Optional arguments forwarded to rdflib.ConjunctiveGraph.serialize().
- Returns
Pretty printed Resource Map XML doc
- Return type
str
Note
Only the default, “xml”, is automatically indexed by DataONE.
-
deserialize
(*args, **kwargs)¶ Deserialize Resource Map XML doc.
The source is specified using one of source, location, file or data.
- Parameters
source – InputSource, file-like object, or string In the case of a string the string is the location of the source.
location – str String indicating the relative or absolute URL of the source. Graph's absolutize method is used if a relative location is specified.
file – file-like object
data – str The document to be parsed.
format – str Used if format cannot be determined from source. Defaults to rdf/xml. Format support can be extended with plugins. Built-in: xml, n3, nt, trix, rdfa
publicID – str Logical URI to use as the document base. If None specified the document location is used (at least in the case where there is a document location).
- Raises
xml.sax.SAXException based exception – On parse error.
-
getAggregation
()¶ Returns:
str : URIRef of the Aggregation entity
-
getObjectByPid
(pid)¶ - Parameters
pid – str
- Returns
URIRef of the entry identified by pid.
- Return type
str
-
addResource
(pid)¶ Add a resource to the Resource Map.
- Parameters
pid – str
-
setDocuments
(documenting_pid, documented_pid)¶ Add a CiTO, the Citation Typing Ontology, triple asserting that documenting_pid documents documented_pid.
Adds assertion:
documenting_pid cito:documents documented_pid
- Parameters
documenting_pid – str PID of a Science Object that documents documented_pid.
documented_pid – str PID of a Science Object that is documented by documenting_pid.
-
setDocumentedBy
(documented_pid, documenting_pid)¶ Add a CiTO, the Citation Typing Ontology, triple asserting that documented_pid isDocumentedBy documenting_pid.
Adds assertion:
documented_pid cito:isDocumentedBy documenting_pid
- Parameters
documented_pid – str PID of a Science Object that is documented by documenting_pid.
documenting_pid – str PID of a Science Object that documents documented_pid.
-
addMetadataDocument
(pid)¶ Add a Science Metadata document.
- Parameters
pid – str PID of a Science Metadata object.
-
addDataDocuments
(scidata_pid_list, scimeta_pid=None)¶ Add Science Data object(s)
- Parameters
scidata_pid_list – list of str List of one or more PIDs of Science Data objects
scimeta_pid – str PID of a Science Metadata object that documents the Science Data objects.
-
getResourceMapPid
()¶ Returns:
str : PID of the Resource Map itself.
-
getAllTriples
()¶ Returns:
list of tuples : Each tuple holds a subject, predicate, object triple
-
getAllPredicates
()¶ Returns: list of str: All unique predicates.
Notes
Equivalent SPARQL:
SELECT DISTINCT ?p WHERE { ?s ?p ?o . }
-
getSubjectObjectsByPredicate
(predicate)¶ - Parameters
predicate – str Predicate for which to return subject, object tuples.
- Returns
All subject/objects with predicate.
- Return type
list of subject, object tuples
Notes
Equivalent SPARQL:
SELECT DISTINCT ?s ?o WHERE {{ ?s {0} ?o . }}
-
getAggregatedPids
()¶ Returns: list of str: All aggregated PIDs.
Notes
Equivalent SPARQL:
SELECT ?pid WHERE { ?s ore:aggregates ?o . ?o dcterms:identifier ?pid . }
-
getAggregatedScienceMetadataPids
()¶ Returns: list of str: All Science Metadata PIDs.
Notes
Equivalent SPARQL:
SELECT DISTINCT ?pid WHERE { ?s ore:aggregates ?o . ?o cito:documents ?o2 . ?o dcterms:identifier ?pid . }
-
getAggregatedScienceDataPids
()¶ Returns: list of str: All Science Data PIDs.
Notes
Equivalent SPARQL:
SELECT DISTINCT ?pid WHERE { ?s ore:aggregates ?o . ?o cito:isDocumentedBy ?o2 . ?o dcterms:identifier ?pid . }
-
asGraphvizDot
(stream)¶ Serialize the graph to .DOT format for ingestion in Graphviz.
Args: stream: file-like object open for writing that will receive the resulting document.
-
parseDoc
(doc_str, format='xml')¶ Parse an OAI-ORE Resource Map document.
See also: rdflib.ConjunctiveGraph.parse for documentation on arguments.
-
d1_common.revision module¶
Utilities for working with revision / obsolescence chains.
-
d1_common.revision.
get_identifiers
(sysmeta_pyxb)¶ Get set of identifiers that provide revision context for SciObj.
Returns: tuple: PID, SID, OBSOLETES_PID, OBSOLETED_BY_PID
-
d1_common.revision.
topological_sort
(unsorted_dict)¶ Sort objects by dependency.
Sort a dict of obsoleting PID to obsoleted PID to a list of PIDs in order of obsolescence.
- Parameters
unsorted_dict – dict Dict that holds obsolescence information. Each key/value pair establishes that the PID in key identifies an object that obsoletes an object identified by the PID in value.
- Returns
sorted_list: A list of PIDs ordered so that all PIDs that obsolete an object are listed after the object they obsolete.
unconnected_dict: A dict of PID to obsoleted PID of any objects that could not be added to a revision chain. These items will have obsoletes PIDs that directly or indirectly reference a PID that could not be sorted.
- Return type
tuple of sorted_list, unconnected_dict
Notes
unsorted_dict is modified by the sort and on return holds any items that could not be sorted.
The sort works by repeatedly iterating over an unsorted list of PIDs and moving PIDs to the sorted list as they become available. A PID is available to be moved to the sorted list if it does not obsolete a PID or if the PID it obsoletes is already in the sorted list.
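The sort described in the notes above can be sketched in plain Python. This is an illustration only, not the d1_common implementation: the function name `sort_by_obsolescence` is hypothetical, the sketch works on a copy instead of modifying its argument, and it assumes that a value of `None` marks a PID that does not obsolete anything.

```python
def sort_by_obsolescence(obsoletes_dict):
    """Return (sorted_list, unconnected_dict) for a dict of PID -> obsoleted PID.

    Assumption for this sketch: a value of None means the PID does not
    obsolete any other PID.
    """
    remaining = dict(obsoletes_dict)  # work on a copy; the input is left untouched
    sorted_list = []
    placed = set()
    progress = True
    # Keep making passes until a full pass moves nothing to the sorted list.
    while remaining and progress:
        progress = False
        for pid in sorted(remaining):  # sorted() only makes the order deterministic
            obsoleted_pid = remaining[pid]
            if obsoleted_pid is None or obsoleted_pid in placed:
                sorted_list.append(pid)
                placed.add(pid)
                del remaining[pid]
                progress = True
    # Whatever is left references a PID that could never be placed.
    return sorted_list, remaining
```

Items that reference a PID which never reaches the sorted list stay in the second return value, which corresponds to the role of `unconnected_dict` described above.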
-
d1_common.revision.
get_pids_in_revision_chain
(client, did)¶
- Parameters
client – d1_client.cnclient.CoordinatingNodeClient or d1_client.mnclient.MemberNodeClient.
did – str SID or a PID of any object in a revision chain.
- Returns
All PIDs in the chain. The returned list is in the same order as the chain. The initial PID is typically obtained by resolving a SID. If the given PID is not in a chain, a list containing the single object is returned.
- Return type
list of str
-
d1_common.revision.
revision_list_to_obsoletes_dict
(revision_list)¶ Args: revision_list: list of tuple. Each tuple: PID, SID, OBSOLETES_PID, OBSOLETED_BY_PID.
Returns: dict: Dict of obsoleted PID to obsoleting PID.
-
d1_common.revision.
revision_list_to_obsoleted_by_dict
(revision_list)¶ Args: revision_list: list of tuple. Each tuple: PID, SID, OBSOLETES_PID, OBSOLETED_BY_PID.
Returns: dict: Dict of obsoleting PID to obsoleted PID.
d1_common.system_metadata module¶
Utilities for handling the DataONE SystemMetadata type.
DataONE API methods such as MNStorage.create() require a Science Object and System Metadata pair.
Examples
Example v2 SystemMetadata XML document with all optional values included:
<v2:systemMetadata xmlns:v2="http://ns.dataone.org/service/types/v2.0">
<!--Optional:-->
<serialVersion>11</serialVersion>
<identifier>string</identifier>
<formatId>string</formatId>
<size>11</size>
<checksum algorithm="string">string</checksum>
<!--Optional:-->
<submitter>string</submitter>
<rightsHolder>string</rightsHolder>
<!--Optional:-->
<accessPolicy>
<!--1 or more repetitions:-->
<allow>
<!--1 or more repetitions:-->
<subject>string</subject>
<!--1 or more repetitions:-->
<permission>read</permission>
</allow>
</accessPolicy>
<!--Optional:-->
<replicationPolicy replicationAllowed="true" numberReplicas="3">
<!--Zero or more repetitions:-->
<preferredMemberNode>string</preferredMemberNode>
<!--Zero or more repetitions:-->
<blockedMemberNode>string</blockedMemberNode>
</replicationPolicy>
<!--Optional:-->
<obsoletes>string</obsoletes>
<obsoletedBy>string</obsoletedBy>
<archived>true</archived>
<dateUploaded>2014-09-18T17:18:33</dateUploaded>
<dateSysMetadataModified>2006-08-19T11:27:14-06:00</dateSysMetadataModified>
<originMemberNode>string</originMemberNode>
<authoritativeMemberNode>string</authoritativeMemberNode>
<!--Zero or more repetitions:-->
<replica>
<replicaMemberNode>string</replicaMemberNode>
<replicationStatus>failed</replicationStatus>
<replicaVerified>2013-05-21T19:02:49-06:00</replicaVerified>
</replica>
<!--Optional:-->
<seriesId>string</seriesId>
<!--Optional:-->
<mediaType name="string">
<!--Zero or more repetitions:-->
<property name="string">string</property>
</mediaType>
<!--Optional:-->
<fileName>string</fileName>
</v2:systemMetadata>
-
d1_common.system_metadata.
is_sysmeta_pyxb
(sysmeta_pyxb)¶ Args: sysmeta_pyxb: Object that may or may not be a SystemMetadata PyXB object.
- Returns
True if sysmeta_pyxb is a SystemMetadata PyXB object.
False if sysmeta_pyxb is not a PyXB object or is a PyXB object of a type other than SystemMetadata.
- Return type
bool
-
d1_common.system_metadata.
normalize_in_place
(sysmeta_pyxb, reset_timestamps=False, reset_filename=False)¶ Normalize SystemMetadata PyXB object in-place.
- Parameters
sysmeta_pyxb – SystemMetadata PyXB object to normalize.
reset_timestamps – bool True: Timestamps in the SystemMetadata are set to a standard value so that objects that are compared after normalization register as equivalent if only their timestamps differ.
Notes
The SystemMetadata is normalized by removing any redundant information and ordering all sections where there are no semantics associated with the order. The normalized SystemMetadata is intended to be semantically equivalent to the un-normalized one.
-
d1_common.system_metadata.
are_equivalent_pyxb
(a_pyxb, b_pyxb, ignore_timestamps=False, ignore_filename=False)¶ Determine if SystemMetadata PyXB objects are semantically equivalent.
Normalize then compare SystemMetadata PyXB objects for equivalency.
- Parameters
a_pyxb, b_pyxb – SystemMetadata PyXB objects to compare
ignore_timestamps – bool True: Timestamps are ignored during the comparison.
ignore_filename – bool True: FileName elements are ignored during the comparison. This is necessary in cases where GMN returns a generated filename because one was not provided in the SysMeta.
- Returns
True if SystemMetadata PyXB objects are semantically equivalent.
- Return type
bool
Notes
The SystemMetadata is normalized by removing any redundant information and ordering all sections where there are no semantics associated with the order. The normalized SystemMetadata is intended to be semantically equivalent to the un-normalized one.
-
d1_common.system_metadata.
are_equivalent_xml
(a_xml, b_xml, ignore_timestamps=False)¶ Determine if two SystemMetadata XML docs are semantically equivalent.
Normalize then compare SystemMetadata XML docs for equivalency.
- Parameters
a_xml, b_xml – bytes UTF-8 encoded SystemMetadata XML docs to compare
ignore_timestamps – bool True: Timestamps in the SystemMetadata are ignored so that objects that are compared register as equivalent if only their timestamps differ.
- Returns
True if SystemMetadata XML docs are semantically equivalent.
- Return type
bool
Notes
The SystemMetadata is normalized by removing any redundant information and ordering all sections where there are no semantics associated with the order. The normalized SystemMetadata is intended to be semantically equivalent to the un-normalized one.
-
d1_common.system_metadata.
clear_elements
(sysmeta_pyxb, clear_replica=True, clear_serial_version=True)¶ clear_replica causes any replica information to be removed from the object.
clear_replica ignores any differences in replica information, as this information is often different between MN and CN.
-
d1_common.system_metadata.
update_elements
(dst_pyxb, src_pyxb, el_list)¶ Copy elements specified in el_list from src_pyxb to dst_pyxb.
Only elements that are children of root are supported. See SYSMETA_ROOT_CHILD_LIST.
If an element in el_list does not exist in src_pyxb, it is removed from dst_pyxb.
-
d1_common.system_metadata.
generate_system_metadata_pyxb
(pid, format_id, sciobj_stream, submitter_str, rights_holder_str, authoritative_mn_urn, sid=None, obsoletes_pid=None, obsoleted_by_pid=None, is_archived=False, serial_version=1, uploaded_datetime=None, modified_datetime=None, file_name=None, origin_mn_urn=None, is_private=False, access_list=None, media_name=None, media_property_list=None, is_replication_allowed=False, prefered_mn_list=None, blocked_mn_list=None, pyxb_binding=None)¶ Generate a System Metadata PyXB object
- Parameters
pid
format_id
sciobj_stream
submitter_str
rights_holder_str
authoritative_mn_urn
pyxb_binding
sid
obsoletes_pid
obsoleted_by_pid
is_archived
serial_version
uploaded_datetime
modified_datetime
file_name
origin_mn_urn
access_list
is_private
media_name
media_property_list
is_replication_allowed
prefered_mn_list
blocked_mn_list
- Returns
systemMetadata PyXB object
-
d1_common.system_metadata.
gen_checksum_and_size
(sciobj_stream)¶
-
d1_common.system_metadata.
gen_access_policy
(pyxb_binding, sysmeta_pyxb, is_private, access_list)¶
-
d1_common.system_metadata.
gen_replication_policy
(pyxb_binding, prefered_mn_list=None, blocked_mn_list=None, is_replication_allowed=False)¶
-
d1_common.system_metadata.
gen_media_type
(pyxb_binding, media_name, media_property_list=None)¶
d1_common.type_conversions module¶
Utilities for handling the DataONE types.
Handle conversions between XML representations used in the D1 Python stack.
Handle conversions between v1 and v2 DataONE XML types.
The DataONE Python stack uses the following representations for the DataONE API XML docs:
As native Unicode str, typically “pretty printed” with indentations, when formatted for display.
As UTF-8 encoded bytes when sending or receiving over the network, or loading or saving as files.
Schema validation and manipulation in Python code as PyXB binding objects.
General processing as ElementTrees.
In order to allow conversions between all representations without having to implement separate conversions for each combination of input and output representation, a “hub and spokes” model is used. Native Unicode str was selected as the “hub” representation due to:
PyXB provides translation to/from string and DOM.
ElementTree provides translation to/from string.
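The “hub and spokes” idea can be illustrated with the standard library alone. The sketch below is not the d1_common code: the helper names are hypothetical, PyXB is left out, and only the bytes and ElementTree spokes are shown, each converting to and from the str hub.

```python
import xml.etree.ElementTree as ET


# Spokes to and from the hub (native Unicode str).
def decode_xml(xml_bytes, encoding="utf-8"):
    return xml_bytes.decode(encoding)


def encode_xml(xml_str, encoding="utf-8"):
    return xml_str.encode(encoding)


def parse_xml(xml_str):
    return ET.ElementTree(ET.fromstring(xml_str))


def serialize_xml(etree_obj):
    # encoding="unicode" makes ElementTree return a str instead of bytes.
    return ET.tostring(etree_obj.getroot(), encoding="unicode")


# Conversions between two spokes always pass through the hub, so adding a
# new representation needs only one pair of functions.
def bytes_to_etree(xml_bytes):
    return parse_xml(decode_xml(xml_bytes))


def etree_to_bytes(etree_obj):
    return encode_xml(serialize_xml(etree_obj))
```

With N representations, this layout needs 2 × (N − 1) conversion functions instead of one for every ordered pair.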
-
d1_common.type_conversions.
get_version_tag_by_pyxb_binding
(pyxb_binding)¶ Map PyXB binding to DataONE API version.
Given a PyXB binding, return the API major version number.
- Parameters
pyxb_binding – PyXB binding object
- Returns
DataONE API major version number, currently, v1, 1, v2 or 2.
-
d1_common.type_conversions.
get_pyxb_binding_by_api_version
(api_major, api_minor=0)¶ Map DataONE API version tag to PyXB binding.
Given a DataONE API major version number, return PyXB binding that can serialize and deserialize DataONE XML docs of that version.
- Parameters
api_major, api_minor – str or int DataONE API major and minor version numbers.
If api_major is an integer, it is combined with api_minor to form an exact version.
If api_major is a string of v1 or v2, api_minor is ignored and the latest PyXB binding available for the api_major version is returned.
- Returns
E.g., d1_common.types.dataoneTypes_v1_1.
- Return type
PyXB binding
-
d1_common.type_conversions.
get_version_tag
(api_major)¶
- Parameters
api_major – int DataONE API major version. Valid versions are currently 1 or 2.
- Returns
DataONE API version tag. Valid version tags are currently v1 or v2.
- Return type
str
-
d1_common.type_conversions.
extract_version_tag_from_url
(url)¶ Extract a DataONE API version tag from a MN or CN service endpoint URL.
- Parameters
url – str Service endpoint URL. E.g.: https://mn.example.org/path/v2/object/pid.
- Returns
Valid version tags are currently v1 or v2.
- Return type
str
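One plausible way to pull a version tag out of an endpoint URL is to scan the path segments for a `v<digits>` element. The function below is a hypothetical stand-in written for illustration; how d1_common actually locates the tag, and what it does when no tag is present, is not stated here.

```python
import re
from urllib.parse import urlparse


def find_version_tag(url):
    """Return the first path segment that looks like "v<digits>", or None."""
    for segment in urlparse(url).path.split("/"):
        if re.fullmatch(r"v\d+", segment):
            return segment
    # Assumption for this sketch: signal "no tag found" with None.
    return None
```

Matching whole path segments avoids false hits on identifiers that merely contain "v2" as a substring.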
-
d1_common.type_conversions.
get_pyxb_namespaces
()¶ Returns:
list of str: XML namespaces currently known to PyXB
-
d1_common.type_conversions.
str_to_v1_str
(xml_str)¶ Convert an API v2 XML doc to v1 XML doc.
Removes elements that are only valid for v2 and changes namespace to v1.
If doc is already v1, it is returned unchanged.
- Parameters
xml_str – str API v2 XML doc. E.g.: SystemMetadata v2.
- Returns
API v1 XML doc. E.g.: SystemMetadata v1.
- Return type
str
-
d1_common.type_conversions.
pyxb_to_v1_str
(pyxb_obj)¶ Convert an API v2 PyXB object to v1 XML doc.
Removes elements that are only valid for v2 and changes namespace to v1.
- Parameters
pyxb_obj – PyXB object API v2 PyXB object. E.g.: SystemMetadata v2_0.
- Returns
API v1 XML doc. E.g.: SystemMetadata v1.
- Return type
str
-
d1_common.type_conversions.
str_to_v1_pyxb
(xml_str)¶ Convert an API v2 XML doc to v1 PyXB object.
Removes elements that are only valid for v2 and changes namespace to v1.
- Parameters
xml_str – str API v2 XML doc. E.g.: SystemMetadata v2.
- Returns
API v1 PyXB object. E.g.: SystemMetadata v1_2.
- Return type
PyXB object
-
d1_common.type_conversions.
str_to_v2_str
(xml_str)¶ Convert an API v1 XML doc to v2 XML doc.
All v1 elements are valid for v2, so only changes namespace.
- Parameters
xml_str – str API v1 XML doc. E.g.: SystemMetadata v1.
- Returns
API v2 XML doc. E.g.: SystemMetadata v2.
- Return type
str
-
d1_common.type_conversions.
pyxb_to_v2_str
(pyxb_obj)¶ Convert an API v1 PyXB object to v2 XML doc.
All v1 elements are valid for v2, so only changes namespace.
- Parameters
pyxb_obj – PyXB object API v1 PyXB object. E.g.: SystemMetadata v1_0.
- Returns
API v2 XML doc. E.g.: SystemMetadata v2.
- Return type
str
-
d1_common.type_conversions.
str_to_v2_pyxb
(xml_str)¶ Convert an API v1 XML doc to v2 PyXB object.
All v1 elements are valid for v2, so only changes namespace.
- Parameters
xml_str – str API v1 XML doc. E.g.: SystemMetadata v1.
- Returns
API v2 PyXB object. E.g.: SystemMetadata v2_0.
- Return type
PyXB object
-
d1_common.type_conversions.
is_pyxb
(pyxb_obj)¶ Returns:
bool: True if pyxb_obj is a PyXB object.
-
d1_common.type_conversions.
is_pyxb_d1_type
(pyxb_obj)¶ Returns:
bool: True if pyxb_obj is a PyXB object holding a DataONE API type.
-
d1_common.type_conversions.
is_pyxb_d1_type_name
(pyxb_obj, expected_pyxb_type_name)¶ - Parameters
pyxb_obj – object May be a PyXB object and may hold a DataONE API type.
expected_pyxb_type_name – str Case sensitive name of a DataONE type.
E.g.: SystemMetadata, LogEntry, ObjectInfo.
- Returns
True if object is a PyXB object holding a value of the specified type.
- Return type
bool
-
d1_common.type_conversions.
pyxb_get_type_name
(obj_pyxb)¶ Args: obj_pyxb: PyXB object.
- Returns
Name of the type the PyXB object is holding.
E.g.: SystemMetadata, LogEntry, ObjectInfo.
- Return type
str
-
d1_common.type_conversions.
pyxb_get_namespace_name
(obj_pyxb)¶ Args: obj_pyxb: PyXB object.
- Returns
Namespace and Name of the type the PyXB object is holding.
E.g.:
{http://ns.dataone.org/service/types/v2.0}SystemMetadata
- Return type
str
-
d1_common.type_conversions.
str_is_v1
(xml_str)¶ - Parameters
xml_str – str DataONE API XML doc.
- Returns
True if XML doc is a DataONE API v1 type.
- Return type
bool
-
d1_common.type_conversions.
str_is_v2
(xml_str)¶ - Parameters
xml_str – str DataONE API XML doc.
- Returns
True if XML doc is a DataONE API v2 type.
- Return type
bool
-
d1_common.type_conversions.
str_is_error
(xml_str)¶ - Parameters
xml_str – str DataONE API XML doc.
- Returns
True if XML doc is a DataONE Exception type.
- Return type
bool
-
d1_common.type_conversions.
str_is_identifier
(xml_str)¶ - Parameters
xml_str – str DataONE API XML doc.
- Returns
True if XML doc is a DataONE Identifier type.
- Return type
bool
-
d1_common.type_conversions.
str_is_objectList
(xml_str)¶ - Parameters
xml_str – str DataONE API XML doc.
- Returns
True if XML doc is a DataONE ObjectList type.
- Return type
bool
-
d1_common.type_conversions.
str_is_well_formed
(xml_str)¶ - Parameters
xml_str – str DataONE API XML doc.
- Returns
True if XML doc is well formed.
- Return type
bool
-
d1_common.type_conversions.
pyxb_is_v1
(pyxb_obj)¶ - Parameters
pyxb_obj – PyXB object PyXB object holding an unknown type.
- Returns
True if pyxb_obj holds an API v1 type.
- Return type
bool
-
d1_common.type_conversions.
pyxb_is_v2
(pyxb_obj)¶ - Parameters
pyxb_obj – PyXB object PyXB object holding an unknown type.
- Returns
True if pyxb_obj holds an API v2 type.
- Return type
bool
-
d1_common.type_conversions.
str_to_pyxb
(xml_str)¶ Deserialize API XML doc to PyXB object.
- Parameters
xml_str – str DataONE API XML doc
- Returns
Matching the API version of the XML doc.
- Return type
PyXB object
-
d1_common.type_conversions.
str_to_etree
(xml_str, encoding='utf-8')¶ Deserialize API XML doc to an ElementTree.
- Parameters
xml_str – bytes DataONE API XML doc
encoding – str Decoder to use when converting the XML doc bytes to a Unicode str.
- Returns
Matching the API version of the XML doc.
- Return type
ElementTree
-
d1_common.type_conversions.
pyxb_to_str
(pyxb_obj, encoding='utf-8')¶ Serialize PyXB object to XML doc.
- Parameters
pyxb_obj – PyXB object
encoding – str Encoder to use when converting the Unicode strings in the PyXB object to XML doc bytes.
- Returns
API XML doc, matching the API version of pyxb_obj.
- Return type
str
-
d1_common.type_conversions.
etree_to_str
(etree_obj, encoding='utf-8')¶ Serialize ElementTree to XML doc.
- Parameters
etree_obj – ElementTree
encoding – str Encoder to use when converting the Unicode strings in the ElementTree to XML doc bytes.
- Returns
XML doc.
- Return type
str
-
d1_common.type_conversions.
etree_to_pretty_xml
(etree_obj, encoding='unicode')¶ Serialize ElementTree to pretty printed XML doc.
- Parameters
etree_obj – ElementTree
encoding – str Encoder to use when converting the Unicode strings in the ElementTree to XML doc bytes.
- Returns
Pretty printed XML doc.
- Return type
str
-
d1_common.type_conversions.
pyxb_to_etree
(pyxb_obj)¶ Convert PyXB object to ElementTree.
- Parameters
pyxb_obj – PyXB object
- Returns
Matching the API version of the PyXB object.
- Return type
ElementTree
-
d1_common.type_conversions.
etree_to_pyxb
(etree_obj)¶ Convert ElementTree to PyXB object.
- Parameters
etree_obj – ElementTree
- Returns
Matching the API version of the ElementTree object.
- Return type
PyXB object
-
d1_common.type_conversions.
replace_namespace_with_prefix
(tag_str, ns_reverse_dict=None)¶ Convert XML tag names with namespace in the form {namespace}tag to the form prefix:tag.
- Parameters
tag_str – str Tag name with namespace. E.g.: {http://www.openarchives.org/ore/terms/}ResourceMap.
ns_reverse_dict – dict A dictionary of namespace to prefix to use for the conversion. If not supplied, a default dict with the namespaces used in DataONE XML types is used.
- Returns
Tag name with prefix. E.g.: ore:ResourceMap.
- Return type
str
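The conversion from `{namespace}tag` to `prefix:tag` amounts to a dictionary lookup on the part inside the braces. The sketch below is a hypothetical re-creation, not the library function: the name `namespace_to_prefix`, the two-entry default mapping, and the choice to return unknown tags unchanged are all assumptions made for illustration.

```python
import re

# Illustrative default mapping; the real default dict is not shown in this document.
NS_TO_PREFIX = {
    "http://www.openarchives.org/ore/terms/": "ore",
    "http://purl.org/dc/terms/": "dcterms",
}


def namespace_to_prefix(tag_str, ns_to_prefix=None):
    """Rewrite "{namespace}tag" as "prefix:tag" when the namespace is known."""
    ns_to_prefix = NS_TO_PREFIX if ns_to_prefix is None else ns_to_prefix
    match = re.fullmatch(r"\{([^}]*)\}(.*)", tag_str)
    if match is None:
        return tag_str  # no namespace part; nothing to replace
    namespace, local_name = match.groups()
    prefix = ns_to_prefix.get(namespace)
    if prefix is None:
        return tag_str  # assumption: unknown namespaces are left as-is
    return "{}:{}".format(prefix, local_name)
```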
-
d1_common.type_conversions.
etree_replace_namespace
(etree_obj, ns_str)¶ In-place change the namespace of elements in an ElementTree.
- Parameters
etree_obj – ElementTree
ns_str – str The namespace to set. E.g.: http://ns.dataone.org/service/types/v1.
-
d1_common.type_conversions.
strip_v2_elements
(etree_obj)¶ In-place remove elements and attributes that are only valid in v2 types.
Args: etree_obj: ElementTree ElementTree holding one of the DataONE API types that changed between v1 and v2.
-
d1_common.type_conversions.
strip_system_metadata
(etree_obj)¶ In-place remove elements and attributes that are only valid in v2 types from v1 System Metadata.
Args: etree_obj: ElementTree ElementTree holding a v1 SystemMetadata.
-
d1_common.type_conversions.
strip_log
(etree_obj)¶ In-place remove elements and attributes that are only valid in v2 types from v1 Log.
Args: etree_obj: ElementTree ElementTree holding a v1 Log.
-
d1_common.type_conversions.
strip_logEntry
(etree_obj)¶ In-place remove elements and attributes that are only valid in v2 types from v1 LogEntry.
Args: etree_obj: ElementTree ElementTree holding a v1 LogEntry.
-
d1_common.type_conversions.
strip_node
(etree_obj)¶ In-place remove elements and attributes that are only valid in v2 types from v1 Node.
Args: etree_obj: ElementTree ElementTree holding a v1 Node.
-
d1_common.type_conversions.
strip_node_list
(etree_obj)¶ In-place remove elements and attributes that are only valid in v2 types from v1 NodeList.
Args: etree_obj: ElementTree ElementTree holding a v1 NodeList.
-
d1_common.type_conversions.
v2_0_tag
(element_name)¶ Add a v2 namespace to a tag name.
- Parameters
element_name – str The name of a DataONE v2 type. E.g.: NodeList.
- Returns
The tag name with DataONE API v2 namespace. E.g.: {http://ns.dataone.org/service/types/v2.0}NodeList
- Return type
str
d1_common.url module¶
Utilities for handling URLs in DataONE.
-
d1_common.url.
parseUrl
(url)¶ Return a dict containing scheme, netloc, url, params, query, fragment keys.
query is a dict where the values are always lists. If the query key appears only once in the URL, the list will have a single value.
-
d1_common.url.
isHttpOrHttps
(url)¶ URL is HTTP or HTTPS protocol.
Upper and lower case protocol names are recognized.
-
d1_common.url.
encodePathElement
(element)¶ Encode a URL path element according to RFC3986.
-
d1_common.url.
decodePathElement
(element)¶ Decode a URL path element according to RFC3986.
-
d1_common.url.
encodeQueryElement
(element)¶ Encode a URL query element according to RFC3986.
-
d1_common.url.
decodeQueryElement
(element)¶ Decode a URL query element according to RFC3986.
-
d1_common.url.
stripElementSlashes
(element)¶ Strip any slashes from the front and end of a URL element.
-
d1_common.url.
joinPathElements
(*elements)¶ Join two or more URL elements, inserting ‘/’ as needed.
Note: Any leading and trailing slashes are stripped from the resulting URL. An empty element (‘’) causes an empty spot in the path (‘//’).
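The joining behavior described in the note can be approximated in a few lines. This is a sketch under assumptions, not the d1_common code: the name `join_path_elements` is hypothetical, and stripping slashes from each element before joining is one possible reading of the description.

```python
def join_path_elements(*elements):
    """Join URL path elements with "/", dropping slashes around each element."""
    # Each element is stripped first, so an empty element still leaves an
    # empty spot ("//") in the joined path.
    joined = "/".join(element.strip("/") for element in elements)
    # Leading and trailing slashes are removed from the final result.
    return joined.strip("/")
```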
-
d1_common.url.
encodeAndJoinPathElements
(*elements)¶ Encode URL path elements according to RFC3986, then join them, inserting ‘/’ as needed.
Note: Any leading and trailing slashes are stripped from the resulting URL. An empty element (‘’) causes an empty spot in the path (‘//’).
-
d1_common.url.
normalizeTarget
(target)¶ If necessary, modify target so that it ends with ‘/’.
-
d1_common.url.
urlencode
(query, doseq=0)¶ Modified version of the standard urllib.urlencode that conforms to RFC3986. The urllib version encodes spaces as ‘+’, which can lead to inconsistency. This version always encodes spaces as ‘%20’.
Encode a sequence of two-element tuples or dictionary into a URL query string.
If any values in the query arg are sequences and doseq is true, each sequence element is converted to a separate parameter.
If the query arg is a sequence of two-element tuples, the order of the parameters in the output will match the order of parameters in the input.
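In Python 3, the standard library can produce the same space handling through the `quote_via` parameter of `urllib.parse.urlencode`. The wrapper below is a sketch of that alternative, not the d1_common implementation; the name `encode_query` is hypothetical.

```python
from urllib.parse import quote, urlencode


def encode_query(query, doseq=False):
    """Encode a query so that spaces become %20 instead of +."""
    # quote_via=quote replaces the default quote_plus, which emits "+".
    return urlencode(query, doseq=doseq, quote_via=quote)
```

Passing a sequence of two-element tuples keeps the parameters in input order, and `doseq=True` expands sequence values into separate parameters, matching the behavior described above.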
-
d1_common.url.
makeCNBaseURL
(url)¶ Attempt to create a valid CN BaseURL when one or more sections of the URL are missing.
-
d1_common.url.
makeMNBaseURL
(url)¶ Attempt to create a valid MN BaseURL when one or more sections of the URL are missing.
-
d1_common.url.
find_url_mismatches
(a_url, b_url)¶ Given two URLs, return a list of any mismatches.
If the list is empty, the URLs are equivalent. Implemented by parsing and comparing the elements. See RFC 1738 for details.
-
d1_common.url.
is_urls_equivalent
(a_url, b_url)¶
d1_common.util module¶
General utilities often needed by DataONE clients and servers.
-
d1_common.util.
log_setup
(is_debug=False, is_multiprocess=False)¶ Set up a standardized log format for the DataONE Python stack. All Python components should use this function. If is_multiprocess is True, include process ID in the log so that logs can be separated for each process.
Output only to stdout and stderr.
-
d1_common.util.
get_content_type
(content_type)¶ Extract the MIME type value from a content type string.
Removes any subtype and parameter values that may be present in the string.
- Parameters
content_type – str String with content type and optional subtype and parameter fields.
- Returns
String with only content type
- Return type
str
Example:
Input: multipart/form-data; boundary=aBoundaryString
Returns: multipart/form-data
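The example above can be reproduced by cutting the string at the first `;`. This is a minimal sketch with a hypothetical function name; the library may handle additional cases that are not covered here.

```python
def mime_type_only(content_type):
    """Return the part of a content type string that precedes any parameters."""
    # Everything after the first ";" is parameters such as boundary or charset.
    return content_type.split(";", 1)[0].strip()
```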
-
d1_common.util.
nested_update
(d, u)¶ Merge two nested dicts.
Nested dicts are sometimes used for representing various recursive structures. When updating such a structure, it may be convenient to present the updated data as a corresponding recursive structure. This function will then apply the update.
- Parameters
d – dict dict that will be updated in-place. May or may not contain nested dicts.
u – dict dict with contents that will be merged into d. May or may not contain nested dicts.
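A recursive merge of this kind is commonly written as follows. The function `merge_nested` is a hypothetical sketch of the technique, not the library source; in particular, overwriting a non-dict value with a dict (and the reverse) is an assumption about how conflicts are resolved.

```python
from collections.abc import Mapping


def merge_nested(d, u):
    """Merge dict u into dict d in place, descending into nested dicts."""
    for key, value in u.items():
        if isinstance(value, Mapping) and isinstance(d.get(key), Mapping):
            # Both sides hold a dict under this key: merge them recursively.
            merge_nested(d[key], value)
        else:
            # Otherwise the value from u replaces whatever d had.
            d[key] = value
    return d
```

Unlike `dict.update()`, keys in a nested dict that are absent from the update are preserved.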
-
class
d1_common.util.
EventCounter
(logger_=logging)¶ Bases:
object
Count events during a lengthy operation and write running totals and/or a summary to a logger when the operation has completed.
The summary contains the name and total count of each event that was counted.
Example
Summary written to the log:
Events:
Creating SciObj DB representations: 200
Retrieving revision chains: 200
Skipped Node registry update: 1
Updating obsoletedBy: 42
Whitelisted subject: 2
-
property
event_dict
¶ Provide direct access to the underlying dict where events are recorded.
Returns: dict: Events and event counts.
-
count
(event_str, inc_int=1)¶ Count an event.
- Parameters
event_str – The name of an event to count. Used as a key in the event dict. The same name will also be used in the summary.
inc_int – int Optional argument to increase the count for the event by more than 1.
-
log_and_count
(event_str, msg_str=None, inc_int=None)¶ Count an event and write a message to a logger.
- Parameters
event_str – str The name of an event to count. Used as a key in the event dict. The same name will be used in the summary. This also becomes a part of the message logged by this function.
msg_str – str Optional message with details about the events. The message is only written to the log. While the event_str functions as a key and must remain the same for the same type of event, msg_str may change between calls.
inc_int – int Optional argument to increase the count for the event by more than 1.
-
dump_to_log
()¶ Write summary to logger with the name and number of times each event has been counted.
This function may be called at any point in the process. Counts are not zeroed.
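The counting and summary behavior documented for this class can be approximated with a small stand-in. The sketch below is hypothetical: the class name `EventCounterSketch`, the message layout, and the sorted summary order are assumptions. Only the method names, the parameters, and the `logger_` default come from the signatures above.

```python
import logging


class EventCounterSketch:
    """Hypothetical stand-in that mirrors the documented behavior."""

    def __init__(self, logger_=logging):
        self._logger = logger_
        self._event_dict = {}

    @property
    def event_dict(self):
        # Direct access to the dict in which the counts are recorded.
        return self._event_dict

    def count(self, event_str, inc_int=1):
        self._event_dict[event_str] = self._event_dict.get(event_str, 0) + inc_int

    def log_and_count(self, event_str, msg_str=None, inc_int=None):
        # Assumption: the event name is logged, followed by the optional details.
        if msg_str is None:
            self._logger.info(event_str)
        else:
            self._logger.info("{}: {}".format(event_str, msg_str))
        self.count(event_str, 1 if inc_int is None else inc_int)

    def dump_to_log(self):
        # Counts are only read here, so the summary can be written repeatedly.
        self._logger.info("Events:")
        for event_str, count_int in sorted(self._event_dict.items()):
            self._logger.info("  {}: {}".format(event_str, count_int))


counter = EventCounterSketch()
for pid in ("pid1", "pid2", "pid3"):
    counter.count("Retrieving revision chains")
counter.log_and_count("Updating obsoletedBy", "pid2 is obsoleted by pid3")
counter.count("Skipped Node registry update", inc_int=2)
counter.dump_to_log()
```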
-
d1_common.util.
print_logging
()¶ Context manager to temporarily suppress additional information such as timestamps when writing to loggers.
This makes logging look like print(). The main use case is in scripts that mix logging and print(), as Python uses separate streams for those, and output can and does end up getting shuffled if print() and logging are used interchangeably.
When entering the context, the logging levels on the current handlers are saved, then modified to WARNING levels. A new DEBUG level handler with a formatter that does not write timestamps, etc., is then created.
When leaving the context, the DEBUG handler is removed and existing loggers are restored to their previous levels.
By modifying the log levels to WARNING instead of completely disabling the loggers, it is ensured that potentially serious issues can still be logged while the context manager is in effect.
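The enter and exit steps described above could look roughly like the following context manager. It is a sketch under assumptions rather than the library's code: it operates on the root logger only, it also lowers the root logger level to DEBUG so that the plain handler receives messages, and the `stream` parameter is a hypothetical addition that makes the output easy to capture.

```python
import contextlib
import io
import logging


@contextlib.contextmanager
def print_logging_sketch(stream=None):
    """Hypothetical re-creation of the behavior described above.

    ``stream`` is an assumption added so the output can be captured.
    """
    root_logger = logging.getLogger()
    saved_logger_level = root_logger.level
    # Save the level of each current handler, then raise it to WARNING.
    saved_handler_levels = [(h, h.level) for h in root_logger.handlers]
    for handler, _ in saved_handler_levels:
        handler.setLevel(logging.WARNING)
    # Add a DEBUG level handler that writes only the message itself.
    plain_handler = logging.StreamHandler(stream)
    plain_handler.setLevel(logging.DEBUG)
    plain_handler.setFormatter(logging.Formatter("%(message)s"))
    root_logger.addHandler(plain_handler)
    # Assumption: the root logger must pass DEBUG records on to the handler.
    root_logger.setLevel(logging.DEBUG)
    try:
        yield
    finally:
        # Remove the temporary handler and restore the saved levels.
        root_logger.removeHandler(plain_handler)
        root_logger.setLevel(saved_logger_level)
        for handler, level in saved_handler_levels:
            handler.setLevel(level)


captured = io.StringIO()
with print_logging_sketch(captured):
    logging.getLogger("example").info("Processing 3 objects")
```

Inside the `with` block the message is written without a timestamp or level prefix, much as `print()` would write it.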
-
d1_common.util.
save_json
(py_obj, json_path)¶ Serialize a native object to JSON and save it normalized, pretty printed to a file.
The JSON string is normalized by sorting any dictionary keys.
- Parameters
py_obj – object Any object that can be represented in JSON. Some types, such as datetimes are automatically converted to strings.
json_path – str File path to which to write the JSON file. E.g.: The path must exist. The filename will normally end with “.json”.
See also
ToJsonCompatibleTypes()
-
d1_common.util.
load_json
(json_path)¶ Load JSON file and parse it to a native object.
- Parameters
json_path – str File path from which to load the JSON file.
- Returns
Typically a nested structure of list and dict objects.
- Return type
object
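A save and load round trip along the lines described for save_json() and load_json() can be sketched with the standard library alone. The function names, the indent width and the file name below are assumptions made for illustration; the sketch also omits the automatic conversion of types such as datetimes.

```python
import json
import os
import tempfile


def save_json_sketch(py_obj, json_path):
    # sort_keys gives the normalized key order; indent gives pretty printing.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(py_obj, f, sort_keys=True, indent=2)


def load_json_sketch(json_path):
    with open(json_path, "r", encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "example.json")
    save_json_sketch({"b": [1, 2], "a": {"d": None, "c": "x"}}, json_path)
    with open(json_path, encoding="utf-8") as f:
        saved_text = f.read()
    loaded_obj = load_json_sketch(json_path)
```

Because the keys are sorted on save, two objects with the same content produce byte-identical files regardless of dict insertion order.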
-
d1_common.util.
format_json_to_normalized_pretty_json
(json_str)¶ Normalize and pretty print a JSON string.
The JSON string is normalized by sorting any dictionary keys.
- Parameters
json_str – A valid JSON string.
- Returns
normalized, pretty printed JSON string.
- Return type
str
-
d1_common.util.
serialize_to_normalized_pretty_json
(py_obj)¶ Serialize a native object to normalized, pretty printed JSON.
The JSON string is normalized by sorting any dictionary keys.
- Parameters
py_obj – object Any object that can be represented in JSON. Some types, such as datetimes are automatically converted to strings.
- Returns
normalized, pretty printed JSON string.
- Return type
str
-
d1_common.util.
serialize_to_normalized_compact_json
(py_obj)¶ Serialize a native object to normalized, compact JSON.
The JSON string is normalized by sorting any dictionary keys. It will be on a single line without whitespace between elements.
- Parameters
py_obj – object Any object that can be represented in JSON. Some types, such as datetimes are automatically converted to strings.
- Returns
normalized, compact JSON string.
- Return type
str
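The difference between the pretty printed and compact normalized forms can be illustrated with `json.dumps()` from the standard library. This is only a sketch of the two output styles; the indent width and the separators are assumptions, not values taken from the library.

```python
import json

py_obj = {"pid": "abc", "checksum": {"value": "ff", "algorithm": "MD5"}}

# Pretty: sorted keys, one element per line. The indent width is an assumption.
pretty_json = json.dumps(py_obj, sort_keys=True, indent=2)

# Compact: sorted keys, single line, no whitespace between elements.
compact_json = json.dumps(py_obj, sort_keys=True, separators=(",", ":"))
```

Here `compact_json` is `{"checksum":{"algorithm":"MD5","value":"ff"},"pid":"abc"}`, while `pretty_json` holds the same data spread over several indented lines.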
-
class
d1_common.util.
ToJsonCompatibleTypes
(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)¶ Bases:
json.encoder.JSONEncoder
Some native objects such as datetime.datetime are not automatically converted to strings for use as values in JSON.
This helper adds such conversions for types that the DataONE Python stack encounters frequently in objects that are to be JSON encoded.
-
default
(o)¶ Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return JSONEncoder.default(self, o)
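As a concrete case of the conversions mentioned for this class, an encoder that turns datetimes into strings might look like the sketch below. The class name `DateTimeEncoderSketch` and the choice of ISO 8601 output are assumptions; the set of types that the real helper converts, and the format it uses, are not listed in this section.

```python
import datetime
import json


class DateTimeEncoderSketch(json.JSONEncoder):
    """Hypothetical encoder; the real helper's conversions are not shown here."""

    def default(self, o):
        if isinstance(o, (datetime.datetime, datetime.date)):
            # Assumption: ISO 8601 is an acceptable string form.
            return o.isoformat()
        # Let the base class raise TypeError for anything else.
        return super().default(o)


encoded = json.dumps(
    {"uploaded": datetime.datetime(2019, 5, 1, 12, 30, 0)},
    cls=DateTimeEncoderSketch,
    sort_keys=True,
)
```

Passing the encoder through the `cls` argument keeps the conversion out of the calling code, which only needs to hand over native objects.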
-
d1_common.xml module¶
Utilities for handling XML docs.
-
d1_common.xml.
deserialize
(doc_xml, pyxb_binding=None)¶ Deserialize DataONE XML types to PyXB.
- Parameters
doc_xml – UTF-8 encoded bytes
pyxb_binding – PyXB binding object. If not specified, the correct one should be selected automatically.
- Returns
PyXB object
See also
deserialize_d1_exception()
for deserializing DataONE Exception types.
-
d1_common.xml.
deserialize_d1_exception
(doc_xml)¶ Deserialize DataONE Exception types.
- Parameters
doc_xml – UTF-8 encoded bytes An XML doc that conforms to the dataoneErrors XML Schema.
- Returns
DataONEException object
-
d1_common.xml.
serialize_gen
(obj_pyxb, encoding='utf-8', pretty=False, strip_prolog=False, xslt_url=None)¶ Serialize PyXB object to XML.
- Parameters
obj_pyxb – PyXB object PyXB object to serialize.
encoding – str Encoding to use for XML doc bytes
pretty – bool True: Use pretty print formatting for human readability.
strip_prolog – bool True: Remove any XML prolog (e.g., <?xml version="1.0" encoding="utf-8"?>) from the resulting XML doc.
xslt_url – str If specified, add a processing instruction to the XML doc that specifies the download location for an XSLT stylesheet.
- Returns
XML document
-
d1_common.xml.
serialize_for_transport
(obj_pyxb, pretty=False, strip_prolog=False, xslt_url=None)¶ Serialize PyXB object to XML bytes with UTF-8 encoding for transport over the network, filesystem storage and other machine usage.
- Parameters
obj_pyxb – PyXB object PyXB object to serialize.
pretty – bool True: Use pretty print formatting for human readability.
strip_prolog – bool True: Remove any XML prolog (e.g., <?xml version="1.0" encoding="utf-8"?>) from the resulting XML doc.
xslt_url – str If specified, add a processing instruction to the XML doc that specifies the download location for an XSLT stylesheet.
- Returns
UTF-8 encoded XML document
- Return type
bytes
See also
serialize_for_display()
-
d1_common.xml.
serialize_to_xml_str
(obj_pyxb, pretty=True, strip_prolog=False, xslt_url=None)¶ Serialize PyXB object to pretty printed XML str for display.
- Parameters
obj_pyxb – PyXB object PyXB object to serialize.
pretty – bool False: Disable pretty print formatting. XML will not have line breaks.
strip_prolog – bool True: Remove any XML prolog (e.g., <?xml version="1.0" encoding="utf-8"?>) from the resulting XML doc.
xslt_url – str If specified, add a processing instruction to the XML doc that specifies the download location for an XSLT stylesheet.
- Returns
Pretty printed XML document
- Return type
str
-
d1_common.xml.
reformat_to_pretty_xml
(doc_xml)¶ Pretty print XML doc.
- Parameters
doc_xml – str Well formed XML doc
- Returns
Pretty printed XML doc
- Return type
str
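Pretty printing a well formed XML doc can be sketched with `xml.dom.minidom` from the standard library. Whether the library uses this module is not stated here, so both the function name `reformat_to_pretty_xml_sketch` and the two space indent are assumptions.

```python
import xml.dom.minidom


def reformat_to_pretty_xml_sketch(doc_xml):
    # parseString() raises an ExpatError if the doc is not well formed.
    return xml.dom.minidom.parseString(doc_xml).toprettyxml(indent="  ")


pretty_xml = reformat_to_pretty_xml_sketch("<root><child>text</child></root>")
```

The result starts with an XML prolog and places each element on its own indented line.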
-
d1_common.xml.
are_equivalent_pyxb
(a_pyxb, b_pyxb)¶ Return True if two PyXB objects are semantically equivalent, else False.
-
d1_common.xml.
are_equivalent
(a_xml, b_xml, encoding=None)¶ Return True if two XML docs are semantically equivalent, else False.
TODO: Include test for tails. Skipped for now because tails are not used in any D1 types.
-
d1_common.xml.
are_equal_or_superset
(superset_tree, base_tree)¶ Return True if superset_tree is equal to or a superset of base_tree.
Checks that all elements and attributes in base_tree are present in superset_tree and contain the same values. For elements, also checks that the order is the same.
Can be used for checking if one XML document is based on another, as long as all the information in base_tree is also present and unmodified in superset_tree.
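One way to picture the superset check is a recursive walk over two ElementTree elements. The sketch below is an illustration under assumptions, not the library's algorithm: the name `is_equal_or_superset_sketch` is hypothetical, text is compared after stripping surrounding whitespace, and tails are ignored.

```python
import xml.etree.ElementTree as ET


def is_equal_or_superset_sketch(superset_el, base_el):
    """Hypothetical check that everything in ``base_el`` also appears,
    unmodified and in the same order, in ``superset_el``."""
    if superset_el.tag != base_el.tag:
        return False
    if (superset_el.text or "").strip() != (base_el.text or "").strip():
        return False
    # Every attribute in base must exist in superset with the same value.
    for name, value in base_el.attrib.items():
        if superset_el.attrib.get(name) != value:
            return False
    # Match base children against superset children, preserving order.
    # The shared iterator is never rewound, so later matches must come later.
    superset_children = iter(superset_el)
    for base_child in base_el:
        if not any(
            is_equal_or_superset_sketch(s, base_child) for s in superset_children
        ):
            return False
    return True


base = ET.fromstring('<a x="1"><b>1</b><c>2</c></a>')
superset = ET.fromstring('<a x="1" y="2"><b>1</b><extra/><c>2</c></a>')
reordered = ET.fromstring('<a x="1"><c>2</c><b>1</b></a>')

is_superset = is_equal_or_superset_sketch(superset, base)
is_reverse_superset = is_equal_or_superset_sketch(base, superset)
is_reordered_superset = is_equal_or_superset_sketch(reordered, base)
```

In this example only the first comparison succeeds: the reverse direction fails because `base` lacks the `y` attribute, and the reordered doc fails because the element order differs.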
-
d1_common.xml.
are_equal_xml
(a_xml, b_xml)¶ Normalize and compare XML documents for equality. The document may or may not be a DataONE type.
- Parameters
a_xml – str
b_xml – str XML documents to compare for equality.
- Returns
True if the XML documents are semantically equivalent.
- Return type
bool
-
d1_common.xml.
are_equal_pyxb
(a_pyxb, b_pyxb)¶ Normalize and compare PyXB objects for equality.
- Parameters
a_pyxb – PyXB object
b_pyxb – PyXB object PyXB objects to compare for equality.
- Returns
True if the PyXB objects are semantically equivalent.
- Return type
bool
-
d1_common.xml.
are_equal_elements
(a_el, b_el)¶ Normalize and compare ElementTrees for equality.
- Parameters
a_el – ElementTree
b_el – ElementTree ElementTrees to compare for equality.
- Returns
True if the ElementTrees are semantically equivalent.
- Return type
bool
-
d1_common.xml.
sort_value_list_pyxb
(obj_pyxb)¶ In-place sort complex value siblings in a PyXB object.
- Parameters
obj_pyxb – PyXB object
-
d1_common.xml.
sort_elements_by_child_values
(obj_pyxb, child_name_list)¶ In-place sort simple or complex elements in a PyXB object by values they contain in child elements.
- Parameters
obj_pyxb – PyXB object
child_name_list – list of str List of element names that are direct children of the PyXB object.
-
d1_common.xml.
format_diff_pyxb
(a_pyxb, b_pyxb)¶ Create a diff between two PyXB objects.
- Parameters
a_pyxb – PyXB object
b_pyxb – PyXB object
- Returns
Differ-style delta
- Return type
str
-
d1_common.xml.
format_diff_xml
(a_xml, b_xml)¶ Create a diff between two XML documents.
- Parameters
a_xml – str
b_xml – str
- Returns
Differ-style delta
- Return type
str
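A Differ-style delta between two XML documents can be produced with `difflib.ndiff()` after pretty printing both docs so that each element sits on its own line. This is a sketch under assumptions: the function name `format_diff_xml_sketch` is hypothetical, and whether the library normalizes the docs in the same way before diffing is not stated here.

```python
import difflib
import xml.dom.minidom


def format_diff_xml_sketch(a_xml, b_xml):
    # Pretty print first so that each element lands on its own line.
    a_lines = xml.dom.minidom.parseString(a_xml).toprettyxml(indent="  ").splitlines()
    b_lines = xml.dom.minidom.parseString(b_xml).toprettyxml(indent="  ").splitlines()
    # ndiff() yields lines prefixed with "  ", "- ", "+ " or "? ".
    return "\n".join(difflib.ndiff(a_lines, b_lines))


diff_str = format_diff_xml_sketch(
    "<root><size>10</size><owner>alice</owner></root>",
    "<root><size>12</size><owner>alice</owner></root>",
)
```

In the resulting delta, lines starting with `- ` appear only in the first doc, lines starting with `+ ` only in the second, and lines starting with `? ` mark the position of the change within a line.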
-
d1_common.xml.
is_valid_utf8
(o)¶ Determine if object is valid UTF-8 encoded bytes.
- Parameters
o – any object
- Returns
True if object is bytes containing valid UTF-8.
- Return type
bool
Notes
An empty bytes object is valid UTF-8.
Any type of object can be checked, not only bytes.
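The documented behavior, including the two notes, can be reproduced in a few lines. The function name `is_valid_utf8_sketch` is hypothetical and the body is an illustration, not the library's code.

```python
def is_valid_utf8_sketch(o):
    # Anything that is not bytes, including str, is reported as not valid.
    if not isinstance(o, bytes):
        return False
    try:
        o.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


results = [
    is_valid_utf8_sketch(b""),
    is_valid_utf8_sketch("æøå".encode("utf-8")),
    is_valid_utf8_sketch(b"\xff\xfe"),
    is_valid_utf8_sketch("text"),
    is_valid_utf8_sketch(None),
]
```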
-
d1_common.xml.
get_auto
(obj_pyxb)¶ Return value from simple or complex PyXB element.
PyXB complex elements have a .value() member which must be called in order to retrieve the value of the element, while simple elements represent their values directly. This function allows retrieving element values without knowing the type of element.
- Parameters
obj_pyxb – PyXB object
- Returns
Value of the PyXB object.
- Return type
str
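The distinction between the two kinds of elements can be sketched without PyXB by using a stub class in place of a complex element. Both `get_auto_sketch` and `ComplexElementStub` are hypothetical names, and the check for a callable `value` member is an assumption about how the two cases could be told apart.

```python
class ComplexElementStub:
    """Hypothetical stand-in for a PyXB complex element."""

    def __init__(self, value):
        self._value = value

    def value(self):
        return self._value


def get_auto_sketch(obj):
    # Complex elements expose value(); simple elements are the value itself.
    value_member = getattr(obj, "value", None)
    return value_member() if callable(value_member) else obj


from_complex = get_auto_sketch(ComplexElementStub("urn:node:A"))
from_simple = get_auto_sketch("urn:node:B")
```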
-
d1_common.xml.
get_opt_attr
(obj_pyxb, attr_str, default_val=None)¶ Get an optional attribute value from a PyXB element.
The attributes for elements that are optional according to the schema and not set in the PyXB object are present and set to None.
PyXB validation will fail if required elements are missing.
- Parameters
obj_pyxb – PyXB object
attr_str – str Name of an attribute that the PyXB object may contain.
default_val – any object Value to return if the attribute is not present.
- Returns
Value of the attribute if present, else default_val.
- Return type
str
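Since optional attributes that are not set are present and hold None, returning a default comes down to a None check. The sketch below uses `types.SimpleNamespace` as a stand-in for a PyXB object. The function name and the attribute names in the example are hypothetical.

```python
from types import SimpleNamespace


def get_opt_attr_sketch(obj, attr_str, default_val=None):
    # An unset optional attribute is present but None, so None means "not set".
    value = getattr(obj, attr_str, None)
    return default_val if value is None else value


# Stand-in for a PyXB object with one set and one unset optional attribute.
policy_stub = SimpleNamespace(replicationAllowed=True, numberReplicas=None)

allowed = get_opt_attr_sketch(policy_stub, "replicationAllowed", False)
replicas = get_opt_attr_sketch(policy_stub, "numberReplicas", 3)
```

Here `allowed` is taken from the stub, while `replicas` falls back to the default of 3.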
-
d1_common.xml.
get_opt_val
(obj_pyxb, attr_str, default_val=None)¶ Get an optional Simple Content value from a PyXB element.
The attributes for elements that are optional according to the schema and not set in the PyXB object are present and set to None.
PyXB validation will fail if required elements are missing.
- Parameters
obj_pyxb – PyXB object
attr_str – str Name of an attribute that the PyXB object may contain.
default_val – any object Value to return if the attribute is not present.
- Returns
Value of the attribute if present, else default_val.
- Return type
str
-
d1_common.xml.
get_req_val
(obj_pyxb)¶ Get a required Simple Content value from a PyXB element.
The attributes for elements that are required according to the schema are always present, and provide a value() method.
PyXB validation will fail if required elements are missing.
Getting a Simple Content value from PyXB with .value() returns a PyXB object that lazily evaluates to a native Unicode string. This confused parts of the Django ORM that check types before passing values to the database. This function forces immediate conversion to Unicode.
- Parameters
obj_pyxb – PyXB object
- Returns
Value of the element.
- Return type
str
-
exception
d1_common.xml.
CompareError
¶ Bases:
Exception
Raised when objects are compared and found not to be semantically equivalent.