d1_common.utils package

DataONE Common Library.

Although this directory is not a package, this __init__.py file is required for pytest to be able to reach test directories below this directory.

Submodules

d1_common.utils.filesystem module

Utilities for filesystem paths and operations.

d1_common.utils.filesystem.gen_safe_path(*path_list)

Escape characters that are not allowed or often cause issues when used in file- or directory names, then join the arguments to a filesystem path.

Parameters

positional args – str Strings to use as elements in a filesystem path, such as PID, SID or URL.

Returns

A path safe for use as a file- or directory name.

Return type

str

d1_common.utils.filesystem.gen_safe_path_element(s)

Escape characters that are not allowed or often cause issues when used in file- or directory names.

Leading and trailing “.” (period) are stripped out in order to prevent inadvertently creating hidden files.

Parameters

s – str Any string, such as a PID, SID or URL

Returns

A string safe for use as a file- or directory name.

Return type

str
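The escaping described for gen_safe_path() and gen_safe_path_element() can be approximated with percent-encoding. The following is a minimal sketch, not the library's code: the names safe_path_element and safe_path, the use of urllib.parse.quote, and the SAFE_CHARS set are all assumptions made for illustration.

```python
import os
import urllib.parse

# Assumption: characters left unescaped. The library's actual set may differ.
SAFE_CHARS = " @$,~*&"


def safe_path_element(s):
    """Percent-escape s, then strip leading and trailing periods."""
    # "/" and ":" are not in SAFE_CHARS, so they are escaped.
    return urllib.parse.quote(s, safe=SAFE_CHARS).strip(".")


def safe_path(*path_list):
    """Escape each element, then join the results into one path."""
    return os.path.join(*[safe_path_element(p) for p in path_list])
```

With this sketch, a URL such as "https://example.org/x" becomes the single path element "https%3A%2F%2Fexample.org%2Fx", and ".hidden." becomes "hidden".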

d1_common.utils.filesystem.create_missing_directories_for_file(file_path)

Create any directories in file_path that do not yet exist.

Parameters

file_path – str Relative or absolute path to a file that may or may not exist.

Must be a file path, as any directory element at the end of the path will not be created.

See also

create_missing_directories_for_dir()

d1_common.utils.filesystem.create_missing_directories_for_dir(dir_path)

Create any directories in dir_path that do not yet exist.

Parameters

dir_path – str Relative or absolute path to a directory that may or may not exist.

Must be a directory path, as any filename element at the end of the path will also be created as a directory.

See also

create_missing_directories_for_file()
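The difference between the two functions is whether the last path element is treated as a filename or as a directory. The sketch below illustrates that distinction with the standard library only. The helper names make_dirs_for_dir and make_dirs_for_file are hypothetical stand-ins, and the use of os.makedirs is an assumption about how such helpers could be written.

```python
import os
import tempfile


def make_dirs_for_dir(dir_path):
    """Create dir_path and any missing parents; every element becomes a directory."""
    os.makedirs(dir_path, exist_ok=True)


def make_dirs_for_file(file_path):
    """Create the missing parent directories of file_path, but not the last element."""
    parent = os.path.dirname(file_path)
    if parent:
        make_dirs_for_dir(parent)


base = tempfile.mkdtemp()

# Treated as a file path: only "a/b" is created.
file_path = os.path.join(base, "a", "b", "data.xml")
make_dirs_for_file(file_path)

# Treated as a directory path: "data.xml" itself is created as a directory.
dir_path = os.path.join(base, "c", "data.xml")
make_dirs_for_dir(dir_path)
```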

d1_common.utils.filesystem.abs_path_from_base(base_path, rel_path)

Join a base and a relative path and return an absolute path to the resulting location.

Parameters
  • base_path – str Relative or absolute path to prepend to rel_path.

  • rel_path – str Path relative to the location of the module file from which this function is called.

Returns

Absolute path to the location specified by rel_path.

Return type

str

d1_common.utils.filesystem.abs_path(rel_path)

Convert a path that is relative to the module from which this function is called, to an absolute path.

Parameters

rel_path – str Path relative to the location of the module file from which this function is called.

Returns

Absolute path to the location specified by rel_path.

Return type

str
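Resolving a path relative to the calling module requires looking one frame up the call stack. The following sketch shows one way this could be done. The function names are hypothetical, and the use of sys._getframe (a CPython implementation detail) is an assumption, not a statement about how the library does it.

```python
import os
import sys


def abs_path_sketch(rel_path):
    """Resolve rel_path against the directory of the file that called this function."""
    # Frame 0 is this function; frame 1 is the caller.
    caller_file = sys._getframe(1).f_code.co_filename
    return os.path.abspath(os.path.join(os.path.dirname(caller_file), rel_path))


def abs_path_from_base_sketch(base_path, rel_path):
    """Same as above, but with base_path inserted before rel_path."""
    caller_file = sys._getframe(1).f_code.co_filename
    return os.path.abspath(
        os.path.join(os.path.dirname(caller_file), base_path, rel_path)
    )
```

Because the result depends on the caller's location, the same rel_path yields different absolute paths when called from modules in different directories.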

d1_common.utils.progress_logger module

One-stop shop for providing progress information and event counts during time-consuming operations performed in command-line scripts and Django management commands.

The ProgressLogger keeps track of how many tasks have been processed by a script, how many are remaining, and how much time has been used. It then calculates and periodically displays a progress update containing an ETA and completed percentage.

The ProgressLogger can also be used for counting errors and other notable events that may occur during processing, and displays the total count for each type of tracked event in the progress updates.

In the following example, progress information is added to a script that processes the tasks in a list of tasks. All the tasks require the same processing, so there’s only one task type, and one loop in the script.

    import logging

    import d1_common.utils.progress_logger


    def main():
        logging.basicConfig(level=logging.DEBUG)

        progress_logger = d1_common.utils.progress_logger.ProgressLogger()

        # get_long_task_list() and do_time_consuming_work_on_task() are
        # placeholders for the script's own functions.
        long_task_list = get_long_task_list()

        progress_logger.start_task_type(
            "My time consuming task", len(long_task_list)
        )

        for task in long_task_list:
            progress_logger.start_task("My time consuming task")
            do_time_consuming_work_on_task(task)
            if task.has_some_issue():
                progress_logger.event("Task has issue")
            if task.has_other_issue():
                progress_logger.event("Task has other issue")

        progress_logger.end_task_type("My time consuming task")

        progress_logger.completed()

Yields progress output such as:

    My time consuming task: 64/1027 (6.23% 0d00h00m)
    My time consuming task: 123/1027 (11.98% 0d00h00m)
    My time consuming task: 180/1027 (17.53% 0d00h00m)
    Events:
      Task has issue: 1
    My time consuming task: 236/1027 (22.98% 0d00h00m)
    Events:
      Task has issue: 2
      Task has other issue: 1
    My time consuming task: 436/1027 (32.98% 0d00h00m)
    Events:
      Task has issue: 2
      Task has other issue: 1
    My time consuming task: 636/1027 (44.12% 0d00h00m)
    Events:
      Task has issue: 2
      Task has other issue: 1
    Completed. runtime_sec=5.44 total_run_dhm="0d00h00m"
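The percentage and time figures in a progress line can be derived from the number of completed tasks, the total, and the elapsed time. The sketch below shows one plausible calculation. The function names, the linear ETA estimate, and the exact formatting are assumptions chosen to resemble the sample output, not the library's implementation.

```python
def format_dhm(total_sec):
    """Format a duration in seconds as days, hours and minutes, e.g. 1d01h01m."""
    minutes = int(total_sec) // 60
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    return "{}d{:02d}h{:02d}m".format(days, hours, mins)


def progress_line(task_type, done, total, elapsed_sec):
    """Build a progress line with percent complete and a linear ETA estimate."""
    percent = 100.0 * done / total
    # Assume the remaining tasks take as long, on average, as the completed ones.
    eta_sec = elapsed_sec / done * (total - done) if done else 0.0
    return "{}: {}/{} ({:.2f}% {})".format(
        task_type, done, total, percent, format_dhm(eta_sec)
    )
```

For example, 64 of 1027 tasks completed after 0.4 seconds gives 6.23% and an ETA of about 6 seconds, which rounds down to 0d00h00m.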

class d1_common.utils.progress_logger.ProgressLogger(logger=None, log_level=20, log_interval_sec=1.0)

Bases: object

__init__(logger=None, log_level=20, log_interval_sec=1.0)

Create one object of this class at the start of the script and keep a reference to it while the script is running.

Parameters
  • logger – Optional logger to which the progress log entries are written. A new logger is created if not provided.

  • log_level – The level of severity to set for the progress log entries. The default, 20, is the value of logging.INFO.

  • log_interval_sec – Minimum time between writing log entries. Log entries may be written with less time between entries if the total processing time for a task type is less than the interval, or if processing multiple task types concurrently.
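The interaction between log_level and log_interval_sec can be pictured as a throttle in front of a standard logger. The following is a simplified sketch under stated assumptions: the class name IntervalLogger, the maybe_log method, and the injectable clock parameter are all hypothetical and exist only to make the behavior easy to follow and test.

```python
import logging
import time


class IntervalLogger:
    """Write a log entry only if log_interval_sec has passed since the last one."""

    def __init__(self, logger=None, log_level=logging.INFO,
                 log_interval_sec=1.0, clock=time.monotonic):
        # Fall back to a new logger if none was provided.
        self._logger = logger or logging.getLogger(__name__)
        self._log_level = log_level
        self._interval = log_interval_sec
        self._clock = clock
        self._last = None

    def maybe_log(self, msg):
        """Return True if msg was passed to the logger, False if it was throttled."""
        now = self._clock()
        if self._last is not None and now - self._last < self._interval:
            return False
        self._logger.log(self._log_level, msg)
        self._last = now
        return True


# Simulated timestamps: the second call arrives too soon and is skipped.
ticks = iter([0.0, 0.5, 1.2])
throttled = IntervalLogger(log_interval_sec=1.0, clock=lambda: next(ticks))
results = [throttled.maybe_log("progress") for _ in range(3)]
```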

start_task_type(task_type_str, total_task_count)

Call when about to start processing a new type of task, typically just before entering a loop that processes many tasks of the given type.

Parameters
  • task_type_str (str) – The name of the task, used as a dict key and printed in the progress updates.

  • total_task_count (int) – The total number of the new type of task that will be processed.

This starts the timer that is used for providing an ETA for completing all tasks of the given type.

The task type is included in progress updates until end_task_type() is called.

end_task_type(task_type_str)

Call when processing of all tasks of the given type is completed, typically just after exiting a loop that processes many tasks of the given type.

Progress messages logged at intervals will typically not include the final entry which shows that processing is 100% complete, so a final progress message is logged here.

start_task(task_type_str, current_task_index=None)

Call when processing is about to start on a single task of the given task type, typically at the top of the body of the loop that processes the tasks.

Parameters
  • task_type_str (str) – The name of the task, used as a dict key and printed in the progress updates.

  • current_task_index (int) – If the task processing loop may skip or repeat tasks, the index of the current task must be provided here. This parameter can normally be left unset.
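The role of current_task_index is easiest to see in a loop that skips some tasks. The sketch below models the bookkeeping with a hypothetical TaskCounter class. Whether the library counts from 0 or 1, and how it stores the position, are assumptions made for illustration.

```python
class TaskCounter:
    """Track the position within one task type (hypothetical helper)."""

    def __init__(self, total_task_count):
        self.total = total_task_count
        self.index = 0

    def start_task(self, current_task_index=None):
        if current_task_index is None:
            # Normal case: each call advances the position by one.
            self.index += 1
        else:
            # The loop skips or repeats tasks, so the caller supplies the position.
            self.index = current_task_index
        return self.index


counter = TaskCounter(5)
positions = []
for i in range(5):
    if i % 2:
        continue  # Skipped tasks never reach start_task().
    positions.append(counter.start_task(current_task_index=i + 1))
```

Without the explicit index, the three processed tasks would be reported as positions 1, 2 and 3 of 5, and the progress would never appear to reach the end of the list.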

event(event_name, count_int=1)

Register an event that occurred during processing of a task of the given type.

Parameters

event_name (str) – A name for a type of event. Events of the same type are displayed as a single entry with a total count of occurrences.
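Event counting of this kind amounts to keeping one running total per event name. The following minimal sketch uses collections.Counter. The module-level event_counts variable is hypothetical, and treating count_int as an amount added to the running total is an assumption based on the parameter name and its default of 1.

```python
import collections

# One running total per event name.
event_counts = collections.Counter()


def event(event_name, count_int=1):
    """Add count_int occurrences of event_name to the running totals."""
    event_counts[event_name] += count_int


event("Task has issue")
event("Task has issue")
event("Task has other issue", 3)
```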

completed()

Call when about to exit the script.

Logs total runtime for the script and issues a warning if there are still active task types. Active task types should be closed with end_task_type() when processing is completed for tasks of the given type in order for accurate progress messages to be displayed.