d1_common.iter package

This package contains iterators that provide a convenient way to retrieve and iterate over Node contents.

Although this directory is not intended to be used as a regular package, its __init__.py file is required so that pytest can reach the test directories below it.

Submodules

d1_common.iter.bytes module

Generator that returns a bytes object in chunks.

class d1_common.iter.bytes.BytesIterator(bytes_, chunk_size=1024)

Bases: object

Generator that returns a bytes object in chunks.

property size

Returns:

int: The total number of bytes that will be returned by the iterator.
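The chunking behavior described above can be pictured with the following minimal sketch. ChunkedBytes is a hypothetical name and this is not the d1_common source; it assumes that chunks are produced by plain slicing and that size is simply the length of the wrapped bytes object.

```python
class ChunkedBytes:
    """Hypothetical sketch of a BytesIterator-like class (not the d1_common code)."""

    def __init__(self, bytes_, chunk_size=1024):
        self._bytes = bytes_
        self._chunk_size = chunk_size

    def __iter__(self):
        # Slice the buffer into consecutive chunks of at most chunk_size bytes
        for start in range(0, len(self._bytes), self._chunk_size):
            yield self._bytes[start:start + self._chunk_size]

    @property
    def size(self):
        # Total number of bytes the iterator will return
        return len(self._bytes)


chunks = list(ChunkedBytes(b"abcdefg", chunk_size=3))
print(chunks)  # [b'abc', b'def', b'g']
```

Only the last chunk is shorter than chunk_size, and an empty bytes object yields no chunks at all.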

d1_common.iter.path module

Generator that resolves a list of file and dir paths and returns file paths with optional filtering and client feedback.

d1_common.iter.path.path_generator(path_list, include_glob_list=None, exclude_glob_list=None, recursive=True, ignore_invalid=False, default_excludes=True, return_entered_dir_paths=False, return_skipped_dir_paths=False)


Parameters
  • path_list – list of str

    List of file and dir paths. File paths are used directly, and dirs are searched for files.

    path_list does not accept glob patterns, as it’s more convenient to let the shell expand glob patterns into directly specified files and dirs. E.g., to use a glob to select all .py files in a subdir, the command may be called with sub/dir/*.py, which the shell expands to a list of files that are then passed to this function. The paths should be Unicode or UTF-8 encoded strings. Tilde (“~”) expansion to the home directory is performed on the paths.

    The shell can also expand glob patterns to dir paths or a mix of file and dir paths.

  • include_glob_list – list of str

  • exclude_glob_list – list of str

    Patterns ending with “/” are matched only against dir names. All other patterns are matched only against file names.

    If the include list contains any file patterns, files must match one or more of the patterns in order to be returned.

    If the include list contains any dir patterns, dirs must match one or more of the patterns in order for the recursive search to descend into them.

    The exclude list works the same way, except that matching files and dirs are excluded instead of included. If both include and exclude lists are specified, files and dirs must match the include patterns and must not match the exclude patterns in order to be returned or descended into.

  • recursive – bool

    • True (default): Search subdirectories

    • False: Do not search subdirectories

  • ignore_invalid – bool

    • True: Invalid paths in path_list are ignored.

    • False (default): EnvironmentError is raised if any of the paths in path_list do not reference an existing file or dir.

  • default_excludes – bool

    • True: A list of glob patterns for files and dirs that should typically be ignored is added to any exclude patterns passed to the function. These include dirs such as .git and backup files such as those ending in “~”.

    • False: No files or dirs are excluded by default.

  • return_entered_dir_paths – bool

    • False: Only file paths are returned.

    • True: Directory paths are also returned.

  • return_skipped_dir_paths – bool

    • False: Paths of skipped dirs are not returned.

    • True: Paths of skipped dirs are returned.

    The iterator never descends into excluded dirs and, by default, does not return their paths. However, the client may need the paths of the excluded dirs rather than the included ones, e.g., when looking for dirs to delete.
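The include and exclude rules above can be expressed for a single name with Python's fnmatch module. This is a hypothetical sketch (split_patterns and is_included are illustrative names) that assumes fnmatch-style matching; the actual matching code in d1_common.iter.path may differ. Since glob patterns are matched against names only, the functions take a bare file or dir name rather than a full path.

```python
import fnmatch


def split_patterns(glob_list):
    """Split a glob list into dir patterns (trailing "/") and file patterns."""
    glob_list = glob_list or []
    dir_patterns = [g.rstrip("/") for g in glob_list if g.endswith("/")]
    file_patterns = [g for g in glob_list if not g.endswith("/")]
    return dir_patterns, file_patterns


def is_included(name, include_patterns, exclude_patterns):
    """Return True if a bare file or dir name passes the include and exclude rules."""
    # With no include patterns, every name counts as included
    if include_patterns and not any(fnmatch.fnmatch(name, p) for p in include_patterns):
        return False
    # An excluded name is rejected even if it matched an include pattern
    return not any(fnmatch.fnmatch(name, p) for p in exclude_patterns)


inc_dirs, inc_files = split_patterns(["*.py"])
exc_dirs, exc_files = split_patterns([".git/", "*~"])

print(is_included("setup.py", inc_files, exc_files))  # True
print(is_included("notes.txt", inc_files, exc_files))  # False: no include match
print(is_included(".git", inc_dirs, exc_dirs))  # False: excluded dir
print(is_included("src", inc_dirs, exc_dirs))  # True: no dir include patterns
```

Keeping dir and file patterns in separate lists mirrors the rule that a pattern ending in “/” only ever applies to dir names.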

Returns

File path iterator

Notes

During iteration, the iterator can be prevented from descending into a directory by sending a “skip” flag when the iterator yields the directory path. This allows the client to decide whether a directory should be iterated based on, for instance, which files are present in it. This can be used in conjunction with the include and exclude glob lists. Note that, in order to receive directory paths that can be skipped, return_entered_dir_paths must be set to True.

The regular for...in syntax does not support sending the “skip” flag back to the iterator. Instead, use a pattern like:

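import d1_common.iter.path

itr = d1_common.iter.path.path_generator(..., return_entered_dir_paths=True)
try:
  path = next(itr)
  while True:
    # determine_if_dir_should_be_skipped() is a hypothetical, client-supplied function
    skip_dir = determine_if_dir_should_be_skipped(path)
    # send() delivers the flag to the generator and returns the next path
    path = itr.send(skip_dir)
except StopIteration:
  pass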
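To see the skip protocol in isolation, the following self-contained sketch uses a hypothetical walk_tree() generator over an in-memory tree instead of the file system. It assumes, as described above, that a dir path is yielded before the generator descends into it and that a truthy value sent back at that point skips the dir; values sent while a file path is current are ignored.

```python
def walk_tree(tree, prefix=""):
    """Hypothetical generator over a nested dict: dict values are dirs, None values are files.

    Each dir path (with a trailing "/") is yielded before descending into it;
    sending a truthy value in response skips that dir.
    """
    for name, child in tree.items():
        path = prefix + name
        if isinstance(child, dict):
            skip = yield path + "/"
            if not skip:
                # yield from also forwards send() values to the nested generator
                yield from walk_tree(child, path + "/")
        else:
            yield path


def collect(tree, should_skip):
    """Drive walk_tree() with next()/send(), the same pattern as the loop above."""
    result = []
    itr = walk_tree(tree)
    try:
        path = next(itr)
        while True:
            result.append(path)
            path = itr.send(should_skip(path))
    except StopIteration:
        pass
    return result


tree = {"a.py": None, "build": {"x.o": None}, "src": {"b.py": None}}
paths = collect(tree, lambda p: p == "build/")
print(paths)  # ['a.py', 'build/', 'src/', 'src/b.py']
```

The contents of build/ never appear because the skip flag was sent back while build/ was the current path.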

Glob patterns are matched only against file and directory names, not the full paths.

Paths passed directly in path_list are not filtered.

The same file can be returned multiple times if path_list contains duplicated file paths or dir paths, or dir paths that implicitly include the same subdirs.

include_glob_list and exclude_glob_list are handy for filtering the files found in dir searches.

Remember to escape the include and exclude glob patterns on the command line so that they’re not expanded by the shell.
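Because the same file can be returned more than once, a client that needs each file only once can filter the results. The sketch below is a hypothetical helper, not part of d1_common.iter.path; it assumes that paths resolving to the same location via os.path.realpath() are duplicates. It is meant for plain for...in iteration, since wrapping the generator hides its send() method.

```python
import os


def unique_paths(paths):
    """Yield each path once, treating paths that resolve to the same location as duplicates."""
    seen = set()
    for path in paths:
        key = os.path.realpath(path)
        if key not in seen:
            seen.add(key)
            yield path


print(list(unique_paths(["a/b.py", "a/./b.py", "c.py", "a/b.py"])))
# ['a/b.py', 'c.py']
```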

class d1_common.iter.path.ArgParser(description_str=None, formatter_class=<class 'argparse.RawDescriptionHelpFormatter'>, **val_dict)

Bases: object

An argparse.ArgumentParser populated with a standard set of command line arguments for controlling the path generator from the command line.

The script that creates the parser will typically add its own specific arguments by making additional parser.add_argument() calls.

To create the path generator, pass parser.path_arg_dict to path_generator() using argument unpacking: path_generator(**parser.path_arg_dict).

Example

import d1_common.iter.path

parser = d1_common.iter.path.ArgParser(
  __doc__,
  # Set non-configurable values
  fixed_include_glob_list=['*.py'],
  fixed_return_entered_dir_paths=True,
)
# Add command specific arguments
parser.add_argument(...)
# Create the path_generator and iterate over the resulting paths
for p in d1_common.iter.path.path_generator(**parser.path_arg_dict):
  print(p)

ARG_DICT = {
  'default_excludes': ('--no-default-excludes', {'action': 'store_false', 'help': "Don't add default glob exclude patterns"}),
  'exclude_glob_list': ('--exclude', {
    'default': ['*egg-info/', '.git/', '.idea/', '__pycache__/', '.eggs/', '.pytest_cache/', 'build/', 'dist/', 'doc/', 'generated/', 'migrations/', '*~', '*.bak', '*.tmp', '*.pyc'],
    'nargs': '+', 'metavar': 'glob', 'help': 'Exclude glob patterns'}),
  'ignore_invalid': ('--ignore-invalid', {'action': 'store_true', 'help': 'Ignore invalid paths'}),
  'include_glob_list': ('--include', {'nargs': '+', 'metavar': 'glob', 'help': 'Include glob patterns'}),
  'path_list': ('path', {'nargs': '+', 'help': 'File or directory path'}),
  'recursive': ('--no-recursive', {'action': 'store_false', 'help': 'Do not search directories recursively'}),
  'return_entered_dir_paths': ('--return-entered-dir-paths', {'action': 'store_true', 'help': 'Return the paths of dirs that the generator enters'}),
  'return_skipped_dir_paths': ('--return-skipped-dir-paths', {'action': 'store_true', 'help': 'Return the paths of dirs that the generator skips'}),
}

__init__(description_str=None, formatter_class=<class 'argparse.RawDescriptionHelpFormatter'>, **val_dict)

Create an ArgumentParser populated with a standard set of command line arguments for controlling the path generator from the command line.

Parameters
  • description_str – Description of the command. The description is included in the automatically generated help message.

  • formatter_class – Modify the help message format. See the argparse module for available Formatter classes.

  • fixed value overrides – Passing any of these arguments causes the provided value to be used when instantiating the path generator, and causes the corresponding command line argument to become hidden and unavailable.

    fixed_path_list, fixed_exclude_glob_list, fixed_include_glob_list, fixed_recursive, fixed_ignore_invalid, fixed_default_excludes, fixed_return_entered_dir_paths, fixed_return_skipped_dir_paths

  • default value overrides – Passing any of these arguments causes the provided value to be used as the default. The corresponding command line argument is still available and can be used to override the default value.

    default_path_list, default_exclude_glob_list, default_include_glob_list, default_recursive, default_ignore_invalid, default_default_excludes, default_return_entered_dir_paths, default_return_skipped_dir_paths

add_argument(*arg_list, **arg_dict)

Add command specific arguments.

property args

Get the complete command line arguments as a Namespace object.

Returns

Complete command line arguments.

This is an exact representation of the parsed command line and does not include any fixed value substitutions from the val_dict passed to __init__().

Return type

Namespace

property path_arg_dict

Get the command line arguments as a dict suitable for passing to path_generator() via argument unpacking.

Returns

Arguments valid for passing to path_generator().

The dict will include any fixed value substitutions that were passed to __init__() via the val_dict.

Return type

dict
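The fixed_ and default_ override rules and the path_arg_dict property can be approximated with plain argparse. The sketch below is hypothetical: SketchArgParser, ARG_SPEC and parse() are illustrative names, only three of the ARG_DICT entries are included, and the real ArgParser internals may differ.

```python
import argparse

# Hypothetical subset of an ARG_DICT-style table: key -> (flag or positional name, add_argument kwargs)
ARG_SPEC = {
    "path_list": ("path", {"nargs": "+", "help": "File or directory path"}),
    "recursive": ("--no-recursive", {"action": "store_false", "help": "Do not search directories recursively"}),
    "include_glob_list": ("--include", {"nargs": "+", "metavar": "glob", "help": "Include glob patterns"}),
}


class SketchArgParser:
    """Hypothetical stand-in for ArgParser; not the d1_common implementation."""

    def __init__(self, description_str=None, **val_dict):
        self._val_dict = val_dict
        self._args = None
        self._parser = argparse.ArgumentParser(description=description_str)
        for key, (name, kwargs) in ARG_SPEC.items():
            if "fixed_" + key in val_dict:
                continue  # A fixed value hides the command line argument
            kwargs = dict(kwargs)
            if "default_" + key in val_dict:
                kwargs["default"] = val_dict["default_" + key]
            if name.startswith("-"):
                self._parser.add_argument(name, dest=key, **kwargs)
            else:
                # Positionals take their dest from the first argument; show the name in help
                self._parser.add_argument(key, metavar=name, **kwargs)

    def add_argument(self, *arg_list, **arg_dict):
        """Add command specific arguments."""
        self._parser.add_argument(*arg_list, **arg_dict)

    def parse(self, argv=None):
        """Parse argv (sys.argv[1:] if None); must be called before path_arg_dict."""
        self._args = self._parser.parse_args(argv)
        return self._args

    @property
    def path_arg_dict(self):
        """Parsed path generator arguments with the fixed values substituted in."""
        arg_dict = {k: v for k, v in vars(self._args).items() if k in ARG_SPEC}
        for key in ARG_SPEC:
            if "fixed_" + key in self._val_dict:
                arg_dict[key] = self._val_dict["fixed_" + key]
        return arg_dict


parser = SketchArgParser("Demo", fixed_include_glob_list=["*.py"], default_recursive=True)
parser.parse(["src", "tests", "--no-recursive"])
print(parser.path_arg_dict)
# {'path_list': ['src', 'tests'], 'recursive': False, 'include_glob_list': ['*.py']}
```

Filtering vars() by the known keys keeps any command specific arguments added via add_argument() out of the dict that is unpacked into the generator.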

d1_common.iter.stream module

class d1_common.iter.stream.StreamIterator(stream, chunk_size=1024)

Bases: object

Generator that returns a stream in chunks.

In this context, a stream is anything with a read() method and, if the client requires it, a way to determine the total number of elements that will be returned by the read() method at any point during iteration.

Typical sources for streams are files and HTTP responses.

__init__(stream, chunk_size=1024)

Args:

stream: Object with a read() method, such as an open file.

chunk_size: int. Maximum number of elements to return in each chunk. The last chunk will normally be smaller. Other chunks may be smaller as well, but never empty.

property size

Returns:

int: The total number of bytes that will be returned by the iterator.
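A minimal sketch of the read() loop and of one way to compute the size of a seekable stream such as an open file or io.BytesIO. iter_stream_chunks() and remaining_size() are hypothetical helpers, not the StreamIterator code; the seek()/tell() approach assumes the stream supports random access, which not every stream (for example, a network response) does.

```python
import io


def iter_stream_chunks(stream, chunk_size=1024):
    """Yield successive read() results until the stream is exhausted; never yields an empty chunk."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def remaining_size(stream):
    """Number of elements left to read from a seekable stream; the position is restored."""
    pos = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(pos)
    return end - pos


stream = io.BytesIO(b"0123456789")
print(remaining_size(stream))  # 10
print(list(iter_stream_chunks(stream, chunk_size=4)))  # [b'0123', b'4567', b'89']
print(remaining_size(stream))  # 0
```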

d1_common.iter.string module

Generator that returns the Unicode characters of a str in chunks.

class d1_common.iter.string.StringIterator(string, chunk_size=1024)

Bases: object

Generator that returns the Unicode characters of a str in chunks.

property size

Returns:

int: The total number of characters that will be returned by the iterator.
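Since StringIterator chunks by Unicode characters rather than bytes, chunk lengths and size count code points, which can differ from the encoded byte length. The following slicing sketch illustrates this; iter_str_chunks() is a hypothetical name, not the d1_common implementation, and it assumes size corresponds to len() of the string.

```python
def iter_str_chunks(string, chunk_size=1024):
    """Yield consecutive slices of at most chunk_size characters (code points)."""
    for start in range(0, len(string), chunk_size):
        yield string[start:start + chunk_size]


text = "na\u00efve"  # "naïve"
print(list(iter_str_chunks(text, chunk_size=2)))  # ['na', 'ïv', 'e']
print(len(text))  # 5 characters
print(len(text.encode("utf-8")))  # 6 bytes, since "ï" takes two bytes in UTF-8
```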