The Filesystem Implementation

The filesystem implementation implements the API in its entirety, using the json and pathlib standard-library modules to store data on the local filesystem.

Examples

Several example notebooks are included to illustrate the basics of using the HARDS filesystem implementation. The final notebook walks through the complete HARDS workflow for managing experimental data.

Code Documentation

A filesystem implementation of the abstract hierarchical data management API.

Generally, the only object the user should instantiate directly is the FilesystemDatabase. Datasets and datapoints should be created by calling methods on the database (and subsequently on its datasets), which handle the other object instantiations.

class hards.filesystem.FilesystemDatabase(location: Path)

Bases: _FilesystemHasChildrenMixin, AbstractDatabase

Database represented by a directory on the filesystem.

property children: list[str]

The names of the object’s current children (datasets).

classmethod create_database(location: Path) FilesystemDatabase

Create a database at the given location in the filesystem.

Parameters:

location (Path) – The location to create the directory that will represent the database.

Returns:

The new database object.

Return type:

FilesystemDatabase

create_dataset(name: str) FilesystemDataset

Create a dataset as a child of this object.

Parameters:

name (str) – The name of the new dataset.

Returns:

The new dataset object.

Return type:

FilesystemDataset

Raises:
  • AlreadyExistsError – If a dataset with the same name already exists.

  • InvalidNameError – If the dataset name contains invalid characters.

database() AbstractDatabase

Return the database (this object).

fullname() str

Return full name of the object.

This is the name that, when calling a recursive get method on the database would return a new instantiation of this object.

get_dataset(name: str) FilesystemDataset

Return a dataset that is a child of this object.

Parameters:

name (str) – The name of the dataset to get

Returns:

The dataset with the given name.

Return type:

FilesystemDataset

Raises:

DoesNotExistError – If a dataset with the given name does not exist.

has_dataset(name: str) bool

Indicate if the object has a child dataset with the given name.

Parameters:

name (str) – The name of the dataset to check exists.

Returns:

True if dataset exists, else False.

Return type:

bool

property is_database: bool

True, because this object represents the database.

property name: str

The object’s name.

property parent: _TreeNode | None

The object’s parent.

path_to_database() list[str]

Return the names of this object and the intermediates to the database.

recursively_get_datapoint(name: str) AbstractDatapoint

Recursively follow a tree of datasets and return a datapoint.

Parameters:

name (str) – The name of the datasets to follow and the datapoint to return of the form <intermediate dataset>/<intermediate dataset>/<…>/<datapoint>.

Returns:

The datapoint object.

Return type:

AbstractDatapoint

Raises:

DoesNotExistError – If any of the intermediate datasets or the datapoint does not exist.
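
The name format described above can be illustrated with a small, standalone sketch. It is not the HARDS implementation: split_recursive_name is a hypothetical helper, and dropping empty segments (for example from a trailing slash) is an assumption, since the documentation does not say how such names are treated.

```python
def split_recursive_name(name: str) -> tuple[list[str], str]:
    """Split '<dataset>/<dataset>/<...>/<datapoint>' into (datasets, datapoint).

    Hypothetical helper for illustration only; HARDS may parse names differently.
    """
    # Assumption: empty segments (leading, trailing or doubled slashes) are ignored.
    parts = [part for part in name.split("/") if part]
    if not parts:
        raise ValueError("name must contain at least a datapoint name")
    *datasets, datapoint = parts
    return datasets, datapoint


datasets, datapoint = split_recursive_name("campaign/run_1/shot_42")
print(datasets, datapoint)
```

Each entry in datasets would be followed in order from the object the method is called on, and the final segment names the datapoint to return.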

recursively_get_dataset(name: str) AbstractDataset

Recursively follow a tree of datasets and return the final dataset.

Parameters:

name (str) – The name of the datasets to follow and the dataset to return of the form <intermediate dataset>/<intermediate dataset>/<…>/<dataset of interest>

Returns:

The dataset object.

Return type:

AbstractDataset

Raises:

DoesNotExistError – If any of the intermediate datasets do not exist.

class hards.filesystem.FilesystemDatapoint(location: Path, parent: FilesystemDataset)

Bases: _FilesystemHasFilesMixin, _FilesystemHasDataMixin, AbstractDatapoint

Datapoint represented by a directory on the filesystem.

add_data(new_data: dict[str, Any]) None

Add new data to the object.

Adding new data does not remove old data unless a key already exists, in which case the old data of that key is overwritten by the newer data.

Parameters:

new_data (dict[str, Any]) – New data, in key-value form, to be added to this object’s data store.
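
The merge behaviour described for add_data (existing keys are overwritten, all other keys are kept) matches a dictionary update. The following is a minimal sketch of that behaviour using json and pathlib. It is not the library's code: merge_data and the data.json file name are hypothetical, and how HARDS lays out its data on disk is not stated here.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def merge_data(store: Path, new_data: dict[str, Any]) -> dict[str, Any]:
    """Merge new_data into the JSON file at store, overwriting existing keys."""
    current = json.loads(store.read_text()) if store.exists() else {}
    current.update(new_data)  # keys in new_data win; all other keys are kept
    store.write_text(json.dumps(current, indent=2))
    return current


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "data.json"  # hypothetical file name
    merge_data(store, {"temperature": 290, "operator": "A"})
    merged = merge_data(store, {"temperature": 300})
    on_disk = json.loads(store.read_text())

print(merged)
```

After the second call, "temperature" holds the newer value while "operator" is untouched.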

add_file(file: Path, *, name: str | None = None, permissions: int = 256) None

Add a file to be managed by this object.

Files are copied into the database and should be treated therein as read-only. They can be ‘modified’ by re-adding an updated file with the same name.

Parameters:
  • file (Path) – A path to the file to add to this object.

  • name (str | None, optional) – Give the file a new name (include the extension).

Raises:
  • DoesNotExistError – If the file does not exist or is not a file (e.g. it is a directory).

  • InvalidNameError – If the filename contains invalid characters. The filename may not be checked if a name is not explicitly provided.
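
The copy-and-treat-as-read-only behaviour can be sketched with the standard library. The default permissions value of 256 equals 0o400 in octal, which on POSIX systems means read-only for the owner. Whether HARDS applies the value with chmod is an assumption of this sketch, and copy_read_only is a hypothetical helper, not part of the library.

```python
import shutil
import stat
import tempfile
from pathlib import Path
from typing import Optional


def copy_read_only(
    file: Path,
    directory: Path,
    *,
    name: Optional[str] = None,
    permissions: int = 0o400,  # 0o400 == 256: read-only for the owner
) -> Path:
    """Copy file into directory and restrict the copy's permissions."""
    if not file.is_file():
        raise FileNotFoundError(f"{file} does not exist or is not a file")
    destination = directory / (name if name is not None else file.name)
    if destination.exists():
        # An earlier read-only copy must be made writable before it is replaced.
        destination.chmod(stat.S_IRUSR | stat.S_IWUSR)
    shutil.copyfile(file, destination)
    destination.chmod(permissions)
    return destination


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "results.csv"
    managed = Path(tmp) / "managed"
    managed.mkdir()

    source.write_text("x,y\n1,2\n")
    copied = copy_read_only(source, managed, name="run_1.csv")

    # "Modify" the managed file by re-adding an updated source under the same name.
    source.write_text("x,y\n3,4\n")
    copied = copy_read_only(source, managed, name="run_1.csv")

    content = copied.read_text()
    owner_can_write = bool(stat.S_IMODE(copied.stat().st_mode) & stat.S_IWUSR)
```

Re-adding under the same name replaces the managed copy, which is the only way a file should change once it is in the database.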

database() _TreeNode

Recursively find the database.

property files: list[str]

The list of file names (including extensions).

fullname() str

Return full name of the object.

This is the name that, when calling a recursive get method on the database would return a new instantiation of this object.

get_file(name: str) Path

Return the path to the file with the given name.

Parameters:

name (str) – The name of the file (including its extension)

Returns:

The path to the file (read-only).

Return type:

Path

Raises:

DoesNotExistError – If a file with the given name does not exist.

has_file(name: str) bool

Return a bool indicating whether the file with the given name exists.

The name must include the file extension.

property is_database: bool

True if the object is the database (root node).

False by default.

property name: str

The name of the datapoint.

property parent: FilesystemDataset

The parent Dataset of this Datapoint.

path_to_database() list[str]

Return the names of this object and the intermediates to the database.

class hards.filesystem.FilesystemDataset(location: Path, parent: FilesystemDatabase | FilesystemDataset)

Bases: _FilesystemHasFilesMixin, _FilesystemHasDataMixin, _FilesystemHasChildrenMixin, AbstractDataset

Dataset represented by a directory on the filesystem.

add_data(new_data: dict[str, Any]) None

Add new data to the object.

Adding new data does not remove old data unless a key already exists, in which case the old data of that key is overwritten by the newer data.

Parameters:

new_data (dict[str, Any]) – New data, in key-value form, to be added to this object’s data store.

add_file(file: Path, *, name: str | None = None, permissions: int = 256) None

Add a file to be managed by this object.

Files are copied into the database and should be treated therein as read-only. They can be ‘modified’ by re-adding an updated file with the same name.

Parameters:
  • file (Path) – A path to the file to add to this object.

  • name (str | None, optional) – Give the file a new name (include the extension).

Raises:
  • DoesNotExistError – If the file does not exist or is not a file (e.g. it is a directory).

  • InvalidNameError – If the filename contains invalid characters. The filename may not be checked if a name is not explicitly provided.

property children: list[str]

The names of the object’s current children (datasets).

create_datapoint(name: str) FilesystemDatapoint

Create and return a datapoint with a given name.

Parameters:

name (str) – The name of the new datapoint.

Returns:

The new object.

Return type:

FilesystemDatapoint

Raises:
  • AlreadyExistsError – If a datapoint with the same name exists.

  • InvalidNameError – If the datapoint name contains invalid characters.

create_dataset(name: str) FilesystemDataset

Create a dataset as a child of this object.

Parameters:

name (str) – The name of the new dataset.

Returns:

The new dataset object.

Return type:

FilesystemDataset

Raises:
  • AlreadyExistsError – If a dataset with the same name already exists.

  • InvalidNameError – If the dataset name contains invalid characters.

database() _TreeNode

Recursively find the database.

property datapoints: list[str]

The names of the Dataset’s current datapoints.

property files: list[str]

The list of file names (including extensions).

fullname() str

Return full name of the object.

This is the name that, when calling a recursive get method on the database would return a new instantiation of this object.

get_datapoint(name: str) FilesystemDatapoint

Get the datapoint with a given name.

Parameters:

name (str) – The name of the datapoint.

Returns:

The datapoint object.

Return type:

FilesystemDatapoint

Raises:

DoesNotExistError – If the datapoint does not exist.

get_dataset(name: str) FilesystemDataset

Return a dataset that is a child of this object.

Parameters:

name (str) – The name of the dataset to get

Returns:

The dataset with the given name.

Return type:

FilesystemDataset

Raises:

DoesNotExistError – If a dataset with the given name does not exist.

get_file(name: str) Path

Return the path to the file with the given name.

Parameters:

name (str) – The name of the file (including its extension)

Returns:

The path to the file (read-only).

Return type:

Path

Raises:

DoesNotExistError – If a file with the given name does not exist.

has_datapoint(name: str) bool

Indicate if the object has a datapoint with the given name.

has_dataset(name: str) bool

Indicate if the object has a child dataset with the given name.

Parameters:

name (str) – The name of the dataset to check exists.

Returns:

True if dataset exists, else False.

Return type:

bool

has_file(name: str) bool

Return a bool indicating whether the file with the given name exists.

The name must include the file extension.

property is_database: bool

True if the object is the database (root node).

False by default.

property name: str

The object’s name.

property parent: FilesystemDatabase | FilesystemDataset

The parent of the Dataset.

path_to_database() list[str]

Return the names of this object and the intermediates to the database.

recursively_get_datapoint(name: str) AbstractDatapoint

Recursively follow a tree of datasets and return a datapoint.

Parameters:

name (str) – The name of the datasets to follow and the datapoint to return of the form <intermediate dataset>/<intermediate dataset>/<…>/<datapoint>.

Returns:

The datapoint object.

Return type:

AbstractDatapoint

Raises:

DoesNotExistError – If any of the intermediate datasets or the datapoint does not exist.

recursively_get_datapoints(*, reconstruct: bool = True, parents: bool = True) list[AbstractDatapoint]

Get all datapoints of this dataset and, if parents is True, of its parents.

Follows the parents until the database, collecting their datapoints.

Parameters:
  • reconstruct (bool) – Reconstruct the entire tree above this object and re-call this method on a new instance of this dataset. This mitigates the situation where this dataset’s parent has been modified in another instance. It makes this method safe in single-threaded synchronous applications, but gives no guarantees for parallel or asynchronous applications.

  • parents (bool) – If False, only the datapoints of this dataset are returned. That is, this method acts as a safe way to get the instantiated datapoints of this dataset only.

Notes

Reconstruction does not guarantee the safety of this method. See the relevant documentation sections for considerations.
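
The upward walk described above can be sketched on a toy in-memory tree. This is an illustration only: Node and collect_datapoints are hypothetical names, the order of the returned datapoints is an assumption, and the reconstruct behaviour is not modelled.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a database or dataset (not a HARDS class)."""

    name: str
    parent: Optional["Node"] = None
    datapoints: list[str] = field(default_factory=list)


def collect_datapoints(node: Node, *, parents: bool = True) -> list[str]:
    """Collect this node's datapoints, then those of each parent up to the root."""
    collected = list(node.datapoints)
    current = node.parent
    while parents and current is not None:
        collected.extend(current.datapoints)
        current = current.parent
    return collected


root = Node("database")  # the root holds no datapoints itself
campaign = Node("campaign", parent=root, datapoints=["calibration"])
run = Node("run_1", parent=campaign, datapoints=["shot_1", "shot_2"])

with_parents = collect_datapoints(run)
own_only = collect_datapoints(run, parents=False)
```

With parents left at True, the datapoints of run_1 are followed by those of campaign. With parents=False, only the datapoints of run_1 are returned.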

recursively_get_dataset(name: str) AbstractDataset

Recursively follow a tree of datasets and return the final dataset.

Parameters:

name (str) – The name of the datasets to follow and the dataset to return of the form <intermediate dataset>/<intermediate dataset>/<…>/<dataset of interest>

Returns:

The dataset object.

Return type:

AbstractDataset

Raises:

DoesNotExistError – If any of the intermediate datasets do not exist.