The Filesystem Implementation ¶

The ‘filesystem implementation’ implements The API in its entirety, using the json and pathlib standard libraries to store data on the local filesystem.

Examples ¶

Several example notebooks are included to illustrate the basics of using the HARDS filesystem implementation. The final notebook is a complete example of the HARDS workflow for managing experimental data.

Examples

Code Documentation ¶

A filesystem implementation of the abstract hierarchical data management API.

Generally, the only object that should be instantiated directly by the user is the FilesystemDatabase. Datasets and datapoints should be created by calling methods on the database (and subsequently on its datasets), which handle the instantiation of the other objects.

class hards.filesystem.FilesystemDatabase(location: Path)¶

Bases: _FilesystemHasChildrenMixin, AbstractDatabase

Database represented by a directory on the filesystem.

property children: list[str]¶: The names of the object’s current children (datasets).

classmethod create_database(location: Path) → FilesystemDatabase¶

Create a database at the given location in the filesystem.

Parameters:: location (Path) – The location to create the directory that will represent the database.
Returns:: The new database object.
Return type:: FilesystemDatabase

create_dataset(name: str) → FilesystemDataset¶

Create a dataset as a child of this object.

Parameters:

name (str) – The name of the new dataset.

Returns:

The new dataset object.

Return type:

FilesystemDataset

Raises:

AlreadyExistsError – If a dataset with the same name already exists.
InvalidNameError – If the dataset name contains invalid characters.
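The two error conditions above can be illustrated with a minimal sketch that uses only pathlib. It is not the library's implementation: the exception classes are local stand-ins for the documented ones, `create_child_directory` is a hypothetical helper, and the set of rejected characters is an assumption, since the actual rules are not listed on this page.

```python
import tempfile
from pathlib import Path


class AlreadyExistsError(Exception):
    """Local stand-in for the error named in the documentation."""


class InvalidNameError(Exception):
    """Local stand-in for the error named in the documentation."""


# Assumption: path separators are rejected. The real rule set is not documented here.
ASSUMED_INVALID_CHARACTERS = {"/", "\\"}


def create_child_directory(parent: Path, name: str) -> Path:
    """Create a directory for a child dataset, raising the documented error types."""
    if any(char in name for char in ASSUMED_INVALID_CHARACTERS):
        raise InvalidNameError(f"{name!r} contains an invalid character")

    child = parent / name
    if child.exists():
        raise AlreadyExistsError(f"{name!r} already exists in {parent}")

    child.mkdir()
    return child


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    created_is_dir = create_child_directory(root, "experiment_1").is_dir()

    try:
        create_child_directory(root, "experiment_1")
        duplicate_rejected = False
    except AlreadyExistsError:
        duplicate_rejected = True

    try:
        create_child_directory(root, "a/b")
        slash_rejected = False
    except InvalidNameError:
        slash_rejected = True
```

Rejecting the separator is a natural choice in this sketch because recursive names use `/` to delimit datasets, but whether the library applies the same rule is not confirmed here.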

database() → AbstractDatabase¶: Return the database (this object).

fullname() → str¶

Return full name of the object.

This is the name that, when passed to a recursive get method on the database, returns a new instantiation of this object.

get_dataset(name: str) → FilesystemDataset¶

Return a dataset that is a child of this object.

Parameters:: name (str) – The name of the dataset to get.
Returns:: The dataset with the given name.
Return type:: FilesystemDataset
Raises:: DoesNotExistError – If a dataset with the given name does not exist.

has_dataset(name: str) → bool¶

Indicate if the object has a child dataset with the given name.

Parameters:: name (str) – The name of the dataset to check exists.
Returns:: True if dataset exists, else False.
Return type:: bool
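Because create_dataset raises AlreadyExistsError for an existing name and get_dataset raises DoesNotExistError for a missing one, has_dataset is useful as a guard. The sketch below shows one possible "get or create" helper built only from the three documented methods. `get_or_create_dataset` and `_InMemoryParent` are hypothetical names; the stand-in class exists only so the example runs without the library installed.

```python
from typing import Any


def get_or_create_dataset(parent: Any, name: str) -> Any:
    """Return the child dataset called `name`, creating it if it is missing.

    `parent` is any object exposing the documented has_dataset, get_dataset
    and create_dataset methods (a database or a dataset).
    """
    if parent.has_dataset(name):
        return parent.get_dataset(name)
    return parent.create_dataset(name)


class _InMemoryParent:
    """Hypothetical stand-in so the sketch runs without the library installed."""

    def __init__(self) -> None:
        self._children = {}

    def has_dataset(self, name: str) -> bool:
        return name in self._children

    def get_dataset(self, name: str) -> "_InMemoryParent":
        return self._children[name]

    def create_dataset(self, name: str) -> "_InMemoryParent":
        self._children[name] = _InMemoryParent()
        return self._children[name]


parent = _InMemoryParent()
first = get_or_create_dataset(parent, "run_a")   # created on the first call
second = get_or_create_dataset(parent, "run_a")  # fetched on the second call
```

The check and the creation are separate steps, so this pattern is only suitable where a single process modifies the database at a time.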

property is_database: bool¶: Returns true because this object does represent a database.

property name: str¶: The object’s name.

property parent: _TreeNode | None¶: The object’s parent.

path_to_database() → list[str]¶: Return the names of this object and the intermediates to the database.

recursively_get_datapoint(name: str) → AbstractDatapoint¶

Recursively follow a tree of datasets and return a datapoint.

Parameters:: name (str) – The names of the datasets to follow and of the datapoint to return, in the form <intermediate dataset>/<intermediate dataset>/<…>/<datapoint>.
Returns:: The datapoint object.
Return type:: AbstractDatapoint
Raises:: DoesNotExistError – If any of the intermediate datasets or the datapoint does not exist.
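The slash-separated name format can be illustrated with a short sketch that walks nested directories one component at a time. It assumes that each dataset and datapoint is a direct subdirectory of its parent, which this page does not state explicitly; `resolve_recursive_name` is a hypothetical helper and the exception class is a local stand-in.

```python
import tempfile
from pathlib import Path


class DoesNotExistError(Exception):
    """Local stand-in for the error named in the documentation."""


def resolve_recursive_name(root: Path, name: str) -> Path:
    """Follow `a/b/c` one component at a time, starting from `root`."""
    current = root
    for part in name.split("/"):
        current = current / part
        if not current.is_dir():
            raise DoesNotExistError(f"{part!r} does not exist under {current.parent}")
    return current


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Assumed layout: database/dataset/dataset/datapoint as nested directories.
    (root / "campaign" / "sample_1" / "scan_3").mkdir(parents=True)

    found = resolve_recursive_name(root, "campaign/sample_1/scan_3")
    found_relative = found.relative_to(root).as_posix()

    try:
        resolve_recursive_name(root, "campaign/missing/scan_3")
        missing_detected = False
    except DoesNotExistError:
        missing_detected = True
```

Checking each component in turn is what lets the error be raised for a missing intermediate dataset as well as for a missing final datapoint.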

recursively_get_dataset(name: str) → AbstractDataset¶

Recursively follow a tree of datasets and return the final dataset.

Parameters:: name (str) – The names of the datasets to follow and of the dataset to return, in the form <intermediate dataset>/<intermediate dataset>/<…>/<dataset of interest>.
Returns:: The dataset object.
Return type:: AbstractDataset
Raises:: DoesNotExistError – If any of the intermediate datasets do not exist.

class hards.filesystem.FilesystemDatapoint(location: Path, parent: FilesystemDataset)¶

Bases: _FilesystemHasFilesMixin, _FilesystemHasDataMixin, AbstractDatapoint

Datapoint represented by a directory on the filesystem.

add_data(new_data: dict[str, Any]) → None¶

Add new data to the object.

Adding new data does not remove old data unless a key already exists, in which case the old data of that key is overwritten by the newer data.

Parameters:: new_data (dict[str, Any]) – New data, in key-value form, to be added to this object’s data store.
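The overwrite rule described above matches an ordinary dictionary merge. The sketch below shows that behaviour with the json and pathlib standard libraries; `merge_into_json` and the file name `data.json` are hypothetical, and the way the library actually lays out its data on disk may differ.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def merge_into_json(store: Path, new_data: dict[str, Any]) -> dict[str, Any]:
    """Merge `new_data` into the JSON object stored at `store` and save it."""
    existing = json.loads(store.read_text()) if store.exists() else {}
    merged = {**existing, **new_data}  # new values win on key collisions
    store.write_text(json.dumps(merged, indent=2))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "data.json"  # hypothetical file name
    merge_into_json(store, {"temperature_K": 293, "operator": "A"})
    merge_into_json(store, {"temperature_K": 301, "pressure_Pa": 101325})
    saved = json.loads(store.read_text())
```

After the second call, `temperature_K` holds the newer value while `operator` is left untouched.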

add_file(file: Path, *, name: str | None = None, permissions: int = 256) → None¶

Add a file to be managed by this object.

Files are copied into the database and should be treated therein as read-only. They can be ‘modified’ by re-adding an updated file with the same name.

Parameters:

file (Path) – A path to the file to add to this object.
name (str | None, optional) – Give the file a new name (include the extension).

Raises:

DoesNotExistError – If the file does not exist or is not a file (e.g. it is a directory).
InvalidNameError – If the filename contains invalid characters. The filename may not be checked if a name is not explicitly provided.
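The default `permissions` value of 256 is the decimal form of the octal mode 0o400, which means "readable by the owner only". The sketch below shows one way the copy-then-protect behaviour described above could work. `copy_as_read_only` is a hypothetical helper, it raises the built-in FileNotFoundError rather than the library's DoesNotExistError, and how the library applies the `permissions` argument is an assumption.

```python
import shutil
import stat
import tempfile
from pathlib import Path
from typing import Optional


def copy_as_read_only(
    source: Path,
    target_dir: Path,
    name: Optional[str] = None,
    permissions: int = 0o400,  # same value as the documented default, 256
) -> Path:
    """Copy `source` into `target_dir` and restrict the copy's permissions."""
    if not source.is_file():
        raise FileNotFoundError(f"{source} is not an existing file")

    target = target_dir / (name or source.name)
    if target.exists():
        target.unlink()  # a read-only copy cannot be overwritten in place
    shutil.copyfile(source, target)
    target.chmod(permissions)
    return target


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    managed = workspace / "managed"
    managed.mkdir()

    original = workspace / "notes.txt"
    original.write_text("first version")
    stored = copy_as_read_only(original, managed, name="run_notes.txt")
    first_contents = stored.read_text()

    # "Modify" the managed file by re-adding an updated file under the same name.
    original.write_text("second version")
    stored = copy_as_read_only(original, managed, name="run_notes.txt")
    second_contents = stored.read_text()
    stored_mode = stat.S_IMODE(stored.stat().st_mode)
```

Removing the old copy before writing the new one is what makes re-adding work in this sketch, since the stored file no longer has its write bit set.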

database() → _TreeNode¶: Recursively find the database.

property files: list[str]¶: The list of file names (including extensions).

fullname() → str¶

Return full name of the object.

This is the name that, when passed to a recursive get method on the database, returns a new instantiation of this object.
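The relationship between a full name and the chain of parents can be sketched with a small in-memory tree. `Node` and `names_up_to_root` are hypothetical. The sketch assumes that the database's own name is not part of the full name and that names are joined root-first with `/`; both are inferred from the recursive name format rather than stated on this page.

```python
from typing import Optional


class Node:
    """Hypothetical tree node with a name and an optional parent."""

    def __init__(self, name: str, parent: Optional["Node"] = None) -> None:
        self.name = name
        self.parent = parent  # None marks the root (the database)

    def names_up_to_root(self) -> list[str]:
        """Names from this node upwards, stopping before the root."""
        names = []
        node = self
        while node.parent is not None:
            names.append(node.name)
            node = node.parent
        return names

    def fullname(self) -> str:
        """Join the names root-first, in the form used by a recursive get."""
        return "/".join(reversed(self.names_up_to_root()))


database = Node("database")
campaign = Node("campaign", parent=database)
sample = Node("sample_1", parent=campaign)
scan = Node("scan_3", parent=sample)

scan_fullname = scan.fullname()
scan_names = scan.names_up_to_root()
```

Under these assumptions the full name of `scan` is exactly the string that would be passed to a recursive get on the database to reach it again.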

get_file(name: str) → Path¶

Return the path to the file with the given name.

Parameters:: name (str) – The name of the file (including its extension).
Returns:: The path to the file (read-only).
Return type:: Path
Raises:: DoesNotExistError – If a file with the given name does not exist.

has_file(name: str) → bool¶

Return a bool indicating whether the file with the given name exists.

The name must include the file extension.

property is_database: bool¶

True if the object is the database (root node).

False by default.

property name: str¶: The name of the datapoint.

property parent: FilesystemDataset¶: The parent Dataset of this Datapoint.

path_to_database() → list[str]¶: Return the names of this object and the intermediates to the database.

class hards.filesystem.FilesystemDataset(location: Path, parent: FilesystemDatabase | FilesystemDataset)¶

Bases: _FilesystemHasFilesMixin, _FilesystemHasDataMixin, _FilesystemHasChildrenMixin, AbstractDataset

Dataset represented by a directory on the filesystem.

add_data(new_data: dict[str, Any]) → None¶

Add new data to the object.

Adding new data does not remove old data unless a key already exists, in which case the old data of that key is overwritten by the newer data.

Parameters:: new_data (dict[str, Any]) – New data, in key-value form, to be added to this object’s data store.

add_file(file: Path, *, name: str | None = None, permissions: int = 256) → None¶

Add a file to be managed by this object.

Files are copied into the database and should be treated therein as read-only. They can be ‘modified’ by re-adding an updated file with the same name.

Parameters:

file (Path) – A path to the file to add to this object.
name (str | None, optional) – Give the file a new name (include the extension).

Raises:

DoesNotExistError – If the file does not exist or is not a file (e.g. it is a directory).
InvalidNameError – If the filename contains invalid characters. The filename may not be checked if a name is not explicitly provided.

property children: list[str]¶: The names of the object’s current children (datasets).

create_datapoint(name: str) → FilesystemDatapoint¶

Create and return a datapoint with a given name.

Parameters:

name (str) – The name of the new datapoint.

Returns:

The new object.

Return type:

FilesystemDatapoint

Raises:

AlreadyExistsError – If a datapoint with the same name exists.
InvalidNameError – If the datapoint name contains invalid characters.

create_dataset(name: str) → FilesystemDataset¶

Create a dataset as a child of this object.

Parameters:

name (str) – The name of the new dataset.

Returns:

The new dataset object.

Return type:

FilesystemDataset

Raises:

AlreadyExistsError – If a dataset with the same name already exists.
InvalidNameError – If the dataset name contains invalid characters.

database() → _TreeNode¶: Recursively find the database.

property datapoints: list[str]¶: The names of the Dataset’s current datapoints.

property files: list[str]¶: The list of file names (including extensions).

fullname() → str¶

Return full name of the object.

This is the name that, when passed to a recursive get method on the database, returns a new instantiation of this object.

get_datapoint(name: str) → FilesystemDatapoint¶

Get the datapoint with a given name.

Parameters:: name (str) – The name of the datapoint.
Returns:: The datapoint object.
Return type:: FilesystemDatapoint
Raises:: DoesNotExistError – If the datapoint does not exist.

get_dataset(name: str) → FilesystemDataset¶

Return a dataset that is a child of this object.

Parameters:: name (str) – The name of the dataset to get.
Returns:: The dataset with the given name.
Return type:: FilesystemDataset
Raises:: DoesNotExistError – If a dataset with the given name does not exist.

get_file(name: str) → Path¶

Return the path to the file with the given name.

Parameters:: name (str) – The name of the file (including its extension).
Returns:: The path to the file (read-only).
Return type:: Path
Raises:: DoesNotExistError – If a file with the given name does not exist.

has_datapoint(name: str) → bool¶: Indicate if the object has a datapoint with the given name.

has_dataset(name: str) → bool¶

Indicate if the object has a child dataset with the given name.

Parameters:: name (str) – The name of the dataset to check exists.
Returns:: True if dataset exists, else False.
Return type:: bool

has_file(name: str) → bool¶

Return a bool indicating whether the file with the given name exists.

The name must include the file extension.

property is_database: bool¶

True if the object is the database (root node).

False by default.

property name: str¶: The object’s name.

property parent: FilesystemDatabase | FilesystemDataset¶: Return the parent of the Dataset.

path_to_database() → list[str]¶: Return the names of this object and the intermediates to the database.

recursively_get_datapoint(name: str) → AbstractDatapoint¶

Recursively follow a tree of datasets and return a datapoint.

Parameters:: name (str) – The names of the datasets to follow and of the datapoint to return, in the form <intermediate dataset>/<intermediate dataset>/<…>/<datapoint>.
Returns:: The datapoint object.
Return type:: AbstractDatapoint
Raises:: DoesNotExistError – If any of the intermediate datasets or the datapoint does not exist.

recursively_get_datapoints(*, reconstruct: bool = True, parents: bool = True) → list[AbstractDatapoint]¶

Get all datapoints of this dataset and its parents (iff parents is True).

Follows the parents until the database, collecting their datapoints.

Parameters:

reconstruct (bool) – Reconstruct the entire tree above this object and re-call this method on a new instance of this dataset. This mitigates the situation where this dataset’s parent has been modified in another instance. It makes this method safe in single-threaded synchronous applications, but gives no guarantees for parallel or asynchronous applications.
parents (bool) – If False, only the datapoints of this dataset are returned, i.e. this method acts as a safe way to get the instantiated datapoints of this dataset only.

Notes

Reconstruction does not guarantee the safety of this method. See the relevant documentation sections for considerations.
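The effect of the `parents` flag can be sketched with a small in-memory tree. `DatasetNode` and `collect_datapoints` are hypothetical, datapoints are represented by their names only, and the order of the result (own datapoints first, then each parent's in turn) is an assumption. The sketch does not model the `reconstruct` option.

```python
from typing import Optional


class DatasetNode:
    """Hypothetical in-memory dataset holding only what the sketch needs."""

    def __init__(
        self,
        name: str,
        datapoints: list[str],
        parent: Optional["DatasetNode"] = None,
    ) -> None:
        self.name = name
        self.datapoints = datapoints
        self.parent = parent  # None stands in for "the parent is the database"


def collect_datapoints(dataset: DatasetNode, *, parents: bool = True) -> list[str]:
    """Gather datapoint names from `dataset` and, optionally, its ancestors."""
    collected = list(dataset.datapoints)
    node = dataset.parent if parents else None
    while node is not None:
        collected.extend(node.datapoints)
        node = node.parent
    return collected


campaign = DatasetNode("campaign", ["calibration"])
sample = DatasetNode("sample_1", ["scan_1", "scan_2"], parent=campaign)

with_parents = collect_datapoints(sample)
own_only = collect_datapoints(sample, parents=False)
```

Here `with_parents` also picks up the `calibration` datapoint attached to the parent dataset, while `own_only` contains just the two scans.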

recursively_get_dataset(name: str) → AbstractDataset¶

Recursively follow a tree of datasets and return the final dataset.

Parameters:: name (str) – The names of the datasets to follow and of the dataset to return, in the form <intermediate dataset>/<intermediate dataset>/<…>/<dataset of interest>.
Returns:: The dataset object.
Return type:: AbstractDataset
Raises:: DoesNotExistError – If any of the intermediate datasets do not exist.
