{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Sub-Datasets in the Filesystem Implementation\n",
                "\n",
                "A sub-dataset is a dataset that exists under another dataset, not the database. This allows data to be organised and shared in a _hierarchical_ structure."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "from pathlib import Path\n",
                "from tempfile import TemporaryDirectory\n",
                "\n",
                "from hards.filesystem import FilesystemDatabase\n",
                "\n",
                "temp_dir = TemporaryDirectory()\n",
                "\n",
                "database = FilesystemDatabase.create_database(Path(temp_dir.name) / \"database\")\n",
                "dataset = database.create_dataset(\"dataset\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "One can create sub-datasets of sub-datasets ad infinitum (subject to the limits of the filesystem).\n",
                "\n",
                "For example, we can create the following structure:\n",
                "```\n",
                ".\n",
                "└── database\n",
                "    ├── dataset\n",
                "    │   └── sub_dataset\n",
                "    │       └── sub_dataset\n",
                "    │           └── sub_dataset\n",
                "    └── <other datasets>\n",
                "```\n",
                "\n",
                "_**NOTE:**_ there is no functional difference between a dataset and sub-dataset, the distinction is only made for clarity of this example."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "sub_dataset = dataset.create_dataset(\"sub_dataset\")\n",
                "\n",
                "# NOTE the names can be the same because they are not on the same level!\n",
                "sub_sub_dataset = sub_dataset.create_dataset(\"sub_dataset\")\n",
                "sub_sub_sub_dataset = sub_sub_dataset.create_dataset(\"sub_dataset\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The rules of data sharing are very simple: a dataset has access to its parent(s) direct data.\n",
                "\n",
                "For example, adding a datapoint to the dataset will make it available to all of the sub-datasets."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "dataset.create_datapoint(\"my_datapoint\")\n",
                "\n",
                "print(f\"Dataset number of datapoints: {len(dataset.recursively_get_datapoints())}\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "print(\n",
                "    f\"Sub-dataset number of datapoints: {len(sub_dataset.recursively_get_datapoints())}\"\n",
                ")\n",
                "print(\n",
                "    \"Sub-sub-dataset number of datapoints: \"\n",
                "    f\"{len(sub_sub_dataset.recursively_get_datapoints())}\"\n",
                ")\n",
                "print(\n",
                "    \"Sub-sub-sub-dataset number of datapoints: \"\n",
                "    f\"{len(sub_sub_sub_dataset.recursively_get_datapoints())}\"\n",
                ")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "# NOTE that the .datapoints property only provides access to direct datapoints\n",
                "# of a dataset. E.g.\n",
                "print(f\"Dataset direct datapoints: {dataset.datapoints}\")\n",
                "print(f\"Sub-dataset direct datapoints: {sub_dataset.datapoints}\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Therefore, if we add a datapoint onto the sub-dataset, it will not be available to the dataset but will be to the other sub-datasets."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "sub_dataset.create_datapoint(\"sub_datapoint\")\n",
                "\n",
                "print(f\"Dataset number of datapoints: {len(dataset.recursively_get_datapoints())}\")\n",
                "print(\n",
                "    f\"Sub-dataset number of datapoints: {len(sub_dataset.recursively_get_datapoints())}\"\n",
                ")\n",
                "print(\n",
                "    \"Sub-sub-dataset number of datapoints: \"\n",
                "    f\"{len(sub_sub_dataset.recursively_get_datapoints())}\"\n",
                ")"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": ".venv",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.10.13"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}