{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Filesystem Implementation Introduction\n",
                "\n",
                "This notebook aims to provide an overview of the functionality of `hards`. \n",
                "\n",
                "Throughout the notebook, we will use two temporary directories called `project_dir` and `database_dir`. These can both be thought of as arbitrary directories on the filesystem. The former might be where we have some software running a simulation that produces some data files, we wish to store that data in the latter so we can perform some aggregate analysis at a later date. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "from pathlib import Path\n",
                "from tempfile import TemporaryDirectory\n",
                "\n",
                "temp_dir = TemporaryDirectory()\n",
                "\n",
                "project_dir = Path(temp_dir.name) / \"project_dir\"\n",
                "database_dir = Path(temp_dir.name) / \"database_dir\"\n",
                "\n",
                "project_dir.mkdir()\n",
                "database_dir.mkdir()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "First, we need to create a database since one does not already exist."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "from hards.filesystem import FilesystemDatabase\n",
                "\n",
                "database = FilesystemDatabase.create_database(database_dir / \"database\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Datasets\n",
                "\n",
                "A dataset has a name which must be unique on this level (ie. the `database` cannot have another dataset named `database_1`). The dataset name (and all names of `hards` objects) should only contain ASCII letters and digits, full stops (periods), hypens, and underscores. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "dataset_1 = database.create_dataset(\"dataset_1\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "We can assign data to the dataset using a dictionary with a string key and JSON-serializable values. Note that data on a dataset should be thought of as _metadata_, the datapoints should contain your actual data (e.g. one datapoint per run of a simulation)."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "dataset_1.add_data({\n",
                "    \"version\": 1,\n",
                "    \"cost\": 123.78,\n",
                "})\n",
                "\n",
                "# Data can be added multiple times\n",
                "dataset_1.add_data({\n",
                "    \"name\": \"database1\",\n",
                "    \"owners\": [\"you\", 1234],\n",
                "})\n",
                "\n",
                "# The data attribute reflects the data from both calls\n",
                "print(dataset_1.data)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "If you attempt to add data with an existing key, the old data is overwritten."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "dataset_1.add_data({\"version\": 2})\n",
                "\n",
                "print(dataset_1.data[\"version\"])"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "You can also attach files to a dataset."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "my_file = project_dir / \"my_file.txt\"\n",
                "with my_file.open(\"w\") as f:\n",
                "    f.write(\"my important data!\")\n",
                "\n",
                "dataset_1.add_file(my_file)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Files are copied, so the original is preserved and a new file exists within the datasets structure."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "print(f\"Does my_file.txt exist? {my_file.exists()}\")\n",
                "\n",
                "print(f\"{dataset_1.name} has the following files: {dataset_1.files}\")"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "We can access this file from the dataset and read it like a regular file. It is not advised to attempt to modify the files in the dataset structure. Instead, modify the original file and re-add it under the same name (the file will be overwritten with the new version)."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "my_file_in_dataset = dataset_1.get_file(\"my_file.txt\")\n",
                "\n",
                "with my_file_in_dataset.open() as f:\n",
                "    print(f.read())"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "## Datapoints\n",
                "\n",
                "A dataset can contain many datapoints. Similar to the dataset above, a datapoint has its own data and can manage files. \n",
                "\n",
                "_**NOTE:**_ a file can be renamed using the `name` keyword. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "datapoint = dataset_1.create_datapoint(\"dataset\")\n",
                "\n",
                "datapoint.add_data({\"input_1\": 1.0, \"input_2\": 12.2})\n",
                "\n",
                "my_new_file = project_dir / \"data_point_file.txt\"\n",
                "with my_new_file.open(\"w\") as f:\n",
                "    f.write(\"data on the datapoint!\")\n",
                "\n",
                "datapoint.add_file(my_new_file, name=\"alternative_name.txt\")"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {},
            "outputs": [],
            "source": [
                "print(datapoint.data)\n",
                "\n",
                "with datapoint.get_file(\"alternative_name.txt\").open() as f:\n",
                "    print(f.read())"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": ".venv",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.10.13"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}