Manage your data with Stash

You can find this tutorial in the demos folder of your Jupyter notebook environment.

    • stash_tutorial.ipynb
    The Camber stash package offers an interface for passing your data, code, and analysis between your personal cloud storage and Camber’s public cloud storage.

    There are two types of stashes:

    • private: your personal cloud storage, which also mirrors your notebook’s local filesystem.
    • public: a read-only cloud storage that all Camber users have access to, also known as the “Open Stash”

    Each stash is initialized with its own current working directory:

    • private: this is equivalent to the $HOME of your Jupyter notebook environment, or /home/{username}
    • public: this is the cloud storage location that Camber uses to provision shared resources such as datasets

    In this tutorial, you will learn how to use Stash to:

    1. View files and directories in your stash.
    2. Transfer data from the open stash to your private stash.

    View files and directories

    First, import Camber and assign variables to your stashes.

    import camber
    prv_stash = camber.stash.private
    pub_stash = camber.stash.public

    Inspect the home directories of your stashes with the ls method:

    print("private stash data:", prv_stash.ls("~"))
    print("public stash data:", pub_stash.ls("~"))
    private stash data: ['demos/']
    public stash data: ['datasets/']
    

    Note that demos/ appears in the results of the private stash ls. This is the private stash mirroring your notebook’s local filesystem, as described earlier. You are welcome to use the shell to manipulate files in your Jupyter notebook filesystem. However, Stash lets you interface with other cloud storage more efficiently, as the next section shows.
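
    The listings above show directories with a trailing slash (demos/, datasets/). Assuming that convention holds for every ls result, a small helper can separate directories from files. The function name split_listing and the sample entry notes.txt are hypothetical and are not part of the Camber API; this is a sketch, not an official utility.

```python
def split_listing(entries):
    """Separate an ls-style listing into directories and files.

    Assumes, based on the output shown in this tutorial, that directory
    names end with a trailing slash (for example 'demos/').
    """
    dirs = [e for e in entries if e.endswith("/")]
    files = [e for e in entries if not e.endswith("/")]
    return dirs, files


# Hand-written sample listing for illustration; in a notebook you would
# pass the result of prv_stash.ls("~") instead.
sample = ["demos/", "notes.txt"]
dirs, files = split_listing(sample)
print("directories:", dirs)
print("files:", files)
```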

    Copy from open to private stashes

    The datasets directory in the public stash holds datasets that Camber manages. Use ls to list the files in the open stash’s tutorial/ dataset directory:

    pub_stash.ls("~/datasets/tutorial")
    ['cereal.csv', 'titanic.csv']

    The public stash is read-only. To manipulate an open dataset, you need to copy it to your private stash. Before doing that, make a directory in your Jupyter space called stash-tutorial. This helps keep your private stash organized.

    !mkdir -p ~/demos/20-tutorials/01-stash/stash-tutorial

    Now use the cp method to copy the cereal dataset from the open stash to the stash-tutorial/ directory in your private stash:

    pub_stash.cp(
        dest_stash=prv_stash,
        src_path="~/datasets/tutorial/cereal.csv",
        dest_path="~/demos/20-tutorials/01-stash/stash-tutorial/cereal.csv",
    )
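
    To copy more than one file, the same ls and cp calls can be combined in a loop. The sketch below uses only the call shapes shown in this tutorial (ls(path) and cp(dest_stash=..., src_path=..., dest_path=...)). The helper name copy_directory_files is hypothetical, and it assumes that ls returns bare entry names with a trailing slash on directories, as in the outputs above.

```python
def copy_directory_files(src_stash, dest_stash, src_dir, dest_dir):
    """Copy every file listed in src_dir to dest_dir, one cp call per file.

    Entries ending in '/' are treated as subdirectories and skipped
    (an assumption about the listing format, not documented behavior).
    Returns the names of the files that were copied.
    """
    copied = []
    for name in src_stash.ls(src_dir):
        if name.endswith("/"):
            continue
        src_stash.cp(
            dest_stash=dest_stash,
            src_path=f"{src_dir.rstrip('/')}/{name}",
            dest_path=f"{dest_dir.rstrip('/')}/{name}",
        )
        copied.append(name)
    return copied


# In a Camber notebook (not executed here), this could be called as:
# copy_directory_files(
#     pub_stash,
#     prv_stash,
#     "~/datasets/tutorial",
#     "~/demos/20-tutorials/01-stash/stash-tutorial",
# )
```

    Passing the stashes in as arguments keeps the helper independent of how they were created, so it can be exercised with stand-in objects before it touches real storage.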

    Confirm that it’s in your private stash:

    prv_stash.ls("~/demos/20-tutorials/01-stash/stash-tutorial")
    ['cereal.csv']
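
    Because the private stash mirrors the notebook’s local filesystem, the copied file should also be readable with ordinary Python file tools. The sketch below previews a CSV with the standard library only. The helper name preview_csv is hypothetical, and the example assumes the mirror is already in sync when the cell runs, which this tutorial does not guarantee.

```python
import csv
import os


def preview_csv(path, n_rows=5):
    """Return the header row and up to n_rows data rows of a CSV file."""
    with open(os.path.expanduser(path), newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Path taken from the cp call in this tutorial.
local_copy = os.path.expanduser(
    "~/demos/20-tutorials/01-stash/stash-tutorial/cereal.csv"
)
if os.path.exists(local_copy):
    header, rows = preview_csv(local_copy)
    print(header)
    for row in rows:
        print(row)
else:
    print("File not found locally:", local_copy)
```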
