Tutorial
========

Welcome to the Cerulean tutorial. This tutorial covers the basics of Cerulean:
accessing local and remote file systems, running processes locally and
remotely, and submitting jobs to schedulers.

To install Cerulean, use

.. code-block:: bash

  pip install cerulean

If you're using Cerulean in a program, you will probably want to use a
virtualenv and install Cerulean into that, together with your other
dependencies.


Accessing files
---------------

The file access functions of Cerulean use a ``pathlib``-like API but, unlike
``pathlib``, Cerulean supports remote file systems. This means that there is no
longer just a single local file system but potentially several, and that each
:class:`cerulean.Path` object belongs to a particular file system.

Of course, Cerulean also supports the local file system. To make an object
representing the local file system, you use this:

.. code-block:: python

  import cerulean

  fs = cerulean.LocalFileSystem()

And then you can make a path on the file system using:

.. code-block:: python

  import cerulean

  fs = cerulean.LocalFileSystem()
  my_home_dir = fs / 'home' / 'username'

In this example, ``my_home_dir`` will be a :class:`cerulean.Path` object,
which is very similar to a normal Python ``pathlib.PosixPath``. For example, you
can read the contents of a file through it:

.. code-block:: python

  import cerulean

  fs = cerulean.LocalFileSystem()
  passwd_file = fs / 'etc' / 'passwd'

  users = passwd_file.read_text()
  print(users)

Note that :class:`cerulean.Path` does not support ``open()``. Cerulean can copy
files and stream data from and to them, but it does not offer random access, as
not all remote file access protocols support this.
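
For large files, streaming avoids loading everything into memory at once. The
sketch below assumes that :class:`cerulean.Path` has ``streaming_write()``,
taking an iterable of ``bytes`` chunks, and ``streaming_read()``, yielding
``bytes`` chunks; treat these names and signatures as assumptions to check
against the API documentation. The ``as_cerulean_path()`` function is a
hypothetical helper written only for this example, not part of Cerulean.

.. code-block:: python

  import tempfile
  from pathlib import PurePosixPath

  import cerulean


  def as_cerulean_path(fs, local_dir):
      # Hypothetical helper (not part of Cerulean): build a cerulean.Path
      # from an absolute local path, one component at a time
      parts = PurePosixPath(local_dir).parts[1:]  # skip the leading '/'
      path = fs / parts[0]
      for part in parts[1:]:
          path = path / part
      return path


  fs = cerulean.LocalFileSystem()

  with tempfile.TemporaryDirectory() as tmp:
      data_file = as_cerulean_path(fs, tmp) / 'numbers.txt'

      # Assumed API: streaming_write() consumes an iterable of bytes chunks,
      # so the data never has to be in memory all at once
      data_file.streaming_write(
              '{}\n'.format(i).encode('utf-8') for i in range(1000))

      # Assumed API: streaming_read() yields the file contents as bytes chunks
      line_count = 0
      for chunk in data_file.streaming_read():
          line_count += chunk.count(b'\n')

  print(line_count)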

You can use the ``/`` operator to build paths from components as with
``pathlib``, and there's a wide variety of supported operations. See the API
documentation for :class:`cerulean.Path` for details.
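
As a minimal sketch of a few more of these operations, the following writes a
file and reads it back inside a temporary directory. It assumes that
:class:`cerulean.Path` offers ``write_text()`` and ``exists()`` alongside
``read_text()``, mirroring ``pathlib``; check the API documentation for the
exact signatures. The ``as_cerulean_path()`` function is a hypothetical helper
written for this example, not part of Cerulean.

.. code-block:: python

  import tempfile
  from pathlib import PurePosixPath

  import cerulean


  def as_cerulean_path(fs, local_dir):
      # Hypothetical helper (not part of Cerulean): turn an absolute local
      # path string into a cerulean.Path by joining its components with /
      parts = PurePosixPath(local_dir).parts[1:]  # skip the leading '/'
      path = fs / parts[0]
      for part in parts[1:]:
          path = path / part
      return path


  fs = cerulean.LocalFileSystem()

  with tempfile.TemporaryDirectory() as tmp:
      notes = as_cerulean_path(fs, tmp) / 'notes.txt'

      existed_before = notes.exists()  # nothing has been written yet
      notes.write_text('Hello from Cerulean\n')
      existed_after = notes.exists()
      contents = notes.read_text()

  print(existed_before, existed_after, repr(contents))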

Remote file systems
```````````````````

Cerulean supports remote file systems through the SFTP protocol. (It uses the
Paramiko library internally for this.) Accessing a remote file system through
SFTP goes like this:

.. code-block:: python

  import cerulean

  credential = cerulean.PasswordCredential('username', 'password')
  with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
      with cerulean.SftpFileSystem(term) as fs:
          my_home_dir = fs / 'home' / 'username'
          test_txt = (my_home_dir / 'test.txt').read_text()
          print(test_txt)

Since we are going to connect to a remote system, we need a credential.
Cerulean has two types of credentials, :class:`PasswordCredential` and
:class:`PubKeyCredential`. They do what you would expect: one holds a username
and a password, the other a username, a local path to a public key file, and
optionally a passphrase for the key.
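
Creating either kind of credential is a plain constructor call; nothing is
contacted until the credential is handed to an :class:`SshTerminal`. In this
sketch the key file path and passphrase are placeholders, and the positional
argument order (username, key file, passphrase) is assumed from the description
above.

.. code-block:: python

  import cerulean

  # Username and password
  password_credential = cerulean.PasswordCredential('username', 'password')

  # Username, path to the key file, and (optionally) its passphrase.
  # The path and passphrase are placeholders for this example.
  key_credential = cerulean.PubKeyCredential(
          'username', '/home/username/.ssh/my_key', 'key passphrase')

  # Either one can be passed to SshTerminal in the same way, e.g.
  # cerulean.SshTerminal('remotehost.example.com', 22, key_credential)
  print(password_credential.username, key_credential.username)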

Once we have a credential, we can open a terminal. Like a terminal window on
your desktop, a :class:`Terminal` object lets you run commands. Cerulean
supports local terminals and remote terminals through SSH. Since SFTP runs on
top of an SSH connection, we first need an SSH terminal, so we create one,
specifying the host, the port, and our credential.
This terminal holds an SSH connection, which needs to be closed when we are done
with it. :class:`SshTerminal` is therefore a context manager and needs to be
used in a ``with`` statement. Note that :class:`LocalTerminal` is not a context
manager, as it does not hold any resources.

Once we have the terminal, we can make an :class:`SftpFileSystem` object, and
from there it works just like a local file system. Just like
:class:`SshTerminal`, :class:`SftpFileSystem` is a context manager, so we need
another ``with``-statement.

Copying files
`````````````

When running jobs on HPC machines, you often start with copying the input files
from the local system to the HPC machine, and finish with copying the results
back. The :func:`cerulean.copy` function takes care of this for you, and works as
you would expect:

.. code-block:: python

  import cerulean


  local_fs = cerulean.LocalFileSystem()

  credential = cerulean.PasswordCredential('username', 'password')
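  with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
      with cerulean.SftpFileSystem(term) as remote_fs:
          input_file = local_fs / 'home' / 'username' / 'input.txt'
          job_dir = remote_fs / 'home' / 'username' / 'my_job'

          # create the job directory on the remote machine, then copy the
          # input file into it
          if not job_dir.exists():
              job_dir.mkdir()
          cerulean.copy(input_file, job_dir / 'input.txt')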

          # run job and wait for it to finish

          output_file = local_fs / 'home' / 'username' / 'output.txt'
          cerulean.copy(job_dir / 'output.txt', output_file)
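
Because :func:`cerulean.copy` takes two :class:`cerulean.Path` objects, the
same call works when both are on the local file system, which makes it easy to
try out without a remote host. This is a minimal sketch using a temporary
directory; ``write_text()`` is assumed to exist on :class:`cerulean.Path` as in
``pathlib``, and ``as_cerulean_path()`` is a hypothetical helper for this
example only.

.. code-block:: python

  import tempfile
  from pathlib import PurePosixPath

  import cerulean


  def as_cerulean_path(fs, local_dir):
      # Hypothetical helper (not part of Cerulean): build a cerulean.Path
      # from an absolute local path, one component at a time
      parts = PurePosixPath(local_dir).parts[1:]  # skip the leading '/'
      path = fs / parts[0]
      for part in parts[1:]:
          path = path / part
      return path


  fs = cerulean.LocalFileSystem()

  with tempfile.TemporaryDirectory() as tmp:
      work_dir = as_cerulean_path(fs, tmp)
      source = work_dir / 'input.txt'
      target = work_dir / 'input_copy.txt'

      source.write_text('some input data\n')

      # Both paths are local here; with an SftpFileSystem on one side,
      # the call would look exactly the same
      cerulean.copy(source, target)

      copied_text = target.read_text()

  print(repr(copied_text))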

Running commands
----------------

If you have read the above, then the secret is already out: running commands
using Cerulean is done using a :class:`Terminal`. For example, you can run a
command locally using:

.. code-block:: python

  import cerulean

  term = cerulean.LocalTerminal()

  exit_code, stdout_text, stderr_text = term.run(
          10.0, 'ls', ['-l'], None, '/home/username')

The first argument to :meth:`Terminal.run` is a timeout value in seconds,
which determines how long Cerulean will wait for the command to finish. The
second argument is the command to run, followed by a list of arguments. Next is
an optional string that, if you specify it, will be fed into the standard input
of the program you are starting. The final argument is a string specifying the
working directory in which to execute the command.

The function returns a tuple containing three values: the exit code of the
process (or ``None`` if it didn't finish in time), a string containing text
printed to standard output, and a string containing text printed to standard
error by the command you ran.
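
To see these return values in practice, the sketch below runs two commands
with a :class:`LocalTerminal`: one that receives text on standard input, and
one that fails. It assumes a Unix-like system with ``cat`` and ``ls``
available, and uses a temporary directory as the working directory.

.. code-block:: python

  import tempfile

  import cerulean

  term = cerulean.LocalTerminal()

  with tempfile.TemporaryDirectory() as tmp:
      # cat copies its standard input to its standard output
      exit_code, stdout_text, stderr_text = term.run(
              10.0, 'cat', [], 'hello, cerulean\n', tmp)

      # A failing command is expected to report a non-zero exit code
      # (and usually an error message on standard error)
      bad_exit_code, bad_stdout, bad_stderr = term.run(
              10.0, 'ls', ['does-not-exist'], None, tmp)

  print(exit_code, repr(stdout_text))
  print(bad_exit_code, repr(bad_stderr))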

Running commands remotely through SSH of course works in exactly the same way,
except you use an :class:`SshTerminal`, as above:

.. code-block:: python

  import cerulean

  credential = cerulean.PasswordCredential('username', 'password')
  with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
      exit_code, stdout_text, stderr_text = term.run(
              10.0, 'ls', ['-l'], None, '/home/username')


Submitting jobs
---------------

On High Performance Computing machines, you don't run commands directly.
Instead, you submit batch jobs to a scheduler, which will place them in a queue,
and run them when everyone else in line before you is done. The most popular
scheduler at the moment seems to be Slurm, but Cerulean also supports
Torque/PBS.

The usual way of working with a scheduler is to use ``ssh`` to connect to the
cluster, where you run commands that submit jobs and check on their status.
Cerulean works in the same way:

.. code-block:: python

  import cerulean
  import time

  credential = cerulean.PasswordCredential('username', 'password')
  with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term:
      sched = cerulean.SlurmScheduler(term)

      job = cerulean.JobDescription()
      job.name = 'cerulean_test'
      job.command = 'ls'
      job.arguments = ['-l']

      job_id = sched.submit_job(job)

      time.sleep(5)
      status = sched.get_status(job_id)

      if status == cerulean.JobStatus.DONE:
          exit_code = sched.get_exit_code(job_id)
          print('Job exited with code {}'.format(exit_code))
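
Checking the status only once, after a fixed delay, misses jobs that are still
queued or running at that moment. A more robust pattern is to poll until the
job is done. Below is a sketch of a hypothetical ``wait_for_job()`` helper (not
part of Cerulean); it uses only ``get_status()``, ``get_exit_code()`` and
:class:`cerulean.JobStatus` from the example above, plus an assumed
``JobStatus.RUNNING`` member. A small stand-in scheduler class lets the sketch
run without a cluster.

.. code-block:: python

  import time

  import cerulean


  def wait_for_job(sched, job_id, poll_interval=5.0, max_wait=3600.0):
      # Hypothetical helper (not part of Cerulean): poll the scheduler until
      # the job is DONE, then return its exit code. Returns None if the job
      # has not finished within max_wait seconds.
      deadline = time.monotonic() + max_wait
      while time.monotonic() < deadline:
          if sched.get_status(job_id) == cerulean.JobStatus.DONE:
              return sched.get_exit_code(job_id)
          time.sleep(poll_interval)
      return None


  class FakeScheduler:
      # Stand-in for a real scheduler so this sketch runs without a cluster:
      # the job reports RUNNING twice, then DONE with exit code 0.
      # JobStatus.RUNNING is assumed to exist alongside JobStatus.DONE.
      def __init__(self):
          self.checks = 0

      def get_status(self, job_id):
          self.checks += 1
          if self.checks < 3:
              return cerulean.JobStatus.RUNNING
          return cerulean.JobStatus.DONE

      def get_exit_code(self, job_id):
          return 0


  fake = FakeScheduler()
  result = wait_for_job(fake, 'job-1', poll_interval=0.01)
  print(result, fake.checks)

With a real scheduler, ``wait_for_job(sched, job_id)`` would take the place of
the ``time.sleep(5)`` and the single status check.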

Of course, if you intend to run your submission script on the head node, then
the scheduler is local, and you want to use a :class:`LocalTerminal` with your
:class:`SlurmScheduler`. If your HPC machine runs Torque/PBS, use a
:class:`TorqueScheduler` instead.
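
On the head node, only the terminal changes; the rest of the submission code
stays the same. The sketch below wraps that choice in a hypothetical
``local_scheduler()`` helper (not part of Cerulean). It assumes that creating a
scheduler object does not itself run any commands, so constructing one works
even on a machine without Slurm or Torque; actually submitting jobs still needs
the scheduler to be installed.

.. code-block:: python

  import cerulean


  def local_scheduler(kind):
      # Hypothetical helper (not part of Cerulean): a scheduler that submits
      # jobs from the machine this script runs on, e.g. the head node
      term = cerulean.LocalTerminal()
      if kind == 'slurm':
          return cerulean.SlurmScheduler(term)
      if kind == 'torque':
          return cerulean.TorqueScheduler(term)
      raise ValueError('Unknown scheduler type: {}'.format(kind))


  sched = local_scheduler('slurm')
  # From here on, submit_job(), get_status() and get_exit_code() are used
  # exactly as in the SSH example above
  print(type(sched).__name__)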


More information
----------------

To find all the details of what Cerulean can do and how to do it, please refer
to the :doc:`API documentation<api>`.