Tutorial¶
Welcome to the Cerulean tutorial. This tutorial demonstrates the basics of using Cerulean: using local and remote file systems, running processes locally and remotely, and using schedulers.
To install Cerulean, use
pip install cerulean
If you’re using Cerulean in a program, you will probably want to use a virtualenv and install Cerulean into that, together with your other dependencies.
Accessing files¶
The file access functions of Cerulean use a pathlib
-like API, but unlike in
pathlib
, Cerulean supports remote file systems. That means that there is no
longer just the local file system, but multiple file systems, and that Path
objects have a particular file system that they are on.
Of course, Cerulean also supports the local file system. To make an object representing the local file system, you use this:
import cerulean
fs = cerulean.LocalFileSystem()
And then you can make a path on the file system using:
import cerulean
fs = cerulean.LocalFileSystem()
my_home_dir = fs / 'home' / 'username'
In this example, my_home_dir
will be a cerulean.Path
object,
which is very similar to a normal Python pathlib.PosixPath
. For example, you
can read the contents of a file through it:
import cerulean
fs = cerulean.LocalFileSystem()
passwd_file = fs / 'etc' / 'passwd'
users = passwd_file.read_text()
print(users)
Note that cerulean.Path
does not support open()
. Cerulean can copy
files and stream data from and to them, but it does not offer random access, as
not all remote file access protocols support this.
You can use the /
operator to build paths from components as with
pathlib
, and there’s a wide variety of supported operations. See the API
documentation for cerulean.Path
for details.
Remote filesystems¶
Cerulean supports remote file systems through the SFTP protocol. (It uses the Paramiko library internally for this.) Accessing a remote file system through SFTP goes like this:
import cerulean
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term
with SftpFileSystem(term) as fs:
my_home_dir = fs / 'home' / 'username'
test_txt = (my_home_dir / 'test.txt').read_text()
print(test_txt)
Since we are going to connect to a remote system, we need a credential.
Cerulean has two types of credentials, PasswordCredential
and
PubKeyCredential
. They are what you expect, one holds a username and
a password, the other a username, a local path to a public key file, and
optionally a passphrase for the key.
Once we have a credential, we can open a terminal. Like a terminal window on
your desktop, a Terminal
object lets you run commands. Cerulean
supports local terminals and remote terminals through SSH. Since the SFTP
protocol is an extension to the SSH protocol, we need an SSH terminal connection
first, so we make one, connecting to a host, on a port, with our credential.
This terminal holds an SSH connection, which needs to be closed when we are done
with it. SshTerminal
is therefore a context manager and needs to be
used in a with
statement. Note that LocalTerminal
is not a context
manager, as it does not hold any resources.
Once we have the terminal, we can make an SftpFileSystem
object, and
from there it works just like a local file system. Just like
SshTerminal
, SftpFileSystem
is a context manager, so we need
another with
-statement.
Copying files¶
When running jobs on HPC machines, you often start with copying the input files
from the local system to the HPC machine, and finish with copying the results
back. Cerulean’s copy()
function takes care of this for you, and works as
you would expect:
import cerulean
local_fs = cerulean.LocalFileSystem()
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term
with SftpFileSystem(term) as remote_fs:
input_file = local_fs / 'home' / 'username' / 'input.txt'
job_dir = remote_fs / 'home' / 'username' / 'my_job'
cerulean.copy(input_file, job_dir)
# run job and wait for it to finish
output_file = local_fs / 'home' / 'username' / 'output.txt'
cerulean.copy(job_dir / 'output.txt', output_file)
Running commands¶
If you have read the above, then the secret is already out: running commands
using Cerulean is done using a Terminal
. For example, you can run a
command locally using:
import cerulean
term = cerulean.LocalTerminal()
exit_code, stdout_text, stderr_text = term.run(
10.0, 'ls', ['-l'], None, '/home/username')
The first argument to Terminal.run()
is a timeout value in seconds,
which determines how long Cerulean will wait for the command to finish. The
second argument is the command to run, followed by a list of arguments. Next is
an optional string that, if you specify it, will be fed into the standard input
of the program you are starting. The final argument is a string specifying the
working directory in which to execute the command.
The function returns a tuple containing three values: the exit code of the process (or None if it didn’t finish in time), a string containing text printed to standard output, and a string containing text printed to standard error by the command you ran.
Running commands remotely through SSH of course works in exactly the same way,
except you use an SshTerminal
, as above:
import cerulean
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term
exit_code, stdout_text, stderr_text = term.run(
10.0, 'ls', ['-l'], None, '/home/username')
Submitting jobs¶
On High Performance Computing machines, you don’t run commands directly. Instead, you submit batch jobs to a scheduler, which will place them in a queue, and run them when everyone else in line before you is done. The most popular scheduler at the moment seems to be Slurm, but Cerulean also supports Torque/PBS.
The usual way of working with a scheduler is to use ssh
to connect to the
cluster, where you run commands that submit jobs and check on their status.
Cerulean works in the same way:
import cerulean
import time
credential = cerulean.PasswordCredential('username', 'password')
with cerulean.SshTerminal('remotehost.example.com', 22, credential) as term
sched = cerulean.SlurmScheduler(term)
job = cerulean.JobDescription()
job.name = 'cerulean_test'
job.command = 'ls'
job.arguments = ['-l']
job_id = sched.submit_job(job)
time.sleep(5)
status = sched.get_status(job_id)
if status == cerulean.JobStatus.DONE:
exit_code = sched.get_exit_code()
print('Job exited with code {}'.format(exit_code))
Of course, if you intend to run your submission script on the head node, then
the scheduler is local, and you want to use a LocalTerminal
with your
SlurmScheduler
. If your HPC machine runs Torque/PBS, use a
TorqueScheduler
instead.
More information¶
To find all the details of what Cerulean can do and how to do it, please refer to the API documentation.