Frequenty Asked Questions and support for Parcels


Parcels (Probably A Really Computationally Efficient Lagrangian Simulator) is a set of Python classes and methods to create customisable particle tracking simulations using output from Ocean Circulation models. Parcels focusses on tracking of passive water parcels, as well as active particulates such as plankton, plastic and fish.

The Parcels code is licensed under an open source MIT license and can be downloaded from https://github.com/OceanParcels/parcels.

There is also an extensive documentation of all methods and classes in Parcels.


Parcels design overview

The figure above gives a brief overview of the most important classes and methods in Parcels and how they are related. Classes are in blue, methods are in green. Note that not all classes and methods are shown.

Tips on constructing FieldSet objects

Probably the trickiest bit to get right when starting with Parcels on your own model output is how to construct a FieldSet object. The general method to use is FieldSet.from_netcdf(), which requires filenames, variables and dimensions. Each of these is a dictionary, and variables requires at least a U and V, but any other variable can be added too (e.g. temperature, mixedlayerdepth, etc). Note also that filenames can contain wildcards. For example, the GlobCurrent data that is shipped with Parcels can be read with:
fname = 'GlobCurrent_example_data/*.nc'
filenames = {'U': fname, 'V': fname}
variables = {'U': 'eastward_eulerian_current_velocity', 'V': 'northward_eulerian_current_velocity'}
dimensions = {'lat': 'lat', 'lon': 'lon', 'depth': 'depth', 'time': 'time'}
fset = FieldSet.from_netcdf(filenames, variables, dimensions)
Note that dimensions can also be a dictionary-of-dictionaries. For example, if you have wind data on a completely different grid (and without depth dimension), you can do:
fname = 'GlobCurrent_example_data/*.nc'
wname = 'path_to_your_windfiles'
filenames = {'U': fname, 'V': fname, 'wind': wname}
variables = {'U': 'eastward_eulerian_current_velocity', 'V': 'northward_eulerian_current_velocity', 'wind': 'wind'}
dimensions = {}
dimensions['U'] = {'lat': 'lat', 'lon': 'lon', 'depth': 'depth', 'time': 'time'}
dimensions['V'] = {'lat': 'lat', 'lon': 'lon', 'depth': 'depth', 'time': 'time'}
dimensions['wind'] = {'lat': 'latw', 'lon': 'lonw', 'time': 'time'}
fset = FieldSet.from_netcdf(filenames, variables, dimensions)
In a similar way, you can add U and V fields that are on different grids (e.g. Arakawa C-grids). Parcels will take care under the hood that the different grids are dealt with properly.

Writing Parcels Kernels

One of the most powerful features of Parcels is the ability to write custom Kernels (see e.g. the Adding-a-custom-behaviour-kernel part of the Tutorial). These Kernels are little snippets of code that get executed by Parcels, giving the ability to add ‘behaviour’ to particles.

However, there are some key limitations to the Kernels that everyone who wants to write their own should be aware of: All other functions and methods are not supported yet in Parcels Kernels. If there is a functionality that can not be programmed with this limited set of commands, please create an Issue ticket.

The output format of ParticleFile

The information on particle trajectories is stored in NetCDF format, when using the ParticleFile class. The file contains arrays of at least the time, lon, lat and z (depth) of each particle trajectory, plus any extra custom Variables defined in the pclass.

Each row in these arrays corresponds to a particle, and each column is an 'observation'. Note that, when particles are added during runtime, their first position is still stored in the first column, so that not all particles in the same column necessarily share the same time. The time of each observation is stored in the time array.

Memory Behaviour and Performance

Parcels, with its Lagrangian simulation approach and the commonly large amount of particles being simulated, is a computationally-demanding application. Global scale, fine-resolution hydronamic input data, such as from CMEMS and NEMO, add to this computational demand as they require large blocks of memory to be available. Despite the continuous improvements in performance and efficiency, it is thus possible to encounter performance- and memory-related problems. The following paragraphs are meant as a guideline to resolve those related issues.

If you encounter random failures of your particle simulations, the first step is to simplify the simulation in order to make the error reproducible for external persons. Therefore, it is advisable to run the simulations with the simplest type of particles (ideally: common Particle objects) and possibly just a simple AdvectionRK4 kernel. This rules out any errors coming from faulty calculations. In case this simplification resolves your issue, then the failure of your simulation is due to the specific particle or kernel you attempt to execute, and you may want to refine your calculations accordingly.

When calculation-related cancellations are ruled out as error source, the first step in tracking memory-related issues is a reasonability analysis: can issues that you are experiencing actual stem from memory exhaustion? Get an overview of the models and fields that you are using, consider if you're running the application with- or without depth information, and keep in mind that at each time in the simulation you need at least space for 3 timestamps of all numerical field values in addition to the actual particles. Does this really exhaust the memory? As a safeguard check, you can run a subset of your simulation (e.g the first day or week) locally while tracking the memory consumption with tools such as the Task Manager (Windows), the Activity Monitor (MacOS) or the System Monitor (Linux). If you can, run the simplified simulation over the whole timespan locally on your computer (even though this may be slow).

Suppose that you have determined that reason for your simulation not finishing can be or is memory exhaustion. Then, the next step is to determine if the cause of memory exhaustion is due to the particles or due to the fields. Fortunately, this is easy to assess: Run your simulation again and observe the memory with the tools mentioned above. If your simulation runs through without problems, it is almost certain your real simulation requires too many particles. In that case, an applicable general advice is to reduce the overall particle density (i.e. creating less particles overall), or manually split your simulation in smaller pieces and reduce the particle count accordingly, if you want to preserve the particle density. Apart from this advice, there is no generic solution at this point for the problem of exhaustive particle sets, so: Get in touch with us at the OceanParcels GitHub page via the issue tracker to find an individual solution.

If the reduction of particles indeed does not solve your issue and your simulation still crashes, then this is likely due to the fields consuming more memory than available. Luckily, there are some solutions to this problem.

Introduce field chunking (i.e. dynamic loading of fields)

So, your situation is as follows: you found out that your Parcels simulation consumes more field data than can fit in memory on any system that you tried. Then, it may be that you are not using field chunking.

Field chunking is introduced in Parcels > 2.1.0. It manages the access and data transfer between your hydrodynamic data (i.e. your NetCDF files), the computer memory, and the kernels, in a way so that only the parts of the data ends up in memory which are directly needed by some particles. This technique is commonly referred to as dynamic loading. In Python, with its numeric backend of Numpy and SciPy, the Dask module implements this data access technique, which is called chunking or lazy loading within Dask. Chunking for field data is activated by default in Parcels > 2.1.0.

As an example, let us assume we load a field set as such:
fieldset = FieldSet.from_netcdf(files, variables, dimensions)
then, this is equivalent to
fieldset = FieldSet.from_netcdf(files, variables, dimensions, deferred_load=True, field_chunksize='auto')

Hence, if you are using an older version of Parcels (<= 2.1.0), or you have accidentally switched of chunking via fieldset = FieldSet.from_netcdf(..., field_chunksize=False), then you may exceed your available memory. Try out the latest version of Parcels and activate field chunking.

Simulations crash on HPC clusters (via job submission systems such as SGE or SLURM), but locally finish without problems

The behaviour of simulations crashing on cluster systems whereas running as-expected on local machines can occur because

Advanced control over chunking and memory - configure Dask

In the case that you have field chunking running, your job submission files are all set correctly, but your fields are still too big, then your fields must be either many (in terms of individual fields), or they are complex (i.e. many diverse, hydrological properties), or they are in dense (i.e. fine-resolution) 4D grids (i.e. using time, depth, latitude and longitude). In this case, you may need to tune your environment a bit more directly to your application or scenario.

  1. Tune your chunk size configuration manually: Instead of leaving the chunking up to the black-box that is Dask, you can define your chunks yourself. Say you have simulation data of 30 timesteps on a grid with depth=150, lat=2100, lon=1800 (just an example), then you may wanna define the chunks yourself.
    • use the parameter field_chunksize of the FieldSet class as a tuple: here, the order is (time steps, depth size, latitude size, longitude size), so e.g. fieldset = FieldSet.from_netcdf(files, variables, dimensions, field_chunksize=(1,8,128,128))
    • use the parameter field_chunksize of the FieldSet class as a dictionary: here, you define the chunks by name, such that e.g. fieldset = FieldSet.from_netcdf(files, variables, dimensions, field_chunksize={'time_counter': 1, 'depth': 8, 'nav_lat': 128, 'nav_lon': 128})
  2. Override Dask's chunking procedure: Dask's auto-chunking feature attempts to minimize read-access from the file, hence it will chunk the data so that an individual chunk fits into memory. If you are working with multiple fields (or: multiple cores via MPI), then this assumption doesn't apply to you and, thus, the parameter setting field_chunksize='auto' may not provided you with the expected result. If you want to adapt this behaviour without manually tuning every application and scenario script, then you can configure the auto-chunking function itself.

    Information on how to adapt the auto-chunking can be found in the Dask documentation. Dask itself can use a term in the Dask configuration file, which defines a new upper limit for a chunked array. Detailed information and guidance on the configuration file can be found in the Dask guidelines.

    As a short summary: when having installed Dask, there is a (hidden) user-defined folder in your home directory ${HOME}/.config/dask that has two files: dask.yaml and distributed.yaml. The second one is of less interest for you unless you have distributed file management (if unsure, contact your system administrator). Under normal conditions, Dask will read configuration parameters from dask.yaml. That file could look like this (default after installation):

      # temporary-directory: null     # Directory for local disk like /tmp, /scratch, or /local
    
      # array:
      #   svg:
      #     size: 120  # pixels
    You can adapt this so the chunks, for example, are smaller, which improves loading and allows multiple simultaneous fields being accessed without exceeding memory.
      temporary-directory: /tmp
    
      array:
        svg:
          size: 120  # pixels
        chunk-size: 128 MiB
    Another very important change is that we removed the # from the start of the lines, which transforms them from being comments into actual information. Of course, if you have e.g. 4 fields, then you may want to change 128 MiB into 32 MiB. Power-of-two sizes here are also advantageous, although not strictly necessary.