SciVis Contest 2016 Data

This README provides instructions for downloading the data and describes the data format and arrangement.

The main contest website (including problem and task descriptions and entry evaluation criteria) is located here.

For a list of all available files, please see here.

Dataset Overview

As described on the website[1], the FPM simulations are conducted at three resolution levels. The resolution is controlled by the so-called smoothing length, which is inversely related to the number of particles: a smaller smoothing length uses more particles.

For each resolution level, 50 runs will be provided that form an ensemble. The runs vary due to stochastic effects. Each run contains 120 timesteps that span a time interval from 0s to approximately 60s of simulation time. Due to the adaptive time-stepping employed in the simulation code, the simulation times at which these steps are written out vary between the ensemble members. Similarly, resolution is controlled adaptively, leading to a non-uniform particle count over time and between runs.

Download Instructions

All data can be downloaded from the San Diego Supercomputer Center cloud. The full data will be made available progressively until the end of January 2016. To simplify downloading of the (ultimately) 150 datasets, the runs are provided individually, grouped by resolution.

To download the data, please use the wget utility (typically present in most Unix distributions and on OS X; a binary for Windows can be found here; make sure to use the "Setup" package to obtain all dependencies as well). In the target directory, run

wget --mirror -nH --cut-dirs=3 --continue --no-check-certificate -e robots=off https://cloud.sdsc.edu/v1/AUTH_sciviscontest/2016/list.html

from the command line. This will download all available datasets that have not already been downloaded. The individual packs are in .tar.bz2 format, which can be unpacked using the command

tar jxvf <pack>.tar.bz2

On Windows, they can be extracted using e.g. the 7-Zip freeware.
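As a platform-independent alternative to tar and 7-Zip, the packs can also be unpacked with Python's standard tarfile module. The following is a minimal sketch; the function name extract_packs and its two directory arguments are illustrative and not part of the contest code.

```python
import tarfile
from pathlib import Path


def extract_packs(download_dir, target_dir):
    """Unpack every .tar.bz2 pack found below download_dir into target_dir.

    Returns the names of the packs that were extracted.
    (Hypothetical helper; not part of the contest's code/ directory.)
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    extracted = []
    # rglob searches subdirectories too, so the directory layout
    # created by wget does not matter.
    for pack in sorted(Path(download_dir).rglob("*.tar.bz2")):
        with tarfile.open(pack, "r:bz2") as archive:
            archive.extractall(target)
        extracted.append(pack.name)
    return extracted
```

Calling extract_packs(".", "unpacked") from the download directory would unpack all packs into a subdirectory named unpacked.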

Of course, it is also possible to download simulation packs individually from this list.

Data Format

Every run consists of one file per timestep, stored in VTK's .vtu format (see the documentation), plus an additional file 'timesteps' listing all timesteps and their corresponding time values, e.g.

...
004.vtu 2.12750005722
005.vtu 2.65980005264
006.vtu 3.14350008965
007.vtu 3.55870008469
008.vtu 4.09310007095
009.vtu 4.52099990845
010.vtu 5.07929992676
...
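Because the output times differ between ensemble members, it can be useful to look up, for each run, the timestep closest to a given simulation time. The sketch below assumes the 'timesteps' file has exactly the two-column layout shown above (file name, then time value, separated by whitespace); the function names are illustrative.

```python
def read_timesteps(path):
    """Return a list of (filename, time) pairs from a run's 'timesteps' file."""
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 2:
                continue  # skip blank or unexpected lines
            entries.append((fields[0], float(fields[1])))
    return entries


def nearest_step(entries, target_time):
    """Return the (filename, time) pair whose time is closest to target_time."""
    return min(entries, key=lambda entry: abs(entry[1] - target_time))
```

For the excerpt shown above, nearest_step(entries, 3.0) would select 006.vtu, since 3.1435 is closer to 3.0 than 2.6598.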

Every .vtu file contains the following information:

the "points" array contains particle positions
the "velocity" point data array contains the flow velocity at the particle positions
the "concentration" point data array describes the concentration at the particle positions

As is usual in VTK, the ith values in the "velocity" and "concentration" arrays correspond to the ith point described in the "points" array.

Furthermore, there are three single-element arrays stored in the file's field data:

"step" indicates the simulation timestep
"time" indicates the simulation time
"size" indicates the number of particles in this timestep

It is straightforward to load these files using VTK's vtkXMLUnstructuredGridReader class. Please see the example script 'simple.py' in the code/ directory for more details. This script was also used to generate a short video for each run depicting particles of high concentration (concentration.mp4 in each run directory).
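For orientation, loading a single timestep from Python might look roughly like the following. This is a sketch under the assumption that VTK's Python bindings and NumPy are installed; it is not a copy of simple.py, and the function name load_timestep is illustrative. The array names are the ones listed above.

```python
def load_timestep(path):
    """Read one .vtu timestep and return its contents as a dictionary.

    Point arrays are returned as NumPy arrays; the single-element
    field data arrays are returned as plain Python numbers.
    """
    # Imported inside the function so that merely defining it
    # does not require VTK to be installed.
    import vtk
    from vtk.util.numpy_support import vtk_to_numpy

    reader = vtk.vtkXMLUnstructuredGridReader()
    reader.SetFileName(path)
    reader.Update()
    grid = reader.GetOutput()

    point_data = grid.GetPointData()
    field_data = grid.GetFieldData()
    return {
        # (N, 3) particle positions
        "points": vtk_to_numpy(grid.GetPoints().GetData()),
        # per-particle arrays; entry i belongs to point i
        "velocity": vtk_to_numpy(point_data.GetArray("velocity")),
        "concentration": vtk_to_numpy(point_data.GetArray("concentration")),
        # single-element field data arrays
        "step": int(field_data.GetArray("step").GetTuple1(0)),
        "time": float(field_data.GetArray("time").GetTuple1(0)),
        "size": int(field_data.GetArray("size").GetTuple1(0)),
    }
```

A call such as load_timestep("010.vtu") would then give direct access to, for example, the concentration values as a NumPy array.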

Furthermore, the vtu files have been written such that they can be read directly, e.g. from C++ code, without VTK support. A corresponding implementation can be found in the max_concentration.cpp source file in the code/ directory.

Please note that the files as provided are not directly suitable for visualization using VTK, ParaView, or VisIt, because no cell information is stored in the files. Technically, every particle or point must be represented as its own cell (of type VTK_VERTEX); in the absence of this information, most visualization tools will not work. However, since the correspondence between cells and points is redundant (cell 0 contains point 0, cell 1 contains point 1, etc.), this information was omitted from the files in order to keep the data size small. An easy way to reconstruct it, so that VTK can visualize the dataset correctly, is to apply VTK's vtkMaskPoints filter; see code/simple.py for an example.
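The following sketch shows one way vtkMaskPoints could be configured to generate one vertex cell per particle. It assumes VTK's Python bindings are installed; the exact settings used in code/simple.py may differ, and the function name load_with_vertex_cells is illustrative.

```python
def load_with_vertex_cells(path):
    """Read a .vtu timestep and return vtkPolyData with one vertex cell per point."""
    # Imported inside the function so that merely defining it
    # does not require VTK to be installed.
    import vtk

    reader = vtk.vtkXMLUnstructuredGridReader()
    reader.SetFileName(path)
    reader.Update()
    num_points = reader.GetOutput().GetNumberOfPoints()

    mask = vtk.vtkMaskPoints()
    mask.SetInputConnection(reader.GetOutputPort())
    mask.SetOnRatio(1)                         # keep every point, no subsampling
    mask.SetMaximumNumberOfPoints(num_points)  # do not truncate the point set
    mask.GenerateVerticesOn()                  # create vertex cells for the points
    mask.SingleVertexPerCellOn()               # one cell per point, not one poly-vertex
    mask.Update()
    return mask.GetOutput()
```

The returned vtkPolyData carries the same point data arrays and can be passed to a mapper or written out for use in ParaView or VisIt.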