3.1. General Approach

VTK‑m is designed to provide a pervasive parallelism throughout all its visualization algorithms, meaning that the algorithm is designed to operate with independent concurrency at the finest possible level throughout. VTK‑m provides this pervasive parallelism by providing a programming construct called a worklet, which operates on a very fine granularity of data. The worklets are designed as serial components, and VTK‑m handles whatever layers of concurrency are necessary, thereby removing the onus from the visualization algorithm developer. Worklet operation is then wrapped into filter, which provide a simplified interface to end users.

A worklet is essentially a functor or kernel designed to operate on a small element of data. (The name “worklet” means work on a small amount of data.) The worklet is constrained to contain a serial and stateless function. These constraints form three critical purposes. First, the constraints on the worklets allow VTK‑m to schedule worklet invocations on a great many independent concurrent threads and thereby making the algorithm pervasively parallel. Second, the constraints allow VTK‑m to provide thread safety. By controlling the memory access the toolkit can insure that no worklet will have any memory collisions, false sharing, or other parallel programming pitfalls. Third, the constraints encourage good programming practices. The worklet model provides a natural approach to visualization algorithm design that also has good general performance characteristics.

VTK‑m allows developers to design algorithms that are run on massive amounts of threads. However, VTK‑m also allows developers to interface to applications, define data, and invoke algorithms that they have written or are provided otherwise. These two modes represent significantly different operations on the data. The operating code of an algorithm in a worklet is constrained to access only a small portion of data that is provided by the framework. Conversely, code that is building the data structures needs to manage the data in its entirety, but has little reason to perform computations on any particular element.

Consequently, VTK‑m is divided into two environments that handle each of these use cases. Each environment has its own API, and direct interaction between the environments is disallowed. The environments are as follows.

  • Execution Environment This is the environment in which the computational portion of algorithms are executed. The API for this environment provides work for one element with convenient access to information such as connectivity and neighborhood as needed by typical visualization algorithms. Code for the execution environment is designed to always execute on a very large number of threads.

  • Control Environment This is the environment that is used to interface with applications, interface with I/O devices, and schedule parallel execution of the algorithms. The associated API is designed for users that want to use VTK‑m to analyze their data using provided or supplied filters. Code for the control environment is designed to run on a single thread (or one single thread per process in an MPI job).

These dual programming environments are partially a convenience to isolate the application from the execution of the worklets and are partially a necessity to support GPU languages with host and device environments. The control and execution environments are logically equivalent to the host and device environments, respectively, in CUDA and other associated GPU languages.

_images/VTKmEnvironments.png

Figure 3.1 Diagram of the VTK‑m framework.

Figure 3.1 displays the relationship between the control and execution environment. The typical workflow when using VTK‑m is that first the control thread establishes a data set in the control environment and then invokes a parallel operation on the data using a filter. From there the data is logically divided into its constituent elements, which are sent to independent invocations of a worklet. The worklet invocations, being independent, are run on as many concurrent threads as are supported by the device. On completion the results of the worklet invocations are collected to a single data structure and a handle is returned back to the control environment.

Did You Know?

Are you only planning to use filters in VTK‑m that already exist? If so, then everything you work with will be in the control environment. The execution environment is only used when implementing algorithms for filters.

3.1.1. Package Structure

VTK‑m is organized in a hierarchy of nested packages. VTK‑m places definitions in namespaces that correspond to the package (with the exception that one package may specialize a template defined in a different namespace).

The base package is named vtkm. All classes within VTK‑m are placed either directly in the vtkm package or in a package beneath it. This helps prevent name collisions between VTK‑m and any other library.

As described at the beginning of this chapter, the VTK‑m API is divided into two distinct environments: the control environment and the execution environment. The API for these two environments are located in the vtkm::cont and vtkmexec packages, respectively. Items located in the base vtkm namespace are available in both environments.

Did You Know?

Although it is conventional to spell out names in identifiers (as outlined in https://gitlab.kitware.com/vtk/vtk-m/blob/master/docs/CodingConventions.md) there is an exception to abbreviate control and execution to cont and exec, respectively. This is because it is also part of the coding convention to declare the entire namespace when using an identifier that is part of the corresponding package. The shorter names make the identifiers easier to read, faster to type, and more feasible to pack lines in terminal displays. These abbreviations are also used instead of more common abbreviations (e.g. ctrl for control) because, as part of actual English words, they are easier to type.

Further functionality in VTK‑m is built on top of the base vtkm, vtkm::cont, and vtkm::exec packages. Support classes for building worklets, introduced in Chapter Chapter 3.3 (Simple Worklets), are contained in the vtkm::worklet package. Other facilities in VTK‑m are provided in their own packages such as vtkm::io, vtkm::filter, and vtkm::rendering. These packages are described in Part 2 (Using VTK‑m).

VTK‑m contains code that uses specialized compiler features, such as those with CUDA, or libraries, such as Kokkos, that will not be available on all machines. Code for these features are encapsulated in their own packages under the vtkm::cont namespace: vtkm::cont::cuda and vtkm::cont::kokkos.

By convention all classes will be defined in a file with the same name as the class name (with a .h extension) located in a directory corresponding to the package name. For example, the vtkm::cont::DataSet class is found in the vtkm/cont/DataSet.h header. There are, however, exceptions to this rule. Some smaller classes and types are grouped together for convenience. These exceptions will be noted as necessary.

Within each namespace there may also be internal and detail sub-namespaces. The internal namespaces contain features that are used internally and may change without notice. The detail namespaces contain features that are used by a particular class but must be declared outside of that class. Users should generally ignore classes in these namespaces.

3.1.2. Function and Method Environment Modifiers

Any function or method defined by VTK‑m must come with a modifier that determines in which environments the function may be run. These modifiers are C macros that VTK‑m uses to instruct the compiler for which architectures to compile each method. Most user code outside of VTK‑m need not use these macros with the important exception of any classes passed to VTK‑m. This occurs when defining new worklets, array storage, and device adapters.

VTK‑m provides three modifier macros, VTKM_CONT, VTKM_EXEC, and VTKM_EXEC_CONT, which are used to declare functions and methods that can run in the control environment, execution environment, and both environments, respectively. These macros get defined by including just about any VTK‑m header file, but including vtkm/Types.h will ensure they are defined.

The modifier macro is placed after the template declaration, if there is one, and before the return type for the function. Here is a simple example of a function that will square a value. Since most types you would use this function on have operators in both the control and execution environments, the function is declared for both places.

Example 3.1 Usage of an environment modifier macro on a function.
1template<typename ValueType>
2VTKM_EXEC_CONT ValueType Square(const ValueType& inValue)
3{
4  return inValue * inValue;
5}

The primary function of the modifier macros is to inject compiler-specific keywords that specify what architecture to compile code for. For example, when compiling with CUDA, the control modifiers have __host__ in them and execution modifiers have __device__ in them.

It is sometimes the case that a function declared as VTKM_EXEC_CONT has to call a method declared as VTKM_EXEC or VTKM_CONT. Generally functions should not call other functions with incompatible control/execution modifiers, but sometimes a generic VTKM_EXEC_CONT function calls another function determined by the template parameters, and the valid environments of this subfunction may be inconsistent. For cases like this, you can use the VTKM_SUPPRESS_EXEC_WARNINGS to tell the compiler to ignore the inconsistency when resolving the template. When applied to a templated function or method, VTKM_SUPPRESS_EXEC_WARNINGS is placed before the template keyword. When applied to a non-templated method in a templated class, VTKM_SUPPRESS_EXEC_WARNINGS is placed before the environment modifier macro.

Example 3.2 Suppressing warnings about functions from mixed environments.
1VTKM_SUPPRESS_EXEC_WARNINGS
2template<typename Functor>
3VTKM_EXEC_CONT void OverlyComplicatedForLoop(Functor& functor, vtkm::Id numInterations)
4{
5  for (vtkm::Id index = 0; index < numInterations; index++)
6  {
7    functor();
8  }
9}