StarPU Handbook
The behavior of the StarPU library and tools may be tuned thanks to the following configure options.
Enable checking that spinlocks are taken and released properly.
Increase the verbosity of the debugging messages. This can be disabled at runtime by setting the environment variable STARPU_SILENT to any value. --enable-verbose=extra increases the verbosity even more.
$ STARPU_SILENT=1 ./vector_scal
Specify that tests and examples should be run on a smaller data set, i.e. allowing a faster execution time.
Enable some exhaustive checks which take a really long time.
Specify that hwloc should be used by StarPU. hwloc should be found by means of the tool pkg-config.
prefix
Specify that hwloc should be used by StarPU. hwloc should be found in the directory specified by prefix.
Disable the creation of the documentation. This should be done on a machine which does not have the tools doxygen and latex (plus the packages latex-xcolor and texlive-latex-extra).
By default, only the HTML documentation is generated. Use this option to also enable the generation of the PDF documentation. This should be done on a machine which does have the tools doxygen and latex (plus the packages latex-xcolor and texlive-latex-extra).
Enable the compilation of specific ICC examples. StarPU itself will not be compiled with ICC unless specified with CC=icc.
Disable the usage of the ICC compiler. Otherwise, when an ICC compiler is found, some specific ICC examples are compiled as explained above.
Specify flags which will be given to the C, CXX and Fortran compilers when valid.
Additionally, the script configure recognizes many variables, which can be listed by typing ./configure --help. For example, ./configure NVCCFLAGS="-arch sm_20" adds a flag for the compilation of CUDA kernels, and NVCC_CC=gcc-5 changes the C++ compiler used by nvcc.
By default, StarPU keeps CPU workers awake permanently, for better reactivity. This option makes StarPU put CPU workers to real sleep when there are not enough tasks to compute.
If blocking drivers are enabled, enable callbacks to notify an external resource manager about workers going to sleep and waking up.
count
Use at most count CPU cores. This information is then available as the macro STARPU_MAXCPUS.
The default value is auto. It allows StarPU to automatically detect the number of CPUs on the build machine. This should not be used if the running host has a larger number of CPUs than the build machine.
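The caveat above can be checked before deploying a build on another machine. The following is a minimal sketch and not a StarPU tool: it compares a CPU limit recorded at build time with the CPU count of the running host. The function name check_cpu_limit and the way the build-time limit is passed in are assumptions made for illustration; getconf _NPROCESSORS_ONLN is a common, though not universal, way to count online CPUs.

```shell
# Hypothetical helper (not part of StarPU): warn when the running host has
# more CPUs than the limit that was detected on the build machine.
#   $1: CPU limit compiled into StarPU (the value behind STARPU_MAXCPUS)
#   $2: CPU count of the running host
check_cpu_limit() {
    if [ "$2" -gt "$1" ]; then
        echo "warning: host has $2 CPUs but the build allows at most $1"
        return 1
    fi
    echo "ok: $2 CPUs fit within the build limit of $1"
}

# Count the CPUs currently online; fall back to 1 if getconf is unavailable.
host_cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Example: pretend the build machine had exactly as many CPUs as this host.
check_cpu_limit "$host_cpus" "$host_cpus"
```

When the check fails, giving an explicit count at configure time instead of relying on auto is the safer choice.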
count
Use at most count NUMA nodes. This information is then available as the macro STARPU_MAXNUMANODES.
Disable the use of CPUs of the machine. Only GPUs etc. will be used.
count
Use at most count CUDA devices. This information is then available as the macro STARPU_MAXCUDADEVS.
Disable the use of CUDA, even if a valid CUDA installation was detected.
prefix
Search for CUDA under prefix, which should notably contain the file include/cuda.h.
dir
Search for CUDA headers under dir, which should notably contain the file cuda.h. This defaults to /include appended to the value given to --with-cuda-dir.
dir
Search for CUDA libraries under dir, which should notably contain the CUDA shared libraries, e.g. libcuda.so. This defaults to /lib appended to the value given to --with-cuda-dir.
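The defaulting rule for the header and library directories can be written out as follows. This is only a sketch of the rule as stated above, not the logic of the configure script itself; the prefix /usr/local/cuda and the variable names are assumptions chosen for the example.

```shell
# Sketch: derive the default header and library directories from a prefix,
# following the rule described above.
# cuda_prefix is an assumed example location, not a StarPU default.
cuda_prefix=/usr/local/cuda

# Defaults: /include and /lib appended to the prefix.
cuda_include_dir="${cuda_prefix}/include"
cuda_lib_dir="${cuda_prefix}/lib"

echo "headers:   $cuda_include_dir (should contain cuda.h)"
echo "libraries: $cuda_lib_dir (should contain e.g. libcuda.so)"
```

If an installation keeps its libraries somewhere other than the derived directory, the library directory has to be given explicitly rather than left to the default.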
count
Use at most count OpenCL devices. This information is then available as the macro STARPU_MAXOPENCLDEVS.
prefix
Search for an OpenCL implementation under prefix, which should notably contain include/CL/cl.h (or include/OpenCL/cl.h on Mac OS).
dir
Search for OpenCL headers under dir, which should notably contain CL/cl.h (or OpenCL/cl.h on Mac OS). This defaults to /include appended to the value given to --with-opencl-dir.
dir
Search for an OpenCL library under dir, which should notably contain the OpenCL shared libraries, e.g. libOpenCL.so. This defaults to /lib appended to the value given to --with-opencl-dir.
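Before pointing configure at an OpenCL installation, it can help to confirm which of the two header layouts mentioned above is present. The helper below is a hypothetical sketch, not part of StarPU; the function name find_opencl_header and the prefix /usr are assumptions made for illustration.

```shell
# Hypothetical helper (not part of StarPU): report which OpenCL header layout
# exists under a given include directory.
# Prints CL/cl.h or OpenCL/cl.h (the Mac OS layout); fails if neither exists.
find_opencl_header() {
    for candidate in CL/cl.h OpenCL/cl.h; do
        if [ -f "$1/$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Example: check the default include directory derived from a prefix.
# /usr is an assumed prefix chosen for illustration.
opencl_prefix=/usr
find_opencl_header "${opencl_prefix}/include" \
    || echo "no OpenCL header under ${opencl_prefix}/include"
```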
Enable considering the provided OpenCL implementation as a simulator, i.e. use the kernel duration returned by OpenCL profiling information as wallclock time instead of the actual measured real time. This requires the SimGrid support.
count
Allow for at most count codelet implementations for the same target device. This information is then available as the macro STARPU_MAXIMPLEMENTATIONS.
count
Allow for at most count scheduling contexts. This information is then available as the macro STARPU_NMAX_SCHED_CTXS.
Disable asynchronous copies between CPU and GPU devices. The AMD implementation of OpenCL is known to fail when copying data asynchronously. When using this implementation, it is therefore necessary to disable asynchronous data transfers.
Disable asynchronous copies between CPU and OpenCL devices. The AMD implementation of OpenCL is known to fail when copying data asynchronously. When using this implementation, it is therefore necessary to disable asynchronous data transfers.
Disable asynchronous copies between CPU and MPI Slave devices.
count
Use at most count memory nodes. This information is then available as the macro STARPU_MAXNODES. Reducing it considerably reduces the memory used by StarPU data structures.
Disable the build of libstarpumpi. By default, it is enabled when MPI is found.
Enable the build of libstarpumpi. This is necessary when using SimGrid+MPI.
path
Use the compiler mpicc at path for StarPU-MPI (MPI Support).
Before performing any MPI communication, StarPU-MPI waits for the data to be available in the main memory of the node submitting the request. For send communications, data is acquired with the mode STARPU_R. When the pedantic mode is enabled, data is instead acquired with the mode STARPU_RW, which ensures that at most one MPI_Isend call accesses the data at a time and that StarPU tasks do not read from it during the communication.
Enable the MPI Master-Slave support. By default, it is disabled.
Create one thread per MPI Slave on the MPI master to manage communications.
Increase the verbosity of the MPI debugging messages. This can be disabled at runtime by setting the environment variable STARPU_SILENT to any value. --enable-mpi-verbose=extra increases the verbosity even more.
$ STARPU_SILENT=1 mpirun -np 2 ./insert_task
Enable the NewMadeleine implementation for StarPU-MPI. See Using the NewMadeleine communication library for more details.
Disable the Fortran extension. By default, it is enabled when a Fortran compiler is found.
Disable the SOCL extension (SOCL OpenCL Extensions). By default, it is enabled when an OpenCL implementation is found.
Specify the directory of the COI library for MIC support. The default value is /opt/intel/mic/coi.
Specify the precise MIC architecture host identifier. The default value is x86_64-k1om-linux.
Enable OpenMP Support (The StarPU OpenMP Runtime Support (SORS)).
Enable cluster Support (Clustering A Machine).
Enable additional trace events which describe lock behaviour. This is however extremely heavy and should only be enabled when debugging the internals of StarPU.
Define the maximum number of buffers that tasks will be able to take as parameters, then available as the macro STARPU_NMAXBUFS.
Enable the use of a data allocation cache to avoid the cost of data allocation with CUDA. Still experimental.
Enable the use of OpenGL for the rendering of some examples.
prefix
Specify the BLAS library to be used by some of the examples. Libraries available:
path
path
Disable the build of libstarpufft, even if fftw or cuFFT is available.
Enable the compilation and the execution of the libstarpufft examples. By default, they are neither compiled nor checked.
prefix
Search for FxT under prefix. FxT (http://savannah.nongnu.org/projects/fkt) is used to generate traces of scheduling events, which can then be rendered using ViTE (Off-line Performance Feedback). prefix should notably contain include/fxt/fxt.h.
dir
Store performance models under dir, instead of the current user's home.
prefix
Search for GotoBLAS under prefix, which should notably contain libgoto.so or libgoto2.so.
prefix
Search for ATLAS under prefix, which should notably contain include/cblas.h.
cflags
ldflags
Use ldflags when linking code that uses the MKL library. Note that the MKL website (http://software.intel.com/en-us/articles/intel-mkl-link-line-advisor/) provides a script to determine the linking flags.
Enable the Scheduling Context Hypervisor plugin (Scheduling Context Hypervisor). By default, it is disabled.
Enable memory statistics (Memory Feedback).
Enable simulation of execution in SimGrid, to allow easy experimentation with various numbers of cores and GPUs, or amount of memory, etc. Experimental.
The path to SimGrid can be specified through the SIMGRID_CFLAGS and SIMGRID_LIBS environment variables, for instance:
export SIMGRID_CFLAGS="-I/usr/local/simgrid/include"
export SIMGRID_LIBS="-L/usr/local/simgrid/lib -lsimgrid"
Similar to the option --enable-simgrid, but also allows specifying the location of the SimGrid library.
Similar to the option --enable-simgrid, but also allows specifying the location of the SimGrid include directory.
Similar to the option --enable-simgrid, but also allows specifying the location of the SimGrid lib directory.
path
Enable the Model Checker in simulation of execution in SimGrid, to allow exploring various execution paths.
Set the maximum authorized percentage of deviation for the history-based calibrator of StarPU. A valid value of this parameter must be in [0..100]. The default value is 10. Experimental.
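A build script that passes this percentage on could validate it first. The helper below is a hypothetical sketch, not part of StarPU, and it assumes the value is given as a whole number; the text above does not say whether fractional percentages are accepted.

```shell
# Hypothetical helper (not part of StarPU): check that a deviation percentage
# is a whole number in the range [0..100] described above.
is_valid_deviation() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 100 ]
}

deviation=10   # the documented default
if is_valid_deviation "$deviation"; then
    echo "deviation ${deviation}% is within [0..100]"
else
    echo "deviation ${deviation}% is out of range"
fi
```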
Enable multiple linear regression models (see Performance Model Example).
Make multiple linear regression models use the system-provided BLAS for dgels (see Performance Model Example).