Many of ImageMagick's internal algorithms are threaded to take advantage of the speed-ups offered by multicore processors and OpenMP. OpenMP is an API specification for parallel programming. If your compiler supports OpenMP directives (e.g. gcc, Visual Studio 2005), ImageMagick automatically includes support. To verify, look for the OpenMP feature in the output of this command:
$ magick identify -version
Version: ImageMagick 7.0.10-50 2021-01-04 Q16 https://im.awm.jp
Copyright: © 1999-2021 ImageMagick Studio LLC
Features: OpenMP(4.5)
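To run the same check from a script, a minimal Python sketch follows. It assumes the version output contains a Features: line like the one in the sample above. The function names has_openmp and installed_magick_has_openmp are hypothetical helpers, not part of ImageMagick.

```python
import subprocess


def has_openmp(version_text: str) -> bool:
    """Return True if a 'Features:' line in the version output mentions OpenMP."""
    for line in version_text.splitlines():
        if line.strip().startswith("Features:"):
            return "OpenMP" in line
    return False  # no Features: line found


def installed_magick_has_openmp() -> bool:
    """Run `magick identify -version` and inspect it (requires magick on PATH)."""
    result = subprocess.run(
        ["magick", "identify", "-version"],
        capture_output=True, text=True, check=True,
    )
    return has_openmp(result.stdout)
```

The text-parsing step is kept separate from the subprocess call so that it can be exercised on captured output without ImageMagick installed.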
With OpenMP enabled, most ImageMagick algorithms execute on all the cores on your system in parallel. ImageMagick typically divides the work so that each thread processes 64 rows of pixels at a time. As rows are completed, OpenMP assigns more chunks of pixel rows to each thread until the algorithm completes. For example, if you resize an image on a quad-core system, the resizing takes place on 4 cores (8 if hyperthreading is enabled).
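The chunked scheduling described above can be illustrated with a small Python sketch. This is not ImageMagick's code: it uses a thread pool in place of OpenMP, a toy "invert" operation in place of a real algorithm, and the names row_chunks, process_rows, and process_image are hypothetical. Python threads will not show the same speed-up as native OpenMP threads, so treat it as a picture of the work distribution only.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_ROWS = 64  # rows per unit of work, the figure quoted in the text


def row_chunks(height: int, chunk_rows: int = CHUNK_ROWS):
    """Yield (start, stop) row ranges that together cover `height` rows."""
    for start in range(0, height, chunk_rows):
        yield start, min(start + chunk_rows, height)


def process_rows(image, start: int, stop: int) -> None:
    """Toy per-row operation: invert 8-bit values in rows [start, stop)."""
    for y in range(start, stop):
        image[y] = [255 - p for p in image[y]]


def process_image(image, threads: int = 4):
    """Queue every chunk; an idle worker picks up the next pending chunk."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(process_rows, image, a, b)
                   for a, b in row_chunks(len(image))]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return image
```

Because each chunk touches a disjoint set of rows, the workers need no locking on the pixel data, which is the property that makes row-based splitting attractive.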
You can further increase performance by reducing lock contention with the tcmalloc memory allocation library. To enable it, add --with-tcmalloc to the configure command line when you build ImageMagick.
The Perils of Parallel Execution
It can be difficult to predict behavior in a parallel environment. Performance might depend on a number of factors including the compiler, the version of the OpenMP library, the processor type, the number of cores, the amount of memory, whether hyperthreading is enabled, the mix of applications that are executing concurrently with ImageMagick, or the particular image-processing algorithm you utilize. The only way to be certain of the optimal performance, in terms of the number of threads, is to benchmark. ImageMagick includes progressive threading when benchmarking a command and returns the elapsed time and efficiency for one or more threads. This can help you identify how many threads are the most efficient in your environment. Here is an example benchmark for threads 1-8:
$ magick -bench 40 model.png -sharpen 0x1 null:
Performance[1]: 40i 0.712ips 1.000e 14.000u 0:14.040
Performance[2]: 40i 1.362ips 0.657e 14.550u 0:07.340
Performance[3]: 40i 2.033ips 0.741e 14.530u 0:04.920
Performance[4]: 40i 2.667ips 0.789e 14.590u 0:03.750
Performance[5]: 40i 3.236ips 0.820e 14.970u 0:03.090
Performance[6]: 40i 3.802ips 0.842e 15.280u 0:02.630
Performance[7]: 40i 4.274ips 0.857e 15.540u 0:02.340
Performance[8]: 40i 4.831ips 0.872e 15.680u 0:02.070
Better performance correlates with higher values of IPS (iterations per second). In our example, 8 threads give the highest throughput. However, in certain cases it might be optimal to set the number of threads to 1 (e.g. -limit thread 1) or to disable OpenMP completely. To disable OpenMP, add --disable-openmp to your configure script command line, then rebuild and reinstall ImageMagick.
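When comparing many benchmark runs, it can help to pick the thread count with the highest IPS programmatically. The following Python sketch parses Performance[...] lines in the format of the example benchmark. The helper names parse_bench and best_thread_count are hypothetical, and reading the last field as minutes:seconds of elapsed time is an assumption based on the sample values.

```python
import re

# Matches entries such as: Performance[2]: 40i 1.362ips 0.657e 14.550u 0:07.340
LINE_RE = re.compile(
    r"Performance\[(\d+)\]:\s+(\d+)i\s+([\d.]+)ips\s+([\d.]+)e"
    r"\s+([\d.]+)u\s+(\d+):([\d.]+)"
)


def parse_bench(text: str):
    """Return one dict per Performance[...] entry found in the -bench output."""
    results = []
    for m in LINE_RE.finditer(text):
        results.append({
            "threads": int(m.group(1)),
            "iterations": int(m.group(2)),
            "ips": float(m.group(3)),
            "efficiency": float(m.group(4)),
            # Assumption: the last field is elapsed time as minutes:seconds.
            "elapsed_s": int(m.group(6)) * 60 + float(m.group(7)),
        })
    return results


def best_thread_count(results) -> int:
    """Return the thread count of the entry with the highest IPS."""
    return max(results, key=lambda r: r["ips"])["threads"]
```

Applied to the example benchmark, best_thread_count selects 8, the entry with 4.831 IPS. The chosen value could then be passed to -limit thread on later runs.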