Notes on Tuning Apache NiFi
Russell Bateman
10 April 2023
last update:
Random notes on tuning NiFi flows...
- Modernly, most discussions of flow tuning involve a NiFi cluster of nodes.
I speculate that adapting the observations to a single-node installation
is still useful. Setting up, managing and maintaining clustered NiFi is
not the easiest thing to do. Graduating from single-node NiFi to a
cluster is a quantum-sized leap.
- Use the NiFi Summary UI to identify the number of threads used per
processor. This is reached from the General (hamburger) menu,
Summary. The summary is global (doesn't change scope just
because you're currently down inside a process group). It provides tabs
of summaries for (or by)
- Processors
- Input Ports
- Output Ports
- Remote Process Groups
- Connections (relationship arcs)
- Process Groups
- How to find, in the UI, the figures cited in discussions of tuning
  (some of this works differently depending on whether or not NiFi is
  clustered):
- thread-pool size
—set under the General (hamburger) menu → Controller Settings →
General tab, Maximum Timer Driven Thread Count
- active-thread count
—given as the leftmost figure on the second banner (top-level
UI)
- available cores
—see below in system diagnostics
- core load average
—see below in system diagnostics
- number of concurrent tasks
—set in
Configure Processor → Scheduling → Concurrent Tasks
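These figures can also be pulled programmatically from the NiFi REST API. Below is a minimal sketch in Python (standard library only) against a hypothetical, unsecured node; the endpoints are standard in NiFi 1.x, but verify the base URL and the JSON field names against your installation's REST API documentation.

    # Minimal sketch: pulling the tuning figures above from the NiFi REST API.
    # Assumes a single, unsecured node at http://localhost:8080 (hypothetical);
    # the JSON field names are as observed in NiFi 1.x and should be verified.
    import json
    import urllib.request

    BASE = "http://localhost:8080/nifi-api"

    def fetch(path):
        """GET a NiFi REST endpoint and decode the JSON response."""
        with urllib.request.urlopen(BASE + path) as response:
            return json.load(response)

    # Active thread count (the leftmost figure in the UI's status banner).
    status = fetch("/flow/status")
    print("active threads:", status["controllerStatus"]["activeThreadCount"])

    # Available cores and core load average (the System Diagnostics dialog).
    snapshot = fetch("/system-diagnostics")["systemDiagnostics"]["aggregateSnapshot"]
    print("available cores:", snapshot.get("availableProcessors"))
    print("core load average:", snapshot.get("processorLoadAverage"))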
- Cloudera: Tuning your data flow, published 2019-06-26, last
updated 2022-12-13. This document discusses
- The size of flowfiles, large or small, can influence how to tune
flows. There is nothing built-in to isolate flowfiles by size; most
often, tuning should be done in a global way to ensure no side effects
between data flows.
- The first and primary bottleneck in NiFi is disk I/O. Distribute
especially the flowfile content (the content repository), the flowfile
metadata (the flowfile repository) and, to a lesser degree, the
provenance repository to dedicated disk volumes; a sketch of the
relevant nifi.properties entries follows. Obviously, SSDs will be far
more performant than spindles.
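A sketch of the corresponding nifi.properties entries, assuming hypothetical dedicated mounts under /mnt:

    # nifi.properties (sketch): spread the repositories over dedicated volumes.
    # The property names are the standard ones; the /mnt/... mount points are
    # hypothetical placeholders for your dedicated disks.
    nifi.flowfile.repository.directory=/mnt/flowfile-repo/flowfile_repository
    nifi.content.repository.directory.default=/mnt/content-repo/content_repository
    nifi.provenance.repository.directory.default=/mnt/provenance-repo/provenance_repository
    nifi.database.directory=/mnt/database-repo/database_repository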
- Viewing various values in the flow
- Number of threads (in a cluster)
- Number of active threads
- Number of cores
- Configuring thread-pool size
- Concurrent tasks
- Run duration
- Recommendations
- Base the thread-pool size on the number of cores.
- Do not increase the thread-pool size unless the number of active
threads is observed to remain equal to the maximum number of
available threads. (For a three-node cluster with eight cores per
node, do not increase the per-node setting from 24 unless the active
thread count, displayed in the UI, is frequently seen to be around
72.)
- If the count of active threads is frequently equal to
the maximum number of available threads, review the core load
average on the nodes.
- If the core load average is below 80% of the number of cores while
the count of active threads is at its maximum, slightly increase the
thread-pool size. Increase it to (n+1) times the number of cores,
where n is the current multiplier (for example, from three times to
four times the number of cores).
Keep the load average around 80% of the number of cores so that,
should a node be lost, resources remain available on the remaining
nodes to absorb the additional work.
- For the number of concurrent tasks, if
- there is back-pressure somewhere in the workflow,
- the load average is low,
- the count of active threads is not at the maximum,
consider increasing the number of concurrent tasks where the
processors are not processing enough data and are causing
back-pressure.
Increase the number of concurrent tasks iteratively by only 1
each time while monitoring
- how things are evolving globally, in particular, how the
thread pool is shared across all the workflows of the
cluster,
- the load average, and
- the active thread count.
If a processor (the count of its active threads is displayed on its
right side) has active threads yet is not processing data as fast as
expected while the load average on the server is low, the issue could
be related to disk I/O. Check the I/O statistics on the disks hosting
the NiFi repositories (content, flowfile and
provenance).
- It is a good practice to start by setting the Timer Driven Thread Pool
size to three times the number of cores available on your NiFi nodes
(see the sketch below). For example, if you have nodes with eight cores,
you would set this value to 24.
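A minimal sketch, in Python, of the sizing arithmetic above; the function names are mine, while the three-times-cores starting point and the 80% load-average ceiling come from the guidance quoted here.

    # Sketch (names are mine) of the thread-pool sizing heuristic described above:
    # start at three times the core count, grow only when the pool is saturated
    # while the CPUs still have headroom (load average below 80% of the cores).
    def starting_pool_size(cores_per_node, multiplier=3):
        """Initial Maximum Timer Driven Thread Count for a node."""
        return multiplier * cores_per_node

    def should_grow_pool(active_threads, max_threads, load_average, cores):
        """Grow the pool only when threads are saturated but CPU has headroom."""
        saturated = active_threads >= max_threads
        headroom = load_average < 0.8 * cores
        return saturated and headroom

    print(starting_pool_size(8))               # 24, as in the eight-core example
    # Per node: 24 of 24 threads busy, load average 5.0 on 8 cores (under 6.4): grow.
    print(should_grow_pool(24, 24, 5.0, 8))    # True
    # Threads not saturated: leave the pool size alone.
    print(should_grow_pool(10, 24, 5.0, 8))    # False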
- Concurrent Tasks increases how many flowfiles are processed by a
single processor by using system resources that then are not usable by
other Processors. This provides a relative weighting of
processors—it controls how much of the system’s resources should be
allocated to this processor instead of other processors.
This field is available for most processors. There are, however, some
types of processors that can only be scheduled with a single concurrent
task. It is also worth noting that increasing this value for a processor
may increase thread contention and, as a result, setting this to a large
value can harm productivity.
As a best practice, you should not generally set this value to anything
greater than 12.
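The concurrent-task guidance can be sketched the same way (again, names are mine; the conditions and the ceiling of 12 are taken from the text above).

    # Sketch (names are mine) of the concurrent-task rule above: add one task at
    # a time only while back-pressure exists, the load average is low and the
    # thread pool is not saturated, and never beyond the best-practice cap of 12.
    def should_add_concurrent_task(current_tasks, backpressure_present,
                                   load_average, cores,
                                   active_threads, max_threads):
        if current_tasks >= 12:
            return False
        cpu_has_headroom = load_average < 0.8 * cores
        pool_has_spare_threads = active_threads < max_threads
        return backpressure_present and cpu_has_headroom and pool_has_spare_threads

    # A back-pressured processor on an 8-core node with spare threads and low load:
    print(should_add_concurrent_task(2, True, 3.0, 8, 10, 24))   # True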
- NiFi is optimized to support flowfiles of any size. This is achieved by
never materializing the file into memory directly. Instead, NiFi uses
input and output streams to process events (there are a few exceptions
with some specific processors). This means that NiFi does not require
significant memory even if it is processing very large files. Most of the
memory on the system should be left available for the OS cache. By having
a large enough OS cache, many of the disk reads are skipped completely.
Consequently, unless NiFi is used for very specific memory-oriented data
flows, setting the Java heap to 8 or 16 GB is usually sufficient (see the
bootstrap.conf sketch below).
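For reference, the heap is set in conf/bootstrap.conf; a sketch for an 8 GB heap follows, with the caveat that the argument numbering should be checked against your own file.

    # conf/bootstrap.conf (sketch): a fixed 8 GB heap, leaving the rest of RAM
    # to the OS cache as described above. The java.arg numbering follows the
    # stock file but may differ in yours; adjust rather than duplicate arguments.
    java.arg.2=-Xms8g
    java.arg.3=-Xmx8g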
The performance to expect depends directly on the hardware and the flow
design. For example, when reading compressed data from a cloud object
store, decompressing the data, filtering it based on specific values,
compressing the filtered data, and sending it to a cloud object store,
the following results can reasonably be expected:
Nodes | Data rate per second | Events per second | Data rate per day | Events per day
    1 |             192.5 MB |           946,000 |           16.6 TB |   81.7 billion
   25 |               5.8 GB |        26 million |            501 TB |  2.25 trillion
  150 |              32.6 GB |     141.3 million |           2.75 PB |  12.2 trillion
The characteristics of the above include (per node):
- 32 cores
- 15 GB RAM
- 2 GB heap
- 1 TB persistent SSD (400 MB/s write, 1200 MB/s read)
At a minimum (per node):
- 4 cores
- 6 disks dedicated to the repositories (including mirroring)*
- 4 GB heap
- 1 TB persistent SSD (400 MB/s write, 1200 MB/s read)
See the full exposé by Mark Payne here.
* Why 6 disks dedicated to the NiFi repositories? Because one
disk and one mirror for each of content, flowfile and
provenance. In that order of priority and return on investment.
Leave out the database (4th) repository? It's only the user data
(keys, who's logged in, etc.) plus a database to track configuration
changes made in the UI.
Single-node NiFi differences...
From the NiFi UI of a single-node installation, there is no menu choice,
Cluster, under the General (hamburger) menu. The equivalent figures are
obtained under General → Summary, then clicking
system diagnostics at the lower right of the window. This yields tabs
for
- JVM
    Heap (71.0%)                 Non-heap
      Max:   512 MB                Max:   -1 bytes
      Total: 512 MB                Total: 261.2 MB
      Used:  362.02 MB             Used:  245.48 MB
      Free:  149.98 MB             Free:  15.72 MB
    Garbage Collection
      G1 Old Generation:   0 times (00:00:00.000)
      G1 Young Generation: 792 times (00:00:01.546)
    Runtime
      Uptime: 87:14:57.964
- System
    Available Cores:    16
    Core Load Average:  20.02 (for the last minute)
- Version
    NiFi
      NiFi Version     1.13.2
      Tag              nifi-1.13.2-RC1
      Build Date/Time  03/17/2021 20:45:53 MDT
      Branch           UNKNOWN
      Revision         174938e
    Java
      Version          11.0.10
      Vendor           AdoptOpenJDK
    Operating System
      Name             Linux
      Version          5.4.0-146-generic
      Architecture     amd64
Home-grown, practical, remedial hypotheses to test
- Make processors that process next to nothing by weight get all the cycles
they want.
- UpdateAttribute
- RouteOnAttribute
- Weigh the heavy processing in the balance instead: bridle the leisure of
the big guns (heavy processors) by applying back-pressure, etc.