This guide will teach how to...
- download and install Apache NiFi,
- use the NiFi UI to create a simple flow,
- create a custom processor in IntelliJ IDEA using a project already created,
- inject the custom processor into NiFi and
- change the flow.
Practical requirements include...
- a Linux desktop and basic familiarity with its filesystem,
- installed Java 8 or later and basic familiarity with Java,
- IntelliJ IDEA Community Edition or better and basic familiarity with it,
- basic familiarity with the bash command line and
- very basic browsercraft.
Table of Contents
- Introduction
- Setting up Apache NiFi
- Creating a NiFi Flow
- Run the Flow
- Write your own custom processor
- Guided Tour, part 2
- Guided Tour, part 3
Apache NiFi is a software project designed to automate the flow of data between
software systems, leveraging the concept of extract, transform and
load (ETL). This software, originally developed by the NSA, was called
"Niagara Files," the metaphor being a multitude of data
(in files) flowing over a waterfall.
Apache NiFi offers myriad standard and special-purpose processors that broadly
accomplish nearly any process related to the extraction from, transformation of
or loading of data into other widely disparate systems (database, queue
management, etc.).
When what NiFi offers isn't enough, it's possible and reasonably easy to write
a custom processor to use in combination with NiFi's standard array. This is
done in Java, using any IDE, to generate the custom processor as a "NiFi
archive" or NAR (compare TAR, JAR, WAR, etc.).
To get started, download and set up Apache NiFi...
Setting up Apache NiFi
- Download and install NiFi 1.19.1 locally from
Apache NiFi Downloads.
Use the Apache NiFi Binary 1.19.1, which downloads as the artifact
nifi-1.19.1-bin.zip. Put this artifact someplace such as
/home/user/dev/nifi and unzip it there.
- Follow the instructions at
How to get NiFi to work (unsecurely) as before.... This will
obviate the need to set up users, certificates, etc.
- In the above, depending on port-number usage on your development host, you
will need to choose a port number that suits you. In the example, this is
9999; leave it the same or reassign it to a different (valid)
port number if you like, but you'll need to remember what it is for a
browser URL later.
- Edit this same file (conf/nifi.properties), find the line
containing nifi.nar.library.directory=./lib,
then add the following line after it:
nifi.nar.library.directory.custom=./custom-lib
- Once set up, launch NiFi using this command:
~/dev/nifi/nifi-1.19.1/bin $ ./nifi.sh start
- Launch a new tab in any browser to this address (substituting your port
number):
http://localhost:port-number/nifi
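The property edit and the browser address from the steps above can also be scripted. What follows is only a minimal sketch, not part of NiFi itself: the function names add_custom_nar_dir and nifi_url are made up for illustration, and it assumes the port you chose is stored under nifi.web.http.port in conf/nifi.properties (check the instructions you followed if yours differs).

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not NiFi commands.

# Add the custom NAR directory line right after the standard one.
# Does nothing if the custom line is already present, so it is safe to re-run.
# If the standard nifi.nar.library.directory line is missing, nothing is added.
add_custom_nar_dir() {
  local props="$1" tmp
  if grep -q '^nifi\.nar\.library\.directory\.custom=' "$props"; then
    return 0
  fi
  tmp="$(mktemp)"
  awk '
    { print }
    /^nifi\.nar\.library\.directory=/ {
      print "nifi.nar.library.directory.custom=./custom-lib"
    }
  ' "$props" > "$tmp"
  # Write back in place so the original file keeps its permissions.
  cat "$tmp" > "$props"
  rm -f "$tmp"
}

# Print the UI address, reading the HTTP port from nifi.properties.
# Assumption: the port lives under nifi.web.http.port.
nifi_url() {
  local props="$1" port
  port="$(sed -n 's/^nifi\.web\.http\.port=//p' "$props" | head -n 1)"
  echo "http://localhost:${port}/nifi"
}

# Possible usage, with the install location from this guide:
#   NIFI_HOME=~/dev/nifi/nifi-1.19.1
#   add_custom_nar_dir "$NIFI_HOME/conf/nifi.properties"
#   "$NIFI_HOME/bin/nifi.sh" start
#   nifi_url "$NIFI_HOME/conf/nifi.properties"
```

Editing the file by hand, as described in the steps, works just as well; the script is only handy if you set NiFi up repeatedly.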
Creating a NiFi Flow
- Minimize the Navigation and Operation palettes (because
they're useless to us in this exercise and take up real estate).
- In the toolbar at the top of the NiFi canvas, click the Processor icon (the
first tool icon from the left) and drag a new processor down onto the canvas.
A dialog will open. In the Filter edit field, type "GenerateFlowFile" then
click Add.
- Configure GenerateFlowFile by right-clicking on it, choosing
Configure, then...
- In the Settings tab, change the Yield Duration to
60 sec.
- In the Scheduling tab, ensure Run Schedule is
1 min.
- In the Properties tab, configure the following:
- Custom Text: Type in the text below using
SHIFT-ENTER to insert newlines:
This is a test of the Emergency Broadcast System. This is only a test.
The quick brown fox jumped over the lazy dog's back and got clean away.
- Important: leave all other property values at the defaults.
- Click the Apply button at the bottom right.
- Now create an instance of Wait on the canvas; there's no need to
configure it because its only job here is to stop the flow (any processor
would do as long as it is not started). It's just a placeholder.
- Hover over GenerateFlowFile, click the circled-arrow icon produced
(by the hover action) and drag it to the Wait processor. When a
dialog appears, ensure that the success checkbox is checked,
then click Add. An arc will appear connecting the two processors
with a queue in between.
Run the Flow
- Start, then immediately stop, GenerateFlowFile:
- Right-click the processor and choose Start.
- Right-click the processor and choose Stop.
- In an unoccupied portion of your canvas, right-click and choose
Refresh.
- You should see between GenerateFlowFile and Wait that
one flowfile is in the queue.
- Right-click the success queue and choose List queue.
- Observe, at the extreme right end of the (single) flowfile listed, that you
can:
- download the contents of the flowfile,
- examine its contents, or
- ponder its provenance, i.e., how it was created and where it's
been over its lifetime.
- (As you can imagine, provenance is a very useful tool
in debugging problems in the flow of files through NiFi.)
- To the extreme left of the (queue list) window that appears, click the
View Details control (a tiny dark circle with i in it).
In the resulting dialog, click the View button. This opens
a new browser tab displaying the contents of the flowfile you caused
GenerateFlowFile to create. Close the new browser tab.
- Back in the FlowFile dialog, notice that, by clicking the
Attributes tab, it's possible to inspect a flowfile's
attribute metadata. For now, click the OK button to close.
- Close the window listing the flowfiles in the queue (click the
X button at the upper right).
Write your own custom processor
This is part 2 of this guided tour. You won't need to write the processor code;
it will be more an exercise in setting up a NAR project in IntelliJ IDEA. (If
you prefer Eclipse, that can be done too, but there's no help forthcoming from
me for that; it's been a decade since I last used that IDE.)
Please see
Apache NiFi Guided Tour, part 2: Setting up a NiFi custom-processor
project.