Deploy Apache NiFi as a StatefulSet
Howto: Helm chart, rolling updates, TLS, persistence ...
OpenShift NiFi cluster installation
Our process is built on a Helm chart that was heavily modified later:
- cetic/helm-nifi repo; the chart consists of 3 subcharts and may be a little tricky to set up:
- zookeeper
- nifi-registry
- nifi-toolkit - for PKI
GIT REPOSITORY
BASE install
Fast forward ->
git clone
#check render
helm template nifi . -f values-oaz-dev.yaml -n nifi --output-dir render --debug
#install
helm install nifi . -n monolog-nifi -f values-oaz-dev.yaml --create-namespace
# to upgrade
helm upgrade nifi . -n monolog-nifi -f values-oaz-dev.yaml
# unable to validate against any security context constraint:
# [provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1001}:
# 1001 is not an allowed group spec.containers[0].securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1000690000, 1000699999]]
# ah, OpenShift: the namespace gets its UID range dynamically, and the default service account cannot run under UIDs other than those defined for the namespace
oc get ns nifi -o=jsonpath='{.metadata.annotations}'|jq
{
"openshift.io/description": "",
"openshift.io/display-name": "",
"openshift.io/requester": "system:serviceaccount:openshift-apiserver:openshift-apiserver-sa",
"openshift.io/sa.scc.mcs": "s0:c26,c20",
"openshift.io/sa.scc.supplemental-groups": "1000690000/10000",
"openshift.io/sa.scc.uid-range": "1000690000/10000"
}
Remove Helm chart
# uninstall helm chart
helm delete nifi
oc delete pvc -l app.kubernetes.io/instance=nifi
OK, we need an SCC with RunAsAny, something like this (short SCC overview):
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: nifi-scc provides all features of the restricted SCC but allows users to run with any UID and any GID.
  name: nifi-scc
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
priority: 10
readOnlyRootFilesystem: false
requiredDropCapabilities:
- MKNOD
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:nifi:default # the default service account in the nifi namespace
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
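The SCC can then be created from the manifest; a minimal sketch, assuming it is saved as nifi-scc.yaml (the filename is an example). Instead of listing the service account under users:, the binding can also be done with oc adm policy:
# create the SCC
oc apply -f nifi-scc.yaml
# alternative to the users: list in the manifest - grant the SCC to the default service account
oc adm policy add-scc-to-user nifi-scc -z default -n nifi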
# check that the SCC now lists the default service account
oc get scc nifi-scc -o=jsonpath='{.users}'
> ["system:serviceaccount:nifi:default"]
# check the pods for UID and fsGroup
oc get pod -o jsonpath='{range .items[*]}{@.metadata.name}{" runAsUser: "}{@.spec.containers[*].securityContext.runAsUser}{" fsGroup: "}{@.spec.securityContext.fsGroup}{" seLinuxOptions: "}{@.spec.securityContext.seLinuxOptions.level}{"\n"}{end}'
# security context can be defined in pod scope or container scope
oc get pod -o jsonpath='{range .items[*]}{@.metadata.name}{" runAsUser: "}{@.spec.containers[*].securityContext.runAsUser}{@.spec.securityContext.runAsUser}{" fsGroup: "}{@.spec.securityContext.fsGroup}{" seLinuxOptions: "}{@.spec.securityContext.seLinuxOptions.level}{"\n"}{end}'
# move on, but we still have a problem, now during image pulling
oc get pods
NAME READY STATUS RESTARTS AGE
nifi-zookeeper-0 0/1 ImagePullBackOff 0 25m
nifi-zookeeper-1 0/1 ImagePullBackOff 0 25m
nifi-zookeeper-2 0/1 ImagePullBackOff 0 25m
oc describe pods nifi-zookeeper-0|grep Failed
# OK, it seems that the public repository is not whitelisted in the container registry we are allowed to use (the only one)
Failed kubelet Failed to pull image "docker.io/bitnami/zookeeper:3.6.2-debian-10-r37": rpc error: code = Unknown desc =
(Mirrors also failed: [artifactory.sudlice.cz:443/docker-io/bitnami/zookeeper:3.6.2-debian-10-r37:
Error reading manifest 3.6.2-debian-10-r37 in artifactory.sudlice.cz:443/docker-io/bitnami/zookeeper:
manifest unknown: The named manifest is not known to the registry.]): docker.io/bitnami/zookeeper:3.6.2-debian-10-r37:
error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": Proxy authentication required
Failed kubelet Error: ErrImagePull
Failed kubelet Error: ImagePullBackOff
Mirror registry/local repository
As a workaround we will use the OpenShift internal Container Registry (local container registry) and, if needed, we will request that the repositories be added to the whitelist later.
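A hedged sketch of pushing an image into the internal registry from a workstation that can reach docker.io; it assumes podman is available and the registry's default route is exposed (project and tag follow the ones used in this namespace):
# look up the exposed route of the internal registry (assumes defaultRoute is enabled)
REGISTRY=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')
podman login -u "$(oc whoami)" -p "$(oc whoami -t)" "$REGISTRY"
# pull from docker.io locally, retag and push into the monolog-nifi project
podman pull docker.io/bitnami/zookeeper:3.7.0-debian-10-r40
podman tag docker.io/bitnami/zookeeper:3.7.0-debian-10-r40 "$REGISTRY/monolog-nifi/zookeeper:3.7.0-debian-10-r40"
podman push "$REGISTRY/monolog-nifi/zookeeper:3.7.0-debian-10-r40"
The push creates an ImageStream in the project, so pods can pull the image as image-registry.openshift-image-registry.svc:5000/monolog-nifi/zookeeper:3.7.0-debian-10-r40.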
Storage class for the Zookeeper StatefulSet
For the storage class, the managed-premium SC backed by Azure Disks will be used, but:
Azure Disks cannot be created with redundancy other than LRS. Pods with a PVC are therefore pinned by node affinity to a single zone. An Azure VM also has a limited number of data disks that can be attached as PVs. To make the cluster as highly available as possible, we will spread the topology across all three availability zones:
oc get nodes -L failure-domain.beta.kubernetes.io/zone|grep worker|awk '{print $1" "$3" "$6}'
oaz-dev-trkn8-worker-westeurope1-fxthw worker westeurope-1
oaz-dev-trkn8-worker-westeurope2-vg8lc worker westeurope-2
oaz-dev-trkn8-worker-westeurope3-wmqpz worker westeurope-3
# this feature is available from Kubernetes 1.19 on
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: "failure-domain.beta.kubernetes.io/zone"
  whenUnsatisfiable: "DoNotSchedule"
  labelSelector:
    matchLabels:
      app.kubernetes.io/name: zookeeper
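Once the pods are rescheduled, the spread can be verified by checking which node (and therefore zone) each Zookeeper pod landed on:
# pods with their nodes
oc get pods -l app.kubernetes.io/name=zookeeper -o wide
# nodes with their zone label
oc get nodes -L failure-domain.beta.kubernetes.io/zone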
Zookeeper STS rolling update
The Zookeeper chart uses an old version of Zookeeper. A rolling update fixes this: change the image in the Helm chart, render, and apply.
RollingUpdate is the default update strategy for an STS.
oc rollout status statefulset/nifi-zookeeper
statefulset rolling update complete 3 pods at revision nifi-zookeeper-64675fdd78...
# to record the change cause we can set an annotation
# add it to the Helm chart
oc annotate statefulsets.apps/nifi-zookeeper kubernetes.io/change-cause="zookeeper:3.7.0-debian-10-r40"
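The recorded revisions and their change causes can then be reviewed with:
# STS revision history including the change-cause annotation
oc rollout history statefulset/nifi-zookeeper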
NIFI CA deployment
Simple deployment with persistence; it publishes the CA endpoint and approves CSRs for users and the server infrastructure.
/bin/tls-toolkit.sh server
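The CA listens on port 9090 inside the namespace; a quick reachability check from any pod (the same check the registry init container below uses):
# verify the CA service answers TLS on port 9090
echo | openssl s_client -connect nifi-ca:9090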
Apache NIFI
Apache NiFi now runs secured. This means every node asks the CA for a certificate during boot, and the admin user with its certificate is created.
PERSISTENCE VOLUMES mapping
PV:
mountPath: /opt/nifi/data
mountPath: /opt/nifi/nifi-current/auth-conf/
mountPath: /opt/nifi/nifi-current/config-data
mountPath: /opt/nifi/flowfile_repository
mountPath: /opt/nifi/content_repository
mountPath: /opt/nifi/provenance_repository
mountPath: /opt/nifi/nifi-current/logs
CM:
mountPath: /opt/nifi/nifi-current/conf/bootstrap.conf
mountPath: /opt/nifi/nifi-current/conf/nifi.temp -> nifi.properties
mountPath: /opt/nifi/nifi-current/conf/authorizers.temp -> authorizers.xml
mountPath: /opt/nifi/nifi-current/conf/authorizers.empty
mountPath: /opt/nifi/nifi-current/conf/bootstrap-notification-services.xml
mountPath: /opt/nifi/nifi-current/conf/login-identity-providers.xml
mountPath: /opt/nifi/nifi-current/conf/state-management.xml
mountPath: /opt/nifi/nifi-current/conf/zookeeper.properties
mountPath: /opt/nifi/data/flow.xml
mountPath: /opt/nifi/nifi-current/config-data/certs/generate_user.sh
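The effective mapping can be cross-checked against the deployed StatefulSet, assuming the NiFi container is the first one in the pod spec:
# list volume name -> mountPath pairs of the first container
oc get statefulset nifi -o jsonpath='{range .spec.template.spec.containers[0].volumeMounts[*]}{.name}{" -> "}{.mountPath}{"\n"}{end}'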
AUTH
Node identities and the admin user are predefined (max 5 replicas):
/opt/nifi/nifi-current/conf/authorizers.xml
{{- range $i := until $nodeidentities }}
<property name="Node Identity {{ $i }}">CN={{ $fullname }}-{{ $i }}.{{ $fullname }}-headless.{{ $namespace }}.svc.cluster.local, OU=NIFI</property>
{{- end }}
-> based on this file, the initial users file is generated:
/opt/nifi/nifi-current/auth-conf/users.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tenants>
<groups/>
<users>
<user identifier="171c708a-7250-37ba-95b7-e3d52258fc8a" identity="CN=nifi-3.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
<user identifier="47c717db-75da-3d54-8ab3-1731497291c7" identity="CN=admin, OU=NIFI"/>
<user identifier="66afe269-10cc-37da-9785-3e72cbc609c8" identity="CN=nifi-2.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
<user identifier="5ac2302b-365e-3d9a-a24e-f17565d2ca08" identity="CN=nifi-0.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
<user identifier="f23a3051-d154-3f63-8674-fb8acb8a8030" identity="CN=nifi-4.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
<user identifier="802187fa-2f40-30b4-8554-c32b425ab945" identity="CN=nifi-1.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
</users>
</tenants>
Every identity with a certificate signed by the internal CA (the CA pod) can access the web UI (mutual TLS).
tls-toolkit.sh client \
-c "nifi-ca" \
-t domluveneheslokvulireplayattacku \
--subjectAlternativeNames routehostname \
-p 9090 \
-D "CN=$USER, OU=NIFI" \
-T PKCS12
To prevent certificates from being signed from other namespaces, a NetworkPolicy is deployed that restricts access to the ingress (route) and to the same namespace only.
In fact we struggled with validating the internal CA; normally we would use reencrypt at the router with a custom CA trust, but:
passthrough - This is currently the only method that can support requiring client certificates, also known as two-way authentication.
It means that in the browser our connection will be marked as “Not secure” because a self-signed CA is used.
Directly in NiFi, the users are shown in the Users dialog (screenshot omitted).
The RBAC model for the users is stored in
/opt/nifi/nifi-current/auth-conf/authorizations.xml
For a new user a CSR needs to be created and approved by the custom CA; afterwards the user is added manually in the NiFi UI and RBAC is defined.
The node in the lead will distribute the XML files across all other nodes.
Automatically generated certificates for the nodes and the admin are placed at /opt/nifi/nifi-current/config-data/certs.
For generating a new user certificate, the generate_user.sh script can be used.
ADMIN CERT
The admin CSR is generated automatically and signed by the CA during the NiFi bootstrap.
# copy locally and import into the browser
oc cp nifi-0:/opt/nifi/nifi-current/config-data/certs/admin/certAdmin.p12 ./certAdmin.p12
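The PKCS12 password is written by the tls-toolkit into a config.json next to the keystore (the registry reads the same file below); a hedged way to read it, assuming the NiFi container is named server and has jq available:
# print the admin keystore password (container name and jq availability are assumptions)
oc exec nifi-0 -c server -- jq -r .keyStorePassword /opt/nifi/nifi-current/config-data/certs/admin/config.json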
GENERATE new USER
A helper script is ready for use in the NiFi STS, DN="CN=$user, OU=NIFI":
oc rsh nifi-0
# create certificate
/opt/nifi/nifi-current/config-data/certs/generate_user.sh $user
# cert is created in
/opt/nifi/nifi-current/config-data/certs/$user/cert_$user.p12
# copy the cert locally and import it into the browser
oc cp nifi-0:/opt/nifi/nifi-current/config-data/certs/$user/cert_$user.p12 ./cert_${user}.p12
SCALING
We can scale up to 5 replicas because of the predefined identities. The problem is scaling down: the detached NiFi node stays in the disconnected state and has to be deleted manually (until then the cluster is effectively switched to read-only).
In a NiFi cluster, NiFi wants to ensure consistency across all nodes. You can’t have each node in a NiFi cluster running a different version/state of the flow.xml.gz file. In a cluster, NiFi replicates a request (such as stopping a processor) to all nodes. Since a disconnected node cannot receive that replication, and to protect the integrity of the cluster, the NiFi canvas is essentially read-only while a node is disconnected.
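A disconnected node can be removed through the NiFi REST API with the admin certificate; a hedged sketch where the route hostname, the PKCS12 password and the node id are placeholders:
# list cluster nodes and note the id of the disconnected one
curl -sk --cert-type P12 --cert ./certAdmin.p12:"$P12_PASSWORD" \
  https://<route-hostname>/nifi-api/controller/cluster
# delete the disconnected node by its id
curl -sk --cert-type P12 --cert ./certAdmin.p12:"$P12_PASSWORD" \
  -X DELETE https://<route-hostname>/nifi-api/controller/cluster/nodes/<node-id>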
APACHE NIFI registry STS
- Implementation of a Flow Registry for storing and managing versioned flows
- Integration with NiFi to allow storing, retrieving, and upgrading versioned flows from a Flow Registry
A simple STS running with 1 replica.
TLS CERTIFICATES for user access
Certificates will be generated with the tls-toolkit against the running CA:
- name: cert-request
  imagePullPolicy: "IfNotPresent"
  image: "apache/nifi-toolkit:1.13.2"
  command:
  - bash
  - -c
  - |
    CERT_PATH="/opt/nifi-registry/nifi-registry-current/certs"
    CA_ADDRESS="nifi-ca:9090"
    until echo "" | timeout -t 2 openssl s_client -connect "${CA_ADDRESS}"; do
      # checking whether the CA server (nifi-toolkit) is up
      echo "Waiting for CA to be available at ${CA_ADDRESS}"
      sleep 2
    done;
    # generate node cert function
    generate_node_cert() {
      ${NIFI_TOOLKIT_HOME}/bin/tls-toolkit.sh client \
        -c "nifi-ca" \
        -t sixteenCharacters \
        --subjectAlternativeNames "nifi-registry.apps.oshi43.sudlice.org,$(hostname -f)" \
        -D "CN=$(hostname -f), OU=NIFI" \
        -p 9090
    }
    cd ${CERT_PATH}
    # generate certs (reuse old certs if available)
    # 1. nifi-registry node cert
    if [ ! -f config.json ] || [ ! -f keystore.jks ] || [ ! -f truststore.jks ]; then
      rm -f *
      generate_node_cert
    fi
  volumeMounts:
  - name: "databaseflow-storage"
    mountPath: /opt/nifi-registry/nifi-registry-current/certs
    subPath: nifi-registry-current/certs
The node cert will be used later when the registry starts:
export_tls_values() {
CERT_PATH=/opt/nifi-registry/nifi-registry-current/certs
export AUTH=tls
export KEYSTORE_PATH=${CERT_PATH}/keystore.jks
export KEYSTORE_TYPE=jks
export KEYSTORE_PASSWORD=$(jq -r .keyStorePassword ${CERT_PATH}/config.json)
export KEY_PASSWORD=$KEYSTORE_PASSWORD
export TRUSTSTORE_PATH=${CERT_PATH}/truststore.jks
export TRUSTSTORE_TYPE=jks
export TRUSTSTORE_PASSWORD=$(jq -r .trustStorePassword ${CERT_PATH}/config.json)
export NIFI_REGISTRY_WEB_HTTPS_HOST=$(hostname -f)
export INITIAL_ADMIN_IDENTITY="CN=admin, OU=NIFI"
}
export_tls_values
${NIFI_REGISTRY_BASE_DIR}/scripts/start.sh
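A quick sanity check that the registry came up with TLS; the pod name and the default NiFi Registry HTTPS port 18443 are assumptions based on the chart defaults:
# forward the HTTPS port locally and probe the TLS handshake
oc port-forward nifi-registry-0 18443:18443 &
echo | openssl s_client -connect localhost:18443 | head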
The admin certificate generated during the NiFi bootstrap can be used.
<!-- flow.xml tweaking -->
<flowRegistry>
<id>{{ default uuidv4 }}</id>
<name>default</name>
<url>{{ template "registry.url" . }}</url>
<description/>
</flowRegistry>
The flow.xml file needs to be updated to reflect TLS.
KAFKACAT
Kafkacat is an ideal solution for quickly debugging Event Hub/Kafka brokers.
kafkacat git
Deployment
# quick deployment with a nodeSelector and toleration to schedule onto the right node
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kafkacataff
  name: kafkacataff
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafkacataff
  template:
    metadata:
      labels:
        app: kafkacataff
    spec:
      containers:
      # image mirrored into the local registry of the monolog-nifi namespace
      - image: image-registry.openshift-image-registry.svc:5000/monolog-nifi/kafkacat:1.6.0
        # - image: edenhill/kafkacat:1.6.0
        name: kafkacataff
        resources: {}
        command: ["/bin/sh", "-c", "--"]
        args: ["while true; do sleep 30; done;"]
      nodeSelector:
        node-role.kubernetes.io/logging: ""
      tolerations:
      - key: "node-role"
        operator: "Equal"
        value: "logging"
        effect: "NoSchedule"
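Kafkacat is then run interactively from inside the pod:
# open a remote shell in the kafkacat pod
oc rsh deployment/kafkacataff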
USE AGAINST EVENTHUB
A shared access signature with SASL PLAIN is used:
# list all event hubs (topics) in the Event Hub namespace
kafkacat \
-b monolog.servicebus.windows.net:9093 \
-X security.protocol=sasl_ssl \
-X sasl.mechanism=PLAIN \
-X sasl.username='$ConnectionString' \
-X sasl.password='Endpoint=sb://monolog.servicebus.windows.net/;SharedAccessKeyName=ack;SharedAccessKey=$key' \
-L
# produce events with /etc/motd as the source
kafkacat \
-P \
-b monolog.servicebus.windows.net:9093 \
-t 'log.ocp.oaz_dev_argo_test_in' \
-X security.protocol=sasl_ssl \
-X sasl.mechanism=PLAIN \
-X sasl.username='$ConnectionString' \
-X sasl.password='Endpoint=sb://monolog.servicebus.windows.net/;SharedAccessKeyName=ack;SharedAccessKey=$key' \
-p 0 /etc/motd
# consume events from eventhub test
kafkacat \
-C \
-b monolog.servicebus.windows.net:9093 \
-t 'test' \
-X security.protocol=sasl_ssl \
-X sasl.mechanism=PLAIN \
-X sasl.username='$ConnectionString' \
-X sasl.password='Endpoint=sb://monolog.servicebus.windows.net/;SharedAccessKeyName=ack;SharedAccessKey=$key'