Deploy Apache NiFi as StatefulSet

How-to, Helm chart, rolling update, TLS, persistence ...

Content
*file: 01-zookeper.md *

OpenShift NiFi cluster installation

Our process is built on a Helm chart and heavily modified afterwards:


  • zookeeper
  • nifi-registry
  • nifi-toolkit - for PKI

GIT REPOSITORY

APACHE NIFI GITHUB REPOSITORY

BASE install

Fast forward ->


git clone
# check render
helm template nifi . -f values-oaz-dev.yaml -n nifi --output-dir render --debug

# install
helm install nifi . -n monolog-nifi -f values-oaz-dev.yaml --create-namespace

# to upgrade
helm upgrade nifi . -n monolog-nifi -f values-oaz-dev.yaml

# unable to validate against any security context constraint:
# [provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{1001}:
# 1001 is not an allowed group spec.containers[0].securityContext.runAsUser: Invalid value: 1001: must be in the ranges: [1000690000, 1000699999]]

# ah, OpenShift: each namespace gets a UID range assigned dynamically, and the default service account cannot run at UIDs other than those defined for the namespace
oc get ns nifi -o=jsonpath='{.metadata.annotations}'|jq
{
  "openshift.io/description": "",
  "openshift.io/display-name": "",
  "openshift.io/requester": "system:serviceaccount:openshift-apiserver:openshift-apiserver-sa",
  "openshift.io/sa.scc.mcs": "s0:c26,c20",
  "openshift.io/sa.scc.supplemental-groups": "1000690000/10000",
  "openshift.io/sa.scc.uid-range": "1000690000/10000"
}
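A hedged alternative to a custom SCC is to push the chart's security context into the allowed range shown above; the value paths below are assumptions and depend on the chart version:

# illustrative only: value keys depend on the chart being used
helm upgrade nifi . -n monolog-nifi -f values-oaz-dev.yaml \
  --set securityContext.runAsUser=1000690000 \
  --set securityContext.fsGroup=1000690000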

Remove Helm chart

# uninstall helm chart
helm delete nifi
oc delete pvc -l app.kubernetes.io/instance=nifi

OK, we need an SCC with RunAsAny, something like this (short SCC overview):

apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  annotations:
    kubernetes.io/description: nifi-scc provides all features of the restricted SCC but
      allows users to run with any UID and any GID.
  name: nifi-scc
allowHostDirVolumePlugin: false
allowHostIPC: false
allowHostNetwork: false
allowHostPID: false
allowHostPorts: false
allowPrivilegeEscalation: true
allowPrivilegedContainer: false
allowedCapabilities: null
defaultAddCapabilities: null
fsGroup:
  type: RunAsAny
groups: []
priority: 10
readOnlyRootFilesystem: false
requiredDropCapabilities:
- MKNOD
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
users:
- system:serviceaccount:nifi:default #define user
volumes:
- configMap
- downwardAPI
- emptyDir
- persistentVolumeClaim
- projected
- secret
# ok, the SCC for the default service account is created; verify:
oc get scc nifi-scc -o=jsonpath='{.users}'
> ["system:serviceaccount:nifi:default"]

# check the pods for UID and fsGroup
oc get pod -o jsonpath='{range .items[*]}{@.metadata.name}{" runAsUser: "}{@.spec.containers[*].securityContext.runAsUser}{" fsGroup: "}{@.spec.securityContext.fsGroup}{" seLinuxOptions: "}{@.spec.securityContext.seLinuxOptions.level}{"\n"}{end}'
# security context can be defined in pod scope or container scope
oc get pod -o jsonpath='{range .items[*]}{@.metadata.name}{" runAsUser: "}{@.spec.containers[*].securityContext.runAsUser}{@.spec.securityContext.runAsUser}{" fsGroup: "}{@.spec.securityContext.fsGroup}{" seLinuxOptions: "}{@.spec.securityContext.seLinuxOptions.level}{"\n"}{end}'

# move on, but we still have a problem, now during image pulling
oc get pods

NAME               READY   STATUS                  RESTARTS   AGE
nifi-zookeeper-0   0/1     ImagePullBackOff        0          25m
nifi-zookeeper-1   0/1     ImagePullBackOff        0          25m
nifi-zookeeper-2   0/1     ImagePullBackOff        0          25m

oc describe pods nifi-zookeeper-0|grep Failed
# ok, it seems that the public repository is not whitelisted in the container registry mirror (the only one we can use)
Failed    kubelet   Failed to pull image "docker.io/bitnami/zookeeper:3.6.2-debian-10-r37": rpc error: code = Unknown desc =
                    (Mirrors also failed: [artifactory.sudlice.cz:443/docker-io/bitnami/zookeeper:3.6.2-debian-10-r37:
                    Error reading manifest 3.6.2-debian-10-r37 in artifactory.sudlice.cz:443/docker-io/bitnami/zookeeper:
                    manifest unknown: The named manifest is not known to the registry.]): docker.io/bitnami/zookeeper:3.6.2-debian-10-r37:
                    error pinging docker registry registry-1.docker.io: Get "https://registry-1.docker.io/v2/": Proxy authentication required
Failed    kubelet   Error: ErrImagePull
Failed    kubelet   Error: ImagePullBackOff

Mirror registry/local repository

As a workaround we will use the OpenShift internal container registry (local container registry) and then, if needed, request adding the repositories to the whitelist.
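A hedged sketch of getting the Zookeeper image into the internal registry (it assumes the default registry route is exposed and an imagestream in the monolog-nifi namespace):

# illustrative: log in to the exposed internal registry with the current token
REGISTRY=$(oc get route default-route -n openshift-image-registry -o jsonpath='{.spec.host}')
podman login -u $(oc whoami) -p $(oc whoami -t) ${REGISTRY}
# pull, retag and push the image that failed to mirror
podman pull docker.io/bitnami/zookeeper:3.6.2-debian-10-r37
podman tag docker.io/bitnami/zookeeper:3.6.2-debian-10-r37 ${REGISTRY}/monolog-nifi/zookeeper:3.6.2-debian-10-r37
podman push ${REGISTRY}/monolog-nifi/zookeeper:3.6.2-debian-10-r37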

SC for the Zookeeper StatefulSet

For the storage class, the managed-premium SC backed by Azure Disks will be used, but:

Azure Disks cannot be created with redundancy other than LRS, so pods with a PVC are pinned to a single zone via nodeAffinity. Azure VMs also have a limited number of data disks that can be attached as PVs. To make the setup as highly available as possible, we spread the topology across all three availability zones:

oc get nodes -L failure-domain.beta.kubernetes.io/zone|grep worker|awk '{print $1" "$3" "$6}'

oaz-dev-trkn8-worker-westeurope1-fxthw worker westeurope-1
oaz-dev-trkn8-worker-westeurope2-vg8lc worker westeurope-2
oaz-dev-trkn8-worker-westeurope3-wmqpz worker westeurope-3
# this feature is available since Kubernetes 1.19
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: "failure-domain.beta.kubernetes.io/zone"
        whenUnsatisfiable: "DoNotSchedule"
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: zookeeper
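To verify the spread, list the Zookeeper pods together with the nodes they were scheduled to (the label matches the selector above); each availability zone should end up with one pod:

# one zookeeper pod should land in each zone
oc get pods -l app.kubernetes.io/name=zookeeper -o wide
oc get nodes -L failure-domain.beta.kubernetes.io/zone | grep worker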

Zookeeper STS rolling update

The Zookeeper chart uses an old Zookeeper version. A rolling update does the job: change the image in the Helm chart, render and apply.

Rolling update is the default update strategy for an STS.

oc rollout status statefulset/nifi-zookeeper
statefulset rolling update complete 3 pods at revision nifi-zookeeper-64675fdd78...
# to record the change-cause we can set an annotation
# add it to the Helm chart
oc annotate statefulsets.apps/nifi-zookeeper kubernetes.io/change-cause="zookeeper:3.7.0-debian-10-r40"
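For completeness, a hedged imperative sketch of the same update (the container name "zookeeper" and the target tag are assumptions that must match the chart):

# illustrative: bump the image directly and follow the rollout
oc set image statefulset/nifi-zookeeper zookeeper=docker.io/bitnami/zookeeper:3.7.0-debian-10-r40
oc rollout status statefulset/nifi-zookeeper
oc rollout history statefulset/nifi-zookeeper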
*file: 02-nifi-ca-repository.md *

NIFI CA deployment

A simple Deployment with persistence; it publishes the CA endpoint and approves CSRs for users and the server infrastructure.

/bin/tls-toolkit.sh server
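A hedged sketch of the full server invocation (flags per tls-toolkit's server mode; the token is a shared secret that clients must present, and the port must match the published CA endpoint):

# illustrative: run the CA server, signing CSRs that present the shared token
${NIFI_TOOLKIT_HOME}/bin/tls-toolkit.sh server \
  -c nifi-ca \
  -t sixteenCharacters \
  -p 9090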
*file: 03-apache-nifi.md *

Apache NIFI

Apache NiFi now runs secured. This means every node asks the CA for a certificate during boot, and the admin user and its certificate are created.

PERSISTENCE VOLUMES mapping

PV:
mountPath: /opt/nifi/data
mountPath: /opt/nifi/nifi-current/auth-conf/
mountPath: /opt/nifi/nifi-current/config-data
mountPath: /opt/nifi/flowfile_repository
mountPath: /opt/nifi/content_repository
mountPath: /opt/nifi/provenance_repository
mountPath: /opt/nifi/nifi-current/logs
CM:
mountPath: /opt/nifi/nifi-current/conf/bootstrap.conf
mountPath: /opt/nifi/nifi-current/conf/nifi.temp -> nifi.properties
mountPath: /opt/nifi/nifi-current/conf/authorizers.temp -> authorizers.xml
mountPath: /opt/nifi/nifi-current/conf/authorizers.empty
mountPath: /opt/nifi/nifi-current/conf/bootstrap-notification-services.xml
mountPath: /opt/nifi/nifi-current/conf/login-identity-providers.xml
mountPath: /opt/nifi/nifi-current/conf/state-management.xml
mountPath: /opt/nifi/nifi-current/conf/zookeeper.properties
mountPath: /opt/nifi/data/flow.xml
mountPath: /opt/nifi/nifi-current/config-data/certs/generate_user.sh

AUTH

Nodes and the admin user are predefined (max 5 replicas):

/opt/nifi/nifi-current/conf/authorizers.xml

        {{- range $i := until $nodeidentities }}
        <property name="Node Identity {{ $i }}">CN={{ $fullname }}-{{ $i }}.{{ $fullname }}-headless.{{ $namespace }}.svc.cluster.local, OU=NIFI</property>
        {{- end }}

-> based on this file, the initial users file is adjusted:

/opt/nifi/nifi-current/auth-conf/users.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<tenants>
    <groups/>
    <users>
        <user identifier="171c708a-7250-37ba-95b7-e3d52258fc8a" identity="CN=nifi-3.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
        <user identifier="47c717db-75da-3d54-8ab3-1731497291c7" identity="CN=admin, OU=NIFI"/>
        <user identifier="66afe269-10cc-37da-9785-3e72cbc609c8" identity="CN=nifi-2.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
        <user identifier="5ac2302b-365e-3d9a-a24e-f17565d2ca08" identity="CN=nifi-0.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
        <user identifier="f23a3051-d154-3f63-8674-fb8acb8a8030" identity="CN=nifi-4.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
        <user identifier="802187fa-2f40-30b4-8554-c32b425ab945" identity="CN=nifi-1.nifi-headless.nifi.svc.cluster.local, OU=NIFI"/>
    </users>
</tenants>

Every identity with a certificate signed by the internal CA (the CA pod) can access the web UI (mutual TLS).

tls-toolkit.sh client \
  -c "nifi-ca" \
  -t domluveneheslokvulireplayattacku \
  --subjectAlternativeNames routehostname \
  -p 9090 \
  -D "CN=$USER, OU=NIFI" \
  -T PKCS12

To prevent certificates from being signed from other namespaces, a network policy is deployed that allows access only from the ingress (route) or from within the same namespace.
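A hedged sketch of such a policy (the pod label is illustrative; OpenShift labels the ingress namespace with network.openshift.io/policy-group=ingress):

# illustrative: allow the CA port only from the same namespace or from the router
oc apply -n nifi -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nifi-ca-restrict
spec:
  podSelector:
    matchLabels:
      app: nifi-ca
  ingress:
  - from:
    - podSelector: {}
    - namespaceSelector:
        matchLabels:
          network.openshift.io/policy-group: ingress
    ports:
    - protocol: TCP
      port: 9090
EOF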

In fact we struggled with validating the internal CA. Normally we would use reencrypt at the router with a custom CA trust, but:
passthrough - This is currently the only method that can support requiring client certificates, also known as two-way authentication.
It means that in the browser the connection will be marked as "Not secure", because a self-signed CA is used.
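A hedged example of creating the passthrough route (service and port names depend on the chart and are assumptions here):

# illustrative: TLS terminates at the NiFi pod, so client certificates still reach NiFi
oc create route passthrough nifi --service=nifi --port=https -n nifi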

In the NiFi UI these identities show up in the Users management view.

The RBAC model for users is kept in:

/opt/nifi/nifi-current/auth-conf/authorizations.xml

For a new user, a CSR needs to be created and approved by the custom CA; afterwards the user is added manually in the NiFi UI and the RBAC is defined. The leading node distributes the XML files across all other nodes.
Automatically generated certificates for the nodes and the admin live in /opt/nifi/nifi-current/config-data/certs.
For generating a new user cert, the generate_user.sh script can be used.

ADMIN CERT

The admin CSR is generated automatically and signed by the CA during the NiFi bootstrap.

# copy locally and import into the browser
oc cp nifi-0:/opt/nifi/nifi-current/config-data/certs/admin/certAdmin.p12 ./certAdmin.p12

GENERATE new USER

A helper script is ready for use in the NiFi STS, DN="CN=$user, OU=NIFI"

oc rsh nifi-0
# create certificate 
/opt/nifi/nifi-current/config-data/certs/generate_user.sh  $user
# cert is created in 
/opt/nifi/nifi-current/config-data/certs/$user/cert_$user.p12
# copy cert to local and import it to browser
oc cp nifi-0:/opt/nifi/nifi-current/config-data/certs/$user/cert_$user.p12 ./cert_${user}.p12

SCALING

We can scale up to 5 replicas because of the predefined identities. The problem is scaling down: a disconnected NiFi node stays in the disconnected state and has to be removed manually (and the cluster is effectively switched to read-only until then).
In a NiFi cluster, NiFi wants to ensure consistency across all nodes. You can't have each node in a NiFi cluster running a different version/state of the flow.xml.gz file. In a cluster, NiFi replicates a request (such as stop processor x) to all nodes. Since a disconnected node cannot receive that replication, the NiFi canvas is essentially read-only while a node is disconnected, to protect the integrity of the cluster.
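A hedged sketch of scaling down and then removing the leftover disconnected node through the REST API (host, HTTPS port and certificate paths are illustrative and must match the cluster setup):

# scale the statefulset down by one
oc scale statefulset/nifi --replicas=4
# list cluster nodes to find the id of the disconnected one (mutual TLS, so pass the admin cert)
curl -sk --cert ./admin.pem --key ./admin.key \
  https://nifi-0.nifi-headless.nifi.svc.cluster.local:8443/nifi-api/controller/cluster
# delete the disconnected node by its id
curl -sk -X DELETE --cert ./admin.pem --key ./admin.key \
  https://nifi-0.nifi-headless.nifi.svc.cluster.local:8443/nifi-api/controller/cluster/nodes/<node-id>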

*file: 04-apache-nifi-registry.md *

APACHE NIFI registry STS

  • Implementation of a Flow Registry for storing and managing versioned flows
  • Integration with NiFi to allow storing, retrieving, and upgrading versioned flows from a Flow Registry

A simple STS running with 1 replica.

TLS CERTIFICATES for user access

Certificates are generated with the tls-toolkit against the running CA:

        - name: cert-request
          imagePullPolicy: "IfNotPresent"
          image: "apache/nifi-toolkit:1.13.2"
          command:
            - bash
            - -c
            - |
              CERT_PATH="/opt/nifi-registry/nifi-registry-current/certs"
              CA_ADDRESS="nifi-ca:9090"
              until echo "" | timeout -t 2 openssl s_client -connect "${CA_ADDRESS}"; do
                # check whether the CA server (nifi-toolkit) is up yet
                echo "Waiting for CA to be available at ${CA_ADDRESS}"
                sleep 2
              done;
              # generate node cert function
              generate_node_cert() {
               ${NIFI_TOOLKIT_HOME}/bin/tls-toolkit.sh client \
                -c "nifi-ca" \
                -t sixteenCharacters \
                --subjectAlternativeNames "nifi-registry.apps.oshi43.sudlice.org,$(hostname -f)" \
                -D "CN=$(hostname -f), OU=NIFI" \
                -p 9090
                }
              cd ${CERT_PATH}
              #certs generating (reuse old certs if available)
              # 1. nifi-registry node cert
              if [ ! -f config.json ] || [ ! -f keystore.jks ] || [ ! -f truststore.jks ];then 
                rm -f *
                generate_node_cert
              fi              
          volumeMounts:
            - name: "databaseflow-storage"
              mountPath: /opt/nifi-registry/nifi-registry-current/certs
              subPath: nifi-registry-current/certs

The node cert is used later when starting the registry:

              export_tls_values() {
                CERT_PATH=/opt/nifi-registry/nifi-registry-current/certs
                export AUTH=tls
                export KEYSTORE_PATH=${CERT_PATH}/keystore.jks
                export KEYSTORE_TYPE=jks
                export KEYSTORE_PASSWORD=$(jq -r .keyStorePassword ${CERT_PATH}/config.json)
                export KEY_PASSWORD=$KEYSTORE_PASSWORD
                export TRUSTSTORE_PATH=${CERT_PATH}/truststore.jks
                export TRUSTSTORE_TYPE=jks
                export TRUSTSTORE_PASSWORD=$(jq -r .trustStorePassword ${CERT_PATH}/config.json)
                export NIFI_REGISTRY_WEB_HTTPS_HOST=$(hostname -f)
                export INITIAL_ADMIN_IDENTITY="CN=admin, OU=NIFI"
              }
              export_tls_values
              ${NIFI_REGISTRY_BASE_DIR}/scripts/start.sh

The admin certificate generated during the NiFi bootstrap can be used.

<!-- flow.xml tweaking -->
        <flowRegistry>
            <id>{{ default uuidv4 }}</id>
            <name>default</name>
            <url>{{ template "registry.url" . }}</url>
            <description/>
        </flowRegistry>

The flow.xml file needs to be updated to reflect TLS.
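A hedged illustration of that change (the registry hostname and the default registry ports 18080/18443 are assumptions):

# illustrative: switch the registry client URL in flow.xml from http to https
sed -i 's|<url>http://nifi-registry:18080</url>|<url>https://nifi-registry:18443</url>|' /opt/nifi/data/flow.xml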

*file: 05-kafka-eventhub-debug.md *

KAFKACAT

Kafkacat is an ideal tool for quickly debugging Event Hubs/Kafka brokers.
kafkacat git

Deployment

# quick deployment with a nodeSelector and toleration to pick the right node
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: kafkacataff
  name: kafkacataff
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafkacataff
  template:
    metadata:
      labels:
        app: kafkacataff
    spec:
      containers:
      # local redhat repository for monolog-nifi namespace
      - image: image-registry.openshift-image-registry.svc:5000/monolog-nifi/kafkacat:1.6.0
      #- image: edenhill/kafkacat:1.6.0
        name: kafkacataff
        resources: {}
        command: ["/bin/sh", "-c", "--"]
        args: ["while true; do sleep 30; done;"]
      nodeSelector:
        node-role.kubernetes.io/logging: ""
      tolerations:
      - key: "node-role"
        operator: "Equal"
        value: "logging"
        effect: "NoSchedule"

USE AGAINST EVENTHUB

SASL with a shared access signature (connection string) is used:

# list metadata (all event hubs/topics) in the Event Hubs namespace
 kafkacat \
-b monolog.servicebus.windows.net:9093 \
-X security.protocol=sasl_ssl \
-X sasl.mechanism=PLAIN \
-X sasl.username='$ConnectionString' \
-X sasl.password='Endpoint=sb://monolog.servicebus.windows.net/;SharedAccessKeyName=ack;SharedAccessKey=$key' \
-L
# produce an event with /etc/motd as the source
kafkacat \
-P \
-b monolog.servicebus.windows.net:9093 \
-t 'log.ocp.oaz_dev_argo_test_in' \
-X security.protocol=sasl_ssl \
-X sasl.mechanism=PLAIN \
-X sasl.username='$ConnectionString' \
-X sasl.password='Endpoint=sb://monolog.servicebus.windows.net/;SharedAccessKeyName=ack;SharedAccessKey=$key' \
-p 0 /etc/motd
# consume events from eventhub test
kafkacat \
-C \
-b monolog.servicebus.windows.net:9093 \
-t 'test' \
-X security.protocol=sasl_ssl \
-X sasl.mechanism=PLAIN \
-X sasl.username='$ConnectionString' \
-X sasl.password='Endpoint=sb://monolog.servicebus.windows.net/;SharedAccessKeyName=ack;SharedAccessKey=$key'