Invoke an LLM using LangChain4J - PART 2: Kubernetes

In this guide, you will learn how to run, on Kubernetes, the Docker Image we built in Invoke an LLM using LangChain4J - PART 1: Container Image.

Prerequisites

To complete this guide, you need:

- the my-jaxrs-app-llm:latest Docker Image built in Invoke an LLM using LangChain4J - PART 1: Container Image
- a running Kubernetes cluster (we use minikube in this guide) and the kubectl command line utility
- Podman (or Docker) to tag and push the Docker Image
- an account on an Image Registry that your Kubernetes cluster can reach (we use quay.io in this guide)

LLM

We will essentially repeat what we did in WildFly Java Microservice - PART 2: Kubernetes, with a few changes; before that, though, we will deploy smollm2 on Kubernetes using the Ollama container.

You can choose any LLM you like: we chose smollm2 because it is small, which makes it less likely that minikube will complain about its size (more on this later on).

Ollama + smollm2

Ollama deployment

Create a file named ollama-deployment.yaml with the following content:

ollama-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
      - name: ollama
        image: ollama/ollama:latest
        ports:
        - name: http
          containerPort: 11434
          protocol: TCP
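
Note
with this Deployment, the model you are about to pull is stored in the container's filesystem and is lost whenever the Pod is recreated. Optionally, to avoid pulling it again, you can mount a volume at /root/.ollama, the directory where Ollama stores its models; here is a minimal sketch using an emptyDir volume (use a PersistentVolumeClaim instead if the model must survive Pod deletion):

spec:
  template:
    spec:
      containers:
      - name: ollama
        # ... image and ports as above ...
        volumeMounts:
        - name: ollama-models
          mountPath: /root/.ollama  # Ollama stores pulled models here
      volumes:
      - name: ollama-models
        emptyDir: {}  # survives container restarts, but not Pod deletion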

Apply the Deployment configuration to Kubernetes:

kubectl apply -f ollama-deployment.yaml
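
You can watch the rollout and wait for the Ollama Pod to become ready:

kubectl rollout status deployment/ollama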

Ollama service

Create a file named ollama-service.yaml with the following content:

ollama-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
  labels:
    app: ollama
spec:
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
  selector:
    app: ollama

Apply the Service configuration to Kubernetes:

kubectl apply -f ollama-service.yaml
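
To double-check that the Service exists and exposes port 11434, run:

kubectl get service ollama-service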

smollm2

Now find the name of your running Ollama Pod:

$ kubectl get pods
NAME                                          READY   STATUS    RESTARTS   AGE
ollama-777d6c546-hmsps                        1/1     Running   0          39s

and use it to open a shell into the running container; once connected, pull smollm2:

$ kubectl exec --stdin --tty ollama-777d6c546-hmsps -- /bin/bash
root@ollama-777d6c546-hmsps:/# ollama pull smollm2
pulling manifest
pulling 4d2396b16114... 100% ▕████████████████▏ 1.8 GB
pulling fbacade46b4d... 100% ▕████████████████▏   68 B
pulling dfebd0343bdd... 100% ▕████████████████▏ 1.8 KB
pulling 58d1e17ffe51... 100% ▕████████████████▏  11 KB
pulling f02dd72bb242... 100% ▕████████████████▏   59 B
pulling 6c6b9193c417... 100% ▕████████████████▏  559 B
verifying sha256 digest
writing manifest
success

After smollm2 has been pulled, you can exit the shell.
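
Alternatively, you can skip the interactive shell and run the pull in a single command (kubectl resolves deploy/ollama to one of the Deployment's Pods):

kubectl exec deploy/ollama -- ollama pull smollm2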

Note
if you are using minikube and want to test whether Ollama + smollm2 is working on Kubernetes, run kubectl port-forward svc/ollama-service 11434:11434, then open http://localhost:11434/ in your browser: you should see "Ollama is running"
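
While the port-forward is running, you can also list the models Ollama has pulled via its REST API; smollm2 should appear in the returned JSON:

curl http://localhost:11434/api/tags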

Image Registry

To make the my-jaxrs-app-llm:latest Docker Image available to Kubernetes, you need to push it to an Image Registry that is accessible from the Kubernetes cluster you want to use.

Quay.io

There are many options to achieve this; in this guide, we will push the my-jaxrs-app-llm:latest Docker Image to the quay.io Image Registry.

Create a public repository named my-jaxrs-app-llm on quay.io (e.g. https://quay.io/repository/tborgato/my-jaxrs-app-llm).

Note
replace tborgato with the name of your account in all the commands that follow

Tag the Docker image:

podman tag my-jaxrs-app-llm quay.io/tborgato/my-jaxrs-app-llm

Push the my-jaxrs-app-llm Docker Image to it:

podman push quay.io/tborgato/my-jaxrs-app-llm
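
If the push is rejected with an authentication error, log in to quay.io first and retry:

podman login quay.io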

At this point, the my-jaxrs-app-llm:latest Docker Image should be publicly available and consumable by any Kubernetes cluster; you can verify this by running:

podman pull quay.io/tborgato/my-jaxrs-app-llm

Deploy to Kubernetes

To deploy our my-jaxrs-app-llm Docker Image on minikube, create a file named deployment-my-jaxrs-app-llm.yaml (see kubernetes deployment) in the same directory as the Dockerfile and the pom.xml file, with the following content:

deployment-my-jaxrs-app-llm.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-jaxrs-app-llm-deployment
  labels:
    app: my-jaxrs-app-llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-jaxrs-app-llm
  template:
    metadata:
      labels:
        app: my-jaxrs-app-llm
    spec:
      containers:
      - name: my-jaxrs-app-llm
        image: quay.io/tborgato/my-jaxrs-app-llm
        ports:
        - containerPort: 8080
        - containerPort: 9990
        livenessProbe:
          httpGet:
            path: /health/live
            port: 9990
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 9990
        startupProbe:
          httpGet:
            path: /health/started
            port: 9990
        env:
        - name: OLLAMA_CHAT_URL
          value: 'http://ollama-service:11434'
        - name: OLLAMA_CHAT_MODEL_NAME
          value: 'smollm2'
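
The two environment variables at the bottom tell the application where to find the model: OLLAMA_CHAT_URL points at the ollama-service Service we created earlier (Kubernetes resolves the Service name through cluster DNS), and OLLAMA_CHAT_MODEL_NAME selects the model to chat with. For reference, here is a minimal sketch of how an application might turn these variables into a LangChain4J chat model; it assumes the langchain4j-ollama dependency and a LangChain4J version where the chat model exposes generate(String) (the application from PART 1 may wire this differently, e.g. through MicroProfile Config):

import dev.langchain4j.model.ollama.OllamaChatModel;

public class OllamaClientSketch {
    public static void main(String[] args) {
        // connection settings injected by the Kubernetes Deployment
        String baseUrl = System.getenv("OLLAMA_CHAT_URL");          // http://ollama-service:11434
        String modelName = System.getenv("OLLAMA_CHAT_MODEL_NAME"); // smollm2

        OllamaChatModel model = OllamaChatModel.builder()
                .baseUrl(baseUrl)
                .modelName(modelName)
                .build();

        // send a prompt to the model and print the answer
        System.out.println(model.generate("Say hello to Tommaso"));
    }
}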

Apply the Deployment configuration to Kubernetes:

kubectl apply -f deployment-my-jaxrs-app-llm.yaml
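
Wait until the my-jaxrs-app-llm Pod is ready (the startup and readiness probes must pass first):

kubectl get pods -l app=my-jaxrs-app-llm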

Since we are using minikube as our Kubernetes cluster, we expose the deployment as a NodePort service:

kubectl expose deployment.apps/my-jaxrs-app-llm-deployment --type=NodePort --port=8080
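
Kubernetes assigns the Service a port in the NodePort range (30000-32767 by default); you can see which one was chosen with:

kubectl get service my-jaxrs-app-llm-deployment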

Check the application

Find out which IP address and port minikube is exposing your service on:

$ minikube service my-jaxrs-app-llm-deployment --url
http://192.168.39.178:30781

And set the following variable:

export DEPLOYMENT_URL=http://192.168.39.178:30781
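
Alternatively, you can capture the URL in a single step:

export DEPLOYMENT_URL=$(minikube service my-jaxrs-app-llm-deployment --url)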

Now, invoke the application endpoint using curl:

curl $DEPLOYMENT_URL/api/tommaso
AiMessage { text = "Ciao Tommaso! Nice to meet you! How are you doing today?" toolExecutionRequests = null }

Alternatively, open the $DEPLOYMENT_URL/api/tommaso URL in your browser.

Note
if you get "dev.langchain4j.exception.HttpException: {"error":"model requires more system memory (3.9 GiB) than is available (3.6 GiB)"}", give minikube more memory: delete the cluster and re-create it with "minikube start --memory 7000" (the memory setting only takes effect when the cluster is created); this should give minikube enough memory to run smollm2

What’s next?
