Invoke an LLM using LangChain4J - PART 2: Kubernetes
In this guide, you will learn how to run, on Kubernetes, the Docker Image we built in Invoke an LLM using LangChain4J - PART 1: Container Image.
Prerequisites
To complete this guide, you need:
LLM
We will essentially repeat everything we did in WildFly Java Microservice - PART 2: Kubernetes, with a few changes; before that, though, we will deploy smollm2
on Kubernetes using the Ollama container.
You can choose any LLM you like: we chose smollm2
because it is small, so there is less chance that minikube complains about its size (more on this later on).
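If your cluster is not running yet, a quick sanity check (assuming you use minikube, as in the rest of this guide) confirms that Kubernetes is reachable before you start deploying:
# verify that the cluster is up and the Kubernetes API is reachable
minikube status
kubectl get nodes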
Ollama + smollm2
Ollama deployment
Create a file named ollama-deployment.yaml
with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ollama
spec:
  selector:
    matchLabels:
      app: ollama
  template:
    metadata:
      labels:
        app: ollama
    spec:
      containers:
        - name: ollama
          image: ollama/ollama:latest
          ports:
            - name: http
              containerPort: 11434
              protocol: TCP
Apply the Deployment configuration to Kubernetes:
kubectl apply -f ollama-deployment.yaml
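Optionally, you can wait for the Ollama Pod to become ready before moving on; for example:
# wait until the Ollama Deployment has finished rolling out
kubectl rollout status deployment/ollama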
Ollama service
Create a file named ollama-service.yaml
with the following content:
apiVersion: v1
kind: Service
metadata:
  name: ollama-service
  labels:
    app: ollama
spec:
  ports:
    - protocol: TCP
      port: 11434
      targetPort: 11434
  selector:
    app: ollama
Apply the Service configuration to Kubernetes:
kubectl apply -f ollama-service.yaml
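To double-check that the Service was created and is backed by the Ollama Pod, you could run:
# confirm the Service exists and has an endpoint behind it
kubectl get svc ollama-service
kubectl get endpoints ollama-service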
smollm2
Now find the name of your running Ollama Pod:
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
ollama-777d6c546-hmsps 1/1 Running 0 39s
and use it to get a shell to the running container; once connected, pull smollm2:
$ kubectl exec --stdin --tty ollama-777d6c546-hmsps -- /bin/bash
root@ollama-777d6c546-hmsps:/# ollama pull smollm2
pulling manifest
pulling 4d2396b16114… 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.8 GB
pulling fbacade46b4d… 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 68 B
pulling dfebd0343bdd… 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 1.8 KB
pulling 58d1e17ffe51… 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 11 KB
pulling f02dd72bb242… 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 59 B
pulling 6c6b9193c417… 100% ▕██████████████████████████████████████████████████████████████████████████████████████████████████████▏ 559 B
verifying sha256 digest
writing manifest
success
After smollm2
has been pulled, you can exit the shell.
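Alternatively, you could run the pull as a single non-interactive command (the Pod name below is the one from the example output above; yours will differ):
# pull smollm2 without opening an interactive shell
kubectl exec ollama-777d6c546-hmsps -- ollama pull smollm2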
Note: if you are using minikube, and you want to test if Ollama + smollm2 on Kubernetes is working, run kubectl port-forward svc/ollama-service 11434:11434, then open http://0.0.0.0:11434/ in your browser and you should see "Ollama is running".
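While the port-forward mentioned in the note is active, you could also query the Ollama HTTP API directly; for example, listing the local models should show smollm2 once the pull has completed:
# list the models known to the Ollama instance running on Kubernetes
curl http://localhost:11434/api/tags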
Image Registry
To make the my-jaxrs-app-llm:latest
Docker Image available to Kubernetes, you need to push it to an Image Registry that is accessible from the Kubernetes cluster you want to use.
Quay.io
There are many options to achieve this; in this guide, we will push the my-jaxrs-app-llm:latest
Docker Image to the quay.io Image Registry.
Create a public repository named my-jaxrs-app-llm
on quay.io (e.g. https://quay.io/repository/tborgato/my-jaxrs-app-llm).
Note: replace tborgato with the name of your account in all the commands that follow.
Tag the Docker image:
podman tag my-jaxrs-app-llm quay.io/tborgato/my-jaxrs-app-llm
Push the my-jaxrs-app-llm
Docker Image to it:
podman push quay.io/tborgato/my-jaxrs-app-llm
At this point, the my-jaxrs-app-llm:latest
Docker Image should be publicly available and consumable by any Kubernetes cluster; you can verify this by running:
podman pull quay.io/tborgato/my-jaxrs-app-llm
Deploy to Kubernetes
To deploy our my-jaxrs-app-llm
Docker Image on minikube, create a file named deployment-my-jaxrs-app-llm.yaml
(see kubernetes deployment) in the same directory as the Dockerfile
and the pom.xml
file, with the following content:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-jaxrs-app-llm-deployment
  labels:
    app: my-jaxrs-app-llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-jaxrs-app-llm
  template:
    metadata:
      labels:
        app: my-jaxrs-app-llm
    spec:
      containers:
        - name: my-jaxrs-app-llm
          image: quay.io/tborgato/my-jaxrs-app-llm
          ports:
            - containerPort: 8080
            - containerPort: 9990
          livenessProbe:
            httpGet:
              path: /health/live
              port: 9990
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 9990
          startupProbe:
            httpGet:
              path: /health/started
              port: 9990
          env:
            - name: OLLAMA_CHAT_URL
              value: 'http://ollama-service:11434'
            - name: OLLAMA_CHAT_MODEL_NAME
              value: 'smollm2'
Apply the Deployment configuration to Kubernetes:
kubectl apply -f deployment-my-jaxrs-app-llm.yaml
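Before exposing the application, it may be worth checking that the Pod starts correctly and that the health probes pass; the Pod status and the server log are usually enough:
# check the application Pod and inspect the WildFly server log
kubectl get pods -l app=my-jaxrs-app-llm
kubectl logs deployment/my-jaxrs-app-llm-deployment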
We used minikube as our Kubernetes cluster, hence we expose the deployment as a NodePort service:
kubectl expose deployment.apps/my-jaxrs-app-llm-deployment --type=NodePort --port=8080
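You can verify that the NodePort Service has been created, and see which node port was assigned, with:
# show the NodePort assigned to the application Service
kubectl get svc my-jaxrs-app-llm-deployment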
Check the application
Find out on which IP address/port minikube is exposing your service:
$ minikube service my-jaxrs-app-llm-deployment --url
http://192.168.39.178:30781
And set the following variable:
export DEPLOYMENT_URL=http://192.168.39.178:30781
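If you prefer, you could set the variable directly from the minikube output instead of copying the URL by hand:
# capture the service URL in one step
export DEPLOYMENT_URL=$(minikube service my-jaxrs-app-llm-deployment --url)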
Now, invoke the application endpoint using curl:
curl $DEPLOYMENT_URL/api/tommaso
AiMessage { text = "Ciao Tommaso! Nice to meet you! How are you doing today?" toolExecutionRequests = null }
Alternatively, open the $DEPLOYMENT_URL/api/tommaso
URL in your browser.
Note: if you get "dev.langchain4j.exception.HttpException: {"error":"model requires more system memory (3.9 GiB) than is available (3.6 GiB)"}", try to stop minikube and re-start it with "minikube start --memory 7000": this should give minikube enough memory to run smollm2.
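In that case, the restart could look like this (7000 MB is the value suggested above; adjust it to what your machine can spare):
# restart minikube with more memory so Ollama can load smollm2
minikube stop
minikube start --memory 7000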
What’s next?
- Playing with Generative AI with WildFly: contains a Retrieval-Augmented Generation (RAG) example application
- WildFly Mini Conference March 2025: check the last track which is about MCP
- Making WildFly Glow with Intelligence: see how to use Glow to find what feature packs and layers your deployment needs to be available on WildFly
- WildFly AI - monitor and troubleshoot a WildFly server with the WildFly chatbot