Avoid launching a web browser

This is helpful. I got it running by downgrading to Open-WebUI 5.18. I realize this is not an ideal or sustainable solution but it may work for the time being. Appreciate your reply.

Hi Dan,

Sorry for the delay.
I simply use docker build with the Dockerfile provided in the branch and --build-arg "USE_CUDA=true".
Then I run the branch with a podman compose file that starts the necessary services, including an authentication proxy, etc.
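
For anyone trying to reproduce this, here is a minimal sketch of that build step; the image tag is just a placeholder matching the compose file below, and the Dockerfile is assumed to be the one in the checked-out branch:

# Hypothetical build command; run from the root of the checked-out branch
docker build --build-arg "USE_CUDA=true" -t YOUR_CUSTOM_WEBUI_TAG_HERE .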

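# Note: $host, $port, and the find_port helper used below are assumed to be
# defined earlier in the surrounding job script; they are not shown in this post.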
export app_port=$(find_port "$host")
export image_port=$(find_port "$host")
export ollama_port=$(find_port "$host")
cat <<EOF >compose.yml
name: open-webui-pod
services:
  open-webui:
    container_name: open-webui
    image: YOUR_CUSTOM_WEBUI_TAG_HERE
    environment:
      PORT: $app_port
      ROOT_PATH: rnode/$host/$port
      ENABLE_OPENAI_API: False
      WEBUI_AUTH: False
      WHISPER_MODEL: large-v2
      AUDIO_TTS_OPENAI_API_BASE_URL: http://openedai-speech:4134/v1
      ENABLE_IMAGE_GENERATION: True
      AUTOMATIC1111_BASE_URL: http://host.containers.internal:$image_port
      OLLAMA_BASE_URL: http://host.containers.internal:$ollama_port
    volumes:
      - open-webui/open-webui:/app/backend/data:z
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            capabilities: [gpu]
  ollama:
    network_mode: host
    container_name: open-webui-ollama
    image: docker.io/ollama/ollama:latest
    tty: true
    restart: unless-stopped
    environment:
      OLLAMA_HOST: 127.0.0.1:$ollama_port
    volumes:
      - open-webui/ollama:/root/.ollama:z
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
  openedai-speech:
    container_name: open-webui-openedai-speech
    image: ghcr.io/matatonic/openedai-speech
    volumes:
      - openedai-speech/speech.env:/app/speech.env:z
      - openedai-speech/config:/app/config:z
      - openedai-speech/voices:/app/voices:z
    entrypoint: ["python", "/app/speech.py", "--port", "4134"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
  caddy:
    container_name: open-webui-caddy
    image: docker.io/caddy:latest
    restart: unless-stopped
    cap_add:
      - NET_ADMIN
    ports:
      - $port:$port
    volumes:
      - $PWD/Caddyfile:/etc/caddy/Caddyfile:z
EOF
podman-compose up

As you can see, it’s a bit of extra work, hence the move to the new reverse proxy setup.


Thanks for this! I’ll give it a try. We don’t have podman on our cluster, but perhaps I can convert this to Apptainer.

When you mention the "new reverse proxy setup", are you talking about @karcaw's nginx proxy setup described in this thread, or something else?

I had some trouble with that as well but I’ll give it another shot.

Actually, I seem to be having a general problem that is not specific to Open-WebUI, but I may as well ask about it while I’m here.

For example, I set Open-WebUI aside and tried a different UI called Ollama-chat (here’s my fork of it).

I had it working last week and have not really changed anything since, but now when I start it up and go to the URL:

https://openondemand.fredhutch.org/node/gizmok3/49211/

I get an Apache 404 error.

However, if I go directly to the node and port and pass the additional path information:

http://gizmok3:49211/node/gizmok3/49211/

I see the correct page.

My understanding of the reverse proxy config is that the node proxy (node_uri) will forward /<node_uri>/<host>/<port>/... to the same path, /<node_uri>/<host>/<port>/..., on the target node and port.
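
For anyone following along, this behavior is controlled by a couple of keys in ood_portal.yml. The snippet below is only an illustrative sketch with default-style values, not this site’s actual configuration:

# Excerpt from ood_portal.yml (illustrative values only)
node_uri: '/node'     # /node/<host>/<port>/... is proxied to host:port with the full path preserved
rnode_uri: '/rnode'   # /rnode/<host>/<port>/... is proxied with that prefix stripped before it reaches the app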

This is frustrating because it seems to come and go at will. Does anyone have any idea how to troubleshoot this? We’re running OOD 4.0.2.

EDIT: Found the problem. Some cluster nodes were not matched by host_regex in ood_portal.yml. I was pointed to the solution by other posts here; this forum is super helpful!
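
For reference, the fix amounts to widening host_regex in ood_portal.yml so that every compute node name is matched; the pattern below is hypothetical and should be adapted to your own node naming scheme. After editing the file, regenerate the Apache config (update_ood_portal) and reload the web server.

# ood_portal.yml (hypothetical pattern; adjust to your cluster's node names)
host_regex: 'gizmo[a-z]?\d+'   # unmatched hosts fall through to the Apache 404 seen above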

Glad to see this discussion, and @wfeinstein, good to see you are on the same track as well! I have been looking into Open WebUI recently, and indeed the issue was the relative path. Apparently the developer is not planning to support it natively any time in the near future.

I saw that a few people forked the repo and made their own implementations, but their merge requests were ultimately rejected. The fix seems too extensive to maintain for a small operation like ours…

So besides working on a fix, has anyone looked into alternatives? There are so many to choose from. If anyone knows of an alternative that natively supports relative paths, I would be glad to try it out.

Sorry for not responding to messages lately because of my workload.

We have created an Ollama application that allows users to launch JupyterLab and VS Code sessions. From there, the Jupyternaut interface is enabled, backed by Ollama LLMs. For the VS Code server, Ollama works via Continuum to provide coding assistance.
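
To make the setup concrete, here is a hedged sketch of how client tools on the compute node can be pointed at a per-job Ollama server; OLLAMA_PORT is a hypothetical job variable, and the model name is only an example:

# Hypothetical wiring inside the job script; OLLAMA_PORT is assumed to be set by the job
export OLLAMA_HOST=127.0.0.1:${OLLAMA_PORT:-11434}   # both the server and the CLI read this variable
ollama serve &                                       # start the API server for this job
ollama pull llama3                                   # fetch a model into the data directory
ollama run llama3 "Summarize what this Slurm job does."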

LLM inference works fast on GPU nodes, but very slow on CPUs.

This is an interesting point. Since OpenWebUI does not currently plan to support configurable base URLs, I’m curious whether there are alternative solutions that provide this feature natively and would therefore integrate more seamlessly with Open OnDemand.

I set up Ollama with Open WebUI in a VNC session that opens Firefox in kiosk mode. It’s fine but a bit clunky to use. I ended up building my own with a Streamlit UI. There are no issues with the OOD proxy; it launches like a Jupyter notebook, with a token for authentication. It’s still super early on and admittedly needs work. Since this video was made, I’ve optimized it a lot, and it’s much faster and has a few more features.
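
For anyone doing something similar, here is a minimal sketch of launching a Streamlit app behind the OOD node proxy; the app name, $host, and $port are placeholders, and these are standard Streamlit server flags rather than anything specific to the poster’s app:

# Hypothetical launch line for a Streamlit UI behind the "node" proxy
streamlit run app.py \
  --server.port "$port" \
  --server.address 0.0.0.0 \
  --server.baseUrlPath "node/$host/$port"   # serve under the proxied path so links resolve correctly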


Hi everyone,
I’ve built an interactive app that containerizes text-generation-webui and works with OOD’s proxy system and relative paths. It lets users run and interact with LLMs directly in their browser, without the path-configuration hassles of Open WebUI, LibreChat, etc.

Hope this helps! :slight_smile:


Hi All,

Forgot to post this in this thread!

I developed a simple framework for sharing services, like LLM server APIs, among users. This does not solve the problem discussed here, but it does provide a way to share one LLM API among many users on a traditional Slurm cluster. This can be more efficient than single jobs each running both the API and the client (Jupyter, chatbot), which can be quite wasteful of GPU resources.

Here is the link to a recording of my talk:

You can also check out the GitHub repo; it also contains my presentation slides:
