You push your latest Docker container to a Hugging Face Space. The build logs show a successful image creation. You see "Building..." turn into "Running...". Then, ten seconds later, the status flips to "Runtime Error" or "Paused".
The logs offer only a cryptic hint: "Connection refused," or simply silence where the application logs should be.
For DevOps engineers and AI prototypers deploying custom stacks—particularly those involving Ollama, heavy CUDA dependencies, or custom Gradio endpoints—this is the most common friction point. The container runs perfectly on localhost, yet fails immediately inside the Spaces infrastructure.
This guide provides the root cause analysis for these networking failures and definitive, production-ready Dockerfile configurations to fix them.
The Root Cause: Networking Interfaces and Port 7860
To fix the crash, you must understand the constraint. Hugging Face Spaces (specifically Docker Spaces) operate on a strict networking contract.
1. The Localhost Trap
When you run a modern web server (FastAPI, Flask, Gradio, or Ollama) locally, it often defaults to binding to 127.0.0.1 (localhost). This is a loopback interface accessible only from within the container itself.
The Spaces load balancer sits outside your container. If your app listens on 127.0.0.1, the load balancer attempts to connect to your container's IP, finds no service listening on the public interface, and terminates the connection.
The Solution: Your application must listen on 0.0.0.0 (all network interfaces).
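The difference is easy to demonstrate with nothing but the standard library. This minimal sketch (the helper name is mine, not part of any framework) binds a TCP socket to each interface and reports the address it actually listens on:

```python
import socket

def bound_address(host: str) -> str:
    """Bind a TCP socket to `host` on an ephemeral port and report its address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))  # port 0 = let the OS pick a free port
    addr = s.getsockname()[0]
    s.close()
    return addr

# 127.0.0.1 accepts connections only from inside the container;
# 0.0.0.0 accepts connections on every interface, including the one
# the Spaces load balancer reaches.
print(bound_address("127.0.0.1"))  # -> 127.0.0.1
print(bound_address("0.0.0.0"))   # -> 0.0.0.0
```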
2. The Immutable Port
Hugging Face Spaces expects your application to expose exactly one port: 7860.
If you deploy a custom Docker container, the infrastructure health check probes port 7860 over HTTP. If it receives a 200 OK (or similar successful response), the Space is marked "Running." If your app runs on port 3000, 8000, or 11434 (Ollama's default), the health check fails, and the orchestration layer kills the container.
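To make the contract concrete, here is a stand-in for an app that would pass the health check, using only the standard library: an HTTP server bound to 0.0.0.0:7860 that answers every GET with 200 OK. This is an illustrative sketch, not the Gradio setup used later in this guide:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Any successful (2xx) response satisfies the infrastructure probe
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        # Keep probe noise out of the container logs
        pass

def make_server(host: str = "0.0.0.0", port: int = 7860) -> HTTPServer:
    # 0.0.0.0 so the load balancer can reach the process;
    # 7860 because that is the port Spaces probes.
    return HTTPServer((host, port), HealthHandler)

# make_server().serve_forever()  # blocking; run this as your container's CMD
```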
3. Permission Denials (User 1000)
By default, Spaces containers do not run as root. They typically run with user ID 1000. If your Dockerfile copies files as root but doesn't adjust permissions, or if your application tries to write cache files (like Hugging Face Hub models or Ollama blobs) to a root-owned directory, the process will crash silently on startup.
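One way to surface these failures instead of crashing silently is to probe the cache directory at startup. The helper below is hypothetical (not part of any SDK); it attempts a throwaway write and reports failure cleanly:

```python
import os

def ensure_writable(path: str) -> bool:
    """Return True if `path` exists (or can be created) and is writable."""
    try:
        os.makedirs(path, exist_ok=True)
        probe = os.path.join(path, ".write_probe")
        with open(probe, "w") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        # Covers PermissionError, NotADirectoryError, read-only filesystems, etc.
        return False

# Run as user 1000, a root-owned path fails this check, while a directory
# chown-ed to the user (e.g. via COPY --chown=user) passes it.
```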
Solution 1: The Robust Gradio Dockerfile
If you are wrapping a standard Python application, you must explicitly enforce the host and port via environment variables. Relying on default arguments is risky.
Here is a rigorous Dockerfile that solves both the networking and permission issues.
FROM python:3.10-slim
# Set up a new user named "user" with user ID 1000
RUN useradd -m -u 1000 user
# Switch to this user
USER user
# Set home to the user's home directory
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH
# Set the working directory to the user's home directory
WORKDIR $HOME/app
# Copy the current directory contents into the container at $HOME/app, setting the owner to the user
COPY --chown=user . $HOME/app
# Install requirements.txt
RUN pip install --no-cache-dir --upgrade pip && \
    pip install --no-cache-dir -r requirements.txt
# CRITICAL: Define environment variables for Gradio
# 0.0.0.0 exposes the app to the external load balancer
# 7860 is the required port for HF Spaces
ENV GRADIO_SERVER_NAME="0.0.0.0"
ENV GRADIO_SERVER_PORT="7860"
# Command to run the application
CMD ["python", "app.py"]
Why This Works
- --chown=user: Ensures the application code is owned by the user running the process, preventing PermissionError during runtime file operations.
- ENV GRADIO_SERVER_NAME="0.0.0.0": Forces Gradio to bind to all interfaces, bypassing the "localhost trap."
- ENV GRADIO_SERVER_PORT="7860": Hardcodes the listener port to match the Spaces infrastructure requirement.
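Gradio picks up GRADIO_SERVER_NAME and GRADIO_SERVER_PORT on its own, but if app.py uses a different framework, the app has to honor those variables (or its own) explicitly. A framework-agnostic sketch, assuming you reuse the same variable names:

```python
import os

# Fall back to Gradio's own defaults when the variables are unset
host = os.environ.get("GRADIO_SERVER_NAME", "127.0.0.1")
port = int(os.environ.get("GRADIO_SERVER_PORT", "7860"))

# Flask:            app.run(host=host, port=port)
# FastAPI/uvicorn:  uvicorn.run(app, host=host, port=port)
print(f"binding {host}:{port}")
```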
Solution 2: The Complex Fix (Ollama + UI)
The challenge multiplies when using tools like Ollama. Ollama runs a background server on port 11434. You cannot simply expose port 11434 because Spaces only listens on 7860.
To solve this, you need an Entrypoint Script. This script acts as a process manager that:
- Starts the Ollama server in the background.
- Waits for the server to be ready.
- Starts your UI (Gradio/Streamlit) on port 7860 to serve as the frontend.
Step 1: The Entrypoint Script (entrypoint.sh)
Create a file named entrypoint.sh in your project root.
#!/bin/bash
# Start Ollama in the background.
# Its output goes to stdout alongside the main app here; redirect it to a
# log file instead if you prefer a cleaner console.
ollama serve &
# Record the Process ID of the background process (handy if you later need to stop Ollama)
OLLAMA_PID=$!
echo "⏳ Waiting for Ollama to start..."
# Loop until Ollama accepts connections on localhost:11434
while ! curl -s http://localhost:11434/api/tags > /dev/null; do
sleep 1
done
echo "✅ Ollama is active. Pulling model..."
# Pull the requested model (e.g., llama3, mistral)
# This might take time on the first boot if not cached
ollama pull mistral
echo "🚀 Starting Gradio Interface..."
# Start the Python application
# The Python app MUST connect to localhost:11434 to talk to Ollama
# And MUST listen on 0.0.0.0:7860 to talk to the internet
python app.py
Step 2: The Advanced Dockerfile
This Dockerfile installs Ollama, configures permissions, and utilizes the entrypoint script.
FROM python:3.10-slim
# Install curl (required for Ollama installation and health checks)
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh
# Create non-root user
RUN useradd -m -u 1000 user
USER user
ENV HOME=/home/user \
    PATH=/home/user/.local/bin:$PATH
WORKDIR $HOME/app
# Create the directory for Ollama models ahead of time and make it writable
# This prevents permission errors when "ollama pull" writes to ~/.ollama
RUN mkdir -p $HOME/.ollama && chmod 777 $HOME/.ollama
COPY --chown=user . $HOME/app
RUN pip install --no-cache-dir -r requirements.txt
# Grant execution rights to the entrypoint script
RUN chmod +x entrypoint.sh
# Environment variables
ENV OLLAMA_HOST="0.0.0.0:11434"
ENV GRADIO_SERVER_NAME="0.0.0.0"
ENV GRADIO_SERVER_PORT="7860"
ENTRYPOINT ["./entrypoint.sh"]
Deep Dive: Handling Application State
When running dual processes (Ollama + Python) inside a single container, handling the application state is critical for uptime.
The "Zombie" Container Issue
If you use the shell form of CMD or ENTRYPOINT, the shell becomes PID 1. If the shell doesn't forward signals correctly, your container might hang when Hugging Face tries to restart it during updates.
In the entrypoint.sh above, if the Python script crashes, the container effectively stops doing useful work, but the ollama serve background process keeps the container alive.
Best Practice: Use Python's subprocess or a tool like supervisord if you need rigorous process management. However, for most Spaces prototypes, simply ensuring the Python script is the last command in the shell script (blocking the exit) is sufficient.
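Here is what that subprocess approach can look like: a minimal, hypothetical supervisor that ties the background server's lifetime to the foreground UI, so the container exits (instead of lingering) when the UI stops:

```python
import subprocess
import sys

def supervise(background_cmd, foreground_cmd):
    """Run `background_cmd`, block on `foreground_cmd`, tear both down together."""
    bg = subprocess.Popen(background_cmd)
    try:
        # The foreground process (the UI) determines the container's fate
        return subprocess.run(foreground_cmd).returncode
    finally:
        # When the UI exits or crashes, stop the background server too,
        # so the container does not survive as a half-dead "zombie"
        bg.terminate()
        bg.wait()

# In a Space this would be, e.g.:
#   sys.exit(supervise(["ollama", "serve"], [sys.executable, "app.py"]))
```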
Persistent Storage
In the standard Docker Space free tier, the filesystem is ephemeral. If your Space restarts, you lose the downloaded Ollama models.
To fix this, you must enable Persistent Storage in your Space settings. Once enabled, map the Ollama directory to the persistent volume path (usually /data).
Update your Dockerfile ENV:
# Assuming /data is the persistent volume mount point
ENV OLLAMA_MODELS="/data/.ollama/models"
Debugging Workflow
If you are still facing connection refusals, follow this diagnostic workflow:
1. Local Replication: Run the container locally with the exact constraints:

docker build -t my-space .
docker run -it -p 7860:7860 my-space

If you cannot access http://localhost:7860 in your browser, the issue is inside the container, not the HF infrastructure.
2. Check IP Binding: Inside the container, run netstat -tuln (you may need to install net-tools). You must see 0.0.0.0:7860. If you see 127.0.0.1:7860, your GRADIO_SERVER_NAME variable is being ignored or is unset.
3. Inspect Health Check Logs: In the Spaces "Logs" tab, look at the lines immediately preceding the crash. If you see "Application startup complete" followed by a crash, it is likely an OOM (Out of Memory) error. Free tier Spaces have limited RAM (16 GB); loading a 70B-parameter model will crash the container immediately regardless of port configuration.
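If netstat is unavailable and you don't want to install extra packages, the same binding check can be scripted from inside the container with a short stdlib probe (a hypothetical helper, not an HF tool):

```python
import socket

def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# is_listening("127.0.0.1", 7860) == True proves something is up locally,
# but only a bind to 0.0.0.0 makes it reachable from outside the container.
```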
Conclusion
Deploying to Hugging Face Spaces requires strictly adhering to the "Port 7860 on 0.0.0.0" rule. By controlling your Dockerfile user permissions and using an entrypoint script for multi-process containers like Ollama, you can eliminate generic "Runtime Error" messages.
Always remember: The container is an island. Unless you build a bridge to 0.0.0.0, the world cannot visit.