
OpenAI-compatible HTTP server + docker image #8

Merged
mudler merged 4 commits into master from worktree-openai-server
Jun 17, 2026

Conversation

@mudler

@mudler mudler commented Jun 3, 2026

Owner

A small OpenAI-drop-in HTTP server for transcription, built on parakeet.cpp.
Point any OpenAI client's base_url at it and call POST /v1/audio/transcriptions.
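As a rough client-side illustration of the call described above, the sketch below builds the multipart request by hand with only the Python standard library. Several things here are assumptions rather than facts from this PR: the helper name `build_transcription_request` is hypothetical, the `model` value `"parakeet"` is a placeholder (the server may ignore or validate it differently), and the `base_url` is assumed to end in `/v1`, as it does for OpenAI clients. The form field names (`file`, `model`, `response_format`) follow the OpenAI transcription API.

```python
import urllib.request
import uuid


def build_transcription_request(base_url, wav_bytes, response_format="json",
                                model="parakeet", filename="audio.wav"):
    """Build (but do not send) a multipart POST for <base_url>/audio/transcriptions.

    `model="parakeet"` is a placeholder value, not a name confirmed by the PR.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, one multipart section each.
    for name, value in {"model": model, "response_format": response_format}.items():
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )
    # The audio upload itself (the server is described as WAV-only).
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         f"Content-Type: audio/wav\r\n\r\n").encode() + wav_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


# Usage against a locally running server (assumed to listen on port 8080):
#   with open("sample.wav", "rb") as f:
#       req = build_transcription_request("http://localhost:8080/v1", f.read())
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode())
```

An OpenAI SDK client pointed at the same `base_url` should produce an equivalent request; the hand-rolled version is only meant to show what goes over the wire.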

Rebased onto current master.

What's here

  • examples/server: httplib-based server exposing POST /v1/audio/transcriptions
    (json / text / verbose_json, plus timestamp_granularities[]=word) and /health.
  • Model resolver/fetcher: accepts a local .gguf, an http(s) URL, a <name>.gguf
    in mudler/parakeet-cpp-gguf, or a friendly alias (downloaded and cached once).
  • Unit tests (response formatting, model resolution) plus an opt-in e2e test that
    drives the real binary.
  • A dedicated docker image published to ghcr.io/<owner>/parakeet.cpp-server
    (CPU + CUDA, multi-arch), built from the same Dockerfile as the cli image via a
    second runtime target so ggml compiles once per build job. Binds 0.0.0.0,
    exposes 8080, ships curl for alias fetch.
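The four kinds of model spec listed for the resolver can be told apart with a few string checks. The sketch below is one possible classification, not the resolver shipped in this PR: the function name `classify_model_spec`, the returned labels, and especially the precedence order (URL, then local path, then repo file name, then alias) are all assumptions.

```python
import os
from urllib.parse import urlparse


def classify_model_spec(spec, exists=os.path.isfile):
    """Return one of "url", "local", "repo_file", "alias" for a model spec.

    The precedence here is an assumption; the real resolver may differ.
    `exists` is injectable so the logic can be tested without touching disk.
    """
    # http(s) URLs are recognised by scheme.
    if urlparse(spec).scheme.lower() in ("http", "https"):
        return "url"
    # Anything that exists on disk, or looks like a path, is treated as local.
    if exists(spec) or "/" in spec or os.sep in spec:
        return "local"
    # A bare <name>.gguf is assumed to name a file in mudler/parakeet-cpp-gguf.
    if spec.endswith(".gguf"):
        return "repo_file"
    # Everything else is treated as a friendly alias.
    return "alias"
```

For example, under these assumptions `classify_model_spec("tdt_ctc-110m-q4_k")` yields `"alias"` when no such file exists locally, after which a fetcher would download and cache the model once.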

See examples/server/README.md for usage and the known simplifications
(WAV-only uploads, single segment, serialized inference).
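To make the three response formats and the single-segment simplification concrete, here is a minimal sketch of a formatter. It is not the formatter from examples/server: the function name `format_transcription`, its parameters, and the `"en"` language default are hypothetical, and the `verbose_json` field names are modelled on the OpenAI transcription API rather than read from this PR's code.

```python
import json


def format_transcription(text, duration, words=None,
                         response_format="json", language="en"):
    """Return (content_type, body) for one transcription result.

    `words` is an optional list of (word, start_seconds, end_seconds) tuples,
    included only when word timestamps were requested.
    """
    if response_format == "text":
        return "text/plain", text
    if response_format == "json":
        return "application/json", json.dumps({"text": text})
    if response_format == "verbose_json":
        payload = {
            "task": "transcribe",
            "language": language,  # assumed default, not taken from the PR
            "duration": duration,
            "text": text,
            # Single segment spanning the whole clip (the noted simplification).
            "segments": [{"id": 0, "start": 0.0, "end": duration, "text": text}],
        }
        if words is not None:
            payload["words"] = [
                {"word": w, "start": s, "end": e} for (w, s, e) in words
            ]
        return "application/json", json.dumps(payload)
    # A server would presumably map this to an HTTP 400 response.
    raise ValueError(f"unsupported response_format: {response_format}")
```

Keeping formatting as a pure function of the decoded result is what makes it easy to unit-test without running inference.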

@mudler mudler changed the title from "Worktree OpenAI server" to "OpenAI server" Jun 3, 2026
@mudler mudler changed the title from "OpenAI server" to "OpenAI-compatible http-server" Jun 3, 2026
@jwinpbe

jwinpbe commented Jun 7, 2026


hello!

just curious if you want any help testing this, or what state it's in. i'm using some odd hardware that can only do vulkan and i'd rather be using parakeet than qwen3 ASR. let me know what you'd like tested.

thanks!

-- jwin

mudler and others added 2 commits June 17, 2026 17:13
Squashed rebase of the worktree-openai-server branch (PR #8) onto current
master. Adds examples/server (httplib-based POST /v1/audio/transcriptions),
model resolver/fetcher, OpenAI response formatter, and unit + e2e tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Split the Dockerfile into a shared build stage plus two runtime targets:
runtime (cli, default, unchanged) and runtime-server (entrypoint
parakeet-server --host 0.0.0.0, EXPOSE 8080, curl for alias fetch). The
docker workflow now builds and publishes both ghcr images, cli and server,
for each (variant, arch); the server build reuses the cli build stage so
ggml compiles once per job.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mudler mudler force-pushed the worktree-openai-server branch from 23bd49c to d655ede June 17, 2026 17:26
@localai-bot localai-bot changed the title from "OpenAI-compatible http-server" to "OpenAI-compatible HTTP server + docker image" Jun 17, 2026
mudler and others added 2 commits June 17, 2026 17:30
…for production

Add an OpenAI-compatible server section to the main README (build, curl and
OpenAI-client usage) and extend the Docker section to cover the new
parakeet.cpp-server image alongside the cli. Note LocalAI as the production
path in both the main README and examples/server/README.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Builds parakeet-server and runs tests/server_e2e.sh (PARAKEET_SERVER_E2E=1):
fetches the ~125 MB tdt_ctc-110m-q4_k model by alias, starts the server, and
hits POST /v1/audio/transcriptions with a real WAV in json/text/verbose_json
(plus word timestamps), checking the transcription and the 400 paths. Runs on
pull_request and workflow_dispatch, like closed-loop; no NeMo/Python venv.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
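An e2e flow like the one in the commit message above has to wait for the server to come up before sending requests. The sketch below shows one way to poll /health. It is not taken from tests/server_e2e.sh: the helper names, the retry counts, and the assumption that a ready server answers /health with HTTP 200 on 127.0.0.1:8080 are all illustrative.

```python
import time
import urllib.error
import urllib.request


def http_health_probe(url="http://127.0.0.1:8080/health", timeout=1.0):
    """Return True if the (assumed) health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or non-2xx status: not ready yet.
        return False


def wait_until_healthy(probe=http_health_probe, attempts=30, delay=1.0,
                       sleep=time.sleep):
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for i in range(attempts):
        if probe():
            return True
        if i + 1 < attempts:
            sleep(delay)
    return False
```

The probe and sleep functions are injectable so the retry loop can be exercised without a running server. A generous attempt budget is useful here because the first start may include the model download.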
@mudler mudler merged commit 1055fb6 into master Jun 17, 2026
10 checks passed

2 participants