OpenAI-compatible HTTP server + docker image #8
Merged
hello! just curious if you want any help testing this, or what state it's in. i'm using some odd hardware that can only do vulkan and i'd rather be using parakeet than qwen3 ASR. let me know what you'd like tested. thanks! -- jwin
Squashed rebase of the worktree-openai-server branch (PR #8) onto current master. Adds examples/server (httplib-based POST /v1/audio/transcriptions), model resolver/fetcher, OpenAI response formatter, and unit + e2e tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Split the Dockerfile into a shared build stage plus two runtime targets: runtime (cli, default, unchanged) and runtime-server (entrypoint parakeet-server --host 0.0.0.0, EXPOSE 8080, curl for alias fetch). The docker workflow now builds and publishes both ghcr images, cli and server, for each (variant, arch); the server build reuses the cli build stage so ggml compiles once per job. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
23bd49c to d655ede
…for production Add an OpenAI-compatible server section to the main README (build, curl and OpenAI-client usage) and extend the Docker section to cover the new parakeet.cpp-server image alongside the cli. Note LocalAI as the production path in both the main README and examples/server/README.md. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Builds parakeet-server and runs tests/server_e2e.sh (PARAKEET_SERVER_E2E=1): fetches the ~125 MB tdt_ctc-110m-q4_k model by alias, starts the server, and hits POST /v1/audio/transcriptions with a real WAV in json/text/verbose_json (plus word timestamps), checking the transcription and the 400 paths. Runs on pull_request and workflow_dispatch, like closed-loop; no NeMo/Python venv. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A small OpenAI-drop-in HTTP server for transcription, built on parakeet.cpp. Point any OpenAI client's base_url at it and call POST /v1/audio/transcriptions.

Rebased onto current master.
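For anyone who wants to try the endpoint without an OpenAI SDK, here is a minimal sketch of the request using only the Python standard library. The form field names (file, model, response_format) are the ones the OpenAI transcription API uses. The helper name, the localhost base URL, the sample file name, and the assumption that the server honors the model field are illustrative guesses and not taken from this PR's code; the default alias is the one the e2e commit message mentions.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, wav_bytes, filename="audio.wav",
                                model="tdt_ctc-110m-q4_k", response_format="json"):
    """Build (but do not send) a multipart POST to <base_url>/audio/transcriptions.

    Hypothetical helper: it only assembles the request, so it can be
    inspected or tested without a running server.
    """
    boundary = uuid.uuid4().hex
    chunks = []
    # Plain text form fields, named as in the OpenAI transcription API.
    for name, value in (("model", model), ("response_format", response_format)):
        chunks.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode("utf-8"))
    # The audio itself goes in the "file" part as raw bytes.
    chunks.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode("utf-8") + wav_bytes + b"\r\n")
    # Closing delimiter terminates the multipart body.
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return urllib.request.Request(
        base_url.rstrip("/") + "/audio/transcriptions",
        data=b"".join(chunks),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage against a locally running server (assumed to listen on port 8080):
# req = build_transcription_request("http://localhost:8080/v1",
#                                   open("sample.wav", "rb").read())
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the sketch checkable offline; an OpenAI client library pointed at the same base URL should produce an equivalent request.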
What's here
- examples/server: httplib-based server exposing POST /v1/audio/transcriptions (json/text/verbose_json, plus timestamp_granularities[]=word) and /health.
- Model resolver/fetcher: the model can be a .gguf file, an http(s) URL, a <name>.gguf in mudler/parakeet-cpp-gguf, or a friendly alias (downloaded and cached once).
- Unit + e2e tests; the e2e test drives the real binary.
- ghcr.io/<owner>/parakeet.cpp-server (CPU + CUDA, multi-arch), built from the same Dockerfile as the cli image via a second runtime target so ggml compiles once per build job. Binds 0.0.0.0, exposes 8080, ships curl for alias fetch.

See examples/server/README.md for usage and the known simplifications (WAV-only uploads, single segment, serialized inference).
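The four kinds of model reference mentioned in the description (a .gguf file, an http(s) URL, a <name>.gguf in mudler/parakeet-cpp-gguf, or an alias) could be told apart roughly as follows. This is only a sketch of one plausible ordering, not the resolver in this PR: the function name, the category labels, and the rule that a bare file name means a repo lookup are all assumptions, and a real resolver would likely check the local filesystem first.

```python
from urllib.parse import urlparse


def classify_model_ref(ref: str) -> str:
    """Guess which kind of model reference a string is (hypothetical rules)."""
    scheme = urlparse(ref).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    if ref.endswith(".gguf"):
        # Assumption: a path separator means a local file, while a bare
        # file name is looked up in the mudler/parakeet-cpp-gguf repo.
        if "/" in ref or "\\" in ref:
            return "local_path"
        return "repo_file"
    # Anything else is treated as a friendly alias.
    return "alias"
```

For example, under these assumed rules tdt_ctc-110m-q4_k (the alias used by the e2e workflow) is classified as an alias, and ./models/foo.gguf as a local path.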