LLM Post-processing Guide

Configure Ollama or LM Studio so HootVoice can clean up, summarize, and rephrase Whisper transcripts using a local LLM.

Overview

When LLM post-processing is enabled, HootVoice sends each Whisper transcript to a local LLM endpoint so the text can be polished, made polite, or summarized automatically. The feature expects an OpenAI-compatible API such as the one exposed by Ollama or LM Studio.

The toggle is disabled by default. Open Settings → LLM, turn on Enable LLM post-processing, and set the API base URL and model name to match your local server. On that tab we recommend starting with gemma-3-12b-it quantized to Q4_K_M for proofreading-focused workflows.

How it Works

  1. HootVoice transcribes your recording locally with Whisper.
  2. Once transcription finishes, it calls /v1/chat/completions on the configured LLM endpoint.
  3. The LLM response replaces the raw transcript in the log and clipboard.
  4. If auto-paste is enabled, the processed text is inserted into the frontmost app.

If the API fails or times out, HootVoice falls back to the original Whisper text. Logs capture HTTP status codes and any error payloads for quick diagnostics.
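The request uses the standard OpenAI chat-completions schema, so you can reproduce it with curl. The snippet below is a minimal sketch of that kind of call against Ollama's default port; the system prompt, temperature, and model name are illustrative placeholders rather than HootVoice's actual values.

# Minimal chat-completions request of the kind HootVoice issues after transcription
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "system", "content": "Proofread the transcript, fix punctuation and obvious recognition errors, and return only the corrected text."},
      {"role": "user", "content": "so um the meeting got moved to thursday at 3 i think"}
    ],
    "temperature": 0.2
  }'

The choices[0].message.content field of the response holds the cleaned-up text, which is what ends up in the log and clipboard.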

Setup Checklist

Using Ollama

Ollama exposes an OpenAI-compatible REST API at http://localhost:11434/v1, which matches the default value in HootVoice.

macOS

  1. Install via brew install ollama (requires Homebrew).
  2. Run ollama run llama3.1:8b to download and cache the model.
  3. Keep the background service running with ollama serve or the Ollama menu bar app.
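If you prefer a copy-paste version of the steps above, the following works on a Homebrew install; brew services is one way to keep the server running, the Ollama menu bar app is another.

# Install the Ollama CLI and server
brew install ollama
# Keep the API server running in the background on port 11434
brew services start ollama
# Download the model and open a quick interactive test (exit with /bye)
ollama run llama3.1:8b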

Windows

  1. Download the installer from ollama.com and complete setup.
  2. Open PowerShell and run ollama run llama3.1:8b to fetch the model.
  3. The service stays active in the background; manage it from the system tray.
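From PowerShell, pulling the model and checking the API looks like this; curl.exe ships with current Windows releases, so no extra tooling is needed.

# Download the model and open an interactive prompt (exit with /bye)
ollama run llama3.1:8b
# Confirm the model is cached locally
ollama list
# Confirm the OpenAI-compatible endpoint answers
curl.exe http://localhost:11434/v1/models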

Linux

  1. Run curl -fsSL https://ollama.com/install.sh | sh.
  2. Enable the service with sudo systemctl enable --now ollama (the install script normally sets this up for you).
  3. Download a model via ollama run llama3.1:8b and verify the API responds.
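Condensed into one terminal session, the Linux setup looks like this; since the install script usually registers and starts the systemd service, the systemctl line is just a safety check.

# Install Ollama via the official script
curl -fsSL https://ollama.com/install.sh | sh
# Make sure the service is enabled and running
sudo systemctl enable --now ollama
# Pull a model, then confirm the API responds
ollama run llama3.1:8b
curl http://localhost:11434/v1/models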

Test connectivity with:

curl http://localhost:11434/v1/models

Using LM Studio

LM Studio offers a GUI for managing models and ships with an OpenAI-compatible server. The default port is 1234, so set HootVoice’s base URL to http://localhost:1234/v1.

macOS

  1. Download the DMG from the LM Studio website and install it.
  2. Open “Download Models” and grab the models you want.
  3. Click “Start Server” and enable the “OpenAI Compatible Server” option.

Windows

  1. Run the Windows installer with the default options.
  2. Download models from within the app, then switch to the “Server” tab.
  3. Press “Start Server” and enable auto-start if you need it on boot.

Linux

  1. Launch the AppImage or install the Debian package.
  2. Download a model, then toggle the server switch in the top-right corner.
  3. Allow inbound traffic on port 1234 if your firewall prompts.
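Loopback traffic is normally exempt from the firewall, so this only matters when HootVoice connects from another machine. On distributions that use ufw, opening the port looks like the sketch below; adjust accordingly for firewalld or iptables.

# Allow inbound connections to LM Studio's server port (remote access only)
sudo ufw allow 1234/tcp
# Confirm the rule is active
sudo ufw status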

Confirm the server is reachable:

curl http://localhost:1234/v1/models
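If the model list comes back, you can also send a quick chat request to confirm end-to-end generation works. The model value below is a placeholder; use the identifier shown in LM Studio's Local Models panel.

# Minimal chat request against LM Studio's OpenAI-compatible server
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "your-local-model-identifier",
    "messages": [{"role": "user", "content": "Reply with OK if you can read this."}],
    "max_tokens": 10
  }'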

Recommended Models

Japanese polishing & polite tone: google/gemma-3-12b (Ollama / LM Studio). Excellent Japanese fluency; a 4-bit quantization typically needs 10–12 GB VRAM.
English summaries: qwen2.5:7b-instruct or Phi-3.5-mini-instruct. Fast responses with concise outputs; ideal for meeting notes.
Maximum accuracy: llama3.1:70b or other large instruction-tuned models. Requires high-end GPU/VRAM; tune OLLAMA_NUM_PARALLEL as needed.

Make sure the model identifier matches your runtime. Ollama lists models via ollama list, while LM Studio shows the identifier in the “Local Models” panel.
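For example, with Ollama you can confirm the exact string to paste into HootVoice's model field like this. Note that Ollama tags often differ from the upstream names; the Gemma 3 12B entry in the Ollama library is tagged gemma3:12b rather than google/gemma-3-12b.

# Show the identifiers Ollama knows about; use these exact names in HootVoice
ollama list
# The same list is available over the OpenAI-compatible API
curl http://localhost:11434/v1/models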

Local Resource Requirements

Running Gemma-3-12B locally in 4-bit/QAT mode for proofreading tasks generally calls for around 10–12 GB of VRAM, as noted in the Recommended Models table above.

Troubleshooting

Check the HootVoice log first: it records the HTTP status code and any error payload for each failed request. If the problem persists, copy the log entry with the failing request/response and share it with the HootVoice team.
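When reporting a problem, it also helps to reproduce the failing call outside HootVoice. A verbose curl request surfaces the HTTP status code and error body directly; adjust the port and model to match your setup.

# -v prints request/response headers and the status code for diagnosis
curl -v http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "ping"}]}'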