hyperframes-media

Summary

Generate speech, transcribe audio with timestamps, and remove video backgrounds for transparent overlays.

  • Three CLI commands (tts, transcribe, remove-background) that each download and cache their own model on first run; no API keys required
  • Text-to-speech supports 54 voices across nine languages and accents (American and British English, Spanish, French, Hindi, Italian, Japanese, Portuguese, Mandarin) with speed control; the language is auto-detected from the voice prefix
  • Transcription produces word-level timestamps in normalized JSON; supports multiple input formats (audio, video, SRT/VTT, OpenAI responses) with configurable Whisper model sizes and explicit language selection to prevent silent translation errors
  • Background removal outputs VP9 WebM with alpha channel (or ProRes/PNG) for transparent overlays; optional --background-output flag creates a hole-cut inverse layer for compositing text or graphics between subject and background
SKILL.md

HyperFrames Media Preprocessing

Three CLI commands that produce assets for compositions: tts (speech), transcribe (timestamps), and remove-background (transparent video). Each downloads a model on first run and caches it under ~/.cache/hyperframes/. Drop the output into the project, then reference it from the composition HTML. See the hyperframes skill for the audio/video element conventions.

Text-to-Speech (tts)

Generate speech audio locally with Kokoro-82M. No API key.

npx hyperframes tts "Text here" --voice af_nova --output narration.wav
npx hyperframes tts script.txt --voice bf_emma --output narration.wav
npx hyperframes tts --list                       # all 54 voices
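
When a project has several script files, it can help to preview the commands before running them. The sketch below is a dry run: tts_command is a hypothetical helper (not part of the hyperframes CLI) that only prints a command line built from the flags shown above. It assumes script files end in .txt and have no spaces in their names, and it falls back to af_heart, the default voice.

```shell
# Sketch: print one tts command per script file (dry run; nothing is executed).
# tts_command is a hypothetical helper; it uses only the --voice and --output flags.
tts_command() {
  script=$1
  voice=${2:-af_heart}          # fall back to the default voice
  output="${script%.txt}.wav"   # intro.txt -> intro.wav
  printf 'npx hyperframes tts %s --voice %s --output %s\n' "$script" "$voice" "$output"
}

tts_command intro.txt bf_emma
tts_command outro.txt
```

Piping the printed lines to sh would run them; keeping the helper print-only makes it easy to review voices and output names first.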

Voice Selection

Match voice to content. Default is af_heart.
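
The language auto-detection relies on the voice prefix. As a rough illustration, the sketch below assumes the voice IDs follow Kokoro's naming convention, where the first letter encodes the language and the second letter the speaker's gender (for example af_ for an American English female voice). voice_language is a hypothetical helper, not a hyperframes command, and the mapping should be checked against the output of tts --list.

```shell
# Sketch: map a voice ID prefix to its language, assuming Kokoro's naming scheme.
# voice_language is a hypothetical helper, not part of the hyperframes CLI.
voice_language() {
  case "$1" in
    a?_*) echo "American English" ;;
    b?_*) echo "British English" ;;
    e?_*) echo "Spanish" ;;
    f?_*) echo "French" ;;
    h?_*) echo "Hindi" ;;
    i?_*) echo "Italian" ;;
    j?_*) echo "Japanese" ;;
    p?_*) echo "Portuguese" ;;
    z?_*) echo "Mandarin" ;;
    *)    echo "unknown"; return 1 ;;
  esac
}

voice_language af_nova   # prints: American English
voice_language bf_emma   # prints: British English
```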

Installs: 63.9K
GitHub Stars: 26.7K
First Seen: May 5, 2026