Receipt · Live since April 2026

A two-week video edit cut to one phrase and thirty seconds of attention

Hyperframes · Whisper · Ollama · ElevenLabs · TikTok API · Pexels

Each short-form clip needs vertical b-roll, an opening visual hook, a synthesised audio front-load, word-timed captions, an upload draft and a caption ready on the phone for the final manual step. By hand that is roughly two weeks of work per clip. The goal was to drop a video in a folder, say one phrase, and have an upload draft waiting with the caption posted to Slack.

The pipeline runs ten steps off that one trigger. It detects and trims the source chime by cross-correlation, transcribes with Whisper, extracts the keyword locally with Ollama, adds an opening hook from one of five named patterns, generates the audio front-load through an ElevenLabs voice, and renders the branded vertical with Hyperframes. After a clean upload it prunes the render and keeps the audit trail.

That cuts a two-week manual job to about two hours of pipeline time and roughly thirty seconds of my attention. Carrying the same keyword across the audio, the on-screen hook, the caption and the hashtags gave a confirmed lift of about 1.8 times the views on a single-variable trial.