Project Overview
I burned my dinner once. Fully ruined it. I got distracted, missed a timer, and went to bed hungry.
So I built OttoCook, a conversational cooking assistant that lives in the terminal, walks me through each step, tracks every timer, and keeps talking until I acknowledge alerts.
The first dish I cooked with it was Chicken Alfredo. It worked. It came out a bit thick, but that one was on me, not Otto.
This is a working prototype, not a polished product. It is functional, opinionated, and built to solve a real kitchen problem.
What It Does
- Step-by-step guidance. Visual cues, temperatures, parallel hints, and timing. Tells you what's coming next so you can prep ahead.
- Voice output (TTS). Azure-powered speech so you don't have to stare at your screen. Audio cached to disk.
- Voice input (STT). Local Whisper model, no cloud needed. Say "Hey Chef" and start talking.
- AI recipe modification. Missing an ingredient? Tell it. It'll adjust, scale, and warn you if the change is going to ruin your dish.
- Smart timers. Background timers with escalating notifications. They stay on hold until you say you're ready, and they won't stop yelling until you acknowledge them.
- Ask questions mid-cook. The AI has full context of your recipe, current step, and timers. Straight answers, no blog posts.
- Natural language input. Type however you want. A keyword parser handles the basics, and GPT picks up the rest.
- Session management. Pause, resume, skip, and check progress. Timers pause with you.
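The keyword-first, AI-second input handling described above could look roughly like the following. This is a minimal sketch, not OttoCook's actual parser: the Command values, the keyword table, and ParseCommand are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Command is a hypothetical set of session actions.
type Command int

const (
	CmdUnknown Command = iota // no keyword matched; hand the text to the AI backend
	CmdNext
	CmdPause
	CmdResume
	CmdSkip
	CmdStatus
)

// keywords maps trigger words to commands (an assumed vocabulary).
var keywords = map[string]Command{
	"next": CmdNext, "done": CmdNext,
	"pause": CmdPause, "wait": CmdPause,
	"resume": CmdResume, "continue": CmdResume,
	"skip":   CmdSkip,
	"status": CmdStatus, "progress": CmdStatus,
}

// ParseCommand returns the command for the first keyword found in input,
// or CmdUnknown when nothing matches.
func ParseCommand(input string) Command {
	for _, word := range strings.Fields(strings.ToLower(input)) {
		word = strings.Trim(word, ".,!?")
		if cmd, ok := keywords[word]; ok {
			return cmd
		}
	}
	return CmdUnknown
}

func main() {
	inputs := []string{"ok, next!", "Pause please", "can I use milk instead of cream?"}
	for _, in := range inputs {
		if cmd := ParseCommand(in); cmd == CmdUnknown {
			fmt.Printf("%q -> forward to AI backend\n", in)
		} else {
			fmt.Printf("%q -> local command %d\n", in, cmd)
		}
	}
}
```

Trying the cheap local match first keeps common commands instant and offline, and only free-form questions cost a round trip to the model.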
Key Technical Features
AI Agent Architecture
- A custom agent loop with tool-calling. The AI can modify recipes, adjust timers, and reason about substitutions with full session context.
- Structured tool definitions with JSON schema validation for each action the agent can take.
- Conversation history maintained per session so the AI remembers what you asked three steps ago.
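One way to picture the tool-calling pieces listed above is a registry of named tools, each carrying a JSON schema for the model and a local handler. The sketch below is an assumption about the shape, not the project's code: Tool, Registry, Dispatch, and the adjust_timer tool are hypothetical, and validation here is done by decoding into a typed struct rather than by a full JSON Schema validator.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Tool pairs the metadata sent to the model with the handler run locally.
// Field names are illustrative.
type Tool struct {
	Name        string
	Description string
	Schema      json.RawMessage // JSON schema describing the arguments
	Handler     func(args []byte) (string, error)
}

// adjustTimerArgs is a hypothetical argument shape for an "adjust_timer" tool.
type adjustTimerArgs struct {
	TimerID      string `json:"timer_id"`
	DeltaSeconds int    `json:"delta_seconds"`
}

// adjustTimer decodes and checks the arguments, then reports what it would do.
func adjustTimer(args []byte) (string, error) {
	var a adjustTimerArgs
	if err := json.Unmarshal(args, &a); err != nil {
		return "", fmt.Errorf("decode adjust_timer args: %w", err)
	}
	if a.TimerID == "" {
		return "", errors.New("adjust_timer: timer_id is required")
	}
	return fmt.Sprintf("timer %s adjusted by %d seconds", a.TimerID, a.DeltaSeconds), nil
}

// Registry returns the tools the agent may call, keyed by name.
func Registry() map[string]Tool {
	return map[string]Tool{
		"adjust_timer": {
			Name:        "adjust_timer",
			Description: "Add or remove time on a running timer.",
			Schema: json.RawMessage(`{
				"type": "object",
				"properties": {
					"timer_id": {"type": "string"},
					"delta_seconds": {"type": "integer"}
				},
				"required": ["timer_id", "delta_seconds"]
			}`),
			Handler: adjustTimer,
		},
	}
}

// Dispatch runs the handler for the tool the model asked for.
func Dispatch(tools map[string]Tool, name string, args []byte) (string, error) {
	t, ok := tools[name]
	if !ok {
		return "", fmt.Errorf("unknown tool %q", name)
	}
	return t.Handler(args)
}

func main() {
	out, err := Dispatch(Registry(), "adjust_timer",
		[]byte(`{"timer_id":"pasta","delta_seconds":60}`))
	fmt.Println(out, err)
}
```

In an agent loop, the handler's string result (or its error text) would be appended to the conversation history so the model can react to it on the next turn.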
Recipe Engine
- URL scraper that pulls structured recipe data from sites like AllRecipes.
- Custom parser that extracts ingredients, quantities, temperatures, and timing from free-text instructions.
- Step dependency analysis that knows which steps can run in parallel and which block on timers.
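As an illustration of pulling timing and temperatures out of free-text instructions, here is a small regex-based sketch. It is not the project's parser: the patterns, the Temp type, and both function names are assumptions, and it deliberately handles only simple forms such as "25 minutes" or "375°F".

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
	"time"
)

var (
	// Matches "25 minutes", "5 min", "2 hrs", "30 seconds", and similar.
	durationRe = regexp.MustCompile(`(?i)(\d+)\s*(hours?|hrs?|minutes?|mins?|seconds?|secs?)\b`)
	// Matches "375°F", "180 degrees C", "350 degrees Fahrenheit".
	tempRe = regexp.MustCompile(`(?i)(\d+)\s*(?:°|degrees?)\s*([FC])(?:ahrenheit|elsius)?\b`)
)

// Temp is a hypothetical container for an extracted temperature.
type Temp struct {
	Value int
	Unit  string // "F" or "C"
}

// ExtractDurations returns every duration mentioned in text, in order.
func ExtractDurations(text string) []time.Duration {
	var out []time.Duration
	for _, m := range durationRe.FindAllStringSubmatch(text, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		unit := time.Second
		switch strings.ToLower(m[2])[0] {
		case 'h':
			unit = time.Hour
		case 'm':
			unit = time.Minute
		}
		out = append(out, time.Duration(n)*unit)
	}
	return out
}

// ExtractTemps returns every temperature mentioned in text, in order.
func ExtractTemps(text string) []Temp {
	var out []Temp
	for _, m := range tempRe.FindAllStringSubmatch(text, -1) {
		n, err := strconv.Atoi(m[1])
		if err != nil {
			continue
		}
		out = append(out, Temp{Value: n, Unit: strings.ToUpper(m[2])})
	}
	return out
}

func main() {
	step := "Bake at 375°F for 25 minutes, then rest 5 min."
	fmt.Println(ExtractDurations(step))
	fmt.Println(ExtractTemps(step))
}
```

A real parser would also need ranges ("20 to 25 minutes"), compound durations ("1 hour 30 minutes"), and fractions, which this sketch treats as separate matches or ignores.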
Terminal UI (Bubble Tea)
- Full TUI built on Bubble Tea with a timer bar, color-coded output, and a clean command prompt.
- Timer management runs in background goroutines with escalating notification states.
- Session state machine for pause, resume, skip, and status checks.
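The timer behavior described above (held until you are ready, paused with the session, escalating until acknowledged) can be modeled as a small state machine. This sketch is tick-driven so it stays deterministic; the real project runs timers in background goroutines. All names, the 30-second escalation interval, and the three-level cap are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// TimerState is a hypothetical set of timer states.
type TimerState int

const (
	Held         TimerState = iota // created, waiting for the cook to say "ready"
	Running                        // counting down
	Paused                         // session paused; countdown frozen
	Ringing                        // expired; escalates until acknowledged
	Acknowledged                   // cook confirmed the alert
)

const (
	escalateEvery = 30 * time.Second // assumed interval between escalation levels
	maxLevel      = 3                // assumed loudest notification level
)

// Timer tracks remaining time and how long an expired alert has been ignored.
type Timer struct {
	Label     string
	Remaining time.Duration
	State     TimerState
	overdue   time.Duration
}

// Start begins or resumes the countdown.
func (t *Timer) Start() {
	if t.State == Held || t.State == Paused {
		t.State = Running
	}
}

// Pause freezes a running countdown.
func (t *Timer) Pause() {
	if t.State == Running {
		t.State = Paused
	}
}

// Acknowledge silences a ringing timer.
func (t *Timer) Acknowledge() {
	if t.State == Ringing {
		t.State = Acknowledged
	}
}

// Tick advances the timer by d. Held, paused, and acknowledged timers ignore it.
func (t *Timer) Tick(d time.Duration) {
	switch t.State {
	case Running:
		if d < t.Remaining {
			t.Remaining -= d
			return
		}
		t.overdue = d - t.Remaining
		t.Remaining = 0
		t.State = Ringing
	case Ringing:
		t.overdue += d
	}
}

// Level reports the escalation level: 0 when not ringing, then 1 up to maxLevel.
func (t *Timer) Level() int {
	if t.State != Ringing {
		return 0
	}
	level := 1 + int(t.overdue/escalateEvery)
	if level > maxLevel {
		level = maxLevel
	}
	return level
}

func main() {
	pasta := &Timer{Label: "pasta", Remaining: time.Minute}
	pasta.Start()
	for i := 0; i < 5; i++ {
		pasta.Tick(30 * time.Second)
		fmt.Printf("state=%d level=%d remaining=%s\n", pasta.State, pasta.Level(), pasta.Remaining)
	}
	pasta.Acknowledge()
	fmt.Printf("after ack: state=%d level=%d\n", pasta.State, pasta.Level())
}
```

In a goroutine-based version, a ticker would call Tick on each interval and the UI would read Level to decide how insistent the notification should be.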
Voice Pipeline
- TTS via the Azure Cognitive Services Speech API with local disk caching, so the same sentence is never synthesized twice.
- STT via local whisper.cpp, so no audio leaves your machine.
- Wake word detection ("Hey Chef") with configurable silence thresholds.
- Audio playback through ebitengine/oto, pure Go with no CGO needed for output.
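The disk cache for synthesized speech mentioned above can be sketched as a content-addressed file store: hash the voice and sentence, and only call the provider on a miss. This is an assumed design, not the project's implementation. Synthesizer stands in for the real TTS call, and the ".wav" extension and voice name are placeholders.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"os"
	"path/filepath"
)

// Synthesizer is a hypothetical stand-in for the TTS provider call.
type Synthesizer func(text string) ([]byte, error)

// cacheKey derives a stable file name from the voice and the sentence.
// The zero byte keeps ("ab", "c") and ("a", "bc") from colliding.
func cacheKey(voice, text string) string {
	sum := sha256.Sum256([]byte(voice + "\x00" + text))
	return hex.EncodeToString(sum[:])
}

// CachedSpeech returns audio for text, calling synth only on a cache miss.
func CachedSpeech(dir, voice, text string, synth Synthesizer) ([]byte, error) {
	path := filepath.Join(dir, cacheKey(voice, text)+".wav") // extension is an assumption
	if data, err := os.ReadFile(path); err == nil {
		return data, nil // cache hit
	}
	data, err := synth(text)
	if err != nil {
		return nil, err
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return nil, err
	}
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	dir, err := os.MkdirTemp("", "tts-cache-*")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	calls := 0
	fake := func(text string) ([]byte, error) {
		calls++
		return []byte("audio for: " + text), nil
	}
	for i := 0; i < 2; i++ {
		if _, err := CachedSpeech(dir, "example-voice", "Flip the chicken.", fake); err != nil {
			panic(err)
		}
	}
	fmt.Println("provider calls:", calls)
}
```

Including the voice in the key means switching voices does not replay audio recorded with the old one.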
Architecture
- Clean separation: internal/domain, internal/engine, internal/conversation, internal/gpt, internal/speech, internal/timer, internal/display, internal/recipe, internal/storage.
- Every external dependency (TTS provider, AI backend, recipe source) is behind an interface, so providers can be swapped without touching business logic.
- Runs on Windows, macOS, and Linux. Voice input on Windows needs MinGW-w64 and PortAudio (CGO); everything else is pure Go.
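To show what "behind an interface" buys in practice, here is a minimal sketch using a recipe source. RecipeSource, Recipe, staticSource, and FirstStep are hypothetical names, not the project's actual types. The point is only that the business logic depends on the interface, so a scraper, a local file, or a test fake can be dropped in.

```go
package main

import (
	"errors"
	"fmt"
)

// Recipe is a pared-down, hypothetical recipe model.
type Recipe struct {
	Title string
	Steps []string
}

// RecipeSource is the seam: anything that can produce a recipe for a reference.
type RecipeSource interface {
	Fetch(ref string) (Recipe, error)
}

// staticSource is an in-memory implementation, handy for tests and demos.
type staticSource map[string]Recipe

func (s staticSource) Fetch(ref string) (Recipe, error) {
	r, ok := s[ref]
	if !ok {
		return Recipe{}, fmt.Errorf("recipe %q not found", ref)
	}
	return r, nil
}

// FirstStep is business logic that knows nothing about where recipes come from.
func FirstStep(src RecipeSource, ref string) (string, error) {
	r, err := src.Fetch(ref)
	if err != nil {
		return "", err
	}
	if len(r.Steps) == 0 {
		return "", errors.New("recipe has no steps")
	}
	return fmt.Sprintf("%s, step 1 of %d: %s", r.Title, len(r.Steps), r.Steps[0]), nil
}

func main() {
	src := staticSource{
		"alfredo": {Title: "Chicken Alfredo", Steps: []string{"Boil the pasta.", "Sear the chicken."}},
	}
	msg, err := FirstStep(src, "alfredo")
	fmt.Println(msg, err)
}
```

A URL scraper would satisfy the same interface with a Fetch that downloads and parses a page, and FirstStep would not change.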