Technical Deep Dive
Afrobotica, the Earthlight Historian
Afrobotica is a live, conversational planetarium character. Before the main show, one audience member at a time steps up to a microphone and talks with her: she listens, thinks, and answers out loud — projected on the dome as a holographic avatar floating in an ISS-style cupola above the Earth, voice and visor reacting in real time.
Technically, it is a real-time AI pipeline running inside a Unity 6 application on a single workstation, driving a 4K fisheye dome projection at 60 Hz — designed so that the entire conversational loop can run with no internet connection at all.
The conversation loop
Every exchange follows the same three-stage loop (Listen, Think, Speak), run by a small state machine that adds a resting Idle state (Idle → Listen → Think → Speak):
- Listen. A facilitator holds a presentation-clicker button while the visitor speaks (push-to-talk). The audio is transcribed to text on the machine by OpenAI's Whisper model running locally — the visitor's voice never leaves the building.
- Think. The transcript goes to a local large language model — Qwen2.5-7B, served by Ollama on the same workstation — wrapped in a carefully engineered character prompt. The model writes a short, in-character reply (two to three sentences by design).
- Speak. The reply is synthesized into Afrobotica's voice. The live voice is Meta's Wit.ai text-to-speech (the one cloud service in the loop); if the network hiccups, the system silently switches to Piper, a fully local neural voice, and the show continues. Lip sync, a state-reactive "visor" light language, and dome captions all track the speech in real time.
In the default Piloted mode, the facilitator opens the mic for each new question; in the alternative FreeFlow mode, Afrobotica starts listening again automatically.
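The loop and its two modes can be sketched as a small transition table. This is a minimal illustration in Python rather than the application's own Unity code; the event names (`talk_pressed`, `transcript_ready`, `reply_ready`, `speech_finished`) and the `free_flow` flag are hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTEN = auto()
    THINK = auto()
    SPEAK = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "talk_pressed"): State.LISTEN,
    (State.LISTEN, "transcript_ready"): State.THINK,
    (State.THINK, "reply_ready"): State.SPEAK,
}


def next_state(state: State, event: str, free_flow: bool = False) -> State:
    """Return the state after `event`; unknown events leave the state unchanged."""
    if state is State.SPEAK and event == "speech_finished":
        # Piloted: wait for the facilitator. FreeFlow: listen again right away.
        return State.LISTEN if free_flow else State.IDLE
    return TRANSITIONS.get((state, event), state)
```

Ignoring events that do not apply to the current state is one simple way to keep a stray button press from derailing an exchange; whether the real state machine behaves this way is an assumption.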
Three swappable modules, one event bus
The pipeline is built as three independent modules — Listener (speech-to-text), Thinker (language model), and Speaker (text-to-speech) — that never call each other directly. Each fires events onto a central event bus, and the state machine advances on those events. Each module supports several interchangeable providers:
| Module | Live provider (venue) | Alternates available |
|---|---|---|
| Listener (STT) | Whisper — offline, on-device | Meta Voice SDK, OpenAI Whisper API |
| Thinker (LLM) | Ollama → Qwen2.5-7B — offline, on-device | OpenAI ChatGPT (two integration modes) |
| Speaker (TTS) | Wit.ai (cloud) with automatic Piper offline fallback | OpenAI TTS, ElevenLabs |
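The decoupling between modules can be illustrated with a tiny publish/subscribe bus. The sketch below is in Python for brevity and is not the project's actual code; the `EventBus` class, the topic names, and the stand-in handlers are all hypothetical.

```python
from collections import defaultdict
from typing import Callable


class EventBus:
    """Minimal publish/subscribe hub; handlers run synchronously, in subscription order."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: str) -> None:
        # Iterate over a copy so handlers may subscribe while we publish.
        for handler in list(self._handlers[topic]):
            handler(payload)


def wire_demo() -> list[str]:
    """Wire stand-in modules together through the bus and run one exchange."""
    bus = EventBus()
    spoken: list[str] = []
    # Stand-in Thinker: reacts to a transcript and publishes a reply.
    bus.subscribe("transcript_ready",
                  lambda text: bus.publish("reply_ready", f"echo: {text}"))
    # Stand-in Speaker: reacts to a reply.
    bus.subscribe("reply_ready", lambda text: spoken.append(f"speak -> {text}"))
    # Stand-in Listener: publishes a transcript without knowing who consumes it.
    bus.publish("transcript_ready", "how far is the Moon?")
    return spoken
```

Because each module only knows topic names, swapping one provider for another does not touch the other two stages.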
Which provider is active is decided by configuration files, not code — the venue can re-point any stage (or adjust temperatures, timeouts, the active persona) with a companion settings app, without ever opening Unity or rebuilding.
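As a rough illustration of configuration-driven provider selection, the Python sketch below parses a JSON config and validates the provider named for each stage. The schema, the provider identifiers, and the numeric values are placeholders invented for this example; the real configuration files may look quite different.

```python
import json

# Hypothetical config schema and placeholder values, for illustration only.
CONFIG_TEXT = """
{
  "listener": {"provider": "whisper_local"},
  "thinker":  {"provider": "ollama", "model": "qwen2.5:7b", "temperature": 0.7},
  "speaker":  {"provider": "witai", "fallback": "piper", "timeout_s": 3.0}
}
"""

# Provider names loosely mirror the table above; the identifiers are made up.
KNOWN_PROVIDERS = {
    "listener": {"whisper_local", "meta_voice_sdk", "openai_whisper_api"},
    "thinker": {"ollama", "openai_chatgpt"},
    "speaker": {"witai", "piper", "openai_tts", "elevenlabs"},
}


def load_providers(text: str) -> dict[str, str]:
    """Parse the config and return the active provider per stage.

    Raises ValueError if a stage names a provider that is not registered.
    """
    config = json.loads(text)
    active = {}
    for stage, allowed in KNOWN_PROVIDERS.items():
        name = config[stage]["provider"]
        if name not in allowed:
            raise ValueError(f"unknown {stage} provider: {name!r}")
        active[stage] = name
    return active
```

Validating names at load time means a typo in a settings file surfaces at startup rather than in the middle of a show.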
Offline-first by design
A museum floor is a hostile place for cloud dependencies. The system was therefore designed to run the entire loop locally: Whisper (STT), Qwen2.5-7B via Ollama (LLM), and Piper (TTS) all execute on the show workstation's GPU and CPU. The single cloud service in the live configuration — the Wit.ai voice — exists because it best matched the character's voice and latency targets, and it is wrapped in an automatic fallback: if a request times out, the local Piper voice takes over mid-show.
This has a privacy consequence worth stating plainly: the visitor's voice is processed entirely on the local machine. The only data that travels to a cloud service during a show is Afrobotica's own generated reply text, sent to Wit.ai to be turned into audio.
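The timeout fallback described in this section can be sketched in a few lines of Python. The function names are hypothetical, and the cloud and local voices are simulated by plain functions; in a real client the timeout would be raised by the HTTP layer.

```python
from typing import Callable

Synth = Callable[[str], bytes]


def speak_with_fallback(text: str, primary: Synth, fallback: Synth) -> tuple[str, bytes]:
    """Try the primary voice; on a timeout or connection failure, use the fallback."""
    try:
        return "primary", primary(text)
    except (TimeoutError, ConnectionError):
        return "fallback", fallback(text)


def cloud_voice_down(text: str) -> bytes:
    """Simulated cloud voice whose request never completes in time."""
    raise TimeoutError("simulated network stall")


def local_voice(text: str) -> bytes:
    """Simulated local voice; the bytes stand in for synthesized audio."""
    return text.encode("utf-8")
```

For example, `speak_with_fallback("hello", cloud_voice_down, local_voice)` returns the fallback result without the caller having to know that the network failed.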
Keeping a small model honest
A 7-billion-parameter local model is fast and private, but it will confidently improvise if you let it. Most of the content engineering went into not letting it:
- A versioned persona prompt defines her voice, values, and a curated, web-verified ladder of space facts — including the current status of each Artemis mission.
- Timeless framing: the prompt and a post-processing pass deliberately scrub years and dates, so answers don't silently go stale or expose the model's training cutoff.
- Response shaping: replies are clamped to two or three short sentences (about 8–10 seconds spoken), lists are stripped, and dates are removed before speech.
- A regression benchmark harness scores every prompt revision before it ships: a compliance suite (sentence length, banned phrases, date leaks) and a graded correctness suite for the Artemis fact ladder, each run multiple times against the live model. Persona v7 scored 115/120 (about 96%) on the correctness suite versus 51/57 (about 89%) for v5; the totals differ between the two runs, so the percentages are the fairer comparison. Changes are promoted only once they have been measured.
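The response shaping and compliance checks in the list above might look roughly like the following Python sketch. The regular expressions, the three-sentence limit as a constant, and the choice to drop any sentence that mentions a year are simplifying assumptions; the actual post-processing pass is not described in that detail.

```python
import re

MAX_SENTENCES = 3
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def shape_reply(raw: str, max_sentences: int = MAX_SENTENCES) -> str:
    """Strip list markers, drop sentences that mention a year, clamp the length."""
    text = LIST_MARKER.sub("", raw)
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    sentences = SENTENCE_END.split(text)
    kept = [s for s in sentences if s and not YEAR.search(s)]
    return " ".join(kept[:max_sentences])


def violations(reply: str) -> list[str]:
    """Return the names of the compliance rules that a reply breaks."""
    problems = []
    if len(SENTENCE_END.split(reply.strip())) > MAX_SENTENCES:
        problems.append("too_long")
    if YEAR.search(reply):
        problems.append("date_leak")
    return problems
```

Running the same `violations` check inside a benchmark and after shaping at runtime keeps the two definitions of "compliant" from drifting apart, which is one plausible reason to share the code.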
The dome
The scene is rendered to the planetarium dome in real time: a cubemap camera captures the environment, and a fisheye shader converts it into a 4096×4096 dome master at 60 Hz — all on one workstation GPU (an NVIDIA RTX 2000 Ada). The avatar can orbit between six stations around the dome, the cupola windows open during the intro, and NASA's Blue Marble imagery provides the Earth below.
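To give a feel for the fisheye step, the Python sketch below maps a dome-master pixel to the view direction that would be sampled from the cubemap. It assumes an equidistant fisheye (angle from the zenith proportional to distance from the image centre) with a 180° field of view and +z pointing up; the text does not state which projection or axis convention the actual shader uses.

```python
import math
from typing import Optional


def fisheye_direction(px: float, py: float, size: int = 4096,
                      fov_deg: float = 180.0) -> Optional[tuple[float, float, float]]:
    """Map a dome-master pixel to a unit view direction.

    Returns None for pixels outside the circular image. Assumes an
    equidistant fisheye with the zenith at the image centre and +z up.
    """
    # Normalise pixel centres to [-1, 1] on both axes.
    u = (px + 0.5) / size * 2.0 - 1.0
    v = (py + 0.5) / size * 2.0 - 1.0
    r = math.hypot(u, v)
    if r > 1.0:
        return None  # corner pixels lie outside the dome image
    theta = r * math.radians(fov_deg) / 2.0  # angle away from the zenith
    phi = math.atan2(v, u)                   # azimuth around the zenith
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))
```

A GPU shader would evaluate the same mapping once per output pixel and use the resulting direction to look up a colour in the cubemap.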
Accessibility is built in: live dome captions track Afrobotica's speech sentence by sentence, rendered in B612 (the open-source typeface designed for aircraft cockpits) and toggleable mid-show.
The show, end to end
The experience runs as a closed loop a facilitator drives with a single presentation clicker (with full keyboard fallbacks):
- Dormant — title card on the dome, ambient music, avatar hidden.
- Intro (~27 s) — triggered by the clicker: lights dim, Afrobotica teleports in as a hologram, welcomes the audience, and the cupola doors open onto the Earth.
- Conversation — the Listen → Think → Speak loop, one visitor question at a time. Spoken keywords can trigger moments: "Artemis" reveals mission imagery on the dome; a birthday earns confetti.
- Outro — she says goodbye, dissolves away, and the dome fades to black; the system then resets itself to Dormant for the next audience.
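The keyword-triggered moments in the Conversation step could be driven by a lookup as simple as the one below. This Python sketch is an assumption about the mechanism: the cue names are invented, and the text does not say which transcript is inspected or how matching is done.

```python
import re

# Hypothetical keyword -> cue mapping, based on the two examples in the text.
CUES = {
    "artemis": "show_artemis_imagery",
    "birthday": "confetti",
}


def cues_for(transcript: str) -> list[str]:
    """Return cues for keywords found as whole words, in order of first mention."""
    found: list[str] = []
    for word in re.findall(r"[a-z']+", transcript.lower()):
        cue = CUES.get(word)
        if cue and cue not in found:
            found.append(cue)
    return found
```

Matching whole words rather than substrings avoids firing a cue on an unrelated word that happens to contain a keyword.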
Per-state watchdog timeouts and one-button recovery gestures mean a stalled provider degrades into a graceful skip, never a frozen show.
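A per-state watchdog can be as small as the Python sketch below. The class name and the timeout budgets are placeholders, since the actual values are not given; the clock is passed in explicitly so the logic is easy to test.

```python
from typing import Optional

# Placeholder budgets in seconds; the real values are not stated in the text.
TIMEOUTS = {"Listen": 15.0, "Think": 10.0, "Speak": 30.0}


class Watchdog:
    """Tracks how long the current state has been active."""

    def __init__(self, timeouts: dict[str, float]) -> None:
        self._timeouts = timeouts
        self._state: Optional[str] = None
        self._entered_at = 0.0

    def enter(self, state: str, now: float) -> None:
        """Record that `state` became active at time `now` (in seconds)."""
        self._state, self._entered_at = state, now

    def expired(self, now: float) -> bool:
        """True if the current state has outlived its budget; states without one never expire."""
        if self._state is None:
            return False
        limit = self._timeouts.get(self._state)
        return limit is not None and now - self._entered_at > limit
```

The show loop would call `expired` once per frame and, when it returns true, skip ahead to a safe state instead of waiting on the stalled provider.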
Latency as theater
Small touches make machine latency read as character rather than lag:
- A minimum "think" hold guarantees her gold-pulsing visor and a mechanical "processing" sound play out fully even when the model answers instantly — thinking is staged, not suffered.
- Streamed speech synthesis starts her talking before the full audio is rendered.
- The garbage collector is scheduled around speech — memory cleanup is deferred while she talks and executed during the fade-to-black, eliminating mid-sentence stutter.
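Two of these touches translate into very little code. In the Python sketch below, `remaining_hold` computes how much longer to keep the thinking animation on screen, and `gc_deferred` pauses automatic garbage collection for the duration of a block. Python's `gc` module stands in for the engine's collector here, and both helper names are hypothetical.

```python
import gc
from contextlib import contextmanager


def remaining_hold(elapsed_s: float, min_hold_s: float) -> float:
    """Extra seconds to keep the 'thinking' cue up after the model has answered."""
    return max(0.0, min_hold_s - elapsed_s)


@contextmanager
def gc_deferred():
    """Pause automatic garbage collection for the block, then collect once on exit."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()
        gc.collect()  # do the deferred cleanup at a moment the caller chose
```

Wrapping speech playback in `with gc_deferred():` moves the cleanup pause to the end of the block, which the caller can align with a quiet moment such as a fade.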
Data and privacy
Conversations are recorded as text only — the transcribed question and her reply, timestamped. No audio is retained, no names are requested, and transcripts are not linked to any identity. The microphone is push-to-talk: it is only live while the facilitator holds the talk button. There is no camera and no face recognition of any kind.
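A text-only, timestamped record could be serialised as one JSON object per line, as in the Python sketch below. The field names and the JSON Lines layout are assumptions made for illustration; the actual log format is not described.

```python
import json
from datetime import datetime, timezone


def log_line(question: str, reply: str, now: datetime) -> str:
    """Serialise one exchange as a JSON line holding a timestamp and text only."""
    record = {
        "timestamp": now.astimezone(timezone.utc).isoformat(),
        "question": question,
        "reply": reply,
    }
    return json.dumps(record, ensure_ascii=False)
```

Keeping the record to these three fields makes it easy to confirm by inspection that no audio and no identifying data are written.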
AI disclosure
We believe meeting AI in a museum should come with the full story.
The AI at runtime
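When you talk to Afrobotica, three AI systems work in sequence: one understands your speech, one composes her reply, and one speaks it. The table also lists the offline backup voice and the lip-sync component that animates her mouth: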
| Role | Model | Made by | Where it runs | License |
|---|---|---|---|---|
| Understands your speech | Whisper (tiny) | OpenAI | Locally, in the planetarium | MIT (open source) |
| Composes her reply | Qwen2.5-7B (via Ollama) | Alibaba Cloud (Qwen team) | Locally, in the planetarium | Apache 2.0 (open source) |
| Speaks her reply | Wit.ai TTS ("Kenyan Accent" voice) | Meta | Cloud service | Proprietary service |
| Backup voice (offline) | Piper (en_US-hfc_female) | Rhasspy/Piper project | Locally, in the planetarium | MIT engine; CC BY-NC-SA 4.0 voice corpus |
| Animates her mouth | Meta (Oculus) Lip Sync | Meta | Locally | Proprietary SDK |
During development we also evaluated Llama 3.1 (8B), Llama 3.2, and Gemma before selecting Qwen2.5-7B.
AI used in creating the experience
- AI-assisted software development. Portions of the application code were written with AI pair-programming assistance (Anthropic's Claude, via the Claude Code development tool), with human review of the result.
- One AI-generated 3D asset. The floating teddy bear — a zero-gravity indicator, a real astronaut tradition — was generated with Hunyuan3D-2, an open-source image-to-3D model from Tencent, run locally on our own hardware.
- Pre-recorded narration. The introduction and farewell narration were synthesized in advance with the same Wit.ai voice Afrobotica speaks with live.
What is NOT AI
- Afrobotica herself — the character was created by Dr. Sian Proctor; her persona, values, knowledge, and manner of speaking were written by humans as a detailed character document the AI must follow. The AI performs the character; it did not invent her.
- The poetry — "EarthLight," "Space2Inspire," and related works quoted in the experience are Dr. Proctor's own human-authored writing.
- The avatar artwork — Afrobotica's 3D character model is custom, human-made art.
- The science content in her knowledge base was researched, verified against primary sources (NASA), and written by the production team.
Where the non-AI materials came from
| Material | Source |
|---|---|
| ISS cupola environment | Licensed Unity Asset Store package |
| Earth imagery | NASA Blue Marble (public domain) |
| Cloud layer imagery | Solar System Scope textures |
| Astronaut objects | NASA 3D model library |
| Confetti effects | "Confetti FX" by Kenneth "Archanor" Foldal Moe (Unity Asset Store) |
| Caption typeface | B612 (SIL Open Font License) |
| Music | Original score by Subtractive |
Training transparency — what's known and what isn't
| Model | What the maker discloses | What is NOT public |
|---|---|---|
| Qwen2.5-7B | Up to 18 trillion tokens of "large-scale, high-quality" data | The composition and sources of that data |
| Whisper | 680,000 hours of multilingual web audio + transcripts | The specific recordings/sites used |
| Llama 3.1/3.2 (evaluated) | ~15 trillion tokens from "publicly available sources" | Sources are not itemized |
| Wit.ai TTS voice | — | How Meta's preset voices were built |
| Piper hfc_female | Fully documented: the Hi-Fi-CAPTAIN corpus, one professional performer, published by NICT under CC BY-NC-SA 4.0 | — |
| Hunyuan3D-2 | Open release; trained on "millions of 3D assets" | The composition of those assets |
No one's voice was cloned for this experience. Her live voice is a standard Wit.ai preset; her backup voice derives from a research corpus recorded by a professional performer for exactly this purpose.
The stack at a glance
| Layer | Technology |
|---|---|
| Engine | Unity 6 (URP), Windows, Direct3D 11 |
| Speech-to-text | Whisper (OpenAI, open source) — local |
| Language model | Qwen2.5-7B via Ollama — local |
| Text-to-speech | Wit.ai (Meta, cloud), with Piper as the local fallback |
| Lip sync | Meta (Oculus) Lip Sync |
| Dome render | Real-time cubemap → fisheye, 4096² @ 60 Hz |
| Hardware | Single workstation, NVIDIA RTX 2000 Ada (16 GB) |
| Configuration | JSON runtime config + standalone settings app (no rebuilds) |
Credits
- Dr. Sian Proctor — creator of Afrobotica; concept, character, poetry
- Chaotic Curiosity — Don Balanzat (production, audio & voice design, show flow), Alireza Bahremand (lead research engineer), Dylan Kerr (3D & VFX)
- Subtractive — original score · Test Shot Starfish and Annu — collaborating artists
- Museum of Science, Boston — venue and production partner
- Built gratefully on open-source software: Ollama, Qwen, Whisper / whisper.cpp, Piper, B612, and the Unity community.