Afrobotica, the Earthlight Historian

Afrobotica is a live, conversational planetarium character. Before the main show, one audience member at a time steps up to a microphone and talks with her: she listens, thinks, and answers out loud — projected on the dome as a holographic avatar floating in an ISS-style cupola above the Earth, voice and visor reacting in real time.

Technically, it is a real-time AI pipeline running inside a Unity 6 application on a single workstation, driving a 4K fisheye dome projection at 60 Hz — designed so that the entire conversational loop can run with no internet connection at all.

The conversation loop

Every exchange follows the same three-stage loop, run by a small state machine (Idle → Listen → Think → Speak):

Listen. A facilitator holds a presentation-clicker button while the visitor speaks (push-to-talk). The audio is transcribed to text on the machine by OpenAI's Whisper model running locally — the visitor's voice never leaves the building.
Think. The transcript goes to a local large language model — Qwen2.5-7B, served by Ollama on the same workstation — wrapped in a carefully engineered character prompt. The model writes a short, in-character reply (two to three sentences by design).
Speak. The reply is synthesized into Afrobotica's voice. The live voice is Meta's Wit.ai text-to-speech (the one cloud service in the loop); if the network hiccups, the system silently switches to Piper, a fully local neural voice, and the show continues. Lip sync, a state-reactive "visor" light language, and dome captions all track the speech in real time.

In the default Piloted mode the facilitator opens the mic for each new question; a FreeFlow mode also exists, in which she resumes listening automatically.
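The loop and its two modes can be sketched as a small state machine. This is an illustrative Python sketch, not the project's Unity code: the class name, method names, and the rule that an empty transcript returns to Idle are all assumptions made for the example.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LISTEN = auto()
    THINK = auto()
    SPEAK = auto()


class ConversationLoop:
    """Illustrative Idle -> Listen -> Think -> Speak state machine."""

    def __init__(self, free_flow: bool = False):
        self.state = State.IDLE
        self.free_flow = free_flow  # False models Piloted mode

    def on_talk_pressed(self) -> None:
        # Facilitator presses the talk button: open the mic.
        if self.state is State.IDLE:
            self.state = State.LISTEN

    def on_transcript_ready(self, text: str) -> None:
        # Assumption: an empty transcript skips straight back to Idle.
        if self.state is State.LISTEN:
            self.state = State.THINK if text.strip() else State.IDLE

    def on_reply_ready(self) -> None:
        if self.state is State.THINK:
            self.state = State.SPEAK

    def on_speech_finished(self) -> None:
        # FreeFlow reopens the mic; Piloted waits for the facilitator.
        if self.state is State.SPEAK:
            self.state = State.LISTEN if self.free_flow else State.IDLE
```

In this sketch the only difference between the two modes is the final transition: Piloted returns to Idle, FreeFlow goes straight back to Listen.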

Three swappable modules, one event bus

The pipeline is built as three independent modules — Listener (speech-to-text), Thinker (language model), and Speaker (text-to-speech) — that never call each other directly. Each fires events onto a central event bus, and the state machine advances on those events. Each module supports several interchangeable providers:

Module	Live provider (venue)	Alternates available
Listener (STT)	Whisper — offline, on-device	Meta Voice SDK, OpenAI Whisper API
Thinker (LLM)	Ollama → Qwen2.5-7B — offline, on-device	OpenAI ChatGPT (two integration modes)
Speaker (TTS)	Wit.ai (cloud) with automatic Piper offline fallback	OpenAI TTS, ElevenLabs
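The decoupling described above, where modules publish events instead of calling each other, might look roughly like the following. It is a minimal publish/subscribe sketch in Python; the event names `transcript_ready` and `reply_ready` and the `EventBus` class are hypothetical and are not taken from the project.

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Tiny publish/subscribe hub: modules only know event names."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event].append(handler)

    def publish(self, event: str, payload: Any = None) -> None:
        # Iterate over a copy so handlers may subscribe while we dispatch.
        for handler in list(self._handlers[event]):
            handler(payload)


bus = EventBus()
spoken: list[str] = []

# Stand-in Thinker: reacts to a transcript by publishing a reply.
bus.subscribe("transcript_ready", lambda text: bus.publish("reply_ready", f"You asked: {text}"))
# Stand-in Speaker: reacts to a reply. It never references the Thinker.
bus.subscribe("reply_ready", lambda reply: spoken.append(reply))

bus.publish("transcript_ready", "How far is the Moon?")
```

Because each stand-in depends only on an event name, swapping one provider for another would not touch the other modules.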

Which provider is active is decided by configuration files, not code: the venue can re-point any stage (or adjust temperatures, timeouts, and the active persona) with a companion settings app, without ever opening Unity or rebuilding.
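As an illustration of configuration-driven provider selection, the sketch below loads a JSON document and checks each stage against a list of known providers. The key names, provider identifiers, and numeric values are placeholders invented for the example; the project's actual config schema is not shown in this document.

```python
import json

# Hypothetical provider identifiers, one set per pipeline stage.
PROVIDERS = {
    "listener": {"whisper_local", "meta_voice_sdk", "openai_whisper_api"},
    "thinker": {"ollama", "openai_chat"},
    "speaker": {"wit_ai", "piper", "openai_tts", "elevenlabs"},
}

# Placeholder values; real temperatures and timeouts are not documented here.
EXAMPLE_CONFIG = """
{
  "listener": {"provider": "whisper_local"},
  "thinker": {"provider": "ollama", "model": "qwen2.5:7b", "temperature": 0.7, "timeout_s": 8},
  "speaker": {"provider": "wit_ai", "fallback": "piper", "timeout_s": 4},
  "persona": "v7"
}
"""


def load_config(text: str) -> dict:
    """Parse the config and reject unknown providers before the show starts."""
    config = json.loads(text)
    for stage, allowed in PROVIDERS.items():
        provider = config[stage]["provider"]
        if provider not in allowed:
            raise ValueError(f"unknown {stage} provider: {provider}")
    return config


config = load_config(EXAMPLE_CONFIG)
```

Validating at load time means a typo in the settings file fails loudly at startup rather than partway through a show.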

Offline-first by design

A museum floor is a hostile place for cloud dependencies. The system was therefore designed to run the entire loop locally: Whisper (STT), Qwen2.5-7B via Ollama (LLM), and Piper (TTS) all execute on the show workstation's GPU and CPU. The single cloud service in the live configuration — the Wit.ai voice — exists because it best matched the character's voice and latency targets, and it is wrapped in an automatic fallback: if a request times out, the local Piper voice takes over mid-show.
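The timeout-then-fallback behavior can be sketched as follows. This is a generic Python pattern under stated assumptions, not the project's implementation: `speak_with_fallback`, the two stand-in voice functions, and the timeout values are all hypothetical.

```python
import concurrent.futures
import time


def speak_with_fallback(text, primary, fallback, timeout_s):
    """Try the primary voice; on timeout or error, use the fallback voice."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(primary, text)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        # Covers a timeout as well as any error raised by the primary voice.
        return fallback(text)
    finally:
        # Do not block on a hung request; the worker finishes in the background.
        pool.shutdown(wait=False)


def slow_cloud_voice(text):
    time.sleep(0.5)  # simulates a stalled network request
    return ("cloud", text)


def fast_cloud_voice(text):
    return ("cloud", text)


def local_voice(text):
    return ("local", text)


stalled = speak_with_fallback("Hello", slow_cloud_voice, local_voice, timeout_s=0.1)
healthy = speak_with_fallback("Hello", fast_cloud_voice, local_voice, timeout_s=0.1)
```

Here `stalled` comes from the local stand-in and `healthy` from the cloud stand-in, so the caller never needs to know which voice actually spoke.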

This has a privacy consequence worth stating plainly: the visitor's voice is processed entirely on the local machine. The only data that travels to a cloud service during a show is Afrobotica's own generated reply text, sent to Wit.ai to be turned into audio.

Keeping a small model honest

A 7-billion-parameter local model is fast and private, but it will confidently improvise if you let it. Most of the content engineering went into not letting it:

A versioned persona prompt defines her voice, values, and a curated, web-verified ladder of space facts — including the current status of each Artemis mission.
Timeless framing: the prompt and a post-processing pass deliberately scrub years and dates, so answers don't silently go stale or expose the model's training cutoff.
Response shaping: replies are clamped to two or three short sentences (about 8–10 seconds spoken), lists are stripped, and dates are removed before speech.
A regression benchmark harness scores every prompt revision before it ships: a compliance suite (sentence length, banned phrases, date leaks) and a graded correctness suite for the Artemis fact ladder, each run multiple times against the live model. Persona v7 scored 115/120 (about 96%) on the correctness suite versus 51/57 (about 89%) for v5; a revision is promoted only once it has been measured.
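The response shaping and compliance checks in the list above could be approximated like this. It is a deliberately naive sketch: the regular expressions, the function names, and the banned phrase are assumptions for illustration, and the real post-processing pass is presumably more careful (for example, this year pattern would also scrub a figure such as "2000 kilometers").

```python
import re

# Naive patterns, assumed for this example only.
YEAR = re.compile(r"(?:\b(?:in|since|by|during)\s+)?\b(?:19|20)\d{2}\b", re.IGNORECASE)
LIST_MARKER = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+", re.MULTILINE)
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def shape_reply(text: str, max_sentences: int = 3) -> str:
    """Strip list markers, scrub years, and clamp the sentence count."""
    text = LIST_MARKER.sub("", text)
    text = YEAR.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([.,!?])", r"\1", text)  # tidy gaps left by removals
    sentences = SENTENCE_END.split(text)
    return " ".join(sentences[:max_sentences])


def compliance_issues(reply: str, max_sentences: int = 3, banned=("as an ai",)) -> list:
    """Return the names of any compliance checks the reply fails."""
    issues = []
    if len(SENTENCE_END.split(reply.strip())) > max_sentences:
        issues.append("too_long")
    if YEAR.search(reply):
        issues.append("date_leak")
    if any(phrase in reply.lower() for phrase in banned):
        issues.append("banned_phrase")
    return issues


raw = ("- People first walked on the Moon in 1969. Artemis aims to return. "
       "The Moon is our neighbor. Mars is next.")
shaped = shape_reply(raw)
```

Running the same checks over raw and shaped replies is one simple way a harness could confirm that shaping actually removes what the compliance suite looks for.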

The dome

The scene is rendered to the planetarium dome in real time: a cubemap camera captures the environment, and a fisheye shader converts it into a 4096×4096 dome master at 60 Hz — all on one workstation GPU (an NVIDIA RTX 2000 Ada). The avatar can orbit between six stations around the dome, the cupola windows open during the intro, and NASA's Blue Marble imagery provides the Earth below.
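The cubemap-to-fisheye step amounts to asking, for every dome-master pixel, which direction in the scene it looks along and then sampling the cubemap in that direction. The sketch below shows that mapping on the CPU in Python, assuming an equidistant 180-degree fisheye with the zenith at the image center and +z pointing up; the projection type, axis convention, and function names are assumptions, and the real work happens in a shader.

```python
import math


def dome_pixel_to_direction(px, py, size=4096, aperture_deg=180.0):
    """Map a dome-master pixel to a unit view direction, or None if outside the circle."""
    # Normalize pixel centers to [-1, 1] with the zenith at the image center.
    x = (px + 0.5) / size * 2.0 - 1.0
    y = (py + 0.5) / size * 2.0 - 1.0
    r = math.hypot(x, y)
    if r > 1.0:
        return None  # corners of the square lie outside the fisheye circle
    theta = r * math.radians(aperture_deg) / 2.0  # angle away from the zenith
    phi = math.atan2(y, x)                        # azimuth around the dome
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def cube_face(direction):
    """Pick the cubemap face a direction falls on (largest absolute component)."""
    x, y, z = direction
    ax, ay, az = abs(x), abs(y), abs(z)
    if az >= ax and az >= ay:
        return "+z" if z > 0 else "-z"
    if ax >= ay:
        return "+x" if x > 0 else "-x"
    return "+y" if y > 0 else "-y"


center = dome_pixel_to_direction(2048, 2048)
edge = dome_pixel_to_direction(4095, 2048)
corner = dome_pixel_to_direction(0, 0)
```

In an equidistant projection the distance from the image center is proportional to the angle from the zenith, which is why `theta` is a simple multiple of `r`.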

Accessibility is built in: live dome captions track Afrobotica's speech sentence by sentence, rendered in B612 (the open-source typeface designed for aircraft cockpits) and toggleable mid-show.

The show, end to end

The experience runs as a closed loop a facilitator drives with a single presentation clicker (with full keyboard fallbacks):

Dormant — title card on the dome, ambient music, avatar hidden.
Intro (~27 s) — triggered by the clicker: lights dim, Afrobotica teleports in as a hologram, welcomes the audience, and the cupola doors open onto the Earth.
Conversation — the Listen → Think → Speak loop, one visitor question at a time. Spoken keywords can trigger moments: "Artemis" reveals mission imagery on the dome; a birthday earns confetti.
Outro — she says goodbye, dissolves away, and the dome fades to black; the system then resets itself to Dormant for the next audience.
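The keyword-triggered moments in the Conversation step could be driven by a simple lookup over the words of a transcript. This is a hypothetical Python sketch; the cue names and the `cues_for` function are invented for illustration.

```python
import re

# Hypothetical keyword -> cue table; cue names are illustrative only.
TRIGGERS = {
    "artemis": "show_artemis_imagery",
    "birthday": "launch_confetti",
}


def cues_for(transcript: str) -> list:
    """Return the cues whose keyword appears as a whole word in the transcript."""
    words = set(re.findall(r"[a-z']+", transcript.lower()))
    return [cue for keyword, cue in TRIGGERS.items() if keyword in words]
```

For example, `cues_for("Tell me about Artemis!")` yields the imagery cue, while a question with no keyword yields an empty list. Matching whole words keeps a keyword from firing on a longer word that merely contains it.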

Per-state watchdog timeouts and one-button recovery gestures mean a stalled provider degrades into a graceful skip, never a frozen show.
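A per-state watchdog of this kind can be sketched as a timer that is reset on every state change and checked once per frame. The timeout values below are placeholders (the real limits are not given in this document), and the `Watchdog` class and its injectable clock are assumptions made so the example is testable.

```python
import time

# Placeholder limits in seconds; the real per-state values are not documented here.
TIMEOUTS = {"Listen": 15.0, "Think": 10.0, "Speak": 30.0}


class Watchdog:
    """Skips back to Idle when a state outlives its time limit."""

    def __init__(self, timeouts, clock=time.monotonic):
        self.timeouts = timeouts
        self.clock = clock
        self.state = "Idle"
        self.entered_at = clock()

    def enter(self, state):
        self.state = state
        self.entered_at = self.clock()

    def tick(self):
        """Call once per frame; returns True if the current state was skipped."""
        limit = self.timeouts.get(self.state)
        if limit is not None and self.clock() - self.entered_at > limit:
            self.enter("Idle")  # graceful skip back to a safe state
            return True
        return False


# Usage with a fake clock, so no real waiting is needed.
now = [0.0]
watchdog = Watchdog(TIMEOUTS, clock=lambda: now[0])
watchdog.enter("Think")
now[0] = 5.0
still_thinking = not watchdog.tick()
now[0] = 10.5
skipped = watchdog.tick()
```

Idle has no entry in the table, so the watchdog never fires while the system is simply waiting for the next visitor.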

Latency as theater

Small touches make machine latency read as character rather than lag:

A minimum "think" hold guarantees her gold-pulsing visor and a mechanical "processing" sound play out fully even when the model answers instantly — thinking is staged, not suffered.
Streamed speech synthesis starts her talking before the full audio is rendered.
The garbage collector is scheduled around speech — memory cleanup is deferred while she talks and executed during the fade-to-black, eliminating mid-sentence stutter.
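Two of the touches above translate into short patterns. The sketch below shows a minimum "think" hold (run the model call and a timer concurrently, and continue only when both are done) and a gate that defers garbage collection while speech plays. It uses Python's `asyncio` and `gc` modules purely as an analogy; Unity's garbage collector is controlled differently, and the names `think_with_minimum_hold` and `SpeechGcGate` are hypothetical.

```python
import asyncio
import gc
import time


async def think_with_minimum_hold(generate_reply, min_hold_s: float):
    """Wait for both the model reply and the minimum staging time."""
    reply, _ = await asyncio.gather(generate_reply(), asyncio.sleep(min_hold_s))
    return reply


class SpeechGcGate:
    """Context manager that defers automatic garbage collection during speech."""

    def __enter__(self):
        gc.disable()
        return self

    def __exit__(self, *exc_info):
        gc.enable()
        gc.collect()  # pay the cleanup cost once speech has ended
        return False


async def instant_model():
    # Stand-in for a model that answers immediately.
    return "Hello from orbit."


start = time.monotonic()
reply = asyncio.run(think_with_minimum_hold(instant_model, min_hold_s=0.2))
elapsed = time.monotonic() - start
```

Even though the stand-in model returns at once, the call takes roughly the full hold time, which is what lets the visor animation and sound cue finish.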

Data and privacy

Conversations are recorded as text only — the transcribed question and her reply, timestamped. No audio is retained, no names are requested, and transcripts are not linked to any identity. The microphone is push-to-talk: it is only live while the facilitator holds the talk button. There is no camera and no face recognition of any kind.
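A text-only record of the kind described here needs just three fields. The sketch below is an assumed format (one JSON object per exchange); the field names and the `transcript_record` function are hypothetical, not the project's actual log schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def transcript_record(question: str, reply: str, now: Optional[datetime] = None) -> str:
    """Serialize one exchange as a JSON line: timestamp and text only."""
    now = now or datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),
        "question": question,
        "reply": reply,
    }
    # No audio, name, or identifier fields exist in the record at all.
    return json.dumps(record, ensure_ascii=False)


line = transcript_record(
    "How far is the Moon?",
    "Close enough to visit.",
    now=datetime(2025, 1, 1, tzinfo=timezone.utc),
)
```

Keeping identity fields out of the schema entirely, rather than leaving them blank, makes the "not linked to any identity" property easy to audit.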

AI disclosure

We believe meeting AI in a museum should come with the full story.

The AI at runtime

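When you talk to Afrobotica, three AI systems work in sequence (speech recognition, language model, and voice), supported by an offline backup voice and a lip-sync model: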

Role	Model	Made by	Where it runs	License
Understands your speech	Whisper (tiny)	OpenAI	Locally, in the planetarium	MIT (open source)
Composes her reply	Qwen2.5-7B (via Ollama)	Alibaba Cloud (Qwen team)	Locally, in the planetarium	Apache 2.0 (open source)
Speaks her reply	Wit.ai TTS ("Kenyan Accent" voice)	Meta	Cloud service	Proprietary service
Backup voice (offline)	Piper (en_US-hfc_female)	Rhasspy/Piper project	Locally, in the planetarium	MIT engine; CC BY-NC-SA 4.0 voice corpus
Animates her mouth	Meta (Oculus) Lip Sync	Meta	Locally	Proprietary SDK

During development we also evaluated Llama 3.1 (8B), Llama 3.2, and Gemma before selecting Qwen2.5-7B.

AI used in creating the experience

AI-assisted software development. Portions of the application code were written with AI pair-programming assistance (Anthropic's Claude, via the Claude Code development tool), with human review of the result.
One AI-generated 3D asset. The floating teddy bear — a zero-gravity indicator, a real astronaut tradition — was generated with Hunyuan3D-2, an open-source image-to-3D model from Tencent, run locally on our own hardware.
Pre-recorded narration. The introduction and farewell narration were synthesized in advance with the same Wit.ai voice Afrobotica speaks with live.

What is NOT AI

Afrobotica herself — the character was created by Dr. Sian Proctor; her persona, values, knowledge, and manner of speaking were written by humans as a detailed character document the AI must follow. The AI performs the character; it did not invent her.
The poetry — "EarthLight," "Space2Inspire," and related works quoted in the experience are Dr. Proctor's own human-authored writing.
The avatar artwork — Afrobotica's 3D character model is custom, human-made art.
The science content in her knowledge base was researched, verified against primary sources (NASA), and written by the production team.

Where the non-AI materials came from

Material	Source
ISS cupola environment	Licensed Unity Asset Store package
Earth imagery	NASA Blue Marble (public domain)
Cloud layer imagery	Solar System Scope textures
Astronaut objects	NASA 3D model library
Confetti effects	"Confetti FX" by Kenneth "Archanor" Foldal Moe (Unity Asset Store)
Caption typeface	B612 (SIL Open Font License)
Music	Original score by Subtractive

Training transparency — what's known and what isn't

Model	What the maker discloses	What is NOT public
Qwen2.5-7B	Up to 18 trillion tokens of "large-scale, high-quality" data	The composition and sources of that data
Whisper	680,000 hours of multilingual web audio + transcripts	The specific recordings/sites used
Llama 3.1/3.2 (evaluated)	~15 trillion tokens from "publicly available sources"	Sources are not itemized
Wit.ai TTS voice	—	How Meta's preset voices were built
Piper hfc_female	Fully documented: the Hi-Fi-CAPTAIN corpus, one professional performer, published by NICT under CC BY-NC-SA 4.0	—
Hunyuan3D-2	Open release; trained on "millions of 3D assets"	The composition of those assets

No one's voice was cloned for this experience. Her live voice is a standard Wit.ai preset; her backup voice derives from a research corpus recorded by a professional performer for exactly this purpose.

The stack at a glance

Layer	Technology
Engine	Unity 6 (URP), Windows, Direct3D 11
Speech-to-text	Whisper (OpenAI, open source) — local
Language model	Qwen2.5-7B via Ollama — local
Text-to-speech	Wit.ai (Meta, cloud), with local Piper fallback
Lip sync	Meta (Oculus) Lip Sync
Dome render	Real-time cubemap → fisheye, 4096² @ 60 Hz
Hardware	Single workstation, NVIDIA RTX 2000 Ada (16 GB)
Configuration	JSON runtime config + standalone settings app (no rebuilds)

Credits

Dr. Sian Proctor — creator of Afrobotica; concept, character, poetry
Chaotic Curiosity — Don Balanzat (production, audio & voice design, show flow), Alireza Bahremand (lead research engineer), Dylan Kerr (3D & VFX)
Subtractive — original score · Test Shot Starfish and Annu — collaborating artists
Museum of Science, Boston — venue and production partner
Built gratefully on open-source software: Ollama, Qwen, Whisper / whisper.cpp, Piper, B612, and the Unity community.