← Back to Afrobotica, the Earthlight Historian

Technical Deep Dive

Afrobotica, the Earthlight Historian

Afrobotica is a live, conversational planetarium character. Before the main show, one audience member at a time steps up to a microphone and talks with her: she listens, thinks, and answers out loud — projected on the dome as a holographic avatar floating in an ISS-style cupola above the Earth, voice and visor reacting in real time.

Technically, it is a real-time AI pipeline running inside a Unity 6 application on a single workstation, driving a 4K fisheye dome projection at 60 Hz — designed so that the entire conversational loop can run with no internet connection at all.

The conversation loop

Every exchange follows the same three-stage loop, run by a small state machine (Idle → Listen → Think → Speak):

  1. Listen. A facilitator holds a presentation-clicker button while the visitor speaks (push-to-talk). The audio is transcribed to text on the machine by OpenAI's Whisper model running locally — the visitor's voice never leaves the building.
  2. Think. The transcript goes to a local large language model — Qwen2.5-7B, served by Ollama on the same workstation — wrapped in a carefully engineered character prompt. The model writes a short, in-character reply (two to three sentences by design).
  3. Speak. The reply is synthesized into Afrobotica's voice. The live voice is Meta's Wit.ai text-to-speech (the one cloud service in the loop); if the network hiccups, the system silently switches to Piper, a fully local neural voice, and the show continues. Lip sync, a state-reactive "visor" light language, and dome captions all track the speech in real time.

In the default Piloted mode the facilitator opens the mic for each new question; a FreeFlow mode exists where she re-listens automatically.

Three swappable modules, one event bus

The pipeline is built as three independent modules — Listener (speech-to-text), Thinker (language model), and Speaker (text-to-speech) — that never call each other directly. Each fires events onto a central event bus, and the state machine advances on those events. Each module supports several interchangeable providers:

ModuleLive provider (venue)Alternates available
Listener (STT)Whisper — offline, on-deviceMeta Voice SDK, OpenAI Whisper API
Thinker (LLM)Ollama → Qwen2.5-7B — offline, on-deviceOpenAI ChatGPT (two integration modes)
Speaker (TTS)Wit.ai (cloud) with automatic Piper offline fallbackOpenAI TTS, ElevenLabs

Which provider is active is decided by configuration files, not code — the venue can re-point any stage (or adjust temperatures, timeouts, the active persona) with a companion settings app, without ever opening Unity or rebuilding.

Offline-first by design

A museum floor is a hostile place for cloud dependencies. The system was therefore designed to run the entire loop locally: Whisper (STT), Qwen2.5-7B via Ollama (LLM), and Piper (TTS) all execute on the show workstation's GPU and CPU. The single cloud service in the live configuration — the Wit.ai voice — exists because it best matched the character's voice and latency targets, and it is wrapped in an automatic fallback: if a request times out, the local Piper voice takes over mid-show.

This has a privacy consequence worth stating plainly: the visitor's voice is processed entirely on the local machine. The only data that travels to a cloud service during a show is Afrobotica's own generated reply text, sent to Wit.ai to be turned into audio.

Keeping a small model honest

A 7-billion-parameter local model is fast and private, but it will confidently improvise if you let it. Most of the content engineering went into not letting it:

  • A versioned persona prompt defines her voice, values, and a curated, web-verified ladder of space facts — including the current status of each Artemis mission.
  • Timeless framing: the prompt and a post-processing pass deliberately scrub years and dates, so answers don't silently go stale or expose the model's training cutoff.
  • Response shaping: replies are clamped to two or three short sentences (about 8–10 seconds spoken), lists are stripped, and dates are removed before speech.
  • A regression benchmark harness scores every prompt revision before it ships: a compliance suite (sentence length, banned phrases, date leaks) and a graded correctness suite for the Artemis fact ladder, each run multiple times against the live model. Persona v7 scored 115/120 on the correctness suite versus 51/57 for v5 — changes are only promoted when measured.

The dome

The scene is rendered to the planetarium dome in real time: a cubemap camera captures the environment, and a fisheye shader converts it into a 4096×4096 dome master at 60 Hz — all on one workstation GPU (an NVIDIA RTX 2000 Ada). The avatar can orbit between six stations around the dome, the cupola windows open during the intro, and NASA's Blue Marble imagery provides the Earth below.

Accessibility is built in: live dome captions track Afrobotica's speech sentence by sentence, rendered in B612 (the open-source typeface designed for aircraft cockpits) and toggleable mid-show.

The show, end to end

The experience runs as a closed loop a facilitator drives with a single presentation clicker (with full keyboard fallbacks):

  1. Dormant — title card on the dome, ambient music, avatar hidden.
  2. Intro (~27 s) — triggered by the clicker: lights dim, Afrobotica teleports in as a hologram, welcomes the audience, and the cupola doors open onto the Earth.
  3. Conversation — the Listen → Think → Speak loop, one visitor question at a time. Spoken keywords can trigger moments: "Artemis" reveals mission imagery on the dome; a birthday earns confetti.
  4. Outro — she says goodbye, dissolves away, and the dome fades to black; the system then resets itself to Dormant for the next audience.

Per-state watchdog timeouts and one-button recovery gestures mean a stalled provider degrades into a graceful skip, never a frozen show.

Latency as theater

Small touches make machine latency read as character rather than lag:

  • A minimum "think" hold guarantees her gold-pulsing visor and a mechanical "processing" sound play out fully even when the model answers instantly — thinking is staged, not suffered.
  • Streamed speech synthesis starts her talking before the full audio is rendered.
  • The garbage collector is scheduled around speech — memory cleanup is deferred while she talks and executed during the fade-to-black, eliminating mid-sentence stutter.

Data and privacy

Conversations are recorded as text only — the transcribed question and her reply, timestamped. No audio is retained, no names are requested, and transcripts are not linked to any identity. The microphone is push-to-talk: it is only live while the facilitator holds the talk button. There is no camera and no face recognition of any kind.

AI disclosure

We believe meeting AI in a museum should come with the full story.

The AI at runtime

When you talk to Afrobotica, three AI systems work in sequence:

RoleModelMade byWhere it runsLicense
Understands your speechWhisper (tiny)OpenAILocally, in the planetariumMIT (open source)
Composes her replyQwen2.5-7B (via Ollama)Alibaba Cloud (Qwen team)Locally, in the planetariumApache 2.0 (open source)
Speaks her replyWit.ai TTS ("Kenyan Accent" voice)MetaCloud serviceProprietary service
Backup voice (offline)Piper (en_US-hfc_female)Rhasspy/Piper projectLocally, in the planetariumMIT engine; CC BY-NC-SA 4.0 voice corpus
Animates her mouthMeta (Oculus) Lip SyncMetaLocallyProprietary SDK

During development we also evaluated Llama 3.1 (8B), Llama 3.2, and Gemma before selecting Qwen2.5-7B.

AI used in creating the experience

  • AI-assisted software development. Portions of the application code were written with AI pair-programming assistance (Anthropic's Claude, via the Claude Code development tool), with human review of the result.
  • One AI-generated 3D asset. The floating teddy bear — a zero-gravity indicator, a real astronaut tradition — was generated with Hunyuan3D-2, an open-source image-to-3D model from Tencent, run locally on our own hardware.
  • Pre-recorded narration. The introduction and farewell narration were synthesized in advance with the same Wit.ai voice Afrobotica speaks with live.

What is NOT AI

  • Afrobotica herself — the character was created by Dr. Sian Proctor; her persona, values, knowledge, and manner of speaking were written by humans as a detailed character document the AI must follow. The AI performs the character; it did not invent her.
  • The poetry — "EarthLight," "Space2Inspire," and related works quoted in the experience are Dr. Proctor's own human-authored writing.
  • The avatar artwork — Afrobotica's 3D character model is custom, human-made art.
  • The science content in her knowledge base was researched, verified against primary sources (NASA), and written by the production team.

Where the non-AI materials came from

MaterialSource
ISS cupola environmentLicensed Unity Asset Store package
Earth imageryNASA Blue Marble (public domain)
Cloud layer imagerySolar System Scope textures
Astronaut objectsNASA 3D model library
Confetti effects"Confetti FX" by Kenneth "Archanor" Foldal Moe (Unity Asset Store)
Caption typefaceB612 (SIL Open Font License)
MusicOriginal score by Subtractive

Training transparency — what's known and what isn't

ModelWhat the maker disclosesWhat is NOT public
Qwen2.5-7BUp to 18 trillion tokens of "large-scale, high-quality" dataThe composition and sources of that data
Whisper680,000 hours of multilingual web audio + transcriptsThe specific recordings/sites used
Llama 3.1/3.2 (evaluated)~15 trillion tokens from "publicly available sources"Sources are not itemized
Wit.ai TTS voiceHow Meta's preset voices were built
Piper hfc_femaleFully documented: the Hi-Fi-CAPTAIN corpus, one professional performer, published by NICT under CC BY-NC-SA 4.0
Hunyuan3D-2Open release; trained on "millions of 3D assets"The composition of those assets

No one's voice was cloned for this experience. Her live voice is a standard Wit.ai preset; her backup voice derives from a research corpus recorded by a professional performer for exactly this purpose.

The stack at a glance

LayerTechnology
EngineUnity 6 (URP), Windows, Direct3D 11
Speech-to-textWhisper (OpenAI, open source) — local
Language modelQwen2.5-7B via Ollama — local
Text-to-speechWit.ai (Meta, cloud) → Piper fallback — local
Lip syncMeta (Oculus) Lip Sync
Dome renderReal-time cubemap → fisheye, 4096² @ 60 Hz
HardwareSingle workstation, NVIDIA RTX 2000 Ada (16 GB)
ConfigurationJSON runtime config + standalone settings app (no rebuilds)

Credits

  • Dr. Sian Proctor — creator of Afrobotica; concept, character, poetry
  • Chaotic Curiosity — Don Balanzat (production, audio & voice design, show flow), Alireza Bahremand (lead research engineer), Dylan Kerr (3D & VFX)
  • Subtractive — original score · Test Shot Starfish and Annu — collaborating artists
  • Museum of Science, Boston — venue and production partner
  • Built gratefully on open-source software: Ollama, Qwen, Whisper / whisper.cpp, Piper, B612, and the Unity community.