Giving voice to those who can't speak, one tap at a time.
AI Partner Catalyst · Google Cloud × ElevenLabs Hackathon

Role — UX/UI Designer & Product Lead
Team — 2 (Designer + Developer)
Platform — Mobile / Android
Tools — Figma · Gemini 2.5 Flash · ElevenLabs · React
A delivery partner with a speech disability received a customer call. Unable to speak, he had no choice but to hand his phone to a stranger. This isn't a rare edge case; it's a daily reality. And no existing tool was built for it.
VocalAid is an AI-powered assistive app that listens to callers, detects intent, and lets users reply with a single tap — spoken aloud via AI voice. No typing. No speaking.
70M+ people worldwide live with a speech disorder that impacts daily communication.


3 taps. That's all it takes to complete a full phone conversation with VocalAid.
Key Insights
Existing tools fail in real-time.
They're built for asynchronous communication — not live, time-pressured phone calls.
Core Constraints We Designed For
Secondary research drew on real incidents, accessibility forums, and delivery-worker communities to validate that the problem was widespread, not isolated.
Design Principles
Tap Only
All responses via large, tappable buttons. No typing, no voice input required from the user.
Predictable Loop
Fixed, repeatable cycle builds muscle memory. No hidden states or ambiguous transitions.
Instant Response
Context-aware options surface immediately. Only intent-relevant choices shown.
Zero Cognitive Load
Fewer choices, larger targets, WCAG AAA contrast on critical actions only.
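The "WCAG AAA contrast on critical actions" principle is mechanically checkable. Below is a sketch of the WCAG 2.1 contrast-ratio formula in TypeScript; the helper names are ours for illustration, not from the app's codebase.

```typescript
type RGB = [number, number, number]; // 8-bit sRGB channels

// sRGB channel -> linearized value (WCAG 2.1 relative-luminance formula)
function linear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a color
function luminance([r, g, b]: RGB): number {
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast ratio as defined by WCAG: (lighter + 0.05) / (darker + 0.05)
function contrastRatio(a: RGB, b: RGB): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// WCAG AAA requires at least 7:1 for normal-size text
function meetsAAA(fg: RGB, bg: RGB): boolean {
  return contrastRatio(fg, bg) >= 7;
}
```

White on black scores the maximum 21:1; a mid-grey like #777 on white falls around 4.5:1 and fails AAA, which is why the critical actions stick to near-extreme pairings.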
01
Screen State
Listening
App detects incoming speech automatically. No manual trigger needed. Status always visible.

02
Screen State
Intent Detection
The caller's speech is transcribed to text. Gemini identifies what they're asking and surfaces reply options.

03
Screen State
Respond
2–3 large tap responses appear. One tap sends the reply as spoken audio via ElevenLabs.
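The three screen states above form one fixed loop. A minimal sketch of that loop in TypeScript (matching the app's React stack); the state names mirror the UI's Listening / Thinking / Speaking labels, but the transition code itself is an assumption for illustration, not the app's actual implementation.

```typescript
// The three visible states of a call, in the order the user always sees them.
type CallState = "listening" | "thinking" | "speaking";

interface Transition {
  next: CallState;
  payload?: string; // transcript or chosen reply carried into the next state
}

// Each state has exactly one successor, so the cycle is fully predictable
// and builds muscle memory: nothing branches, nothing hides.
function advance(state: CallState, payload?: string): Transition {
  switch (state) {
    case "listening":
      // caller finished speaking -> hand the transcript to intent detection
      return { next: "thinking", payload };
    case "thinking":
      // intent resolved -> the tapped option becomes the reply to speak
      return { next: "speaking", payload };
    case "speaking":
      // reply spoken aloud -> back to listening for the caller's answer
      return { next: "listening" };
  }
}
```

Because the successor of every state is fixed, there are no hidden states or ambiguous transitions to learn, which is the "Predictable Loop" principle in code form.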

Why It Works
- No typing. No speaking. No confusion.
- A predictable loop anyone can learn in one call.
- AI voice output sounds natural, not robotic.
- System states are always transparent: Listening / Thinking / Speaking.
Under the Hood
Gemini 2.5 Flash
Interprets caller intent in real time.
ElevenLabs
Natural AI voice speaks the selected reply.
React + Google Cloud
Mobile-first frontend, cloud infrastructure.
WCAG AAA contrast on every critical action. 48dp touch targets, meeting Android's own accessibility guidelines.
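The stack composes into a single turn handler: transcript in, spoken reply out. This is a hypothetical sketch only; the function names, data shapes, and the injected-services pattern are our assumptions, not the app's code. In the real app, `detectIntent` would call Gemini 2.5 Flash and `synthesize` would call the ElevenLabs text-to-speech API.

```typescript
// The two AI services, injected as plain functions so the turn logic
// stays testable without network calls.
interface Services {
  detectIntent: (transcript: string) => Promise<string[]>; // Gemini: 2-3 reply options
  synthesize: (reply: string) => Promise<Uint8Array>;      // ElevenLabs: audio bytes
}

// One turn of a call: transcribe -> detect intent -> user taps -> speak.
async function handleTurn(
  transcript: string,
  pickOption: (options: string[]) => number, // the user's single tap
  services: Services
): Promise<Uint8Array> {
  const options = await services.detectIntent(transcript);
  const chosen = options[pickOption(options)];
  return services.synthesize(chosen);
}
```

Keeping the services behind an interface also means the AI backends can be swapped (or stubbed in tests) without touching the three-state interaction loop.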
Hi-Fi Designs




What I'd Do Differently
Real user testing.
Even one informal session with someone who has a speech disability would have sharpened the response options and interaction flow significantly.
What's Next
Built to scale.
The interaction model expands into other professions, other languages, and eventually into Android's native accessibility framework as a system-level layer.
